Question: edge attributes #35

Closed
MikeB2019x opened this issue Jun 3, 2022 · 11 comments
Labels: backend:SQLBackend, question (Further information is requested)

Comments

@MikeB2019x

MikeB2019x commented Jun 3, 2022

My situation is that I use networkx and have access to a postgres db. I find networkx to be quite slow and thought of using some of the alternatives esp. networkit. The challenge I have is with attributes ie. networkit seems to allow only a single numerical 'weight' for edge attributes. My graphs need pretty rich edge attributes and networkx accommodates those. So:

  1. would grand allow me to use networkx syntax/features and edge attribute functionality with networkit eg. filter edges on rich attribute set but retain algorithms running at higher speeds?

  2. you mention grand interacting with dynamo db. I'm not sure I understand that. Is grand using the db to store the graph structure and if so, could it do that with a postgres db? Note: I had a look at this and it seems like this is what I had in mind but when I read your readme.

@j6k4m8
Member

j6k4m8 commented Jun 3, 2022

Hi @MikeB2019x!

It sounds like Grand should be great for your use-case.

Indeed, Grand will handle attributes even if Networkit can't support them natively; Grand will offload network operations to Networkit, and then will add the attributes back in as you ask for them. (In other words, using Grand like NetworkX should solve your problem without you having to think about it.)

There is a bit of overhead associated with my attribute manager, which runs as a layer on top of Networkit; depending on your use-case, this should be pretty unnoticeable, but can be larger for things like edge-attribute queries.
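
To make the "layer on top of Networkit" idea concrete, here is a minimal sketch of one way such a layer could work. It is a toy illustration, not Grand's actual implementation: the class `ToyAttributeLayer` and all of its internals are hypothetical, and a plain Python set stands in for the fast integer-indexed backend. The point is only that structure and attributes can live in separate stores, with attributes joined back in when requested.

```python
# Toy illustration only: NOT Grand's actual implementation.
class ToyAttributeLayer:
    """Keeps structure in an integer-indexed edge set, attributes in a side dict."""

    def __init__(self):
        self._node_ids = {}    # user-facing node name -> integer index
        self._edges = set()    # (int, int) pairs; stands in for the fast backend
        self._edge_attrs = {}  # (int, int) -> attribute dict

    def _index(self, name):
        # Assign the next free integer the first time a name is seen.
        return self._node_ids.setdefault(name, len(self._node_ids))

    def add_edge(self, u, v, **attrs):
        key = (self._index(u), self._index(v))
        self._edges.add(key)
        self._edge_attrs.setdefault(key, {}).update(attrs)

    def edges(self, data=False):
        # Map integer indices back to the names the user supplied.
        names = {i: n for n, i in self._node_ids.items()}
        for ui, vi in sorted(self._edges):
            if data:
                yield names[ui], names[vi], dict(self._edge_attrs.get((ui, vi), {}))
            else:
                yield names[ui], names[vi]


layer = ToyAttributeLayer()
layer.add_edge("A", "B", foo="bar", baz="luhrmann")
layer.add_edge("B", "C", weight=3)

# Filter edges on an attribute; this walks the side dict, not the fast structure.
heavy = [(u, v) for u, v, d in layer.edges(data=True) if d.get("weight", 0) > 1]
```

Filtering on attributes, as in the last line, has to consult the side dictionary for every edge, which is one plausible reason edge-attribute queries carry more overhead than purely structural operations.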

I am just jotting things down from my phone, so I can't run this... But you should be able to do something like this:

import grand
from grand.backends import NetworkitBackend

G = grand.Graph(backend=NetworkitBackend())

# G.nx is not really a networkx graph, but we can treat it like one:
G.nx.add_edge("A", "B", foo="bar", baz="luhrmann")
G.nx.edges(data=True)

# Can still get the secret underlying Networkit backend:
G.backend._nk_graph

In the case of postgres, we will probably need to write a new Backend to support this optimally, but you may be able to get away with the existing SQLBackend:

https://github.com/aplbrain/grand/wiki/Backends#sqlbackend

j6k4m8 added the question (Further information is requested) label Jun 3, 2022
j6k4m8 self-assigned this Jun 3, 2022
@j6k4m8
Member

j6k4m8 commented Jun 3, 2022

By the way, I am also very happy to answer questions about DotMotif as well :)

@MikeB2019x
Author

MikeB2019x commented Jun 3, 2022 via email

@MikeB2019x
Author

MikeB2019x commented Jun 3, 2022 via email

@j6k4m8
Member

j6k4m8 commented Jun 3, 2022

I'm super super glad to hear that, and thank you for your kind words :)

My impression is that I can have only one [backend]

Yes, that is correct — the data "live" in the backend, so to switch between backends, you need to either move the data between them or have a copy of the data in both.

but if that is the case then if I have say, the sql backend imported, how do I specify the underlying graph tool? Is the default 'networkx'?

The default is NetworkX if you create a graph without specifying a backend:

from grand import Graph

g = Graph()

This is the same as:

from grand import Graph
from grand.backends import NetworkXBackend

g = Graph(backend=NetworkXBackend())

But you can also use a different backend, like this:

import grand
from grand.backends import NetworkitBackend

g = grand.Graph(backend=NetworkitBackend())

In which case the data "live" in Networkit.

Backends are a separate idea from "dialects," which are how you talk to the data. ALL dialects are available on ALL graphs, no matter what their backend is. You can see a full list of dialects here.

For ANY of the graphs detailed above, you can talk to them as though they are NetworkX networkx.Graph objects by using the .nx attribute:

g.nx # pretends to be a networkx graph
g.nx.add_edge("node-1", "node-2")

Or you can talk to them as though they were an igraph.Graph object:

g.igraph.vs

Grand handles the "rewriting" of these familiar operators into the language that the backend actually speaks. As far as you (as the user) are concerned, you are actually speaking to NetworkX, not Grand.
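
The split between backends (where the data live) and dialects (how you talk to the data) can be sketched in a few lines. Everything below is hypothetical and written only for illustration: `DictBackend`, `ListBackend`, `NetworkXLikeDialect`, and `ToyGraph` are made-up names, and Grand's real classes have a much larger interface. The sketch only shows that the same dialect calls can be rewritten into operations on two different storage layers.

```python
class DictBackend:
    """Hypothetical backend: edges stored in a dict keyed by (u, v)."""

    def __init__(self):
        self._edges = {}

    def add_edge(self, u, v, metadata):
        self._edges[(u, v)] = dict(metadata)

    def all_edges(self):
        return [(u, v, dict(m)) for (u, v), m in self._edges.items()]


class ListBackend:
    """Hypothetical backend: the same interface, but edges stored in a list."""

    def __init__(self):
        self._rows = []

    def add_edge(self, u, v, metadata):
        self._rows.append((u, v, dict(metadata)))

    def all_edges(self):
        return [(u, v, dict(m)) for u, v, m in self._rows]


class NetworkXLikeDialect:
    """Translates NetworkX-style calls into backend calls."""

    def __init__(self, backend):
        self._backend = backend

    def add_edge(self, u, v, **attrs):
        self._backend.add_edge(u, v, attrs)

    def edges(self, data=False):
        rows = self._backend.all_edges()
        return rows if data else [(u, v) for u, v, _ in rows]


class ToyGraph:
    def __init__(self, backend=None):
        # Fall back to a default backend when none is given.
        self.backend = backend if backend is not None else DictBackend()
        self.nx = NetworkXLikeDialect(self.backend)


# The same dialect calls work regardless of which backend holds the data.
results = []
for backend in (DictBackend(), ListBackend()):
    g = ToyGraph(backend=backend)
    g.nx.add_edge("node-1", "node-2", foo="bar")
    results.append(g.nx.edges(data=True))
```

Both entries in `results` hold the same edge list, even though the two backends store the edge differently.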

if I use the sql backend do I have to pre-construct a db to a particular schema or can I use an existing one eg. node table (id, attr), edge table (src, tgt, attr)

You do not have to pre-construct a database; in fact, you don't even need one to exist. Here's how I would create a SQLite graph with a few edges in it:

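import grand
from grand.backends import SQLBackend

g = grand.Graph(backend=SQLBackend(db_url="sqlite:///my-file.db"))

# Add a few edges (with attributes) through the NetworkX dialect:
g.nx.add_edge("A", "B", foo="bar")
g.nx.add_edge("B", "C", foo="baz")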

You could also (I haven't done this before! I think it should work, though!) create a postgres connection like this:

g = grand.Graph(backend=SQLBackend(db_url="postgresql://jordan:mypassword@localhost/mydatabase"))

If you already have two database tables in your db (one for nodes and one for edges), you can tell Grand to connect to them like this:

g = grand.Graph(
    backend=SQLBackend(
        db_url="postgresql://jordan:mypassword@localhost/mydatabase",
        node_table_name="my_nodes",
        edge_table_name="my_edges",
        edge_table_source_column="src",
        edge_table_target_column="tgt",
        primary_key="id",
    )
)

This will look for a table called my_nodes and use the column id as the unique key for nodes (all other columns will be considered attributes). It will look for a table called my_edges and assume that the columns called src and tgt are source and target IDs into the nodes table; and all other columns will be treated like attributes.
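
As a rough picture of the table layout described above, the following sketch builds matching tables in an in-memory SQLite database using only the standard library. It does not use Grand at all: the columns `kind`, `weight`, and `label` and the helper `edges_with_attrs` are invented for illustration, and whether Grand also expects an `id` column on the edge table is not confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict

# A node table keyed by `id`, and an edge table with `src`/`tgt` columns.
# Every other column plays the role of an attribute.
conn.executescript(
    """
    CREATE TABLE my_nodes (id TEXT PRIMARY KEY, kind TEXT);
    CREATE TABLE my_edges (src TEXT, tgt TEXT, weight REAL, label TEXT);
    INSERT INTO my_nodes VALUES ('A', 'person'), ('B', 'person');
    INSERT INTO my_edges VALUES ('A', 'B', 0.5, 'knows');
    """
)


def edges_with_attrs(conn, source_col="src", target_col="tgt"):
    """Return (source, target, attrs) triples; non-endpoint columns become attrs."""
    triples = []
    for row in conn.execute("SELECT * FROM my_edges"):
        record = dict(row)
        u = record.pop(source_col)
        v = record.pop(target_col)
        triples.append((u, v, record))
    return triples


edges = edges_with_attrs(conn)
```

Here `edges` has the same shape as the result of `edges(data=True)` in NetworkX, which is the mapping the paragraph above describes: endpoint columns identify the edge, and the remaining columns are treated as its attributes.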

This starts to get a bit untested; I've done all of these things before, but I am curious to hear your experiences, especially if you wind up using a non-sqlite database!

Contributions to documentation as you discover things would be AMAZING (even just issues saying 'this is under-documented' are helpful!); there are SO many interesting corners in this project that it's hard to tell what docs would be useful and used by people and which would just be extra work for me, without an audience.

Some further reading: #19 talks about connecting to an existing database, with some commentary as well.

@MikeB2019x
Author

I've been able to connect to the db, use existing tables, create node/edge tables. To speed up scaling some tests I tried to read/write a .graphml file eg. G.nx.read_graphml("my.graphml") resulting in 'NetworkXDialect' object has no attribute 'write_graphml'. Is that functionality foreseen?

@j6k4m8
Member

j6k4m8 commented Jun 6, 2022

I think read_graphml and write_graphml live at the networkx module level, not on a graph object; I've never tried this, but I think the code for this would be:

import networkx as nx
from grand import Graph

g = Graph(...)

nx.write_graphml(g.nx, "my.graphml")

I would not be surprised if this works! But then again... I would not be surprised if this doesn't work :)

One alternative would be to move the edges and nodes over to a "real" networkx object. For very large graphs this could take a while, but it might suit your purposes here:

import networkx as nx
from grand import Graph

g = Graph(...)
real_g = nx.DiGraph()

for node, attrs in g.nx.nodes(data=True):
    # Unpack the attribute dict into keyword arguments
    real_g.add_node(node, **attrs)

for u, v, attrs in g.nx.edges(data=True):
    real_g.add_edge(u, v, **attrs)

nx.write_graphml(real_g, "my.graphml")
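
Once the graph is a plain networkx object, a quick way to check that edge attributes survive the export is a round trip through GraphML text. This sketch uses only networkx (`generate_graphml` and `parse_graphml`) with made-up nodes and attributes, and does not touch Grand.

```python
import networkx as nx

# A small stand-in for the copied graph, with node and edge attributes.
real_g = nx.DiGraph()
real_g.add_node("A", kind="person")
real_g.add_node("B", kind="person")
real_g.add_edge("A", "B", foo="bar", weight=2)

# Serialize to GraphML text in memory, then parse it back.
graphml_text = "\n".join(nx.generate_graphml(real_g))
restored = nx.parse_graphml(graphml_text)

# Compare the attributes before and after the round trip.
same_edge_attrs = restored.edges["A", "B"] == real_g.edges["A", "B"]
```

networkx's GraphML writer handles scalar values (strings, numbers, booleans); nested values such as dicts or lists would likely need to be flattened or serialized to strings before export.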

@j6k4m8
Member

j6k4m8 commented Jun 9, 2022

How goes it, @MikeB2019x? Would it be helpful to hop on a screenshare sometime?

@MikeB2019x
Author

@j6k4m8 screenshare not required at the moment but I may take you up on that in the future. So trying to write out a graphml as suggested throws an error (stack trace below). If I compare a networkx graph's attributes and those of G.nx, you'll see: [...,'edges', 'get_edge_data','graph','graph_attr_dict_factory','has_edge','has_node'...] for the former compared to [...'edges','get_edge_data','graph_attr_dict_factory','has_edge','has_node',...] for the latter. That is, the 'graph' attribute isn't present in G.nx. I'm guessing that's intentional?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [35], in <module>
      1 graphml_file_name = 'graphtools.graphml'
----> 3 nx.write_graphml(G.nx, graphml_file_name)

File <class 'networkx.utils.decorators.argmap'> compilation 17:5, in argmap_write_graphml_lxml_13(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
      3 from contextlib import contextmanager
      4 from pathlib import Path
----> 5 import warnings
      7 import networkx as nx
      8 from networkx.utils import create_random_state, create_py_random_state

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:171, in write_graphml_lxml(G, path, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    160 except ImportError:
    161     return write_graphml_xml(
    162         G,
    163         path,
   (...)
    168         edge_id_from_attribute,
    169     )
--> 171 writer = GraphMLWriterLxml(
    172     path,
    173     graph=G,
    174     encoding=encoding,
    175     prettyprint=prettyprint,
    176     infer_numeric_types=infer_numeric_types,
    177     named_key_ids=named_key_ids,
    178     edge_id_from_attribute=edge_id_from_attribute,
    179 )
    180 writer.dump()

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:729, in GraphMLWriterLxml.__init__(self, path, graph, encoding, prettyprint, infer_numeric_types, named_key_ids, edge_id_from_attribute)
    726 self.attribute_types = defaultdict(set)
    728 if graph is not None:
--> 729     self.add_graph_element(graph)

File ~/opt/miniconda3/envs/graph_analytics/lib/python3.8/site-packages/networkx/readwrite/graphml.py:740, in GraphMLWriterLxml.add_graph_element(self, G)
    737 else:
    738     default_edge_type = "undirected"
--> 740 graphid = G.graph.pop("id", None)
    741 if graphid is None:
    742     graph_element = self._xml.element("graph", edgedefault=default_edge_type)

AttributeError: 'NetworkXDialect' object has no attribute 'graph'

@j6k4m8
Member

j6k4m8 commented Jun 22, 2022

Interesting: do you mind if I migrate this to a new issue to address? This would be a good capability for us to have in the Grand library, thank you for bringing it up!

@j6k4m8
Member

j6k4m8 commented Jul 6, 2022

@MikeB2019x — what is the status of this issue? Happy to discuss graph export separately in #39 if that's helpful; want to make sure edge attributes are working for you now!

j6k4m8 closed this as completed Sep 22, 2022