Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method for hashing connected graphs #14

Open
JackAtOmenApps opened this issue Jan 12, 2022 · 1 comment
Open

Add method for hashing connected graphs #14

JackAtOmenApps opened this issue Jan 12, 2022 · 1 comment

Comments

@JackAtOmenApps
Copy link
Collaborator

In preparation for some future work, it would be convenient to be able to calculate the hash of a connected graph, assuming no directivity in edges. That is, if we query for all nodes connected to a node using django-postgresql-dag's connected_graph() method, we should be able to calculate a hash value of the current state of all connected elements. While potentially expensive to compute †, this should allow guaranteed identification of changes to the graph.

It is possible to have multiple DAGs using django-postgresql-dag. For this issue, we want to calculate the hash of a particular DAG, not all nodes/graphs within the project. To do this, we might approach the problem in the following manner:

  1. Initially treat the DAG as an undirected graph (which is done with the connected_graph() query) to retrieve all nodes in the current graph.
  2. For each identified node within the QuerySet, annotate the node with a sorted list of the parent id values of that node.
  3. Sort the QuerySet by node id.
  4. Convert the resulting QuerySet to a tuple of tuples (which contains the id and parents for each node).
  5. Iterate through the tuple, calculating the combined hash all elements.
  6. Return the calculated hash value.

Recommendations for improvement are welcome. There's likely a more memory-friendly approach that will still get the job done.

Alternative approaches considered: NetworkX Provides a weisfeiler_lehman_graph_hash() function, but it doesn't work well on graphs that are not isomorphic. The graphs produced with django-postgresql-dag are likely to be non-isomorphic.

† Potentially expensive in time and memory. The tuple we use to calculate the hash contains the id of every node in the graph, and for each node, the ids of the parent nodes. Using hashllib's blake2s hash will probably be a speedy approach, but the entire process of querying, converting, and iterating to calculate the hash will take a certain amount of time.

@migueltorrescosta
Copy link

The graphs produced with django-postgresql-dag are likely to be non-isomorphic.

What is meant by this sentence? isomorphism AFAIK is a property of a pair of graphs ( graph X is isomorphic to graph Y ), and not a stand alone property of a graph.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants