Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GraphIndex] Detects duplicate edges on simple graphs #340

Closed
wants to merge 18 commits into from

Conversation

BarclayII
Copy link
Collaborator

@BarclayII BarclayII commented Jan 5, 2019

Description

Detects and fails on duplicate edges when adding to simple graphs (fixes #112). The solution comes from igraph where the successor vertex IDs are always sorted.

The performance for the original and proposed implementation is in https://hackmd.io/lVM6zI6VRIOq4-fzE8vuNw?both

If the performance is not satisfactory then I can try using hashmaps instead.

EDIT: also fixes compiler warnings

@BarclayII BarclayII closed this Jan 6, 2019
@BarclayII BarclayII reopened this Jan 6, 2019
@jermainewang jermainewang self-requested a review January 7, 2019 21:54
@jermainewang
Copy link
Member

How about let's hold this for a while, wait for the immutable graph PR?

@BarclayII
Copy link
Collaborator Author

Sure. Closing this for now.

@BarclayII BarclayII closed this Jan 8, 2019
@BarclayII BarclayII changed the title [GraphIndex] Detects duplicate edges on simple graphs [WIP][GraphIndex] Detects duplicate edges on simple graphs Jan 17, 2019
@BarclayII BarclayII reopened this Jan 17, 2019
@BarclayII BarclayII changed the title [WIP][GraphIndex] Detects duplicate edges on simple graphs [GraphIndex] Detects duplicate edges on simple graphs Jan 18, 2019
@BarclayII
Copy link
Collaborator Author

Reopening this since immutable graph is merged.

CMakeLists.txt Outdated Show resolved Hide resolved
@zheng-da
Copy link
Collaborator

so there isn't performance impact?
could you put the performance results side by side so we can easily see the performance impact. I think we only need to compare the performance for modifying 1,000,000 nodes/edges.

@BarclayII
Copy link
Collaborator Author

BarclayII commented Jan 20, 2019

EDIT: updating with a reference implementation of maintaining an STL unordered_set for each node.

Modifying an existing graph of 1M nodes and 5M edges is significantly slower:

# nodes # edges Old implementation New implementation Per-node unordered_set
+100 +500 0.0004 0.0090 0.0008
+1K +5K 0.0028 0.0141 0.0066
+10K +50K 0.0306 0.0501 0.0654
+100K +500K 0.3362 0.5594 0.7404
+1M +5M 2.5600 4.0326 6.1902

Adding edges to an empty graph:

# nodes # edges Old implementation New implementation Per-node unordered_set
+100 +500 0.0001 0.0001 0.0002
+1K +5K 0.0018 0.0003 0.0020
+10K +50K 0.0112 0.0049 0.0410
+100K +500K 0.2431 0.1223 0.5657
+1M +5M 2.9525 3.1864 5.9034

Checking whether an edge exist between two nodes for 5% of all edges

# nodes # edges Old implementation (linear search) New implementation (binary search) Per-node unordered_set
+100 +500 0.0000 0.0000 0.0000
+1K +5K 0.0000 0.0000 0.0000
+10K +50K 0.0001 0.0002 0.0002
+100K +500K 0.0013 0.0017 0.0034
+1M +5M 0.0219 0.0227 0.0450

STL unordered_set and unordered_map are known to be cache-unfriendly. I guess that's the main reason why even sorting & binary search is faster than STL.

@jermainewang

@jermainewang
Copy link
Member

What do you think of the solution in #419 ? namely having dgl.to_simple to remove multi-edges. I feel this could avoid the performance/usability dilemma by having the call controlled by users.

@BarclayII
Copy link
Collaborator Author

Sure. What if someone wants to add edges batch by batch while keeping it a simple graph (in the case of streaming graphs)? Just a theoretical question though - I don't know which situation is more common

@jermainewang
Copy link
Member

Sure. What if someone wants to add edges batch by batch while keeping it a simple graph (in the case of streaming graphs)? Just a theoretical question though - I don't know which situation is more common

That's good consideration. For streaming graph, my guess is some new solution is needed. We may need to read some papers about streaming graph for some solutions. For the time being, let's use a naive solution.

@BarclayII
Copy link
Collaborator Author

Agreed.

@BarclayII BarclayII closed this Mar 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forbid multiple edges when multigraph is False in DGLGraph
3 participants