Friendster dataset in Benchmark #3967
Comments
Hi, thanks for reporting. I'd like to understand how severe the problem is. Do you know how many nodes in the constructed graph are isolated (no in- or out-edges)?
According to the description on the Friendster website (http://snap.stanford.edu/data/com-Friendster.html), the dataset contains
While the current benchmark script generates a graph with
So approximately 50% of the nodes are isolated.
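The isolated-node estimate can be checked with a short sketch like the one below. It assumes the graph is sized as the largest node ID plus one, which is how an edge list with gaps in its IDs would inflate the node count; the function name `count_isolated` and the toy edge list are hypothetical and not taken from the benchmark script.

```python
import numpy as np


def count_isolated(src, dst):
    """Count node IDs in [0, max_id] that never appear in any edge.

    Assumption: the graph is built with num_nodes = max_id + 1, so every
    unused ID below the maximum becomes an isolated node.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    if src.size == 0:
        return 0
    num_nodes = int(max(src.max(), dst.max())) + 1
    touched = np.zeros(num_nodes, dtype=bool)
    touched[src] = True  # nodes with at least one out-edge
    touched[dst] = True  # nodes with at least one in-edge
    return num_nodes - int(touched.sum())


# Toy edge list whose IDs neither start at 0 nor increase contiguously.
toy_src = [101, 101, 205]
toy_dst = [205, 309, 309]
isolated = count_isolated(toy_src, toy_dst)
```

In the toy example only three of the 310 allocated IDs are used, so the remaining 307 are isolated.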
I see. The inflated node ID space may harm locality when reading node features, because they will be more scattered. It won't affect the amount of computation, which depends on the number of edges. I think it is worth improving. Would you like to help? The
Sure! I will follow up and modify the related scripts soon.
Hi, I have opened a PR: #4009
🐛 Bug
There is a potential problem with the friendster dataset used in the benchmark scripts:
dgl/benchmarks/benchmarks/utils.py
Lines 131 to 140 in ae7e3db
The node IDs in com-friendster.ungraph.txt do not start at 0 and are not contiguous, so the returned graph is also wrong (in terms of the number of nodes and edges). Maybe it will affect the benchmark results?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Environment
conda, pip, source):
Additional context
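One possible way to avoid the inflated ID space is to relabel the node IDs so that they run contiguously from 0 before the graph is built. The sketch below is only an illustration of that idea using `numpy.unique`; it is not the change made in the PR, and the helper name `compact_node_ids` is hypothetical.

```python
import numpy as np


def compact_node_ids(src, dst):
    """Relabel node IDs to the contiguous range 0..N-1.

    Returns the relabeled endpoints plus the sorted original IDs, so that
    orig_ids[new_id] recovers the ID used in the edge-list file.
    """
    src = np.asarray(src, dtype=np.int64)
    dst = np.asarray(dst, dtype=np.int64)
    # np.unique sorts the distinct IDs; `inverse` maps each endpoint to its
    # position in that sorted array, which becomes the new contiguous ID.
    orig_ids, inverse = np.unique(
        np.concatenate([src, dst]), return_inverse=True
    )
    inverse = inverse.reshape(-1)
    new_src = inverse[: len(src)]
    new_dst = inverse[len(src):]
    return new_src, new_dst, orig_ids


# Toy edge list with gaps in the ID space.
new_src, new_dst, orig_ids = compact_node_ids([101, 101, 205], [205, 309, 309])
```

After relabeling, the number of nodes equals `len(orig_ids)`, so no unused IDs remain, and keeping `orig_ids` allows results to be mapped back to the original dataset IDs.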