About the dataset in GPT_GNN #24
Comments
For the full dataset, you can refer to: https://www.openacademic.ai/oag/
Thanks for your reply. Could you provide the code for building MAG_0919_CS.zip and SeqName_CS_20190919.tsv? The URL you provided doesn't describe how to build them. Thanks.
I don't have that part. I got these files directly from the MSR team.
Thanks for your reply.
Sorry to bother you again. It seems that the dataset (CS papers) you provide contains only ~1 million nodes, which differs from the statistics stated in the HGT paper (11,732,027 nodes).
The statistics are for the complete CS graph. During pre-processing, we filter out papers that have fewer than 5 citations to make the graph denser. The complete raw dataset can be found at: https://drive.google.com/drive/folders/1yDdVaartOCOSsQlUZs8cJcAUhmvRiBSz
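For anyone trying to reproduce the smaller graph from the raw data, the citation filter could look roughly like the sketch below. This is only an illustration, not the actual pre-processing script: the function name `filter_by_citation_count`, the `(citing, cited)` edge-list input, and the reading of "citations" as incoming citations are all assumptions.

```python
from collections import Counter


def filter_by_citation_count(edges, min_citations=5):
    """Keep papers cited at least `min_citations` times (hypothetical helper).

    `edges` is an iterable of (citing_paper_id, cited_paper_id) pairs.
    Assumption: "citations" means incoming citations to a paper.
    """
    edges = list(edges)
    # Count how many times each paper appears as the cited side of an edge.
    cited_counts = Counter(cited for _, cited in edges)
    kept_papers = {p for p, n in cited_counts.items() if n >= min_citations}
    # Keep only citation edges whose two endpoints both survive the filter.
    kept_edges = [(s, d) for s, d in edges if s in kept_papers and d in kept_papers]
    return kept_papers, kept_edges
```

Papers that are never cited do not appear in the counter at all, so they are dropped along with everything below the threshold. That is consistent with the graph shrinking sharply after filtering.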
I notice that OAG actually has three categories (CS/Med/NN), which are available as preprocessed graphs. I am interested in the full datasets for all three categories. Could you provide the raw data for Med and NN, as you did for CS? Thanks in advance for your help.