[Sparse] A hetero-relational GCN example #6157

Merged
merged 13 commits into from
Aug 23, 2023

xiangyuzhi
Collaborator

@xiangyuzhi xiangyuzhi commented Aug 14, 2023

Description

Add an example of a Relational Graph Convolutional Network (R-GCN) model for node classification on a heterogeneous graph.
The training log is as follows:

python examples/sparse/hetero-rgcn.py -d aifb
Namespace(dataset='aifb')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0982 | Train Loss: 1.6857 | Valid Acc: 0.0357 | Valid loss: 2.0964 
Epoch 00001 | Train Acc: 0.3482 | Train Loss: 1.3300 | Valid Acc: 0.5357 | Valid loss: 1.2303 
Epoch 00002 | Train Acc: 0.6339 | Train Loss: 1.0971 | Valid Acc: 0.6429 | Valid loss: 0.8458 
Epoch 00003 | Train Acc: 0.7857 | Train Loss: 0.9242 | Valid Acc: 0.7500 | Valid loss: 0.6349 
Epoch 00004 | Train Acc: 0.8304 | Train Loss: 0.8019 | Valid Acc: 0.9643 | Valid loss: 0.5227 
Epoch 00005 | Train Acc: 0.8304 | Train Loss: 0.7032 | Valid Acc: 0.9643 | Valid loss: 0.4620 
Epoch 00006 | Train Acc: 0.8304 | Train Loss: 0.6094 | Valid Acc: 0.9286 | Valid loss: 0.3989 
Epoch 00007 | Train Acc: 0.8393 | Train Loss: 0.5206 | Valid Acc: 0.9643 | Valid loss: 0.3272 
Epoch 00008 | Train Acc: 0.8393 | Train Loss: 0.4414 | Valid Acc: 0.9643 | Valid loss: 0.2617 
Epoch 00009 | Train Acc: 0.9107 | Train Loss: 0.3744 | Valid Acc: 0.9643 | Valid loss: 0.2084 

Test Acc: 0.8611 | Test loss: 0.4359
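The `Namespace(dataset='aifb')` line in the log suggests the script reads its dataset name through `argparse`. The following is only a minimal sketch of such a command-line interface, not the code from the PR: the long flag `--dataset`, the `required=True` setting, the help text, and the `parse_args` helper are assumptions, and the choice list is taken from the four datasets mentioned in this thread.

```python
import argparse


def parse_args(argv=None):
    """Parse the dataset flag; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Hetero R-GCN example (sketch)")
    parser.add_argument(
        "-d",
        "--dataset",  # assumed long name; the log only shows `-d`
        type=str,
        required=True,  # assumption: the real script may define a default
        choices=["aifb", "mutag", "bgs", "am"],
        help="Name of the entity classification dataset.",
    )
    return parser.parse_args(argv)


# Usage: equivalent to `python script.py -d aifb`
args = parse_args(["-d", "aifb"])
print(args)  # Namespace(dataset='aifb')
```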

Checklist

Please feel free to remove inapplicable items for your PR.

  • I've leveraged the tools to beautify the Python and C++ code.
  • The PR is complete and small; read the Google eng practice (CL equals PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests, and documentation can be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@dgl-bot
Collaborator

dgl-bot commented Aug 14, 2023

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Collaborator

dgl-bot commented Aug 14, 2023

Commit ID: 14c0728

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

examples/sparse/hetero-rgcn.py Outdated Show resolved Hide resolved
@czkkkkkk czkkkkkk changed the title A hetero-relational GCN example [Sparse] A hetero-relational GCN example Aug 16, 2023
@frozenbugs
Collaborator

@dgl-bot

1 similar comment
@frozenbugs
Collaborator

@dgl-bot

@frozenbugs
Collaborator

Overall LGTM, please copy paste the model result to the PR description.

@xiangyuzhi
Collaborator Author

Overall LGTM, please copy paste the model result to the PR description.

@frozenbugs I'm not sure what the results include. Is it the performance or the size of the model parameters?

@dgl-bot
Collaborator

dgl-bot commented Aug 18, 2023

Commit ID: 4b1bde2743cf398256d2c7b4ae05c8a6f2bd6b97

Build ID: 9

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Collaborator

dgl-bot commented Aug 18, 2023

Commit ID: 444b85baf60f1037be51240b0c8958cf38811f65

Build ID: 10

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs
Collaborator

Overall LGTM, please copy paste the model result to the PR description.

@frozenbugs I'm not sure what the results include. Is it the performance or the size of the model parameters?

Just the output of the model, e.g. accuracy, loss, etc. For example: #6163

@dmlc dmlc deleted a comment from dgl-bot Aug 18, 2023
@frozenbugs
Collaborator

@czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?

@czkkkkkk
Collaborator

czkkkkkk commented Aug 20, 2023

@czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?

The train loss looks abnormal. What is the loss of the message passing example? https://github.com/dmlc/dgl/blob/master/examples/core/rgcn/hetero_rgcn.py#L285

@xiangyuzhi
Collaborator Author

@czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?

The train loss looks abnormal. What is the loss of the message passing example? https://github.com/dmlc/dgl/blob/master/examples/core/rgcn/hetero_rgcn.py#L285

I found that the problem was missing matrix normalization, so I added it and updated the results. The loss now matches that of the previous examples. However, the accuracy is still a bit lower; we think this is because the GraphConv operator in DGL uses more fine-grained optimization.
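To illustrate what matrix normalization means here, the following is a minimal pure-Python sketch, assuming normalization of each relation's adjacency matrix by destination in-degree, so that neighbor features are averaged rather than summed. This is an assumption for illustration only: the thread does not say which normalization the PR applies, and the helper names `row_normalize` and `aggregate` are hypothetical rather than part of DGL.

```python
from collections import defaultdict


def row_normalize(edges, num_dst):
    """Return {(dst, src): weight} with weight = multiplicity / in_degree(dst).

    `edges` is a list of (src, dst) pairs for a single relation.
    Assumed scheme: divide each row of the adjacency matrix by its row sum.
    """
    in_deg = [0] * num_dst
    for _, dst in edges:
        in_deg[dst] += 1
    weights = defaultdict(float)
    for src, dst in edges:
        weights[(dst, src)] += 1.0 / in_deg[dst]
    return dict(weights)


def aggregate(norm_adj, src_feats, num_dst):
    """Compute the product of the normalized adjacency with the source features."""
    dim = len(src_feats[0])
    out = [[0.0] * dim for _ in range(num_dst)]
    for (dst, src), w in norm_adj.items():
        for k in range(dim):
            out[dst][k] += w * src_feats[src][k]
    return out


# Usage: destination 0 receives from sources 0 and 1, destination 1 from source 2.
edges = [(0, 0), (1, 0), (2, 1)]
feats = [[1.0], [3.0], [5.0]]
print(aggregate(row_normalize(edges, 2), feats, 2))  # [[2.0], [5.0]]
```

Without the division by in-degree, high-degree nodes accumulate much larger activations than low-degree ones, which is one plausible reason for an abnormal training loss.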

@dgl-bot
Collaborator

dgl-bot commented Aug 21, 2023

Commit ID: 06cb205adca56f6b911038c01af8dd26668f133e

Build ID: 13

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Collaborator

@czkkkkkk czkkkkkk left a comment

LGTM. Please fix some minor issues.

examples/sparse/hetero-rgcn.py Outdated Show resolved Hide resolved
examples/sparse/hetero-rgcn.py Show resolved Hide resolved
@jermainewang
Member

It's strange that validation accuracy is much lower than test accuracy.

@dgl-bot
Collaborator

dgl-bot commented Aug 21, 2023

Commit ID: 31045c816a491c5404ec5ba8fbe7ab450cbe24c3

Build ID: 14

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@frozenbugs frozenbugs requested review from frozenbugs and removed request for frozenbugs August 22, 2023 08:03
@frozenbugs
Collaborator

I think it is overfitting; can you double-check?

@dgl-bot
Collaborator

dgl-bot commented Aug 22, 2023

Commit ID: 098ed15ce7ae6fc8825f02718cfb655dfaaca8c3

Build ID: 15

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@xiangyuzhi
Collaborator Author

I think it is overfitting; can you double-check?

It does not seem to be overfitting. I tested different numbers of epochs; the following are the results for 5 and 10 epochs:

python examples/sparse/hetero-rgcn.py -d am
Namespace(dataset='am')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0452 | Train Loss: 2.7244 | Valid Acc: 0.0437 | Valid loss: 2.6268 
Epoch 00001 | Train Acc: 0.1308 | Train Loss: 2.3101 | Valid Acc: 0.0813 | Valid loss: 2.4318 
Epoch 00002 | Train Acc: 0.4424 | Train Loss: 1.9616 | Valid Acc: 0.1812 | Valid loss: 2.3291 
Epoch 00003 | Train Acc: 0.4657 | Train Loss: 1.6822 | Valid Acc: 0.1688 | Valid loss: 2.3107 
Epoch 00004 | Train Acc: 0.4595 | Train Loss: 1.4807 | Valid Acc: 0.1750 | Valid loss: 2.3611 

Test Acc: 0.3485 | Test loss: 1.8214
python examples/sparse/hetero-rgcn.py -d am
Namespace(dataset='am')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0592 | Train Loss: 2.4887 | Valid Acc: 0.0813 | Valid loss: 2.5531 
Epoch 00001 | Train Acc: 0.4315 | Train Loss: 2.1208 | Valid Acc: 0.1625 | Valid loss: 2.4388 
Epoch 00002 | Train Acc: 0.4221 | Train Loss: 1.8412 | Valid Acc: 0.1500 | Valid loss: 2.4075 
Epoch 00003 | Train Acc: 0.4470 | Train Loss: 1.6381 | Valid Acc: 0.1750 | Valid loss: 2.4399 
Epoch 00004 | Train Acc: 0.4486 | Train Loss: 1.4952 | Valid Acc: 0.1812 | Valid loss: 2.4964 
Epoch 00005 | Train Acc: 0.4626 | Train Loss: 1.3706 | Valid Acc: 0.1812 | Valid loss: 2.5386 
Epoch 00006 | Train Acc: 0.5109 | Train Loss: 1.2319 | Valid Acc: 0.1812 | Valid loss: 2.5417 
Epoch 00007 | Train Acc: 0.5935 | Train Loss: 1.0780 | Valid Acc: 0.1812 | Valid loss: 2.5101 
Epoch 00008 | Train Acc: 0.6963 | Train Loss: 0.9242 | Valid Acc: 0.2062 | Valid loss: 2.4596 
Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064 

Test Acc: 0.4747 | Test loss: 1.6184
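One way to put a number on the overfitting question is to parse the epoch lines and compare training and validation accuracy. The sketch below assumes the log format shown above; `EPOCH_RE`, `parse_log`, and `train_valid_gap` are hypothetical helper names and are not part of the example script.

```python
import re

# Matches lines such as:
# Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064
EPOCH_RE = re.compile(
    r"Epoch (\d+) \| Train Acc: ([\d.]+) \| Train Loss: ([\d.]+) "
    r"\| Valid Acc: ([\d.]+) \| Valid loss: ([\d.]+)"
)


def parse_log(text):
    """Extract one dict of metrics per epoch line found in `text`."""
    rows = []
    for m in EPOCH_RE.finditer(text):
        epoch, train_acc, train_loss, valid_acc, valid_loss = m.groups()
        rows.append(
            {
                "epoch": int(epoch),
                "train_acc": float(train_acc),
                "train_loss": float(train_loss),
                "valid_acc": float(valid_acc),
                "valid_loss": float(valid_loss),
            }
        )
    return rows


def train_valid_gap(rows):
    """Training accuracy minus validation accuracy at the last parsed epoch."""
    last = rows[-1]
    return last["train_acc"] - last["valid_acc"]


# Usage with the last two epochs of the 10-epoch 'am' run above.
log = """
Epoch 00008 | Train Acc: 0.6963 | Train Loss: 0.9242 | Valid Acc: 0.2062 | Valid loss: 2.4596
Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064
"""
rows = parse_log(log)
print(f"gap at epoch {rows[-1]['epoch']}: {train_valid_gap(rows):.4f}")
```

A large gap alone does not prove overfitting, since here the validation accuracy is also far below the test accuracy, but it makes runs with different epoch counts easier to compare.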

However, I observed an interesting phenomenon: the results on the other three datasets seem normal.
This is the result on the 'aifb' dataset:

python examples/sparse/hetero-rgcn.py -d aifb
Namespace(dataset='aifb')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0982 | Train Loss: 1.6857 | Valid Acc: 0.0357 | Valid loss: 2.0964 
Epoch 00001 | Train Acc: 0.3482 | Train Loss: 1.3300 | Valid Acc: 0.5357 | Valid loss: 1.2303 
Epoch 00002 | Train Acc: 0.6339 | Train Loss: 1.0971 | Valid Acc: 0.6429 | Valid loss: 0.8458 
Epoch 00003 | Train Acc: 0.7857 | Train Loss: 0.9242 | Valid Acc: 0.7500 | Valid loss: 0.6349 
Epoch 00004 | Train Acc: 0.8304 | Train Loss: 0.8019 | Valid Acc: 0.9643 | Valid loss: 0.5227 
Epoch 00005 | Train Acc: 0.8304 | Train Loss: 0.7032 | Valid Acc: 0.9643 | Valid loss: 0.4620 
Epoch 00006 | Train Acc: 0.8304 | Train Loss: 0.6094 | Valid Acc: 0.9286 | Valid loss: 0.3989 
Epoch 00007 | Train Acc: 0.8393 | Train Loss: 0.5206 | Valid Acc: 0.9643 | Valid loss: 0.3272 
Epoch 00008 | Train Acc: 0.8393 | Train Loss: 0.4414 | Valid Acc: 0.9643 | Valid loss: 0.2617 
Epoch 00009 | Train Acc: 0.9107 | Train Loss: 0.3744 | Valid Acc: 0.9643 | Valid loss: 0.2084 

Test Acc: 0.8611 | Test loss: 0.4359

The following is the result on the 'mutag' dataset:

python examples/sparse/hetero-rgcn.py -d mutag
Namespace(dataset='mutag')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.5275 | Train Loss: 0.7372 | Valid Acc: 0.5370 | Valid loss: 0.7311 
Epoch 00001 | Train Acc: 0.6239 | Train Loss: 0.7495 | Valid Acc: 0.5556 | Valid loss: 0.9664 
Epoch 00002 | Train Acc: 0.6284 | Train Loss: 0.5782 | Valid Acc: 0.5556 | Valid loss: 0.8043 
Epoch 00003 | Train Acc: 0.9312 | Train Loss: 0.4874 | Valid Acc: 0.5370 | Valid loss: 0.7278 
Epoch 00004 | Train Acc: 0.9587 | Train Loss: 0.4223 | Valid Acc: 0.4259 | Valid loss: 0.7278 
Epoch 00005 | Train Acc: 0.9908 | Train Loss: 0.3206 | Valid Acc: 0.4074 | Valid loss: 0.7154 
Epoch 00006 | Train Acc: 1.0000 | Train Loss: 0.2157 | Valid Acc: 0.4444 | Valid loss: 0.6919 
Epoch 00007 | Train Acc: 1.0000 | Train Loss: 0.1378 | Valid Acc: 0.6667 | Valid loss: 0.6790 
Epoch 00008 | Train Acc: 1.0000 | Train Loss: 0.0861 | Valid Acc: 0.6296 | Valid loss: 0.6775 
Epoch 00009 | Train Acc: 1.0000 | Train Loss: 0.0528 | Valid Acc: 0.5926 | Valid loss: 0.6814 
Epoch 00010 | Train Acc: 1.0000 | Train Loss: 0.0313 | Valid Acc: 0.6296 | Valid loss: 0.6866 
Epoch 00011 | Train Acc: 1.0000 | Train Loss: 0.0176 | Valid Acc: 0.6296 | Valid loss: 0.6884 
Epoch 00012 | Train Acc: 1.0000 | Train Loss: 0.0096 | Valid Acc: 0.6296 | Valid loss: 0.6880 
Epoch 00013 | Train Acc: 1.0000 | Train Loss: 0.0052 | Valid Acc: 0.6296 | Valid loss: 0.6861 
Epoch 00014 | Train Acc: 1.0000 | Train Loss: 0.0028 | Valid Acc: 0.6481 | Valid loss: 0.6833 
Epoch 00015 | Train Acc: 1.0000 | Train Loss: 0.0016 | Valid Acc: 0.6667 | Valid loss: 0.6798 
Epoch 00016 | Train Acc: 1.0000 | Train Loss: 0.0009 | Valid Acc: 0.6667 | Valid loss: 0.6761 
Epoch 00017 | Train Acc: 1.0000 | Train Loss: 0.0005 | Valid Acc: 0.6667 | Valid loss: 0.6718 
Epoch 00018 | Train Acc: 1.0000 | Train Loss: 0.0003 | Valid Acc: 0.6296 | Valid loss: 0.6677 
Epoch 00019 | Train Acc: 1.0000 | Train Loss: 0.0002 | Valid Acc: 0.6481 | Valid loss: 0.6644 

Test Acc: 0.6912 | Test loss: 0.6529

The result on the 'bgs' dataset:

python examples/sparse/hetero-rgcn.py -d bgs
Namespace(dataset='bgs')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.6170 | Train Loss: 0.6688 | Valid Acc: 0.5652 | Valid loss: 0.6651 
Epoch 00001 | Train Acc: 0.6489 | Train Loss: 0.6068 | Valid Acc: 0.6087 | Valid loss: 0.6248 
Epoch 00002 | Train Acc: 0.7766 | Train Loss: 0.5307 | Valid Acc: 0.8261 | Valid loss: 0.5307 
Epoch 00003 | Train Acc: 0.8723 | Train Loss: 0.4607 | Valid Acc: 0.8696 | Valid loss: 0.4660 
Epoch 00004 | Train Acc: 0.9149 | Train Loss: 0.3835 | Valid Acc: 0.8261 | Valid loss: 0.4066 
Epoch 00005 | Train Acc: 0.9149 | Train Loss: 0.3190 | Valid Acc: 0.7826 | Valid loss: 0.3613 
Epoch 00006 | Train Acc: 0.9149 | Train Loss: 0.2599 | Valid Acc: 0.8696 | Valid loss: 0.3072 
Epoch 00007 | Train Acc: 0.9149 | Train Loss: 0.2085 | Valid Acc: 0.9130 | Valid loss: 0.2479 
Epoch 00008 | Train Acc: 0.9255 | Train Loss: 0.1694 | Valid Acc: 0.9565 | Valid loss: 0.1955 
Epoch 00009 | Train Acc: 0.9255 | Train Loss: 0.1413 | Valid Acc: 0.9565 | Valid loss: 0.1565 

Test Acc: 0.8621 | Test loss: 0.3641

@xiangyuzhi
Collaborator Author

I guess there is something wrong with the last dataset, since we see a similar result with the current hetero-rgcn implementation: https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify.py

@dmlc dmlc deleted a comment from czkkkkkk Aug 23, 2023
@czkkkkkk czkkkkkk merged commit 4663cb0 into dmlc:master Aug 23, 2023
2 checks passed
peizhou001 pushed a commit to peizhou001/dgl that referenced this pull request Nov 27, 2023
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
6 participants