[Sparse] A hetero-relational GCN example #6157

xiangyuzhi · 2023-08-14T09:53:37Z

Description

Add an example of a Relational Graph Convolutional Network (R-GCN) model for node classification on a heterograph.
The training log is as follows:

python examples/sparse/hetero-rgcn.py -d aifb
Namespace(dataset='aifb')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0982 | Train Loss: 1.6857 | Valid Acc: 0.0357 | Valid loss: 2.0964 
Epoch 00001 | Train Acc: 0.3482 | Train Loss: 1.3300 | Valid Acc: 0.5357 | Valid loss: 1.2303 
Epoch 00002 | Train Acc: 0.6339 | Train Loss: 1.0971 | Valid Acc: 0.6429 | Valid loss: 0.8458 
Epoch 00003 | Train Acc: 0.7857 | Train Loss: 0.9242 | Valid Acc: 0.7500 | Valid loss: 0.6349 
Epoch 00004 | Train Acc: 0.8304 | Train Loss: 0.8019 | Valid Acc: 0.9643 | Valid loss: 0.5227 
Epoch 00005 | Train Acc: 0.8304 | Train Loss: 0.7032 | Valid Acc: 0.9643 | Valid loss: 0.4620 
Epoch 00006 | Train Acc: 0.8304 | Train Loss: 0.6094 | Valid Acc: 0.9286 | Valid loss: 0.3989 
Epoch 00007 | Train Acc: 0.8393 | Train Loss: 0.5206 | Valid Acc: 0.9643 | Valid loss: 0.3272 
Epoch 00008 | Train Acc: 0.8393 | Train Loss: 0.4414 | Valid Acc: 0.9643 | Valid loss: 0.2617 
Epoch 00009 | Train Acc: 0.9107 | Train Loss: 0.3744 | Valid Acc: 0.9643 | Valid loss: 0.2084 

Test Acc: 0.8611 | Test loss: 0.4359
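
For readers reproducing the command above, the `-d` flag and the printed `Namespace(dataset='aifb')` suggest an argparse setup roughly like the sketch below. This is an assumption, not the script's actual code: the long option name `--dataset`, the `required` setting, the help text, and the helper name `build_parser` are hypothetical. The four dataset names are the ones that appear in the logs of this PR.

```python
import argparse


def build_parser():
    """Build a parser with a single dataset flag (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="R-GCN node classification on a heterograph (sketch)"
    )
    parser.add_argument(
        "-d",
        "--dataset",  # assumed long name, inferred from Namespace(dataset=...)
        type=str,
        required=True,  # assumption: the real script may define a default
        choices=["aifb", "mutag", "bgs", "am"],
        help="Entity classification dataset to train on.",
    )
    return parser


# Parse the same arguments as in the command line shown in the description.
args = build_parser().parse_args(["-d", "aifb"])
print(args)
```

Restricting `choices` makes a mistyped dataset name fail at parse time instead of at data loading.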

Checklist

Please feel free to remove inapplicable items for your PR.

I've leveraged the tools to beautify the Python and C++ code.
The PR is complete and small; read the Google eng practice (CL equals to PR) to understand more about small PRs. In DGL, we consider PRs with fewer than 200 lines of core code change to be small (examples, tests and documentation can be exempted).
All changes have test coverage
Code is well-documented
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
Related issue is referred in this PR
If the PR is for a new model/paper, I've updated the example index here.

Changes

dgl-bot · 2023-08-14T09:54:02Z

To trigger regression tests:

@dgl-bot run [instance-type] [which tests] [compare-with-branch];
For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

dgl-bot · 2023-08-14T10:58:53Z

Commit ID: 14c0728

Build ID: 1

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

examples/sparse/hetero-rgcn.py

frozenbugs · 2023-08-17T01:43:53Z

@dgl-bot

frozenbugs · 2023-08-17T03:29:51Z

@dgl-bot

examples/sparse/hetero-rgcn.py

frozenbugs · 2023-08-17T03:33:34Z

Overall LGTM, please copy paste the model result to the PR description.

xiangyuzhi · 2023-08-18T01:57:03Z

> Overall LGTM, please copy paste the model result to the PR description.

@frozenbugs I'm not sure what the results include. Is it the performance or the size of the model parameters?

dgl-bot · 2023-08-18T02:25:48Z

Commit ID: 4b1bde2743cf398256d2c7b4ae05c8a6f2bd6b97

Build ID: 9

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

dgl-bot · 2023-08-18T03:22:53Z

Commit ID: 444b85baf60f1037be51240b0c8958cf38811f65

Build ID: 10

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

frozenbugs · 2023-08-18T04:03:27Z

> > Overall LGTM, please copy paste the model result to the PR description.
>
> @frozenbugs I'm not sure what the results include. Is it the performance or the size of the model parameters?

Just the output of the model, e.g. accuracy and loss. For example: #6163

frozenbugs · 2023-08-19T04:52:00Z

@czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?

czkkkkkk · 2023-08-20T03:04:21Z

> @czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?

The train loss looks abnormal. What is the loss of the message passing example? https://github.com/dmlc/dgl/blob/master/examples/core/rgcn/hetero_rgcn.py#L285

xiangyuzhi · 2023-08-21T03:12:09Z

> > @czkkkkkk @jermainewang do you know whether Test Acc: 0.4091 is reasonable?
>
> The train loss looks abnormal. What is the loss of the message passing example? https://github.com/dmlc/dgl/blob/master/examples/core/rgcn/hetero_rgcn.py#L285

I found that the problem was missing matrix normalization, so I added it and updated the results. The loss now matches the previous examples. However, the accuracy is still a bit lower; we think this is because the GraphConv operator in DGL uses more fine-grained optimization.
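
To illustrate what matrix normalization means here, the following is a minimal sketch in plain Python. It assumes row normalization by in-degree, so that each destination node averages its incoming messages within one relation. The comment above does not say which normalization the PR uses, so this choice is an assumption. The edge-list representation and the helper names `row_normalize` and `aggregate` are hypothetical stand-ins for a per-relation sparse adjacency matrix and a sparse matrix product.

```python
from collections import defaultdict


def row_normalize(edges, num_dst):
    """Return {(dst, src): weight} so that each destination row sums to 1.

    `edges` is a list of (src, dst) pairs for ONE relation; it is a
    hypothetical stand-in for that relation's sparse adjacency matrix.
    """
    in_degree = [0] * num_dst
    for _, dst in edges:
        in_degree[dst] += 1
    weights = defaultdict(float)
    for src, dst in edges:
        # Every edge present in the list has in_degree[dst] >= 1.
        weights[(dst, src)] += 1.0 / in_degree[dst]
    return dict(weights)


def aggregate(weights, features, num_dst):
    """Compute (D^-1 A) @ x for scalar node features x."""
    out = [0.0] * num_dst
    for (dst, src), w in weights.items():
        out[dst] += w * features[src]
    return out


# Toy relation: nodes 0, 1 and 3 point to node 2; node 0 also points to node 1.
edges = [(0, 2), (1, 2), (3, 2), (0, 1)]
features = [1.0, 2.0, 3.0, 6.0]

norm = row_normalize(edges, num_dst=4)
mean_agg = aggregate(norm, features, num_dst=4)
print(mean_agg)
```

Without the normalization, node 2 would receive the sum of its three neighbors' features rather than their mean, so high-degree nodes produce much larger activations. That is one plausible way for a loss to look abnormal.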

dgl-bot · 2023-08-21T03:51:41Z

Commit ID: 06cb205adca56f6b911038c01af8dd26668f133e

Build ID: 13

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

czkkkkkk

LGTM. Please fix some minor issues.

examples/sparse/hetero-rgcn.py

jermainewang · 2023-08-21T07:17:59Z

It's strange that validation accuracy is much lower than test accuracy.

dgl-bot · 2023-08-21T09:26:45Z

Commit ID: 31045c816a491c5404ec5ba8fbe7ab450cbe24c3

Build ID: 14

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

frozenbugs · 2023-08-22T08:04:17Z

I think it is overfitting; can you double-check?

dgl-bot · 2023-08-22T08:30:43Z

Commit ID: 098ed15ce7ae6fc8825f02718cfb655dfaaca8c3

Build ID: 15

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

xiangyuzhi · 2023-08-22T08:32:25Z

> I think it is overfitting; can you double-check?

It does not seem to be overfitting. I tested different numbers of epochs; the following are the results for 5 and 10 epochs:

python examples/sparse/hetero-rgcn.py -d am
Namespace(dataset='am')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0452 | Train Loss: 2.7244 | Valid Acc: 0.0437 | Valid loss: 2.6268 
Epoch 00001 | Train Acc: 0.1308 | Train Loss: 2.3101 | Valid Acc: 0.0813 | Valid loss: 2.4318 
Epoch 00002 | Train Acc: 0.4424 | Train Loss: 1.9616 | Valid Acc: 0.1812 | Valid loss: 2.3291 
Epoch 00003 | Train Acc: 0.4657 | Train Loss: 1.6822 | Valid Acc: 0.1688 | Valid loss: 2.3107 
Epoch 00004 | Train Acc: 0.4595 | Train Loss: 1.4807 | Valid Acc: 0.1750 | Valid loss: 2.3611 

Test Acc: 0.3485 | Test loss: 1.8214

python examples/sparse/hetero-rgcn.py -d am
Namespace(dataset='am')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0592 | Train Loss: 2.4887 | Valid Acc: 0.0813 | Valid loss: 2.5531 
Epoch 00001 | Train Acc: 0.4315 | Train Loss: 2.1208 | Valid Acc: 0.1625 | Valid loss: 2.4388 
Epoch 00002 | Train Acc: 0.4221 | Train Loss: 1.8412 | Valid Acc: 0.1500 | Valid loss: 2.4075 
Epoch 00003 | Train Acc: 0.4470 | Train Loss: 1.6381 | Valid Acc: 0.1750 | Valid loss: 2.4399 
Epoch 00004 | Train Acc: 0.4486 | Train Loss: 1.4952 | Valid Acc: 0.1812 | Valid loss: 2.4964 
Epoch 00005 | Train Acc: 0.4626 | Train Loss: 1.3706 | Valid Acc: 0.1812 | Valid loss: 2.5386 
Epoch 00006 | Train Acc: 0.5109 | Train Loss: 1.2319 | Valid Acc: 0.1812 | Valid loss: 2.5417 
Epoch 00007 | Train Acc: 0.5935 | Train Loss: 1.0780 | Valid Acc: 0.1812 | Valid loss: 2.5101 
Epoch 00008 | Train Acc: 0.6963 | Train Loss: 0.9242 | Valid Acc: 0.2062 | Valid loss: 2.4596 
Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064 

Test Acc: 0.4747 | Test loss: 1.6184

However, I observed an interesting phenomenon: the results on the other three datasets seem normal.
This is the result on the 'aifb' dataset:

python examples/sparse/hetero-rgcn.py -d aifb
Namespace(dataset='aifb')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.0982 | Train Loss: 1.6857 | Valid Acc: 0.0357 | Valid loss: 2.0964 
Epoch 00001 | Train Acc: 0.3482 | Train Loss: 1.3300 | Valid Acc: 0.5357 | Valid loss: 1.2303 
Epoch 00002 | Train Acc: 0.6339 | Train Loss: 1.0971 | Valid Acc: 0.6429 | Valid loss: 0.8458 
Epoch 00003 | Train Acc: 0.7857 | Train Loss: 0.9242 | Valid Acc: 0.7500 | Valid loss: 0.6349 
Epoch 00004 | Train Acc: 0.8304 | Train Loss: 0.8019 | Valid Acc: 0.9643 | Valid loss: 0.5227 
Epoch 00005 | Train Acc: 0.8304 | Train Loss: 0.7032 | Valid Acc: 0.9643 | Valid loss: 0.4620 
Epoch 00006 | Train Acc: 0.8304 | Train Loss: 0.6094 | Valid Acc: 0.9286 | Valid loss: 0.3989 
Epoch 00007 | Train Acc: 0.8393 | Train Loss: 0.5206 | Valid Acc: 0.9643 | Valid loss: 0.3272 
Epoch 00008 | Train Acc: 0.8393 | Train Loss: 0.4414 | Valid Acc: 0.9643 | Valid loss: 0.2617 
Epoch 00009 | Train Acc: 0.9107 | Train Loss: 0.3744 | Valid Acc: 0.9643 | Valid loss: 0.2084 

Test Acc: 0.8611 | Test loss: 0.4359

And the following is the result on the 'mutag' dataset:

python examples/sparse/hetero-rgcn.py -d mutag
Namespace(dataset='mutag')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.5275 | Train Loss: 0.7372 | Valid Acc: 0.5370 | Valid loss: 0.7311 
Epoch 00001 | Train Acc: 0.6239 | Train Loss: 0.7495 | Valid Acc: 0.5556 | Valid loss: 0.9664 
Epoch 00002 | Train Acc: 0.6284 | Train Loss: 0.5782 | Valid Acc: 0.5556 | Valid loss: 0.8043 
Epoch 00003 | Train Acc: 0.9312 | Train Loss: 0.4874 | Valid Acc: 0.5370 | Valid loss: 0.7278 
Epoch 00004 | Train Acc: 0.9587 | Train Loss: 0.4223 | Valid Acc: 0.4259 | Valid loss: 0.7278 
Epoch 00005 | Train Acc: 0.9908 | Train Loss: 0.3206 | Valid Acc: 0.4074 | Valid loss: 0.7154 
Epoch 00006 | Train Acc: 1.0000 | Train Loss: 0.2157 | Valid Acc: 0.4444 | Valid loss: 0.6919 
Epoch 00007 | Train Acc: 1.0000 | Train Loss: 0.1378 | Valid Acc: 0.6667 | Valid loss: 0.6790 
Epoch 00008 | Train Acc: 1.0000 | Train Loss: 0.0861 | Valid Acc: 0.6296 | Valid loss: 0.6775 
Epoch 00009 | Train Acc: 1.0000 | Train Loss: 0.0528 | Valid Acc: 0.5926 | Valid loss: 0.6814 
Epoch 00010 | Train Acc: 1.0000 | Train Loss: 0.0313 | Valid Acc: 0.6296 | Valid loss: 0.6866 
Epoch 00011 | Train Acc: 1.0000 | Train Loss: 0.0176 | Valid Acc: 0.6296 | Valid loss: 0.6884 
Epoch 00012 | Train Acc: 1.0000 | Train Loss: 0.0096 | Valid Acc: 0.6296 | Valid loss: 0.6880 
Epoch 00013 | Train Acc: 1.0000 | Train Loss: 0.0052 | Valid Acc: 0.6296 | Valid loss: 0.6861 
Epoch 00014 | Train Acc: 1.0000 | Train Loss: 0.0028 | Valid Acc: 0.6481 | Valid loss: 0.6833 
Epoch 00015 | Train Acc: 1.0000 | Train Loss: 0.0016 | Valid Acc: 0.6667 | Valid loss: 0.6798 
Epoch 00016 | Train Acc: 1.0000 | Train Loss: 0.0009 | Valid Acc: 0.6667 | Valid loss: 0.6761 
Epoch 00017 | Train Acc: 1.0000 | Train Loss: 0.0005 | Valid Acc: 0.6667 | Valid loss: 0.6718 
Epoch 00018 | Train Acc: 1.0000 | Train Loss: 0.0003 | Valid Acc: 0.6296 | Valid loss: 0.6677 
Epoch 00019 | Train Acc: 1.0000 | Train Loss: 0.0002 | Valid Acc: 0.6481 | Valid loss: 0.6644 

Test Acc: 0.6912 | Test loss: 0.6529

The result on the 'bgs' dataset:

python examples/sparse/hetero-rgcn.py -d bgs
Namespace(dataset='bgs')
Done loading data from cached files.
[W TensorAdvancedIndexing.cpp:1615] Warning: scatter_reduce() is in beta and the API may change at any time. (function operator())
start training...
Epoch 00000 | Train Acc: 0.6170 | Train Loss: 0.6688 | Valid Acc: 0.5652 | Valid loss: 0.6651 
Epoch 00001 | Train Acc: 0.6489 | Train Loss: 0.6068 | Valid Acc: 0.6087 | Valid loss: 0.6248 
Epoch 00002 | Train Acc: 0.7766 | Train Loss: 0.5307 | Valid Acc: 0.8261 | Valid loss: 0.5307 
Epoch 00003 | Train Acc: 0.8723 | Train Loss: 0.4607 | Valid Acc: 0.8696 | Valid loss: 0.4660 
Epoch 00004 | Train Acc: 0.9149 | Train Loss: 0.3835 | Valid Acc: 0.8261 | Valid loss: 0.4066 
Epoch 00005 | Train Acc: 0.9149 | Train Loss: 0.3190 | Valid Acc: 0.7826 | Valid loss: 0.3613 
Epoch 00006 | Train Acc: 0.9149 | Train Loss: 0.2599 | Valid Acc: 0.8696 | Valid loss: 0.3072 
Epoch 00007 | Train Acc: 0.9149 | Train Loss: 0.2085 | Valid Acc: 0.9130 | Valid loss: 0.2479 
Epoch 00008 | Train Acc: 0.9255 | Train Loss: 0.1694 | Valid Acc: 0.9565 | Valid loss: 0.1955 
Epoch 00009 | Train Acc: 0.9255 | Train Loss: 0.1413 | Valid Acc: 0.9565 | Valid loss: 0.1565 

Test Acc: 0.8621 | Test loss: 0.3641
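
One way to compare the runs above is to extract the final train and validation accuracy from each log and look at the gap. The sketch below assumes only the log line format shown in this thread. The regular expression and the helper names `parse_log` and `final_gap` are hypothetical and not part of the example script. The sample lines are copied from the 'am' (10 epochs) and 'aifb' logs above.

```python
import re

# Matches lines such as:
# Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064
LINE_RE = re.compile(
    r"Epoch (\d+) \| Train Acc: ([\d.]+) \| Train Loss: ([\d.]+) "
    r"\| Valid Acc: ([\d.]+) \| Valid loss: ([\d.]+)"
)


def parse_log(text):
    """Return one dict per epoch line found in `text`."""
    rows = []
    for match in LINE_RE.finditer(text):
        rows.append(
            {
                "epoch": int(match.group(1)),
                "train_acc": float(match.group(2)),
                "train_loss": float(match.group(3)),
                "valid_acc": float(match.group(4)),
                "valid_loss": float(match.group(5)),
            }
        )
    return rows


def final_gap(rows):
    """Train accuracy minus validation accuracy at the last logged epoch."""
    last = rows[-1]
    return last["train_acc"] - last["valid_acc"]


am_log = """
Epoch 00008 | Train Acc: 0.6963 | Train Loss: 0.9242 | Valid Acc: 0.2062 | Valid loss: 2.4596
Epoch 00009 | Train Acc: 0.7866 | Train Loss: 0.7796 | Valid Acc: 0.2313 | Valid loss: 2.4064
"""
aifb_log = """
Epoch 00009 | Train Acc: 0.9107 | Train Loss: 0.3744 | Valid Acc: 0.9643 | Valid loss: 0.2084
"""

am_gap = final_gap(parse_log(am_log))
aifb_gap = final_gap(parse_log(aifb_log))
print(f"am gap: {am_gap:.4f}, aifb gap: {aifb_gap:.4f}")
```

A large positive gap, as on 'am', only shows that training and validation accuracy diverge. It does not say whether the cause is overfitting or a problem with the dataset split.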

xiangyuzhi · 2023-08-22T08:43:16Z

I guess there is something wrong with the last dataset, since we see a similar result from the current hetero-rgcn implementation: https://github.com/dmlc/dgl/blob/master/examples/pytorch/rgcn-hetero/entity_classify.py

Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>

Add a hetero-relational GCN example

14c0728

xiangyuzhi requested review from frozenbugs and czkkkkkk August 14, 2023 09:53

czkkkkkk reviewed Aug 15, 2023

czkkkkkk requested changes Aug 15, 2023

frozenbugs requested a review from keli-wen August 15, 2023 08:32

keli-wen reviewed Aug 15, 2023

frozenbugs reviewed Aug 16, 2023

frozenbugs reviewed Aug 16, 2023

modify accroding to review

4304b5b

czkkkkkk changed the title ~~A hetero-relational GCN example~~ [Sparse] A hetero-relational GCN example Aug 16, 2023

xiangyuzhi added 2 commits August 16, 2023 06:41

add lintrunner

4a6050a

code polish

6f3ab54

xiangyuzhi requested review from frozenbugs, czkkkkkk and keli-wen August 16, 2023 10:20

frozenbugs reviewed Aug 17, 2023

Merge branch 'master' into master

415baed

xiangyuzhi added 3 commits August 18, 2023 01:58

code format polish

3e6bfe5

Merge branch 'master' of https://github.com/xiangyuzhi/dgl

6bff225

code format polish

bf024bd

dmlc deleted a comment from dgl-bot Aug 18, 2023

Fix API and add sparse matrix normalize.

2387980

czkkkkkk approved these changes Aug 21, 2023

update

dbf2769

Simplify the API, and solve the accuracy problem.

5b4a043

frozenbugs requested review from frozenbugs and removed request for frozenbugs August 22, 2023 08:03

xiangyuzhi requested review from czkkkkkk and frozenbugs August 22, 2023 08:47

czkkkkkk approved these changes Aug 22, 2023

frozenbugs approved these changes Aug 23, 2023

dmlc deleted a comment from czkkkkkk Aug 23, 2023

czkkkkkk merged commit 4663cb0 into dmlc:master Aug 23, 2023
2 checks passed

peizhou001 pushed a commit to peizhou001/dgl that referenced this pull request Nov 27, 2023

[Sparse] A hetero-relational GCN example (dmlc#6157)

a5d4d6a

Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>

DominikaJedynak pushed a commit to DominikaJedynak/dgl that referenced this pull request Mar 12, 2024

[Sparse] A hetero-relational GCN example (dmlc#6157)

cdce548

Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
