
[Bugfix] Correct the loss function in RGCN model #1217

Merged
merged 6 commits into dmlc:master on Feb 13, 2020

Conversation

@yzh119 (Member) commented Jan 19, 2020

Description

As mentioned in #1204, the loss function in our RGCN implementation is problematic. This PR fixes the issue.
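
For context, a minimal PyTorch sketch of the double-softmax pattern discussed below; the tensor names and shapes here are illustrative and not taken from the DGL example:

```python
import torch
import torch.nn.functional as F

# Dummy tensors standing in for the model output and the node labels.
logits = torch.randn(8, 4)            # raw class scores for 8 nodes, 4 classes
labels = torch.randint(0, 4, (8,))

# Problematic pattern: softmax is applied here, and then log-softmax is
# applied again inside cross_entropy (the double-softmax issue).
bad_loss = F.cross_entropy(F.softmax(logits, dim=1), labels)

# Fix: pass the raw logits; cross_entropy applies log-softmax internally.
loss = F.cross_entropy(logits, labels)
```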

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred to in this PR
yzh119 and others added 3 commits Jan 19, 2020
upd
fix
@jermainewang (Member) commented Jan 20, 2020

Would you mind fixing the TensorFlow example too?

@yzh119 (Member, Author) commented Jan 20, 2020

The accuracy gets worse; we need to tune the hyperparameters again...

  • AIFB: 94.44
  • MUTAG: 72.06
  • BGS: 75.86
@jermainewang (Member) commented Jan 20, 2020

@pbloem We are trying to fix the double-softmax problem. However, we found that the accuracy drops quite significantly. Have you observed the same issue?

@VoVAllen (Collaborator) commented Jan 20, 2020
(comment minimized; content not shown)

@yzh119 (Member, Author) commented Jan 20, 2020

@VoVAllen have you tuned the hyperparameters accordingly?

@VoVAllen (Collaborator) commented Jan 20, 2020

@yzh119 It's unstable, but the highest accuracy can reach the original result.

@pbloem commented Jan 21, 2020

@jermainewang Nope. I get the following mean accuracies over 10 repeats (standard deviation in brackets):

aifb: 0.961 (0.0194)
bgs: 0.834 (0.0317)
am: 0.892 (0.0167)

BGS is a little below what we had in the original paper, but the variance is high, so I think we just got lucky. All hyperparameters are the same as in the paper, and the weight initialization is taken from your code (i.e. uniform Xavier with ReLU gain everywhere).

You can find my implementation here, and the training loop here (ignore all the other code, the RGCNClassic class is pretty self-contained).

I strongly recommend averaging over 10 repeats, since the test sets are so small, but none of my runs on BGS are as low as .75.
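
A minimal sketch of that kind of reporting, assuming a hypothetical run_once helper that stands in for one full training/evaluation run (the values below are dummy numbers, not results from this PR):

```python
import random
import statistics

def run_once(seed):
    # Hypothetical stand-in for training the model with a given seed and
    # returning its test accuracy; returns dummy values only.
    random.seed(seed)
    return 0.80 + random.uniform(-0.05, 0.05)

accs = [run_once(s) for s in range(10)]
print(f"mean {statistics.mean(accs):.3f} (std {statistics.stdev(accs):.4f})")
```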

@yzh119 (Member, Author) commented Jan 21, 2020

@pbloem thanks for your instructions!

@VoVAllen merged commit 7a80faf into dmlc:master on Feb 13, 2020
1 check passed: continuous-integration/jenkins/pr-merge (This commit looks good)