Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix for atom meassages concerning bond feature vector #138

Merged
merged 1 commit into from
Feb 9, 2021

Conversation

hesther
Copy link
Contributor

@hesther hesther commented Feb 8, 2021

Just confirmed a bug when using atom messages:

We set up the bond features in the order atom(length 133)-bond(length 14):

self.f_bonds.append(self.f_atoms[a1] + f_bond)
self.f_bonds.append(self.f_atoms[a2] + f_bond)

but then cut out the first (!) 14 values, instead of the last:

if atom_messages:
f_bonds = self.f_bonds[:, :get_bond_fdim(atom_messages=atom_messages)]
else:
f_bonds = self.f_bonds

Have added a few print statements, and we indeed used the wrong features for atom messages (which are the first 14 one-hot encoded elements, and thus rather meaningless). I have corrected this bug by simply cutting out the last values instead of the first (alternatively, we could change the order of setting up the bond vectors, but this would not be backwards compatible even if atom-messages were not used).

A quick check of how this affects performance:

  • Freesolv dataset: With wrong atom messages: RMSE=0.96, corrected atom messages 0.93
  • Delaney dataset: With wrong atom messages: RMSE=1.02, corrected atom messages 0.82 (!!!)

@hesther hesther linked an issue Feb 8, 2021 that may be closed by this pull request
Copy link
Contributor

@cjmcgill cjmcgill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good by me. I can't find any unintended effects as a result.

@hesther hesther mentioned this pull request Feb 8, 2021
@cjmcgill
Copy link
Contributor

cjmcgill commented Feb 8, 2021

@hesther, somewhat related, do we also need to add an atom_messages input to the get_fdim reference in BatchMolGraph.__init__? It doesn't have one presently. It seems to be there for a zero-padding purpose that I don't understand. If the pad needs to be equal to the length of real entries, then we need to add it. If the pad needs to be >= to the length of later real entries, then it's fine as is.

@hesther
Copy link
Contributor Author

hesther commented Feb 8, 2021

@cjmcgill Just checked - in init each block is designed to have one trailing empty vector, which needs to be the same size as the real atom and bond features used within this function (which is the full vector, before cutting). The bond vectors (including the zero-padded first one) are then cut to their appropriate size in get_components, so we should be good!

Copy link
Contributor

@mliu49 mliu49 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Seems like a pretty big bug.

@hesther
Copy link
Contributor Author

hesther commented Feb 8, 2021

Indeed - I hope this doesn't made anyone's model obsolete...

@mliu49 mliu49 merged commit 96efc6d into master Feb 9, 2021
@mliu49 mliu49 deleted the bugfix_atom_messages branch February 9, 2021 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect behavior for atom_messages = True
3 participants