Bugfix for atom messages concerning bond feature vector #138

hesther · 2021-02-08T20:18:59Z

Just confirmed a bug when using atom messages:

We set up the bond features in the order atom(length 133)-bond(length 14):

chemprop/chemprop/features/featurization.py

Lines 171 to 172 in 8258fcb

    self.f_bonds.append(self.f_atoms[a1] + f_bond)
    self.f_bonds.append(self.f_atoms[a2] + f_bond)

but then cut out the first (!) 14 values, instead of the last:

chemprop/chemprop/features/featurization.py

Lines 267 to 270 in 8258fcb

    if atom_messages:
        f_bonds = self.f_bonds[:, :get_bond_fdim(atom_messages=atom_messages)]
    else:
        f_bonds = self.f_bonds

I added a few print statements and confirmed that we were indeed using the wrong features for atom messages: the first 14 one-hot encoded elements, which are rather meaningless as bond features. I have corrected the bug by simply cutting out the last values instead of the first. (Alternatively, we could change the order in which the bond vectors are set up, but that would not be backwards compatible, even when atom messages are not used.)
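The difference between the two slices can be illustrated with a small, self-contained sketch. It is not the chemprop code: the constants reuse the lengths quoted above (133 and 14), the helper `build_bond_row` and the toy feature values are hypothetical, and plain Python lists stand in for the tensor that `self.f_bonds` becomes.

```python
ATOM_FDIM = 133  # atom feature length quoted in the description above
BOND_FDIM = 14   # bond feature length quoted in the description above


def build_bond_row(f_atom, f_bond):
    """Hypothetical helper: atom features first, bond features last."""
    return f_atom + f_bond  # list concatenation, as in the append calls above


# Toy values: atom entries are 1.0 and bond entries are 2.0, so the two
# parts of each row are easy to tell apart.
f_atom = [1.0] * ATOM_FDIM
f_bond = [2.0] * BOND_FDIM
f_bonds = [build_bond_row(f_atom, f_bond) for _ in range(2)]

# Buggy cut: the first BOND_FDIM columns (tensor form: f_bonds[:, :BOND_FDIM]).
buggy = [row[:BOND_FDIM] for row in f_bonds]

# Corrected cut: the last BOND_FDIM columns (tensor form: f_bonds[:, -BOND_FDIM:]).
fixed = [row[-BOND_FDIM:] for row in f_bonds]

print(set(buggy[0]))  # only atom-feature values
print(set(fixed[0]))  # only bond-feature values
```

With the atom block placed first, slicing from the front can only ever return atom features, whereas slicing from the end returns the bond block regardless of how long the atom block is.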

A quick check of how this affects performance:

Freesolv dataset: with wrong atom messages, RMSE = 0.96; with corrected atom messages, RMSE = 0.93
Delaney dataset: with wrong atom messages, RMSE = 1.02; with corrected atom messages, RMSE = 0.82 (!!!)

cjmcgill

Change looks good to me. I can't find any unintended effects as a result.

cjmcgill · 2021-02-08T21:54:53Z

@hesther, somewhat related: do we also need to add an atom_messages input to the get_fdim reference in BatchMolGraph.__init__? It doesn't have one at present. It seems to be there for zero-padding, the purpose of which I don't understand. If the pad needs to be equal in length to the real entries, then we need to add it. If the pad only needs to be >= the length of the later real entries, then it's fine as is.

hesther · 2021-02-08T22:07:14Z

@cjmcgill Just checked - in init each block is designed to have one trailing empty vector, which needs to be the same size as the real atom and bond features used within this function (which is the full vector, before cutting). The bond vectors (including the zero-padded first one) are then cut to their appropriate size in get_components, so we should be good!
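The reasoning above can be sketched as follows. This is an illustrative toy, not the BatchMolGraph implementation: the function names are hypothetical, the padding row is placed first purely as an assumption for the example, and the lengths 133 and 14 are the ones quoted in the description.

```python
ATOM_FDIM = 133
BOND_FDIM = 14
FULL_BOND_FDIM = ATOM_FDIM + BOND_FDIM  # width of the uncut bond vector


def batch_bond_features(rows):
    """Hypothetical batching step: add one all-zero padding row that has
    the same width as the full (uncut) bond vectors."""
    padding = [0.0] * FULL_BOND_FDIM
    return [padding] + rows


def cut_for_atom_messages(f_bonds):
    """Hypothetical cutting step: keep only the trailing bond block of
    every row, including the padding row."""
    return [row[-BOND_FDIM:] for row in f_bonds]


real_row = [1.0] * ATOM_FDIM + [2.0] * BOND_FDIM
batch = batch_bond_features([real_row, real_row])
cut = cut_for_atom_messages(batch)

# Every row has the same width both before and after the cut.
print({len(row) for row in batch}, {len(row) for row in cut})
```

Because the padding row is created at full width and cut together with the real rows, it always ends up the same size as they do, so the padding step itself needs no atom_messages argument in this sketch.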

mliu49

This looks good to me! Seems like a pretty big bug.

hesther · 2021-02-08T22:59:59Z

Indeed - I hope this doesn't make anyone's model obsolete...

hesther linked an issue Feb 8, 2021 that may be closed by this pull request

Incorrect behavior for atom_messages = True #133

Closed

hesther requested review from cjmcgill, fhvermei, mliu49 and swansonk14 February 8, 2021 21:03

cjmcgill approved these changes Feb 8, 2021

hesther mentioned this pull request Feb 8, 2021

atom and bond features #137

Merged

mliu49 approved these changes Feb 8, 2021

bugfix for atom messages concerning bond feature vector

cb7caae

mliu49 force-pushed the bugfix_atom_messages branch from 7a03843 to cb7caae Compare February 9, 2021 18:33

mliu49 merged commit 96efc6d into master Feb 9, 2021

mliu49 deleted the bugfix_atom_messages branch February 9, 2021 20:04
