Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

atom and bond features #137

Merged
merged 24 commits into from
Feb 18, 2021
Merged

atom and bond features #137

merged 24 commits into from
Feb 18, 2021

Conversation

fhvermei
Copy link
Contributor

@fhvermei fhvermei commented Feb 5, 2021

Enable addition of bond features on top of atom features (as implemented by Yanfei). Also add the option to implement those for the validation and test set. Additionally have an option to overwrite the default atom or bond features used for MPNN. Default scaling of the features with an option to omit scaling.

These additional options were requested by one of the members of the consortium.

@fhvermei
Copy link
Contributor Author

fhvermei commented Feb 5, 2021

2 doubts I have:

  • will this still allow old checkpoint files to be read correctly?
  • does this still run ok with cuda?

@lgtm-com
Copy link

lgtm-com bot commented Feb 5, 2021

This pull request introduces 1 alert when merging 4efedc6 into 8258fcb - view on LGTM.com

new alerts:

  • 1 for Unused import

chemprop/data/data.py Outdated Show resolved Hide resolved
chemprop/data/data.py Outdated Show resolved Hide resolved
chemprop/data/data.py Outdated Show resolved Hide resolved
chemprop/data/data.py Outdated Show resolved Hide resolved
chemprop/data/utils.py Outdated Show resolved Hide resolved
chemprop/data/utils.py Outdated Show resolved Hide resolved
chemprop/utils.py Outdated Show resolved Hide resolved
Copy link
Contributor

@hesther hesther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Florence, these are some great changes! I have just reviewed the code: I think it is mostly fine, I have commented on a number of minor issues on:

  • variable nomenclature (I think we should use bond_features instead of bond_descriptors in all variable and function names)
  • a suggestion to ensure backwards compatibility reading from checkpoints concerning the scalers, as well as preventing the error of using scaling in training but not during prediction
  • a few minor typos in doc strings or error messages

We should probably also update the readme and docs files to describe the new functionalities.

I have furthermore ran a number of tests (which ran fine apart from the comments above), and will rerun them once we have the mentioned changes implemented. I will also test this with CUDA then.

chemprop/args.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Feb 9, 2021

This pull request introduces 1 alert when merging 0083ca0 into 96efc6d - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Feb 10, 2021

This pull request introduces 1 alert when merging 969ea46 into 96efc6d - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Feb 10, 2021

This pull request introduces 1 alert when merging e4c5bc1 into 96efc6d - view on LGTM.com

new alerts:

  • 1 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Feb 10, 2021

This pull request introduces 1 alert when merging c19136d into 96efc6d - view on LGTM.com

new alerts:

  • 1 for Unused import

chemprop/features/utils.py Outdated Show resolved Hide resolved
@fhvermei
Copy link
Contributor Author

fhvermei commented Feb 10, 2021

I think I fixed all the comments for now :) Let me know if there are more changes required!
(I did some tests but not as extensively as I did the first time)

And thank you for taking the time to look through this!

@fhvermei
Copy link
Contributor Author

I also adjusted the ReadMe. Did not add info on using atom/bond features paths for separate validation and test sets, is that necessary?
Are the docs updated automatically?

@hesther
Copy link
Contributor

hesther commented Feb 11, 2021

The docs update automatically only in parts namely for all the function and class descriptions. However, there is a 'tutorial' part that would need to be updated manually (it is similar to what's in the README, but rst instead of md format): docs/source/tutorial.rst. I will run some tests on the updated code now.

Copy link
Contributor

@hesther hesther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested on cpu and gpu, seems to run fine.

@cjmcgill
Copy link
Contributor

cjmcgill commented Feb 13, 2021

@fhvermei sorry, my changes to smiles_processing has caused a conflict for you in run_training. When you resolve the conflict, please make sure to add smiles_columns=args.smiles_columns to the arguments you use for get_data.

The new arguments allow to have seperate atom and bond features paths
in case separate test or validation data sets are used
The separate bond fetaures are read in the same manner as atom features.
The order of the bonds in the separate file must match the RDKit bond
order. The new bond features are added to the default bond features that
are used for the D-MPNN.
This check validates if the number of extra atom or
bond features matches the number of atoms or bonds
in the molecules and throws an error if this is not
the case. This check is done in data.py and not before
to avoid the computational cost of making additional
RDKit molecule graph structures.
To validate if the length of the atom descriptors matched the number of
atoms in the molecule. An error is thrown if this is not the case.
Make consistently use of bond_descriptors for now. Even though
bond_features would make more sense, there are some places in the code
where this would seem weird. (might fix this later)
Validate if extra atom or bond features are needed and provided
if the test and validation data are given with a separate path.
Validate if extra atom or bond features are needed and provided
if the validation and test data are provided by a separate path.
This new feature allows to overwrite the default atom or bond features
that are used in the D-MPNN. Before, without this argument, the extra
provided features are used to concatenate to the default fetaures.
The extra atom and bond features that are provided by the user are now
scaled by default. Arguments can be used to disable the scaling.
The data reading in interpret.py is not yet adjusted to allow extra atom
or bond features. An error is thrown when the interpret.py function is
used with additional atom or bond features.
After previous bug fix it makes more sense to use the naming bond
features instead of bond descriptors. Also make changes to arguments
that are only valid for atom features and not for both features and
descriptors.
The changes will result in conflicts for reading of old model checkpoint
files. Extra checks are added such that old and new checkpoint files can
be used.
Check if scaling is done the same way for training and prediction.
Arguments were not passes correctly in case there are only atom
messages.
Some arguments that only apply to training are changed to TrainArgs.
Extra checks that were present for predicting are now removed.
@mliu49
Copy link
Contributor

mliu49 commented Feb 18, 2021

Thanks for rebasing and rewording the commits! I will go ahead and merge since it has been reviewed and tested by others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants