-
Notifications
You must be signed in to change notification settings - Fork 548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
atom and bond features #137
Conversation
2 doubts I have:
|
This pull request introduces 1 alert when merging 4efedc6 into 8258fcb - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Florence, these are some great changes! I have just reviewed the code: I think it is mostly fine, I have commented on a number of minor issues on:
- variable nomenclature (I think we should use bond_features instead of bond_descriptors in all variable and function names)
- a suggestion to ensure backwards compatibility reading from checkpoints concerning the scalers, as well as preventing the error of using scaling in training but not during prediction
- a few minor typos in doc strings or error messages
We should probably also update the readme and docs files to describe the new functionalities.
I have furthermore ran a number of tests (which ran fine apart from the comments above), and will rerun them once we have the mentioned changes implemented. I will also test this with CUDA then.
This pull request introduces 1 alert when merging 0083ca0 into 96efc6d - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 969ea46 into 96efc6d - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging e4c5bc1 into 96efc6d - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging c19136d into 96efc6d - view on LGTM.com new alerts:
|
I think I fixed all the comments for now :) Let me know if there are more changes required! And thank you for taking the time to look through this! |
I also adjusted the ReadMe. Did not add info on using atom/bond features paths for separate validation and test sets, is that necessary? |
The docs update automatically only in parts namely for all the function and class descriptions. However, there is a 'tutorial' part that would need to be updated manually (it is similar to what's in the README, but rst instead of md format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tested on cpu and gpu, seems to run fine.
@fhvermei sorry, my changes to smiles_processing has caused a conflict for you in |
The new arguments allow to have seperate atom and bond features paths in case separate test or validation data sets are used
The separate bond fetaures are read in the same manner as atom features. The order of the bonds in the separate file must match the RDKit bond order. The new bond features are added to the default bond features that are used for the D-MPNN.
This check validates if the number of extra atom or bond features matches the number of atoms or bonds in the molecules and throws an error if this is not the case. This check is done in data.py and not before to avoid the computational cost of making additional RDKit molecule graph structures.
To validate if the length of the atom descriptors matched the number of atoms in the molecule. An error is thrown if this is not the case.
Make consistently use of bond_descriptors for now. Even though bond_features would make more sense, there are some places in the code where this would seem weird. (might fix this later)
Validate if extra atom or bond features are needed and provided if the test and validation data are given with a separate path.
Validate if extra atom or bond features are needed and provided if the validation and test data are provided by a separate path.
This new feature allows to overwrite the default atom or bond features that are used in the D-MPNN. Before, without this argument, the extra provided features are used to concatenate to the default fetaures.
The extra atom and bond features that are provided by the user are now scaled by default. Arguments can be used to disable the scaling.
The data reading in interpret.py is not yet adjusted to allow extra atom or bond features. An error is thrown when the interpret.py function is used with additional atom or bond features.
After previous bug fix it makes more sense to use the naming bond features instead of bond descriptors. Also make changes to arguments that are only valid for atom features and not for both features and descriptors.
The changes will result in conflicts for reading of old model checkpoint files. Extra checks are added such that old and new checkpoint files can be used.
Check if scaling is done the same way for training and prediction.
Arguments were not passes correctly in case there are only atom messages.
Some arguments that only apply to training are changed to TrainArgs. Extra checks that were present for predicting are now removed.
Thanks for rebasing and rewording the commits! I will go ahead and merge since it has been reviewed and tested by others. |
Enable addition of bond features on top of atom features (as implemented by Yanfei). Also add the option to implement those for the validation and test set. Additionally have an option to overwrite the default atom or bond features used for MPNN. Default scaling of the features with an option to omit scaling.
These additional options were requested by one of the members of the consortium.