New Function - Save the latent representation for molecules #119

Merged
merged 2 commits into from
Feb 19, 2021

Conversation

cjmcgill
Contributor

I added a new function that saves the fingerprint vector for a set of provided molecules. The function mirrors the prediction workflow, except that the output is the vector produced by the MPN before it passes through the FFN (so it excludes any features concatenated onto the vector).

The function works from the command line (after pip install -e . is run again) and with two-molecule inputs.

Brief explanation of the file changes:

  • Readme - Add explanation for the new function
  • chemprop.data.utils - Saves the names of the smiles columns used in the get_data step even if the columns are not provided in the initial arguments.
  • chemprop.models.model - Adds a method to MoleculeModel that returns the MPNN output without passing it through the FFN.
  • chemprop.train.molecule_fingerprint - The main function file for encoding fingerprints. This is analogous to chemprop.train.make_predictions and chemprop.train.predict. The entry function is chemprop_fingerprint.
  • chemprop.train.init - Adds import for the chemprop_fingerprint function.
  • chemprop.train.make_predictions - Adds a check that is not part of the fingerprint function itself but matches an assertion that was necessary in chemprop.train.molecule_fingerprint. It prevents people from training a model with two molecules and then predicting (or making a fingerprint) with just one.
  • fingerprint - the top level file for this function. Calls chemprop_fingerprint.
  • setup - added the command line function for chemprop_fingerprint.
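
The idea behind the model change can be illustrated with a minimal sketch. This is not chemprop's actual code: the class ToyMoleculeModel, its method names, and the arithmetic standing in for the MPN and FFN are all hypothetical. It only shows that the fingerprint is the encoder output taken before the feed-forward step, without any concatenated features.

```python
from typing import List, Optional


class ToyMoleculeModel:
    """Hypothetical stand-in for a model made of an encoder (MPN) and an FFN."""

    def encode(self, atom_values: List[float]) -> List[float]:
        # Toy "message passing": summarize the atoms as [sum, max].
        return [sum(atom_values), max(atom_values)]

    def ffn(self, vector: List[float]) -> float:
        # Toy feed-forward step: collapse the vector to a single number.
        return sum(vector) / len(vector)

    def fingerprint(self, atom_values: List[float]) -> List[float]:
        # Return the encoder output only; extra features are never appended.
        return self.encode(atom_values)

    def predict(self, atom_values: List[float],
                features: Optional[List[float]] = None) -> float:
        # Prediction concatenates optional features before the FFN.
        vector = self.encode(atom_values) + list(features or [])
        return self.ffn(vector)


model = ToyMoleculeModel()
latent = model.fingerprint([1.0, 2.0, 3.0])
prediction = model.predict([1.0, 2.0, 3.0], features=[0.0])
```

Here latent holds the two-element encoder vector, while prediction is computed from that vector plus the extra feature.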

@lgtm-com

lgtm-com bot commented Dec 30, 2020

This pull request introduces 5 alerts when merging a2927bf into 1e7d122 - view on LGTM.com

new alerts:

  • 4 for Unused import
  • 1 for Module is imported more than once

Contributor

@hesther hesther left a comment

Works well. I also tested it on single-molecule predictions and manually checked that the output fingerprints are actually the MPNN output.

There are some minor errors in the description of the model_fingerprint function. It seems like a copy-paste error, or possibly a relic from a previous function.

Seems good to me otherwise!

Contributor

@hesther hesther left a comment

Looks great, thank you!

chemprop/train/make_predictions.py (outdated, resolved)
chemprop/train/molecule_fingerprint.py (outdated, resolved)
chemprop/data/utils.py (outdated, resolved)
@kevingreenman
Member

I just realized that this is attempting to add the same functionality as PR#105 from Simon. I think the PR#105 implementation is cleaner, but it doesn't update the README. Sorry for not realizing this earlier @cjmcgill

@hesther
Contributor

hesther commented Feb 8, 2021

Kevin, I briefly fell for the same, but Simon actually implemented saving the last hidden layer of the FFN, not the MPNN output. These are two different use cases: one outputs the last latent representation before the target values, and the other outputs the convolved, aggregated atomic fingerprints after message passing. So I think we are good (and should have both options).
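
To make the distinction concrete, here is a small sketch of the two latent representations under simplifying assumptions. The function names, weights, and summation-based aggregation are hypothetical and do not come from chemprop or from either PR; the sketch only shows where in the pipeline each vector is read out.

```python
from typing import List


def message_passing_output(atom_vectors: List[List[float]]) -> List[float]:
    # Aggregate per-atom vectors into one molecule vector by summation.
    return [sum(column) for column in zip(*atom_vectors)]


def dense_relu(vector: List[float], weights: List[List[float]]) -> List[float]:
    # One fully connected layer followed by ReLU.
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


# Hypothetical inputs: three atoms with two features each.
atom_vectors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
hidden_weights = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.0]]  # 2 inputs -> 3 hidden units
output_weights = [1.0, 2.0, 3.0]

# Representation discussed in this PR: the aggregated message-passing output.
mpn_fingerprint = message_passing_output(atom_vectors)

# Representation described for PR#105: the last hidden layer of the FFN.
last_hidden = dense_relu(mpn_fingerprint, hidden_weights)

# Final target value, computed from the last hidden layer.
target = sum(w * h for w, h in zip(output_weights, last_hidden))
```

The first vector depends only on the molecular encoder, while the second has already been shaped by the layers trained toward the target.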

@kevingreenman
Member

Thanks for pointing that out @hesther! I agree, both of these should be available options.

@cjmcgill
Contributor Author

@hesther @kevingreenman @mliu49 following PR #135 I was able to remove some of the awkward handling of smiles_columns from this one. I think it's all good now, and it runs successfully for one mol, two mols, and from the command line.

Contributor

@hesther hesther left a comment

Looks good to me, and I tested it a bit. It works fine from my point of view! The only issue that might arise: once PR 137 is merged, we will need to update the argument checking/reading/scaler section in chemprop/train/molecule_fingerprint.py to match chemprop/train/make_predictions.py.

@cjmcgill
Contributor Author

@mliu49 can we merge this?

@mliu49
Contributor

mliu49 commented Feb 18, 2021

I just merged #137. Could you update the arguments to molecule_fingerprint.py as suggested by Esther, and then rebase on master? (no need to squash the two commits)

Make compatible with multiple molecules and newer chemprop


Enable fingerprint command line function and adds description to readme


Clean up import line


Removing scratch work comments and fixing function description


Change molecule number assertion to a raised error


Change the assert for a fingerprint from multiple checkpoints to error
@cjmcgill
Contributor Author

@hesther thanks for noticing what needed to change following #137.
@mliu49 we should be good to go now

@mliu49
Contributor

mliu49 commented Feb 19, 2021

I think this looks good. Somehow I missed the fact that there's significant code overlap with make_predictions. It would be really nice if some of that overlap could be split off into individual functions that molecule_fingerprint imports in the future.
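
One possible shape for such a refactor, sketched under assumptions: the molecule-count check that both entry points need is moved into a single helper. The names check_molecule_count, toy_make_predictions, and toy_fingerprint are hypothetical, and the choice of ValueError is an assumption (the thread only says the assertion was changed to a raised error).

```python
from typing import Sequence


def check_molecule_count(train_count: int, smiles_columns: Sequence[str]) -> None:
    """Raise if the input has a different molecule count than the model was trained with."""
    if len(smiles_columns) != train_count:
        raise ValueError(
            f"Model was trained with {train_count} molecule(s) per datapoint, "
            f"but the input provides {len(smiles_columns)}."
        )


def toy_make_predictions(train_count: int, smiles_columns: Sequence[str]) -> str:
    # Hypothetical prediction entry point reusing the shared check.
    check_molecule_count(train_count, smiles_columns)
    return "predictions"


def toy_fingerprint(train_count: int, smiles_columns: Sequence[str]) -> str:
    # Hypothetical fingerprint entry point reusing the same check.
    check_molecule_count(train_count, smiles_columns)
    return "fingerprints"
```

With a shared helper like this, a later change to the validation logic would only need to be made in one place.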

@mliu49 mliu49 merged commit 2abd3b3 into master Feb 19, 2021
@mliu49 mliu49 deleted the output_fingerprints branch February 19, 2021 22:28
4 participants