Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output options for make_predictions #235

Merged
merged 2 commits into from
Jan 23, 2022
Merged

Output options for make_predictions #235

merged 2 commits into from
Jan 23, 2022

Conversation

cjmcgill
Copy link
Contributor

User noted that for the make_predictions function, invalid SMILES can be included in the arguments but the returned values will exclude the invalid entries without any notation of which ones are invalid. The default behavior before for the values saved to file has always been to include the entries with invalid SMILES with prediction values of "Invalid SMILES". Following the refactoring of make_predictions in #200, it's more accessible than before to use this function directly and so it's now more obvious that the saved behavior and the function return behavior do not match.

I've added two new options to the make_predictions function.

  • return_invalid_smiles. This option indicates whether to include predictions for invalid datapoints, with the value "Invalid SMILES" in the function return. This makes it consistent with the saved value behavior. This option has been set default to True. The previous behavior before the option would have been False.
  • return_index_dict. This option indicates whether to return a dictionary keyed with the index of the initial dataset or smiles list for the predictions rather than a list of list of values. This option defaults to False. If return_invalid_smiles is set to False, the invalid index values will be missing from the returned dictionary.

Copy link

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I added some minor comments

full_preds = []
for full_index in range(len(full_data)):
valid_index = full_to_valid_indices.get(full_index, None)
preds = avg_preds[valid_index] if valid_index is not None else ['Invalid SMILES'] * num_tasks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to also return the invalid SMILES themselves, or is it not really beneficial?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that it's beneficial. But if people do want it, the right place is in the load_data function where these are getting split up in the first place. I've added it there as one of the outputs (unused in the full workflow). Is that a reasonable return structure do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alongd I've actually rethought this and I don't want to add another output to load_data where it would break existing code for people. The right place would be in a separate data/utils function. I've added new functions for that from both files and lists.

@@ -259,6 +272,8 @@ def make_predictions(args: PredictArgs, smiles: List[List[str]] = None,
loading data and a model and making predictions.
:param smiles: List of list of SMILES to make predictions on.
:param model_objects: Tuple of output of load_model function which can be called separately.
:param return_invalid_smiles: Whether to return None values for invalid SMILES, otherwise will skip them in returned predictions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to make the docstring description of return_invalid_smiles identical in both predict_and_save() and make_predictions(), or is it different on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unintentional, I will make them the same.

@@ -122,6 +123,7 @@ def predict_and_save(args: PredictArgs, train_args: TrainArgs, test_data: Molecu
:param full_to_valid_indices: A dictionary dictionary mapping full to valid indices.
:param models: A list or generator object of :class:`~chemprop.models.MoleculeModel`\ s.
:param scalers: A list or generator object of :class:`~chemprop.features.scaler.StandardScaler` objects.
:param return_invalid_smiles: Whether to include invalid SMILES with a value None in the returned predictions.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably misunderstood, but it looks like the description says that a None value is returned, when in practice an 'Invalid SMILES' string is returned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right here. This is a mistake in the docstring and I will fix.

@alongd
Copy link

alongd commented Jan 22, 2022

Thanks for the modifications, @cjmcgill!
I agree with your view that invalid SMILES shouldn't be returned by default.
Feel free to squash the "Fix doc strings" commit.
I'm new to this repo, what are the guidelines re adding tests for new functions? Or is it complex to test these functions since an input is required?
Also, a very minor stylistic comment, perhaps put all PEP8 modifications made to make_predictions.py in a separate minor commit?

@cjmcgill
Copy link
Contributor Author

@alongd I refactored the commits so the spacing changes were grouped in a separate commit.

Also you are right on about testing. Thus far, we have not had unit tests in this repo, only relying on only integration tests. I am adding unit tests in the in-progress #232, and tests on these functions will be included in it.

Copy link

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants