adding grover pretrain model as ModularTorchModel #3272

Merged
arunppsg merged 18 commits into deepchem:master from the grover_pretrain branch on Apr 7, 2023

arunppsg
Contributor

Pull Request Template

Description

The PR adds GroverPretrainer to train embeddings. The main contribution is the self-supervised pretraining task used to train the embeddings.

Type of change

Please check the option that is related to your PR.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • In this case, we recommend discussing your modification on GitHub issues before creating the PR
  • Documentations (modification for documents)

Checklist

  • My code follows the style guidelines of this project
    • Run yapf -i <modified file> and check no errors (yapf version must be 0.32.0)
    • Run mypy -p deepchem and check no errors
    • Run flake8 <modified file> --count and check no errors
    • Run python -m doctest <modified file> and check no errors
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

Member

@rbharath left a comment

Did a first review pass. This is a good start! Some room for cleanup here. Let's discuss questions offline since there are some design questions open here

deepchem/models/torch_models/grover.py: 10 resolved review threads (8 outdated)

Member

@rbharath left a comment

I think this PR is still in draft stage and not ready for full review so I just did a quick pass and pointed out a couple things that jumped out at me. Let me know once this is ready for full review

deepchem/models/torch_models/grover.py: 6 resolved review threads (5 outdated)
@arunppsg changed the title from "added grover pretrainer as a ModularTorchModel" to "added grover model as ModularTorchModel" on Mar 21, 2023

Member

@rbharath left a comment

Capturing the offline discussion, we will want to split this into multiple PRs:

  1. Fixing GraphData, BatchGraphData to support Grover (and DMPNN)
  2. Adding Grover's loss to DeepChem's losses
  3. Adding Grover as a modular torch model.

We should also discuss the proposed save/restore fix offline and see if we want to make that a separate PR

@tonydavis629
Collaborator

What is the issue with GraphData not supporting DMPNN and Grover?

Member

@rbharath left a comment

I think this PR is waiting on the earlier graph data handling PR, so I'm not doing a thorough review, but note that we still need save/reload/overfit tests.

@@ -0,0 +1,34 @@
import pytest
import deepchem as dc
from deepchem.feat.graph_data import BatchGraphData

Member

As a reminder, we need to add the save/reload/overfit tests before we can merge this PR in
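
For readers unfamiliar with these test types, here is a rough illustration of what overfit and save/reload tests check. It uses a hypothetical, stdlib-only `TinyLinearModel` as a stand-in; it is not Grover, not the tests added in this PR, and it does not use any DeepChem API. The overfit test asserts that the model can drive its training loss close to zero on a tiny dataset, and the save/reload test asserts that a restored model reproduces the original predictions.

```python
import json
import os
import tempfile


class TinyLinearModel:
    """Hypothetical stand-in model: y = w * x + b, trained by gradient descent."""

    def __init__(self, w: float = 0.0, b: float = 0.0):
        self.w = w
        self.b = b

    def predict(self, xs):
        return [self.w * x + self.b for x in xs]

    def mse(self, xs, ys) -> float:
        return sum((p - y) ** 2 for p, y in zip(self.predict(xs), ys)) / len(xs)

    def fit(self, xs, ys, epochs: int = 2000, lr: float = 0.05) -> float:
        n = len(xs)
        for _ in range(epochs):
            errors = [p - y for p, y in zip(self.predict(xs), ys)]
            # Gradients of the mean squared error with respect to w and b.
            grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
            grad_b = 2.0 * sum(errors) / n
            self.w -= lr * grad_w
            self.b -= lr * grad_b
        return self.mse(xs, ys)

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump({"w": self.w, "b": self.b}, f)

    @classmethod
    def restore(cls, path: str) -> "TinyLinearModel":
        with open(path) as f:
            return cls(**json.load(f))


def test_overfit():
    # A tiny dataset (y = 2x + 1) that the model should fit almost exactly.
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    model = TinyLinearModel()
    assert model.fit(xs, ys) < 1e-6


def test_save_reload():
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
    model = TinyLinearModel()
    model.fit(xs, ys)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        model.save(path)
        reloaded = TinyLinearModel.restore(path)
    # A restored model must reproduce the original predictions.
    assert reloaded.predict(xs) == model.predict(xs)
```

With pytest, both `test_` functions would be collected automatically. A real test for this PR would presumably swap in the Grover model and DeepChem's own checkpointing in place of the JSON round trip.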

@arunppsg changed the title from "added grover model as ModularTorchModel" to "adding grover pretrain model as ModularTorchModel" on Mar 30, 2023
@arunppsg requested a review from rbharath on March 30, 2023 18:31

Member

@rbharath left a comment

There's a lot of code here that I think will go away once we rebase. I'll hold off on doing a full review till we merge in the previous PR and rebase since it'll get messy otherwise

@arunppsg
Contributor Author

What is the issue with GraphData not supporting DMPNN and Grover?

It's not that GraphData does not support them; I didn't know how to make DMPNN and Grover graph batching work with GraphData. To add a bit of context, DMPNN and Grover models use the same input representations.
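
As background on the batching question, one common way to batch molecular graphs is to merge them into a single disconnected graph, shifting each graph's edge indices by the number of nodes already placed in the batch. The sketch below illustrates only that general idea in plain Python. `ToyGraph` and `batch_graphs` are hypothetical names; this is neither the GraphData/BatchGraphData code nor the DMPNN/Grover batching discussed in this PR, and it omits the bond-level lookups that DMPNN-style models typically also need.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a featurized molecule graph."""
    node_features: List[List[float]]         # one row per atom
    edge_index: Tuple[List[int], List[int]]  # (source atoms, destination atoms)


def batch_graphs(graphs: List[ToyGraph]):
    """Merge graphs into one disconnected graph by offsetting node ids."""
    nodes: List[List[float]] = []
    src: List[int] = []
    dst: List[int] = []
    graph_index: List[int] = []  # which input graph each node came from
    for g_id, g in enumerate(graphs):
        offset = len(nodes)  # number of nodes already placed in the batch
        nodes.extend(g.node_features)
        src.extend(s + offset for s in g.edge_index[0])
        dst.extend(d + offset for d in g.edge_index[1])
        graph_index.extend([g_id] * len(g.node_features))
    return ToyGraph(nodes, (src, dst)), graph_index


# Two toy "molecules": a 2-atom graph and a 3-atom chain, edges in both directions.
g1 = ToyGraph([[1.0], [2.0]], ([0, 1], [1, 0]))
g2 = ToyGraph([[3.0], [4.0], [5.0]], ([0, 1, 1, 2], [1, 0, 2, 1]))
batched, graph_index = batch_graphs([g1, g2])
```

Here the second graph's edges are shifted by 2 (the size of the first graph), and `graph_index` records which molecule each atom belongs to so that per-molecule readouts can be computed later.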

@arunppsg force-pushed the grover_pretrain branch 2 times, most recently from 9f6b27d to 4ce9c43, on April 1, 2023 03:48

Member

@rbharath left a comment

The core implementation is looking solid, but I have some requests for improved documentation and additional unit tests below.

deepchem/models/torch_models/grover.py: 4 resolved review threads (1 outdated)
return self._prepare_batch_for_finetuning(data)

def _prepare_batch_for_pretraining(self, batch: Tuple[Any, Any, Any]):
"""

Member

Can you add a description to the docstring of the preparation required?

Member

Reminder here as well

Contributor Author

added description

deepchem/models/torch_models/grover.py: 1 resolved review thread

This layer is a simple wrapper over GroverTransEncoder layer for retrieving the embeddings from the GroverTransEncoder corresponding to the `embedding_output_type` chosen by the user.
class Grover(ModularTorchModel):
"""Grover model

Member

Can you add this model to the docs?

Member

Quick reminder to add to the docs

Contributor Author

added to docs

deepchem/models/torch_models/tests/test_grover.py: 2 resolved review threads (2 outdated)

Member

@rbharath left a comment

Will re-review this PR later after we merge the earlier PRs and rebase

@arunppsg force-pushed the grover_pretrain branch 2 times, most recently from 5e407b7 to d72e192, on April 5, 2023 18:30

Member

@rbharath left a comment

This is nearly ready but there are still a number of missing docs. Can you do a pass and make sure all comments are addressed?


This layer is a simple wrapper over GroverTransEncoder layer for retrieving the embeddings from the GroverTransEncoder corresponding to the `embedding_output_type` chosen by the user.
class Grover(ModularTorchModel):
"""Grover model

Member

Quick reminder to add to the docs

deepchem/models/torch_models/grover.py: 1 resolved review thread
return self._prepare_batch_for_finetuning(data)

def _prepare_batch_for_pretraining(self, batch: Tuple[Any, Any, Any]):
"""

Member

Reminder here as well

deepchem/models/torch_models/grover.py: 3 resolved review threads

Member

@rbharath left a comment

LGTM, please go ahead and merge in once CI is clear.

Nice work getting this to the finish line! It's a big one.

@arunppsg
Contributor Author

arunppsg commented Apr 7, 2023

The CI failures are not related to these changes. Going ahead and merging this in.

@arunppsg merged commit 81d3832 into deepchem:master on Apr 7, 2023
@arunppsg deleted the grover_pretrain branch on April 11, 2023 09:20
3 participants