adding grover pretrain model as ModularTorchModel #3272
Conversation
Did a first review pass. This is a good start! There's some room for cleanup here. Let's discuss offline since there are some open design questions.
Force-pushed from 636a16a to 71b11f2
I think this PR is still in draft stage and not ready for full review, so I just did a quick pass and pointed out a couple of things that jumped out at me. Let me know once this is ready for full review.
Force-pushed from 71b11f2 to 61d549b
Capturing the offline discussion, we will want to split this into multiple PRs:
- Fixing `GraphData` and `BatchGraphData` to support Grover (and DMPNN)
- Adding Grover's loss to DeepChem's losses
- Adding Grover as a modular torch model

We should also discuss the proposed save/restore fix offline and see if we want to make that a separate PR.
Force-pushed from 73234b3 to 694bf23
What is the issue with GraphData not supporting DMPNN and Grover?
I think this PR is waiting on the earlier graph data handling PR, so I'm not doing a thorough review, but one comment: we still need save/reload/overfit tests.
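A save/reload test in this style generally writes model weights to disk, restores them into a fresh instance, and checks that predictions match. A minimal framework-agnostic sketch of the pattern (all names here are hypothetical stand-ins, not the Grover or DeepChem API):

```python
import os
import pickle
import tempfile

import numpy as np


class TinyModel:
    """Hypothetical stand-in for a model with learnable weights."""

    def __init__(self, w=None):
        self.w = np.ones(4) if w is None else w

    def predict(self, x):
        return x @ self.w

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.w, f)

    def restore(self, path):
        with open(path, "rb") as f:
            self.w = pickle.load(f)


x = np.arange(8, dtype=float).reshape(2, 4)
m = TinyModel(w=np.array([0.5, -1.0, 2.0, 0.0]))

with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "ckpt.pkl")
    m.save(ckpt)
    m2 = TinyModel()       # fresh instance with default weights
    m2.restore(ckpt)       # load the saved weights

# the restored model must reproduce the original predictions exactly
assert np.allclose(m.predict(x), m2.predict(x))
```

An overfit test follows the same shape: fit on a tiny dataset for many epochs and assert the training score exceeds a high threshold.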
deepchem/utils/test/test_grover.py
Outdated
@@ -0,0 +1,34 @@
import pytest
import deepchem as dc
from deepchem.feat.graph_data import BatchGraphData
As a reminder, we need to add the save/reload/overfit tests before we can merge this PR in
There's a lot of code here that I think will go away once we rebase. I'll hold off on doing a full review until we merge in the previous PR and rebase, since it'll get messy otherwise.
It's not that it doesn't support them - I didn't know how to make DMPNN and Grover graph batching work with GraphData. To add a bit of context, the DMPNN and Grover models use the same input representations.
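Batching graphs of this kind usually means concatenating node features into one array and offsetting each graph's edge indices by the number of nodes already batched, so the batch behaves as one large disjoint graph. A minimal numpy sketch of the idea (an illustration only, not DeepChem's actual `BatchGraphData` implementation):

```python
import numpy as np


def batch_graphs(graphs):
    """Merge [(node_features, edge_index), ...] into one disjoint graph.

    edge_index has shape (2, num_edges); indices are offset so edges
    from different graphs never reference each other's nodes.
    """
    feats, edges, graph_index = [], [], []
    offset = 0
    for gid, (nf, ei) in enumerate(graphs):
        feats.append(nf)
        edges.append(ei + offset)                  # shift node ids into batched space
        graph_index.append(np.full(len(nf), gid))  # which graph each node came from
        offset += len(nf)
    return (np.concatenate(feats),
            np.concatenate(edges, axis=1),
            np.concatenate(graph_index))


# two toy graphs: 2 nodes and 3 nodes, one feature per node
g1 = (np.zeros((2, 1)), np.array([[0], [1]]))
g2 = (np.ones((3, 1)), np.array([[0, 1], [1, 2]]))
x, ei, gi = batch_graphs([g1, g2])
# x has 5 rows; g2's edges are shifted to reference nodes 2..4
```

The `graph_index` array is what lets a pooling step recover per-molecule outputs from the flat batch.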
Force-pushed from 9f6b27d to 4ce9c43
The core implementation is looking solid, but I have some requests for improved documentation and additional unit tests below.
return self._prepare_batch_for_finetuning(data)

def _prepare_batch_for_pretraining(self, batch: Tuple[Any, Any, Any]):
    """
Can you add a description to the docstring of the preparation required?
Reminder here as well
added description
This layer is a simple wrapper over GroverTransEncoder layer for retrieving the embeddings from the GroverTransEncoder corresponding to the `embedding_output_type` chosen by the user.

class Grover(ModularTorchModel):
    """Grove model
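The wrapper described above essentially maps the `embedding_output_type` flag to which encoder outputs get returned. A hypothetical sketch of that dispatch (function and variable names are assumed for illustration, not the actual layer):

```python
import numpy as np


def select_embeddings(atom_emb, bond_emb, embedding_output_type):
    """Return encoder embeddings matching the requested output type.

    Hypothetical dispatch: 'atom' and 'bond' each return one array,
    'both' returns the pair.
    """
    if embedding_output_type == "atom":
        return atom_emb
    if embedding_output_type == "bond":
        return bond_emb
    if embedding_output_type == "both":
        return atom_emb, bond_emb
    raise ValueError(f"unknown embedding_output_type: {embedding_output_type!r}")


atoms = np.zeros((5, 16))   # 5 atoms, 16-dim embeddings
bonds = np.ones((8, 16))    # 8 bonds, 16-dim embeddings
out = select_embeddings(atoms, bonds, "atom")
```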
Can you add this model to the docs?
Quick reminder to add to the docs
added to docs
Force-pushed from 4ce9c43 to 4c876d2
Will re-review this PR later, after we merge the earlier PRs and rebase.
Force-pushed from 5e407b7 to d72e192
This is nearly ready, but there are still a number of missing docs. Can you do a pass and make sure all comments are addressed?
This layer is a simple wrapper over GroverTransEncoder layer for retrieving the embeddings from the GroverTransEncoder corresponding to the `embedding_output_type` chosen by the user.

class Grover(ModularTorchModel):
    """Grove model
Quick reminder to add to the docs
return self._prepare_batch_for_finetuning(data)

def _prepare_batch_for_pretraining(self, batch: Tuple[Any, Any, Any]):
    """
Reminder here as well
Force-pushed from d72e192 to 64be63b
Force-pushed from 64be63b to 9994cca
LGTM, please go ahead and merge in once CI is clear.
Nice work getting this to the finish line! It's a big one.
The CI failures are not related to these changes. Going ahead and merging this in.
Pull Request Template
Description
The PR adds GroverPretrainer to train embeddings. The main contribution is the self-supervised pretraining task used to train the embeddings.
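Self-supervised pretraining of this kind typically masks part of the input and trains the model to predict the masked values from context. A minimal numpy illustration of the masking step only (a generic sketch, not the Grover pretraining task itself):

```python
import numpy as np

rng = np.random.default_rng(0)


def mask_node_features(feats, mask_rate=0.25, mask_value=0.0):
    """Randomly blank whole node-feature rows and return targets.

    Returns (masked_features, mask, targets): a model would be trained
    to reconstruct `targets` at the positions where `mask` is True.
    """
    mask = rng.random(len(feats)) < mask_rate
    masked = feats.copy()
    masked[mask] = mask_value
    return masked, mask, feats[mask]


x = np.arange(12, dtype=float).reshape(6, 2)   # 6 nodes, 2 features each
masked, mask, targets = mask_node_features(x)

# masked rows are blanked; unmasked rows are untouched
assert np.all(masked[mask] == 0.0)
assert np.array_equal(masked[~mask], x[~mask])
```

The pretraining loss is then computed only over the masked positions, which is what lets the encoder learn embeddings without labels.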
Type of change
Please check the option that is related to your PR.
Checklist
- Run `yapf -i <modified file>` and check no errors (yapf version must be 0.32.0)
- Run `mypy -p deepchem` and check no errors
- Run `flake8 <modified file> --count` and check no errors
- Run `python -m doctest <modified file>` and check no errors