Implement sample GAT model for working PyG with DeepChem #2109

Merged
merged 16 commits into from Sep 2, 2020

nissy-dev
Member

@nissy-dev nissy-dev commented Aug 21, 2020

This PR is a part of #1942

What I did

  • Create a new molecule graph convolution featurizer that uses GraphData.
    • The new featurizer returns basic atom and bond features, like those used in GCN and MPNN.
      • The current graph conv features include some node and edge values that papers such as WeaveNet, MEGNet, and MPNN do not mention.
    • Create utils/graph_conv_utils.py so that the featurizer is more readable and customizable for users.
  • Create a sample GAT model.
    • This model is an example of using DeepChem with PyTorch Geometric.

TODO

  • Add more docstrings
  • Add more tests (mainly, all functions in utils/graph_conv_utils.py)

I will make another PR for the following:

  • Debug why my GAT model scores much lower on the overfit tests than the current TF GraphConvModel
  • Add a mode argument to GATModel, as in "Add mode args for CGCNNModel" (#2123)

@nissy-dev nissy-dev changed the title [WIP] Implement sample GAT models for working PyG with DeepChem [WIP] Implement sample GAT model for working PyG with DeepChem Aug 21, 2020
@nissy-dev nissy-dev changed the title [WIP] Implement sample GAT model for working PyG with DeepChem Implement sample GAT model for working PyG with DeepChem Aug 24, 2020
@nissy-dev
Member Author

This PR is ready for review!

Member

@rbharath rbharath left a comment

This is really neat! Well written, well documented code :)

I've done a first pass with a couple of comments

get_bond_stereo_one_hot


def constrcut_atom_feature(atom: RDKitAtom, h_bond_infos: List[Tuple[int, str]],
Member

typo: Should be construct_atom_feature.



def constrcut_atom_feature(atom: RDKitAtom, h_bond_infos: List[Tuple[int, str]],
sssr: List[Sequence]) -> List[float]:
Member

Should we perhaps make this return a NumPy array instead of a list?

Member Author

Is there a reason? I think this function is basically not used anywhere else. (I have now added a leading underscore to the function name: _construct_atom_feature.)

class MolGraphConvFeaturizer(MolecularFeaturizer):
"""This class is a featurizer of gerneral graph convolution networks for molecules.

The default node(atom) and edge(bond) representations are based on WeaveNet paper.
Member

Perhaps add a hyperlink to the weave paper in the references section below

The default node(atom) and edge(bond) representations are based on WeaveNet paper.
If you want to use your own representations, you could use this class as a guide
to define your original Featurizer. In many cases, it's enough to modify return values of
`constrcut_atom_feature` or `constrcut_bond_feature`.
Member

Typo: Should be construct for both of these

@coveralls

coveralls commented Aug 25, 2020

Coverage Status

Coverage increased (+0.4%) to 77.861% when pulling b7b56fa on nd-02110114:gat-pyg-2 into bc05e7c on deepchem:master.

@rbharath
Member

One more quick ask, could you update the model cheatsheet to add GATModel and CGCNNModel? https://deepchem.readthedocs.io/en/latest/models.html#model-cheatsheet

@nissy-dev
Member Author

> One more quick ask, could you update the model cheatsheet to add GATModel and CGCNNModel? https://deepchem.readthedocs.io/en/latest/models.html#model-cheatsheet

I will add a lot of documentation for TorchModel, CGCNN, and GAT in #2124.

- Chirality: A one-hot vector of the chirality, "R" or "S".
- Formal charge: Integer electronic charge.
- Partial charge: Calculated partial charge.
- Ring sizes: A one-hot vector of the number of rings (3-8) that include this atom.
Contributor

I think you mean the size of the ring? Not many atoms belong to three rings, much less eight!

Member Author

Yes! I fixed it.
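
The distinction raised in this thread, the size of a ring versus the number of rings, can be illustrated with a minimal sketch. It assumes ring information arrives as a list of atom-index sequences, as the `sssr` argument in the quoted signature suggests. The name `ring_size_one_hot` and the exact encoding are hypothetical and are not the code merged in this PR.

```python
from typing import List, Sequence

# Hypothetical helper for illustration; not the function from this PR.
RING_SIZES = [3, 4, 5, 6, 7, 8]


def ring_size_one_hot(atom_idx: int, rings: List[Sequence[int]]) -> List[float]:
    """Mark each ring size (3-8) for which the atom sits in a ring of that size."""
    encoding = [0.0] * len(RING_SIZES)
    for ring in rings:
        if atom_idx in ring and len(ring) in RING_SIZES:
            encoding[RING_SIZES.index(len(ring))] = 1.0
    return encoding


# A six-membered ring fused to a five-membered ring through atoms 4 and 5.
rings = [(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8)]
print(ring_size_one_hot(0, rings))  # atom that belongs only to the six-membered ring
print(ring_size_one_hot(4, rings))  # fused atom that belongs to both rings
```

Note that an atom shared by two rings of different sizes sets two entries under this assumed encoding, so the vector is strictly one-hot only for atoms in a single ring.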

"""
Parameters
----------
add_self_loop: bool, default False
Contributor

Perhaps add_self_edges would be clearer? This isn't really about loops, so the name could be confusing.

Member Author

I agree! I fixed it.

)

# construct edge (bond) information
src, dist, bond_features = [], [], []
Contributor

You probably want the variable to be called dest (short for destination?), not dist which sounds like it's short for "distance"?

Member Author

It is short for destination. I fixed it!
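
As a rough illustration of the naming discussed in this thread, the sketch below builds `src` and `dest` index lists from a list of bonds, emitting each bond in both directions and optionally adding self edges. The function `build_edge_index` and its `add_self_edges` flag are assumptions made for this example and are not the featurizer's actual code.

```python
from typing import List, Tuple


def build_edge_index(num_atoms: int,
                     bonds: List[Tuple[int, int]],
                     add_self_edges: bool = False) -> Tuple[List[int], List[int]]:
    """Return (src, dest) lists describing a directed edge list.

    Hypothetical helper for illustration only.
    """
    src: List[int] = []
    dest: List[int] = []
    for begin, end in bonds:
        # An undirected bond becomes two directed edges.
        src += [begin, end]
        dest += [end, begin]
    if add_self_edges:
        # Connect every atom to itself.
        src += list(range(num_atoms))
        dest += list(range(num_atoms))
    return src, dest


# Three atoms in a chain: 0-1-2
src, dest = build_edge_index(3, [(0, 1), (1, 2)], add_self_edges=True)
print(src)
print(dest)
```

Reading the two lists column by column gives one directed edge per position, which is why a name suggesting "destination" is less ambiguous than one suggesting "distance".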

Comment on lines 131 to 132
>> dataset_config = {"reload": False, "featurizer": featurizer, "transformers": []}
>> tasks, datasets, transformers = dc.molnet.load_tox21(**dataset_config)
Contributor

Wouldn't it be simpler to just write this as

tasks, datasets, transformers = dc.molnet.load_tox21(reload=False, featurizer=featurizer, transformers=[])

Member Author

I fixed it!
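
The suggestion in this thread relies on the fact that unpacking a dict with `**` is equivalent to passing the same keyword arguments directly. A minimal sketch, using a hypothetical stand-in function `load_dataset` rather than the real `dc.molnet.load_tox21`:

```python
def load_dataset(reload=True, featurizer="ECFP", transformers=None):
    """Hypothetical stand-in for a dataset loader; it simply returns its arguments."""
    return reload, featurizer, transformers


# Passing the options through an intermediate dict ...
config = {"reload": False, "featurizer": "graph", "transformers": []}
via_dict = load_dataset(**config)

# ... gives the same call as writing the keywords inline.
via_keywords = load_dataset(reload=False, featurizer="graph", transformers=[])

print(via_dict == via_keywords)
```

The intermediate dict is mainly useful when the same options are reused across several calls; for a single call, inline keywords are shorter.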

in_node_dim: int = 38,
hidden_node_dim: int = 64,
heads: int = 4,
dropout_rate: float = 0.0,
Contributor

For consistency with other models, this should be just dropout.

Member Author

I fixed it!

heads: int = 4,
dropout_rate: float = 0.0,
num_conv: int = 3,
predicator_hidden_feats: int = 32,
Contributor

Was that supposed to be "predictor"?

Member Author

Yes. I fixed it!

@@ -0,0 +1,512 @@
"""
Contributor

This file doesn't really have anything to do with convolutions. How about calling it molecule_feature_utils.py?

Member Author

I fixed it!

args_info += arg_name + '=' + str(self.__dict__[arg_name]) + ', '
return self.__class__.__name__ + '[' + args_info[:-2] + ']'

def __str__(self) -> str:
Member Author

This implementation is needed to fix the Windows CI (see #1829). The __repr__ method shows all arguments used when instantiating a class. The __str__ method shows only the arguments that were changed from their defaults.
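
The distinction between the two string methods mentioned in this comment can be sketched with `inspect.signature`. This is a minimal illustration under the assumption that every constructor argument is stored as an attribute of the same name. `BaseFeaturizer` and `ToyFeaturizer` are hypothetical names, and this is not the implementation from the PR.

```python
import inspect


class BaseFeaturizer:
    """Hypothetical base class; assumes each __init__ argument is stored
    as an attribute of the same name."""

    def _init_params(self):
        params = inspect.signature(self.__init__).parameters.values()
        return [p for p in params if p.kind is p.POSITIONAL_OR_KEYWORD]

    def __repr__(self) -> str:
        # Show every constructor argument with its current value.
        args = ', '.join(f'{p.name}={getattr(self, p.name)!r}'
                         for p in self._init_params())
        return f'{self.__class__.__name__}[{args}]'

    def __str__(self) -> str:
        # Show only arguments whose value differs from the declared default.
        args = ', '.join(f'{p.name}={getattr(self, p.name)!r}'
                         for p in self._init_params()
                         if getattr(self, p.name) != p.default)
        return f'{self.__class__.__name__}[{args}]'


class ToyFeaturizer(BaseFeaturizer):

    def __init__(self, add_self_edges: bool = False, max_ring_size: int = 8):
        self.add_self_edges = add_self_edges
        self.max_ring_size = max_ring_size


feat = ToyFeaturizer(add_self_edges=True)
print(repr(feat))
print(str(feat))
```

Here `repr` lists both arguments, while `str` lists only `add_self_edges`, the one that differs from its default.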

@nissy-dev
Member Author

This PR is ready for a second review.

Member

@rbharath rbharath left a comment

This is looking good! I have a couple minor comments. Once those are merged, I think this is good to merge in

deepchem/feat/base_classes.py (resolved)
Member

@rbharath rbharath left a comment

Looks good! Feel free to merge in whenever ready :)

@nissy-dev
Member Author

Thanks! I'll merge it in.

@nissy-dev nissy-dev merged commit 3d257a0 into deepchem:master Sep 2, 2020
@nissy-dev nissy-dev deleted the gat-pyg-2 branch September 3, 2020 11:22
4 participants