Molecule featurizer and molecule graph #484
trvachov
left a comment
Looks fine to me, except I think the tests (especially the golden values) need a little more description. I worry we'll end up in a situation where the tests fail and some other engineer will be digging through the tests trying to understand what the golden values mean. For example, one of them appears to be an embedding? (float array)...maybe it's ok if this one changes a little bit? Others maybe not so much?
I have added descriptions for the golden-value tests. One golden-value test had mixed data types (int and float). I separated them and now check for an exact match on the int values and a torch.allclose match on the float values. @trvachov, could I have an approval if everything looks good to you?
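The split described above (exact match for integers, tolerance match for floats) can be sketched as follows. This is a minimal illustration only, not the PR's test code: `check_golden` is a hypothetical helper, the numbers are made up, and the standard-library `math.isclose` stands in for `torch.allclose`, whose tolerance formula differs slightly.

```python
import math


def check_golden(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Compare int golden values exactly and float golden values within a tolerance.

    Hypothetical helper for illustration; math.isclose is a stand-in for
    torch.allclose, not an exact equivalent.
    """
    if len(actual) != len(expected):
        return False
    for a, e in zip(actual, expected):
        if isinstance(e, int) and not isinstance(e, bool):
            # Integer-valued features (e.g. counts) must match exactly.
            if a != e:
                return False
        elif not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
            # Float-valued features may differ by small numerical noise.
            return False
    return True


# Made-up golden values, kept in separate lists by data type.
golden_ints = [6, 0, 2]
golden_floats = [12.5, 0.25]

ints_ok = check_golden([6, 0, 2], golden_ints)
floats_ok = check_golden([12.5, 0.25000001], golden_floats)
floats_bad = check_golden([12.5, 0.26], golden_floats)
```

Keeping the two kinds of values in separate lists makes a failure easier to read: an integer mismatch points to a real change in the features, while a float mismatch may only be numerical drift.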
DejunL
left a comment
Have some comments related to the design of the MoleculeGraph class and minor ones on the tests. As Guoqing pointed out during the sync, couldn't the graph data just be a Data object, so that our methods can enjoy a flat design, e.g., via function calls rather than relying on graph classes?
trvachov
left a comment
Thanks for the prompt changes.
Agreed. I have removed the |
DejunL
left a comment
LGTM. Have some non-blocking comments about adding a docstring and an example.
class RDkit2DDescriptorFeaturizer(BaseMoleculeFeaturizer):
Would recommend adding a docstring to get_molecule_features() with more details about the requirements for the input and what to expect in the output. It would be great if an example were given in the class's docstring.
Is there a useful link to the list of RDKit features we're computing here? Or at least the Descriptors RDKit docstring. If not, at least a hint for the user to say, print(Descriptors.DescList) for the list
rf_feats = rf(test_mol2)

# Reference is a list of tuples.
# Each tuple contains the sizes of the rings the bond is present in.
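To make the shape of such a reference concrete, here is a small pure-Python sketch that builds one tuple of ring sizes per bond. It is an assumption-laden illustration, not the featurizer from the PR: `bond_ring_sizes` is a hypothetical function, the toy molecule is invented, and each ring is assumed to be given as a list of atom indices in cyclic order.

```python
def bond_ring_sizes(bonds, rings):
    """Return, for each bond, a tuple with the sizes of the rings that contain it.

    bonds: list of (atom_i, atom_j) index pairs.
    rings: list of rings, each a list of atom indices in cyclic order (assumed).
    """
    features = []
    for a, b in bonds:
        sizes = []
        for ring in rings:
            n = len(ring)
            # A bond belongs to a ring if its two atoms are neighbours
            # in the ring's cyclic ordering.
            for i in range(n):
                if {ring[i], ring[(i + 1) % n]} == {a, b}:
                    sizes.append(n)
                    break
        features.append(tuple(sorted(sizes)))
    return features


# Invented toy graph: a 4-membered ring fused to a 3-membered ring
# through bond (2, 3), plus one substituent atom (5) outside any ring.
bonds = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 2), (0, 5)]
rings = [[0, 1, 2, 3], [2, 3, 4]]

reference = bond_ring_sizes(bonds, rings)
```

In this toy case the shared bond (2, 3) gets the tuple (3, 4), bonds in a single ring get a one-element tuple, and the substituent bond (0, 5) gets an empty tuple.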
Implements RDkit 2D descriptor molecule featurizer. Implements MoleculeGraph, a data object for storing and processing molecular graphs.
RDkit2DDescriptorFeaturizer and MoleculeGraph
The MoleculeGraph data object is used to store all molecular-graph-related attributes.
Usage
from bionemo.geometric.molecule_featurizers import RDkit2DDescriptorFeaturizer
Testing
Tests for these changes can be run via: