Some molecules fail to generate BDE values in the pre-trained model, but work in the web API #12

Closed
LuckyLittleMonster opened this issue Aug 16, 2022 · 8 comments · Fixed by #13

Comments

@LuckyLittleMonster

Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []

@pstjohn
Collaborator

pstjohn commented Aug 16, 2022

Do you have an example of a molecule this happens for?

@den-run-ai

@pstjohn is it possible to add more logging to alfabet to find out why an empty result is returned?

@pstjohn
Collaborator

pstjohn commented Aug 18, 2022

Possibly. Are you able to share an example molecule that gives an empty output?

The best way to see the bonds the model is seeing is to just call the fragmentation function directly:

from alfabet.fragment import get_fragments

get_fragments(smiles_string)

@LuckyLittleMonster
Author

I found 3 example molecules.

CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2
CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)C@@HCC[C@]2(C)O
CN1C[C@H]2CN(C)CC@H[C@]2(O)c1ccccc1


mol = "CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2"
model.predict([mol])
1/1 [==============================] - 6s 6s/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []
mol2 = "CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)C@@HCC[C@]2(C)O"
model.predict([mol2])
1/1 [==============================] - 0s 123ms/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []
mol3 = "CN1C[C@H]2CN(C)CC@H[C@]2(O)c1ccccc1"
model.predict([mol3])
1/1 [==============================] - 0s 126ms/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []

@pstjohn
Collaborator

pstjohn commented Aug 23, 2022

Where did you get those SMILES strings? I'm getting rdkit parse errors:

>>> rdkit.Chem.MolFromSmiles('CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2')
[13:07:52] SMILES Parse Error: syntax error while parsing: CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2
[13:07:52] SMILES Parse Error: Failed parsing SMILES 'CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2' for input: 'CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2'

https://bde.ml.nrel.gov/result?name=%27CC%28%3DO%29OCC1%3DC%5CCC%2FC%28C%29%3DC%2FCC%5BC%40%40%5D2%28C%29CCC%40%40%28%2FC%3DC%2F1%29O2

I'm not sure why alfabet isn't raising those errors directly; that's probably something I can fix.
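One way to surface parse failures before they turn into an empty DataFrame is to check every SMILES string up front. The following is only a sketch and not alfabet's actual fix: `split_parseable` and `require_parseable` are hypothetical helper names, and the parser is passed in as an argument so that any function returning None on failure (as RDKit's `Chem.MolFromSmiles` does) can be used.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def split_parseable(
    smiles_list: Iterable[str],
    parse: Callable[[str], Optional[object]],
) -> Tuple[List[str], List[str]]:
    """Split SMILES into (parseable, unparseable) using the given parser.

    `parse` must return None for a string it cannot parse, which is how
    RDKit's Chem.MolFromSmiles reports a failure.
    """
    ok: List[str] = []
    bad: List[str] = []
    for smiles in smiles_list:
        (ok if parse(smiles) is not None else bad).append(smiles)
    return ok, bad


def require_parseable(
    smiles_list: Iterable[str],
    parse: Callable[[str], Optional[object]],
) -> List[str]:
    """Return the input as a list, or raise ValueError naming every bad string."""
    ok, bad = split_parseable(smiles_list, parse)
    if bad:
        raise ValueError(f"could not parse SMILES: {bad}")
    return ok
```

With RDKit installed, this could be called as `require_parseable(smiles, Chem.MolFromSmiles)` after `from rdkit import Chem`, before handing the list to `model.predict`. Writing the SMILES as raw strings (`r"..."`) keeps the backslashes from being read as Python escape sequences.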

@LuckyLittleMonster
Author

The SMILES strings are from the ChEMBL database:
https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_31_chemreps.txt.gz

It seems the SMILES strings were corrupted when I pasted them into the GitHub page, because of the "@" characters in them.
The SMILES strings that return an empty DataFrame:
CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CC[C@@](C(C)C)(/C=C/1)O2
CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)[C@@H](\C=C/1)CC[C@]2(C)O
CN1C[C@H]2CN(C)C[C@H](C1)[C@]2(O)c1ccccc1

The web page (e.g. https://bde.ml.nrel.gov/result?name=CN1C%5BC%40H%5D2CN%28C%29C%5BC%40H%5D%28C1%29%5BC%40%40%5D2%28O%29c1ccccc1) gives a result, but the Python package returns an empty DataFrame.
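Comparing the first set of strings with the corrected ones, each lost span has the form `[X](Y)`, which is consistent with the page rendering it as a Markdown link and dropping the brackets. In SMILES, the "@" chirality marks are only valid inside a bracket atom, so a bare "@" is a quick sign that a string was damaged this way. A minimal stdlib-only sketch of that check follows; `find_unbracketed_chirality` is a hypothetical helper, not part of alfabet or RDKit, and it is not a full SMILES validator.

```python
from typing import List


def find_unbracketed_chirality(smiles: str) -> List[int]:
    """Return the positions of '@' characters that sit outside a [...] bracket atom."""
    positions: List[int] = []
    depth = 0  # > 0 while inside a bracket atom
    for i, ch in enumerate(smiles):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(depth - 1, 0)
        elif ch == "@" and depth == 0:
            positions.append(i)
    return positions


# Third example from this thread: as first pasted, and as corrected.
garbled = r"CN1C[C@H]2CN(C)CC@H[C@]2(O)c1ccccc1"
intact = r"CN1C[C@H]2CN(C)C[C@H](C1)[C@]2(O)c1ccccc1"

for name, smiles in [("garbled", garbled), ("intact", intact)]:
    hits = find_unbracketed_chirality(smiles)
    print(name, "bare '@' at positions:", hits)
```

An empty list only means this particular kind of damage is absent; the string still needs a real parser to confirm it is valid.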

pstjohn added a commit that referenced this issue Aug 24, 2022
Fixes #12, likely introduced in 0.4.0
@pstjohn
Collaborator

pstjohn commented Aug 24, 2022

Thanks for this, I see the issue now. I likely introduced this in 0.4.0; I have a fix and a test (I think) in #13.

@pstjohn
Collaborator

pstjohn commented Aug 24, 2022

Releasing a new patch version 0.4.1 to handle this -- let me know if you still have issues

3 participants