some molecules fail to generate BDE values in the pre-trained model, but works in the web API #12

LuckyLittleMonster · 2022-08-16T14:55:26Z

Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []

pstjohn · 2022-08-16T18:59:18Z

Do you have an example of a molecule this happens for?

den-run-ai · 2022-08-18T15:35:57Z

@pstjohn is it possible to add more logging to Alfabet to find out why an empty result is returned?

pstjohn · 2022-08-18T16:12:49Z

Possibly. Are you not able to share an example molecule that gives an empty output?

The best way to see the bonds the model is seeing is to just call the fragmentation function directly:

from alfabet.fragment import get_fragments

get_fragments(smiles_string)

LuckyLittleMonster · 2022-08-23T18:09:32Z

I found 3 example molecules.

CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2
CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)C@@HCC[C@]2(C)O
CN1C[C@H]2CN(C)CC@H[C@]2(O)c1ccccc1

mol = "CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2"
model.predict([mol])
1/1 [==============================] - 6s 6s/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []
mol2 = "CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)C@@HCC[C@]2(C)O"
model.predict([mol2])
1/1 [==============================] - 0s 123ms/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []
mol3 = "CN1C[C@H]2CN(C)CC@H[C@]2(O)c1ccccc1"
model.predict([mol3])
1/1 [==============================] - 0s 126ms/step
Empty DataFrame
Columns: [bond_type, fragment1, fragment2, is_valid_stereo, bde_pred, bdfe_pred, is_valid, molecule, bond_index, bde, bdfe, set]
Index: []

pstjohn · 2022-08-23T19:09:07Z

Where did you get those SMILES strings? I'm getting rdkit parse errors:

>>> rdkit.Chem.MolFromSmiles('CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2')
[13:07:52] SMILES Parse Error: syntax error while parsing: CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2
[13:07:52] SMILES Parse Error: Failed parsing SMILES 'CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2' for input: 'CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CCC@@(/C=C/1)O2'

https://bde.ml.nrel.gov/result?name=%27CC%28%3DO%29OCC1%3DC%5CCC%2FC%28C%29%3DC%2FCC%5BC%40%40%5D2%28C%29CCC%40%40%28%2FC%3DC%2F1%29O2

I'm not sure why alfabet isn't raising those errors directly; that's probably something I can fix.
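Until such errors are raised directly, one option is to screen inputs before prediction. The following is a minimal sketch, not alfabet's own behavior: `split_parseable` and `_rdkit_parse` are hypothetical helper names, and the only library fact relied on is that RDKit's `Chem.MolFromSmiles` returns `None` when it cannot parse a string. The parser is injectable so the helper can be exercised without RDKit installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def _rdkit_parse(smiles: str):
    """Parse with RDKit; MolFromSmiles returns None when parsing fails."""
    # Imported lazily so the helper below has no hard dependency on RDKit.
    from rdkit import Chem

    return Chem.MolFromSmiles(smiles)


def split_parseable(
    smiles_list: Iterable[str],
    parse: Optional[Callable[[str], object]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (parseable, rejected) SMILES strings, preserving input order.

    `parse` is any callable that returns None for an unparseable string;
    it defaults to the RDKit-based parser above.
    """
    parse = parse or _rdkit_parse
    good: List[str] = []
    bad: List[str] = []
    for smiles in smiles_list:
        if parse(smiles) is not None:
            good.append(smiles)
        else:
            bad.append(smiles)
    return good, bad
```

With RDKit available, calling `good, bad = split_parseable(my_smiles)` and passing only `good` to `model.predict` would make rejected strings visible instead of silently producing an empty DataFrame.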

LuckyLittleMonster · 2022-08-24T01:31:15Z

The SMILES strings are in the ChEMBL database.
https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_31_chemreps.txt.gz

It seems some errors were introduced when I copied the SMILES strings into the GitHub web page, because the strings contain "@".
SMILES strings that give an empty result:
CC(=O)OCC1=C\CC/C(C)=C/CC[C@@]2(C)CC[C@@](C(C)C)(/C=C/1)O2
CC1=C\C/C=C(\C)CC[C@H]2C(C)(C)[C@@H](\C=C/1)CC[C@]2(C)O
CN1C[C@H]2CN(C)C[C@H](C1)[C@]2(O)c1ccccc1

The web page (e.g. https://bde.ml.nrel.gov/result?name=CN1C%5BC%40H%5D2CN%28C%29C%5BC%40H%5D%28C1%29%5BC%40%40%5D2%28O%29c1ccccc1) gives a result, but the Python package returns an empty DataFrame.
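For anyone comparing the two paths, here is a small sketch of how a result link like the one above can be built from a SMILES string. It uses only the standard library; the base URL is copied from the links in this thread, and `result_url` is a hypothetical helper name. Passing `safe=""` makes `quote` percent-encode `/` as well, which matters for SMILES with double-bond stereo markers.

```python
from urllib.parse import quote

# Base of the result links that appear in this thread.
BASE_URL = "https://bde.ml.nrel.gov/result?name="


def result_url(smiles: str) -> str:
    """Percent-encode every reserved character in the SMILES string."""
    return BASE_URL + quote(smiles, safe="")
```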

pstjohn · 2022-08-24T17:20:49Z

Thanks for this, I see the issue now. I likely introduced this in 0.4.0; I have a fix and test (I think) in #13

pstjohn · 2022-08-24T17:27:55Z

Releasing a new patch version 0.4.1 to handle this -- let me know if you still have issues
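A quick way to confirm that an environment has the patched release is sketched below. It assumes the installed distribution is named `alfabet` and that versions are plain dotted numbers; `parse_release`, `has_fix`, and `installed_alfabet_version` are hypothetical helper names, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

FIXED_IN = (0, 4, 1)  # patch release named in this thread


def parse_release(text: str) -> Tuple[int, ...]:
    """Turn '0.4.1' into (0, 4, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(installed: str) -> bool:
    """True when the given version string is at or above 0.4.1."""
    return parse_release(installed) >= FIXED_IN


def installed_alfabet_version() -> Optional[str]:
    """Return the installed version, or None if the package is absent."""
    try:
        return version("alfabet")  # distribution name is an assumption
    except PackageNotFoundError:
        return None
```

If `installed_alfabet_version()` returns a string for which `has_fix` is false, upgrading the package is the first thing to try before reporting further empty results.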

pstjohn mentioned this issue Aug 16, 2022

verbose is not working #11

Closed

pstjohn added a commit that referenced this issue Aug 24, 2022

handle non-canonical smiles inputs

a951545

Fixes #12, likely introduced in 0.4.0

pstjohn mentioned this issue Aug 24, 2022

handle non-canonical smiles inputs #13

Merged

pstjohn closed this as completed in #13 Aug 24, 2022
