How to correctly load PaRoutes models? #1

AustinT · 2022-03-15T16:46:32Z

Your benchmark is very interesting and I would like to do some experiments with it, but I haven't found instructions on how to use your pre-trained models.
Would you mind telling me whether the following code correctly loads and uses your models?

```python
import numpy as np
import pandas as pd
import h5py
from tensorflow import keras
from rdkit.Chem import AllChem
from rdchiral.main import rdchiralRunText

def get_fingerprint(smiles: str) -> np.ndarray:
    mol = AllChem.MolFromSmiles(smiles)
    assert mol is not None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)  # QUESTION: is this the right fingerprint?
    return np.array(fp, dtype=float)

# Load templates
df_templates = pd.read_hdf("./data/uspto_rxn_n5_unique_templates.hdf5", key="table")

# Load model, defining custom metrics because without these it gave an error...
# (the metrics are only needed to deserialize the saved model; they do not affect predictions)
model = keras.models.load_model(
    "./data/uspto_rxn_n5_keras_model.hdf5",
    custom_objects={
        "top10_acc": keras.metrics.TopKCategoricalAccuracy(k=10, name="top10_acc"),
        "top50_acc": keras.metrics.TopKCategoricalAccuracy(k=50, name="top50_acc"),
    }
)

# Example use case: run the best reaction for the first 2 targets
test_smiles = ["O=C(O)COCCOCCOCCOCCOCCOCCOCC(F)(F)F", "COc1cc(N)c(Cl)cc1C(=O)NCCCC1CN(Cc2ccccc2)CCO1"]
x = np.stack([get_fingerprint(s) for s in test_smiles])
template_probs = model(x).numpy()
most_likely_reactions = template_probs.argmax(axis=1)

for i, sm in enumerate(test_smiles):
    reactants = rdchiralRunText(df_templates["retro_template"].values[most_likely_reactions[i]], sm)
    print(f"{i}: {reactants} >> {sm}")
```

This code runs and produces the following output (in particular, the top-ranked template does not apply to the second target). Is this the output you would expect?

```
0: ['CCOC(=O)CBr.OCCOCCOCCOCCOCCOCCOCC(F)(F)F'] >> O=C(O)COCCOCCOCCOCCOCCOCCOCC(F)(F)F
1: [] >> COc1cc(N)c(Cl)cc1C(=O)NCCCC1CN(Cc2ccccc2)CCO1
```

Thank you in advance for answering my question. Great manuscript and keep up the good open source work! 💯


SGenheden · 2022-03-16T07:48:17Z

Hello, and thanks for trying out PaRoutes and for your feedback.

We chose not to provide extensive documentation at this time because we cannot foresee all the use cases that might come up. This is an excellent question, and I will add some notes on it to the documentation.

I believe your code reproduces the procedure I use, which is to run the models together with the aizynthfinder package. I will explain that procedure below, but first I would like to emphasize that, yes, the predicted reactants are what you would obtain from the first predicted template. However, we typically look at the top-20 or even top-50 templates precisely to avoid situations like your second example, where the first predicted template is not applicable. The second template would produce these reactants: COc1cc(N)c(Cl)cc1C(=O)Cl.NCCCC1CN(Cc2ccccc2)CCO1.
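A minimal sketch of that top-k strategy in plain Python: rank the template indices by predicted probability and keep every template among the top k that actually yields reactants. The names `top_k_indices` and `expand_with_top_k` are hypothetical helpers, not part of PaRoutes or aizynthfinder; `apply_template` is assumed to behave like rdchiral's `rdchiralRunText(template, smiles)`, returning an empty list when the template does not match.

```python
from typing import Callable, List, Sequence, Tuple


def top_k_indices(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest probabilities, best first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def expand_with_top_k(
    target: str,
    probs: Sequence[float],
    templates: Sequence[str],
    apply_template: Callable[[str, str], List[str]],
    k: int = 50,
) -> List[Tuple[int, List[str]]]:
    """Try the k most probable templates on `target`; keep the ones that apply.

    Hypothetical helper. `apply_template(template, target)` is assumed to
    return a (possibly empty) list of reactant SMILES, like rdchiralRunText.
    Returns (template_index, reactants) pairs in order of decreasing probability.
    """
    results = []
    for idx in top_k_indices(probs, k):
        reactants = apply_template(templates[idx], target)
        if reactants:  # an empty list means the template did not match the target
            results.append((idx, reactants))
    return results
```

With the variables from the question, this would be called as `expand_with_top_k(sm, template_probs[i], df_templates["retro_template"].values, rdchiralRunText, k=50)`. For large template libraries, `np.argsort(-probs)[:k]` is a faster way to obtain the same ranking (up to ties).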

As mentioned, my solution is to use the aizynthfinder package and its Python interface to the one-step model, documented here: https://molecularai.github.io/aizynthfinder/python_interface.html

This is my code (after installing the package):

```python
from aizynthfinder.aizynthfinder import AiZynthExpander
expander = AiZynthExpander(
    configdict={
        "policy": {
            "files": {
                "paroutes": [
                    "./data/uspto_rxn_n5_keras_model.hdf5",
                    "./data/uspto_rxn_n5_unique_templates.hdf5"
                ]
            }
        }
    }
)
expander.expansion_policy.select("paroutes")
test_smiles = ["O=C(O)COCCOCCOCCOCCOCCOCCOCC(F)(F)F", "COc1cc(N)c(Cl)cc1C(=O)NCCCC1CN(Cc2ccccc2)CCO1"]
for smi in test_smiles:
    predictions = expander.do_expansion(smi)
    print(predictions[0][0].reaction_smiles())
```

The output of the `do_expansion` method is a list of tuples of reactions. Each tuple represents a unique set of precursors, and the reactions in it may come from different templates that all produce those precursors. So the example prints the top-ranked set of precursors, as produced by the first template in that tuple.
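To make the shape of that output concrete, here is a hypothetical, pure-Python illustration of the grouping (it is not aizynthfinder's implementation): best-first (template, reactants) pairs are collected into tuples keyed by their precursors. For simplicity it compares the dot-separated SMILES strings literally, whereas a real implementation would need to compare canonical molecules.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (template_id, reactants_smiles); hypothetical representation


def group_by_precursors(ranked: Sequence[Pair]) -> List[Tuple[Pair, ...]]:
    """Group best-first (template_id, reactants) pairs by their precursors.

    Each output tuple holds every pair that produced the same precursors.
    Groups are ordered by their first (best-ranked) member, since dicts
    preserve insertion order.
    """
    groups: Dict[Tuple[str, ...], List[Pair]] = {}
    for template_id, reactants in ranked:
        # Sorting makes "A.B" and "B.A" map to the same key; a tuple keeps multiplicity.
        key = tuple(sorted(reactants.split(".")))
        groups.setdefault(key, []).append((template_id, reactants))
    return [tuple(members) for members in groups.values()]
```

In this illustration, `predictions[0][0]` in the code above corresponds to `groups[0][0]`: the first reaction of the top-ranked set of precursors.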

Hopefully this helps you use the trained model from PaRoutes.

SGenheden self-assigned this Mar 16, 2022

SGenheden added the documentation label Mar 16, 2022

AustinT closed this as completed Mar 16, 2022
