### Before Starting the Demo: Adjust the Path
To run the demo, replace every `/path/to` placeholder in the cells and scripts with the corresponding path on your system, following the directory structure described in `README.md`.
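If many files need the same change, the replacement can be scripted. The following is a minimal sketch, not part of the repository: the helper name `replace_placeholder` and the list of file suffixes are assumptions chosen for illustration, and the function rewrites files in place, so try it on a copy first.

```python
from pathlib import Path

PLACEHOLDER = "/path/to"  # placeholder string used throughout this demo


def replace_placeholder(root, new_prefix, suffixes=(".sh", ".yaml", ".py")):
    """Replace PLACEHOLDER with new_prefix in matching text files under root.

    Hypothetical helper: the suffix list is an assumption, adjust it as needed.
    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files that happen to match a suffix
        if PLACEHOLDER in text:
            path.write_text(text.replace(PLACEHOLDER, new_prefix), encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `replace_placeholder("READRetro/scripts", "/home/user/projects")` would turn `/path/to/READRetro` into `/home/user/projects/READRetro` in every matching file; both arguments here are illustrative values.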

### Single-step Retrosynthesis Evaluation (Table 1)
READRetro uses an ensemble of Retroformer and Graph2SMILES as its single-step retrosynthesis model.<br>
To prepare the trained single-step retrosynthesis model, choose one of the following:

* We provide the trained models through [Zenodo](https://zenodo.org/records/10495132).
* You can use your own models trained with the official code (https://github.com/yuewan2/Retroformer and https://github.com/coleygroup/Graph2SMILES). To train on other datasets, download the official code and set up a conda environment for each model.

In this section of the demo, we provide the details needed to train the models from scratch, including the baselines.
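The evaluation cells in this section compare ranked predictions against ground-truth reactants. As a rough illustration of that kind of metric, the sketch below computes top-k exact-match accuracy. It is not the repository's `evaluate` function: the names `normalize` and `topk_accuracy` are hypothetical, it assumes the prediction file holds exactly `n_best` consecutive lines per test molecule, and it only strips spaces and sorts dot-separated fragments instead of canonicalizing SMILES with RDKit.

```python
def normalize(smiles):
    """Remove token spaces and sort dot-separated fragments (no RDKit canonicalization)."""
    return ".".join(sorted(smiles.replace(" ", "").split(".")))


def topk_accuracy(targets, predictions, n_best=10, ks=(1, 3, 5, 10)):
    """Fraction of targets found among the first k predictions, for each k in ks.

    Assumes predictions is a flat list with n_best consecutive entries per target.
    """
    if not targets:
        raise ValueError("targets must not be empty")
    if len(predictions) != len(targets) * n_best:
        raise ValueError("expected n_best predictions per target")
    hits = {k: 0 for k in ks}
    for i, target in enumerate(targets):
        # Slice out the ranked candidates that belong to the i-th target.
        candidates = [normalize(p) for p in predictions[i * n_best:(i + 1) * n_best]]
        wanted = normalize(target)
        for k in ks:
            if wanted in candidates[:k]:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in ks}
```

Because the comparison is purely string-based, two different SMILES strings for the same molecule count as a miss here; a chemistry-aware canonicalization step would be needed for a faithful evaluation.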

In [None]:
# 1 Preprocessing
# Preprocess the other dataset for training
# Use the notebook READRetro/scripts/preprocessing/preprocessing.ipynb

In [None]:
# 2 Training

# 2-1 BioNavi-NP (in BioChem + USPTO_NPL clean)
# Key scripts: config.yaml
# To train on your own data, change /path/to in config.yaml and wd below to your data path
wd = '/path/to/READRetro/scripts/singlestep_eval/bionavi/clean'
! cd $wd && \
    mkdir -p run && \
    sed -i 's|/path/to|/your/path|g' config.yaml && \
    onmt_build_vocab -config config.yaml && \
    onmt_train -config config.yaml

In [None]:
# 2-2 Graph2SMILES (in BioChem + USPTO_NPL clean)
# Key scripts: clean_preprocess.sh, clean_train_g2s.sh
g2s = '/path/to/Graph2SMILES'
wd = '/path/to/READRetro/scripts/singlestep_eval/g2s/clean'
! cd $g2s && \
    bash $wd/clean_preprocess.sh && \
    bash $wd/clean_train_g2s.sh

In [None]:
# 2-3 Retroformer (in BioChem + USPTO_NPL clean)
# Key scripts: train.sh
retroformer = '/path/to/Retroformer'
wd = '/path/to/READRetro/scripts/singlestep_eval/retroformer/clean'
! cd $retroformer && \
    bash $wd/train.sh

In [None]:
# 3 Running & Evaluation

# 3-1 BioNavi-NP
# Key scripts: singlestep_eval.py (fxn: evaluate)
from rdkit import RDLogger
lg = RDLogger.logger()
lg.setLevel(4)
from scripts.singlestep_eval.singlestep_eval import read_txt, remove_chiral, evaluate

# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/singlestep_eval/bionavi/clean'
! onmt_translate -model $wd/model_step_30000.pt $wd/model_step_50000.pt $wd/model_step_80000.pt $wd/model_step_100000.pt \
    -output $wd/results.txt \
    -src $wd/src-test.txt \
    --batch_size 64 --max_length 200 --replace_unk -beam_size 10 -n_best 10 -gpu 0
predP = f'{wd}/results.txt'
tgt_path = f'{wd}/tgt-test.txt'
tgt = read_txt(tgt_path)
tgt = [remove_chiral(each.replace(' ', ''),atomMap=False) for each in tgt]
print(evaluate(tgt,predP, AM=False))

# BioChem + USPTO_NPL
wd = '/path/to/READRetro/scripts/singlestep_eval/bionavi/biochem'
! onmt_translate -model $wd/model_step_30000.pt $wd/model_step_50000.pt $wd/model_step_80000.pt $wd/model_step_100000.pt \
    -output $wd/results.txt \
    -src $wd/src-test.txt \
    --batch_size 64 --max_length 200 --replace_unk -beam_size 10 -n_best 10 -gpu 0

predP = f'{wd}/results.txt'
tgt_path = f'{wd}/tgt-test.txt'
tgt = read_txt(tgt_path)
tgt = [remove_chiral(each.replace(' ', ''),atomMap=False) for each in tgt]
print(evaluate(tgt,predP, AM=False))

In [None]:
# 3-2 Graph2SMILES
# Key scripts: eval.sh

# BioChem + USPTO_NPL (clean)
base = '/path/to/READRetro'
wd = '/path/to/READRetro/scripts/singlestep_eval/g2s'
! cd $base && \
    export path=$wd/clean/checkpoints/clean_g2s_series_rel_dgcn.1/model.72000_0.pt && \
    export vocab_path=$wd/clean/vocab_smiles.txt && \
    bash $wd/eval.sh

# BioChem + USPTO_NPL
! cd $base && \
    export path=$wd/biochem/model.84000_0.pt && \
    export vocab_path=$wd/biochem/vocab_smiles.txt && \
    bash $wd/eval.sh

In [None]:
# 3-3 Retroformer
# Key scripts: eval.sh

# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/singlestep_eval/retroformer'
! export path=$wd/clean/ckpt_untyped/model_1600000.pt && \
    export vocab_path=$wd/clean/intermediates/vocab_share.pk && \
    bash $wd/eval.sh

# BioChem + USPTO_NPL
! export path=$wd/biochem/ckpt_untyped/model_1600000.pt && \
    export vocab_path=$wd/biochem/intermediates/vocab_share.pk && \
    bash $wd/eval.sh

In [None]:
# 3-4 Retroformer + Graph2SMILES
# Key scripts: eval.sh

# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/singlestep_eval/ensemble'
g2s = '/path/to/READRetro/scripts/singlestep_eval/g2s/clean'
rf = '/path/to/READRetro/scripts/singlestep_eval/retroformer/clean'
! export path=$rf/ckpt_untyped/model_1600000.pt,$g2s/checkpoints/clean_g2s_series_rel_dgcn.1/model.72000_0.pt && \
    export vocab_path=$rf/intermediates/vocab_share.pk,$g2s/vocab_smiles.txt && \
    bash $wd/eval.sh

# BioChem + USPTO_NPL
g2s = '/path/to/READRetro/scripts/singlestep_eval/g2s/biochem'
rf = '/path/to/READRetro/scripts/singlestep_eval/retroformer/biochem'
! export path=$rf/ckpt_untyped/model_1600000.pt,$g2s/model.84000_0.pt && \
    export vocab_path=$rf/intermediates/vocab_share.pk,$g2s/vocab_smiles.txt && \
    bash $wd/eval.sh

### Multi-step Retrosynthesis Evaluation (Table 2)

Place the checkpoints of the single-step retrosynthesis models (either downloaded from Zenodo or trained from scratch) under the folders `retroformer/saved_models` and `g2s/saved_models`.<br>
Before prediction, set the `save_file` argument in `run_mp.py` properly.<br>
Adjust the `num_thread` argument in `run_mp.py` according to the capacity of your GPU.
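Before launching the prediction cells, it can help to confirm that the checkpoints are where the scripts expect them. The sketch below is a minimal check, assuming the relative checkpoint paths used in the prediction cells of this section (`clean.pt` and `biochem.pt` under each `saved_models` folder) and assuming they are resolved against the READRetro root directory; `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Relative checkpoint paths as they appear in the prediction cells of this demo.
EXPECTED_CHECKPOINTS = (
    "retroformer/saved_models/clean.pt",
    "retroformer/saved_models/biochem.pt",
    "g2s/saved_models/clean.pt",
    "g2s/saved_models/biochem.pt",
)


def missing_checkpoints(base_dir, expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint paths that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in expected if not (base / rel).is_file()]
```

An empty list from `missing_checkpoints('/path/to/READRetro')` means all four files were found; otherwise the returned entries show which checkpoints still need to be downloaded or trained.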

In [None]:
# 1 Prediction

# 1-1 Retroformer
# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/multistep_eval'
! export model_type=retroformer && \
    export model_path='retroformer/saved_models/clean.pt' && \
    bash $wd/predict.sh

# BioChem + USPTO_NPL
! export model_type=retroformer && \
    export model_path='retroformer/saved_models/biochem.pt' && \
    bash $wd/predict.sh

In [None]:
# 1-2 Graph2SMILES
# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/multistep_eval'
! export model_type=g2s && \
    export model_path='g2s/saved_models/clean.pt' && \
    bash $wd/predict.sh

# BioChem + USPTO_NPL
! export model_type=g2s && \
    export model_path='g2s/saved_models/biochem.pt' && \
    bash $wd/predict.sh

In [None]:
# 1-3 READRetro w/o reaction retriever
# BioChem + USPTO_NPL
wd = '/path/to/READRetro/scripts/multistep_eval'
! bash $wd/predict_wo_retriever.sh

In [None]:
# 1-4 READRetro
# BioChem + USPTO_NPL (clean)
wd = '/path/to/READRetro/scripts/multistep_eval'
! export model_type=ensemble && \
    export model_path='retroformer/saved_models/clean.pt,g2s/saved_models/clean.pt' && \
    bash $wd/predict.sh

# BioChem + USPTO_NPL
! export model_type=ensemble && \
    export model_path='retroformer/saved_models/biochem.pt,g2s/saved_models/biochem.pt' && \
    bash $wd/predict.sh

In [None]:
# 2 Evaluation
# Before evaluation, adjust the `save_file` argument below.

wd = '/path/to/READRetro/scripts/multistep_eval'
! export save_file='result/debug.txt' && \
    export product_class='all' && \
    bash $wd/eval.sh

### Multi-step Retrosynthesis Evaluation by Chemical Classes (Figure 2)

In [None]:
# Before evaluation, adjust the `save_file` argument below.
# Adjust the `product_class` argument below.

wd = '/path/to/READRetro/scripts/multistep_eval'
! export save_file='result/debug.txt' && \
    export product_class='Amino' && \
    bash $wd/eval.sh

### Case Examples of READRetro (Figure 3, Supplementary Figures 4, 7, and 9)

To conduct case studies using READRetro, follow these steps:

Run the provided case-study script, `READRetro/scripts/casestudy.sh`.<br>
Note: Change line 2 of `casestudy.sh` to your Anaconda3 directory.

Alternatively, run each line separately by executing the commands manually.

Draw the pathways using a chemical sketch tool such as ChemDraw.<br>
You can also use a web-based tool such as the [RCSB Chemical Sketch Tool](https://www.rcsb.org/chemical-sketch).<br>
These tools can draw several molecules at once from dot-separated SMILES (e.g., CCCC.CCOC.CCC), so convert the pathway results into dot-separated SMILES before drawing.
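As a rough illustration of the conversion into dot-separated SMILES, the sketch below collects every molecule in a pathway and joins them with dots. The pathway string format is an assumption made for this example (steps separated by `|`, each step written as `product>score>reactants`); check the actual output of your run and adjust the separators accordingly. `pathway_to_dot_smiles` is a hypothetical helper.

```python
def pathway_to_dot_smiles(pathway, step_sep="|", field_sep=">"):
    """Join all molecules of a pathway into one dot-separated SMILES string.

    Assumed input format (hypothetical): "product>score>reactants|product>score>reactants".
    Molecules are kept in order of first appearance and duplicates are dropped.
    """
    seen = []
    for step in pathway.split(step_sep):
        fields = step.split(field_sep)
        product, reactants = fields[0], fields[-1]
        for smiles in [product, *reactants.split(".")]:
            if smiles and smiles not in seen:
                seen.append(smiles)
    return ".".join(seen)
```

The resulting string can be pasted into the sketch tool as a single SMILES input, which draws each molecule of the pathway side by side.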

In [None]:
! scripts/casestudy.sh

### Single-step Retrosynthesis Evaluation of Various Models (Supplementary Table 1)
Models: BioNavi-NP, R-SMILES, Graph2SMILES, GraphRetro, MEGAN, MHNreact, Retroformer, Retroformer + Graph2SMILES<br>
You must download the official code and set up virtual environments to run [GraphRetro](https://github.com/vsomnath/graphretro), [MEGAN](https://github.com/molecule-one/megan), and [MHNreact](https://github.com/ml-jku/mhn-react/tree/main).

Note: Evaluation of BioNavi-NP, Graph2SMILES, Retroformer, and Retroformer + Graph2SMILES is described above (Single-step Retrosynthesis Evaluation (Table 1)).

In [None]:
from rdkit import RDLogger
lg = RDLogger.logger()
lg.setLevel(4)

In [None]:
# R-SMILES
wd = '/path/to/READRetro/scripts/singlestep_eval/rsmiles'
from scripts.singlestep_eval.singlestep_eval import read_txt, remove_chiral, evaluate

! onmt_translate -model $wd/model_step_80000.pt \
    -output $wd/results.txt \
    -src $wd/src-test.txt \
    --batch_size 64 --max_length 200 --replace_unk -beam_size 10 -n_best 10 -gpu 0

predP = f'{wd}/results.txt'
tgt_path = f'{wd}/tgt-test.txt'
tgt = read_txt(tgt_path)
tgt = [remove_chiral(each.replace(' ', ''),atomMap=False) for each in tgt]
evaluate(tgt,predP, AM=False)

In [None]:
# GraphRetro
wd='/path/to/READRetro/scripts/singlestep_eval/graphretro'
graphretro='/path/to/graphretro'
! source /path/to/anaconda3/etc/profile.d/conda.sh && \
    conda activate graphretro && \
    cd $graphretro && \
    python scripts/eval/single_edit_lg.py \
        --exp_dir . --edits_exp $wd/edit_prediction \
        --edits_step best_model --lg_exp $wd/lg_clasifier --lg_step best_model

In [None]:
# MEGAN
wd='/path/to/READRetro/scripts/singlestep_eval/megan'
megan='/path/to/megan'
! source /path/to/anaconda3/etc/profile.d/conda.sh && \
    conda activate megan && \
    cd $wd && \
    source env.sh && \
    python bin/eval.py $wd/past_biochem --beam-size 10 \
        --dataset-key biochem --dataset-path $wd/past_biochem --ckpt model_best.pt

In [None]:
# MHNreact
wd='/path/to/READRetro/scripts/singlestep_eval/mhnreact'
# Change wd and mhn_react in evaluation.py before running
! source /path/to/anaconda3/etc/profile.d/conda.sh && \
    conda activate mhnreact_env && \
    cd $wd && \
    python evaluation.py

### Single-step Retrosynthesis Evaluation in Five Train-Test Splits (Supplementary Table 2)
Models: BioNavi-NP, Graph2SMILES, Retroformer, and Ensemble<br>
The training and evaluation procedures are the same as above.<br>
The dataset splits (biochem_star_1, 2, 3, and 4) and the checkpoints are in `/path/to/READRetro/scripts/crossval`.
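Once each split has been evaluated, the per-split accuracies can be summarized in one line per model. The sketch below reports the mean and sample standard deviation; this choice of summary statistics is an assumption made for illustration, and `summarize_splits` is a hypothetical helper that expects accuracies you have collected yourself.

```python
import statistics


def summarize_splits(accuracy_by_split):
    """Return (mean, sample standard deviation) of per-split accuracies.

    accuracy_by_split maps a split name to a single accuracy value;
    at least two splits are required for the standard deviation.
    """
    values = list(accuracy_by_split.values())
    if len(values) < 2:
        raise ValueError("need at least two splits")
    return statistics.mean(values), statistics.stdev(values)
```

Calling it once per model and per k (top-1, top-3, and so on) gives one mean and one spread value for each table entry.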

### The Average Number of Pathways (Supplementary Table 4)
Use the script `eval_npath.py` with two arguments: the path to a predicted result file and the path to a ground-truth file.

In [None]:
!cd /path/to/READRetro/scripts/pathnum && \
    python eval_npath.py debug.txt test_gt.txt && \
    python eval_npath.py retroformer_pathnum test_gt.txt && \
    python eval_npath.py g2s_pathnum test_gt.txt

### Evaluation with the LASER dataset (Supplementary Table 5)

Note: You can evaluate Graph2SMILES and Retroformer on the LASER dataset by setting the `model_type` and `model_path` arguments as described in Multi-step Retrosynthesis Evaluation (Table 2).

In [None]:
wd = '/path/to/READRetro/scripts/multistep_eval'
! bash $wd/predict_laser.sh