# Retrosynthetic Planning Using Quantum Reinforcement Learning

In this notebook, we demonstrate how to approach retrosynthetic planning with quantum reinforcement learning. The problem can be modeled as a one-player game in which the chemist works backwards from a target molecule to simpler starting materials. At each step, the choice should be drawn from reliable, known reactions between materials [1].

![retro-plan](./images/retro-plan.png)

This problem is challenging because there may be tens of thousands of possible combinations. In addition, the value of
each choice remains uncertain until the whole synthesis plan is completed. Some researchers have approached the problem with deep reinforcement learning, modeling it as follows:

![model-retro](./images/model-retro.png)

Chemical molecules are drawn as circles, with different colors representing different types of molecules.
The orange circle represents the target molecule. The task is to find reactions that let the chemist start
from commercially available substrates, drawn as red circles, and reach the target through a sequence of reactions and intermediate molecules. Reactions are drawn as squares in this graph. For example, the intermediate $ m_4 $ can be synthesized from substrates $ m_9 $ and $ m_{10} $ through reaction $ r_3 $.
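
The graph described above can be held in a small data structure. The following is a minimal sketch, not the representation used in [2] or in this notebook's utility code: the `Reaction` class and the `reactions_for` helper are hypothetical names, and only the one reaction spelled out in the text ($ r_3 $) is filled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One square in the graph: reactants combine to give a product."""
    name: str
    product: str
    reactants: tuple


# Only the reaction named in the text is encoded; the rest of the graph is omitted.
REACTIONS = [
    Reaction(name="r3", product="m4", reactants=("m9", "m10")),
]


def reactions_for(molecule, reactions=REACTIONS):
    """Return every known reaction that produces `molecule`."""
    return [r for r in reactions if r.product == molecule]


# Working backwards from m4 yields the molecules needed one step earlier.
options = reactions_for("m4")
print(options[0].name, options[0].reactants)  # prints: r3 ('m9', 'm10')
```

A molecule for which `reactions_for` returns an empty list is either a purchasable substrate or a dead end, which is the distinction the cost rules rely on.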

The scenario follows several rules:

* The synthesis cost should be minimized. According to the cost equation below, the total cost $ c_{tot} $
equals the cost of the chosen reactions, $ c_{rxn}(r) $, plus the cost of the chosen intermediate
and substrate molecules, $ c_{sub}(m) $. The number inside each circle or square is its cost. In this example, the
target molecule $ m_0 $ is synthesized at a cost of 5, which comes from reaction $ r_0 $ and intermediate molecules $ m_1 $ and $ m_3 $. The cost of $ m_2 $ is the sum of the costs of
 reaction $ r_1 $ and intermediate molecule $ m_4 $.

<center>

$ c_{tot} = \sum \limits _{r} c_{rxn}(r) + \sum \limits _{m} c_{sub}(m) $

</center>

* All commercially available substrates are assigned zero cost.
* If a molecule with no possible reactions is reached, a cost penalty of 100 is assigned. These
dead-end molecules are drawn as red circles.
* The number of reaction steps is the depth $ d $. If a molecule at the maximum depth is reached, a cost penalty of 10 is assigned. In this example, the molecule at $ d_{max} = 10 $ is drawn as a purple circle.
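
The rules above can be sketched as a short recursive function. This is an illustration under stated assumptions, not the implementation from [2]: the name `plan_cost`, the toy plans, and the reaction costs in them are hypothetical. Only the two penalties (100 and 10), $ d_{max} = 10 $, and the zero cost of substrates come from the text. The sketch also assumes that depth is counted from the target and that the maximum-depth check is applied before the dead-end check.

```python
DEAD_END_PENALTY = 100   # molecule with no possible reaction
MAX_DEPTH_PENALTY = 10   # molecule reached at the maximum depth
D_MAX = 10


def plan_cost(molecule, plan, substrates, depth=0, d_max=D_MAX):
    """Total cost c_tot of making `molecule` under a fixed plan.

    `plan` maps a molecule to (reaction_cost, reactants), i.e. the single
    reaction chosen for it. Costs here are made-up illustration values.
    """
    if molecule in substrates:
        return 0                      # commercially available: zero cost
    if depth >= d_max:
        return MAX_DEPTH_PENALTY      # assumption: checked before dead ends
    if molecule not in plan:
        return DEAD_END_PENALTY       # nothing known makes this molecule
    reaction_cost, reactants = plan[molecule]
    return reaction_cost + sum(
        plan_cost(m, plan, substrates, depth + 1, d_max) for m in reactants
    )


# Hypothetical plan: m4 is made from m9 and m10 by a reaction of cost 1.
plan = {"m4": (1, ("m9", "m10"))}
print(plan_cost("m4", plan, substrates={"m9", "m10"}))  # prints: 1
print(plan_cost("m4", plan, substrates={"m9"}))         # prints: 101
```

In the second call `m10` is neither purchasable nor producible, so the dead-end penalty dominates the total. That is how the cost steers the search away from such branches.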

The modeling approach above comes from [2]; its parameters can be adjusted to suit the actual application.
The core of [2] is a multilayer neural network, illustrated schematically in the following image. In this
notebook we show how to use a quantum neural network with fewer parameters to achieve similar results on an open-source dataset.

![cc-nn](./images/cc-nn.png)


[1] Wikipedia, [Retrosynthetic analysis](https://en.wikipedia.org/wiki/Retrosynthetic_analysis).

[2] [Schreck, John S., Connor W. Coley, and Kyle J. M. Bishop. "Learning retrosynthetic planning through simulated experience." ACS Central Science 5.6 (2019): 970-981.](https://pubs.acs.org/doi/10.1021/acscentsci.9b00055).

### Data preparation

USPTO-50K (USPTO stands for United States Patent and Trademark Office) consists of 50K extracted atom-mapped reactions covering 10 reaction types.
The dataset [USPTO-50K](https://tdcommons.ai/generation_tasks/retrosyn/#uspto-50k) is used in this experiment. The whole dataset is pooled and used for the planning task.

In [1]:
# Load the uspto-50k data
from tdc.generation import RetroSyn

data = RetroSyn(name = 'USPTO-50K')
split = data.get_split()

Found local copy...
Loading...
Done!


In [2]:
# Let's explore what this data looks like
split['train'].head(2)

Unnamed: 0,input,output
0,COC(=O)CCC(=O)c1ccc(OC2CCCCO2)cc1O,C1=COCCC1.COC(=O)CCC(=O)c1ccc(O)cc1O
1,COC(=O)c1cccc(-c2nc3cccnc3[nH]2)c1,COC(=O)c1cccc(C(=O)O)c1.Nc1cccnc1N


In [3]:
# The data contains only input and output columns because it is intended for a prediction task. We pool all of
# the data to derive the target, intermediate, substrate and dead-end molecules.
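
One way to pool the reaction records is sketched here. This is a hedged illustration, not the logic of the notebook's `Prepare` class: it assumes, as the two rows printed above suggest, that `input` holds a product SMILES and `output` holds the reactant SMILES joined by `.` (the SMILES separator for disconnected molecules). The helper name `build_reaction_map` is hypothetical.

```python
from collections import defaultdict


def build_reaction_map(rows):
    """Map each product SMILES to the reactant tuples known to make it."""
    reaction_map = defaultdict(list)
    for product, reactants in rows:
        reaction_map[product].append(tuple(reactants.split(".")))
    return dict(reaction_map)


# The two training rows shown in the output above, as (input, output) pairs.
rows = [
    ("COC(=O)CCC(=O)c1ccc(OC2CCCCO2)cc1O", "C1=COCCC1.COC(=O)CCC(=O)c1ccc(O)cc1O"),
    ("COC(=O)c1cccc(-c2nc3cccnc3[nH]2)c1", "COC(=O)c1cccc(C(=O)O)c1.Nc1cccnc1N"),
]
reaction_map = build_reaction_map(rows)

# Molecules that only ever appear as reactants have no recorded reaction
# producing them, so they must be treated as substrates or dead ends.
products = set(reaction_map)
all_reactants = {m for options in reaction_map.values() for rs in options for m in rs}
reactant_only = all_reactants - products
print(len(products), len(reactant_only))  # prints: 2 4
```

With the full dataset, a molecule that appears both as a product and as a reactant would play the role of an intermediate.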

In [3]:
from braket.experimental.algorithms.qc_qrl.utility.DataPrepare import Prepare

In [4]:
# data_path = 'data'
# # download dataset
# !mkdir $data_path
# !mkdir $data_path\smiles
# !wget https://d1o8djwwk7diqy.cloudfront.net/retrosynthetic-plannin-dataset.zip
# !unzip -o retrosynthetic-plannin-dataset.zip
# # # windows
# # !copy retrosynthetic-planning-dataset $data_path
# # !copy data\smiles_map.npy  data\smiles\smiles_map.npy

# # linux
# !cp -r retrosynthetic-planning-dataset/* $data_path
# !cp data/smiles_map.npy  data/smiles
# !rm retrosynthetic-plannin-dataset.zip 

### Prepare parameters for classical and quantum experiments

In [5]:
# !pip install -r requirement.txt

In [4]:
from braket.experimental.algorithms.qc_qrl.utility.RetroRLAgent import RetroRLAgent

In [5]:
agent_param = {}
# initialize the RetroRLModel object
init_param = {}
method = ['retro-rl', 'retro-qrl']

for mt in method:
    if mt == 'retro-rl':
        init_param[mt] = {}
        init_param[mt]['param'] = ['inputsize', 'middlesize', 'outputsize']
    elif mt == 'retro-qrl':
        init_param[mt] = {}
        init_param[mt]['param'] = ['n_qubits', 'device', 'framework', 'shots', 'layers']
    
# retro_rl_model = RetroRLModel(data=None, method=method, **init_param)
agent_param['init_param'] = init_param

In [6]:
# train_mode can be: "local-instance", "local-job", "hybrid-job"
train_mode = "hybrid-job"

data_path = 'data'
# please change the following S3 bucket to one that you can upload data to and download data from
s3_data_path = None
if train_mode == "local-job" or train_mode == "hybrid-job":
    s3_bucket_name = "s3://amazon-braket-us-west-1-002224604296"
    # s3_bucket_name = "s3://xxx"
    s3_data_path = f"{s3_bucket_name}/data"
    import os
    os.system(f"aws s3 sync {data_path} {s3_data_path}")

agent_param["data_path"] = data_path
agent_param["s3_data_path"]=s3_data_path
agent_param["train_mode"] = train_mode
agent_param["episodes"] = 2

# retro_model = None

In [7]:
import json
 
agent_param_format = json.dumps(agent_param, indent=4)
print("The agent parameters : \n", agent_param_format)

The agent parameters : 
 {
    "init_param": {
        "retro-rl": {
            "param": [
                "inputsize",
                "middlesize",
                "outputsize"
            ]
        },
        "retro-qrl": {
            "param": [
                "n_qubits",
                "device",
                "framework",
                "shots",
                "layers"
            ]
        }
    },
    "data_path": "data",
    "s3_data_path": "s3://amazon-braket-us-west-1-002224604296/data",
    "train_mode": "hybrid-job",
    "episodes": 2
}


### Compare the quantum circuit and the classical network

First, we use the local instance to see how the classical and quantum models differ.

In [8]:
model_param={}
method = 'retro-qrl'
model_param[method] = {}
model_param[method]['n_qubits'] = [8]
# model_param[method]['device'] = ['local', 'sv1', 'aspen-m-3', 'aria-2']
model_param[method]['device'] = ['local']
model_param[method]['framework'] = ['pennylane']
model_param[method]['shots'] = [100]
model_param[method]['layers'] = [1]

agent_param['model_param'] = model_param

n_qubits = model_param[method]['n_qubits'][0]
device = model_param[method]['device'][0]
framework = model_param[method]['framework'][0]
shots = model_param[method]['shots'][0]
layers = model_param[method]['layers'][0]

model_name = "{}_{}_{}_{}_{}".format(n_qubits, device, framework, shots, layers)
agent_param["model_name"] = model_name

agent_param["train_mode"]="local-instance"

if agent_param["train_mode"] == "local-instance":
    retro_qrl_agent = RetroRLAgent(build_model=True, method=method, **agent_param)
else:
    retro_qrl_agent = RetroRLAgent(build_model=False, method=method, **agent_param)

INFO:root:initial reinforcement learning for retrosynthetic-planning
INFO:root:initial quantum reinforcement learning for retrosynthetic-planning
INFO:root:load data...
INFO:root:build_model is True
INFO:root:Construct model for n_qubits:8,device:local,framework:pennylane,layers:1 0.0007688482602437337 min


initial a new agent...
model_param is {'retro-qrl': {'n_qubits': [8], 'device': ['local'], 'framework': ['pennylane'], 'shots': [100], 'layers': [1]}}


In [9]:
# let's see how many parameters are in this circuit model
quantum_param_sum = 0
for param in retro_qrl_agent.NN.parameters():
    quantum_param_sum = quantum_param_sum + param.numel()
print(f"the whole parameters of quantum circuit is {quantum_param_sum}")

the whole parameters of quantum circuit is 8


In [10]:
model_param={}
method = 'retro-rl'
model_param[method] = {}
model_param[method]['inputsize'] = [256]
model_param[method]['middlesize'] = [256]
model_param[method]['outputsize'] = [1]

agent_param['model_param'] = model_param
model_name = f"{model_param[method]['inputsize'][0]}_{model_param[method]['middlesize'][0]}_{model_param[method]['outputsize'][0]}"
agent_param["model_name"] = model_name

agent_param["train_mode"]="local-instance"

if agent_param["train_mode"] == "local-instance":
    retro_crl_agent = RetroRLAgent(build_model=True, method=method, **agent_param)
else:
    retro_crl_agent = RetroRLAgent(build_model=False, method=method, **agent_param)

INFO:root:initial reinforcement learning for retrosynthetic-planning
INFO:root:initial quantum reinforcement learning for retrosynthetic-planning
INFO:root:load data...
INFO:root:build_model is True
INFO:root:Construct model for inputsize:256,middlesize:256,outputsize:1 1.1316935221354167e-05 min


initial a new agent...
model_param is {'retro-rl': {'inputsize': [256], 'middlesize': [256], 'outputsize': [1]}}


In [11]:
# let's see how many parameters are in this classical model
classical_param_sum = 0
for param in retro_crl_agent.NN.parameters():
    classical_param_sum = classical_param_sum + param.numel()
print(f"the whole parameters of classical circuit is {classical_param_sum}")

the whole parameters of classical circuit is 66049
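
The two totals printed above can be checked by hand. The sketch below assumes that the classical model is two fully connected layers with biases (`inputsize -> middlesize -> outputsize`) and that the quantum circuit carries one trainable parameter per qubit per layer. Both assumptions are inferred from the printed totals and are not confirmed against the `RetroRLAgent` source.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumed classical architecture: 256 -> 256 -> 1.
classical = dense_params(256, 256) + dense_params(256, 1)

# Assumed quantum layout: one parameter per qubit per layer (8 qubits, 1 layer).
quantum = 8 * 1

print(classical, quantum)  # prints: 66049 8
print(f"classical / quantum = {classical / quantum:.0f}")  # prints: classical / quantum = 8256
```

Under these assumptions the quantum model uses roughly 8,000 times fewer trainable parameters than the classical one.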


### Run reinforcement learning

We can now run reinforcement learning with both the quantum and the classical agents.

In [12]:
retro_qrl_agent.game_job()

episode 1
epsiode 1 training...
finish epoch 0 for 0.028112188975016276 minutes
finish epoch 1 for 0.028143652280171714 minutes


In [13]:
retro_crl_agent.game_job()

episode 1
epsiode 1 training...
finish epoch 0 for 3.8794676462809245e-05 minutes
finish epoch 1 for 1.4277299245198568e-05 minutes
