# Running REINVENT 4 in Sampling Mode


_Update the paths in the following code block to match your environment before running it._

In [1]:
```python
import os
import json

# Output directory, relative to the root directory of this repository
outdir = "out/sampling"

# Use the notebook's path if it was defined elsewhere, otherwise the current working directory
try:
    ipynb_path
except NameError:
    ipynb_path = os.getcwd()

# The repository root is the parent of the notebook's directory
root = os.path.abspath(os.path.join(ipynb_path, ".."))
outpath = os.path.join(root, outdir)

os.makedirs(outpath, exist_ok=True)
```

In [2]:
```python
# List all prior models available in the priors directory
os.listdir(os.path.join(root, 'priors'))
```

```text
['libinvent.prior',
 'linkinvent.prior',
 'mol2mol_high_similarity.prior',
 'mol2mol_medium_similarity.prior',
 'mol2mol_mmp.prior',
 'mol2mol_scaffold.prior',
 'mol2mol_scaffold_generic.prior',
 'mol2mol_similarity.prior',
 'reinvent.prior']
```

_At the time of writing, REINVENT4 provides 9 prior models._

In [3]:
```python
# Base sampling configuration; the config file for each run is written to outpath
config = {
    "run_type": "sampling",
    "parameters": {
        "model_file": os.path.join(root, "priors", "reinvent.prior"),
        "unique_molecules": True,
        "randomize_smiles": True,
    },
}


def dump_config_and_return_path(use_cuda=True, num_smiles=157):
    """Update the global config for one run, write it as JSON and return the file path."""
    prefix = "cuda" if use_cuda else "cpu"

    config["use_cuda"] = use_cuda
    config["parameters"]["num_smiles"] = num_smiles
    config["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_sampling_{num_smiles}.csv")

    dump_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_sampling_config.json")

    with open(dump_path, "w") as f:
        json.dump(config, f, indent=4)

    return dump_path
```

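`dump_config_and_return_path` updates the module-level `config` in place, so every run shares and overwrites one nested `parameters` dict, and a shallow `dict.copy()` of it would still share that nested dict. A side-effect-free variant is sketched below. `build_sampling_config` and `write_config` are hypothetical helpers (not part of REINVENT4) that reuse only the JSON keys and file-naming scheme already used in this notebook.

```python
import copy
import json
import os


def build_sampling_config(base, use_cuda, num_smiles, outpath, tag="sampling"):
    """Return (config, config_path) for one run without modifying `base`.

    Hypothetical helper: it only sets keys that this notebook already uses.
    """
    prefix = "cuda" if use_cuda else "cpu"
    cfg = copy.deepcopy(base)  # the nested "parameters" dict is copied as well
    cfg["use_cuda"] = use_cuda
    cfg["parameters"]["num_smiles"] = num_smiles
    cfg["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_{tag}_{num_smiles}.csv")
    config_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_{tag}_config.json")
    return cfg, config_path


def write_config(cfg, config_path):
    """Write the config as indented JSON and return its path."""
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=4)
    return config_path
```

For example, `cfg, path = build_sampling_config(config, True, 100, outpath)` followed by `write_config(cfg, path)` yields the same file names as `dump_config_and_return_path(True, 100)`, and `tag="sampling_similar"` matches the naming used in the Mol2Mol cells.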
In [4]:
```python
# Run REINVENT on the GPU and generate 100 molecules
config_path = dump_config_and_return_path(True, 100)
!reinvent {config_path} -f json
```

00:56:45 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-19
00:56:45 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cuda_generate_100_sampling_config.json -f json
00:56:45 <INFO> Environment loaded from dotenv file
00:56:45 <INFO> User root on host Ank
00:56:45 <INFO> Python version 3.10.13
00:56:45 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:56:45 <INFO> PyTorch compiled with CUDA version 11.3
00:56:45 <INFO> RDKit version 2022.09.5
00:56:45 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:56:47 <INFO> Number of PyTorch CUDA devices 1
00:56:47 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
00:56:47 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
00:56:47 <INFO> Starting Sampling
Running sampler /mnt/d/projects/github/REINVENT4_NOTEBOOKS/priors/reinvent.prior device cuda
00:56:51 <INFO> Using generator Reinven

In [20]:
```python
# Run REINVENT on the GPU and generate 1,000 molecules
config_path = dump_config_and_return_path(True, 1000)
!reinvent {config_path} -f json
```

00:01:59 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:01:59 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cuda_generate_1000_sampling_config.json -f json
00:01:59 <INFO> Environment loaded from dotenv file
00:01:59 <INFO> User root on host Ank
00:01:59 <INFO> Python version 3.10.13
00:01:59 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:01:59 <INFO> PyTorch compiled with CUDA version 11.3
00:01:59 <INFO> RDKit version 2022.09.5
00:01:59 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:02:00 <INFO> Number of PyTorch CUDA devices 1
00:02:00 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
00:02:00 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
00:02:00 <INFO> Starting Sampling
00:02:03 <INFO> Using generator Reinvent
00:02:03 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBO

In [21]:
```python
# Run REINVENT on the GPU and generate 10,000 molecules
config_path = dump_config_and_return_path(True, 10000)
!reinvent {config_path} -f json
```

00:02:10 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:02:10 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cuda_generate_10000_sampling_config.json -f json
00:02:10 <INFO> Environment loaded from dotenv file
00:02:10 <INFO> User root on host Ank
00:02:10 <INFO> Python version 3.10.13
00:02:10 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:02:10 <INFO> PyTorch compiled with CUDA version 11.3
00:02:10 <INFO> RDKit version 2022.09.5
00:02:10 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:02:10 <INFO> Number of PyTorch CUDA devices 1
00:02:10 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
00:02:10 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
00:02:10 <INFO> Starting Sampling
00:02:13 <INFO> Using generator Reinvent
00:02:13 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEB

In [22]:
```python
# Run REINVENT on the GPU and generate 100,000 molecules
config_path = dump_config_and_return_path(True, 100000)
!reinvent {config_path} -f json
```

00:02:34 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:02:34 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cuda_generate_100000_sampling_config.json -f json
00:02:34 <INFO> Environment loaded from dotenv file
00:02:34 <INFO> User root on host Ank
00:02:34 <INFO> Python version 3.10.13
00:02:34 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:02:34 <INFO> PyTorch compiled with CUDA version 11.3
00:02:34 <INFO> RDKit version 2022.09.5
00:02:34 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:02:34 <INFO> Number of PyTorch CUDA devices 1
00:02:34 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
00:02:35 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
00:02:35 <INFO> Starting Sampling
00:02:38 <INFO> Using generator Reinvent
00:02:38 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTE

In [23]:
```python
# Run REINVENT on the CPU and generate 100 molecules
config_path = dump_config_and_return_path(False, 100)
!reinvent {config_path} -f json
```

00:05:37 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:05:37 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_100_sampling_config.json -f json
00:05:37 <INFO> Environment loaded from dotenv file
00:05:37 <INFO> User root on host Ank
00:05:37 <INFO> Python version 3.10.13
00:05:37 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:05:37 <INFO> PyTorch compiled with CUDA version 11.3
00:05:37 <INFO> RDKit version 2022.09.5
00:05:37 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:05:38 <INFO> Number of PyTorch CUDA devices 1
00:05:38 <INFO> Using CPU x86_64
00:05:38 <INFO> Starting Sampling
00:05:38 <INFO> Using generator Reinvent
00:05:38 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_100.csv
00:05:38 <INFO> Sampling 100 SMILES from model /mnt/d/projec

In [24]:
```python
# Run REINVENT on the CPU and generate 1,000 molecules
config_path = dump_config_and_return_path(False, 1000)
!reinvent {config_path} -f json
```

00:05:44 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:05:44 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_1000_sampling_config.json -f json
00:05:44 <INFO> Environment loaded from dotenv file
00:05:44 <INFO> User root on host Ank
00:05:44 <INFO> Python version 3.10.13
00:05:44 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:05:44 <INFO> PyTorch compiled with CUDA version 11.3
00:05:44 <INFO> RDKit version 2022.09.5
00:05:44 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:05:44 <INFO> Number of PyTorch CUDA devices 1
00:05:44 <INFO> Using CPU x86_64
00:05:44 <INFO> Starting Sampling
00:05:44 <INFO> Using generator Reinvent
00:05:44 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_1000.csv
00:05:44 <INFO> Sampling 1000 SMILES from model /mnt/d/pro

In [25]:
```python
# Run REINVENT on the CPU and generate 10,000 molecules
config_path = dump_config_and_return_path(False, 10000)
!reinvent {config_path} -f json
```

00:06:03 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:06:03 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_10000_sampling_config.json -f json
00:06:03 <INFO> Environment loaded from dotenv file
00:06:03 <INFO> User root on host Ank
00:06:03 <INFO> Python version 3.10.13
00:06:03 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:06:03 <INFO> PyTorch compiled with CUDA version 11.3
00:06:03 <INFO> RDKit version 2022.09.5
00:06:03 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:06:03 <INFO> Number of PyTorch CUDA devices 1
00:06:03 <INFO> Using CPU x86_64
00:06:03 <INFO> Starting Sampling
00:06:03 <INFO> Using generator Reinvent
00:06:03 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_10000.csv
00:06:03 <INFO> Sampling 10000 SMILES from model /mnt/d/

In [26]:
```python
# Run REINVENT on the CPU and generate 100,000 molecules
config_path = dump_config_and_return_path(False, 100000)
!reinvent {config_path} -f json
```

00:08:24 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:08:24 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_100000_sampling_config.json -f json
00:08:24 <INFO> Environment loaded from dotenv file
00:08:24 <INFO> User root on host Ank
00:08:24 <INFO> Python version 3.10.13
00:08:24 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:08:24 <INFO> PyTorch compiled with CUDA version 11.3
00:08:24 <INFO> RDKit version 2022.09.5
00:08:24 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:08:24 <INFO> Number of PyTorch CUDA devices 1
00:08:24 <INFO> Using CPU x86_64
00:08:24 <INFO> Starting Sampling
00:08:24 <INFO> Using generator Reinvent
00:08:24 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_100000.csv
00:08:24 <INFO> Sampling 100000 SMILES from model /mnt

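The notebook does not show how the timings and peak memory usage in the report below were measured. One way to collect both for a CLI run is sketched here. It assumes a Linux host (as in the WSL2 setup used here), where `resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss` reports, in KiB, the peak resident set size of the largest child process that has been waited for. `run_and_measure` is a hypothetical helper; it captures wall-clock time and host RAM only, not GPU memory.

```python
import resource
import subprocess
import time


def run_and_measure(cmd):
    """Run `cmd` (a list of arguments) and return (wall_seconds, peak_rss_mib).

    Linux only: ru_maxrss is reported in KiB there. For RUSAGE_CHILDREN it is the
    peak RSS of the largest child waited for so far, so within one Python process
    the value never decreases; measure the largest run last or use a fresh process.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_kib / 1024


# Example with the helper defined above (requires the `reinvent` CLI on PATH):
# for use_cuda in (True, False):
#     for n in (100, 1_000, 10_000, 100_000):
#         path = dump_config_and_return_path(use_cuda, n)
#         secs, mib = run_and_measure(["reinvent", path, "-f", "json"])
#         print("cuda" if use_cuda else "cpu", n, f"{secs:.1f} s", f"{mib:.0f} MiB")
```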
In [5]:
```python
import copy

# Find 200 molecules similar to the provided molecules using GPU
# Deep copy, so that editing mol_2_mol_config["parameters"] does not also change config["parameters"]
mol_2_mol_config = copy.deepcopy(config)

mol_2_mol_config['use_cuda'] = True
mol_2_mol_config['parameters']['model_file'] = os.path.join(root, 'priors', 'mol2mol_medium_similarity.prior')
mol_2_mol_config['parameters']['smiles_file'] = os.path.join(root, 'configs', 'toml', 'mol2mol.smi')
mol_2_mol_config['parameters']['sample_strategy'] = "beamsearch"
mol_2_mol_config['parameters']['temperature'] = 1.0
mol_2_mol_config['parameters']['tb_logdir'] = "tb_logs"

prefix = "cuda"
num_smiles = 100
mol_2_mol_config["parameters"]["num_smiles"] = num_smiles
mol_2_mol_config["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_sampling_similar_{num_smiles}.csv")

dump_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_sampling_similar_config.json")

with open(dump_path, 'w') as f:
    json.dump(mol_2_mol_config, f, indent=4)

!reinvent {dump_path} -f json
```

00:57:38 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-19
00:57:38 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cuda_generate_100_sampling_similar_config.json -f json
00:57:38 <INFO> Environment loaded from dotenv file
00:57:38 <INFO> User root on host Ank
00:57:38 <INFO> Python version 3.10.13
00:57:38 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:57:38 <INFO> PyTorch compiled with CUDA version 11.3
00:57:38 <INFO> RDKit version 2022.09.5
00:57:38 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:57:39 <INFO> Number of PyTorch CUDA devices 1
00:57:39 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
00:57:40 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
00:57:40 <INFO> Starting Sampling
--- Logging error ---
Traceback (most recent call last):
  File "/root/miniconda3/envs/re/lib/python3.10/logging/__init__.py

In [44]:
```python
# Find 2000 molecules similar to the provided molecules using GPU
mol_2_mol_config['use_cuda'] = True
prefix = "cuda"  # set explicitly so the file names do not depend on which cell ran last
num_smiles = 1000
mol_2_mol_config["parameters"]["num_smiles"] = num_smiles
mol_2_mol_config["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_sampling_similar_{num_smiles}.csv")

dump_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_sampling_similar_config.json")

with open(dump_path, 'w') as f:
    json.dump(mol_2_mol_config, f, indent=4)

!reinvent {dump_path} -f json
```

20:37:56 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
20:37:56 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_1000_sampling_similar_config.json -f json
20:37:56 <INFO> Environment loaded from dotenv file
20:37:56 <INFO> User root on host Ank
20:37:56 <INFO> Python version 3.10.13
20:37:56 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
20:37:56 <INFO> PyTorch compiled with CUDA version 11.3
20:37:56 <INFO> RDKit version 2022.09.5
20:37:56 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
20:37:56 <INFO> Number of PyTorch CUDA devices 1
20:37:56 <INFO> Using CUDA device:0 NVIDIA GeForce RTX 3060 Laptop GPU
20:37:57 <INFO> GPU memory: 5136 MiB free, 6143 MiB total
20:37:57 <INFO> Starting Sampling
20:37:59 <INFO> Using generator Mol2Mol
20:37:59 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_

In [38]:
```python
# Find 200 molecules similar to the provided molecules using CPU
mol_2_mol_config['use_cuda'] = False
prefix = "cpu"
num_smiles = 100
mol_2_mol_config["parameters"]["num_smiles"] = num_smiles
mol_2_mol_config["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_sampling_similar_{num_smiles}.csv")

dump_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_sampling_similar_config.json")

with open(dump_path, 'w') as f:
    json.dump(mol_2_mol_config, f, indent=4)

!reinvent {dump_path} -f json
```

00:56:41 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
00:56:41 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_100_sampling_similar_config.json -f json
00:56:41 <INFO> Environment loaded from dotenv file
00:56:41 <INFO> User root on host Ank
00:56:41 <INFO> Python version 3.10.13
00:56:41 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
00:56:41 <INFO> PyTorch compiled with CUDA version 11.3
00:56:41 <INFO> RDKit version 2022.09.5
00:56:41 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
00:56:41 <INFO> Number of PyTorch CUDA devices 1
00:56:41 <INFO> Using CPU x86_64
00:56:41 <INFO> Starting Sampling
00:56:41 <INFO> Using generator Mol2Mol
00:56:41 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_similar_100.csv
00:56:41 <WARN> randomize_smiles set to false

In [42]:
```python
# Find 2000 molecules similar to the provided molecules using CPU
mol_2_mol_config['use_cuda'] = False
prefix = "cpu"
num_smiles = 1000
mol_2_mol_config["parameters"]["num_smiles"] = num_smiles
mol_2_mol_config["parameters"]["output_file"] = os.path.join(outpath, f"{prefix}_sampling_similar_{num_smiles}.csv")

dump_path = os.path.join(outpath, f"{prefix}_generate_{num_smiles}_sampling_similar_config.json")

with open(dump_path, 'w') as f:
    json.dump(mol_2_mol_config, f, indent=4)

!reinvent {dump_path} -f json
```

19:56:47 <INFO> Started REINVENT 4.0.35 (C) AstraZeneca 2017, 2023 on 2024-02-05
19:56:47 <INFO> Command line: /root/miniconda3/envs/re/bin/reinvent /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_generate_1000_sampling_similar_config.json -f json
19:56:47 <INFO> Environment loaded from dotenv file
19:56:47 <INFO> User root on host Ank
19:56:47 <INFO> Python version 3.10.13
19:56:47 <INFO> PyTorch version 1.12.1+cu113, git 664058fa83f1d8eede5d66418abff6e20bd76ca8
19:56:47 <INFO> PyTorch compiled with CUDA version 11.3
19:56:47 <INFO> RDKit version 2022.09.5
19:56:47 <INFO> Platform Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
19:56:47 <INFO> Number of PyTorch CUDA devices 1
19:56:47 <INFO> Using CPU x86_64
19:56:47 <INFO> Starting Sampling
19:56:48 <INFO> Using generator Mol2Mol
19:56:48 <INFO> Writing sampled SMILES to CSV file /mnt/d/projects/github/REINVENT4_NOTEBOOKS/out/sampling/cpu_sampling_similar_1000.csv
19:56:48 <WARN> randomize_smiles set to fal

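To sanity-check an output file (for example, that the expected number of rows was written and how many distinct SMILES strings it contains), the CSV can be inspected with the standard library. This sketch assumes the file has a header row with a `SMILES` column; check the header of your own output, since column names may differ between REINVENT4 versions and generators. `summarize_samples` is a hypothetical helper.

```python
import csv


def summarize_samples(csv_path, smiles_column="SMILES"):
    """Count rows and distinct SMILES strings in a sampling output CSV.

    Assumes a header row containing `smiles_column`. Distinct strings are not
    the same as distinct molecules: one molecule can be written as several SMILES.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if smiles_column not in (reader.fieldnames or []):
            raise KeyError(f"column {smiles_column!r} not found in {reader.fieldnames}")
        smiles = [row[smiles_column] for row in reader]
    return {"rows": len(smiles), "unique_smiles": len(set(smiles))}


# Example: summarize_samples(os.path.join(outpath, "cpu_sampling_similar_100.csv"))
```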
### Report

I used REINVENT4 to sample new molecules on my local machine with two of the prior models it provides: `reinvent.prior` to generate new molecules, and `mol2mol_medium_similarity.prior` (sample strategy `beamsearch`) to generate molecules similar to given input molecules.


#### Hardware Specification

1. Operating system: WSL2 (Windows Subsystem for Linux) with Ubuntu 22.04 (kernel 5.15.133.1-microsoft-standard-WSL2)
1. GPU: RTX 3060, 3584 cores, 6 GB GDDR6
1. CPU: AMD Ryzen 9 5900HX (8 cores, 16 threads), base clock frequency 3.3 GHz
1. RAM: 40 GB (3200 MHz)

#### Findings

_`reinvent.prior` on GPU_


|Number of molecules generated|Peak Main Memory Usage|Time|
|-|-|-|
|100|3.2 GB|<1s|
|1,000|3.2 GB|1s|
|10,000|3.3 GB|14s|
|100,000|4.7 GB|2min35s|


_`reinvent.prior` on CPU_


|Number of molecules generated|Peak Main Memory Usage|Time|
|-|-|-|
|100|0.5 GB|1s|
|1,000|0.5 GB|13s|
|10,000|0.6 GB|2min14s|
|100,000|2 GB|21min41s|


_`mol2mol_medium_similarity.prior` on GPU_


|Number of molecules generated|Peak Main Memory Usage|Time|
|-|-|-|
|200|3.4 GB|4s|
|2,000|3.4 GB|34s|


_`mol2mol_medium_similarity.prior` on CPU_


|Number of molecules generated|Peak Main Memory Usage|Time|
|-|-|-|
|200|1.4 GB|1min36s|
|2,000|1.4 GB|18min35s|

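The timings above can be turned into throughput and a GPU-over-CPU speedup. The sketch below parses the time strings in the formats used in these tables (`to_seconds` is a hypothetical helper for this format only) and compares the largest run of each prior: 100,000 molecules for `reinvent.prior` and 2,000 for `mol2mol_medium_similarity.prior`.

```python
import re


def to_seconds(text):
    """Parse durations such as '2min35sec', '21min41s' or '34s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)min)?(?:(\d+)s(?:ec)?)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


runs = {
    # (prior, molecules): (GPU time, CPU time), copied from the tables above
    ("reinvent.prior", 100_000): ("2min35s", "21min41s"),
    ("mol2mol_medium_similarity.prior", 2_000): ("34s", "18min35s"),
}

for (prior, n), (gpu, cpu) in runs.items():
    gpu_s, cpu_s = to_seconds(gpu), to_seconds(cpu)
    print(f"{prior}: GPU {n / gpu_s:.0f} mol/s, CPU {n / cpu_s:.1f} mol/s, "
          f"speedup {cpu_s / gpu_s:.1f}x")
```

With the numbers above this reports a GPU speedup of about 8.4x for `reinvent.prior` (155 s vs 1301 s) and about 32.8x for `mol2mol_medium_similarity.prior` (34 s vs 1115 s).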

#### Misc

REINVENT4 is capable of concurrent execution across multiple CPU threads.

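When benchmarking on the CPU it can help to fix the number of threads so runs are comparable. PyTorch sizes its intra-op thread pool from `OMP_NUM_THREADS` (and MKL from `MKL_NUM_THREADS`) when these are set at start-up, so one option, assuming this is how you want to control it, is to set them only for the `reinvent` child process. `reinvent_env` and `run_reinvent_cpu` are hypothetical helpers; the CLI call mirrors the `reinvent <config> -f json` invocation used above.

```python
import os
import subprocess


def reinvent_env(num_threads):
    """Return a copy of the current environment with OpenMP/MKL thread limits.

    The parent process environment is not modified.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    env = os.environ.copy()
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["MKL_NUM_THREADS"] = str(num_threads)
    return env


def run_reinvent_cpu(config_path, num_threads):
    """Run the reinvent CLI on a JSON config with a fixed CPU thread budget."""
    return subprocess.run(
        ["reinvent", config_path, "-f", "json"],
        env=reinvent_env(num_threads),
        check=True,
    )
```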
