# MolMIM Endpoints

MolMIM provides the following endpoints and associated functions:

1. /embedding - Retrieve the embeddings from MolMIM for a given input molecule.

2. /hidden - Retrieve the hidden state from MolMIM for a given input molecule (shown as the latent code in Figure 1 of the MolMIM manuscript).

3. /decode - Decode a hidden state representation into a SMILES string sequence.

4. /sampling - Sample the latent space within a given scaled radius from a seed molecule. This method generates new molecule samples from the given input in an unguided fashion.

5. /generate - Generate novel molecules (optionally while optimizing against a certain property). This method generates new optimized molecules if CMA-ES-guided sampling is enabled.

## Embedding
**/embedding**

- Request Body:
  - sequences: array of strings (SMILES strings)

- Response:
  - embeddings: array of arrays of floating point numbers (embeddings)

The following commands send a POST request to the `/embedding` endpoint with a JSON object containing a single SMILES string (`CC(Cc1ccc(cc1)C(C(=O)O)C)C`) and retrieve its embedding from MolMIM.

In [1]:
import requests
import json

url = "http://localhost:8000/embedding"

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = json.dumps({"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]})

response = requests.post(url, headers=headers, data=data)

print(response.text)

{"embeddings":[[0.0801224559545517,-0.0896579921245575,0.24964910745620728,-0.2170124500989914,0.30946335196495056,0.3532591462135315,-0.3029751181602478,0.3649153411388397,0.01321359071880579,-0.5798253417015076,-0.8439728617668152,0.37885111570358276,-0.14184853434562683,-0.03583259880542755,-0.11395937204360962,-0.3530144989490509,-0.1570361852645874,0.380291610956192,-0.19600725173950195,0.02915271371603012,0.3751767575740814,0.36047524213790894,0.09469098597764969,-0.04155426472425461,0.10494697093963623,0.05581340938806534,-0.06103359907865524,-0.38275018334388733,-0.1162283718585968,0.40096673369407654,-0.07640613615512848,0.5257150530815125,-0.23027649521827698,-0.08075296133756638,-0.025870447978377342,-0.0695173367857933,0.5361998081207275,0.11133065819740295,-0.026700884103775024,0.12458030134439468,0.5584775805473328,-0.08450194448232651,-0.06929121911525726,-0.2546241581439972,-0.7630581259727478,-0.04098338633775711,0.08573420345783234,0.5237798690795898,0.157486543059349

The next commands send a POST request to the `/embedding` endpoint with a JSON object containing two SMILES strings (`CN1C=NC2=C1C(=O)N(C(=O)N2C)C` and `CC(Cc1ccc(cc1)C(C(=O)O)C)C`) and retrieve their embeddings from MolMIM.

In [2]:
import requests
import json

url = "http://localhost:8000/embedding"

data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"embeddings":[[0.3483719825744629,0.1407497227191925,-0.3170012831687927,0.1958293467760086,0.32362350821495056,-0.3269166350364685,-0.1047329306602478,-0.6289811730384827,0.3348536193370819,-0.34813588857650757,-0.4577424228191376,-0.39922505617141724,-0.5516385436058044,0.11565665900707245,-0.6083441376686096,0.4170050323009491,-0.3099902868270874,-0.05247815325856209,0.03467511385679245,0.007897219620645046,-0.01286950521171093,0.5452896952629089,0.18813581764698029,0.3638717532157898,-0.36813652515411377,0.16082440316677094,0.36779940128326416,-0.053068794310092926,-0.7236502170562744,-0.7699316740036011,-0.5189110040664673,-0.32902130484580994,-0.295706182718277,-0.17694436013698578,0.3088310956954956,0.1800554096698761,-0.10589004307985306,0.07202401757240295,0.006727307103574276,0.1443556845188141,0.2628232538700104,0.03442505747079849,-0.48091232776641846,-0.4689796268939972,0.7261996865272522,0.2060258835554123,0.07987482845783234,0.37045955657958984,-0.8655847311019897,0.271
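One possible use of the returned vectors is comparing molecules. The sketch below assumes the response has the documented shape (`embeddings` as a list of equal-length lists of floats) and computes the cosine similarity between two vectors. The helper name `cosine_similarity` and the short sample vectors are illustrative and not part of the MolMIM API; in practice the vectors would come from `response.json()["embeddings"]`.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in for response.json(); real MolMIM embeddings are much longer.
payload = {"embeddings": [[1.0, 0.0, 2.0], [2.0, 0.0, 4.0], [0.0, 3.0, 0.0]]}
first, second, third = payload["embeddings"]

print(round(cosine_similarity(first, second), 6))  # parallel vectors
print(round(cosine_similarity(first, third), 6))   # orthogonal vectors
```

Whether cosine similarity is the most meaningful metric for MolMIM embeddings is not stated in this document; treat it as one reasonable starting point.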

## Hidden
**/hidden**

- Request Body:
  - sequences: array of strings (SMILES strings)

- Response:
  - hiddens: array of arrays of arrays of floating point numbers (hidden states)
    
  - mask: array of arrays of booleans (mask)

The following commands send a POST request to the `/hidden` endpoint with a JSON object containing a single SMILES string (`CC(Cc1ccc(cc1)C(C(=O)O)C)C`) and retrieve its hidden state representation from MolMIM. The response is saved to the local file `local-hidden-single.json`.

In [3]:
import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = '{"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}'
response = requests.post(url, headers=headers, data=data)

with open('local-hidden-single.json', 'w') as f:
    json.dump(response.json(), f)

The following commands send a POST request to the `/hidden` endpoint with a JSON object containing two SMILES strings (`CN1C=NC2=C1C(=O)N(C(=O)N2C)C` and `CC(Cc1ccc(cc1)C(C(=O)O)C)C`) and retrieve their hidden state representations from MolMIM. The response is saved to the local file `local-hidden-multiple.json`.

In [4]:
import requests
import json

url = "http://localhost:8000/hidden"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}

data = {
    "sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]
}

response = requests.post(url, headers=headers, json=data)

with open('local-hidden-multiple.json', 'w') as f:
    json.dump(response.json(), f)
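Because the hidden states are written to a file rather than printed, it can help to check the nesting of the saved payload before passing it to `/decode`. The sketch below is a minimal, hypothetical helper (`nested_shape` is not part of the MolMIM API) that reports the dimensions of the `hiddens` and `mask` fields. The tiny payload is a stand-in with the documented nesting depth only; it makes no claim about the real sizes or about what each axis means.

```python
def nested_shape(value):
    """Return the dimensions of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


# Stand-in payload; real values come from the /hidden response,
# for example json.load(open("local-hidden-single.json")).
payload = {
    "hiddens": [[[0.1, 0.2, 0.3, 0.4]]],
    "mask": [[True]],
}

print("hiddens shape:", nested_shape(payload["hiddens"]))
print("mask shape:", nested_shape(payload["mask"]))
```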

## Decode
**/decode**

- Request Body:
  - hiddens: array of arrays of arrays of floating point numbers (hidden states)
    
  - mask: array of arrays of booleans (mask)

- Response:
  - generated: array of strings (SMILES strings)

The following commands send a POST request to the `/decode` endpoint, providing the contents of the local-hidden-single.json file (which contains a single molecule's hidden state representation) to decode the hidden state into a SMILES string sequence.

In [5]:
import requests
import json

with open('./local-hidden-single.json') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode', 
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)

{"generated":["CC(C)Cc1ccc(C(C)C(=O)O)cc1"]}


The following commands send a POST request to the `/decode` endpoint, providing the contents of the local-hidden-multiple.json file (which contains multiple molecules' hidden state representations) to decode the hidden states into SMILES string sequences.

In [6]:
import requests
import json

with open('./local-hidden-multiple.json', 'r') as f:
    data = json.load(f)

response = requests.post('http://localhost:8000/decode', 
                         headers={'accept': 'application/json', 'Content-Type': 'application/json'}, 
                         json=data)

print(response.text)

{"generated":["Cn1c(=O)c2c(ncn2C)n(C)c1=O","CC(C)Cc1ccc(C(C)C(=O)O)cc1"]}


## Sampling
**/sampling**

- Request Body:
  - sequences: array of strings (SMILES strings)

  - beam_size: integer (beam width, between 1 and 10, default: 1)

  - num_molecules: integer (number of molecules, between 1 and 10, default: 1)

  - scaled_radius: floating point number (scaled radius, between 0 and 2, default: 0.7)

- Response:
  - generated: array of arrays of strings (SMILES strings)

The following commands send a POST request to the `/sampling` endpoint with a JSON object containing one SMILES string (`CC(Cc1ccc(cc1)C(C(=O)O)C)C`). The MolMIM server samples the latent space within the scaled radius around this seed molecule, generating new molecules in an unguided fashion. Because the request sets none of the optional parameters, the documented defaults apply.

In [7]:
import requests
import json

url = "http://localhost:8000/sampling"
data = {"sequences": ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"generated":[["CC(C)Cc1ccc(C(C)(CO)CO)cc1"]]}


The following commands send a POST request to the `/sampling` endpoint with a JSON object containing two SMILES strings (`CN1C=NC2=C1C(=O)N(C(=O)N2C)C` and `CC(Cc1ccc(cc1)C(C(=O)O)C)C`). The MolMIM server samples the latent space within the scaled radius around each of these seed molecules, generating new molecules in an unguided fashion.

In [8]:
import requests
import json

url = "http://localhost:8000/sampling"
data = {"sequences": ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CC(Cc1ccc(cc1)C(C(=O)O)C)C"]}

headers = {"Content-Type": "application/json"}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"generated":[["CCN(C(=O)c1cncn1C)[C@@H](C)CNC(=O)c1cnn(C)c1C"],["CC(C)Cc1ccc(C(C)(C(=O)O)C(F)(F)F)cc1"]]}
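The two requests above rely on the default sampling parameters. The sketch below shows one way to set the optional fields while checking them against the ranges listed in the request body description. The function `build_sampling_payload` is hypothetical, and treating the documented bounds as inclusive is an assumption; any further constraints enforced by the server are not checked here.

```python
def build_sampling_payload(sequences, beam_size=1, num_molecules=1, scaled_radius=0.7):
    """Build a /sampling request body, checking values against the documented ranges."""
    if isinstance(sequences, str) or not sequences:
        raise ValueError("sequences must be a non-empty list of SMILES strings")
    if not all(isinstance(s, str) for s in sequences):
        raise ValueError("every entry in sequences must be a string")
    if not 1 <= beam_size <= 10:
        raise ValueError("beam_size must be between 1 and 10")
    if not 1 <= num_molecules <= 10:
        raise ValueError("num_molecules must be between 1 and 10")
    if not 0 <= scaled_radius <= 2:
        raise ValueError("scaled_radius must be between 0 and 2")
    return {
        "sequences": list(sequences),
        "beam_size": beam_size,
        "num_molecules": num_molecules,
        "scaled_radius": scaled_radius,
    }


payload = build_sampling_payload(
    ["CC(Cc1ccc(cc1)C(C(=O)O)C)C"], beam_size=5, num_molecules=5, scaled_radius=1.0
)
print(payload)
# The payload can then be sent as in the cells above:
# requests.post("http://localhost:8000/sampling", json=payload)
```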


## Generate
**/generate**

- Request Body:

  - `smi`: string (SMILES string)

  - `algorithm`: string (algorithm to use, either "CMA-ES" or "none", default: "CMA-ES")

  - `iterations`: integer (number of iterations, between 1 and 1000, default: 10)

  - `min_similarity`: floating point number (minimum similarity, between 0 and 0.7, default: 0.7)

  - `minimize`: boolean (whether to minimize the property, default: false)

  - `num_molecules`: integer (number of molecules, between 1 and 100, default: 10)

  - `particles`: integer (number of particles, between 2 and 1000, default: 30)

  - `property_name`: string (property to optimize, either "QED" or "plogP", default: "QED")

  - `scaled_radius`: floating point number (scaled radius, between 0 and 2, default: 1.0)

- Response:

  - `generated`: array of objects, each with a `smiles` string (SMILES string) and a `score` floating point number (value of the selected property)

The `/generate` endpoint provides two alternate options:

  1. CMA-ES - a black-box optimization algorithm that can guide MolMIM sampling to optimize for a specific property; in this case, either QED or plogP.

  2. Random sampling - functions similarly to the `/sampling` endpoint, but with less flexibility for the sampling parameters.

Required parameters for each algorithm type:

- For the "CMA-ES" algorithm:

  - `smi`

  - `num_molecules`

  - `property_name`

  - `minimize`

  - `min_similarity`

  - `particles`

  - `iterations`

- For random sampling ("none") algorithm:

  - `smi`

  - `num_molecules`

  - `particles`

  - `scaled_radius`
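The two parameter lists above can be captured in a small helper that keeps only the fields relevant to the chosen algorithm and reports any that are missing. This is a sketch under the assumption that the lists above are complete; `build_generate_payload` and the two field tuples are hypothetical names, not part of the MolMIM API.

```python
CMA_ES_FIELDS = ("smi", "num_molecules", "property_name", "minimize",
                 "min_similarity", "particles", "iterations")
RANDOM_FIELDS = ("smi", "num_molecules", "particles", "scaled_radius")


def build_generate_payload(algorithm, **params):
    """Return a /generate body holding only the parameters listed for `algorithm`.

    Parameters that do not belong to the chosen algorithm are dropped.
    """
    if algorithm == "CMA-ES":
        fields = CMA_ES_FIELDS
    elif algorithm == "none":
        fields = RANDOM_FIELDS
    else:
        raise ValueError('algorithm must be "CMA-ES" or "none"')
    missing = [name for name in fields if name not in params]
    if missing:
        raise ValueError(f"missing parameters for {algorithm}: {missing}")
    payload = {"algorithm": algorithm}
    payload.update({name: params[name] for name in fields})
    return payload


payload = build_generate_payload(
    "none",
    smi="CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    num_molecules=5,
    particles=8,
    scaled_radius=1.0,
    iterations=3,  # not used by random sampling, so it is dropped
)
print(payload)
```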

This first set of commands uses the CMA-ES algorithm to generate five molecules, maximizing the QED property, with a minimum similarity of 0.4, eight particles, and three iterations.

In [9]:
import requests
import json

url = 'http://localhost:8000/generate'

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "QED",
    "minimize": False,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {'Content-Type': 'application/json'}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"generated":[{"smiles":"Cn1c(=O)c2c(ncn2CCC(C)(F)F)n(C)c1=O","score":0.8161010239583072},{"smiles":"Cn1c(=O)c2c(ncn2CCN(C)C(=O)OC(C)(C)C)n(C)c1=O","score":0.8032541436809285},{"smiles":"Cn1c(=O)c2c(ncn2CCN(C(=O)OC(C)(C)C)C2CC2)n(C)c1=O","score":0.8024687893051282},{"smiles":"Cn1c(N2CCN(C(=O)CCS(C)(=O)=O)CC2)nc2cccc(F)c21","score":0.7978309451315565},{"smiles":"Cn1c(=O)c2c(ncn2CC(C)(C)F)n(C)c1=O","score":0.7678231604347313}]}


This second set of commands uses the CMA-ES algorithm to generate five molecules, minimizing plogP (`"minimize": True`), with a minimum similarity of 0.4, eight particles, and three iterations.

In [10]:
import requests
import json

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "CMA-ES",
    "num_molecules": 5,
    "property_name": "plogP",
    "minimize": True,
    "min_similarity": 0.4,
    "particles": 8,
    "iterations": 3
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"generated":[{"smiles":"Cc1c(F)ccc(-n2c(C)n(C)c(=O)c2=O)c1Br","score":-100.0},{"smiles":"Cn1c(=O)c2c(ncn2C)n(C)c(=O)c1=O","score":-5.532875846274631},{"smiles":"Cn1c(=O)c2c(ncn2Cc2ncc[nH]c2=O)n(CCO)c1=O","score":-2.6372717831859305},{"smiles":"Cn1c(=O)c2c(ncn2Cc2ncnn2C)n(C)c1=O","score":-2.0100309815032937},{"smiles":"Cn1c(=O)c2c(ncn2CC(=O)NCc2ccns2)n(C)c1=O","score":-1.903221843709685}]}


The last set of commands uses the random sampling (`"none"`) algorithm to generate five molecules from the seed molecule `CN1C=NC2=C1C(=O)N(C(=O)N2C)C`, using eight particles and a scaled radius of 1.0.

In [11]:
import requests
import json

url = "http://localhost:8000/generate"

data = {
    "smi": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "algorithm": "none",
    "num_molecules": 5,
    "particles": 8,
    "scaled_radius": 1.0
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, json=data)

print(response.text)

{"generated":[{"smiles":"O[C@@H]1CC[C@@H](CNCCCCCCCCBr)C1","score":0.4783246766105461},{"smiles":"OCCC(CCNCCCCCCCCc1ccccc1)CO","score":0.43312014738728954},{"smiles":"NCCCO[C@@H](CCCCCCCCCCCO)C1CC1","score":0.4231556933003187},{"smiles":"OCCCCCCCCCNCCCCCC1CCCCC1","score":0.39466806945423477},{"smiles":"O=C(CCC1CCCC1)NCCOCCNC(=O)CCCCCCO","score":0.3923987935800907}]}
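Each entry in the `/generate` responses shown above pairs a SMILES string with a score, so the result can be post-processed like any list of records. The sketch below parses a response and picks the best candidate according to the optimization direction. The helper `best_candidate` is hypothetical; the sample text is copied from the first three entries of the QED example output above.

```python
import json

# First three entries of the QED example output shown earlier in this section.
response_text = (
    '{"generated":['
    '{"smiles":"Cn1c(=O)c2c(ncn2CCC(C)(F)F)n(C)c1=O","score":0.8161010239583072},'
    '{"smiles":"Cn1c(=O)c2c(ncn2CCN(C)C(=O)OC(C)(C)C)n(C)c1=O","score":0.8032541436809285},'
    '{"smiles":"Cn1c(=O)c2c(ncn2CCN(C(=O)OC(C)(C)C)C2CC2)n(C)c1=O","score":0.8024687893051282}'
    ']}'
)


def best_candidate(generated, minimize=False):
    """Return the entry with the lowest score if minimize is True, else the highest."""
    pick = min if minimize else max
    return pick(generated, key=lambda entry: entry["score"])


generated = json.loads(response_text)["generated"]
best = best_candidate(generated)
print(best["smiles"], best["score"])
```

With a live server, `response.json()["generated"]` can be passed to `best_candidate` directly, using the same `minimize` value that was sent in the request.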


# Benchmarking
Benchmarking the accuracy and performance of autoencoder-style models such as MolMIM is crucial for evaluating their effectiveness. Accuracy can be measured as reconstruction accuracy: the ability of the model to regenerate the original input SMILES string from its encoded representation. Performance can be measured as the time the model takes to produce its output from the input; the script reports the encoding (`/hidden`) and decoding (`/decode`) times separately.

The script below demonstrates how to benchmark accuracy and performance for a single SMILES string. It sends a POST request to the `/hidden` endpoint to encode the input SMILES string, then sends another POST request to the `/decode` endpoint to decode the encoded representation back into a SMILES string. The script times each of these steps and checks whether the generated SMILES string matches the original input exactly.

The script can be extended to cover multiple SMILES strings by turning the `smiles` variable into a list and sending one pair of requests per string.

In [12]:
import requests
import time

# You can change this line to match the URL and/or port of the MolMIM NIM
url = "http://localhost:8000"

# Input SMILES
smiles = "CC[C@@H](OC)[C@H](Cl)C(=O)N1CCCCC1"

# Send POST request to /hidden endpoint and time the call
data = {"sequences": [smiles]}
start_time = time.time()
response = requests.post(f"{url}/hidden", json=data)
hidden_time = time.time() - start_time
hidden_output = response.json()

# Send POST request to /decode endpoint and time only that call
start_time = time.time()
response = requests.post(f"{url}/decode", json=hidden_output)
decode_time = time.time() - start_time
output = response.json()["generated"][0]

# Exact string comparison between the decoded and the input SMILES
matches = output == smiles

print(f"Matches:{matches}")
print(f"Hidden time:{hidden_time:.03f}")
print(f"Decode time:{decode_time:.03f}")

Matches:True
Hidden time:0.052
Decode time:2.320
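The extension to multiple SMILES strings could look like the sketch below. It is a minimal illustration, not a reference benchmark: the names `post_json`, `benchmark`, and `summarize` are hypothetical, the transport uses the standard library's `urllib.request` so the block has no third-party dependencies (`requests.post(...).json()` would work equally well), and the `post` argument exists only so the HTTP call can be swapped out. The match check is an exact string comparison, so an input such as `CC(Cc1ccc(cc1)C(C(=O)O)C)C`, which the `/decode` example above returns as `CC(C)Cc1ccc(C(C)C(=O)O)cc1`, would count as a mismatch even though the decoder reproduced the same molecule.

```python
import json
import time
import urllib.request


def post_json(url, payload):
    """POST `payload` as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def benchmark(smiles_list, base_url="http://localhost:8000", post=post_json):
    """Round-trip each SMILES through /hidden and /decode, timing each call."""
    results = []
    for smiles in smiles_list:
        start = time.perf_counter()
        hidden = post(f"{base_url}/hidden", {"sequences": [smiles]})
        hidden_time = time.perf_counter() - start

        start = time.perf_counter()
        decoded = post(f"{base_url}/decode", hidden)["generated"][0]
        decode_time = time.perf_counter() - start

        results.append({
            "smiles": smiles,
            "decoded": decoded,
            "matches": decoded == smiles,  # exact string match only
            "hidden_time": hidden_time,
            "decode_time": decode_time,
        })
    return results


def summarize(results):
    """Return the fraction of exact matches and the mean time per call."""
    count = len(results)
    return {
        "accuracy": sum(r["matches"] for r in results) / count,
        "mean_hidden_time": sum(r["hidden_time"] for r in results) / count,
        "mean_decode_time": sum(r["decode_time"] for r in results) / count,
    }


# With a MolMIM server running at the default URL:
# results = benchmark(["CC[C@@H](OC)[C@H](Cl)C(=O)N1CCCCC1"])
# print(summarize(results))
```

Sending one molecule per request keeps the per-molecule timings separate; batching several SMILES strings into one `/hidden` call, as in the earlier examples, would measure throughput instead.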
