# Guided molecular generation with MolMIM

This tutorial demonstrates how to use the molmim_cma package to guide exploration of the [MolMIM](https://arxiv.org/abs/2208.09016) model's latent space and generate molecules
with properties of interest. To do this, we use [CMA-ES](https://en.wikipedia.org/wiki/CMA-ES), an evolutionary optimization algorithm. The basic steps of each optimization iteration are:

1. Decode latent representations into SMILES strings
2. Score the generated SMILES strings with the desired oracle function
3. Update the CMA-ES algorithm with the SMILES/score pairs
4. Ask the CMA-ES algorithm for a new set of latent space representations to sample
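
The four steps above form an ask/tell loop. The following is a minimal sketch of that loop, not the molmim_cma implementation. It swaps in a simplified evolution strategy (isotropic Gaussian sampling, no covariance adaptation) for CMA-ES, and `toy_decode`, `toy_oracle`, and `SimpleES` are hypothetical stand-ins for the MolMIM decoder, the scoring function, and the optimizer. It assumes lower scores are better, as is common for CMA-ES implementations.

```python
import random


def toy_decode(latent):
    # Stand-in for the model's decoder; a real client would return a SMILES string.
    return ",".join(f"{x:.3f}" for x in latent)


def toy_oracle(decoded):
    # Stand-in oracle, lower is better: squared distance from the point (1, 1, 1).
    return sum((float(v) - 1.0) ** 2 for v in decoded.split(","))


class SimpleES:
    """Simplified evolution strategy with an ask/tell interface (not full CMA-ES)."""

    def __init__(self, mean, sigma, popsize, seed=0):
        self.mean = list(mean)
        self.sigma = sigma
        self.popsize = popsize
        self.rng = random.Random(seed)

    def ask(self):
        # Sample a population of latent vectors around the current mean.
        return [
            [m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean]
            for _ in range(self.popsize)
        ]

    def tell(self, candidates, scores):
        # Move the mean to the average of the best quarter of the population.
        ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
        n_elite = max(1, self.popsize // 4)
        elite = [c for _, c in ranked[:n_elite]]
        self.mean = [sum(col) / n_elite for col in zip(*elite)]
        self.sigma *= 0.95  # fixed step-size decay instead of CMA-ES adaptation


def run_toy_optimization(steps=50):
    es = SimpleES(mean=[0.0, 0.0, 0.0], sigma=0.5, popsize=20)
    latents = es.ask()
    for _ in range(steps):
        decoded = [toy_decode(z) for z in latents]  # step 1: decode
        scores = [toy_oracle(s) for s in decoded]   # step 2: score
        es.tell(latents, scores)                    # step 3: update
        latents = es.ask()                          # step 4: ask for new samples
    return es.mean
```

Calling `run_toy_optimization()` returns the final search mean, which should score better under `toy_oracle` than the starting point at the origin.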

### BioNeMo Service Configurations
To get started, please configure and provide your NGC access token by visiting https://ngc.nvidia.com/setup/api-key.

In [1]:
import os

# Replace the placeholder with your own NGC API key; never commit a real key.
API_KEY = "<YOUR_NGC_API_KEY>"
API_HOST = "https://api.bionemo.ngc.nvidia.com/v1"

# Define the NGC CLI API key and org for the model download.
# If these variables are not already set in the container,
# set them here with your own API key and org.
api_key = API_KEY
ngc_cli_org = '0559908415901577'

# Update the environment variables
os.environ['NGC_CLI_API_KEY'] = api_key
os.environ['NGC_CLI_ORG'] = ngc_cli_org
os.environ['NGC_CLI_TEAM'] = 'no-team'

Then we can install and import the library dependencies.

Next, we'll set up the optimizer. It takes a client, a scoring function callback (the oracle), and the seed SMILES. The client must implement encode() and decode() methods. For this example, the molmim_cma library provides a client that wraps the BioNeMo service.
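
To make the client contract concrete, here is a hedged sketch of an object that satisfies it. The names `InferenceClient` and `LookupClient` are hypothetical, and the list-based signatures are an assumption; the real library may use different types (for example, NumPy arrays). The toy client "encodes" a SMILES string into three simple counts and "decodes" a latent vector by returning the nearest entry in a small fixed library.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class InferenceClient(Protocol):
    """Hypothetical protocol: anything with encode() and decode() qualifies."""

    def encode(self, smis: List[str]) -> List[List[float]]: ...

    def decode(self, latents: List[List[float]]) -> List[str]: ...


class LookupClient:
    """Toy client backed by a fixed library of SMILES strings (not a real model)."""

    def __init__(self, library: List[str]):
        self.library = list(library)

    def encode(self, smis: List[str]) -> List[List[float]]:
        # Toy latent vector: string length, number of 'C', number of 'O'.
        return [[float(len(s)), float(s.count("C")), float(s.count("O"))] for s in smis]

    def decode(self, latents: List[List[float]]) -> List[str]:
        # Return the library entry whose toy encoding is closest to each latent.
        lib_vecs = self.encode(self.library)
        decoded = []
        for z in latents:
            dists = [sum((a - b) ** 2 for a, b in zip(z, v)) for v in lib_vecs]
            decoded.append(self.library[dists.index(min(dists))])
        return decoded


client = LookupClient(["CCO", "c1ccccc1", "CC(=O)O"])
```

Because the protocol only checks for the two methods, any wrapper (a remote service, a local model, or a toy like this one) can be passed wherever a client is expected.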

In [5]:
from bionemo.api import BionemoClient as BaseClient
from guided_molecule_gen.inference_client import BioNemoServiceClient

service_client = BioNemoServiceClient(BaseClient(api_key=API_KEY, api_host=API_HOST, org_id=ngc_cli_org))


In [None]:
# scoring_function (the oracle) and smis (the seed SMILES) must be defined before this cell runs.
optimizer = MoleculeGenerationOptimizer(service_client, scoring_function, smis, popsize=20, optimizer_args={"sigma": 0.1})
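
The call above relies on a scoring function and a set of seed SMILES. As an illustration only, the sketch below shows what such inputs might look like. The signature of `toy_scoring_function` (a list of SMILES in, a list of floats out, lower is better) is an assumption, not the library's documented contract, and counting letters is only a crude proxy for molecule size. A real oracle would typically compute a chemical property with a cheminformatics toolkit.

```python
from typing import List


def toy_scoring_function(smis: List[str], target_size: int = 10) -> List[float]:
    """Hypothetical oracle: distance of a crude size proxy from a target (lower is better)."""
    scores = []
    for smi in smis:
        # Count alphabetic characters as a rough stand-in for atom count.
        size = sum(ch.isalpha() for ch in smi)
        scores.append(float(abs(size - target_size)))
    return scores


# Example seed SMILES: ethanol and benzene.
example_seeds = ["CCO", "c1ccccc1"]
example_scores = toy_scoring_function(example_seeds, target_size=6)
```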

In [3]:
BioNemoServiceClient?

Init signature:
BioNemoServiceClient(
    bionemo_client: bionemo.api.BionemoClient,
    model_name: str = 'molmim',
)
Docstring:
Abstract class representing the required method signatures for inference.

The functionality required is to be able to encode smiles strings into latent space,
and decode latent space into a smiles string.

Implementers could be (for example) a triton client wrapper, or a wrapper around the Bionemo service.
File:           /opt/conda/envs/rapids/lib/python3.9/site-packages/guided_molecule_gen/inference_client.py
Type:           _ProtocolMeta
Subclasses:
