In [None]:
# Import styles
import sys
sys.path.append('./styles')
from init_style import init
init()

<!-- Header banner -->
<div class="banner"><div>Mammal Model Introduction</div><b>OpenAD <span>Tutorial</span></b></div>

### Table of Contents

- [Introduction to OpenAD](#Introduction-to-OpenAD)
- [Introduction to BMFM Mammal](#Introduction-to-BMFM-Mammal)
    - [Foundation Models for Targets Discovery](#Foundation-models-for-Targets-Discovery)
    - [Foundation Models for Biologics Discovery](#Foundation-Models-for-Biologics-Discovery)
    - [Foundation Models for Small Molecules Discovery](#Foundation-Models-for-Small-Molecules-Discovery)
- [Working with OpenAD Commands](#Working-with-OpenAD-Commands)
- [The Hands-on Lab](#The-Hands-on-Lab)

## Introduction to OpenAD

OpenAD is an open-source framework for molecular and materials discovery developed by IBM Research.

The OpenAD client is accessible from a command line interface, Jupyter Notebook and an API. It provides unified access to a variety of tools and AI models for literature knowledge extraction, forward and retrosynthesis prediction, generative methods and property inference. You can train models on your own data as well as visualize and filter candidate molecules.

#### Home Page <br><a href="https://accelerate.science/projects/openad">accelerate.science/projects/openad</a>

#### PyPI <br><a href="https://pypi.org/project/openad/">OpenAD on PyPI</a>

#### Online Documentation
This opens the OpenAD documentation website:<br><a href="https://acceleratedscience.github.io/openad-docs/">acceleratedscience.github.io/openad-docs</a>

## Introduction to BMFM Mammal

Learning a molecular language for protein interactions is crucial for advancing drug discovery. Foundation models, trained on diverse biomedical data like antibody-antigen interactions and small molecule-protein interactions, are transforming this field. Unlike traditional computational approaches, they widen the search scope for novel molecules and refine it to eliminate unsuitable ones, emphasizing the detailed nuances in molecular structure and dynamics.

IBM Research biomedical foundation model (BMFM) technologies leverage multi-modal data of different types, including drug-like small molecules and proteins (covering a total of more than a billion molecules), as well as single-cell RNA sequencing and other biomedical data.

Our research team has a diverse range of expertise, including computational chemistry, medicinal chemistry, artificial intelligence, computational biology, physical sciences, and biomedical informatics.

Our BMFM technologies currently cover the following three domains:
- Foundation models for Targets Discovery
- Foundation Models for Biologics Discovery
- Foundation Models for Small Molecules Discovery

<img src="./media/BMFM_Technologies_for_Drug_Discovery_1_625f4be741.jpeg" width=1000 >

### Foundation models for Targets Discovery

Targets discovery models learn representations of DNA, bulk RNA, single-cell RNA expression data, and other cell-level signaling information to identify novel diagnostic and therapeutic targets, enabling tasks such as cell type annotation and classification, gene perturbation prediction, disease state prediction, splice variant prediction, promoter region prediction, and treatment response prediction.

<img src="./media/BMFM_1_1ce4bc0401.png" style="background-color:white;" width=1000>

### Foundation Models for Biologics Discovery

Biologics discovery models focus on biologic therapeutics discovery, with the goal of leveraging large-scale representations of protein sequences, structures, and dynamics for diverse downstream tasks associated with multiple biologics modalities. These models produce unified representations of biological molecular entities, integrating data such as protein sequences, protein complex structures, and protein-protein complex binding free energies into a single framework. These models can serve as the basis for diverse downstream tasks in therapeutic design, including candidate generation and assessment, across antibody, TCR, vaccine, and other modalities.

<img src="./media/BMFM_2_ae4ec386c4.png" style="background-color:white;" width=1000>

### Foundation Models for Small Molecules Discovery

Small-molecule models can address a wide variety of downstream predictive and generative tasks. These models are trained on multiple representations of small-molecule data to learn rich low-dimensional representations of biochemical entities relevant to drug discovery, enabling tasks such as property and affinity prediction, multi-model late fusion prediction, and scaffold-based generation. The predictive models are transformers pretrained on multiple views (i.e., modalities) of small-molecule data; they learn rich latent representations by maximizing mutual information across different views of molecules. The generative models use diffusive denoising networks to learn to drive input molecules toward output mutant molecules that carry a cognate property embedding of the mutant. Given a set of desired properties and a template molecule (3D structure), they can produce a set of designer molecules (3D structures).

<img src="./media/BMFM_3_6ef1496dbf.png" style="background-color:white;" width=1000>
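The description above says the predictive models learn by maximizing mutual information across views of a molecule. One common, generic way to do this is to minimize a contrastive InfoNCE loss, since for a batch of N pairs it gives the lower bound I ≥ log N − L. The sketch below computes InfoNCE for paired embeddings of two views. It is an illustration of that general technique under our own assumptions, not the BMFM training objective, and the toy 2D vectors are arbitrary.

```python
import math

def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss for a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two views of the same molecule
    (for example, a SMILES-based and a graph-based view). Within the batch,
    every other pairing acts as a negative. Vectors are L2-normalized, so the
    similarity score is cosine similarity divided by the temperature.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    total = 0.0
    for i, ai in enumerate(a):
        # Similarity of view_a[i] to every view_b[j]; j == i is the positive pair
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Cross-entropy of the positive pair, using a numerically stable log-sum-exp
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(aligned, aligned))  # small: each view matches its partner
print(info_nce(aligned, swapped))  # large: positives are the least similar pairs
```

Minimizing this loss pulls the two views of the same molecule together and pushes views of different molecules apart; the temperature controls how sharply the softmax separates them.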

## Working with OpenAD Commands

When using magic commands to access the OpenAD toolkit, you have two options:

1. **Default mode:** `%openad`<br>
This is the recommended mode, which will display your data and warnings visually in your Notebook.<br>
Whenever data is displayed, follow-up commands are suggested that let you process it further, e.g. `result open`, `result edit`, `result copy`.<br><br>
Example usage:
    
        %openad display data 'sample.csv'


<br>

2. **Data mode:** `%openadd`<br>
This mode skips visualization and returns your results as a DataFrame or list that can then be used programmatically in functions or flows in your Notebook. This is useful for prebuilt Notebook process flows.<br><br>
Example usage:

        my_data = %openadd display data 'sample.csv'

    This is essentially shorthand for:
    
        %openad display data 'sample.csv'
        my_data = %openad result as dataframe
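Because `%openadd` hands back plain Python data, you can post-process it with ordinary code. The snippet below is a minimal sketch, not OpenAD output: `my_data` is a hypothetical, hand-built list of records standing in for what `%openadd` might return (the real value may be a DataFrame), and `names_below` is a hypothetical helper.

```python
# Hypothetical stand-in for the value of `my_data = %openadd ...`;
# in a real Notebook this would come from OpenAD.
my_data = [
    {"name": "aspirin", "molecular_weight": 180.16},
    {"name": "caffeine", "molecular_weight": 194.19},
    {"name": "ibuprofen", "molecular_weight": 206.28},
]

def names_below(records, max_weight):
    """Return the names of records whose molecular_weight is below max_weight."""
    return [r["name"] for r in records if r["molecular_weight"] < max_weight]

print(names_below(my_data, 200))  # → ['aspirin', 'caffeine']
```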

#### Using variables
Magic commands can access variables from your Notebook, using the `{variable_name}` syntax.

    external_file = '~/openad_notebooks/examples/base_molecules.csv'
    new_filename = 'imported_data.csv'
    %openad import from '{external_file}' to '{new_filename}'
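IPython expands `{...}` in a magic line before the magic sees it (its `InteractiveShell.var_expand` also handles `$name` and full expressions inside braces). The sketch below approximates only the plain-variable case with `str.format`; `expand_magic_line` is a hypothetical helper written to show what the OpenAD command ends up receiving, not part of IPython or OpenAD.

```python
def expand_magic_line(line, namespace):
    """Approximate IPython's {variable} substitution for a magic line.

    Only plain variable names are handled; IPython's real expansion is richer.
    """
    return line.format(**namespace)

external_file = '~/openad_notebooks/examples/base_molecules.csv'
new_filename = 'imported_data.csv'
expanded = expand_magic_line(
    "import from '{external_file}' to '{new_filename}'",
    {"external_file": external_file, "new_filename": new_filename},
)
print(expanded)
```

One consequence: a literal `{` or `}` in a magic line is treated as a substitution, so values containing braces should be passed through a variable.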

## The Hands-on Lab

In [1]:
import os
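The catalog cell below refers to `mammal_url` and `sm_url`, which must exist in the Notebook before it runs. One minimal way to supply them, assuming the service URLs are kept in environment variables, is a small helper like the hypothetical `service_url` below; the variable names you pass it are up to your deployment.

```python
import os

def service_url(env_var, default=None):
    """Return the model-service URL stored in the environment variable env_var.

    Falls back to `default` when the variable is unset or empty, and raises
    a RuntimeError if there is no default either.
    """
    url = os.environ.get(env_var) or default
    if not url:
        raise RuntimeError(f"Set the {env_var} environment variable to the service URL")
    return url
```

For example, `mammal_url = service_url("MAMMAL_URL")` and `sm_url = service_url("SM_URL")`, where `MAMMAL_URL` and `SM_URL` are hypothetical variable names.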


In [None]:
# Remove any previously catalogued copies of the services so they can be re-added cleanly
%openad uncatalog model service mammal
%openad uncatalog model service sm

In [None]:
# Register the remote services; mammal_url and sm_url must already be defined in the Notebook
%openad catalog model service from remote '{mammal_url}' as mammal
%openad catalog model service from remote '{sm_url}' as sm

In [None]:
%openad model service status

### Initializing the Data we will be working with

#### First, Let's Initialize the Small-Molecule Drugs We Will Be Working With as a Python List

In [None]:
%openad clear molecules Force
# Drug identifiers: a mix of SMILES strings and common names that PubChem can resolve
drug_smiles = ['CC(C)C(CO)NC1=NC(=C2C(=N1)N(C=N2)C(C)C)NC3=CC(=C(C=C3)C(=O)O)Cl', 'Ruboxistaurin', 'flavopiridol',
               'fasudil', 'Quercetin', 'h89', 'Purvalanol', 'gefitinib', 'PD173955',
               'C1=CC=C2C(=C1)C(=C(N2)O)C3=C(C4=CC=CC=C4N3)N=O', 'lapatinib', 'LY294002', 'PP2']
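Despite its name, `drug_smiles` mixes SMILES strings with drug names. If you want to see which is which before adding them, a rough heuristic is to look for SMILES branch, bond, and bracket punctuation. The sketch below is a hypothetical helper, not validation: a simple SMILES such as `CCO` contains none of these characters and would be classified as a name, so use a cheminformatics toolkit for real parsing.

```python
# Characters that appear in many SMILES strings but rarely in drug names
SMILES_CHARS = set("()[]=#@+\\/")

def looks_like_smiles(identifier):
    """Heuristically guess whether an identifier is a SMILES string."""
    return any(ch in SMILES_CHARS for ch in identifier)

identifiers = ['CC(C)C(CO)NC1=NC(=C2C(=N1)N(C=N2)C(C)C)NC3=CC(=C(C=C3)C(=O)O)Cl', 'gefitinib', 'PP2']
smiles = [i for i in identifiers if looks_like_smiles(i)]
names = [i for i in identifiers if not looks_like_smiles(i)]
print(smiles)
print(names)  # → ['gefitinib', 'PP2']
```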

#### Now Let's Look at the Protein Targets We Are Defining

In [7]:
# These protein sequences will be used later in our analysis
proteins= ['SVERIYQKKTQLEHILLRPDTYIGSVELVTQQMWVYDEDVGINYREVTFVPGLYKIFDEILVNAADNKQRDPKMSCIRVTIDPENNLISIWNNGKGIPVVEHKVEKMYVPALIFGQLLTSSNYDDDE\
KKVTGGRNGYGAKLCNIFSTKFTVETASREYKKMFKQTWMDNMGRAGEMELKPFNGEDYTCITFQPDLSKFKMQSLDKDIVALMVRRAYDIAGSTKDVKVFLNGNKLPVKGFRSYVDMYLKDKLDETGNSLKVIHEQVNH\
RWEVCLTMSEKGFQQISFVNSIATSKGGRHVDYVADQIVTKLVDVVKKKNKGGVAVKAHQVKNHMWIFVNALIENPTFDSQTKENMTLQPKSFGSTCQLSEKFIKAAIGCGIVESILNWVKFKAQVQLNKKCS',\
'TTYADFIASGRTGRRNAIHD','KVTMNDFDYLKLLGKGTFGKVILVREKATGRYYAMKILRKEVIIAKDEVAHTVTESRVLQNTRHPFLTALKYAFQTHDRLCFVMEYANGGELFFHLSRERVFTEERARFYGAEIVSA\
LEYLHSRDVVYRDIKLENLMLDKDGHIKITDFGLCKEGISDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHERLFELILMEEIRFPRTLSPEAKSLLAGLLKKDPKQRLGGGPSDAKEVM\
EHRFFLSINWQDVVQKKLLPPFKPQVTSEVDTRYFDDEFTAQSITITPPDRYDSLGLLELDQRTHFPQFDYSASIR',\
'GAMDPKVTMNDFDYLKLLGKGTFGKVILVREKATGRYYAMKILRKEVIIAKDEVAHTVTESRVLQNTRHPFLTALKYAFQTHDRLCFVMEYANGGELFFHLSRERVFTEERARFYGAEIVSALEYLHSRDVVYRDIKLENL\
MLDKDGHIKITDFGLCKEGISDGATMKTFCGTPEYLAPEVLEDNDYGRAVDWWGLGVVMYEMMCGRLPFYNQDHERLFELILMEEIRFPRTLSPEAKSLLAGLLKKDPKQRLGGGPSDAKEVMEHRFFLSINWQDVVQKKLLP\
PFKPQVTSEVDTRYFDDEFTAQSITITPPDRYDSLGLLELDQREEQEMFEDFDYIADW']
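Each sequence above is split across lines with a trailing backslash inside the string literal; Python drops the backslash-newline, so each entry is still one contiguous sequence. A quick sanity check before sending the sequences to a model is to look at their lengths and at any characters outside the 20 standard amino-acid letters. The helper below is a hypothetical sketch; in the Notebook you could run `[check_sequence(p) for p in proteins]`.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def check_sequence(seq):
    """Return (length, set of non-standard characters) for a protein sequence."""
    return len(seq), set(seq) - STANDARD_AMINO_ACIDS

# The backslash-newline inside the literal is removed, giving one 20-residue string.
example = 'TTYADFIASGRTG\
RRNAIHD'
print(check_sequence(example))  # → (20, set())
```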

#### Adding the Molecules

There are several ways to load molecules into the OpenAD molecule working set.

**Using the load molecules command:**

In [None]:
%openad load molecules ?

In [None]:
%openad load molecules from dataframe ?
%openad load molecules from file ?

Today, however, we will simply add them one by one using different identifiers, retrieving the base molecule details from PubChem.

For every molecule in the list, add it to the molecule working set, using `force` to skip the confirmation prompt.

In [None]:

# Add each identifier to the working set; base details are retrieved from PubChem
for i in drug_smiles:
    %openad add molecule {i} force

#### Viewing the Molecules We Have Added
Now we will use the OpenAD molecule viewer to view our molecules.

This view can be enlarged into a separate window, and the molecules can be selected individually.

As you hover your cursor over a molecule's box, some icons appear. Choose the light-bulb icon to the right of the box to select that molecule and display it in 3D along with its properties.

In [None]:
%openad show molecules

We can also list the molecules or export them as a raw DataFrame. Below, we simply list them.

In [None]:
%openadd list mols



### Running our First Inference with Foundation Models for Small Molecules Discovery

Next, we run our first inference using the Foundation Models for Small Molecules Discovery, which support a select number of properties.

Run the following command to view the help, which lists the molecule properties available in the service.

In [None]:
%openad sm get molecule property ?

When running the services, rather than re-specifying our molecules in the command, we can refer to them with `@mols`, a shortcut to the list of molecules in the working set. We have also added the clause `merge with molecules` to merge the results back into our molecule working set.

In [None]:
%openad sm get molecule property [BACE, BBBP, ESOL, FREESOLV, LIPOPHILICITY, QM7, TOX21] for @mols merge with molecules
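Conceptually, `merge with molecules` attaches the new predictions to the existing working-set records for each molecule. The sketch below illustrates that idea with plain dictionaries keyed by SMILES. It is an assumption about the behavior, not OpenAD's implementation; `merge_properties` is a hypothetical name, and the ESOL numbers are dummy placeholders rather than model outputs.

```python
def merge_properties(molecules, predictions):
    """Attach predicted properties to each molecule record, keyed by SMILES.

    molecules: dict mapping SMILES -> dict of existing properties
    predictions: list of (smiles, property_name, value) tuples
    Returns a new dict; predictions for molecules not in the set are ignored.
    """
    merged = {smiles: dict(props) for smiles, props in molecules.items()}
    for smiles, prop, value in predictions:
        if smiles in merged:
            merged[smiles][prop] = value
    return merged

# Dummy values for illustration only; they are not model outputs.
working_set = {"CCO": {"name": "ethanol"}, "c1ccccc1": {"name": "benzene"}}
preds = [("CCO", "ESOL", 1.0), ("c1ccccc1", "ESOL", -2.0), ("CCC", "ESOL", 0.5)]
print(merge_properties(working_set, preds))
```

Copying each record before updating it keeps the input unchanged, which makes the merge easy to test and repeat.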

### Performing Drug Target Interaction Analysis (DTI)

In the following cell, we export the molecules in our working set and, for each molecule's SMILES, retrieve the ESOL property we generated earlier along with its molecular weight. We then apply an arbitrary filter based on the `if` statement below: molecules that fail it are removed from the working set, and each remaining molecule is run through a drug-target interaction (DTI) analysis against the proteins we defined earlier.

In [None]:
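from IPython.display import display, Markdown
import pandas as pd

# Export the current molecule working set as a DataFrame
mols = %openadd export molecules

results = []
for x in mols['canonical_smiles'].to_list():
    # Retrieve stored properties for this molecule from the working set
    molecular_weight = %openadd @{x}>>molecular_weight
    ESOL = %openadd @{x}>>ESOL

    # Arbitrary example filter on molecular weight and predicted ESOL
    if 4.40 < float(molecular_weight) and float(ESOL) > -5.0:
        name = %openadd @'{x}'>>name
        display(Markdown(f"#### Drug Target Interaction Indicator for {name} : {x}"))
        # Run DTI for this molecule against every protein sequence defined earlier
        result = %openadd mammal get protein property dti for {proteins} using (drug_smiles='{x}')
        display(result)  # a bare expression inside a loop is not shown
        results.append(result)
    else:
        # Drop molecules that fail the filter from the working set
        %openad remove molecule {x} Force

# Combine the per-molecule DTI results into a single DataFrame
results = pd.concat(results, ignore_index=True)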
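Pulling the filter out of the loop as a plain function makes the thresholds explicit and lets you check them without calling the model services. The sketch below uses the cell's own arbitrary thresholds; `passes_filter` and `split_by_filter` are hypothetical names, and values are converted with `float()` because `%openadd` may return them as strings.

```python
def passes_filter(molecular_weight, esol, min_weight=4.40, min_esol=-5.0):
    """Keep a molecule only if both values exceed their thresholds."""
    return float(molecular_weight) > min_weight and float(esol) > min_esol

def split_by_filter(records):
    """Split (smiles, molecular_weight, esol) records into kept and removed SMILES lists."""
    kept, removed = [], []
    for smiles, mw, esol in records:
        (kept if passes_filter(mw, esol) else removed).append(smiles)
    return kept, removed
```

Note that `pd.concat` raises a `ValueError` when given an empty list, so if the filter removes every molecule the final line of the cell above will fail; checking `if results:` first avoids that.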


### Now Let's Display the Results as One DataFrame

In [None]:
results

### Now Let's Review Our Molecule Working Set

Notice that the molecules removed by the filter are no longer there.

In [None]:
%openad show molecules

### Visualizing Proteins 

To finish, let's use the protein viewer and predict the solubility of each protein in our list.

In [None]:
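for protein in proteins:
    # Show the sequence in the OpenAD protein viewer
    %openad show protein '{protein}'
    # Predict solubility with MAMMAL; display() is needed to show output inside a loop
    sol = %openadd mammal get protein property sol for '{protein}'
    display(sol)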