# End to End Accelerated Discovery Short Demonstration

<div>
<img src="./media/genai.png" left-align style=" width: 500px; height: 300px"/>
</div>

To set up our services, we first catalog them in the toolkit.

We name the first service 'gen' for the generation services and the second 'prop' for the Property Prediction services.

These two service names become the namespace prefixes for the commands of their respective services.

### Catalog our Model Services:

***-First, let's catalog the generative model functions, which include the PaccMann, REINVENT, TorchDrug and GuacaMol services.***

Run the following from your OpenAD command line or from a notebook with the `%openad` magic:

 `catalog model service from 'git@github.com:acceleratedscience/generation_inference_service.git' as 'gen'`
 
***-Second, let's catalog the Property Prediction services.***

Run the following from your OpenAD command line or from a notebook with the `%openad` magic:

 `catalog model service from 'git@github.com:acceleratedscience/property_inference_service.git' as 'prop'`
 

***To start these two services you can run the following commands:***
 
 `model service up  'gen'`
 
 `model service up  'prop'`


## Working with OpenAD Magic Commands

When using magic commands to access the OpenAD toolkit, you have two options:

1. `%openad` provides a simple user interface that returns styled and formatted objects to the notebook. Tables use the pandas DataFrame Styler object. These can be converted back to DataFrame objects using `.data` on the object, or by using the in-memory assistant, which copies the last result to a file, a dataframe or the data viewer.
  When this is available, you will see `Next up, you can run: result open/edit/copy/display/as dataframe/save [as '<filename.csv>']` in the output.
  
  This magic command is the recommended version to use, as it presents all warnings and results visually.
  
2. `%openadd` is the second form. It returns API-style results in dataframe or list formats that can be used programmatically in functions or flows in your notebook. This is useful for prebuilt notebook process flows.

# Demonstration:

## Generate molecules similar to PFAS molecules with similar solubility, and search for patents that include the generated molecules.

### Step 1: Use Deep Search to identify molecules related to PFAS and download their PubChem collection data

In [None]:
%openad set context ds4sd
df = %openadd search collection 'PubChem' for 'PFOA OR PFOS OR PFHxS OR PFNA OR HFPO-DA'


### Step 2: Load molecules into our OpenAD molecule set

In [None]:
%openad load molecules using dataframe df

### Step 3: From the list of molecules, generate additional properties not available from the Deep Search PubChem collection and update our molecule set using the OpenAD Model Service

In [None]:
# Get the list of unique SMILES strings
a_list = list(set(df['SMILES'].to_list()))

# Define the list of additional properties to be inferred
properties = ['is_scaffold', 'bertz', 'tpsa', 'logp', 'qed', 'plogp', 'penalized_logp', 'lipinski', 'sas', 'esol']

# Generate SMILES properties
properties = %openadd prop get molecule property {properties} for  {a_list} 
%openad merge molecules data using dataframe properties
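
The `{properties}` and `{a_list}` placeholders in the cell above are filled in by IPython before the command reaches OpenAD. As a rough approximation, assuming the expansion behaves like ordinary Python string formatting and using example SMILES strings that are not taken from the search results, the sketch below shows the command text that is produced. It also shows an order-preserving alternative to `list(set(...))`, which may be preferable when reproducible ordering matters.

```python
# Illustrative stand-in for df['SMILES'].to_list(); these SMILES are examples only.
smiles_column = ["OC(=O)C(F)(F)F", "CCO", "OC(=O)C(F)(F)F"]

# dict.fromkeys drops duplicates but, unlike set(), keeps first-seen order,
# so repeated runs send the molecules in the same sequence.
a_list = list(dict.fromkeys(smiles_column))

# Shortened property list, for illustration.
properties = ["logp", "esol"]

# Approximation of the placeholder expansion: each list is rendered with str().
command = "prop get molecule property {properties} for {a_list}".format(
    properties=properties, a_list=a_list
)
print(command)
```

Under these assumptions the command contains both lists in their Python list form, with the `prop` namespace prefix routing it to the Property Prediction service.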

### Let's examine the available molecules

In [None]:
mol_list = %openadd export molecules
%openad show molecules using dataframe mol_list

In [None]:
%openad display molecule 'Perfluorononanoic acid'

### Step 4: For each of the molecules, use Regression Transformer to generate similar molecules with similar solubility

In [None]:
datasets = []
for row in mol_list.to_dict("records"):
    MY_SMILES= row['canonical_smiles']
    esol= float(row['esol'])
    MY_PARAMS = { "fraction_to_mask": 0.1 , "property_goal": { "<esol>": esol} }
    print("Generating Molecules for "+MY_SMILES+" with solubility:"+str(row['esol']) )
    result = %openadd gen generate with RegressionTransformerMolecules data for $MY_SMILES sample 20 using(algorithm_version=solubility  search=sample temperature=1.5 tolerance=60.0 sampling_wrapper = "$MY_PARAMS" )
    display(result)
    datasets.append(result)
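
The loop above assumes every row carries a numeric `esol` value, since `float(row['esol'])` raises an error when the value is missing. A minimal sketch of a guard is shown below. It uses a hypothetical helper, `build_sampling_params`, which is not part of OpenAD, and hand-written rows with made-up values in place of `mol_list.to_dict("records")`.

```python
import math


def build_sampling_params(esol_value, fraction_to_mask=0.1):
    """Hypothetical helper: return the sampling-wrapper dict, or None if esol is unusable."""
    try:
        esol = float(esol_value)
    except (TypeError, ValueError):
        # None or non-numeric text cannot be used as a property goal.
        return None
    if math.isnan(esol):
        return None
    return {"fraction_to_mask": fraction_to_mask, "property_goal": {"<esol>": esol}}


# Made-up rows standing in for mol_list.to_dict("records").
rows = [
    {"canonical_smiles": "CCO", "esol": "-0.2"},
    {"canonical_smiles": "CCN", "esol": None},
]

usable = []
for row in rows:
    params = build_sampling_params(row["esol"])
    if params is None:
        continue  # skip molecules without a solubility value
    usable.append((row["canonical_smiles"], params))
```

Only the entries collected in `usable` would then be passed on to the generation command.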

### Step 5: Now let's use Deep Search to search for patents that may contain some or all of these molecules

In [None]:
from pandas import DataFrame  # required by the isinstance check below
patents_to_search=[]
patented_molecules=[]
for result in datasets:  
    for mol in result['0'].to_list():
        x = %openadd search for patents containing molecule '{mol}'
        if isinstance(x,DataFrame):
            patents_to_search.extend(x["PATENT ID"].to_list())
            patented_molecules.append(mol)

str(patents_to_search)
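
Because `patents_to_search` is extended once per molecule, the same patent ID can appear several times, and the link between a molecule and its patents is lost. One possible way to keep both is sketched below, with plain lists standing in for the `x["PATENT ID"].to_list()` results. The molecule names and patent IDs are invented placeholders.

```python
# Placeholder search results: molecule -> patent IDs returned for it.
search_results = {
    "MOL_A": ["PATENT-1", "PATENT-2"],
    "MOL_B": [],  # no patents found for this molecule
    "MOL_C": ["PATENT-2"],
}

# Keep only molecules with at least one hit, remembering which patents matched.
patents_by_molecule = {}
for mol, patent_ids in search_results.items():
    if patent_ids:
        patents_by_molecule[mol] = patent_ids

patented_molecules = list(patents_by_molecule)

# Collapse repeated patent IDs into a sorted list of unique values.
unique_patents = sorted(
    {pid for ids in patents_by_molecule.values() for pid in ids}
)
```

The dictionary keeps the molecule-to-patent association available for later inspection, while `unique_patents` gives a compact list to search.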

### Step 6: Add Patented Molecules and generate properties for new molecules

In [None]:
properties_all = ['molecular_weight', 'number_of_aromatic_rings', 'number_of_h_acceptors', 'number_of_atoms','number_of_rings', 'number_of_rotatable_bonds', 'number_of_large_rings', 'number_of_heterocycles', 'number_of_stereocenters','is_scaffold', 'bertz', 'tpsa', 'logp', 'qed', 'plogp', 'penalized_logp', 'lipinski', 'sas', 'esol']

new_props = %openadd prop get molecule property {properties_all} for {patented_molecules} 

for x in patented_molecules:
    %openad add molecule {x} Force

%openad merge molecules data using dataframe new_props
%openad enrich molecules with analysis


In [None]:
Full_list = %openadd export molecules
%openad show molecules using dataframe Full_list


In [None]:
%openad set context rxn

In [None]:
%openad predict retrosynthesis  'O=S(=O)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F'

In [None]:
%openad display molecule 'O=S(=O)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F'