In [1]:
%reload_ext openad.notebooks.styles

<!-- Header banner -->
<div class="banner"><div>End to End Discovery</div><b>OpenAD <span>Tutorial</span></b></div>

# End to End Accelerated Discovery Short Demonstration

To set up our services, we will first catalog them in our toolkit.

We will catalog the generation services under the name 'gen' and the property prediction services under the name 'prop'.

These two names become the namespace prefixes for their respective services. For example, property prediction requests are issued as `prop get molecule property ...`.
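To make the prefix idea concrete, here is a toy Python sketch of how a leading namespace word can select a catalogued service. The `SERVICES` registry and the `route` function are hypothetical and only illustrate the idea; they are not how OpenAD is implemented.

```python
from typing import Tuple

# Hypothetical registry: catalog name -> service source (illustration only).
SERVICES = {
    "gen": "git@github.com:acceleratedscience/generation_inference_service.git",
    "prop": "git@github.com:acceleratedscience/property_inference_service.git",
}


def route(command: str) -> Tuple[str, str]:
    """Split a command into (namespace, rest) and check the namespace is catalogued."""
    namespace, _, rest = command.strip().partition(" ")
    if namespace not in SERVICES:
        raise KeyError(f"no service catalogued as {namespace!r}")
    return namespace, rest


print(route("prop get molecule property esol for CCO"))
# prints ('prop', 'get molecule property esol for CCO')
```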

### Catalog our Model Services:

***- First, let's catalog our set of generative model functions, which include the PaccMann, REINVENT, TorchDrug and GuacaMol services.***

Run the following from your OpenAD command line, or from a notebook using `%openad`:

 `catalog model service from 'git@github.com:acceleratedscience/generation_inference_service.git' as 'gen'`
 
***- Second, let's catalog the property prediction services.***

Run the following from your OpenAD command line, or from a notebook using `%openad`:

 `catalog model service from 'git@github.com:acceleratedscience/property_inference_service.git' as 'prop'`
 

***To start these two services you can run the following commands:***
 
 `model service up 'gen'`

 `model service up 'prop'`


## Working with OpenAD Magic Commands

When using magic commands to access the OpenAD toolkit, you have two options:

1. `%openad` provides a simple user interface that returns styled, formatted objects to the notebook. Tables are returned as pandas DataFrame `Styler` objects. You can convert these back to DataFrames with the object's `.data` attribute, or with the in-memory assistant, which copies the last result to a file, a DataFrame or the data viewer.
  When this is available, you will see `Next up, you can run: result open/edit/copy/display/as dataframe/save [as '<filename.csv>']` in the output.

  This is the recommended magic command, as it displays all warnings and results visually.

2. `%openadd` returns API-style results as DataFrames or lists that can be used programmatically by functions or flows in your notebook. This is well suited to prebuilt notebook process flows.

# Demonstration:

## Generate molecules similar to PFAS molecules with similar solubility, and search for patents that include the generated molecules

### Step 1: Use Deep Search to identify molecules related to PFAS and download their PubChem collection data

In [14]:
df = %openadd ds search collection 'PubChem' for 'PFOA OR PFOS OR PFHxS OR PFNA OR HFPO-DA'
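The query above is a plain boolean OR over PFAS acronyms. If you prefer to maintain the list of search terms in Python, a small helper such as the hypothetical `build_or_query` below can assemble the same string; it is a sketch, not part of OpenAD.

```python
def build_or_query(terms):
    """Join search terms with ' OR ', dropping blanks and duplicates but keeping order."""
    cleaned = [term.strip() for term in terms if term.strip()]
    # dict.fromkeys removes duplicates while preserving first-seen order
    return " OR ".join(dict.fromkeys(cleaned))


query = build_or_query(["PFOA", "PFOS", "PFHxS", "PFNA", "HFPO-DA"])
print(query)  # PFOA OR PFOS OR PFHxS OR PFNA OR HFPO-DA
```

Assuming the same `{...}` variable expansion that later cells in this notebook rely on, you could then pass it as `%openadd ds search collection 'PubChem' for '{query}'`.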

### Step 2: Load the molecules into our OpenAD molecule set

In [15]:
%openad load molecules using dataframe df

<span style="color: #090">Successfully loaded <span style="color: #dc0">6</span> molecules into the working set</span> <br> 


### Step 3: Generate additional properties not available in the Deep Search PubChem collection and update our molecule set using the OpenAD model service

In [18]:
# Get the unique SMILES strings from the Deep Search results
a_list = list(set(df["SMILES"].to_list()))

# Properties to infer that are not included in the PubChem collection data
properties = ["is_scaffold", "bertz", "tpsa", "logp", "qed", "plogp", "penalized_logp", "lipinski", "sas", "esol"]

# Predict the properties with the 'prop' service, then merge them into the working set
prop_df = %openadd prop get molecule property {properties} for {a_list}
%openad merge molecules data using dataframe prop_df


<span style="color: #090">Data merged into your working set</span> <br> 
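Conceptually, the merge can be thought of as attaching the predicted property columns to the molecules already in the working set by matching SMILES. The sketch below models that with plain dictionaries. The function name `merge_properties` and the choice to ignore rows for molecules that are not already loaded are assumptions for illustration, not OpenAD's documented behavior; the PFNA name and ESOL value come from this notebook.

```python
PFNA = "O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F"


def merge_properties(working_set, rows, key="smiles"):
    """Merge predicted property rows into a working set keyed by SMILES.

    working_set: dict mapping SMILES -> dict of known properties (updated in place)
    rows: iterable of dicts, each holding `key` plus property columns
    Returns the number of rows that matched a molecule already in the set.
    """
    matched = 0
    for row in rows:
        smiles = row[key]
        if smiles not in working_set:
            continue  # assumption: rows for molecules not yet loaded are ignored
        working_set[smiles].update({k: v for k, v in row.items() if k != key})
        matched += 1
    return matched


working_set = {PFNA: {"name": "Perfluorononanoic acid"}}
n = merge_properties(working_set, [{"smiles": PFNA, "esol": -6.5519409880260815}])
print(n, working_set[PFNA])
```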


### Let's Examine the available Molecules

In [19]:
%openad show molecules

In [20]:
%openad display molecule 'Perfluorononanoic acid'


### Step 4: For each molecule, use the Regression Transformer to generate similar molecules with similar solubility
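The generation loop below passes a `sampling_wrapper` for each seed molecule. As we understand the Regression Transformer, `fraction_to_mask` is the share of the seed SMILES that is masked and regenerated, and `property_goal` sets the target `<esol>` value, here the seed's own solubility. The hypothetical helper below only builds that dictionary and checks the mask fraction; the `(0, 1]` range check is our assumption, not a documented constraint.

```python
def build_sampling_wrapper(target_esol, fraction_to_mask=0.1):
    """Build the sampling_wrapper dict for one seed molecule.

    Assumption: fraction_to_mask must lie in (0, 1]; this check is ours.
    """
    if not 0.0 < fraction_to_mask <= 1.0:
        raise ValueError("fraction_to_mask must be in (0, 1]")
    return {
        "fraction_to_mask": fraction_to_mask,
        "property_goal": {"<esol>": float(target_esol)},
    }


# Seed ESOL value for PFNA, as printed by the generation loop
print(build_sampling_wrapper(-6.5519409880260815))
```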

In [21]:
# Export the current working set (including the merged properties) as a DataFrame
mol_list = %openadd export molecules
datasets = []
for row in mol_list.to_dict("records"):
    MY_SMILES = row["canonical_smiles"]
    esol = float(row["esol"])
    # Use the seed's own ESOL value as the property goal for generation
    MY_PARAMS = {"fraction_to_mask": 0.1, "property_goal": {"<esol>": esol}}
    print("Generating Molecules for " + MY_SMILES + " with soluability:" + str(row["esol"]))
    # Sample 10 candidates from the Regression Transformer solubility model
    result = %openadd gen generate with RegressionTransformerMolecules data for $MY_SMILES sample 10 using(algorithm_version=solubility  search=sample temperature=1.5 tolerance=60.0 sampling_wrapper = "$MY_PARAMS" )
    display(result)
    datasets.append(result)

Generating Molecules for O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F with soluability:-6.5519409880260815



| | 0 | 1 |
|---|---|---|
| 0 | FC(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 1 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(O)(F)F | `<esol>-6.852` |
| 2 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(N)(F)F | `<esol>-6.852` |
| 3 | N=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(Cl)(F)C(F)(F)F | `<esol>-6.852` |
| 4 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)Cl | `<esol>-6.852` |
| 5 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(O)(F)C(F)(F)F | `<esol>-6.852` |
| 6 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)N | `<esol>-6.852` |
| 7 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)CF | `<esol>-6.852` |
| 8 | O=C(O)C(F)(F)C(F)(F)P(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 9 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(Br)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |


Generating Molecules for O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F with soluability:-5.7534665357831125



| | 0 | 1 |
|---|---|---|
| 0 | OC(O)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 1 | ON(O)C(F)(F)C(F)(F)C(F)(C)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 2 | OC(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 3 | ON(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 4 | OP(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 5 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)I | `<esol>-6.852` |
| 6 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(O)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 7 | O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 8 | O=C(O)C(F)(F)C(F)(F)P(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 9 | O=C(O)S(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.5096` |


Generating Molecules for O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(F)F with soluability:-4.211602675454153



| | 0 | 1 |
|---|---|---|
| 0 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(F)C | `<esol>-5.4819` |
| 1 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(F)Cl | `<esol>-5.4681` |
| 2 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(O)(F)F | `<esol>-5.4681` |
| 3 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(F)Br | `<esol>-5.4681` |
| 4 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(N)(F)F | `<esol>-5.4681` |
| 5 | OC(O)C(F)(OC(F)(F)C(F)(F)C(F)(O)F)C(F)(F)F | `<esol>-5.4681` |
| 6 | OC(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(F)F | `<esol>-5.4681` |
| 7 | O=C(O)C(F)(OC(F)(Cl)C(F)(F)C(F)(F)F)C(F)(F)F | `<esol>-5.4681` |
| 8 | O=C(O)C(F)(OC(F)(O)C(F)(F)C(F)(F)F)C(F)(F)F | `<esol>-5.4681` |
| 9 | O=C(O)C(F)(OC(F)(F)C(F)(F)C(F)(F)F)C(F)(Cl)F | `<esol>-5.4681` |


Generating Molecules for O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F with soluability:-6.612922026752669



| | 0 | 1 |
|---|---|---|
| 0 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(C)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 1 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 2 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 3 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(N)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.852` |
| 4 | O=S(=O)(O)C(F)(F)C(F)(F)C(Cl)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 5 | N=S(=O)(O)C(F)(F)C(F)(F)C(Cl)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 6 | C=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 7 | S=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 8 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 9 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(O)F | `<esol>-6.852` |


Generating Molecules for O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F with soluability:-5.0159731222667325



| | 0 | 1 |
|---|---|---|
| 0 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(O)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 1 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(C)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 2 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)C(CF)F | `<esol>-5.4681` |
| 3 | O=S(=O)(O)C(F)(F)C(F)C(F)(O)C(Cl)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 4 | OS(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 5 | OC(=O)COC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 6 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F) | `<esol>-5.4681` |
| 7 | O=C(O)COC(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 8 | O=S(=O)(O)C(F)(F)C(Cl)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 9 | O=S(=O)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)Cl | `<esol>-5.4681` |


Generating Molecules for O=S(=O)([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F with soluability:-4.755208830342681



| | 0 | 1 |
|---|---|---|
| 0 | O=S(=O)([O-])C(F)CC[NH]C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.5096` |
| 1 | O=S(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.5096` |
| 2 | N=S(=S)([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 3 | O=S(=O)([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 4 | O=S(=S)([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(O)(F)C(F)(F)F | `<esol>-5.4681` |
| 5 | O=S(=S)([O-])C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-6.8935` |
| 6 | O=S(=O)([O-])C(F)(F)C(Cl)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.5096` |
| 7 | O=S(=O)([O-])C(F)(F)C(C)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.5096` |
| 8 | O=S(=O)([O-])C(F)(F)C(F)(F)C(O)(F)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
| 9 | O=S(=O)([O-])C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)F | `<esol>-5.4681` |
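Each generated row pairs a SMILES string with a property string such as `<esol>-6.8935`. To check how close each candidate is to its seed's solubility target, you can parse that string and rank the candidates by absolute difference. The sketch below does this with the standard library only; `parse_property` and `rank_by_target` are hypothetical helpers, and the sample rows are copied from the second table above (seed ESOL -5.7534665357831125).

```python
import re

# Matches strings like "<esol>-6.8935": a property name in angle brackets, then a number.
_PROPERTY_RE = re.compile(r"^<(?P<name>[^>]+)>(?P<value>[-+]?\d+(?:\.\d+)?)$")


def parse_property(token):
    """Split a property string such as '<esol>-6.8935' into (name, value)."""
    match = _PROPERTY_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised property string: {token!r}")
    return match["name"], float(match["value"])


def rank_by_target(rows, target):
    """Sort (smiles, property_string) pairs by |value - target|, closest first."""
    scored = []
    for smiles, prop in rows:
        _, value = parse_property(prop)
        scored.append((abs(value - target), smiles, value))
    return sorted(scored)


rows = [
    ("OC(O)C(F)(F)C(F)(F)C(F)(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F", "<esol>-5.4681"),
    ("ON(O)C(F)(F)C(F)(F)C(F)(C)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F", "<esol>-6.852"),
    ("O=C(O)S(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F", "<esol>-5.5096"),
]
for distance, smiles, value in rank_by_target(rows, target=-5.7534665357831125):
    print(f"{value:8.4f}  (off by {distance:.4f})  {smiles}")
```

In the notebook, the pairs could come from `zip(result["0"], result["1"])`, assuming the second column is named `"1"` in the same way Step 5 reads the first column as `"0"`.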


### Step 5: Now let's use Deep Search to search for patents that may contain some or all of these molecules

In [22]:
from pandas import DataFrame

patents_to_search = []
patented_molecules = []
for result in datasets:
    # Column "0" of each generated table holds the SMILES strings
    for mol in result["0"].to_list():
        hits = %openadd search for patents containing molecule '{mol}'
        # Only record molecules whose search result comes back as a DataFrame
        if isinstance(hits, DataFrame):
            patents_to_search.extend(hits["PATENT ID"].to_list())
            patented_molecules.append(mol)

str(patents_to_search)

'[]'
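The loop above flattens every hit into a single list, so the same patent ID can appear several times, and a SMILES generated from two different seeds can be recorded twice. If you want to keep track of which molecule matched which patent, a plain dict is enough. In the sketch below, `collect_patents` and its input `hits_by_molecule` (SMILES mapped to the list of patent IDs found for it) are hypothetical, and the IDs are placeholders.

```python
def collect_patents(hits_by_molecule):
    """Summarise patent hits per molecule.

    hits_by_molecule: dict mapping SMILES -> list of patent IDs (empty if none)
    Returns (unique patent IDs in first-seen order, molecules with at least one hit).
    """
    unique_ids = {}  # dict keys keep insertion order and drop duplicates
    patented = []
    for smiles, ids in hits_by_molecule.items():
        if ids:
            patented.append(smiles)
            for patent_id in ids:
                unique_ids.setdefault(patent_id, None)
    return list(unique_ids), patented


# Placeholder IDs for illustration only; they are not real patents.
example = {"mol-A": ["EXAMPLE-1", "EXAMPLE-2"], "mol-B": [], "mol-C": ["EXAMPLE-2"]}
print(collect_patents(example))
```

In the notebook you would fill each entry with the `"PATENT ID"` column of that molecule's search result, or an empty list when nothing is found.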

### Step 6: Add the patented molecules and generate properties for the new molecules

In [23]:
properties_all = [
    "molecular_weight",
    "number_of_aromatic_rings",
    "number_of_h_acceptors",
    "number_of_atoms",
    "number_of_rings",
    "number_of_rotatable_bonds",
    "number_of_large_rings",
    "number_of_heterocycles",
    "number_of_stereocenters",
    "is_scaffold",
    "bertz",
    "tpsa",
    "logp",
    "qed",
    "plogp",
    "penalized_logp",
    "lipinski",
    "sas",
    "esol",
]

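# Predict properties for the patented molecules with the 'prop' service
new_props = %openadd prop get molecule property {properties_all} for {patented_molecules}

# Add each patented molecule to the working set
for mol in patented_molecules:
    %openad add molecule {mol} Force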

%openad merge molecules data using dataframe new_props
%openad enrich molecules with analysis

<span style="color: #d00">Unknown error</span> <br> 
<span style="color: #ccc">'new_props'</span> <br> 


<span style="color: #090">1/6 molecules in your working set have been enriched with the latest analysis results</span> <br> 
<span style="color: #ccc">Run `show mols` to view the updated working set</span> <br> 
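In the recorded run, Step 5 found no patents, so `patented_molecules` was empty when this cell ran, and the merge reported an error. Before calling the property service, it can help to deduplicate the candidates and to skip the call when nothing is left. The helper `molecules_to_score` below is hypothetical; in the notebook, the `get molecule property` and `add molecule` calls would go in the non-empty branch.

```python
def molecules_to_score(candidates):
    """Return candidate SMILES with blanks and duplicates removed, keeping order."""
    return list(dict.fromkeys(s.strip() for s in candidates if s and s.strip()))


batch = molecules_to_score([])  # e.g. the empty list of patented molecules in this run
if not batch:
    print("No patented molecules to score; skipping property prediction.")
```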


In [24]:

%openad show molecules 


### Step 7: Run retrosynthesis prediction on one of the molecules, then display the molecule and what we now know about it

In [25]:
%openad rxn predict retrosynthesis 'O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F'
%openad enrich molecules with analysis


<span style="color: #d00">No matching analysis results found for any of the molecules in your working set</span> <br> 


In [26]:
%openad display molecule 'O=C(O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F'