# Generation Models

<div>
<img src="./media/genai.png" left-align style=" width: 500px; height: 300px"/>
</div>



To set up our model generation services, we first catalog them in the toolkit under two names: 'gen' for the main services, which use the PyTorch framework, and 'moler' for the generation service that uses the TensorFlow framework.

These two service names will be the namespace prefixes for their respective services.

### Catalog Our Generation Models

***First, let's catalog our set of generative model functions, which includes the PaccMann, Reinvent, TorchDrug and GuacaMol services.***

Run the following from your OpenAD command line, or from a notebook with `%openad`:

 `catalog model service from 'git@github.com:acceleratedscience/generation_inference_service.git' as 'gen'`
 
***Second, let's catalog the MoLeR molecule generation service.***

Run the following from your OpenAD command line, or from a notebook with `%openad`:

 `catalog model service from 'git@github.com:acceleratedscience/moler_inference_service.git' as 'moler'`
 

***To start these two services you can run the following commands:***
 
 `model service up  'gen'`
 
 `model service up  'moler'`
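
The catalog and start-up commands above follow one pattern per service, so they can be generated from a small mapping. The sketch below is only an illustration: `SERVICES` and `setup_commands` are hypothetical names that are not part of OpenAD, and the function only builds the command strings shown above without running them.

```python
# Hypothetical helper (not part of OpenAD): render the catalog and start-up
# commands for each service name and repository used in this notebook.
SERVICES = {
    "gen": "git@github.com:acceleratedscience/generation_inference_service.git",
    "moler": "git@github.com:acceleratedscience/moler_inference_service.git",
}


def setup_commands(services):
    """Return the catalog command followed by the start-up command per service."""
    commands = []
    for name, repo in services.items():
        commands.append(f"catalog model service from '{repo}' as '{name}'")
        commands.append(f"model service up '{name}'")
    return commands


commands = setup_commands(SERVICES)
for line in commands:
    print(line)
```

Each printed line can be pasted into the OpenAD command line, or prefixed with `%openad` in a notebook cell.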


Once the services are cataloged and started with the `model service up` command, we can check their state with the `model service status` command.

In [None]:
%openad model service status

## Loading the Service Functions



Once a service is cataloged, its commands become available after OpenAD is restarted. In a notebook, follow the process below.

To make the newly cataloged service functions visible and usable, you need to restart the toolkit, which in a notebook means restarting the kernel.

To do this, choose the recycle icon in the toolbar. ![image.png](attachment:19d75859-9817-4c6a-9857-1097ba2aa3c4.png)

![image.png](attachment:762c912f-ee34-4845-a299-188ae920e81d.png)

### Working with OpenAD Magic Commands

When using magic commands to access the OpenAD toolkit, you have two options:

1. `%openad` provides a simple user interface that returns styled and formatted objects to the notebook. Tables use the pandas DataFrame Styler object. These can be converted back to DataFrame objects using `.data` on the object, or with the in-memory assistant, which copies the last result to a file, a DataFrame, or the data viewer.
  When this is available you will see `Next up, you can run: result open/edit/copy/display/as dataframe/save [as '<filename.csv>']` in the output.
  
  This magic command is the recommended version to use, as it will display all warnings and results visually.
  
2. `%openadd` is the second form. It returns API-style results in DataFrame or list formats that can be used programmatically in functions or flows in your notebook. This is well suited to prebuilt notebook process flows.

## Let's Get Started

The following commands show the functions enabled for `gen` and `moler`, respectively:

In [None]:
%openad gen ?
%openad moler ?

## Moler Generator functions

These use TensorFlow and, for GPU-sharing purposes, are kept on a separate system.

In [None]:
%openad moler generate with MoLeRDefaultGenerator data  sample 10

In [None]:
%openad moler generate with MolGXQM9Generator data sample 10

## Now let's look at the main set of generation algorithms cataloged under the name 'gen'

In [None]:

print('running OrganGenerator')
%openadd gen generate with OrganGenerator data  for "{'target':''}"  sample 10

print('running VaeGenerator')
%openadd gen generate with VaeGenerator data  for "{'target':''}"  sample 10 

print('running AaeGenerator')
%openadd gen generate with AaeGenerator data  for "{'target':''}"  sample 10 

print('running TorchDrugGraphAF')
%openadd gen generate with TorchDrugGraphAF data  sample 10 using ( algorithm_version=zinc250k_v0 )

print('running TorchDrugGCPN')
%openadd gen generate with TorchDrugGCPN data  sample 10 using ( algorithm_version=zinc250k_v0 )

print('running PaccMannVAEGenerator')
%openadd gen generate with PaccMannVAEGenerator data  sample 10
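
The commands in the cell above share one shape: a generator name, an optional target after `for`, a sample count, and an optional `using` clause. The sketch below assembles that shape from Python values. `build_generate_command` and `RUNS` are hypothetical names introduced here, not OpenAD functions, and the sketch assumes the toolkit accepts the spacing that Python's `str()` produces for a dictionary.

```python
def build_generate_command(generator, sample, target=None, using=None):
    """Assemble a 'gen generate with ...' command string (hypothetical helper)."""
    parts = [f"gen generate with {generator} data"]
    if target is not None:
        # The target is wrapped in double quotes, as in the cell above.
        parts.append(f'for "{target}"')
    parts.append(f"sample {sample}")
    if using:
        options = " ".join(f"{key}={value}" for key, value in using.items())
        parts.append(f"using ( {options} )")
    return " ".join(parts)


# (generator, target, using options) taken from the cell above
RUNS = [
    ("OrganGenerator", {"target": ""}, None),
    ("TorchDrugGCPN", None, {"algorithm_version": "zinc250k_v0"}),
    ("PaccMannVAEGenerator", None, None),
]

commands = [build_generate_command(name, 10, target, using) for name, target, using in RUNS]
for command in commands:
    print(command)
    # In a notebook with OpenAD loaded, each string could then be passed on with
    # IPython's get_ipython().run_line_magic("openadd", command).
```

Building the strings first keeps the generator list in one place, which can help when the same flow is repeated with a different sample size.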


### Regression Transformer

With the Regression Transformer, we can substitute variables carrying masks into the command, as shown below.

In [None]:
mask = '<esol>-3.53|[Br][C][=C][C][MASK][MASK][=C][C][=C][C][=C][Ring1][MASK][MASK][Branch2_3][Ring1][Branch1_2]'
%openadd gen generate with RegressionTransformerMolecules data for '{mask}' Sample 5 \
USING (algorithm_version=solubility search=sample temperature=1.4 tolerance=5.0 ) 
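
The `mask` string above packs three things into one value: a property token (`<esol>`), a target value, and a bracketed token sequence in which `[MASK]` marks the positions to be filled. The sketch below splits such a string apart so it can be checked before it is sent to the service. `parse_rt_query` is a hypothetical helper, and the format it expects is inferred only from the example in this notebook.

```python
import re


def parse_rt_query(query):
    """Split '<property>value|tokens' into (property, value, list of tokens)."""
    match = re.fullmatch(r"<(\w+)>(-?\d+(?:\.\d+)?)\|(.+)", query)
    if match is None:
        raise ValueError(f"unexpected query format: {query!r}")
    name, value, body = match.groups()
    # Every token is written in square brackets, e.g. [Br], [=C] or [MASK].
    tokens = re.findall(r"\[[^\]]+\]", body)
    return name, float(value), tokens


mask = '<esol>-3.53|[Br][C][=C][C][MASK][MASK][=C][C][=C][C][=C][Ring1][MASK][MASK][Branch2_3][Ring1][Branch1_2]'
name, value, tokens = parse_rt_query(mask)
n_masked = tokens.count("[MASK]")
print(name, value, len(tokens), n_masked)
```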

When substituting dictionaries into the command, we need to use the alternative `$` substitution method because of limits in notebook parsing.

In [None]:
MY_SMILES= 'C12C=CC=NN1C(C#CC1=C(C)C=CC3C(NC4=CC(C(F)(F)F)=CC=C4)=NOC1=3)=CN=2'
MY_PARAMS = { "fraction_to_mask": 0.1, "property_goal": { "<esol>": 0.234 } }
%openadd gen generate with RegressionTransformerMolecules data for $MY_SMILES sample 5 \
using(algorithm_version=solubility  search=sample temperature=1.5 tolerance=60.0 sampling_wrapper = "$MY_PARAMS" )
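
One way to see why dictionaries call for the `$` form is an analogy with Python's own string tools. This is not OpenAD's or IPython's actual parsing code. With brace-style substitution, the braces of a dictionary literal are read as placeholders, so the expansion fails, while `$NAME` placeholders leave braces alone. The command fragments below are shortened versions of the ones in this notebook.

```python
from string import Template

MY_PARAMS = {"fraction_to_mask": 0.1, "property_goal": {"<esol>": 0.234}}

# Brace-style substitution: the dictionary literal's own braces are parsed as
# a placeholder named 'target', which does not exist, so formatting fails.
brace_command = "for \"{'target': ''}\" using (temperature={temperature})"
try:
    brace_command.format(temperature=1.5)
    brace_failed = False
except (KeyError, ValueError, IndexError):
    brace_failed = True

# Dollar-style substitution: only $NAME is a placeholder, so the braces of the
# substituted dictionary pass through untouched.
dollar_command = 'using (temperature=1.5 sampling_wrapper="$MY_PARAMS")'
expanded = Template(dollar_command).substitute(MY_PARAMS=MY_PARAMS)

print(brace_failed)
print(expanded)
```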

In [None]:
target = 'GSQEVNSNASPEEAEIARKAGATTWTEKGNKWEIRI'
target=f"<stab>[MASK][MASK][MASK][MASK][MASK]|{target}"
%openad gen generate with RegressionTransformerProteins data for '{target}' sample 1 \
using ( algorithm_version=stability search=greedy )
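
The protein query above is a property token, a run of `[MASK]` tokens standing in for the value to predict, a `|` separator, and the sequence. A minimal sketch of building that string is shown below. `build_rt_protein_query` is a hypothetical helper, and the default of five masks is copied from the example rather than taken from any documented requirement.

```python
def build_rt_protein_query(sequence, property_token="<stab>", n_masks=5):
    """Return '<property>[MASK]...[MASK]|sequence' for a protein query."""
    if n_masks < 1:
        raise ValueError("at least one [MASK] token is needed")
    return f"{property_token}{'[MASK]' * n_masks}|{sequence}"


target = build_rt_protein_query("GSQEVNSNASPEEAEIARKAGATTWTEKGNKWEIRI")
print(target)
```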

In [None]:
string = """
MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLASWNYNTNITEENVQ
NMNNAGDKWSAFLKEQSTLAQMYPLQEIQNLTVKLQLQALQQNGSSVLSEDKSKRLNTIL
NTMSTIYSTGKVCNPDNPQECLLLEPGLNEIMANSLDYNERLWAWESWRSEVGKQLRPLY
EEYVVLKNEMARANHYEDYGDYWRGDYEVNGVDGYDYSRGQLIEDVEHTFEEIKPLYEHL
HAYVRAKLMNAYPSYISPIGCLPAHLLGDMWGRFWTNLYSLTVPFGQKPNIDVTDAMVDQ
AWDAQRIFKEAEKFFVSVGLPNMTQGFWENSMLTDPGNVQKAVCHPTAWDLGKGDFRILM
CTKVTMDDFLTAHHEMGHIQYDMAYAAQPFLLRNGANEGFHEAVGEIMSLSAATPKHLKS
IGLLSPDFQEDNETEINFLLKQALTIVGTLPFTYMLEKWRWMVFKGEIPKDQWMKKWWEM
KREIVGVVEPVPHDETYCDPASLFHVSNDYSFIRYYTRTLYQFQFQEALCQAAKHEGPLH
KCDISNSTEAGQKLFNMLRLGKSEPWTLALENVVGAKNMNVRPLLNYFEPLFTWLKDQNK
NSFVGWSTDWSPYADQSIKVRISLKSALGDKAYEWNDNEMYLFRSSVAYAMRQYFLKVKN
QMILFGEEDVRVANLKPRISFNFFVTAPKNVSDIIPRTEVEKAIRMSRSRINDAFRLNDN
SLEFLGIQPTLGPPNQPPVSIWLIVFGVVMGVIVVGIVILIFTGIRDRKKKNKARSGENP
YASIDISKGENNPGFQNTDDVQTSF
"""
# Keep only the letters, dropping the newlines of the multi-line string.
target_protein = "".join(filter(str.isalpha, string))
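
The cell above drops every non-letter character silently. As a stricter variation, the sketch below removes only whitespace and raises an error if anything other than the 20 standard amino-acid letters remains. `clean_protein_sequence` is a hypothetical helper, and rejecting non-standard letters is an assumption made here, not a documented requirement of the service.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein_sequence(raw):
    """Join a multi-line sequence and reject unexpected characters."""
    sequence = "".join(raw.split()).upper()
    unexpected = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return sequence


# Shortened stand-in for the multi-line sequence above.
raw = """
MSSSSWLLLS
LVAVTAAQST
"""
cleaned = clean_protein_sequence(raw)
print(cleaned)
```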


In [None]:
%openad gen generate with PaccMannRLProteinBasedGenerator data  for '{target_protein}' sample 20

### Other Options for Generation and Sampling

Other generation functions are demonstrated below. To understand the parameters, including the required ones, for both the target and the `USING` clause of a generator function, use the online help.

In [None]:
%openad gen generate with SMILESLSTMPPOGenerator ?

In [None]:
%openad gen generate with PaccMannRLProteinBasedGenerator ?

In [None]:
%openadd gen generate with SMILESLSTMPPOGenerator data for "{'isomer_scorer': {'target': 5.0, 'target_smile': 'NCCCCC'}}" sample 20 using (num_epochs=2 episode_size=10 optimize_batch_size=2)

In [None]:
%openadd gen generate with CatalystGenerator data for 1 sample 20

In [None]:
%openadd gen generate with CatalystGenerator data for 10 sample 20 using(number_of_points=32 number_of_steps=50 generated_length=100)

In [None]:
%openadd gen generate with SMILESLSTMHCGenerator data  for  "{'isomer_scorer': {'target': 5.0, 'target_smile': 'NCCCCC'}}"  \
sample 20 using( mols_to_sample=100 max_len=2 optimize_batch_size=3 n_epochs=2 random_start=True)

In [None]:
%openad gen generate with KeyBERTGenerator data for 'samples_per_protein: number of points sampled per protein.It has to be greater than 1. protein_embedding_encoder_params: parameter for the protein embedding encoder.mprotein_embedding_encoder: protein embedding encoder.'