# Supplementary Python code/demo for Ligand-Based Design class
## Ligand-based design and shape overlaps
### PharmSci 175/275

## Let's start with a reminder of how to do a simple shape overlay

In our 3D Structure/Shape lecture, early in this course, we already saw a simple example of shape overlays - we overlaid morphine onto tramadol. Let's repeat that here. Here's the preparation we did:

In [None]:
# Run cell if using Colab

# Install the condacolab Python library, then use it to set up conda (~5 minutes).
# The runtime restarts during setup, which Colab reports as a crash; rerun this cell afterwards
!pip install --target=$nb_path -q condacolab
import condacolab
condacolab.install()

#check condacolab to ensure that it works
condacolab.check()

# install packages
!conda install -c openeye openeye-toolkits --yes
!pip install --extra-index-url https://pypi.org/simple --extra-index-url https://pypi.anaconda.org/openeye/simple/ -i https://pypi.anaconda.org/openeye/label/oenotebook/simple openeye-oenotebook

In [2]:
# Run cell if using Colab

# Mount google drive to Colab Notebooks to access files
from google.colab import drive
drive.mount('/content/drive', force_remount = True)

Mounted at /content/drive


In [3]:
%%shell
# Run cell if using Colab (the %%shell magic must be the first line of the cell)

# Link OpenEye license to .bash_profile
echo export OE_LICENSE="/content/drive/MyDrive/drug-computing/oelicense/oe_license.txt" >> ~/.bash_profile
source ~/.bash_profile



In [None]:
# Run cell if using Colab

# Set the OE_LICENSE environment variable to point to the license file
%env OE_LICENSE=/content/drive/MyDrive/drug-computing/oelicense/oe_license.txt
# List all environment variables to check that OE_LICENSE is set
%env

In [1]:
#Import what we need
from openeye.oechem import * #General chemistry toolkit
from openeye.oeomega import * #Conformation toolkit
from openeye.oeiupac import * #Naming toolkit
from openeye.oeshape import * #Shape toolkit

#Let's first generate morphine and tramadol from their names
morphine = OEMol()
tramadol = OEMol()
OEParseIUPACName(morphine, 'morphine')
#OpenEye's toolkit won't recognize tramadol, so we'll use its IUPAC name instead
OEParseIUPACName(tramadol, '2-[(Dimethylamino)methyl]-1-(3-methoxyphenyl)cyclohexanol')

#We make sure Omega is loaded and initialized
#This time we want to consider multiple conformations, since the OpenEye shape toolkit will
#just do a rigid overlay of whatever conformations we give it onto one another. But here,
#at least one of the molecules has substantial flexibility, and we want to find the
#conformation which results in the best match

#Initialize class
omega = OEOmega() 
#Here we want to use more conformers if needed
omega.SetMaxConfs(100) 
#Set to false to pick random stereoisomer if stereochemistry is not specified
omega.SetStrictStereo(False) 
#Be a little loose about atom typing to ensure parameters are available to omega for all
#molecules
omega.SetStrictAtomTypes(False) 
#In this case the 'StrictStereo' setting above also matters. The tramadol name we parsed
#leaves its stereocenters unspecified, so Omega will pick a random stereoisomer. With
#StrictStereo left on, Omega would refuse to generate conformers for it.

#Now let's generate 3D conformations for morphine and label it our 'reference' molecule:
refmol = morphine
omega(refmol)

#Now we generate 3D conformations for tramadol and label it our 'fitmol', 
#the molecule to be fit onto the reference
fitmol = tramadol
omega(fitmol)

True

### Instead of directly performing a shape overlay like last time, let's make a shape overlay function we can reuse later.

In [2]:
def FitMolToReference( fitmol, refmol, outfile = None, ShapeColor = True):
    """Takes two (multi-conformer) OpenEye molecules, and fits the first molecule onto
the second molecule. Normally the fitted molecule at least should be multi-conformer. 
The reference molecule can be multi-conformer or not, as desired
(this will typically depend on whether the active conformation is known).

INPUT:
  - fitmol: The molecule to be fitted (multi-conformer)
  - refmol: The molecule to fit onto (multi-conformer if desired)
  - outfile (optional): File name to write output molecular structure(s) of the fitted 
      molecule. Default is None. If not provided, no output is written.
  - ShapeColor (optional): Optionally make this a "shape plus color" search rather than
      just a shape search. Default is True. Specify False if desired.

OUTPUT:
  - tanimotos: Similarity scores, one per ROCS hit (list). Each runs from 0 to 1 if 
      pure shape is used, and 0 to 2 if shape+color is used.
  - outmol: Fitted OpenEye molecule (overlaid conformers of the last hit)
  - res: ROCS result object for the last hit
"""
  
    # Setup ROCS to provide specified number of conformers per hit
    options = OEROCSOptions()
    options.SetNumBestHits(10)
    options.SetConfsPerHit(100)
    
    #Adjust overlay options to not use color in addition to shape, if desired
    if not ShapeColor:
        ovOpts = OEOverlayOptions()
        ovOpts.SetOverlapFunc(OEGridShapeFunc())
        options.SetOverlayOptions(ovOpts)
    
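    # Open an output stream only if a file name was given
    outfs = oemolostream(outfile) if outfile is not None else None
    
    rocs = OEROCS(options)
    rocs.AddMolecule(fitmol) #Add our molecule as the one we are fitting

    # Loop over results and output
    tanimotos = []
    outmol, res = None, None #These stay None if the overlay returns no hits
    for res in rocs.Overlay(refmol):
        outmol = res.GetOverlayConfs() #Use GetOverlayConf to get just the best; GetOverlayConfs for all
        OERemoveColorAtoms(outmol)
        OEAddExplicitHydrogens(outmol)
        if outfs is not None:
            OEWriteMolecule(outfs, outmol)
        if ShapeColor:
            label, score = 'tanimoto combo', res.GetTanimotoCombo()
        else:
            label, score = 'shape tanimoto', res.GetShapeTanimoto()
        print("title: %s  %s = %.2f" % (outmol.GetTitle(), label, score))
        tanimotos.append(score)
    # Close the file once, after all hits have been written
    if outfs is not None:
        outfs.close()
    return tanimotos, outmol, res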

### Here's a bit of code which applies this as we did previously

In [3]:
scores, outmol, res = FitMolToReference(fitmol, refmol, ShapeColor = True)
for score in scores:
    print('%.2f' % score)

title: _15  tanimoto combo = 0.94
0.94




### You might want to try and see what happens if you set ShapeColor to False
It might be worth looking at how this changes the scores you get out, as well as how it would affect the structure of the overlaid molecule (you can pass an optional argument to the function in order to get this written out).
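
To get a feel for where these score ranges come from, here is a minimal sketch in plain Python. It is not the OpenEye implementation: it assumes every atom is a spherical Gaussian of the same width, keeps only the pairwise (first-order) overlap terms, and uses made-up coordinates. The names `gaussian_overlap`, `tanimoto`, `overlap_tanimoto`, and `alpha` are illustrative only. The Tanimoto is the overlap divided by the sum of the two self-overlaps minus the overlap, so it equals 1 for identical shapes. A shape+color score adds a second term of the same form for chemical feature ("color") positions, which is why it can reach 2.

```python
import math

def gaussian_overlap(centers_a, centers_b, alpha=0.8):
    """Sum of pairwise overlap integrals of equal-width spherical Gaussians.

    Assumption: unit amplitudes and a single width parameter alpha for all atoms.
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in centers_a:
        for b in centers_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-alpha * d2 / 2.0)
    return total

def tanimoto(o_ab, o_aa, o_bb):
    """Tanimoto from the cross-overlap and the two self-overlaps."""
    return o_ab / (o_aa + o_bb - o_ab)

def overlap_tanimoto(a, b):
    return tanimoto(gaussian_overlap(a, b),
                    gaussian_overlap(a, a),
                    gaussian_overlap(b, b))

# Made-up "atom" and "color atom" positions (Angstroms), already aligned
ref_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
fit_atoms = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0), (3.3, 0.5, 0.0)]
ref_color = [(0.0, 0.0, 0.0)]
fit_color = [(0.3, 0.0, 0.0)]

shape_t = overlap_tanimoto(ref_atoms, fit_atoms)   # between 0 and 1
color_t = overlap_tanimoto(ref_color, fit_color)   # between 0 and 1
combo = shape_t + color_t                          # between 0 and 2
print("shape = %.2f, color = %.2f, combo = %.2f" % (shape_t, color_t, combo))
```

Moving `fit_atoms` further from `ref_atoms` lowers the shape term, while moving only `fit_color` changes the combined score without touching the shape score.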

### You might also want to try a couple of other shape (or shape+color) overlays
For example, what happens if you overlay isobutylbenzene onto catechol, or two molecules of your choice?

## Now let's revisit the sandbox we examined in the library searching class
### When we looked at library searching, we did a simple Lingo search to compute the similarity of a few molecules. 
Here's what that looked like:

In [4]:
#Initialize our query molecule
mol1 = OEMol()
queryname = 'benzoic acid'
OEParseIUPACName(mol1, queryname)

#Set up our lingo search based on the query
lingo = OELingoSim(mol1)

#Specify a cutoff we'll use for filtering
cutoff = 0.3

#Specify our database - what compounds do we want to look at?
names = ['phenol', 'toluene', 'benzene', 'naphthalene', 'ibuprofen', 'naproxen',
         'acetic acid', 'ammonia']

#Loop over our "database" and do our lingo comparison/search
for name in names:
    #Initialize this molecule
    mol2 = OEMol()
    OEParseIUPACName(mol2, name)
    #Do our lingo comparison
    sim = lingo.Similarity(mol2)
    
    #Check and see if it is a match; if so, do something
    if sim > cutoff:
        print("Similarity of %s to %s is %.2f" % (queryname, name, sim))
        #More generally, you could dump image files of all molecules matching,
        #or write them out to a file, or...

Similarity of benzoic acid to phenol is 0.50
Similarity of benzoic acid to ibuprofen is 0.57
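
As a reminder of what a Lingo-style comparison does, here is a simplified sketch in plain Python. It is an illustration only, not OpenEye's `OELingoSim`, and will not reproduce the scores above: it assumes the molecules are given as hand-written SMILES strings, breaks each string into overlapping 4-character substrings, and takes a Tanimoto over the substring counts. The function names `lingos` and `lingo_tanimoto` are made up for this example.

```python
from collections import Counter

def lingos(smiles, q=4):
    """Count the overlapping q-character substrings of a SMILES string."""
    # Replace ring-closure digits with '0' so ring numbering does not matter
    normalized = ''.join('0' if ch.isdigit() else ch for ch in smiles)
    return Counter(normalized[i:i + q] for i in range(len(normalized) - q + 1))

def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Shared substring count divided by the total count across both strings."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((a & b).values())   # per-substring minimum of the counts
    total = sum((a | b).values())    # per-substring maximum of the counts
    return shared / total if total else 0.0

# Hand-written SMILES (an assumption; the notebook builds molecules from names)
benzoic_acid = 'OC(=O)c1ccccc1'
for name, smiles in [('phenol', 'Oc1ccccc1'), ('acetic acid', 'CC(=O)O'), ('ammonia', 'N')]:
    print("Similarity of benzoic acid to %s is %.2f"
          % (name, lingo_tanimoto(benzoic_acid, smiles)))
```

Because the comparison only sees the text of the SMILES string, it knows nothing about 3D shape, which is the motivation for the shape-based version in the next exercise.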


### Below, make a new version of the above code to use a shape or shape+color comparison rather than Lingo
Three significant changes you will need to make are:
* Use Omega to generate 3D structures for your molecules before overlaying them, and at least the fitted molecule should have multiple conformations
* Swap the Lingo search for the FitMolToReference function from above
* Now, rather than getting a single similarity score, you'll get a list of them. Use the highest score (the first entry in the list) as the similarity score.
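
One possible way to organize the loop, sketched here without any OpenEye calls so that the structure is easy to see: keep the scoring step behind a function argument and let the loop handle the "take the best score, compare to the cutoff" logic. The names `rank_by_best_score`, `score_fn`, and `fake_scores` are hypothetical, and the numbers are invented. In your own version, `score_fn` would build the molecule from its name, run Omega on it, and return the score list from `FitMolToReference`.

```python
def rank_by_best_score(names, score_fn, cutoff):
    """Return (name, best score) pairs above the cutoff, highest score first.

    score_fn(name) must return a list of similarity scores for that molecule.
    """
    hits = []
    for name in names:
        scores = score_fn(name)
        if not scores:  # e.g. nothing could be overlaid for this molecule
            continue
        best = max(scores)
        if best > cutoff:
            hits.append((name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Stand-in scorer with invented numbers so the loop runs without OpenEye
fake_scores = {'phenol': [1.21, 0.95], 'toluene': [0.88], 'ammonia': []}
hits = rank_by_best_score(fake_scores, lambda name: fake_scores[name], cutoff=0.9)
for name, best in hits:
    print("%s: %.2f" % (name, best))
```

Since shape+color scores are on a different scale from Lingo scores, the cutoff of 0.3 used above probably needs to be revisited as well.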

Be sure to test how using shape vs shape+color would affect your conclusions about which molecules are the most similar. (Note that Tanimoto scores with shape+color run from 0 to 2 rather than 0 to 1).

You may also wish to use the depiction options in `oenotebook` to draw the most similar compounds.