## About this notebook
The goal of this notebook is to analyze Glide XP docking results.

The general workflow for docking analysis starts from the .mae files output by docking:

1) In each unzipped .mae file, rename the title of every pose so that it carries a label indicating which pose it is
    - NEW* This step is needed so that different poses of the same compound can be told apart once they are written to a PDB file

2) Convert the renamed .mae files to .pdb on the command line

3) Use PyMOL to calculate the RMSD of each pose to the reference ligand (lower RMSD is better)

4) Get the gscore of each pose and put it in a table together with its RMSD and compound ID
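Steps 3 and 4 end in a single table. The join can be sketched minimally as follows, assuming the gscores come from get_mae_gscore (columns `Name` and `gscore`) and the RMSDs have been collected into a dictionary keyed by the same renamed pose titles. `build_pose_table` and the toy values are hypothetical, not results from this notebook:

```python
import pandas as pd

def build_pose_table(gscores: pd.DataFrame, rmsds: dict) -> pd.DataFrame:
    """Join per-pose gscores with per-pose RMSDs on the pose name.

    gscores: DataFrame with "Name" and "gscore" columns (as from get_mae_gscore).
    rmsds:   hypothetical mapping {pose name: RMSD in Å}, e.g. filled by calling align_rms.
    """
    rms_df = pd.DataFrame({"Name": list(rmsds.keys()), "rmsd": list(rmsds.values())})
    # Inner join keeps only poses that have both a gscore and an RMSD
    table = gscores.merge(rms_df, on="Name", how="inner")
    # Lowest RMSD first; ties broken by the more negative (better) gscore
    return table.sort_values(["rmsd", "gscore"]).reset_index(drop=True)

# Toy values for illustration only, not real docking results
gscores = pd.DataFrame({"Name": ["cpdA_1", "cpdA_2", "cpdB_3"], "gscore": [-8.1, -7.4, -6.9]})
rmsds = {"cpdA_1": 2.3, "cpdA_2": 0.9, "cpdB_3": 4.0}
print(build_pose_table(gscores, rmsds))
```

The inner join silently drops poses whose names differ between the two sources, so comparing `len(table)` with the number of poses is a cheap sanity check.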

Functions in this notebook (see block #4):
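1) get_mae_gscore
    - Input: list of .mae filepaths (e.g. the output of glob.glob)
    - Output: DataFrame with 2 columns, Name (compound ID) and gscore
2) rename_mae_poses
    - Input: list of .mae filepaths (e.g. the output of glob.glob)
    - Output: None. Writes a copy of each input file, with the pose titles relabeled, to the original filename with `_renamed` inserted before `.mae`
    - It might be better to move this to a separate .py executable
3) align_rms
    - Input: docked pose pdb (mobile) and reference pdb
    - Output: ligand RMSD (Å)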
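get_mae_gscore and rename_mae_poses both rely on one property of a Maestro `f_m_ct` block: the property names are listed one per line after the header, and their values follow the first `:::` line in the same order. A property declared k lines below the header therefore has its value k lines below the separator. The rule is sketched minimally below on a simplified, made-up fragment. The property names are illustrative; the notebook itself only matches the substrings `gscore`, `title` and `Source_File_Index`, and `ct_value` is a hypothetical helper:

```python
# Simplified, illustrative fragment of a Maestro f_m_ct block (not a complete file)
FRAGMENT = """f_m_ct {
  s_m_title
  r_i_glide_gscore
  i_m_Source_File_Index
  :::
  cpdA
  -7.52
  3
  m_atom[1] {
""".splitlines(keepends=True)

def ct_value(lines, header, prop_line):
    """Return the value of the property whose name is on line prop_line.

    Names sit on header+1, header+2, ...; values follow the first ':::'
    after the header in the same order, so offset k = prop_line - header
    maps to line bar + k.
    """
    bar = next(i for i in range(header + 1, len(lines)) if lines[i].strip() == ":::")
    return lines[bar + (prop_line - header)].strip()

header = 0
gscore_line = next(i for i, line in enumerate(FRAGMENT) if "gscore" in line)
print(ct_value(FRAGMENT, header, gscore_line))  # -7.52
```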

In [7]:
from pymol import cmd
import glob
import pandas as pd

In [8]:
# Glob the raw pose .mae files from each docking run into its own list (sorted for a stable order)
tck_8088 = sorted(glob.glob('/home/aguan/docking/docking52tcksXP088/glide-dock_XP_JJC8088_52tcks-000*_raw.mae'))
tck_8091 = sorted(glob.glob('/home/aguan/docking/docking52tcksXP088/glide-dock_XP_JJC8091_52tcks-000*_raw.mae'))

In [17]:
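# Given a list of mae filepaths (e.g. from glob.glob),
# return a dataframe with 2 columns: compound name and gscore value.
# Each f_m_ct block lists property names after its header line and the
# matching values, in the same order, after the first ':::' line.
# Assumes exactly one property name per block contains 'gscore' and one contains 'title'.
def get_mae_gscore(maes):
    gscore = []
    title = []
    for mae in maes:
        with open(mae, 'r') as f:
            lines = f.readlines()

        header = []    # line indexes of each f_m_ct block header
        bar = []       # line indexes of ':::' separators
        gscore_i = []  # line indexes of the gscore property name
        title_i = []   # line indexes of the title property name
        # Grab relevant line indexes
        for i, line in enumerate(lines):
            if "f_m_ct" in line:
                header.append(i)
            if ':::' in line:
                bar.append(i)
            if 'gscore' in line:
                gscore_i.append(i)
            if 'title' in line:
                title_i.append(i)

        # Grab gscore and compound name for each pose (one f_m_ct block per pose)
        for i in range(len(header)):
            # Index of bar/section separator immediately after current header line
            y = min(a for a in bar if a > header[i])

            # Offset of the gscore name from the header; its value sits at the same offset after the separator
            x = gscore_i[i] - header[i]
            gscore.append(float(lines[x + y]))

            # Offset of the compound name from the header
            x = title_i[i] - header[i]
            title.append(lines[x + y].replace("\n", "").replace(" ", ""))

    data = {"Name": title, "gscore": gscore}
    return pd.DataFrame(data)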

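# Given a list of mae filepaths, append each pose's Source_File_Index to its
# title (e.g. cpd -> cpd_3) and write the result next to the input file,
# with _renamed inserted before the .mae extension.
def rename_mae_poses(maes):
    for mae in maes:
        with open(mae, 'r') as f:
            lines = f.readlines()

        header = []
        bar = []
        file_index = []
        title_i = []
        for i, line in enumerate(lines):
            if "f_m_ct" in line:
                header.append(i)
            if ':::' in line:
                bar.append(i)
            if 'Source_File_Index' in line:
                file_index.append(i)
            if 'title' in line:
                title_i.append(i)

        for i in range(len(header)):
            y = min(a for a in bar if a > header[i])
            # Value of Source_File_Index for this pose
            z = int(lines[file_index[i] - header[i] + y])

            # Line holding this pose's title value
            t = title_i[i] - header[i] + y
            title_line = lines[t].rstrip("\n")
            if title_line.endswith('"'):
                # Quoted titles (containing spaces) must keep the closing quote last
                lines[t] = title_line[:-1] + '_%d"\n' % z
            else:
                # Keep the newline so the next line is not joined onto the title
                lines[t] = title_line + "_%d\n" % z

        base = mae[:-len(".mae")] if mae.endswith(".mae") else mae
        with open("%s_renamed.mae" % base, 'w') as w:
            w.writelines(lines)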

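# Load a docked pose (mobile) and a reference structure, align the mobile
# object onto the reference, and return the ligand RMSD in Å.
# Ligands are selected as residue name UNK in both files.
def align_rms(mobile_pdb, reference_pdb):
    cmd.load(mobile_pdb, "t1")
    cmd.load(reference_pdb, "t2")
    # Superimpose t1 onto t2 (moves t1). If both files hold only the ligand,
    # this fits the ligands themselves and hides pose placement errors;
    # skip it when pose and reference already share the receptor frame.
    cmd.align("t1", "t2")
    # rms_cur measures RMSD in place; cmd.rms would first re-fit the ligands
    _rms = cmd.rms_cur("t1 and resn unk", "t2 and resn unk")
    cmd.delete("all")
    return _rms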
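align_rms reports the ligand RMSD, $\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}\lVert a_i - b_i\rVert^2}$ over $N$ matched atoms. For docking poses the usual quantity is the in-place RMSD, with no superposition of the ligands. PyMOL's `cmd.rms_cur` computes this value, whereas `cmd.rms` fits the two selections first. The in-place value is sketched below in plain Python, assuming the atoms are already matched one-to-one. `inplace_rmsd` is a hypothetical helper, not part of PyMOL:

```python
import math

def inplace_rmsd(coords_a, coords_b):
    """RMSD between two matched atom lists without any superposition.

    coords_a, coords_b: sequences of (x, y, z) tuples in Å, where atom i in one
    list corresponds to atom i in the other (e.g. a docked pose and the
    reference ligand, both already in the receptor frame).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of matched atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

# A pose rigidly shifted by 2 Å along x: nothing is re-fitted, so the shift counts
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose = [(x + 2.0, y, z) for x, y, z in ref]
print(inplace_rmsd(ref, pose))  # 2.0
```

Because nothing is re-fitted, a pose with the right shape but displaced by 2 Å scores 2.0 Å. A fitted RMSD would report about 0 for the same pose.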

In [18]:
rename_mae_poses(tck_8088)
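get_mae_gscore pairs property lines by position. A block that lacks a gscore, or that has two properties whose names contain "gscore", would therefore shift every later pose. A more defensive variant, sketched under the same one-value-per-line assumption, reads each `f_m_ct` block into a dictionary and looks properties up by exact name. The key names `r_i_glide_gscore` and `s_m_title` are assumptions to check against your own files, and `read_ct_properties` / `gscores_by_title` are hypothetical helpers:

```python
def read_ct_properties(lines):
    """Return one {property name: value string} dict per f_m_ct block.

    Assumes, like get_mae_gscore, that each property name and each value sits
    on its own line, with names before the first ':::' and values after it.
    """
    cts = []
    i = 0
    while i < len(lines):
        if lines[i].strip().startswith("f_m_ct"):
            names = []
            i += 1
            # Collect property names up to the separator
            while lines[i].strip() != ":::":
                names.append(lines[i].strip())
                i += 1
            # Values follow the separator in the same order; drop surrounding quotes
            values = [lines[i + 1 + k].strip().strip('"') for k in range(len(names))]
            cts.append(dict(zip(names, values)))
            i += 1 + len(names)
        else:
            i += 1
    return cts

def gscores_by_title(lines, gscore_key="r_i_glide_gscore", title_key="s_m_title"):
    """Return (title, gscore) pairs, skipping blocks without a gscore (e.g. a receptor)."""
    rows = []
    for ct in read_ct_properties(lines):
        if gscore_key in ct:
            rows.append((ct.get(title_key, ""), float(ct[gscore_key])))
    return rows
```

On a file this would be used as `with open(path) as f: rows = gscores_by_title(f.readlines())`, followed by `pd.DataFrame(rows, columns=["Name", "gscore"])`.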