# Introduction

## The world of transporters and the mechanism of substrate translocation

Transporters are membrane-embedded proteins that translocate a wide range of substrates (ions, small molecules and large molecules) across the membrane. Primary active transporters use energy in the form of ATP to drive this process, whereas secondary active transporters rely on the electrochemical gradient of one substrate to power the movement of another. Regardless of the energy source, all transporters must undergo conformational changes to bind, translocate and release their substrates. It has been proposed 




In [1]:
# Name: difference_in_CB_distances.py
# Authors: I would like to acknowledge that a post-doc helped me write this code
# Date of final version: 12 Dec 2017
# Description: Given two PDB files of the same protein (representing two conformational states), calculates all pairwise distances between the CB atoms in each file, writes them out, and then writes a file with the absolute differences between the distances in the two files (diff_CB_dist_2.txt)

import os
import numpy as np
# The function below reads one PDB file, extracts the CB coordinates of every residue (glycines have no CB atom, so their CA coordinates are taken instead) and calculates all pairwise distances between these atoms, which it writes to a tab-separated file.
# It is called once per structure; the two sets of distances are then compared further down, and their absolute differences are written to a third file.

def output_distances(input,output): # input is the path to the PDB file to read, output is the path to the distance file to write
    coord=[] # list which will store the residue numbers and coordinates of the CB atoms
    with open(input,"r") as f: # open the PDB file for reading; it is closed automatically at the end of this block
        for line in f: # this loop extracts coordinates
            if line.startswith("ATOM") and line[12:16].strip()=='CB': # ATOM records of the PDB file whose atom name is CB
                x,y,z=float(line[30:38]),float(line[38:46]),float(line[46:54])
                coord.append((line[22:26].strip(),x,y,z)) # appending the residue number and its CB coordinates to the list
            elif line.startswith("ATOM") and line[12:16].strip()=='CA' and line[17:20].strip()=='GLY': # a separate branch for glycines: doing the same thing but for their CA atoms
                x,y,z=float(line[30:38]),float(line[38:46]),float(line[46:54])
                coord.append((line[22:26].strip(),x,y,z))
    
    # Calculate all pairwise distances between the atoms in coord and store them in a list called 'dist1'
    dist1=[]  # list of the form [('2',[0.0,5.43,2.41,...]),('3',[5.43,0.0,14.23,...]),...]; i.e. each element is (residue_id, [distance to the first CB, distance to the second CB, ...])
    for elem in coord:
        tmp=[]
        for j in coord:
            dist=(np.sqrt(((elem[1]-j[1])**2+(elem[2]-j[2])**2+(elem[3]-j[3])**2)))  # distance between the current residue's CB atom and the CB atom of residue j
            tmp.append(dist)
        dist1.append((elem[0],tmp))
    # Write the distance matrix to the output file, with residue numbers as row and column labels
    out=open(output,"w")
    out.write("ID\t")
    for i, elem in enumerate(dist1):
        out.write(elem[0]+'\t')
    out.write('\n')
    for i, elem in enumerate(dist1):
        out.write(elem[0]+'\t')
        for d in elem[1]:
            out.write(str(d)+'\t')
        out.write('\n')
    out.close()

    return dist1


# Define inputs and outputs of the function 
input1="Aligned_Model.pdb" #model of the inward-facing state, which is to be tested experimentally
output1=input1[:-4]+"_CB_dist_2.txt" #all CB atom-atom distances of Aligned_Model
input2="clean_struct.pdb" #X-ray structure of the outward-facing state
output2=input2[:-4]+"_CB_dist_2.txt" #all CB atom-atom distances of the X-ray structure
general_out="diff_CB_dist_2.txt" #differences in the distances between the two structures

dist1=output_distances(input1,output1)
dist2=output_distances(input2,output2)
out=open(general_out,"w")
out.write("ID\t") #residue ID, i.e. label for the matrix indices
for i, elem in enumerate(dist1):
    out.write(elem[0]+'\t') #residue number labels in the matrix
out.write('\n')
assert len(dist1)==len(dist2), "Not the same protein! The two structures have different numbers of residues!"
for i, elem in enumerate(dist1):
    assert elem[0]==dist2[i][0], "Not the same protein! Check residue ids!" #This ensures that only distances between the same residues are compared, in case the two structures don't contain the same residues
    out.write(elem[0]+'\t')
    for j, d1 in enumerate(elem[1]):
        out.write(str(float('%.3f'% abs(d1-dist2[i][1][j])))+'\t') #The absolute difference in distances, rounded to three decimal places
    out.write('\n')
out.close()


FileNotFoundError: [Errno 2] No such file or directory: 'Aligned_Model.pdb'
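
The traceback above names a relative path, and Python resolves relative paths against the current working directory of the process, which in a notebook is not necessarily the folder that holds the PDB files. A minimal sketch of how one might check this before calling `output_distances`; `find_input_file` and `extra_dirs` are hypothetical names introduced here, not part of the original script:

```python
from pathlib import Path


def find_input_file(filename, extra_dirs=()):
    """Return an absolute Path for `filename`, or raise a descriptive error.

    Hypothetical helper: looks in the current working directory first,
    then in each directory listed in `extra_dirs`.
    """
    candidates = [Path.cwd() / filename]
    candidates += [Path(d) / filename for d in extra_dirs]
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()
    searched = ", ".join(str(c.parent) for c in candidates)
    raise FileNotFoundError(f"{filename!r} not found; searched: {searched}")


# Relative paths such as "Aligned_Model.pdb" are resolved against this directory.
print("Working directory:", Path.cwd())
```

For example, `input1 = str(find_input_file("Aligned_Model.pdb", extra_dirs=["/path/to/structures"]))`, where the directory is a placeholder for wherever the files actually live.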

Run from the terminal, the code worked and produced three output files. Surprisingly, in Jupyter Notebook it failed with the FileNotFoundError shown above, because it could not find the input PDB file. I then loaded the output files with pandas to sort through the distances.
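
The pandas step is not shown in the notebook, so the following is only a sketch of one way it could be done, not the analysis actually performed. It assumes the tab-separated layout written by the script (an `ID` header row, one labelled row per residue, and a trailing tab on every line) and that rows and columns are in the same residue order; `load_distance_matrix` and `largest_differences` are hypothetical names.

```python
import numpy as np
import pandas as pd


def load_distance_matrix(path):
    """Read a matrix file written by the script into a square DataFrame."""
    # Residue ids are read as strings so that row and column labels match.
    table = pd.read_csv(path, sep="\t", dtype={"ID": str})
    # The trailing tab on each line produces one all-empty column; drop it.
    table = table.dropna(axis=1, how="all")
    return table.set_index("ID")


def largest_differences(diff, top=10):
    """Return the `top` residue pairs with the largest values, largest first."""
    # The matrix is symmetric, so only pairs above the diagonal are listed.
    rows, cols = np.triu_indices(len(diff), k=1)
    pairs = pd.DataFrame({
        "res_i": diff.index.to_numpy()[rows],
        "res_j": diff.columns.to_numpy()[cols],
        "diff": diff.to_numpy()[rows, cols],
    })
    ranked = pairs.sort_values("diff", ascending=False)
    return ranked.head(top).reset_index(drop=True)
```

With the output files in the working directory, `largest_differences(load_distance_matrix("diff_CB_dist_2.txt"))` would list the residue pairs whose CB-CB distance changes most between the two structures. Residue numbers that repeat across chains are not handled by this sketch.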