# <center>Circuit Topology script V1.0</center>

<center>Duane Moes - For suggestions and further questions: moesduane@gmail.com </center>

---
This is a fully automated script that mainly uses Biopython to perform circuit topology analysis on a given set of proteins. When possible, use mmCIF files instead of PDB files, because the PDB format is outdated and more prone to missing atoms and similar problems.

#### Packages used
<ul><li>Biopython</li>
<li>SciPy </li>
<li>NumPy</li>
<li>Matplotlib</li>
<li>DSSP</li>
<li>ipympl</li>
<li>ipywidgets</li>
</ul>


  
Run the cell below to install all required dependencies (this only needs to be done once). <br>

      


In [None]:
%conda install --file requirements.txt

### User guide
<ul>
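    <li>Either copy your <code>.pdb</code> or <code>.cif</code> files to the matching folder in <code>input_files/</code> (the folder named after <code>fileformat</code>, e.g. <code>input_files/cif/</code>), or enter the 4-letter PDB IDs in <code>input_files/protlist.txt</code> and set <code>fetch_db = 1</code> in the variable cell below to download them (downloading is only supported for mmCIF files).
</li>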
</ul>
<i>NOTE: when working with a large number of proteins (>50), it is more efficient to use the batch download service of the</i> __[RCSB PDB](https://www.rcsb.org/downloads)__.
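If you go the `protlist.txt` route, a quick sanity check of the IDs can save a failed download. The sketch below is hypothetical: `read_protlist` and `cif_url` are not part of this repository, it assumes IDs are separated by whitespace and/or commas, and `retrieve_cif()` may parse the file differently. It only relies on the public RCSB file server URL pattern for mmCIF files.

```python
import re

# Hypothetical helpers, not the repository's retrieve_cif().
# A PDB ID is four characters: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def read_protlist(text):
    """Return unique, lower-case PDB IDs from the contents of protlist.txt.

    Assumes IDs are separated by whitespace and/or commas.
    """
    ids = []
    for token in re.split(r"[\s,]+", text):
        if not token:
            continue  # leading/trailing separators produce empty tokens
        token = token.lower()
        if not PDB_ID.match(token):
            print(f"skipping invalid ID: {token!r}")
            continue
        if token not in ids:  # drop duplicates, keep first-seen order
            ids.append(token)
    return ids


def cif_url(pdb_id):
    """Download URL of an entry's mmCIF file on the RCSB file server."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.cif"


ids = read_protlist("1a5v 1UBQ\n1a5v, xx12\n")
print(ids)              # ['1a5v', '1ubq']
print(cif_url(ids[0]))  # https://files.rcsb.org/download/1A5V.cif
```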

In [1]:
from functions.plots.circuit_plot import circuit_plot
from functions.plots.matrix_plot import matrix_plot
from functions.plots.stats_plot import stats_plot

from functions.calculating.get_cmap import get_cmap
from functions.calculating.get_matrix import get_matrix
from functions.calculating.get_stats import get_stats
from functions.calculating.energy_cmap import energy_cmap
from functions.calculating.string_pdb import string_pdb
from functions.calculating.secondary_struc_cmap import secondary_struc_cmap
from functions.calculating.secondary_struc_filter import secondary_struc_filter

from functions.importing.retrieve_chain import retrieve_chain
from functions.importing.retrieve_cif import retrieve_cif
from functions.importing.retrieve_secondary_struc import retrieve_secondary_struc

from functions.exporting.export_psc import export_psc

from ipywidgets import widgets
import numpy as np 
import os
import matplotlib

%matplotlib qt

<ul><li>Enter the file format you are using. If you want to download the files listed in <code>input_files/protlist.txt</code>, indicate that here as well. </li></ul>

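***Variable input***<br>
<code>fileformat</code>, input file format: <code>'cif'</code> (mmCIF, recommended) or <code>'pdb'</code>.<br>
<code>fetch_db</code>, if 1, downloads the entries listed in <code>input_files/protlist.txt</code> (mmCIF only).<br>

<code>cutoff_distance</code>, maximal distance (Å) between two atoms for them to count as an atom-atom contact.<br> 
<code>cutoff_numcontacts</code>, minimum number of atom-atom contacts between two residues for them to count as a res-res contact. <br>
<code>length_filtering</code>, filtering is active when <code>length_filtering > 0</code>; the value is the maximum contact distance. <br> 
<code>energy_filtering</code>, if nonzero, applies energy-based filtering of the res-res contacts (<code>energy_cmap</code>); you will be asked whether to apply positive or negative filtering (1/0). <br>
<code>exclude_neighbour</code>, number of neighbouring residues that are excluded from possible res-res contacts. <br>

When <code>plot_figures = 1</code>, figures are also saved in <code>results/</code>.<br>
<code>exporting_psc</code>, if 1, exports the resulting PSC statistics to a txt file in <code>results/statistics/psc</code> (overwrites any previously created file).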
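To make the two sequence-based filters concrete, here is a minimal sketch of which residue pairs remain candidates for a contact. It is an illustration under stated assumptions, not the logic of `get_cmap`: it assumes pairs with sequence separation `j - i <= exclude_neighbour` are skipped, and that `length_filtering` is a maximum *sequence* separation. `candidate_pairs` is a hypothetical name.

```python
def candidate_pairs(n_residues, exclude_neighbour=4, length_filtering=0):
    """Yield residue index pairs (i, j), i < j, that may form a contact.

    Assumptions (not taken from get_cmap): pairs with sequence separation
    j - i <= exclude_neighbour are skipped, and when length_filtering > 0,
    pairs with j - i > length_filtering are skipped as well.
    """
    for i in range(n_residues):
        for j in range(i + exclude_neighbour + 1, n_residues):
            if length_filtering > 0 and j - i > length_filtering:
                break  # separation only grows with j, so stop early
            yield i, j


print(list(candidate_pairs(8, exclude_neighbour=4)))
# [(0, 5), (0, 6), (0, 7), (1, 6), (1, 7), (2, 7)]
print(list(candidate_pairs(8, exclude_neighbour=4, length_filtering=5)))
# [(0, 5), (1, 6), (2, 7)]
```

With `exclude_neighbour = 4`, residues closer than five positions in sequence can never form a contact, which removes the trivial contacts between backbone neighbours.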


In [3]:
#Format
fileformat =            'cif'
fetch_db =              0

#CT variables
cutoff_distance =       3.6
cutoff_numcontacts =    3
length_filtering =      0
energy_filtering =      0
exclude_neighbour =     4

plot_figures =          0
exporting_psc =         1

if energy_filtering:
    potential_sign = input("positive or negative filtering? (1/0)")  
    
if fileformat == 'cif' and fetch_db:  
    retrieve_cif()

***cmap***    - Atom-Atom contact map (<i>cutoff_distance</i>)<br>
***cmap2***  - Res-Res contact map<br>
***cmap3***  - Boolean Res-Res contact map (<i>cutoff_numcontacts</i>)<br>
***cmap4***  - Boolean Res-Res contact map with indicated secondary structures filtered out <br>
***cmap5***  - Boolean Secondary Structure - Secondary Structure contact map<br>
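The step from atom-atom contacts to a boolean res-res map can be sketched in plain Python. This is an illustrative sketch, not the repository's `get_cmap`: the function name and input layout (a list of atom coordinate tuples per residue) are assumptions, and it treats a distance equal to `cutoff_distance` as a contact.

```python
import math
from itertools import product


def residue_contact_maps(residues, cutoff_distance=3.6, cutoff_numcontacts=3,
                         exclude_neighbour=4):
    """Build res-res contact maps from per-residue atom coordinates.

    residues: list with one entry per residue, each a list of (x, y, z) tuples.
    Returns (counts, contacts): counts[i][j] is the number of atom pairs within
    cutoff_distance (like cmap2); contacts[i][j] is True when that number is at
    least cutoff_numcontacts (like cmap3).
    """
    n = len(residues)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        # skip the exclude_neighbour nearest sequence neighbours
        for j in range(i + exclude_neighbour + 1, n):
            c = sum(1 for a, b in product(residues[i], residues[j])
                    if math.dist(a, b) <= cutoff_distance)
            counts[i][j] = counts[j][i] = c
    contacts = [[c >= cutoff_numcontacts for c in row] for row in counts]
    return counts, contacts


# Toy example: residues 0 and 5 face each other, residues 1-4 are far away
residues = [[(0, 0, 0), (1, 0, 0), (2, 0, 0)]]
residues += [[(50.0 * k, 0, 0)] for k in range(1, 5)]
residues += [[(0, 3, 0), (1, 3, 0), (2, 3, 0)]]
counts, contacts = residue_contact_maps(residues)
print(counts[0][5], contacts[0][5])  # 7 True
```

A real implementation would use a spatial index (e.g. a neighbour search) instead of comparing every atom pair.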


In [7]:
input_dir = os.path.join('input_files', fileformat)
# Skip hidden entries (e.g. .gitkeep, .ipynb_checkpoints) so the progress counter is correct
file_names = [f for f in sorted(os.listdir(input_dir)) if not f.startswith('.')]
number_of_files = len(file_names)

psclist = []

for num, files in enumerate(file_names):

    try:
        chain, file_path = retrieve_chain(files)
        print(f'{files} - {num+1}/{number_of_files}')
    except Exception as err:
        # Skip files that cannot be parsed, but report why
        print(f'{files} - skipped ({err})')
        continue

    #Step 1 - Draw a residue-residue contact map
    cmap3, cmap2, protid, numbering, res_names = get_cmap(chain,
                                                          cutoff_distance,
                                                          cutoff_numcontacts,
                                                          length_filtering,
                                                          exclude_neighbour)

    #Step 1.5 - Energy filtering
    if energy_filtering:
        cmap3 = energy_cmap(cmap3, numbering, res_names, potential_sign)
        protid = protid + '_(' + str(energy_filtering) + ')ef'

    #Step 2 - Draw a circuit topology relations matrix
    mat, c = get_matrix(cmap3, protid)

    #Step 3 - Circuit topology statistics
    psc, entangled = get_stats(mat, protid)
    psclist.append([protid, psc])

    #plotting
    if plot_figures:
        sitelist = circuit_plot(cmap2, protid, numbering, cutoff_numcontacts)
        matrix_plot(mat, protid)
        stats_plot(entangled, psc, protid)

#exporting: write the file once, after all proteins have been processed
if exporting_psc:
    export_psc(psclist)

1a5v.cif - 1/1
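Steps 2 and 3 rest on the circuit topology classification of contact pairs. For two contacts (i, j) and (k, l) with i < k, the pair is in series when j < k, parallel when one contact is nested in the other (i < k < l < j), and cross when they interleave (i < k < j < l). The sketch below shows that classification and the resulting P/S/C fractions; it is not the code of `get_matrix` or `get_stats`, and it assumes that `psc` holds these three fractions. Contacts that share a residue (concerted relations) are simply left unclassified here, which may differ from the notebook's handling.

```python
from collections import Counter


def ct_relation(a, b):
    """Circuit topology relation between contacts a = (i, j) and b = (k, l).

    Returns 'S' (series), 'P' (parallel: one contact nested in the other),
    'C' (cross) or None when the contacts share a residue (concerted
    relations, which this sketch does not subdivide).
    """
    # normalise so that i < j, k < l and i <= k
    (i, j), (k, l) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    if len({i, j, k, l}) < 4:
        return None
    if j < k:
        return 'S'  # i < j < k < l
    if l < j:
        return 'P'  # i < k < l < j
    return 'C'      # i < k < j < l


def relation_matrix(contacts):
    """Square matrix of pairwise relations ('-' on the diagonal)."""
    return [[ct_relation(p, q) if m != n else '-'
             for n, q in enumerate(contacts)]
            for m, p in enumerate(contacts)]


def psc_fractions(contacts):
    """Fractions of parallel, series and cross relations among all contact pairs."""
    counts = Counter(ct_relation(contacts[m], contacts[n])
                     for m in range(len(contacts))
                     for n in range(m + 1, len(contacts)))
    total = counts['P'] + counts['S'] + counts['C']
    return {r: (counts[r] / total if total else 0.0) for r in 'PSC'}


contacts = [(1, 10), (3, 6), (12, 20), (8, 15)]
print(relation_matrix(contacts)[0])  # ['-', 'P', 'S', 'C']
print({r: round(f, 3) for r, f in psc_fractions(contacts).items()})
# {'P': 0.167, 'S': 0.5, 'C': 0.333}
```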


### Secondary structure tool
This function uses the DSSP tool to calculate the protein's secondary structure. <br> ***NOTE*** STRIDE and DSSP agree in 95.4% of cases; DSSP tends to assign shorter secondary structures.  <br>https://en.wikipedia.org/wiki/STRIDE <br> 

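It can be used to build a secondary structure-secondary structure contact map, or to filter out res-res contacts within a secondary structure. It operates on the <code>chain</code> and <code>file_path</code> of the last protein processed in the main loop above.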

* H - Alpha helix
* B - Isolated beta bridge
* E - Extended strand (part of a beta sheet)
* G - 3-10 helix
* I - Pi helix
* T - Hydrogen-bonded turn
* S - Bend
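A secondary structure contact map works on *segments*, i.e. runs of consecutive residues with the same code. The sketch below shows one way to derive them from a per-residue DSSP string. It assumes residues without an assignment are marked `-` (as in Biopython's DSSP output); the format returned by `retrieve_secondary_struc` may differ, and `ss_segments` is a hypothetical helper.

```python
from itertools import groupby


def ss_segments(dssp, ss_elements=('H', 'E', 'B', 'G')):
    """Split a per-residue DSSP string into (code, start, end) runs.

    Only runs whose code is in ss_elements are kept; end is inclusive.
    """
    segments = []
    pos = 0
    for code, run in groupby(dssp):
        length = len(list(run))
        if code in ss_elements:
            segments.append((code, pos, pos + length - 1))
        pos += length
    return segments


print(ss_segments("--HHHHTT-EEEE-SEEEE"))
# [('H', 2, 5), ('E', 9, 12), ('E', 15, 18)]
```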

In [9]:
seq, struc = retrieve_secondary_struc(chain, file_path)

The following function uses the secondary structure to create a secondary structure-secondary structure contact map (cmap5).<br> Keep in mind that this function overwrites the <code>struc</code> and <code>numbering</code> variables.

In [None]:
cmap5,struc,segment,numbering = secondary_struc_cmap(chain,
                                                      seq,
                                                      struc,
                                                      cutoff_distance = 6,
                                                      cutoff_numcontacts = 10,
                                                      exclude_neighbour=0,
                                                      ss_elements = ['H','E','B','G'])
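Conceptually, a segment-segment map collapses res-res contacts onto the segments the residues belong to. The sketch below counts res-res contacts between segments and keeps a segment pair when the count reaches `min_contacts`. It is an assumption-based illustration, not `secondary_struc_cmap`: the notebook's function starts from the chain and applies its own `cutoff_distance` and `cutoff_numcontacts`, which may be counted at the atom level instead.

```python
def segment_contact_map(contacts, segments, min_contacts=1):
    """Collapse residue contacts onto secondary structure segments.

    contacts: iterable of residue index pairs (i, j).
    segments: list of (code, start, end) tuples with inclusive ends.
    Returns a boolean matrix: True if segments a and b share at least
    min_contacts residue contacts.
    """
    seg_of = {}
    for s, (_, start, end) in enumerate(segments):
        for r in range(start, end + 1):
            seg_of[r] = s
    n = len(segments)
    counts = [[0] * n for _ in range(n)]
    for i, j in contacts:
        a, b = seg_of.get(i), seg_of.get(j)
        if a is None or b is None or a == b:
            continue  # residue outside kept segments, or contact within one segment
        counts[a][b] += 1
        counts[b][a] += 1
    return [[c >= min_contacts for c in row] for row in counts]


segments = [('H', 2, 5), ('E', 9, 12), ('E', 15, 18)]
contacts = [(3, 10), (4, 11), (10, 16), (0, 17), (2, 5)]
print(segment_contact_map(contacts, segments, min_contacts=2))
# [[False, True, False], [True, False, False], [False, False, False]]
```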

This function takes a res-res contact map and filters out contacts that lie within the specified secondary structures (<code>filtered_structures</code>).

In [None]:
cmap4 = secondary_struc_filter(cmap3,
                              struc,
                              filtered_structures = ['H','G'],
                              ss_elements = ['H','E','B','G'])
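The filtering rule can be illustrated on a contact list: drop a contact when both residues sit in the same continuous run of a filtered code (e.g. the i to i+4 hydrogen bonds inside one helix), but keep contacts between two different helices. This is a sketch under that assumption; whether `secondary_struc_filter` uses exactly this rule, and how it uses `ss_elements`, is not shown in this notebook. `filter_within_structures` is a hypothetical name.

```python
def filter_within_structures(contacts, dssp, filtered_structures=('H', 'G')):
    """Drop contacts whose two residues lie in the same run of a filtered code.

    contacts: list of residue index pairs (i, j), indices into dssp.
    dssp: per-residue secondary structure string.
    """
    # label each residue with the index of the run it belongs to
    run_id = []
    current = -1
    previous = None
    for code in dssp:
        if code != previous:
            current += 1
            previous = code
        run_id.append(current)
    return [(i, j) for i, j in contacts
            if not (dssp[i] in filtered_structures and run_id[i] == run_id[j])]


print(filter_within_structures([(2, 5), (3, 10), (9, 12)], "--HHHHTT-EEEE-SEEEE"))
# [(3, 10), (9, 12)]
```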