# SWRITHE

Before you use this notebook, make sure you have run *make* in this directory to build the necessary packages.

In [None]:
import swrithe_tools as sw

### Specify your PDB code of interest below.

In [None]:
pdb_code="3F1L"

The function *get_pdb* retrieves a PDB file from RCSB and saves it in a new folder named *molecules*.

In [None]:
sw.get_pdb(pdb_code)
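For readers curious what a retrieval step like this involves, here is a minimal standard-library sketch. It is not the implementation of `get_pdb`: the helper names `rcsb_url` and `download_pdb` are hypothetical, the output file name is an assumption, and the URL pattern is RCSB's public file-download address.

```python
import urllib.request
from pathlib import Path


def rcsb_url(pdb_code: str) -> str:
    """Build the RCSB download URL for a four-character PDB code."""
    code = pdb_code.strip().upper()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a valid PDB code: {pdb_code!r}")
    return f"https://files.rcsb.org/download/{code}.pdb"


def download_pdb(pdb_code: str, folder: str = "molecules") -> Path:
    """Download the PDB file into `folder` (created if missing) and return its path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: <CODE>.pdb inside the target folder.
    target = target_dir / f"{pdb_code.strip().upper()}.pdb"
    # Needs network access; an unknown code raises urllib.error.HTTPError.
    urllib.request.urlretrieve(rcsb_url(pdb_code), target)
    return target
```

Calling `download_pdb("3F1L")` would fetch the example structure used in this notebook.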

By default we consider the first chain in the PDB file. Run the cell below and check that this is the chain you are interested in. If it is not, change the index in the first line.

In [None]:
chain_id=sw.get_chains_from_biotite(pdb_code)[0]
print("Analysing Chain",chain_id)
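To make the meaning of the index concrete, the sketch below lists chain identifiers in the order they first appear in the ATOM records of a PDB file. It is an illustration only: `chain_ids` is a hypothetical helper, and whether `get_chains_from_biotite` orders chains the same way is an assumption.

```python
from typing import List


def chain_ids(pdb_text: str) -> List[str]:
    """Return chain identifiers in order of first appearance in ATOM records."""
    seen: List[str] = []
    for line in pdb_text.splitlines():
        # In the fixed-width PDB format the chain ID sits in column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen
```

With such a list, index `0` is the first chain in the file, index `1` the second, and so on.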

The next cell will compute the SKMT smoothed representation of your protein.\
Secondary structure prediction is performed by PSIPRED<sup>[1]</sup>.

In [None]:
sw.skmt(pdb_code,chain_id)

The next cell allows you to view the SKMT smoothed representation of your protein

In [None]:
sw.plot_molecule_tube(pdb_code)

# Calculate the writhe fingerprint (writhe of all subsections)

In [None]:
sw.calculate_writheFP(pdb_code)
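As a rough illustration of what "writhe of all subsections" means, the sketch below approximates the Gauss double integral for a polygonal curve with a crude midpoint rule and then evaluates it for every subsection using 2D prefix sums. This is an assumption-laden toy, not the method used by `calculate_writheFP`; the names `pair_terms`, `writhe` and `writhe_fingerprint` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pair_terms(points: Sequence[Point]) -> List[List[float]]:
    """Midpoint-rule contribution of each ordered pair of segments."""
    n = len(points) - 1
    edges = [_sub(points[k + 1], points[k]) for k in range(n)]
    mids = [tuple((p + q) / 2 for p, q in zip(points[k], points[k + 1]))
            for k in range(n)]
    terms = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = _sub(mids[i], mids[j])
            # Coincident midpoints (a self-intersecting curve) would divide by zero.
            dist = math.sqrt(_dot(d, d))
            terms[i][j] = _dot(_cross(edges[i], edges[j]), d) / (4 * math.pi * dist ** 3)
    return terms


def writhe(points: Sequence[Point]) -> float:
    """Approximate writhe of the whole polygonal curve."""
    return sum(map(sum, pair_terms(points)))


def writhe_fingerprint(points: Sequence[Point],
                       min_segments: int = 2) -> Dict[Tuple[int, int], float]:
    """Approximate writhe of every subsection running from point i to point j."""
    terms = pair_terms(points)
    n = len(terms)
    # prefix[a][b] = sum of terms[i][j] over i < a and j < b.
    prefix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            prefix[a + 1][b + 1] = (terms[a][b] + prefix[a][b + 1]
                                    + prefix[a + 1][b] - prefix[a][b])
    fingerprint = {}
    for i in range(n):
        for j in range(i + min_segments, n + 1):
            # Sum of the square block of terms for segments i .. j-1.
            fingerprint[(i, j)] = (prefix[j][j] - prefix[i][j]
                                   - prefix[j][i] + prefix[i][i])
    return fingerprint
```

In this toy version a planar curve has zero writhe, and mirroring a helix flips the sign of its writhe, which is why the writhe can flag helical subsections and their handedness.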

Plot the writhe as a function of length and check for any helical subsections.

In [None]:
sw.writhePlot(pdb_code,highlight_helical_subsections=True)

Visualise your molecule with the helical subsections highlighted

In [None]:
sw.view_molecule_helical(pdb_code)

Plot the ACN (average crossing number) as a function of length.

In [None]:
sw.acnPlot(pdb_code)
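For reference, and assuming `acn` here stands for the average crossing number as is usual in knot theory, the textbook definitions of the writhe and the ACN of a curve $\mathbf{r}(s)$ with unit tangent $\mathbf{T}(s)$ differ only by an absolute value on the integrand. How SWRITHE evaluates and normalises these quantities on the smoothed curve is not stated in this notebook.

```latex
\mathrm{Wr} = \frac{1}{4\pi} \int\!\!\int
  \frac{\bigl(\mathbf{T}(s) \times \mathbf{T}(t)\bigr) \cdot \bigl(\mathbf{r}(s) - \mathbf{r}(t)\bigr)}
       {\lvert \mathbf{r}(s) - \mathbf{r}(t) \rvert^{3}} \, ds \, dt,
\qquad
\mathrm{ACN} = \frac{1}{4\pi} \int\!\!\int
  \frac{\bigl\lvert \bigl(\mathbf{T}(s) \times \mathbf{T}(t)\bigr) \cdot \bigl(\mathbf{r}(s) - \mathbf{r}(t)\bigr) \bigr\rvert}
       {\lvert \mathbf{r}(s) - \mathbf{r}(t) \rvert^{3}} \, ds \, dt
```

Because signed contributions can cancel in the writhe but not in the ACN, the ACN is always at least as large as the absolute value of the writhe.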

# Pairwise Comparison of Molecules

The first cell returns the mutually similar subsections, along with the percentage coverage of the respective molecules.\
The second cell will plot both curves, with the mutually similar subsections highlighted in the same colour.

In [None]:
sw.compare_molecules(pdb_code,"1P1X",0.05)

In [None]:
sw.view_similar_sections(pdb_code,"1P1X",0.05)

# Comparing a molecule to the current database (used in the paper)

This cell will run a full similarity comparison to the database used in the paper.\
*Note:* any new structures you have downloaded from the PDB using this notebook will also have been added to the database.

In [None]:
sw.compareToDatabase(pdb_code)

## Finding globally similar structures

If you would like to find proteins in our database that are globally similar to your own, run the following cell.\
The second parameter is the similarity cutoff, which we set to 0.05 as standard.\
The third parameter is the fraction of both proteins that the similar subsections must cover; we recommend 0.8 for globally similar proteins.

In [None]:
sw.find_globally_similar_proteins(pdb_code,cutoff=0.05,pc_sim=0.8)

## Finding structures as domains

The following cell looks for proteins in the database that contain a subset structurally similar to your protein.\
The second parameter is the similarity cutoff; we set it to 0.1 here to match the example given in the paper.\
The third parameter is the fraction of your comparison protein that the similar subsections must cover; here we set it to 1 so that the similar subsections account for the entirety of your comparison protein.

In [None]:
sw.find_subset_similarities(pdb_code,cutoff=0.1,pc_sim=1.)
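The coverage criterion used in this and the previous subsection can be sketched as follows. This is a hedged illustration, not the library's code: the helper names are hypothetical, subsections are assumed to be half-open index ranges, and the thresholds are assumed to be applied with `>=`.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open index range [start, end)


def coverage(sections: List[Interval], length: int) -> float:
    """Fraction of a curve of `length` points covered by the union of `sections`."""
    covered = 0
    current_end = 0
    for start, end in sorted(sections):
        start = max(start, current_end)  # skip any part already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / length


def globally_similar(cov_query: float, cov_target: float, pc_sim: float = 0.8) -> bool:
    """Global similarity: the similar subsections must cover both proteins."""
    return cov_query >= pc_sim and cov_target >= pc_sim


def query_is_subset(cov_query: float, pc_sim: float = 1.0) -> bool:
    """Domain search: only the coverage of the query protein is constrained."""
    return cov_query >= pc_sim
```

For example, a query that is fully covered while the database protein is only 40% covered would fail the global test but pass the domain test.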

## Examples

### 1) Investigating the Rossmann Fold - TIM Barrel Relationship

First we compare 3F1L to the database at 80% shared coverage.

In [None]:
sw.compareToDatabase('3F1L')

Then we view the share of CATH topology classifications for the similar proteins.

In [None]:
sw.view_CATH_percentages('3F1L',0.05,0.8)
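The "share" of each classification is simply a percentage over the list of similar proteins. A minimal sketch, assuming one CATH topology label per similar protein; `label_percentages` is a hypothetical helper and the labels in the usage note are made-up input, not results from the database.

```python
from collections import Counter
from typing import Dict, Iterable


def label_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of entries carrying each label, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.most_common()}
```

For instance, `label_percentages(["3.40.50", "3.40.50", "3.40.50", "3.20.20"])` gives 75% for `3.40.50` (the CATH code for the Rossmann fold) and 25% for `3.20.20` (the TIM barrel).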

### 2) A knotted protein

First we compare 2RH3 (a trefoil-knotted protein) to the database at 80% shared coverage.

In [None]:
sw.compareToDatabase('2RH3')

In [None]:
sw.find_globally_similar_proteins('2RH3',cutoff=0.05,pc_sim=0.8)

In [None]:
sw.view_similar_sections('2RH3',"7YTT",0.05)

[1] Buchan DWA, Jones DT (2019). The PSIPRED Protein Analysis Workbench: 20 years on. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz297