# PDB 

The `PDB` module defines types and methods to work with protein structures inside Julia. It is useful for linking structural and sequence information, and it is needed to measure the predictive performance of mutual information scores at protein contact prediction. 

## Features  

- Read and parse PDB and PDBML files  
- Calculate distance and contacts between atoms or residues  
- Determine interaction between residues  


In [1]:
# Truncate IJulia outputs at:
ENV["LINES"]   = 15 
ENV["COLUMNS"] = 60;

In [2]:
using MIToS.PDB

INFO: Recompiling stale cache file /home/diego/.julia/lib/v0.4/MIToS.ji for module MIToS.


## Retrieve information from PDB database

This module exports the `downloadpdb` function to retrieve a PDB file from the [PDB database](http://www.rcsb.org/pdb/home/home.do). By default, this function downloads a gzipped PDBML (`"xml"`) file, which MIToS can read easily, but you can set the `format` keyword to `"pdb"` if you prefer that format.

In [3]:
pdbfile = downloadpdb("1IVO", format="pdb")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  192k  100  192k    0     0   144k      0  0:00:01  0:00:01 --:--:--  144k


"1IVO.pdb.gz"

The `PDB` module also exports the `getpdbdescription` function to access the header information of a PDB entry.

In [4]:
getpdbdescription("1IVO")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   901  100   901    0     0   4419      0 --:--:-- --:--:-- --:--:--  4438


Dict{AbstractString,AbstractString} with 15 entries:
  "keywords"         => "TRANSFERASE/SIGNALING PROTEIN"
  "citation_authors" => "Ogiso, H., Ishitani, R., Nureki, O…
  "structureId"      => "1IVO"
  "status"           => "CURRENT"
  "release_date"     => "2002-10-16"
  "structure_author… => "Ogiso, H., Ishitani, R., Nureki, O…
  "nr_residues"      => "1350"
  "resolution"       => "3.30"
  "deposition_date"  => "2002-03-28"
  "last_modificatio… => "2011-07-13"
  ⋮                  => ⋮


## Read and parse PDB files

Reading and parsing is done with the `read` and `parse` functions, indicating the filename and the `Format`: `PDBML` for PDB `"xml"` files or `PDBFile` for usual `"pdb"` files. These functions return a `Vector` of `PDBResidue` objects with all the residues in the PDB file.  
To return only a specific subset of residues/atoms you can use any of the following keyword arguments:  

|keyword arguments | default | returns only ... |
|------------------|---------|-----------|
|`chain` | `"all"` | residues from a PDB chain, e.g. `"A"` |
|`model` | `"all"` | residues from a given model, e.g. `"1"` |
|`group` | `"all"` | residues from a group: `"ATOM"`, `"HETATM"` or `"all"` for both |
|`atomname` | `"all"` | atoms with a specific name, e.g. `"CA"` |
|`onlyheavy` | `false` | heavy atoms (not hydrogens) if it's `true` |

In [5]:
# Read α carbon of each residue from the 1ivo pdb file, in the model 1, chain A and in the ATOM group.
CA_1ivo = read(pdbfile, PDBFile, model="1", chain="A", group="ATOM", atomname="CA")

CA_1ivo[1] # First residue. It has only the α carbon.

PDBResidue:
	id::PDBResidueIdentifier
		    PDBe_number          number            name           group           model           chain
		             ""             "2"           "GLU"          "ATOM"             "1"             "A"
	atoms::Vector{PDBAtom}	length: 1
		                                                  coordinates            atom         element       occupancy               B
		1:                MIToS.PDB.Coordinates(92.793,69.578,31.657)            "CA"             "C"             1.0        "151.39"



## Looking for particular residues

MIToS parses PDB files into a vector of residues, instead of using a hierarchical structure like other packages. This approach makes the search and selection of residues or atoms a little different. To make it easy, this module exports a number of functions and macros to find, select and collect particular residues or atoms.  
Given that residue numbers from different chains, models, etc. can collide, **it's mandatory to indicate the `model`, `chain`, `group`, `residue` number and `atom` name explicitly** to these functions or macros.  
If you want to select all the residues in one of the categories, you can use the wildcard `"*"`. You can also use regular expressions or functions to make the selections.
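The three kinds of selector (an exact string or the `"*"` wildcard, a regular expression, or a function) can be pictured with a small dispatch-based sketch. This is only an illustration written for Julia 1.x: the `selects` helper and the `numbers` vector are hypothetical and are not part of MIToS, whose internal implementation may differ.

```julia
# Hypothetical helper, NOT part of MIToS: does `value` satisfy `selector`?
# A string selector matches exactly, except "*", which matches everything.
selects(selector::AbstractString, value::AbstractString) =
    selector == "*" || selector == value
# A regular expression matches if it occurs in the value.
selects(selector::Regex, value::AbstractString) = occursin(selector, value)
# A function is applied to the value and must return a Bool.
selects(selector::Function, value::AbstractString) = selector(value)

# Made-up residue numbers, stored as strings.
numbers = ["2", "3", "10", "27A"]

everything = filter(n -> selects("*", n), numbers)
exact      = filter(n -> selects("10", n), numbers)
by_regex   = filter(n -> selects(r"^\d+[A-Z]$", n), numbers)
by_func    = filter(n -> selects(x -> length(x) == 1, n), numbers)

println(everything)
println(exact)
println(by_regex)
println(by_func)
```

Using one method per selector type keeps the call site uniform: the same positional argument can take a string, a `Regex` or a function.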

In [6]:
res_1ivo = read(pdbfile, PDBFile)

println("res_1ivo has all the ", length(res_1ivo), " residues.")

res_1ivo has all the 1204 residues.


### Getting a `Dict` of `PDBResidue`s

If you prefer a `Dict` of `PDBResidue`s, indexed by their residue numbers, you can use the `residuesdict` function or the `@residuesdict` macro. 


In [7]:
# Dict of residues from the model 1, chain A and from the ATOM group

residuesdict(res_1ivo, "1", "A", "ATOM", "*")

DataStructures.OrderedDict{ASCIIString,MIToS.PDB.PDBResidue} with 511 entries:
  "2"  => PDBResidue:…
  "3"  => PDBResidue:…
  "4"  => PDBResidue:…
  "5"  => PDBResidue:…
  "6"  => PDBResidue:…
  "7"  => PDBResidue:…
  "8"  => PDBResidue:…
  "9"  => PDBResidue:…
  "10" => PDBResidue:…
  "11" => PDBResidue:…
  ⋮    => ⋮

In [8]:
@residuesdict res_1ivo model "1" chain "A" group "ATOM" residue "*"

DataStructures.OrderedDict{ASCIIString,MIToS.PDB.PDBResidue} with 511 entries:
  "2"  => PDBResidue:…
  "3"  => PDBResidue:…
  "4"  => PDBResidue:…
  "5"  => PDBResidue:…
  "6"  => PDBResidue:…
  "7"  => PDBResidue:…
  "8"  => PDBResidue:…
  "9"  => PDBResidue:…
  "10" => PDBResidue:…
  "11" => PDBResidue:…
  ⋮    => ⋮

### Select particular residues

Use the `residues` function to collect specific residues.

In [9]:
# Select all the residues of the model 1, chain A of the ATOM group with residue number less than or equal to 5

first_res = residues(res_1ivo, "1", "A", "ATOM", x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )

for res in first_res
    println(res.id.name, " ", res.id.number)
end

GLU 2
GLU 3
LYS 4
LYS 5


Use the `@residues` macro for a cleaner syntax.

In [10]:
for res in @residues res_1ivo model "1" chain "A" group "ATOM" residue x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 
    println(res.id.name, " ", res.id.number)
end

GLU 2
GLU 3
LYS 4
LYS 5
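
The selector function in the examples above takes the leading digits of the residue number before comparing, since residue numbers are stored as strings. In PDB files a residue number can carry an insertion code (for example `27A`) and can also be negative. The following is a minimal sketch for Julia 1.x of a more defensive version; `leading_number` and `at_most_5` are hypothetical names, not MIToS functions, and whether MIToS keeps insertion codes inside the number string is an assumption here.

```julia
# Hypothetical helper, NOT part of MIToS: leading integer of a
# residue-number string, or `nothing` if the string has none.
function leading_number(resnum::AbstractString)
    m = match(r"^(-?\d+)", resnum)   # optional minus sign, then digits
    m === nothing ? nothing : parse(Int, m.captures[1])
end

# Predicate usable as a residue selector: true for numbers <= 5.
function at_most_5(resnum::AbstractString)
    n = leading_number(resnum)
    n !== nothing && n <= 5
end

for resnum in ["5", "27A", "-3", "A"]
    println(resnum, " => ", leading_number(resnum), " ", at_most_5(resnum))
end
```

Returning `nothing` instead of indexing the match directly avoids an error when a residue number does not start with digits.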


### Select particular atoms

The `atoms` function and the `@atoms` macro allow you to select a particular set of atoms.

In [11]:
# Select all the atoms with name starting with "C" 
# from all the residues of the model 1, chain A of the ATOM group

carbons = @atoms res_1ivo model "1" chain "A" group "ATOM" residue "*" atom r"C.+"

carbons[1]

                                       coordinates            atom         element       occupancy               B
       MIToS.PDB.Coordinates(92.793,69.578,31.657)            "CA"             "C"             1.0        "151.39"


In [12]:
atoms(res_1ivo, "1", "A", "ATOM", "*", r"C.+")[1]

                                       coordinates            atom         element       occupancy               B
       MIToS.PDB.Coordinates(92.793,69.578,31.657)            "CA"             "C"             1.0        "151.39"



## Protein contact map

The `PDB` module offers a number of functions to measure `distance`s between atoms or residues and to detect possible interactions or `contact`s. In particular, the `contact` function calls the `distance` function and applies a threshold or limit to the result. The distance can be measured between alpha carbons (`"CA"`), beta carbons (`"CB"`) (alpha carbon for glycine), any heavy atom (`"Heavy"`) or any (`"All"`) atom of the residues.
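
The idea of a distance plus a threshold can be sketched in plain Julia 1.x without MIToS. Everything below is an assumption made for illustration: coordinates are plain `(x, y, z)` tuples in Å, and `euclidean` and `in_contact` are hypothetical names rather than the module's `distance` and `contact` functions.

```julia
# Hypothetical stand-ins, NOT the MIToS functions.
# Euclidean distance between two (x, y, z) tuples, in Å.
euclidean(a, b) = sqrt(sum((a .- b) .^ 2))

# Two points are "in contact" if they are at most `limit` Å apart.
in_contact(a, b, limit) = euclidean(a, b) <= limit

# Made-up representative atoms (e.g. β carbons) of three residues.
cb1 = (0.0, 0.0, 0.0)
cb2 = (3.0, 4.0, 0.0)
cb3 = (10.0, 0.0, 0.0)
coords = [cb1, cb2, cb3]

# Symmetric Boolean contact map with an 8 Å limit; the diagonal is left false.
n = length(coords)
cmap = [i != j && in_contact(coords[i], coords[j], 8.0) for i in 1:n, j in 1:n]

println(euclidean(cb1, cb2))
println(cmap)
```

Only the first two points are within 8 Å of each other, so the map has a single symmetric pair of `true` entries.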

In the following **example**, we are going to plot a contact map for chain A of *1ivo*. Two residues will be considered in contact if their β carbons (α carbon for glycine) are at a distance of 8 Å or less.

In [18]:
pdb = @residues res_1ivo model "1" chain "A" group "ATOM" residue "*"

N = length(pdb)

cmap = spzeros(N,N)

for i in 1:(N-1)
    for j in i+1:N
        cmap[i,j] = cmap[j,i] = contact(pdb[i], pdb[j], 8.0, criteria="CB")
    end
end
    
cmap

511x511 sparse matrix with 5012 Float64 entries:
	[2  ,   1]  =  1.0
	[3  ,   1]  =  1.0
	⋮
	[505, 511]  =  1.0
	[509, 511]  =  1.0
	[510, 511]  =  1.0

In [19]:
using Plots
plotlyjs()

Plots.PlotlyJSBackend()

In [20]:
spy(cmap)