# SIFTS 

The `SIFTS` module of MIToS lets you obtain the residue-level mapping between databases stored in the SIFTS XML files. It makes it easy to assign PDB residues to UniProt/Pfam positions.  
Because pairwise alignments can produce misleading associations between the residues of two sequences, SIFTS offers a more reliable association between sequence and structure residue numbers.


## Features

- Download and parse SIFTS XML files
- Store residue-level mapping in Julia
- Easy generation of `Dict`s between residue numbers


In [1]:
# Truncate IJulia outputs at:
ENV["LINES"]   = 15 
ENV["COLUMNS"] = 60;

## Simplest residue-level mapping  

In [2]:
using MIToS.SIFTS

This module exports the function `siftsmapping`, which generates a `Dict` between residue numbers. The function takes five positional arguments: 1) the name of the SIFTS XML file to parse, 2) the source database, 3) the source protein/structure identifier, 4) the destination database and 5) the destination protein/structure identifier. Optionally, the keyword arguments `chain` and `missings` select a particular PDB chain and indicate whether residues without coordinates (missing residues) are used in the mapping.  

Databases should be indicated using an available subtype of `DataBase`. The key and value types of the resulting `Dict` depend on the residue number type of each database.

| Type			| Database | Residue number type |
|---------------|----------|---------------------|
| `dbPDBe`		| **PDBe** (Protein Data Bank in Europe) | `Int` | 
| `dbInterPro`	| **InterPro** | `ASCIIString` |
| `dbUniProt`	| **UniProt** | `Int` |
| `dbPfam`		| **Pfam** (Protein families database) | `Int` |
| `dbNCBI`		| **NCBI** (National Center for Biotechnology Information) | `Int` |
| `dbPDB`		| **PDB** (Protein Data Bank) | `ASCIIString` |
| `dbCATH`		| **CATH** | `ASCIIString` |
| `dbSCOP` 		| **SCOP** (Structural Classification of Proteins) | `ASCIIString` |
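
PDB residue numbers are stored as strings because they can carry an insertion code (for example `"52A"`), which an `Int` cannot represent. The sketch below shows one way to split such a string into its numeric part and its insertion code. `split_resnum` is a hypothetical helper written for illustration, not a MIToS function, and the regular expression assumes a single-letter insertion code.

```julia
# Hypothetical helper (not part of MIToS): split a PDB residue number
# such as "52A" into its integer part and its insertion code.
function split_resnum(resnum)
    # Optional minus sign, digits, then an optional single-letter insertion code
    m = match(r"^(-?\d+)([A-Za-z]?)$", resnum)
    m === nothing && error("Unexpected residue number: $resnum")
    (parse(Int, m.captures[1]), string(m.captures[2]))
end

println(split_resnum("301")) # no insertion code: the second element is an empty string
println(split_resnum("52A")) # insertion code "A"
```

Keeping the number as a string in the `Dict` avoids losing the insertion code; parsing is only needed when you want to sort or compare residues numerically.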

To download the SIFTS XML file for a given PDB entry, use the `downloadsifts` function.

In [5]:
siftsfile = downloadsifts("1IVO")

"1ivo.xml.gz"


The following example shows the residue number mapping between *Pfam* and *PDB*. *Pfam* uses *UniProt* coordinates, while *PDB* uses its own residue numbers with insertion codes. Note that <span class="text-warning">the `siftsmapping` function is case sensitive</span>, and that <span class="text-warning">SIFTS stores PDB identifiers using lowercase characters</span>.  

In [26]:
siftsmap = siftsmapping( siftsfile, 
                    dbPfam, "PF00757", 
                    dbPDB, "1ivo", # SIFTS stores PDB identifiers in lowercase
                    chain="A", # In this example we are only using the chain A of the PDB
                    missings=false) # Residues without coordinates aren't used in the mapping

Dict{Int64,ASCIIString} with 162 entries:
  329 => "305"
  210 => "186"
  288 => "264"
  241 => "217"
  267 => "243"
  306 => "282"
  275 => "251"
  197 => "173"
  215 => "191"
  181 => "157"
  ⋮   => ⋮
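
Once the mapping is a plain `Dict`, the usual dictionary operations apply. The following minimal sketch rebuilds a small dictionary by hand from a few of the entries printed above (so it runs without the SIFTS file), looks up a position, handles an unmapped position with a default value, and inverts the mapping. The empty-string default and the variable names are choices made for this illustration only.

```julia
# A few entries copied from the output above: Pfam (UniProt) position => PDB residue number
siftsmap = Dict(329 => "305", 210 => "186", 288 => "264", 241 => "217")

# Look up the PDB residue number for a Pfam position
pdb_res = get(siftsmap, 329, "")

# Positions absent from the mapping return the default instead of throwing a KeyError
unmapped = get(siftsmap, 1, "")

# Invert the mapping: PDB residue number => Pfam position.
# This is only safe when every value is unique.
pdb2pfam = Dict(zip(values(siftsmap), keys(siftsmap)))

println(pdb_res)
println(pdb2pfam["186"])
```

With the real `siftsmap` returned by `siftsmapping`, the same `get` and `zip` calls should work unchanged, since they only rely on the standard `Dict` interface.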


## Storing residue-level mapping  

If you need more than a dictionary mapping residue numbers between two databases, you can access all the residue-level cross references by using the `read` function on a file with the `SIFTSXML` `Format`. The `parse` function (and therefore the `read` function) for the `SIFTSXML` format also takes the keyword arguments `chain` and `missings`. The `read`/`parse` function returns a `Vector` of `SIFTSResidue` objects, each of which stores the references of that residue in each database.  

In [32]:
siftsresidues = read(siftsfile, SIFTSXML, chain="A", missings=false) # Array{SIFTSResidue,1}

residue_data = siftsresidues[300]

SIFTSResidue
  PDBe:
    number: 301
    name: LYS
  UniProt:
    id: P00533
    number: 325
    name: K
  Pfam:
    id: PF00757
    number: 325
    name: K
  NCBI:
    id: 9606
    number: 325
    name: K
  PDB:
    id: 1ivo
    number: 301
    name: LYS
    chain: A
  SCOP:
    id: 76847
    number: 301
    name: LYS
    chain: A
  CATH:
    id: 2.10.220.10
    number: 301
    name: LYS
    chain: A
    InterPro: [MIToS.SIFTS.dbInterPro("IPR009030","325","K","SSF57184")]
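
A residue may lack a cross reference in some databases, so building a mapping from a list of residues means skipping those that have no entry on one side. The sketch below illustrates that idea with a simplified stand-in for the parsed list: plain tuples of the form `(PDBe number, UniProt number or nothing)`. This is an assumption made for illustration and not the actual `SIFTSResidue` layout. The first tuple uses the numbers of the residue shown above, and the remaining ones are made up.

```julia
# Simplified stand-in for a parsed residue list (not the real SIFTSResidue type):
# each record is (PDBe number, UniProt number or nothing when there is no cross reference).
# Only the first record comes from the output above; the others are illustrative.
records = Any[(301, 325), (302, 326), (303, nothing)]

pdbe2uniprot = Dict{Int,Int}()
for rec in records
    pdbe, uniprot = rec
    # Skip residues without a UniProt cross reference
    if uniprot !== nothing
        pdbe2uniprot[pdbe] = uniprot
    end
end

println(pdbe2uniprot[301])
```

The result contains only the residues that are annotated in both databases, which is the kind of dictionary you would expect from a residue-level mapping.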


In [33]:
dump(residue_data)

MIToS.SIFTS.SIFTSResidue 
  PDBe: MIToS.SIFTS.dbPDBe 
    number: Int64 301
    name: ASCIIString "LYS"
  UniProt: Nullable{MIToS.SIFTS.dbUniProt} 
    isnull: Bool false
    value: MIToS.SIFTS.dbUniProt 
      id: ASCIIString "P00533"
      number: Int64 325
      name: ASCIIString "K"
  Pfam: Nullable{MIToS.SIFTS.dbPfam} 
    isnull: Bool false
    value: MIToS.SIFTS.dbPfam 
      id: ASCIIString "PF00757"
      number: Int64 325
      name: ASCIIString "K"
  NCBI: Nullable{MIToS.SIFTS.dbNCBI} 
    isnull: Bool false
    value: MIToS.SIFTS.dbNCBI 
      id: ASCIIString "9606"
      number: Int64 325
      name: ASCIIString "K"
  InterPro: Array(MIToS.SIFTS.dbInterPro,(1,)) [MIToS.SIFTS.dbInterPro("IPR009030","325","K","SSF57184")]
  PDB: Nullable{MIToS.SIFTS.dbPDB} 
    isnull: Bool false
    value: MIToS.SIFTS.dbPDB 
      id: ASCIIString "1ivo"
      number: ASCIIString "301"
      name: ASCIIString "LYS"
      chain: ASCIIString "A"
  SCOP: Nullable{MIToS.SIFTS.dbSCOP} 
    isnull