 # BIOPYTHON
 --------------------

### Introduction

The objective of this activity is to build phylogenetic trees and run different queries on them, such as finding a common ancestor.

To do this, we first look for proteins that are relevant because of their functions or applications and base the study on them. Once we have their sequences, we align them to measure their degree of similarity and, finally, build the phylogenetic trees we will work with.

Using these phylogenetic trees, we will apply some of the functions in Biopython's 'Phylo' module to showcase how they work and what they can be used for.
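As a rough preview of the kind of query mentioned above, the following sketch builds a small tree from a Newick string and asks `Bio.Phylo` for a common ancestor. The tree, its labels and its branch lengths are invented placeholders, not results from the proteins studied here.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical toy tree in Newick format; labels and branch lengths are
# placeholders, not the real result for the beta casein sequences.
newick = "((human:0.1,gorilla:0.1):0.3,horse:0.5);"
tree = Phylo.read(StringIO(newick), "newick")

# Names of the leaves (terminal clades) of the tree.
leaf_names = [leaf.name for leaf in tree.get_terminals()]

# Most recent common ancestor of two leaves, looked up by name.
ancestor = tree.common_ancestor("human", "gorilla")
ancestor_leaves = sorted(leaf.name for leaf in ancestor.get_terminals())

# Sum of the branch lengths on the path between two leaves.
human_horse_distance = tree.distance("human", "horse")

print(leaf_names)
print(ancestor_leaves)
print(human_horse_distance)
```

Here the ancestor clade is the internal node that groups the two placeholder leaves, so its terminals are exactly those two.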

### Protein search and reading

For the protein search, we selected two different proteins:

- Beta casein: a major protein component of milk that, together with the other caseins, is assembled into micelles. The casein micelles determine many of the physical characteristics of milk, which are important for stability during storage and for milk-processing properties.
- Pyrimidine/purine nucleoside phosphorylase: a protein that catalyzes the phosphorolysis of diverse nucleosides, yielding D-ribose 1-phosphate and the respective free bases. It can use uridine, adenosine, guanosine, cytidine, thymidine, inosine and xanthosine as substrates. It also catalyzes the reverse reactions.
    
In both cases, we selected 10 organisms that have the protein. For beta casein, these include humans, gorillas and horses, among others; for the nucleoside phosphorylase, we selected different species of bacteria.


In [2]:
def read_multiFasta(protein_file):
    """Read a multi-FASTA file into a dict mapping each header to its sequence."""
    fastas = {}
    head = None

    # The context manager closes the file even if parsing fails.
    with open(protein_file) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Header line: drop the ">" and start a new record.
                head = line[1:]
                fastas[head] = ""
            elif head is not None:
                # Sequence line: append it to the current record.
                fastas[head] += line

    return fastas

In [3]:
read_multiFasta("BetaCasein.txt")

{'sp|P05814|CASB_HUMAN Beta-casein OS=Homo sapiens OX=9606 GN=CSN2 PE=1 SV=4': 'MKVLILACLVALALARETIESLSSSEESITEYKQKVEKVKHEDQQQGEDEHQDKIYPSFQPQPLIYPFVEPIPYGFLPQNILPLAQPAVVLPVPQPEIMEVPKAKDTVYTKGRVMPVLKSPTIPFFDPQIPKLTDLENLHLPLPLLQPLMQQVPQPIPQTLALPPQPLWSVPQPKVLPIPQQVVPYPQRAVPVQALLLNQELLLNPTHQIYPVTQPLAPVHNPISV',
 'tr|G3RZH2|G3RZH2_GORGO Beta-casein OS=Gorilla gorilla gorilla OX=9595 PE=3 SV=1': 'MKVLILACLVALALARETVESLSSSEESITEYKQKVEKVKHEDQQQGEDEHQDKIYHSFQPQPLIYPFVEPIPYGFLPQNILPLAQPAVMLPVPQPEIMEVPKAKDTVYTKGRVMPVLKSPTMPFFDPQIPKLTDLENLHLPLPLLQPLMQQVPQPIPQTLALPSQPLWSVPQPKVLPIPQQVVPYPQRAVPVQALLLNQELLLNPTHQIYPVTQPLAPVHNPISV',
 'tr|H2QPK9|H2QPK9_PANTR Beta-casein OS=Pan troglodytes OX=9598 GN=CSN2 PE=3 SV=1': 'MKVLILACLVALALARETVESLSSSEESITEYKQKVEKVKHEDQQQGEDEHQDKIYPSFQPQPLIYPFVEPIPYGFLPQNILPLAQPAVVLPVPQPEIMEVPKAKDTVYTKGRVMPVLKSPTMPFFDPQIPKLTDLENLHLPLPLLQPLMQQVPQPIPQTLALPPQSLWSVPQPKVLPIPQQVPYPQRAVPVQALLLNQELLLNPTHQIYPVTQPLAPVHNPISV',
 'tr|A0A2R8Z5M4|A0A2R8Z5M4_PANPA Beta-casein OS=Pan panisc

In [7]:
read_multiFasta("PyrimidinePurineNucleosidePhosphorylase.txt")

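The same header-to-sequence mapping can also be obtained with Biopython's own FASTA parser, `Bio.SeqIO`. The sketch below is only an illustration: it parses an in-memory FASTA text whose identifiers and residues are invented placeholders, and it keys the dictionary by record identifier (the first word of the header) rather than by the full header line.

```python
from io import StringIO

from Bio import SeqIO

# Placeholder records; identifiers and residues are invented for illustration.
fasta_text = """>seq1 example protein A
MKVLILAC
LVALALAR
>seq2 example protein B
MKVLILTC
"""

# SeqIO.parse yields one SeqRecord per FASTA entry and joins multi-line sequences.
records = {
    record.id: str(record.seq)
    for record in SeqIO.parse(StringIO(fasta_text), "fasta")
}

print(records)
```

To read one of the real files, a file name can be passed to `SeqIO.parse` in place of the `StringIO` handle.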

### ClustalW2

To create the phylogenetic trees, we use ClustalW2, the command-line interface of Clustal, a program for performing multiple sequence alignments and building phylogenetic trees.

In our project, we implement ClustalWrapper, a class that uses Biopython's alignment tools (Bio.Align.Applications) and acts as an adapter for ClustalW2.

#### Downloading and installing ClustalW2

First, we show the steps for downloading and installing ClustalW2:

1. Go to the official Clustal W website: <a href="http://www.clustal.org/clustal2/">http://www.clustal.org/clustal2/</a>


Then download the ClustalW2 build suited to your computer.

<img src="DescargarEjecutable.png">

2. Use the setup file to install ClustalW2. Pay attention to the installation location.    
    

<img src= "InstalaciónClustalW2.jpg">

3. Now, we test the ClustalW2 application with 'opuntia.fasta', a FASTA file with some DNA sequences.

<img src= "CargaOpuntiaFileClustalW2.png">

After choosing 'Do complete multiple alignment now Slow/Accurate', the application generates two files: opuntia.aln, with the alignment, and opuntia.dnd, with the phylogenetic tree.
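Once the rows of an alignment are available, one simple way to summarise the degree of similarity between two of them is percent identity. The sketch below is a minimal illustration under one assumed definition (identical residues divided by the columns where neither row has a gap); other definitions exist, and the two aligned rows are invented placeholders rather than rows from opuntia.aln.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        # Assumption: columns with a gap in either row are ignored.
        if residue_a == "-" or residue_b == "-":
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Placeholder aligned rows, not taken from a real alignment.
row_a = "MKVL-ILAC"
row_b = "MKVLTIMAC"
identity = percent_identity(row_a, row_b)
print(identity)
```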

#### Adapter class

We want to use ClustalW2 from our Python project to create phylogenetic trees. For this purpose, the class ClustalWrapper uses Biopython functions.

When we initialize ClustalWrapper, the only input parameter is the path to the ClustalW2 executable file. We pass this path explicitly and check it with the *os* package instead of modifying the PATH environment variable. The class also has the method *generate_alignment_tree*, which takes the FASTA file with the sequences and uses ClustalW2 to create the alignment and the phylogenetic tree. Subsequently, we will use this tree to show some of the functions included in the 'Phylo' module in Biopython.


In [3]:
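import os
from Bio.Align.Applications import ClustalwCommandline
from Bio.Application import ApplicationError


class ClustalWrapper:
    """Thin adapter around the ClustalW2 executable."""

    def __init__(self, clustal_exe_path):
        self.clustalw_exe = clustal_exe_path

    def generate_alignment_tree(self, fasta_file):
        # Check the executable first, so that a wrong path is reported
        # clearly instead of being hidden behind a generic error.
        if not os.path.isfile(self.clustalw_exe):
            raise FileNotFoundError("Clustal W executable missing: " + self.clustalw_exe)

        clustalw_cline = ClustalwCommandline(self.clustalw_exe, infile=fasta_file, seqnos="ON")
        try:
            # ClustalW2 writes the .aln and .dnd files next to the input file.
            clustalw_cline()
        except (ApplicationError, OSError) as err:
            raise RuntimeError("ClustalW2 did not work") from err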
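After ClustalW2 has run, the resulting .dnd file can be loaded with `Bio.Phylo`, since it is written in Newick format. The sketch below does not call ClustalW2: as an assumption for illustration, it writes a small hand-made Newick tree to a temporary file (the name `example.dnd` and the labels are placeholders) and reads it back the way a real output file would be read.

```python
import os
import tempfile

from Bio import Phylo

# Stand-in for a ClustalW2 tree file; the labels and lengths are placeholders.
newick = "((seqA:0.02,seqB:0.03):0.10,seqC:0.15);"

with tempfile.TemporaryDirectory() as tmp_dir:
    dnd_path = os.path.join(tmp_dir, "example.dnd")
    with open(dnd_path, "w") as handle:
        handle.write(newick)

    # Phylo.read accepts a file name and the format of the tree file.
    tree = Phylo.read(dnd_path, "newick")

terminal_names = [leaf.name for leaf in tree.get_terminals()]
print(terminal_names)

# Plain-text rendering of the tree, handy for a quick look in a notebook.
Phylo.draw_ascii(tree)
```

With a real run, `dnd_path` would instead point to the .dnd file that ClustalW2 leaves next to the input FASTA file.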
