# Comparison, looping, and plotting output



## Loops

We often use computer programs to simplify or speed up tasks that are repetitive or tedious to do by hand (remember translating RNA to protein, or finding START codons?).

To do the same thing over and over and OVER, we use something called a 'loop'. A `for` loop runs the same block of code once for each item in a list.


In [2]:
# start with a list
x = [1,2,3,4,5,6,7]

# define a looping function:
def times2(listofnumbers):
    for number in listofnumbers:
        print(number*2)
        
# call that function on our list
times2(x)

2
4
6
8
10
12
14
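
The same looping pattern can be applied to the START codon task mentioned above. The sketch below is only an illustration: the function name `find_start_codons` and the short DNA string are made up for this example, and it assumes the START codon is written as `ATG` in DNA.

```python
# Hypothetical helper: report every position where "ATG" begins.
def find_start_codons(dna):
    positions = []
    # Stop two letters before the end so every slice is a full codon.
    for i in range(len(dna) - 2):
        if dna[i:i + 3] == "ATG":
            positions.append(i)
    return positions

# A made-up sequence, not real data.
sequence = "CCATGGATGTAA"
print(find_start_codons(sequence))
```

Positions are counted from 0, the way Python indexes strings.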


## Biopython!

Biopython is a Python library with many tools for biological computation, such as reading, writing, and aligning sequences.

In [5]:
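# Note: Bio.Alphabet was removed in Biopython 1.78. On newer versions,
# drop this import and the generic_dna arguments below.
from Bio.Alphabet import generic_dna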
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment

align1 = MultipleSeqAlignment([
             SeqRecord(Seq("ACTGCTAGCTAG", generic_dna), id="Alpha"),
             SeqRecord(Seq("ACTCTA-GCTAG", generic_dna), id="Beta"),
             SeqRecord(Seq("ACTGCTAGDTAG", generic_dna), id="Gamma"),
         ])

my_alignments = [align1]

In [6]:
print(align1)

DNAAlphabet() alignment with 3 rows and 12 columns
ACTGCTAGCTAG Alpha
ACTCTA-GCTAG Beta
ACTGCTAGDTAG Gamma
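
Looping and comparison come together when we ask where these three rows disagree. The sketch below uses plain Python strings copied from the printed alignment rather than the Biopython objects, so it runs without Biopython installed. The function name `variable_columns` is made up for this example.

```python
# The three aligned rows from the printed alignment, as plain strings.
rows = {
    "Alpha": "ACTGCTAGCTAG",
    "Beta":  "ACTCTA-GCTAG",
    "Gamma": "ACTGCTAGDTAG",
}

# Hypothetical helper: list the columns where the rows are not all identical.
def variable_columns(sequences):
    variable = []
    # zip(*sequences) walks through the alignment one column at a time.
    for position, column in enumerate(zip(*sequences)):
        # A set keeps only distinct letters; more than one means a difference.
        if len(set(column)) > 1:
            variable.append(position)
    return variable

print(variable_columns(rows.values()))
```

A gap (`-`) is treated here as just another character, so a column containing a gap counts as variable.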


In [7]:
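from Bio import AlignIO
# write our list of alignments to a file in PHYLIP format;
# AlignIO.write returns the number of alignments it wrote
AlignIO.write(my_alignments, "my_example.phy", "phylip")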

1

In [12]:
import Bio.Align.Applications
from Bio.Align.Applications import ClustalwCommandline
# list everything in the module to see which alignment programs have wrappers
dir(Bio.Align.Applications)

['ClustalOmegaCommandline',
 'ClustalwCommandline',
 'DialignCommandline',
 'MSAProbsCommandline',
 'MafftCommandline',
 'MuscleCommandline',
 'PrankCommandline',
 'ProbconsCommandline',
 'TCoffeeCommandline',
 '_ClustalOmega',
 '_Clustalw',
 '_Dialign',
 '_MSAProbs',
 '_Mafft',
 '_Muscle',
 '_Prank',
 '_Probcons',
 '_TCoffee',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__']

In [13]:
help(ClustalwCommandline)

Help on class ClustalwCommandline in module Bio.Align.Applications._Clustalw:

class ClustalwCommandline(Bio.Application.AbstractCommandline)
 |  Command line wrapper for clustalw (version one or two).
 |  
 |  http://www.clustal.org/
 |  
 |  Example:
 |  --------
 |  
 |  >>> from Bio.Align.Applications import ClustalwCommandline
 |  >>> in_file = "unaligned.fasta"
 |  >>> clustalw_cline = ClustalwCommandline("clustalw2", infile=in_file)
 |  >>> print(clustalw_cline)
 |  clustalw2 -infile=unaligned.fasta
 |  
 |  You would typically run the command line with clustalw_cline() or via
 |  the Python subprocess module, as described in the Biopython tutorial.
 |  
 |  Citation:
 |  ---------
 |  
 |  Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA,
 |  McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD,
 |  Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0.
 |  Bioinformatics, 23, 2947-2948.
 |  
 |  Last checked against versions: 1.83 and 2.1


In [29]:
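# the first argument is the path to the clustalw2 program on this machine;
# change it to wherever clustalw2 is installed on yours
cline = ClustalwCommandline("/home/instructor1/STEM_camp_programming/clustalw-2.1/src/clustalw2", infile="test.fasta")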

In [30]:
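# show the command that will be run
print(cline)
# run ClustalW; calling the wrapper returns the program's output and error text
stdout, stderr = cline()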

/home/instructor1/STEM_camp_programming/clustalw-2.1/src/clustalw2 -infile=test.fasta
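
The help text above mentions that the command can also be run through Python's `subprocess` module. Here is a minimal sketch of that route, assuming the program is called `clustalw2` and can be found on your PATH. The helper names `build_clustalw_command` and `run_if_available` are made up for this example; the only ClustalW option used is the `-infile=` flag shown in the printed command.

```python
import shutil
import subprocess

# Hypothetical helper: build the same command the wrapper printed above.
def build_clustalw_command(executable, infile):
    return [executable, f"-infile={infile}"]

# Hypothetical helper: run the command only if the program can be found.
def run_if_available(command):
    if shutil.which(command[0]) is None:
        print(f"Could not find {command[0]}; skipping the run.")
        return None
    # capture_output collects stdout and stderr instead of printing them.
    return subprocess.run(command, capture_output=True, text=True)

command = build_clustalw_command("clustalw2", "test.fasta")
print(" ".join(command))
result = run_if_available(command)
```

If the program is found, `result.stdout` and `result.stderr` hold the text that ClustalW printed.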


In [31]:
from Bio import AlignIO
# by default ClustalW saves its result next to the input file, as test.aln
align = AlignIO.read("test.aln", "clustal")

In [33]:
print(align)


SingleLetterAlphabet() alignment with 4 rows and 12 columns
AAAAAAAAAAA- test1
-AAAAAAAAAAT test2
-AAAAAAAAAAC test3
-AAAAAAAAAGG test4
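
One way to compare the aligned rows is to loop over two of them side by side and count how often they agree. The sketch below copies the rows from the printed alignment into plain strings so that it runs without Biopython. The function name `identity` is made up, and skipping any column where either row has a gap is an assumption made for this example, not a standard that every tool follows.

```python
# The four aligned rows from the printed alignment, as plain strings.
rows = {
    "test1": "AAAAAAAAAAA-",
    "test2": "-AAAAAAAAAAT",
    "test3": "-AAAAAAAAAAC",
    "test4": "-AAAAAAAAAGG",
}

# Hypothetical helper: fraction of compared columns where two rows match.
def identity(first, second):
    compared = 0
    matches = 0
    for a, b in zip(first, second):
        # Assumption: skip columns where either row has a gap.
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    # Avoid dividing by zero if no columns could be compared.
    return matches / compared if compared else 0.0

# Compare every row against test1.
for name, sequence in rows.items():
    print(name, identity(rows["test1"], sequence))
```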
