# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 5 - Multiple Sequence Alignment**
Welcome to the ninth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data available on the WormBase sites and perform simple analyses with it.

This tutorial covers multiple sequence alignment (MSA) of your data using several different algorithms. The approach is not limited to data from WormBase, but the examples in the tutorial use worm data.
Let's get started!

As always, we start by importing the libraries required for this tutorial.

In [1]:
import requests, sys, json
import xml.dom.minidom

#### Read in the input data of multiple sequences from a fasta file

In [107]:
# Use a context manager so the file handle is closed after reading
with open("data/mutltiple_fasta.fa") as fasta_file:
    sequences = fasta_file.read()
print(sequences)

>GOA-1
ATGGGTTGTACCATGTCACAGGAAGAGCGTGCCGCTCTTGAAAGATCACGAATGATTGAGAAAAATCTTAAAGAAGACGG
CATGCAAGCGGCAAAAGATATCAAACTGCTGCTACTTGGTGCAGGAGAATCAGGAAAATCGACTATTGTAAAACAGATGA
AAATTATTCACGAATCGGGATTCACAGCAGAAGACTACAAACAGTACAAGCCGGTTGTCTACAGTAACACGGTTCAATCA
TTGGTCGCTATTTTGCGAGCCATGAGCAACTTAGGCGTTTCATTTGGTTCGGCTGACAGAGAGGTAGATGCAAAATTAGT
GATGGATGTGGTGGCACGAATGGAGGACACAGAGCCATTCTCAGAAGAATTGCTCAGTTCAATGAAACGGTTGTGGGGAG
ACGCAGGTGTACAGGATTGTTTCTCAAGGAGTAACGAGTATCAATTGAATGATTCAGCCAAATATTTCCTTGACGACCTG
GAAAGGTTAGGAGAGGCAATATATCAACCAACTGAGCAAGATATTCTCCGAACTCGTGTCAAAACAACTGGTATTGTTGA
AGTTCACTTCACATTCAAAAATCTCAATTTCAAATTGTTCGATGTGGGAGGTCAAAGATCAGAAAGGAAGAAGTGGATTC
ATTGTTTCGAAGATGTTACTGCTATTATTTTCTGTGTTGCCATGTCAGAGTATGATCAAGTGTTGCACGAAGATGAGACA
ACAAACCGAATGCACGAATCGCTGAAGCTGTTCGATTCGATCTGTAATAACAAATGGTTCACAGATACATCGATTATTCT
GTTCCTGAACAAGAAGGATCTGTTTGAAGAGAAAATCAAGAAAAGCCCGTTAACGATCTGCTTCCCAGAATATTCAGGAC
GACAAGACTACCACGAGGCATCTGCGTATATTCAAGCACAATTTGAGGCTAAAAACAAATCAGCGAATAAGGAAATCTAT
TGCCACATGACATGTGCCACA
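
Before sending the file to an alignment service, it can help to confirm how many records it contains. The following is a minimal sketch, not part of the original notebook: `parse_fasta` is a hypothetical helper that splits FASTA text into a dictionary of names and sequences, and the two short records are made-up sample data.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA record name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Made-up sample data; in the notebook you would pass `sequences` instead
example = ">seqA\nATGC\nGGTA\n>seqB\nATGA\n"
for record_name, record_seq in parse_fasta(example).items():
    print(record_name, len(record_seq))
```

Calling `parse_fasta(sequences)` on the text read above should give one entry per `>` header in the file.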

### Clustal Omega

Clustal Omega is an MSA tool that uses seeded guide trees and HMM profile-profile techniques to generate alignments. It is suitable for medium to large alignments.

In [3]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/clustalo/'

We can first explore which parameters the API accepts for multiple sequence alignment. Of these, only the email address and the input sequences are mandatory in most cases.

In [4]:
request = requests.get(server+'parameters')

# raise_for_status() raises an HTTPError for 4xx/5xx responses, so no explicit exit is needed
request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>guidetreeout</id>
	<id>dismatout</id>
	<id>dealign</id>
	<id>mbed</id>
	<id>mbediteration</id>
	<id>iterations</id>
	<id>gtiterations</id>
	<id>hmmiterations</id>
	<id>outfmt</id>
	<id>order</id>
	<id>stype</id>
	<id>sequence</id>
</parameters>
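
If you want the parameter names as a Python list rather than as printed XML, the response can be parsed directly. This is a small sketch under the assumption that the response has the flat `<parameters><id>…</id></parameters>` shape shown above; `list_parameter_ids` is a hypothetical helper name and the sample string is a shortened copy of that output.

```python
import xml.dom.minidom


def list_parameter_ids(xml_text):
    """Return the text of every <id> element in a 'parameters' response."""
    dom = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data.strip()
            for node in dom.getElementsByTagName("id")
            if node.firstChild is not None]


# Shortened copy of the output above; in the notebook, pass request.text
sample = "<parameters><id>outfmt</id><id>stype</id><id>sequence</id></parameters>"
print(list_parameter_ids(sample))
```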



Now we know which parameters can be used to run the alignment. Next, we can look up the details of each parameter: the type of data it accepts and its permissible values.

In [5]:
parameter = 'guidetreeout' #Change the variable to whichever parameter you want to find details for

In [6]:
request = requests.get(server+'parameterdetails/'+parameter)

request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Output guide tree</name>
	<description>Output guide tree.</description>
	<type>BOOLEAN</type>
	<values>
		<value>
			<label>yes</label>
			<value>true</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>no</label>
			<value>false</value>
			<defaultValue>false</defaultValue>
		</value>
	</values>
</parameter>
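
The parameter details can also be read programmatically, for example to find the default value. The sketch below assumes the nested `<values><value>…</value></values>` layout shown in the output above; `describe_parameter` is a hypothetical helper, and `xml.etree.ElementTree` is used here (instead of `minidom`) because its `findall` only looks at direct children, which keeps the outer and inner `<value>` elements apart.

```python
import xml.etree.ElementTree as ET


def describe_parameter(xml_text):
    """Summarise a 'parameterdetails' response as a small dict."""
    root = ET.fromstring(xml_text)
    options = []
    default = None
    values = root.find("values")
    if values is not None:
        # Each direct <value> child is one option; its inner <value> holds the actual setting
        for option in values.findall("value"):
            setting = option.findtext("value")
            options.append(setting)
            if option.findtext("defaultValue") == "true":
                default = setting
    return {
        "name": root.findtext("name"),
        "type": root.findtext("type"),
        "options": options,
        "default": default,
    }


# Copy of the 'guidetreeout' details printed above
sample_details = """
<parameter>
  <name>Output guide tree</name>
  <description>Output guide tree.</description>
  <type>BOOLEAN</type>
  <values>
    <value><label>yes</label><value>true</value><defaultValue>true</defaultValue></value>
    <value><label>no</label><value>false</value><defaultValue>false</defaultValue></value>
  </values>
</parameter>
"""
print(describe_parameter(sample_details))
```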



Now we can send the request to the API and run the alignment. Additional parameters can be specified based on the details from the cells above. For Clustal Omega, the mandatory fields are email and sequence. The request returns a job ID, which we use to check the status of the job and to retrieve the results of the alignment.

In [7]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences })

request.raise_for_status()

print(request.text)
jobid = request.text

clustalo-R20210630-223141-0701-33480292-p2m
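
When you start adding optional parameters, a typo in a parameter name is easy to miss. One way to guard against that, sketched here as an illustration only, is to build the form data through a small function that checks each option against the ids returned by the `parameters` endpoint. `build_payload` is a hypothetical helper, and the email address and sequence are placeholders.

```python
def build_payload(email, sequence, allowed_ids, **options):
    """Build the form data for a 'run' request, rejecting unknown option names."""
    unknown = sorted(set(options) - set(allowed_ids))
    if unknown:
        raise ValueError("Unknown parameter(s): " + ", ".join(unknown))
    payload = {"email": email, "sequence": sequence}
    payload.update(options)
    return payload


# A few of the Clustal Omega ids listed earlier; placeholder email and sequence
allowed = ["guidetreeout", "outfmt", "order", "stype", "sequence"]
payload = build_payload("user@example.org", ">seqA\nATGC\n", allowed, stype="dna")
print(payload)
```

The resulting dictionary could then be passed as the `data` argument of `requests.post`, in place of the literal dictionary used in the cell above.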


It is easy to check the status of our job. Run the code in the next cell, and rerun it until the status changes from RUNNING to FINISHED.

In [8]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING
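
Instead of rerunning the status cell by hand, the check can be wrapped in a polling loop. The sketch below is one possible way to do it, not part of the original notebook: `wait_for_job` is a hypothetical helper that takes any function returning the status text, which keeps the loop independent of the HTTP library. Only RUNNING and FINISHED appear in this notebook, so the other pending state names are assumptions.

```python
import time

# Assumption: states that mean "not finished yet". Only RUNNING is seen in
# this notebook; QUEUED and PENDING are guesses.
PENDING_STATES = ("RUNNING", "QUEUED", "PENDING")


def wait_for_job(fetch_status, interval=5.0, max_checks=60):
    """Call fetch_status() until the job leaves a pending state or checks run out."""
    status = None
    for _ in range(max_checks):
        status = fetch_status().strip()
        if status not in PENDING_STATES:
            break
        time.sleep(interval)
    return status


# Offline demonstration with canned responses instead of real HTTP calls
canned = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_job(lambda: next(canned), interval=0))
```

With the variables from this notebook, the call would look like `wait_for_job(lambda: requests.get(server+'status/'+jobid).text)`. A pause between checks avoids sending requests to the service in a tight loop.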


The API returns several kinds of results for a job. You can use the next cell to list the available result types and decide which ones you need to output.

In [12]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The alignment in CLUSTAL format with base/residue numbering</description>
		<fileSuffix>clustal_num</fileSuffix>
		<identifier>aln-clustal_num</identifier>
		<label>Alignment in CLUSTAL format with base/residue numbering</label>
		<mediaType>text/x-clustalw-alignment</mediaType>
	</type>
	<type>
		<description>The phylogenetic tree</description>
		<fileSuffix>ph</fileSuffix>
		<identifier>phylotree</identifier>
		<label>Phylogenetic Tree</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>
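
The printed XML above is cut off, but the identifiers are the part you need for the next step. As a sketch, assuming each `<type>` element carries an `<identifier>` and a `<mediaType>` as in the entries shown, the response can be reduced to a dictionary. `result_identifiers` is a hypothetical helper and the sample is a shortened copy of the output above.

```python
import xml.etree.ElementTree as ET


def result_identifiers(xml_text):
    """Map each result identifier to its media type."""
    root = ET.fromstring(xml_text)
    return {entry.findtext("identifier"): entry.findtext("mediaType")
            for entry in root.findall("type")}


# Shortened copy of the output above; in the notebook, pass request.text
sample_types = """
<types>
  <type><identifier>out</identifier><mediaType>text/plain</mediaType></type>
  <type><identifier>aln-clustal_num</identifier><mediaType>text/x-clustalw-alignment</mediaType></type>
  <type><identifier>phylotree</identifier><mediaType>text/plain</mediaType></type>
</types>
"""
for identifier, media_type in result_identifiers(sample_types).items():
    print(identifier, media_type)
```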

Assign the kind of output you want to display, then request it from the API.

In [13]:
outtype = 'aln-clustal_num' #Can be any of the identifiers mentioned in the previous output

In [14]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL O(1.2.4) multiple sequence alignment


GOA-1       ------------------------------------------------------------	0
CCCT-1      ATGAATTTCAATGATATTGATAATCAAATGTACG---------GGAATTTGGAGGAGGAC	51
LIN-1       ATGAATCACATTGACCTTTTGAAGGTCAAAAAAGAGCCGCCGTCGAGTTCGGAAGAAGCC	60
DDX-10      ------------------ATGCCTCACACAA-----------------ACGGAAAAGGCG	25
                                                                        

GOA-1       ------------------------------------------------------------	0
CCCT-1      GCTGAGTTGCTCGCCGAGCTTGCCGCAATACAAGAAGAGGAGATGGGTCGTGTTAGCCGA	111
LIN-1       GAGGAAGAAGAAT-----CTCCGAAACATACGATTGAGGGAATTTTGGA--TATAAGAAA	113
DDX-10      GCGGAGCACAGAAATGGGCTGGAAATAATAAAAGAAAACGAAATTTCGA-----------	74
                                                                        

GOA-1       ------------------------------------------------------------	0
CCCT-1      CCGGCAGCTCCAGCTCGCGGAGCCCCACCAGCCGCCCGAGGCCGCCCAGCACCTGCCGCC	171
LIN-1       GAAAGAGATGAACGTCTC---------------

In [15]:
outtype = 'phylotree' #Can be any of the identifiers mentioned in the previous output

In [16]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

(
(
GOA-1:0.29037,
CCCT-1:0.24832)
:0.04668,
LIN-1:0.30512,
DDX-10:0.28297);
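
The tree is returned in Newick format, where each leaf is written as `name:branch_length`. If you only need the leaf names and their branch lengths, a regular expression is enough. This is a minimal sketch and not a full Newick parser (it ignores the nesting and the unnamed internal branches); `leaf_branch_lengths` is a hypothetical helper.

```python
import re


def leaf_branch_lengths(newick):
    """Return {leaf name: branch length} for a Newick string."""
    compact = "".join(newick.split())  # drop line breaks and spaces
    # A leaf is a non-empty name followed by ':' and a number;
    # internal branches such as '):0.04668' have no name and are skipped.
    pairs = re.findall(r"([^\s(),:;]+):([0-9.eE+-]+)", compact)
    return {name: float(length) for name, length in pairs}


# Copy of the tree printed above
tree_text = """(
(
GOA-1:0.29037,
CCCT-1:0.24832)
:0.04668,
LIN-1:0.30512,
DDX-10:0.28297);
"""
print(leaf_branch_lengths(tree_text))
```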



In [17]:
outtype = 'pim' #Can be any of the identifiers mentioned in the previous output

In [18]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

#
#
#  Percent Identity  Matrix - created by Clustal2.1 
#
#

     1: GOA-1       100.00   46.13   36.42   37.36
     2: CCCT-1       46.13  100.00   39.35   42.84
     3: LIN-1        36.42   39.35  100.00   41.19
     4: DDX-10       37.36   42.84   41.19  100.00
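
The percent identity matrix is plain text, so it takes a few lines to turn it into numbers you can work with. The sketch below assumes the layout shown above (comment lines starting with `#`, then one row per sequence with an index, a name, and the values); `parse_pim` and `most_similar_pair` are hypothetical helpers.

```python
def parse_pim(text):
    """Parse a percent identity matrix into (names, rows of floats)."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fields[0] is the row index such as "1:", fields[1] is the sequence name
        names.append(fields[1])
        rows.append([float(value) for value in fields[2:]])
    return names, rows


def most_similar_pair(names, rows):
    """Return (name_a, name_b, identity) for the highest off-diagonal value."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if best is None or rows[i][j] > best[2]:
                best = (names[i], names[j], rows[i][j])
    return best


# Copy of the Clustal Omega matrix printed above
sample_pim = """#
#  Percent Identity  Matrix - created by Clustal2.1
#

     1: GOA-1       100.00   46.13   36.42   37.36
     2: CCCT-1       46.13  100.00   39.35   42.84
     3: LIN-1        36.42   39.35  100.00   41.19
     4: DDX-10       37.36   42.84   41.19  100.00
"""
pim_names, pim_rows = parse_pim(sample_pim)
print(most_similar_pair(pim_names, pim_rows))
```

For the matrix above, the highest off-diagonal value is the 46.13 between GOA-1 and CCCT-1.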



And you are done!

The following sections repeat the same steps with the other alignment algorithms.
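
Every tool in this notebook is reached through the same URL pattern, and only the tool name changes (`clustalo`, `kalign`, `mafft`, `muscle`, `prank`, `tcoffee`). As a sketch of how the repeated string concatenation could be factored out, `build_url` below is a hypothetical helper; the base URL is the common prefix of the `server` values used in this notebook, and the job ID is a made-up placeholder.

```python
BASE_URL = "https://www.ebi.ac.uk/Tools/services/rest/"


def build_url(tool, *parts):
    """Join the base URL, a tool name, and any further path segments."""
    return BASE_URL + tool + "/" + "/".join(parts)


print(build_url("kalign", "parameters"))
print(build_url("mafft", "result", "example-job-id", "pim"))
```

Called with only a tool name, `build_url("kalign")` gives the same string that is assigned to `server` by hand in each section.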

### KALIGN

KALIGN is a very fast MSA tool that concentrates on local regions. It is suitable for large alignments.

In [19]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/kalign/'

In [20]:
request = requests.get(server+'parameters')

request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>stype</id>
	<id>format</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>termgap</id>
	<id>bonus</id>
	<id>sequence</id>
</parameters>



In [21]:
parameter = 'gapext'

In [22]:
request = requests.get(server+'parameterdetails/'+parameter)

request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap Extension Penalty</name>
	<description>Penalty for extending a gap</description>
	<type>FLOAT</type>
	<values>
		<value>
			<label>0.85</label>
			<value>0.85</value>
			<defaultValue>true</defaultValue>
		</value>
	</values>
</parameter>



In [23]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences, "stype" : "dna"})

request.raise_for_status()

print(request.text)
jobid = request.text

kalign-R20210630-223211-0254-58447288-p2m


In [24]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED


In [25]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Error messages produced by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>error</identifier>
		<label>Tool Error Details</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The alignment in CLUSTAL format</description>
		<fileSuffix>clustalw</fileSuffix>
		<identifier>aln-clustalw</identifier>
		<label>Alignment in CLUSTAL format</label>
		<mediaType>text/x-clustalw-alignment</mediaType>
	</type>
	<type>
		<description>The phylogenetic tree</description>
		<fileSuffix>

In [26]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [27]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL multiple sequence alignment by Kalign (2.0)


GOA-1      ATGGGTTGTA--------------------------------------------------
CCCT-1     ATGAATTTCAATGATATTGATAATCAAATGTACGGGAATTTGGAGGAGGACGCTGAGTTG
LIN-1      ATGAATCACA-----------------------------------------------TTG
DDX-10     ATGCCTCACA--------------CAAACGGAAAAGGCGGCGGAGCACAGAAATGGGCTG


GOA-1      --------CCATGTCACAGGAAGAGC-GTGCCGCTCTTGAAA------------------
CCCT-1     CTCGCCGAGCTTGCCGCAATACAAGAAGAGGAGATGGGTCGTGTTAGCCGACCGGCAGCT
LIN-1      -------ACCTTTTGAAGGTCAAAAAAGAGCCGCCGTCGAGT------------------
DDX-10     ------GAAATAATAAAAGAAAACGAAATTTCGATGGCGAGCAGGATCCAAGTGTCAGAA


GOA-1      ----GATCACGAA-----------TGATTGAGA------AAAATCTTAAA----------
CCCT-1     CCA-GCTCGCGGAGCCCCACCAGCCGCCCGAGGCCGCCCAGCACCTGCCGCCCCCGCAAA
LIN-1      ----TCGGAAGAAG---------CCGAGGAAGA------AGAATCTCCGA----------
DDX-10     AAGCGTTGAAGGAGAAGCGGCTTCTGAAAAAGAGAAAACAGGATTTAAAG-----GGACA


GOA-1      ----------------------------GAAGACGGCAT-------------------GC
CCCT

In [28]:
outtype = 'aln-clustalw' #Can be any of the identifiers mentioned in the previous output

In [29]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL multiple sequence alignment by Kalign (2.0)


GOA-1      ATGGGTTGTA--------------------------------------------------
CCCT-1     ATGAATTTCAATGATATTGATAATCAAATGTACGGGAATTTGGAGGAGGACGCTGAGTTG
LIN-1      ATGAATCACA-----------------------------------------------TTG
DDX-10     ATGCCTCACA--------------CAAACGGAAAAGGCGGCGGAGCACAGAAATGGGCTG


GOA-1      --------CCATGTCACAGGAAGAGC-GTGCCGCTCTTGAAA------------------
CCCT-1     CTCGCCGAGCTTGCCGCAATACAAGAAGAGGAGATGGGTCGTGTTAGCCGACCGGCAGCT
LIN-1      -------ACCTTTTGAAGGTCAAAAAAGAGCCGCCGTCGAGT------------------
DDX-10     ------GAAATAATAAAAGAAAACGAAATTTCGATGGCGAGCAGGATCCAAGTGTCAGAA


GOA-1      ----GATCACGAA-----------TGATTGAGA------AAAATCTTAAA----------
CCCT-1     CCA-GCTCGCGGAGCCCCACCAGCCGCCCGAGGCCGCCCAGCACCTGCCGCCCCCGCAAA
LIN-1      ----TCGGAAGAAG---------CCGAGGAAGA------AGAATCTCCGA----------
DDX-10     AAGCGTTGAAGGAGAAGCGGCTTCTGAAAAAGAGAAAACAGGATTTAAAG-----GGACA


GOA-1      ----------------------------GAAGACGGCAT-------------------GC
CCCT

In [30]:
outtype = 'phylotree' #Can be any of the identifiers mentioned in the previous output

In [31]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

In [32]:
outtype = 'pim' #Can be any of the identifiers mentioned in the previous output

In [34]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

#
#
#  Percent Identity  Matrix - created by Clustal2.1 
#
#

     1: GOA-1       100.00   41.41   42.25   42.01
     2: CCCT-1       41.41  100.00   47.89   43.68
     3: LIN-1        42.25   47.89  100.00   46.87
     4: DDX-10       42.01   43.68   46.87  100.00



### MAFFT

MAFFT is an MSA tool that uses Fast Fourier Transforms. It is suitable for medium to large alignments.

In [35]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/mafft/'

In [36]:
request = requests.get(server+'parameters')

request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>format</id>
	<id>matrix</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>order</id>
	<id>nbtree</id>
	<id>treeout</id>
	<id>maxiterate</id>
	<id>ffts</id>
	<id>stype</id>
	<id>sequence</id>
</parameters>



In [37]:
parameter = 'gapext'

In [38]:
request = requests.get(server+'parameterdetails/'+parameter)

request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap Extension</name>
	<description>Penalty for each additional base/residue in a gap.</description>
	<type>FLOAT</type>
	<values>
		<value>
			<label>0.123</label>
			<value>0.123</value>
			<defaultValue>true</defaultValue>
		</value>
	</values>
</parameter>



In [39]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences })

request.raise_for_status()

print(request.text)
jobid = request.text

mafft-R20210630-223254-0640-67609151-p1m


In [40]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING


In [41]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Error messages produced by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>error</identifier>
		<label>Tool Error Details</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The phylogenetic tree</description>
		<fileSuffix>ph</fileSuffix>
		<identifier>phylotree</identifier>
		<label>Phylogenetic Tree</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The percent identity matrix output file</description>
		<fileSuffix>pim</fileSuffix>
		<identi

In [42]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [43]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

>GOA-1
ATGGGTTGTAC---------------CATGTCACAGGAAGAGCGTGCCGCTCTTGAAAGA
TCACGAATGATTG-----------------------AGAAAAATCTT-------------
-----------------AAAGAAGA------------------------CGGCATGCAAG
CGGCAAAAGATATCAAACTGCTGCTACTTGGTGCAGGAGAATCAGGAAAATCGACTATTG
TAAAACAGAT--------------------GAAAATTATTCACGAATC------------
--GGGATTCACAGCAGAAGACTACAAA---------------CAGTACAAGCCGGTTGTC
TA------------------------------------------CAGTAACACGGTTCAA
TCATTGGT--------------------------------------CGCTATTTTGCGAG
C-----------------------------------------------------------
----------------CATGAGCAACTTAGG--------------------CG-------
--------TTTCATTTGGTTCGGCTGACAGAGAGGTAGATGCAAAA--------------
------------------------------------------------------------
-------------------------------------------------TTAGTGATGGA
TGTGGTGGCACGAATGGAGGACACAGAGCCATTCTCAGAAGAATTGCTCAGTTCAA----
------------------------------------------------------------
------------------------------------------------------------
-----------------

In [44]:
outtype = 'phylotree' #Can be any of the identifiers mentioned in the previous output

In [45]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

(
GOA-1:0.23215,
(
CCCT-1:0.26662,
LIN-1:0.28812)
:0.03778,
DDX-10:0.22318);



In [46]:
outtype = 'pim' #Can be any of the identifiers mentioned in the previous output

In [47]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

#
#
#  Percent Identity  Matrix - created by Clustal2.1 
#
#

     1: GOA-1       100.00   45.43   45.11   54.47
     2: CCCT-1       45.43  100.00   44.53   48.16
     3: LIN-1        45.11   44.53  100.00   44.17
     4: DDX-10       54.47   48.16   44.17  100.00



### MUSCLE

MUSCLE is an accurate MSA tool that is especially good with proteins. It is suitable for medium alignments.

In [48]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/muscle/'

In [49]:
request = requests.get(server+'parameters')

request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>format</id>
	<id>tree</id>
	<id>sequence</id>
</parameters>



In [50]:
parameter = 'tree'

In [51]:
request = requests.get(server+'parameterdetails/'+parameter)

request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Output Tree</name>
	<description>The guide tree to output</description>
	<type>STRING</type>
	<values>
		<value>
			<label>none</label>
			<value>none</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>From first iteration</label>
			<value>tree1</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>From second iteration</label>
			<value>tree2</value>
			<defaultValue>false</defaultValue>
		</value>
	</values>
</parameter>



In [52]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences})

request.raise_for_status()

print(request.text)
jobid = request.text

muscle-R20210630-223316-0376-54356344-p1m


In [53]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED


In [54]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The alignment in CLUSTAL format</description>
		<fileSuffix>clw</fileSuffix>
		<identifier>aln-clustalw</identifier>
		<label>Alignment in CLUSTAL format</label>
		<mediaType>text/x-clustalw-alignment</mediaType>
	</type>
	<type>
		<description>The phylogenetic tree</description>
		<fileSuffix>ph</fileSuffix>
		<identifier>phylotree</identifier>
		<label>Phylogenetic Tree</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The percent identity matrix output file</description>
		<fileSuffix

In [55]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [56]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

Started Wed Jun 30 22:28:25 2021
/nfs/public/ro/es/appbin/linux-x86_64/muscle-3.8.31/muscle -in muscle-R20210630-223316-0376-54356344-p1m.sequence -verbose -log muscle-R20210630-223316-0376-54356344-p1m.output -quiet -clw -out muscle-R20210630-223316-0376-54356344-p1m.clw 
Alphabet DNA

MUSCLE v3.8.31 by Robert C. Edgar
http://www.drive5.com/muscle

Profile-profile score    SPN
Max iterations           8
Max trees                1
Max time                 (No limit)
Max MB                   4294966168
Gap open                 -400
Gap extend (dimer)       0
Gap ambig factor         0
Gap ambig penalty        -0
Center (LE)              0
Term gaps                Half
Smooth window length     7
Refine window length     200
Min anchor spacing       32
Min diag length (lambda) 24
Diag margin (mu)         5
Min diag break           1
Hydrophobic window       5
Hydrophobic gap factor   1.2
Smooth score ceiling     999
Min best col score       90
Min anchor score         90
SUEFF            

In [57]:
outtype = 'aln-clustalw' #Can be any of the identifiers mentioned in the previous output

In [58]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL multiple sequence alignment by MUSCLE (3.8)


LIN-1           ATGAATCACATTGACCTT-----TTGAAGGTCAAAAAAGAGCCGCCGTCGAGTTCGGAAG
CCCT-1          ATGAATTTCAATGATATTGATAATCAAATGT-ACGGGA-AT------------TTGGAGG
GOA-1           ATGGGTTGTAC---------------CATGTCACAGGA-AGAGCGTGCCGCTCTTGAAAG
DDX-10          ATGCCTCACACAAACGGAAAAGGCGGCGGAGCACAGAA-ATGGGCTGGAAATAATAAAAG
                ***  *   *                      *    * *                 * *

LIN-1           AAGCCGAGGA---------------------------AGAAGAATCTCCGAAA-------
CCCT-1          AGGACGCTGAGTTGCTCGCCGAGCTTGCCGCAATACAAGAA--------GAGGAGATGGG
GOA-1           ATCACGAATGATTG-----------------------AGAAAAATCTTAAAGAAGACGG-
DDX-10          AAAACGAAATTTCGATGGCGAGCAGGATCCAAGTGTCAGAAAAGCGTTGAAGGAGAAGCG
                *   **                               ****         *         

LIN-1           -CATACGATTGAGGGAATTTTGGATATAA---GAAAGAAAGAGATGAACGTCTCAGACTT
CCCT-1          TCGTGTT---------------------AGCCGACCGGCAGCTCCAGCTCGCGGAGCCCC
GOA-1           -CAT

In [59]:
outtype = 'phylotree' #Can be any of the identifiers mentioned in the previous output

In [61]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

(
(
LIN-1:0.28136,
CCCT-1:0.24616)
:0.04648,
GOA-1:0.22248,
DDX-10:0.20761);



In [62]:
outtype = 'pim' #Can be any of the identifiers mentioned in the previous output

In [63]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

#
#
#  Percent Identity  Matrix - created by Clustal2.1 
#
#

     1: LIN-1       100.00   47.25   44.91   46.51
     2: CCCT-1       47.25  100.00   48.54   49.92
     3: GOA-1        44.91   48.54  100.00   56.99
     4: DDX-10       46.51   49.92   56.99  100.00



### PRANK

PRANK is a phylogeny-aware multiple sequence alignment program which makes use of evolutionary information to help place insertions and deletions.

In [64]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/prank/'

In [65]:
request = requests.get(server+'parameters')

request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>sequence</id>
	<id>data_file</id>
	<id>tree_file</id>
	<id>do_njtree</id>
	<id>do_clustalw_tree</id>
	<id>model_file</id>
	<id>output_format</id>
	<id>trust_insertions</id>
	<id>show_insertions_with_dots</id>
	<id>use_log_space</id>
	<id>use_codon_model</id>
	<id>translate_DNA</id>
	<id>mt_translate_DNA</id>
	<id>gap_rate</id>
	<id>gap_extension</id>
	<id>tn93_kappa</id>
	<id>tn93_rho</id>
	<id>guide_pairwise_distance</id>
	<id>max_pairwise_distance</id>
	<id>branch_length_scaling</id>
	<id>branch_length_fixed</id>
	<id>branch_length_maximum</id>
	<id>use_real_branch_lengths</id>
	<id>do_no_posterior</id>
	<id>run_once</id>
	<id>run_twice</id>
	<id>penalise_terminal_gaps</id>
	<id>do_posterior_only</id>
	<id>use_chaos_anchors</id>
	<id>minimum_anchor_distance</id>
	<id>maximum_anchor_distance</id>
	<id>skip_anchor_distance</id>
	<id>drop_anchor_distance</id>
	<id>output_ancestors</id>
	<id>noise_level</id>
	<id>stay_quiet</id>
	<id>random_seed</

In [66]:
parameter = 'gap_extension'

In [67]:
request = requests.get(server+'parameterdetails/'+parameter)

request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap Extension Probability</name>
	<description>Gap Extension Probability</description>
	<type>FLOAT</type>
	<values>
		<value>
			<label>0.5</label>
			<value>0.5</value>
			<defaultValue>true</defaultValue>
		</value>
	</values>
</parameter>



In [68]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences})

request.raise_for_status()

print(request.text)
jobid = request.text

prank-R20210630-223344-0871-56253367-p2m


In [77]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED


In [78]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The alignment from the first iteration in HSAML format</description>
		<fileSuffix>xml</fileSuffix>
		<identifier>aln-1-hsaml</identifier>
		<label>First iteration alignment in HSAML format</label>
		<mediaType>application/xml</mediaType>
	</type>
	<type>
		<description>The alignment from the second iteration in HSAML format</description>
		<fileSuffix>xml</fileSuffix>
		<identifier>aln-2-hsaml</identifier>
		<label>Second iteration alignment in HSAML format</label>
		<mediaType>application/xml</mediaType>
	</typ

In [79]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [80]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)



PRANK: aligning sequences in '/nfs/public/rw/es/projects/wp-jdispatcher/sources/prod/jobs/prank/rest/20210630/2232/prank-R20210630-223344-0871-56253367-p2m.sequence', writing results to '/nfs/public/rw/es/projects/wp-jdispatcher/sources/prod/jobs/prank/rest/20210630/2232/prank-R20210630-223344-0871-56253367-p2m.result.?.fas' [plain alignment] and '/nfs/public/rw/es/projects/wp-jdispatcher/sources/prod/jobs/prank/rest/20210630/2232/prank-R20210630-223344-0871-56253367-p2m.result.?.xml' [xml alignment] and '/nfs/public/rw/es/projects/wp-jdispatcher/sources/prod/jobs/prank/rest/20210630/2232/prank-R20210630-223344-0871-56253367-p2m.result.?.dnd' [guidetree].

Generating approximate guidetree.
aligning 0 to 1 (16%)                    aligning 0 to 2 (33%)                    aligning 0 to 3 (50%)                    aligning 1 to 2 (66%)                    

In [81]:
outtype = 'aln-hsaml' #Can be any of the identifiers mentioned in the previous output

In [82]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

<ms_alignment>
<newick>((seq1:0.16251,seq2:0.21452)#1#:0.03229,(seq3:0.18991,seq4:0.19120)#2#:0.05560)#3#:0.00000</newick>
<nodes>
<leaf id="seq1" name="CCCT-1">
  <sequence>
    ATGAATTTCA----------------------------------ATGATATTGATAATCAAATGTACGGGAATTTGGAGGAGGACGCTGAGTTGCTCGCCGAGCTTGCCGCAA--------------TACAAGAAGAGGAGATGGGTCGTGTTAGCCGACCGGCAGCTCCAGCTCGCGGAGCCCCACCAGCCGCCCGAGGCCGCCCAGCACCTGCCGCCCCCGCAAATGTCCCAGGATTGGACCCCCGTCTACTGGCAGCCGCCTTGGCAGATAATCATGGAGAT------GGAGGAGATGAAGAGCTTGAAATGGATGAAGATTTGCTCAATGAGCTCAATGGATTGGTTGGTGGTGGTGGTGGTGGTGGAGCAGCACCTAC----AGTACCCACAAGAGCTGCTCCGAGAGCTCCAGGACCTTCAGGACCACCG----------CCATCGGCTT----------CGGCGCCAAATTCTCAGCTTGGACACCT-------------------------GAAACAGCTTCATGTGTACTATATGAAAGCTCATAAGTCTGCAGAGCAAGCTGGTGAAGGGCCAAAAGCTAGAAGATACA------------------AACGAGCTGTTGACAAGCT---CGTGGAGCTCATTCGAGCTGTCGA------------------------------------------A---------CGTGGCAAGAC-----CATTGACGAATCGGAAATTCCTGTTGCTCCACCAAATTTCTCTTCGGCGGCGGCGGAACCCCTTC--CACCACCGGCTCCGACTGCCCAG

In [83]:
outtype = 'tree' #Can be any of the identifiers mentioned in the previous output

In [84]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

((CCCT-1:0.16251,LIN-1:0.21452):0.03229,(GOA-1:0.18991,DDX-10:0.19120):0.05560);



### T-COFFEE

T-COFFEE is a consistency-based MSA tool that attempts to mitigate the pitfalls of progressive alignment methods. This is suitable for small alignments.

In [85]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/tcoffee/'

In [86]:
request = requests.get(server+'parameters')

request.raise_for_status()

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>format</id>
	<id>matrix</id>
	<id>order</id>
	<id>stype</id>
	<id>sequence</id>
</parameters>



In [87]:
parameter = 'order'

In [88]:
request = requests.get(server+'parameterdetails/'+parameter)

# raise_for_status() raises an HTTPError for a 4xx/5xx response, which stops the cell
request.raise_for_status()

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Output Order</name>
	<description>The order in which the sequences appear in the final alignment</description>
	<type>STRING</type>
	<values>
		<value>
			<label>aligned</label>
			<value>aligned</value>
			<defaultValue>true</defaultValue>
			<properties>
				<property>
					<key>description</key>
					<value>Determined by the guide tree</value>
				</property>
			</properties>
		</value>
		<value>
			<label>input</label>
			<value>input</value>
			<defaultValue>false</defaultValue>
			<properties>
				<property>
					<key>description</key>
					<value>Same order as the input sequences</value>
				</property>
			</properties>
		</value>
	</values>
</parameter>
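
Reading the allowed values off the pretty-printed XML works, but they can also be extracted programmatically. The following is a minimal sketch, assuming the response has the layout printed above; the helper names `text_of` and `allowed_values` are made up for illustration, and `sample_xml` is a shortened copy of the output rather than a live response.

```python
import xml.dom.minidom

# Shortened copy of the parameter description printed above
sample_xml = """<parameter>
  <name>Output Order</name>
  <values>
    <value><label>aligned</label><value>aligned</value><defaultValue>true</defaultValue></value>
    <value><label>input</label><value>input</value><defaultValue>false</defaultValue></value>
  </values>
</parameter>"""

def text_of(node):
    """Concatenate the text directly inside an element."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE).strip()

def allowed_values(xml_text):
    """Return (options, default) from a parameterdetails response (illustrative helper)."""
    dom = xml.dom.minidom.parseString(xml_text)
    options, default = [], None
    for flag in dom.getElementsByTagName("defaultValue"):
        entry = flag.parentNode  # the outer <value> element describing one option
        # Look only at direct children, because <value> is reused at several nesting levels
        inner = [c for c in entry.childNodes
                 if c.nodeType == c.ELEMENT_NODE and c.tagName == "value"]
        if not inner:
            continue
        option = text_of(inner[0])
        options.append(option)
        if text_of(flag) == "true":
            default = option
    return options, default

print(allowed_values(sample_xml))
```

With a live response, `request.text` could be passed in place of `sample_xml`.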



In [89]:
# Submit the job. Replace the email with your own address, since the service asks for a valid one.
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "sequence" : sequences})

# raise_for_status() raises an HTTPError for a 4xx/5xx response, which stops the cell
request.raise_for_status()

print(request.text)
jobid = request.text

tcoffee-R20210630-223530-0871-4812339-p2m
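
The job above was submitted with only the required fields, so the service used its default for every other parameter. As a hedged sketch, optional parameters such as `order` (described earlier) could be added to the same form data. The helper `build_payload` and the placeholder email and sequences below are hypothetical and only illustrate how the dictionary might be assembled; nothing is sent to the server.

```python
def build_payload(email, sequences, **options):
    """Combine the required fields with optional tool parameters (illustrative helper)."""
    payload = {"email": email, "sequence": sequences}
    # Drop options left as None so the service falls back to its defaults
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload

# Placeholder values, not real data
payload = build_payload("your.name@example.org", ">seq1\nATGC\n>seq2\nATGG\n", order="input")
print(sorted(payload))
```

The resulting dictionary could then be passed as `data=payload` in the `requests.post(server+'run', ...)` call.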


In [94]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED
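
A job is not always finished by the time the status cell is run, so the status may need to be checked more than once. The sketch below polls until the job leaves a pending state. The helper name `wait_for_job` is made up, and the pending state names are assumptions, since only `FINISHED` appears in the output above. A fake status source stands in for the server so the example runs offline.

```python
import time

def wait_for_job(get_status, interval=5, max_checks=60,
                 pending=("RUNNING", "QUEUED", "PENDING")):
    """Call get_status() until it returns a non-pending state or max_checks is reached."""
    status = get_status()
    checks = 1
    while status in pending and checks < max_checks:
        time.sleep(interval)  # wait between checks to avoid hammering the server
        status = get_status()
        checks += 1
    return status

# Offline demonstration with a fake sequence of statuses
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_job(lambda: next(fake_statuses), interval=0))
```

In the notebook, the callable could be `lambda: requests.get(server+'status/'+jobid).text.strip()`.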


In [95]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself but the same as the alignment in CLUSTAL format</description>
		<fileSuffix>clustalw</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/x-clustalw-alignment</mediaType>
	</type>
	<type>
		<description>Error messages produced by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>error</identifier>
		<label>Tool Error Details</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Your input sequences as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>sequence</identifier>
		<label>Input Sequences</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The alignment in CLUSTAL format</description>
		<fileSuffix>clustalw</fileSuffix>
		<identifier>aln-clustalw</identifier>
		<label>Alignment in CLUSTAL format</label>
		<mediaType>text/x-clustalw-alignment</mediaType>
	</type>
	<typ

In [96]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [97]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL W (1.83) multiple sequence alignment

GOA-1           ATGGGTTGTACCA----T------GT-----CACAGGAAGAGCGTGCCGC
CCCT-1          ATGAATTTCAATGA---TAT----TG---A-TAATCAAATGTACGGGAAT
LIN-1           ATGAATCACATTGAC----------------CTTTTGAA---------GG
DDX-10          ATGCCTCACACAAACGGAAAAGGCGGCGGAGCACAGAAATGGGCTGGAAA
                ***  *   *                           **           

GOA-1           TCTTGAAAGATCACGAAT---G-AT-------------------TGAGAA
CCCT-1          TTGGAGGAGGACGCTGAGTTGC-TCGCCGAGCTTGCCGCAA-TACAAGAA
LIN-1           TCAAAAAAG-------AG--CCGCCGTCGAGTTCG----G-----AAGA-
DDX-10          TAATAAAAGAAAACGAAATTTCGATGGCGAGCAGGATCCAAGTGTCAGAA
                *      **       *                             *** 

GOA-1           A-------------AA---TCT---T---------------------AAA
CCCT-1          G--AG------GAGATGGGTCGTGTTAGCCGACCGGCAGCTCCAGCTCGC
LIN-1           -AG--CCGAGGAAGAAGAATCT-CCGAAA--------CATACGATTGA--
DDX-10          AAGCGTTGAAGGAGAAGCGGCTTCTGAAA-AAGAGAAAACAGGATTTAAA
              

In [98]:
outtype = 'aln-clustalw' #Can be any of the identifiers mentioned in the previous output

In [99]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

CLUSTAL W (1.83) multiple sequence alignment

GOA-1           ATGGGTTGTACCA----T------GT-----CACAGGAAGAGCGTGCCGC
CCCT-1          ATGAATTTCAATGA---TAT----TG---A-TAATCAAATGTACGGGAAT
LIN-1           ATGAATCACATTGAC----------------CTTTTGAA---------GG
DDX-10          ATGCCTCACACAAACGGAAAAGGCGGCGGAGCACAGAAATGGGCTGGAAA
                ***  *   *                           **           

GOA-1           TCTTGAAAGATCACGAAT---G-AT-------------------TGAGAA
CCCT-1          TTGGAGGAGGACGCTGAGTTGC-TCGCCGAGCTTGCCGCAA-TACAAGAA
LIN-1           TCAAAAAAG-------AG--CCGCCGTCGAGTTCG----G-----AAGA-
DDX-10          TAATAAAAGAAAACGAAATTTCGATGGCGAGCAGGATCCAAGTGTCAGAA
                *      **       *                             *** 

GOA-1           A-------------AA---TCT---T---------------------AAA
CCCT-1          G--AG------GAGATGGGTCGTGTTAGCCGACCGGCAGCTCCAGCTCGC
LIN-1           -AG--CCGAGGAAGAAGAATCT-CCGAAA--------CATACGATTGA--
DDX-10          AAGCGTTGAAGGAGAAGCGGCTTCTGAAA-AAGAGAAAACAGGATTTAAA
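
The CLUSTAL format splits each aligned sequence across several blocks, which is easy to read but awkward to process. As a minimal sketch, the blocks can be joined back into one string per sequence. The helper `parse_clustal` is an illustrative assumption, and `sample` is a toy excerpt cut down from the first two sequences of the output above (with the consensus lines recomputed for just those two rows), not the full alignment.

```python
def parse_clustal(text):
    """Join the blocks of a CLUSTAL alignment into {name: aligned_sequence}."""
    alignment = {}
    for line in text.splitlines():
        # Skip the header, blank lines, and the indented consensus lines
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, chunk = parts[0], parts[1]
        alignment[name] = alignment.get(name, "") + chunk
    return alignment

# Toy excerpt based on the output above
sample = """CLUSTAL W (1.83) multiple sequence alignment

GOA-1           ATGGGTTGTA
CCCT-1          ATGAATTTCA
                ***  **  *

GOA-1           TCTTGAAAG
CCCT-1          TTGGAGGAG
                *      **
"""

for name, seq in parse_clustal(sample).items():
    print(name, seq)
```

With the real result, `request.text` could be passed in place of `sample`.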
              

In [100]:
outtype = 'tree' #Can be any of the identifiers mentioned in the previous output

In [101]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

((CCCT-1:0.07500,GOA-1:0.09500):0.01500,DDX-10:0.07500,LIN-1:0.10500);


In [102]:
outtype = 'phylotree' #Can be any of the identifiers mentioned in the previous output

In [103]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

(
(
GOA-1:0.20738,
CCCT-1:0.16415)
:0.09881,
LIN-1:0.25159,
DDX-10:0.19923);



In [104]:
outtype = 'pim' #Can be any of the identifiers mentioned in the previous output

In [105]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

#
#
#  Percent Identity  Matrix - created by Clustal2.1 
#
#

     1: GOA-1       100.00   62.85   42.64   51.04
     2: CCCT-1       62.85  100.00   50.13   52.20
     3: LIN-1        42.64   50.13  100.00   54.92
     4: DDX-10       51.04   52.20   54.92  100.00
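
The percent identity matrix is plain text, so a little parsing is needed before the values can be used in code. The sketch below turns it into a nested dictionary and looks up the most similar pair. The helper name `parse_pim` is an assumption for illustration, and `pim_text` is copied from the output above.

```python
def parse_pim(text):
    """Parse a Clustal percent identity matrix into {name: {other_name: percent}}."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comment and blank lines
        fields = line.split()  # e.g. ['1:', 'GOA-1', '100.00', '62.85', ...]
        names.append(fields[1])
        rows.append([float(value) for value in fields[2:]])
    return {name: dict(zip(names, row)) for name, row in zip(names, rows)}

# Matrix copied from the output above
pim_text = """#
#  Percent Identity  Matrix - created by Clustal2.1
#

     1: GOA-1       100.00   62.85   42.64   51.04
     2: CCCT-1       62.85  100.00   50.13   52.20
     3: LIN-1        42.64   50.13  100.00   54.92
     4: DDX-10       51.04   52.20   54.92  100.00
"""

matrix = parse_pim(pim_text)
# Compare each pair of distinct sequences once and keep the highest identity
best = max((matrix[a][b], a, b) for a in matrix for b in matrix if a < b)
print(best)
```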



This is the end of the fifth tutorial for WormBase data analysis! In this tutorial, we used several algorithms to perform multiple sequence alignment.

In the next tutorial, we will perform Ontology analyses to better understand WormBase data!