# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 5 - Multiple Sequence Alignment**
Welcome to the ninth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code that retrieves data available on the WormBase sites and performs simple analyses with it.

This tutorial covers multiple sequence alignment (MSA) of your data using several different algorithms, each run through the EBI REST API. The approach is not limited to WormBase data, but the examples in this tutorial use worm sequences.
Let's get started!

As always, we start by importing the libraries required for this tutorial.

In [None]:
import requests
import sys
import json
import xml.dom.minidom

#### Read the input sequences from a multi-FASTA file

In [None]:
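# Read the whole multi-FASTA file into one string; the API accepts it as-is.
# The "with" block closes the file handle once the contents are read.
with open("data/mutltiple_fasta.fa") as fasta_file:
    sequences = fasta_file.read()
print(sequences)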
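Before submitting anything, it can help to confirm that the file really contains several records. The following is a minimal sketch of a FASTA parser, not part of the original notebook; `parse_fasta` and the two-record `example` string are hypothetical names and data used only for illustration.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so join them back together
            records[header] += line
    return records


# Hypothetical example data, not taken from the tutorial's input file
example = ">seq1\nATGC\nGGTA\n>seq2\nATGA\n"
records = parse_fasta(example)
print(len(records), "sequences found")
```

In the notebook, calling `parse_fasta(sequences)` on the string read above would give the same kind of summary for the real input.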

### Clustal Omega

Clustal Omega is an MSA tool that uses seeded guide trees and HMM profile-profile techniques to generate alignments. It is suitable for medium to large alignments.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/clustalo/'

We can first explore which parameters the API accepts for multiple sequence alignment. Of these, only the email address and the input sequences are mandatory in most cases.

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)
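The pretty-printed XML is convenient to read, but a plain Python list is easier to work with in later cells. The sketch below assumes the response wraps each parameter name in an `<id>` element; `list_ids` and `sample_xml` are hypothetical and stand in for the real response text.

```python
import xml.dom.minidom


def list_ids(xml_text, tag="id"):
    """Collect the text content of every <tag> element in an XML string."""
    dom = xml.dom.minidom.parseString(xml_text)
    return [
        node.firstChild.data.strip()
        for node in dom.getElementsByTagName(tag)
        if node.firstChild is not None  # ignore empty elements
    ]


# Assumed response layout, shortened for illustration
sample_xml = (
    "<parameters><id>email</id><id>sequence</id>"
    "<id>guidetreeout</id></parameters>"
)
parameter_ids = list_ids(sample_xml)
print(parameter_ids)
```

With a live response, `list_ids(request.text)` would replace the sample string, provided the layout matches the assumption above.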

Now we know which parameters are available for running the alignment. Next, we can look up the details of each parameter: its description, the type of data it accepts, and its permissible values.

Change the `parameter` variable to whichever parameter you want details about.

In [None]:
parameter = 'guidetreeout'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

Now we can submit the alignment to the API. Additional parameters can be specified based on the details from the cells above; for Clustal Omega, the mandatory fields are `email` and `sequence`. The API responds with a job ID, which we use to check the status of the job and to retrieve the alignment results.

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

It is easy to check the status of our job. Just run the code in the next cell.

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)
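A job may still be running when the status is first checked, so the cell above sometimes has to be re-run by hand. Below is a minimal polling sketch. `wait_for_job` is a hypothetical helper, and the status strings treated as "still pending" (`RUNNING`, `QUEUED`, `PENDING`) are an assumption about what the service returns. The status lookup is passed in as a function so the loop can be tried out here without a network call.

```python
import time


def wait_for_job(get_status, poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Call get_status() until it returns a non-pending status.

    Returns the last status seen, which is still a pending one
    if max_polls is reached first.
    """
    pending = {"RUNNING", "QUEUED", "PENDING"}  # assumed status names
    status = get_status()
    polls = 1
    while status in pending and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status


# Simulated status sequence standing in for real API responses
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)  # FINISHED
```

In the notebook this could be called as `wait_for_job(lambda: requests.get(server + 'status/' + jobid).text.strip())` before requesting any results.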

The API offers the results in several formats. Use the next cell to list the available result types and decide which ones you need.

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)
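To pick a result type programmatically, the XML can be reduced to a small lookup table. The sketch below assumes each result type is a `<type>` element with `<identifier>` and `<label>` children. That layout, the labels in `sample_xml`, and the helper names are assumptions for illustration, so compare them with the real output printed above.

```python
import xml.dom.minidom


def child_text(parent, tag):
    """Return the text of the first <tag> under parent, or '' if absent."""
    nodes = parent.getElementsByTagName(tag)
    if nodes and nodes[0].firstChild is not None:
        return nodes[0].firstChild.data.strip()
    return ""


def summarize_result_types(xml_text):
    """Map each result-type identifier to its human-readable label."""
    dom = xml.dom.minidom.parseString(xml_text)
    return {
        child_text(entry, "identifier"): child_text(entry, "label")
        for entry in dom.getElementsByTagName("type")
    }


# Assumed response layout with made-up labels
sample_xml = (
    "<types>"
    "<type><identifier>aln-clustal_num</identifier>"
    "<label>Alignment</label></type>"
    "<type><identifier>pim</identifier>"
    "<label>Identity matrix</label></type>"
    "</types>"
)
result_types = summarize_result_types(sample_xml)
print(result_types)
```

The keys of the returned dictionary are the values that can be assigned to `outtype` in the cells that follow.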

Set the result type you want to display, then request it from the API.

You can change the `outtype` variable to any of the identifiers listed in the previous output.

In [None]:
outtype = 'aln-clustal_num' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'phylotree' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'pim'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

And you are done!

The remaining sections repeat the same workflow with other alignment algorithms.
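Every request in this notebook builds its URL by string concatenation, so a stray space or newline in any piece produces an address the server will not recognise. The sketch below shows one way to assemble the URLs more defensively. `endpoint` is a hypothetical helper, not part of the EBI API or of the original notebook, and `job123` is a made-up job ID.

```python
def endpoint(server, *parts):
    """Join a server base URL and path parts, trimming whitespace and slashes."""
    cleaned = [str(part).strip().strip("/") for part in parts]
    return server.rstrip("/") + "/" + "/".join(cleaned)


# Made-up job ID with a trailing newline, and a path part with a stray space
url = endpoint(
    "https://www.ebi.ac.uk/Tools/services/rest/kalign/",
    "result/ ",
    "job123\n",
    "pim",
)
print(url)
```

A result could then be fetched with `requests.get(endpoint(server, 'result', jobid, outtype))` in any of the sections below.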

### KALIGN

KALIGN is a very fast MSA tool that concentrates on local regions. It is suitable for large alignments.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/kalign/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gapext'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences, "stype" : "dna"})

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'aln-clustalw' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'phylotree' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'pim'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### MAFFT

MAFFT is an MSA tool that uses Fast Fourier Transforms. It is suitable for medium to large alignments.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/mafft/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gapext'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'phylotree' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'pim'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### MUSCLE

MUSCLE is an accurate MSA tool that is especially good with proteins. It is suitable for medium alignments.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/muscle/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'tree'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences})

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'aln-clustalw'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'phylotree'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'pim' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### PRANK

PRANK is a phylogeny-aware multiple sequence alignment program which makes use of evolutionary information to help place insertions and deletions.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/prank/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gap_extension'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences})

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'aln-hsaml'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'tree'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### T-COFFEE

T-COFFEE is a consistency-based MSA tool that attempts to mitigate the pitfalls of progressive alignment methods. This is suitable for small alignments.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/tcoffee/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'order'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data = {"email" : "xyz@wormbase.org", "sequence" : sequences})

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'aln-clustalw'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'tree'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'phylotree'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

In [None]:
outtype = 'pim' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

This is the end of the fifth tutorial for WormBase data analysis! This tutorial dealt with using several algorithms for Multiple Sequence Alignment analyses.

In the next tutorial, we will perform Ontology analyses to better understand WormBase data!

Acknowledgements:
- EBI APIs for Multiple Alignment (https://www.ebi.ac.uk/Tools/msa/)