# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 4 - Pairwise Sequence Alignment**
Welcome to the eighth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data from the WormBase sites and perform simple analyses on it.

This tutorial covers pairwise sequence alignment using several algorithms. The approach works for any sequence data, not only data from WormBase, although the examples here use worm data.
Let's get started!

As always, we start by importing the libraries required for the tutorial.

In [None]:
import requests
import sys
import json
import xml.dom.minidom

#### Read in the input data from fasta files

In [None]:
# Read the first sequence; the with-block closes the file automatically
with open("data/fasta1.fa") as fasta_file:
  sequence1 = fasta_file.read()
print(sequence1)

In [None]:
# Read the second sequence; the with-block closes the file automatically
with open("data/fasta2.fa") as fasta_file:
  sequence2 = fasta_file.read()
print(sequence2)
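
The cells above pass the raw FASTA text straight to the API. If you also want the header and the residues separately (for example, to check sequence lengths before submitting), a minimal sketch follows. The function name `parse_fasta` and the example record are hypothetical, and only the first record of the text is read.

```python
def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    residues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None or residues:
                break  # a second record starts here; stop reading
            header = line[1:]
        else:
            residues.append(line)
    return header, "".join(residues)


# Made-up record, used only for illustration
example = ">seq1 hypothetical example\nATGGCT\nTTAGGC\n"
header, sequence = parse_fasta(example)
print(header, len(sequence))
```

With the files from this tutorial, you could call `parse_fasta(sequence1)` in the same way.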

### Global Sequence Alignment

#### EMBOSS Needle

EMBOSS Needle creates an optimal global alignment of two sequences using the Needleman-Wunsch algorithm.
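
To give a feel for what a global alignment optimises, here is a minimal sketch of the Needleman-Wunsch scoring recurrence. It is a simplification and not the EMBOSS implementation: it assumes a flat match/mismatch score and a linear gap penalty, and it returns only the optimal score rather than the alignment itself. The function name and default scores are illustrative choices.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # previous[j] is the best score for aligning the processed prefix of a with b[:j]
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + substitution  # align a[i-1] with b[j-1]
            up = previous[j] + gap                     # gap in b
            left = current[j - 1] + gap                # gap in a
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("AC", "AGC"))
```

Because every position of both sequences must be covered, unmatched ends are penalised as gaps. That is the property that distinguishes global from local alignment.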

We can first use the API to list the parameters available for the alignment. Of these, only the email address and the two input sequences are mandatory.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_needle/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)
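
If you prefer a plain Python list of parameter names over pretty-printed XML, the response can be parsed with `xml.etree.ElementTree`. This sketch assumes the response is a `<parameters>` element containing one `<id>` element per parameter; compare that with the output printed above before relying on it. The sample string is a shortened, made-up stand-in for the real response.

```python
import xml.etree.ElementTree as ET


def parameter_names(xml_text):
    """Return the text of every <id> element in the parameters response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("id")]


# Hypothetical, shortened sample; in the notebook you would pass request.text
sample = "<parameters><id>asequence</id><id>bsequence</id><id>gapopen</id></parameters>"
print(parameter_names(sample))
```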

Now we know which parameters can be used to run the alignment. Next, we can look up the details of each parameter: its description, the type of data it accepts, and its permissible values.

Set the parameter variable to whichever parameter you want details for.

In [None]:
parameter = 'gapopen' 

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

Now we can submit the alignment job to the API. Additional parameters can be specified based on the details from the cells above; for EMBOSS Needle, the mandatory fields are email, asequence, and bsequence. The response contains a job ID, which we use to check the status of the job and to retrieve the alignment results.

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text
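
The cell above sends only the mandatory fields. As a sketch of how optional parameters could be added, the helper below builds the form data and skips any option left as `None`. The helper name `build_run_payload` is hypothetical, and `gapext` is an assumed parameter name; check it against the parameter list above and confirm permissible values with `parameterdetails` before using it.

```python
def build_run_payload(email, asequence, bsequence, **options):
    """Build the form data for the 'run' endpoint, skipping unset options."""
    payload = {"email": email, "asequence": asequence, "bsequence": bsequence}
    for name, value in options.items():
        if value is not None:
            payload[name] = str(value)  # form fields are sent as text
    return payload


# Made-up sequences; 'gapext' is an assumption, 'matrix' is skipped because it is None
payload = build_run_payload("xyz@wormbase.org", ">a\nACGT", ">b\nACGA",
                            gapopen=10, gapext=0.5, matrix=None)
print(sorted(payload))
```

The resulting dictionary can be passed as the `data` argument of `requests.post` in place of the literal dictionary used above.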

The job runs asynchronously on the EBI servers, so check its status before requesting results. Run the code in the next cell, and rerun it until it reports FINISHED.

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)
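
Instead of rerunning the status cell by hand, you can poll until the job leaves its waiting states. The sketch below assumes the service reports `RUNNING` (and possibly `QUEUED` or `PENDING`) while the job is in progress and a different value such as `FINISHED` once it has stopped; treat these status strings as assumptions. The status lookup is passed in as a function so the example runs without network access.

```python
import time


def wait_for_job(get_status, poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Call get_status() until the job is no longer waiting or running."""
    for _ in range(max_polls):
        status = get_status().strip()
        if status not in ("RUNNING", "QUEUED", "PENDING"):
            return status  # e.g. FINISHED, or an error status
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling limit")


# Offline demonstration with made-up statuses and no real waiting
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None))

# In the notebook, the call could look like this:
# wait_for_job(lambda: requests.get(server + 'status/' + jobid).text)
```

A polling interval of a few seconds keeps the number of requests to the shared service low.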

The API provides the results in several formats. Use the next cell to list the available result types and decide which ones you need.

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)
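
To pick a result type programmatically, the identifiers can be pulled out of the XML. This sketch assumes each result type is a `<type>` element with `<identifier>` and `<mediaType>` children; check that against the output printed above. The sample string is a made-up, shortened stand-in for the real response.

```python
import xml.etree.ElementTree as ET


def result_identifiers(xml_text):
    """Return (identifier, media type) pairs from a resulttypes response."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("identifier"), item.findtext("mediaType"))
        for item in root.iter("type")
    ]


# Hypothetical sample; in the notebook you would pass request.text
sample_types = (
    "<types>"
    "<type><identifier>out</identifier><mediaType>text/plain</mediaType></type>"
    "<type><identifier>sequence</identifier><mediaType>text/plain</mediaType></type>"
    "</types>"
)
print(result_identifiers(sample_types))
```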

Choose the result type you want to display and request it from the API.

The outtype variable can be set to any of the identifiers listed in the previous output.

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)
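
The alignment report is plain text, so summary values can be extracted with regular expressions. The sketch below assumes the report contains header lines of the form `# Identity: 8/10 (80.0%)` and `# Score: 35.0`, as in the default EMBOSS pairwise output; if you request a different output format, the patterns will need adjusting. The sample report is made up for illustration.

```python
import re


def alignment_summary(report):
    """Extract identity, similarity, gaps, and score from an alignment report."""
    summary = {}
    counts = re.findall(
        r"^#\s*(Identity|Similarity|Gaps):\s*(\d+)/(\d+)", report, flags=re.MULTILINE
    )
    for label, matched, total in counts:
        summary[label.lower()] = (int(matched), int(total))
    score = re.search(r"^#\s*Score:\s*(-?\d+(?:\.\d+)?)", report, flags=re.MULTILINE)
    if score:
        summary["score"] = float(score.group(1))
    return summary


# Made-up report header, used only to illustrate the assumed layout
sample_report = """\
# Length: 10
# Identity:       8/10 (80.0%)
# Similarity:     9/10 (90.0%)
# Gaps:           1/10 (10.0%)
# Score: 35.0
"""
print(alignment_summary(sample_report))
```

In the notebook, `alignment_summary(request.text)` would apply the same extraction to the downloaded result.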

And you are done!

The remaining sections repeat the same workflow with the other alignment algorithms.

#### EMBOSS Stretcher

EMBOSS Stretcher uses a modification of the Needleman-Wunsch algorithm that allows larger sequences to be globally aligned.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_stretcher/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gapopen'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out' 

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### Local Alignment

#### EMBOSS Water

EMBOSS Water uses the Smith-Waterman algorithm (modified for speed enhancements) to calculate the local alignment of two sequences.
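
For comparison with global alignment, here is a minimal sketch of the Smith-Waterman scoring recurrence. It is a simplification and not the EMBOSS implementation: it assumes flat match/mismatch scores and a linear gap penalty, and it returns only the best local score. The function name and default scores are illustrative choices.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between any substrings of a and b."""
    best = 0
    previous = [0] * (len(b) + 1)  # scores never drop below zero
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + substitution
            up = previous[j] + gap
            left = current[j - 1] + gap
            # The zero lets an alignment restart anywhere, which makes it local
            cell = max(0, diagonal, up, left)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(smith_waterman_score("AAAXXX", "YYYAAA"))
```

Unlike the global case, the unrelated flanking characters cost nothing here, because the alignment may start and end anywhere in either sequence.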

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_water/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gapopen'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)
 
if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

#### EMBOSS Matcher

EMBOSS Matcher identifies local similarities between two sequences using a rigorous algorithm based on the LALIGN application.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_matcher/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'gapopen'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

#### LALIGN

LALIGN finds internal duplications by calculating non-intersecting local alignments of protein or DNA sequences.

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/lalign/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'matrix'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

### Genomic Alignment

#### GeneWise

GeneWise compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors.

Read in the FASTA files for the input protein sequence and for the genomic DNA sequence you are comparing it to.

In [None]:
# Read the protein sequence; the with-block closes the file automatically
with open("data/genewise_prot.fa") as fasta_file:
  prot = fasta_file.read()
print(prot)

In [None]:
# Read the genomic DNA sequence; the with-block closes the file automatically
with open("data/genewise_nucl.fa") as fasta_file:
  nucl = fasta_file.read()
print(nucl)
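
Because GeneWise expects a protein as one input and genomic DNA as the other, it can help to check that the two files have not been swapped before submitting. The heuristic below is only a rough sketch: the function name `looks_like_dna` and the 0.9 threshold are arbitrary illustrative choices, and short or unusual sequences can be misclassified.

```python
def looks_like_dna(fasta_text, threshold=0.9):
    """Guess whether FASTA text holds nucleotides, based on its letter composition."""
    residues = "".join(
        line.strip() for line in fasta_text.splitlines() if not line.startswith(">")
    ).upper()
    if not residues:
        return False
    nucleotide_count = sum(residues.count(base) for base in "ACGTUN")
    return nucleotide_count / len(residues) >= threshold


# Made-up records, used only for illustration
print(looks_like_dna(">g\nATGGCTAGCTAGNN\n"))
print(looks_like_dna(">p\nMKTLLVAGGSW\n"))
```

In the notebook, you would expect `looks_like_dna(nucl)` to be true and `looks_like_dna(prot)` to be false.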

In [None]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/genewise/'

In [None]:
request = requests.get(server + 'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

In [None]:
parameter = 'genes'

In [None]:
request = requests.get(server + 'parameterdetails/' + parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

In [None]:
request = requests.post(server + 'run', 
                        headers = {"Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"},
                        data = {"email" : "xyz@wormbase.org", "asequence" : prot, "bsequence" : nucl })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

In [None]:
request = requests.get(server + 'status/' + jobid)
print(request.text)

In [None]:
request = requests.get(server + 'resulttypes/' + jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

In [None]:
outtype = 'out'

In [None]:
request = requests.get(server + 'result/' + jobid + '/' + outtype)

print(request.text)

This is the end of the fourth tutorial for WormBase data analysis! This tutorial dealt with using multiple algorithms for Pairwise Sequence Alignment analyses.

In the next tutorial, we will use different algorithms for Multiple Sequence Alignment analyses!

Acknowledgements:
- EBI APIs for Pairwise Alignments (https://www.ebi.ac.uk/Tools/psa/)