# **Bioinformatics with Jupyter Notebooks for WormBase:**
## **Analyses 4 - Pairwise Sequence Alignment**
Welcome to the eighth Jupyter notebook in the WormBase tutorial series. Over this series of tutorials, we write Python code to retrieve data available on the WormBase sites and perform simple analyses with it.

This tutorial covers pairwise sequence alignment using several algorithms. The methods are not limited to WormBase data, but the examples here use worm sequences.
Let's get started!

As always, we start by importing any libraries that are required for the tutorials.

In [1]:
import requests, sys, json
import xml.dom.minidom

#### Read in the input data from FASTA files

In [2]:
# The context manager closes the file once it has been read
with open("data/fasta1.fa") as fasta_file:
    sequence1 = fasta_file.read()
print(sequence1)

>C37F5.1                   
atgaatcacattgaccttttgaaggtcaaaaaagagccgccgtcgagttc 
ggaagaagccgaggaagaagaatctccgaaacatacgattgagggaattt 
tggatataagaaagaaagagatgaacgtctcagacttgatgtgtacccga 
tcctcgacacctccgacctcatcatccgtcgactcaatcataaccctgtg 
gcaattccttctagaactactgcaacaagaccaaaatggtgatataatcg 
aatggacacgcggaacggacggcgaattccgactgattgatgcagaagcc 
gtggcgagaaagtggggacaacggaaggcgaaaccgcatatgaattatga 
taaactgtcgagagcgttacgatattattatgagaagaatattattaaga 



In [3]:
# The context manager closes the file once it has been read
with open("data/fasta2.fa") as fasta_file:
    sequence2 = fasta_file.read()
print(sequence2)

>FM864439                                          
tggcatttctacgggtgatgaggtggatggagtcgccgaggaggcacact 
gcttgacggggagccacactatctgttggacttggtgtgctgacactgtg 
cgtcgtcggtgaatccgtcccaagaaggctcagcgccgagaaatccgtgt 
tgccacgtggattttgaggtggtggaggcggcggcggcggtgcttgtaat 
gacgtcataaacgacggaatctcgtgtcgaatgtccttctcgtctttgac 
ataacacatcttcatgttcatatttgaggaaaagtcggcggtcggcggag 
cgtgggcgtcagtagttacaaagcgatatacgaactttttgccgatcacc 
ttcttaataatattcttctcataataatatcgtaacgctctcgacagttt 
atcataattcatatgcggtttcgccttccgttgtccccactttctcgcca 
cggcttctgcatcaatcagtcggaattcgccgtccgttccgcgtgtccat 
tcgattatatcaccattttggtcttgttgcagtagttctagaaggaattg 
ccacagggttatgattgagtcgacggatgatgaggtcggaggtgtcgagg 
atcgggtacacatcaagtctgagacgttcatctctttctctctaatttgc 
gcaattaaatctatctactccttcagacattgttccgtgcccagcttctt 
ccgaactctacggatgctcttttttgacctaaaattgtcaatgagattca 
ttttcgtgtaatctgtagagggtccgcgctttgatatcctctctctactg 
cgtaattcatcgttactactcattcagtcatggtcaatggtcaaantttt 
tcncccttcttatttcctnacttccttcttctccctcacttttctttcta 
tctatctattca
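The two cells above hold the raw FASTA text, header line included. If you want the identifier and the bare sequence separately (for example, to check sequence lengths before aligning), a minimal sketch follows. The helper name `parse_fasta` is hypothetical and not part of the original notebook, and it assumes each file contains a single record.

```python
def parse_fasta(text):
    """Split single-record FASTA text into (identifier, sequence)."""
    # Drop surrounding whitespace and empty lines
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    header = lines[0][1:].split()
    identifier = header[0] if header else ""
    # Join the remaining lines into one continuous sequence
    sequence = "".join(lines[1:])
    return identifier, sequence


# A short stand-in for the contents of data/fasta1.fa
example = ">C37F5.1   \natgaatcaca \nttgacctttt \n"
name, seq = parse_fasta(example)
print(name, len(seq))  # C37F5.1 20
```

In the notebook, `parse_fasta(sequence1)` would return the identifier and sequence of the first input file.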

### Global Sequence Alignment

#### EMBOSS Needle

EMBOSS Needle creates an optimal global alignment of two sequences using the Needleman-Wunsch algorithm.
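To give a feel for what the Needleman-Wunsch algorithm computes, here is a minimal sketch of the dynamic-programming score calculation. It is an illustration only, not the EMBOSS implementation: it assumes a simple linear gap penalty and made-up match/mismatch scores, whereas EMBOSS Needle uses a substitution matrix with separate gap-open and gap-extend penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score under a linear gap penalty."""
    # Row 0: aligning an empty prefix of `a` against prefixes of `b` costs only gaps
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # column 0: prefix of `a` against an empty prefix of `b`
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    # The bottom-right cell covers both sequences end to end
    return previous[-1]


print(needleman_wunsch_score("GATTACA", "GATTACA"))  # 7
print(needleman_wunsch_score("GATTACA", "GATACA"))   # 4 (six matches, one gap)
```

Because the score is read from the last cell, every residue of both sequences must be accounted for, which is what makes the alignment global.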

We first ask the API which parameters the alignment accepts. Of these, only the two input sequences are mandatory; an email address must also be supplied when submitting a job.

In [4]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_needle/'

In [5]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>matrix</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>endweight</id>
	<id>endopen</id>
	<id>endextend</id>
	<id>format</id>
	<id>stype</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



Now we know which parameters the alignment accepts. Next, we can look up the details of any one of them: its description, the type of data accepted, and the permissible values.

In [6]:
parameter = 'gapopen' #Change the variable to whichever parameter you want to find details for

In [7]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap open</name>
	<description>Pairwise alignment score for the first residue in a gap.</description>
	<type>FLOAT</type>
	<values>
		<value>
			<label>100</label>
			<value>100</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>50</label>
			<value>50</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>25</label>
			<value>25</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>20</label>
			<value>20</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>15</label>
			<value>15</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>10</label>
			<value>10</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>5</label>
			<value>5</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>1</label>
			<value>1</value>
			<defaultValue>false</defaultValue>
		</value>
	</values>
</parameter
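The pretty-printed XML is easy to read but awkward to use in code. As a sketch, the permissible values and the default can be pulled out with `xml.etree.ElementTree`. The helper name `allowed_values` is hypothetical, and the sample string is an abbreviated copy of the structure printed above; it assumes the real response has no XML namespaces, as the output suggests.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample with the same structure as the output above
sample = """<parameter>
  <name>Gap open</name>
  <type>FLOAT</type>
  <values>
    <value><label>10</label><value>10</value><defaultValue>true</defaultValue></value>
    <value><label>5</label><value>5</value><defaultValue>false</defaultValue></value>
  </values>
</parameter>"""


def allowed_values(xml_text):
    """Return (list of permitted values, default value or None)."""
    root = ET.fromstring(xml_text)
    values, default = [], None
    # Each outer <value> element wraps a label, an inner <value>, and a default flag
    for entry in root.findall("./values/value"):
        value = entry.findtext("value")
        values.append(value)
        if entry.findtext("defaultValue") == "true":
            default = value
    return values, default


print(allowed_values(sample))  # (['10', '5'], '10')
```

In the notebook, the same function could be applied to `request.text` from the `parameterdetails` call.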

Now we can call the API to run the alignment. Any of the parameters listed above can be supplied, using the permissible values from the parameter details. For EMBOSS Needle, the mandatory fields are email, asequence, and bsequence. The response contains a job ID, which we use to check the status of the job and to retrieve the alignment results.

In [8]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

emboss_needle-R20210630-222733-0117-97240744-p2m
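The cell above submits only the mandatory fields, so the service uses its defaults for everything else. To override a default, add the parameter to the form data. The sketch below only builds the dictionary and sends nothing; `build_needle_payload` is a hypothetical helper, the email address is a placeholder, and `gapopen=5` is one of the permissible values listed earlier.

```python
def build_needle_payload(email, asequence, bsequence, **options):
    """Combine the mandatory fields with any optional parameters."""
    payload = {"email": email, "asequence": asequence, "bsequence": bsequence}
    # Form fields are sent as text, so convert option values to strings
    payload.update({key: str(value) for key, value in options.items() if value is not None})
    return payload


payload = build_needle_payload("you@example.org", ">a\nacgt", ">b\nacgg", gapopen=5)
print(sorted(payload))       # ['asequence', 'bsequence', 'email', 'gapopen']
print(payload["gapopen"])    # 5
```

Passing `data=payload` to the `requests.post(server+'run', ...)` call would then submit the job with the chosen gap-open penalty.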


To check the status of the job, run the next cell. The results are ready once the status is FINISHED.

In [9]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING
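A job that is still RUNNING has no results yet, so instead of re-running the status cell by hand you can poll until the status changes. This is a minimal sketch: `wait_for_job` is a hypothetical helper, the interval and retry limit are arbitrary choices, and it assumes that any status other than RUNNING should be handed back to the caller to inspect. The demonstration uses a stand-in for the status request so it runs offline.

```python
import time


def wait_for_job(get_status, interval=3.0, max_checks=100):
    """Call get_status() until it stops returning 'RUNNING'; return the last status."""
    for _ in range(max_checks):
        status = get_status().strip()
        if status != "RUNNING":
            return status
        time.sleep(interval)  # pause between checks to avoid hammering the server
    raise TimeoutError("job still RUNNING after %d checks" % max_checks)


# Offline demonstration with a stand-in for the status request
fake_statuses = iter(["RUNNING", "RUNNING", "FINISHED"])
print(wait_for_job(lambda: next(fake_statuses), interval=0))  # FINISHED
```

In the notebook, the call would look like `wait_for_job(lambda: requests.get(server+'status/'+jobid).text)`.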


The API offers several result types for a job. Use the next cell to list them and decide which ones you need.

In [10]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The first input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The second input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>aln</identifier>
		<label>Pairwise Alignment</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The submission details which was submitted as 

Choose the result type you want to display, then request it from the API.

In [11]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [12]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

########################################
# Program: needle
# Rundate: Wed 30 Jun 2021 22:27:36
# Commandline: needle
#    -auto
#    -stdout
#    -asequence emboss_needle-R20210630-222733-0117-97240744-p2m.asequence
#    -bsequence emboss_needle-R20210630-222733-0117-97240744-p2m.bsequence
# Align_format: srspair
# Report_file: stdout
########################################

#
# Aligned_sequences: 2
# 1: C37F5.1
# 2: FM864439
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 1319
# Identity:     273/1319 (20.7%)
# Similarity:   273/1319 (20.7%)
# Gaps:         967/1319 (73.3%)
# Score: 359.5
# 
#

C37F5.1            1 --------------------------------------atgaatcaca--     10
                                                           |.||..||||  
FM864439           1 tggcatttctacgggtgatgaggtggatggagtcgccgaggaggcacact     50

C37F5.1           11 --ttgac---------------cttttgaa---ggtcaaaaaagagc---     37
                       |||||               ||.|||.|   |||    
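The summary lines in the report header (identity, similarity, gaps, score) are often all you need when comparing algorithms. The sketch below extracts them with regular expressions; `parse_emboss_summary` is a hypothetical helper, and the sample text is copied from the Needle output above.

```python
import re

# Summary lines copied from the EMBOSS Needle report above
report = """# Length: 1319
# Identity:     273/1319 (20.7%)
# Similarity:   273/1319 (20.7%)
# Gaps:         967/1319 (73.3%)
# Score: 359.5
"""


def parse_emboss_summary(text):
    """Return counts as (matched, alignment_length) tuples plus the score."""
    summary = {}
    for key in ("Identity", "Similarity", "Gaps"):
        match = re.search(r"^# %s:\s+(\d+)/(\d+)" % key, text, re.MULTILINE)
        if match:
            summary[key.lower()] = (int(match.group(1)), int(match.group(2)))
    # Scores can be negative (see the Stretcher output), so allow a leading minus
    score = re.search(r"^# Score:\s+(-?[\d.]+)", text, re.MULTILINE)
    if score:
        summary["score"] = float(score.group(1))
    return summary


print(parse_emboss_summary(report))
# {'identity': (273, 1319), 'similarity': (273, 1319), 'gaps': (967, 1319), 'score': 359.5}
```

Applied to `request.text`, this gives a dictionary that can be compared across the different tools in this tutorial.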

And you are done!

The following sections repeat the same steps with other alignment algorithms.

#### EMBOSS Stretcher

EMBOSS Stretcher uses a modification of the Needleman-Wunsch algorithm that allows larger sequences to be globally aligned.

In [13]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_stretcher/'

In [14]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>matrix</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>format</id>
	<id>stype</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



In [15]:
parameter = 'gapopen'

In [16]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap open</name>
	<description>Pairwise alignment score for the first residue in a gap.</description>
	<type>INTEGER</type>
	<values>
		<value>
			<label>1</label>
			<value>1</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>2</label>
			<value>2</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>3</label>
			<value>3</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>4</label>
			<value>4</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>5</label>
			<value>5</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>6</label>
			<value>6</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>7</label>
			<value>7</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>8</label>
			<value>8</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>9</label>
			

In [17]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

emboss_stretcher-R20210630-222753-0838-30472598-p1m


In [18]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED


In [19]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The first input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The second input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>aln</identifier>
		<label>Pairwise Alignment</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The submission details which was submitted as 

In [20]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [21]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

########################################
# Program: stretcher
# Rundate: Wed 30 Jun 2021 22:20:55
# Commandline: stretcher
#    -auto
#    -stdout
#    -asequence emboss_stretcher-R20210630-222753-0838-30472598-p1m.asequence
#    -bsequence emboss_stretcher-R20210630-222753-0838-30472598-p1m.bsequence
# Align_format: markx0
# Report_file: stdout
########################################

#
# Aligned_sequences: 2
# 1: C37F5.1
# 2: FM864439
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 1273
# Identity:     298/1273 (23.4%)
# Similarity:   298/1273 (23.4%)
# Gaps:         875/1273 (68.7%)
# Score: -3046
# 
#

                                                     10  
C37F5. -----------------ATGA-----AT----------------CACA--
                        ::::     ::                ::::  
FM8644 TGGCATTTCTACGGGTGATGAGGTGGATGGAGTCGCCGAGGAGGCACACT
               10        20        30        40        50

                                20                       
C37F5. --TTGAC-

### Local Alignment

#### EMBOSS Water

EMBOSS Water uses the Smith-Waterman algorithm (modified for speed enhancements) to calculate the local alignment of two sequences.
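For comparison with the global case, here is a minimal sketch of the Smith-Waterman score calculation. It is an illustration only, not the EMBOSS implementation: the scores and the linear gap penalty are made-up assumptions. The key differences from a global alignment are that no cell may drop below zero and the result is the best cell anywhere in the table.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score under a linear gap penalty."""
    best = 0
    previous = [0] * (len(b) + 1)  # the first row is all zeros: alignments may start anywhere
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a new local alignment start at this cell
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))  # 6 (the shared "ACG")
```

Only the shared region contributes to the score, which is why the Water result below covers a shorter span than the Needle alignment of the same two sequences.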

In [22]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_water/'

In [23]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>matrix</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>format</id>
	<id>stype</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



In [24]:
parameter = 'gapopen'

In [25]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap open</name>
	<description>Pairwise alignment score for the first residue in a gap.</description>
	<type>FLOAT</type>
	<values>
		<value>
			<label>100</label>
			<value>100</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>50</label>
			<value>50</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>25</label>
			<value>25</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>20</label>
			<value>20</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>15</label>
			<value>15</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>10</label>
			<value>10</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>5</label>
			<value>5</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>1</label>
			<value>1</value>
			<defaultValue>false</defaultValue>
		</value>
	</values>
</parameter

In [26]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

emboss_water-R20210630-222810-0411-64808521-p2m


In [27]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING


In [28]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The first input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The second input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>aln</identifier>
		<label>Pairwise Alignment</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The submission details which was submitted as 

In [29]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [30]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

########################################
# Program: water
# Rundate: Wed 30 Jun 2021 22:28:11
# Commandline: water
#    -auto
#    -stdout
#    -asequence emboss_water-R20210630-222810-0411-64808521-p2m.asequence
#    -bsequence emboss_water-R20210630-222810-0411-64808521-p2m.bsequence
# Align_format: srspair
# Report_file: stdout
########################################

#
# Aligned_sequences: 2
# 1: C37F5.1
# 2: FM864439
# Matrix: EDNAFULL
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 837
# Identity:     273/837 (32.6%)
# Similarity:   273/837 (32.6%)
# Gaps:         486/837 (58.1%)
# Score: 363.5
# 
#

C37F5.1            1 atgaatcaca----ttgac---------------cttttgaa---ggtca     28
                     |.||..||||    |||||               ||.|||.|   |||  
FM864439          39 aggaggcacactgcttgacggggagccacactatctgttggacttggt--     86

C37F5.1           29 aaaaagagc----------cgccgtcgagt-------tcggaagaa----     57
                          |.||          ||.||||| ||       ||..|||||  

#### EMBOSS Matcher

EMBOSS Matcher identifies local similarities between two sequences using a rigorous algorithm based on the LALIGN application.

In [31]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/emboss_matcher/'

In [32]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>matrix</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>alternatives</id>
	<id>format</id>
	<id>stype</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



In [33]:
parameter = 'gapopen'

In [34]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gap open</name>
	<description>Pairwise alignment score for the first residue in a gap.</description>
	<type>INTEGER</type>
	<values>
		<value>
			<label>1</label>
			<value>1</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>2</label>
			<value>2</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>3</label>
			<value>3</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>4</label>
			<value>4</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>5</label>
			<value>5</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>6</label>
			<value>6</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>7</label>
			<value>7</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>8</label>
			<value>8</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>9</label>
			

In [35]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

emboss_matcher-R20210630-222826-0908-13953254-p1m


In [36]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING


In [37]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The first input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The second input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>aln</identifier>
		<label>Pairwise Alignment</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The submission details which was submitted as 

In [38]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [39]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

########################################
# Program: matcher
# Rundate: Wed 30 Jun 2021 22:28:28
# Commandline: matcher
#    -auto
#    -stdout
#    -asequence emboss_matcher-R20210630-222826-0908-13953254-p1m.asequence
#    -bsequence emboss_matcher-R20210630-222826-0908-13953254-p1m.bsequence
# Align_format: markx0
# Report_file: stdout
########################################

#
# Aligned_sequences: 2
# 1: C37F5.1
# 2: FM864439
# Matrix: EDNAFULL
# Gap_penalty: 16
# Extend_penalty: 4
#
# Length: 21
# Identity:      16/21 (76.2%)
# Similarity:    16/21 (76.2%)
# Gaps:           0/21 ( 0.0%)
# Score: 60
# 
#

             380       390  
C37F5. ATATTATTATGAGAAGAATAT
       ::::: :: : : :: :::::
FM8644 ATATTCTTCTCATAATAATAT
     360       370       380


#---------------------------------------
#---------------------------------------



#### LALIGN

LALIGN finds internal duplications by calculating non-intersecting local alignments of protein or DNA sequences.

In [40]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/lalign/'

In [41]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>stype</id>
	<id>matrix</id>
	<id>match_scores</id>
	<id>gapopen</id>
	<id>gapext</id>
	<id>expthr</id>
	<id>format</id>
	<id>graphics</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



In [42]:
parameter = 'matrix'

In [43]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Matrix</name>
	<description>Default substitution scoring matrices.</description>
	<type>STRING</type>
	<values>
		<value>
			<label>BLOSUM50</label>
			<value>BL50</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>BLOSUM62</label>
			<value>BL62</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>BLOSUM80</label>
			<value>BL80</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>PAM120</label>
			<value>P120</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>PAM250</label>
			<value>P250</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>MDM10</label>
			<value>M10</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>MDM20</label>
			<value>M20</value>
			<defaultValue>false</defaultValue>
		</value>
		<value>
			<label>MDM40</label>
			<value>M40</value>
			<defaultValue>false</defaultValue>
		

In [44]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : sequence1, "bsequence" : sequence2 })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

lalign-R20210630-222838-0384-67845787-p2m


In [45]:
request = requests.get(server+'status/'+jobid)
print(request.text)

FINISHED


In [46]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The first input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The second input sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>aln</identifier>
		<label>Pairwise Alignment</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The submission details which was submitted as 

In [47]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [48]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

# /nfs/public/ro/es/appbin/linux-x86_64/fasta-36.3.8h/lalign36 -m 9i lalign-R20210630-222838-0384-67845787-p2m.asequence lalign-R20210630-222838-0384-67845787-p2m.bsequence -m 0
LALIGN finds non-overlapping local alignments
 version 36.3.8h Aug, 2019
Please cite:
 X. Huang and W. Miller (1991) Adv. Appl. Math. 12:373-381

Query: lalign-R20210630-222838-0384-67845787-p2m.asequence
  1>>>C37F5.1 - 400 nt
Library: lalign-R20210630-222838-0384-67845787-p2m.bsequence
     1271 residues in     1 sequences

Statistics: (shuffled [500]) MLE statistics: Lambda= 0.1399;  K=0.04844
 statistics sampled from 1 (1) to 500 sequences
Threshold: E() < 0.2 score: 84
Algorithm: Smith-Waterman (SSE2, Michael Farrar 2006) (7.2 Nov 2010)
Parameters: +5/-4 matrix (5:-4), open/ext: -12/-4
 Scan time:  0.100

The best non-identical alignments are:     ls-w bits E(1) %_id  %_sim  alen
FM864439                        (1271) [r] 1702 348.0 8.9e-100 0.923 0.923  401
+-                                          104 
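The `[r]` in the summary line appears to indicate that the match was found against the reverse strand, which would also explain why the global alignments above show such low identity for the same pair. Assuming that reading is correct, you could reverse-complement one sequence before running the other tools. A minimal sketch follows; `reverse_complement` is a hypothetical helper that expects a bare sequence without the FASTA header.

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (case is preserved)."""
    complement = str.maketrans("acgtnACGTN", "tgcanTGCAN")
    # Complement each base, then reverse the string
    return sequence.translate(complement)[::-1]


print(reverse_complement("atgaatcaca"))  # tgtgattcat
```

Characters outside `acgtn` are left unchanged by `translate`, so ambiguity codes other than `n` would need to be added to the table if your data contains them.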

### Genomic Alignment

#### GeneWise

GeneWise compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors.

Read in the FASTA files for the input protein sequence and the genomic DNA sequence you are comparing it to.

In [49]:
# The context manager closes the file once it has been read
with open("data/genewise_prot.fa") as fasta_file:
    prot = fasta_file.read()
print(prot)

>GOA-1
MGCTMSQEERAALERSRMIEKNLKEDGMQAAKDIKLLLLGAGESGKSTIVKQMKIIHESGFTAEDYKQYKPVVYSNTVQS
LVAILRAMSNLGVSFGSADREVDAKLVMDVVARMEDTEPFSEELLSSMKRLWGDAGVQDCFSRSNEYQLNDSAKYFLDDL
ERLGEAIYQPTEQDILRTRVKTTGIVEVHFTFKNLNFKLFDVGGQRSERKKWIHCFEDVTAIIFCVAMSEYDQVLHEDET
TNRMHESLKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKKSPLTICFPEYSGRQDYHEASAYIQAQFEAKNKSANKEIY
CHMTCATDTTNIQFVFDAVTDVIIANNLRGCGLY



In [50]:
# The context manager closes the file once it has been read
with open("data/genewise_nucl.fa") as fasta_file:
    nucl = fasta_file.read()
print(nucl)

>GOA-1
ATGGGTTGTACCATGTCACAGGAAGAGCGTGCCGCTCTTGAAAGATCACGAATGATTGAGAAAAATCTTAAAGAAGACGG
CATGCAAGCGGCAAAAGATATCAAACTGCTGCTACTTGGTGCAGGAGAATCAGGAAAATCGACTATTGTAAAACAGATGA
AAATTATTCACGAATCGGGATTCACAGCAGAAGACTACAAACAGTACAAGCCGGTTGTCTACAGTAACACGGTTCAATCA
TTGGTCGCTATTTTGCGAGCCATGAGCAACTTAGGCGTTTCATTTGGTTCGGCTGACAGAGAGGTAGATGCAAAATTAGT
GATGGATGTGGTGGCACGAATGGAGGACACAGAGCCATTCTCAGAAGAATTGCTCAGTTCAATGAAACGGTTGTGGGGAG
ACGCAGGTGTACAGGATTGTTTCTCAAGGAGTAACGAGTATCAATTGAATGATTCAGCCAAATATTTCCTTGACGACCTG
GAAAGGTTAGGAGAGGCAATATATCAACCAACTGAGCAAGATATTCTCCGAACTCGTGTCAAAACAACTGGTATTGTTGA
AGTTCACTTCACATTCAAAAATCTCAATTTCAAATTGTTCGATGTGGGAGGTCAAAGATCAGAAAGGAAGAAGTGGATTC
ATTGTTTCGAAGATGTTACTGCTATTATTTTCTGTGTTGCCATGTCAGAGTATGATCAAGTGTTGCACGAAGATGAGACA
ACAAACCGAATGCACGAATCGCTGAAGCTGTTCGATTCGATCTGTAATAACAAATGGTTCACAGATACATCGATTATTCT
GTTCCTGAACAAGAAGGATCTGTTTGAAGAGAAAATCAAGAAAAGCCCGTTAACGATCTGCTTCCCAGAATATTCAGGAC
GACAAGACTACCACGAGGCATCTGCGTATATTCAAGCACAATTTGAGGCTAAAAACAAATCAGCGAATAAGGAAATCTAT
TGCCACATGACATGTGCCACA

In [51]:
server = 'https://www.ebi.ac.uk/Tools/services/rest/genewise/'

In [52]:
request = requests.get(server+'parameters')

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameters = xml.dom.minidom.parseString(request.text)
parameters = parameters.toprettyxml()

print(parameters)

<?xml version="1.0" ?>
<parameters>
	<id>para</id>
	<id>pretty</id>
	<id>genes</id>
	<id>trans</id>
	<id>cdna</id>
	<id>embl</id>
	<id>ace</id>
	<id>gff</id>
	<id>diana</id>
	<id>init</id>
	<id>splice</id>
	<id>random</id>
	<id>alg</id>
	<id>asequence</id>
	<id>bsequence</id>
</parameters>



In [53]:
parameter = 'genes'

In [54]:
request = requests.get(server+'parameterdetails/'+parameter)

if not request.ok:
  request.raise_for_status()
  sys.exit() 

parameter_details = xml.dom.minidom.parseString(request.text)
parameter_details = parameter_details.toprettyxml()

print(parameter_details)

<?xml version="1.0" ?>
<parameter>
	<name>Gene Structure</name>
	<description>Show gene structure, as in genewise</description>
	<type>BOOLEAN</type>
	<values>
		<value>
			<label>ON</label>
			<value>true</value>
			<defaultValue>true</defaultValue>
		</value>
		<value>
			<label>OFF</label>
			<value>false</value>
			<defaultValue>false</defaultValue>
		</value>
	</values>
</parameter>



In [55]:
request = requests.post(server+'run', 
                        headers={ "Content-Type" : "application/x-www-form-urlencoded", "Accept" : "text/plain"}, 
                        data={ "email" : "hebbarprajna2000@gmail.com", "asequence" : prot, "bsequence" : nucl })

if not request.ok:
  request.raise_for_status()
  sys.exit() 

print(request.text)
jobid = request.text

genewise-R20210630-222856-0161-28749315-p2m


In [56]:
request = requests.get(server+'status/'+jobid)
print(request.text)

RUNNING


In [57]:
request = requests.get(server+'resulttypes/'+jobid)

resulttypes = xml.dom.minidom.parseString(request.text)
resulttypes = resulttypes.toprettyxml()

print(resulttypes)

<?xml version="1.0" ?>
<types>
	<type>
		<description>The output from the tool itself</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>out</identifier>
		<label>Tool Output</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>Error messages produced by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>error</identifier>
		<label>Tool Error Details</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The input protein sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>asequence</identifier>
		<label>First Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The input DNA sequence as seen by the tool</description>
		<fileSuffix>txt</fileSuffix>
		<identifier>bsequence</identifier>
		<label>Second Input Sequence</label>
		<mediaType>text/plain</mediaType>
	</type>
	<type>
		<description>The pairwise alignment as returned by the tool</descript

In [58]:
outtype = 'out' #Can be any of the identifiers mentioned in the previous output

In [59]:
request = requests.get(server+'result/'+jobid+'/'+outtype)

print(request.text)

genewise $Name: wise2-4-1 $ (unreleased release)
This program is freely distributed under a GPL. See source directory
Copyright (c) GRL limited: portions of the code are from separate copyright

Query protein:       GOA-1
Comp Matrix:         BLOSUM62.bla
Gap open:            12
Gap extension:       2
Start/End            default
Target Sequence      GOA-1
Strand:              forward
Start/End (protein)  default
Gene Parameter file: gene.stat
Splice site model:   GT/AG only
GT/AG bits penalty   -9.96
Codon Table:         codon.table
Subs error:          1e-06
Indel error:         1e-06
Null model           syn
Algorithm            623

genewise output
Score 868.23 bits over entire alignment
Scores as bits over a synchronous coding model

See WWW help for more info

GOA-1              1 MGCTMSQEERAALERSRMIEKNLKEDGMQAAKDIKLLLLGAGESGKSTI 
                     MGCTMSQEERAALERSRMIEKNLKEDGMQAAKDIKLLLLGAGESGKSTI 
                     MGCTMSQEERAALERSRMIEKNLKEDGMQAAKDIKLLLLGAGESGKSTI 
GOA-1  

This is the end of the fourth tutorial for WormBase data analysis! In this tutorial, we used several algorithms for pairwise sequence alignment.

In the next tutorial, we will use different algorithms for Multiple Sequence Alignment analyses!