# Protein location prediction

...a story of trial and error

Task: use [this service](http://www.cbs.dtu.dk/ws/ws.php?entry=SignalP4) to predict signal peptides.

# Read the docs

This is directly from the documentation:

In [4]:
from suds.client import Client
from suds.bindings import binding
wsdl = 'http://www.cbs.dtu.dk/ws/SignalP/SignalP_3_1_ws0.wsdl'
client = Client(wsdl,cache=None)

seq1 = client.factory.create('runService.parameters.sequencedata.sequence')
seq1.id="IPI:IPI00000005.1"
seq1.seq="""MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
QEEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDL
PTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQG
CMGLPCVVM"""

request=client.factory.create('runService.parameters')
request.organism="euk"
request.method="best"
request.sequencedata.sequence=[seq1]
response = client.service.runService(request)
print response

ERROR:suds.client:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:ns0="http://www.cbs.dtu.dk/ws/WSSignalP_3_1_ws0" xmlns:ns1="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP-ENV:Header/>
   <ns1:Body>
      <ns0:runService>
         <parameters>
            <organism>euk</organism>
            <method>best</method>
            <sequencedata>
               <sequence>
                  <id>IPI:IPI00000005.1</id>
                  <seq>MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
QEEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDL
PTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQG
CMGLPCVVM</seq>
               </sequence>
            </sequencedata>
         </parameters>
      </ns0:runService>
   </ns1:Body>
</SOAP-ENV:Envelope>


Exception: (404, u'Not Found')

The service answers with a `404`, so the SOAP route is a dead end.

# Try 2: online demo

There is also [an online demo](http://www.cbs.dtu.dk/services/SignalP/). If we can get results in the browser, we should be able to get them in Python too: we just need to simulate submitting that form. Let's [inspect the source](view-source:http://www.cbs.dtu.dk/services/SignalP/). The `<form>` and `<input>` HTML tags tell us which fields the form has and where its data must be submitted. We see at least:

`orgtype - euk, format - short, SEQPASTE - sequences, configfile - ...`
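
Reading the page source by eye works, but the same field hunt can be automated. Below is a minimal sketch (written for Python 3, standard library only) that lists each form's target and field names. The `FormFieldParser` class and the `SAMPLE_HTML` page are made up for illustration; the sample is not the real SignalP source, and only the field names `orgtype`, `SEQPASTE` and `configfile` come from this notebook.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect each form's action URL, method and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form> tag encountered

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": attrs.get("method", "get").lower(),
                "fields": {},
            })
        elif tag in ("input", "textarea", "select") and self.forms:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) are not sent
                self.forms[-1]["fields"][name] = attrs.get("value")


# Hypothetical sample page, NOT copied from the SignalP site
SAMPLE_HTML = """
<form action="/cgi-bin/example.cgi" method="POST">
  <input type="hidden" name="configfile" value="/path/to/example.cf">
  <input type="radio" name="orgtype" value="euk" checked>
  <textarea name="SEQPASTE"></textarea>
  <input type="submit" value="Submit">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)
for form in parser.forms:
    print(form["method"], form["action"])
    for name, value in form["fields"].items():
        print("  ", name, "=", value)
```

To run it on the real page, feed it the text of a `requests.get(...)` response instead of `SAMPLE_HTML`.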

Forms are submitted with `requests.post`, passing the form fields as a dictionary in the `data` argument. Let's try.

**[EDIT: There was a bug in the code below. After much trying and pondering, it turns out the SignalP service does nothing useful unless the string you give it uses Windows newline characters, i.e. `\r\n` instead of the usual `\n`. So I added the `.replace()` below and that fixed it.]**
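
A plain `.replace("\n","\r\n")` doubles the `\r` if the text already has Windows line endings. A slightly more defensive variant, as a sketch: `to_crlf` is a hypothetical helper name and the FASTA record is a made-up example.

```python
def to_crlf(text):
    """Return text with Windows (CRLF) line endings."""
    # splitlines() recognizes \n, \r\n and \r alike, so input that is
    # already converted is not doubled; a trailing newline is dropped.
    return "\r\n".join(text.splitlines())


fasta = ">example_id\nMTEYKLVVVG\nAGGVGKSALT"
print(repr(to_crlf(fasta)))
```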

In [3]:
import requests

seq=""">IPI:IPI00000005.1
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAG
QEEYSAMRDQYMRTGEGFLCVFAINNSKSFADINLYREQIKRVKDSDDVPMVLVGNKCDL
PTRTVDTKQAHELAKSYGIPFIETSAKTRQGVEDAFYTLVREIRQYRMKKLNSSDDGTQG
CMGLPCVVM""".replace("\n","\r\n")

### NOTE: bug fixed in this code. It used to say SEQSUB but should say SEQPASTE. Sorry. And thanks to Ning Wang once again.
q={"orgtype":"euk", "format":"short", "SEQPASTE":seq, "configfile":"/usr/opt/www/pub/CBS/services/SignalP-4.1/SignalP.cf"}
r=requests.post("http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi",data=q)
print r.text

<html>
<head>
<title>Job status of 58A448C200003E204ADAE030</span></title></head>
<script type="text/javascript" src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<script type="text/javascript" src="/js/webface.js"></script>
<script type="text/javascript">
$(document).ready(function(){
	launchcheck('queued','58A448C200003E204ADAE030','http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi',20);
});
</script>
</head>
<body>
<!-- jobid: 58A448C200003E204ADAE030 status: queued -->
<H1>Your job 58A448C200003E204ADAE030 is <span name="status">queued</span></H1>
<br>

<form>
	Send me email when job finishes
	<input type="hidden" name="jobid" value="58A448C200003E204ADAE030">
	<input type="text" name="email" value="">
	<input type="hidden" name="wait" value="20">
	<input type="submit" name="submit" value="Send email">
</form>
<div id="progress"></div>
<noscript>This page should reload automatically. Otherwise <a href="http://www.cbs.dtu.dk//cgi-bin/webface2.fcgi?jobid=58A448C200003E204ADAE030"

Success! Now we need to get the jobid from the response.
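
Note that the response reports the job as `queued`, so the result may not be ready on the first fetch. The sketch below reads both the job id and the status from the HTML comment, then polls until that comment disappears. The helper names `parse_job_comment` and `wait_for_result` are hypothetical, and the stopping rule is an assumption: the results page is taken to be any page without the job-status comment, which this notebook does not confirm.

```python
import re
import time

# Matches the comment seen in the response, e.g.
# <!-- jobid: 58A448C200003E204ADAE030 status: queued -->
JOB_COMMENT = re.compile(r"<!--\s*jobid:\s*(\S+)\s+status:\s*(\S+)\s*-->")


def parse_job_comment(html):
    """Return (jobid, status), or None if the page has no such comment."""
    match = JOB_COMMENT.search(html)
    return match.groups() if match else None


def wait_for_result(fetch, jobid, delay=5, max_tries=60):
    """Call fetch(jobid) until the page no longer carries a job-status comment.

    ASSUMPTION: a finished job's page lacks that comment.
    """
    for _ in range(max_tries):
        html = fetch(jobid)
        if parse_job_comment(html) is None:
            return html
        time.sleep(delay)
    raise RuntimeError("job %s did not finish in time" % jobid)


# Offline demonstration with two canned pages instead of real HTTP calls
pages = iter(["<!-- jobid: ABC123 status: queued -->", "<pre>results</pre>"])
print(wait_for_result(lambda jobid: next(pages), "ABC123", delay=0))
```

With `requests`, `fetch` could be a small function that issues the same `requests.get(..., params={"jobid": jobid})` call used in the next cell and returns `.text`.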

In [4]:
# Look for lines like this
#
# <!-- jobid: 56B2792F00004B9894F74F8C status: queued -->

import re
for line in r.text.split("\n"):
    match=re.search(r"jobid: (\S+)",line)
    if match:
        jobid=match.group(1)
        break
print "jobid:", jobid

# Ask the same CGI script for the page that belongs to this job
r=requests.get("http://www.cbs.dtu.dk//cgi-bin/webface2.fcgi",params={"jobid":jobid})
print r.text
# SUCCESS! We have our response!

jobid: 58A448C200003E204ADAE030
<html>
<title> SignalP 4.1 Server - prediction results</title>
<body text="#000000" bgcolor="#f8f8f8"
      link="#FF3399" vlink="#FF3399" alink="#808080">
<font face="ARIAL,HELVETICA">
<br>
<table>
<tr><td><img src="/images/m_logo.gif">
    <td>&nbsp;&nbsp;&nbsp;
    <td><h2>SignalP 4.1 Server - prediction results</h2>
        <h3>Technical University of Denmark</h3>
</table>
<br>
</font>
<hr>
<pre>
<PRE>
# SignalP-4.1 euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
IPI_IPI00000005.1          0.120  19  0.120  19  0.137   4  0.118   0.119 N  0.450      SignalP-noTM
</PRE>
Please cite:
SignalP 4.0: discriminating signal peptides from transmembrane regions
Petersen TN., Brunak S., von Heijne G. & Nielsen H.
Nature Methods, 8:785-786, 2011
<font face="ARIAL,HELVETICA">
<a href="/services/SignalP-4.1/output.html"
target=_blank><b>Explain</b></a> the output.  Go <a
href="javascript:history
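
The prediction itself sits in the `<PRE>` block of the results page shown above. A minimal sketch for turning that block into Python dictionaries follows; `parse_signalp_short` is a hypothetical helper, the sample text is copied from the output above, and the column names are my own renaming of the header line (in particular, reading the `?` column as the yes/no signal peptide call is an assumption). Values are kept as strings.

```python
import re

# Column names for the "short" format, adapted from the header line in the
# output above. The "pos" and "?" columns are renamed here for readability.
COLUMNS = ["name", "Cmax", "Cmax_pos", "Ymax", "Ymax_pos", "Smax", "Smax_pos",
           "Smean", "D", "signal_peptide", "Dmaxcut", "networks_used"]


def parse_signalp_short(html):
    """Return one dict per prediction row found inside the <PRE> block."""
    block = re.search(r"<PRE>(.*?)</PRE>", html, re.S)
    if block is None:
        return []
    rows = []
    for line in block.group(1).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the two '#' header lines
        values = line.split()
        if len(values) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, values)))
    return rows


# Sample copied from the results page above
SAMPLE = """<pre>
<PRE>
# SignalP-4.1 euk predictions
# name                     Cmax  pos  Ymax  pos  Smax  pos  Smean   D     ?  Dmaxcut    Networks-used
IPI_IPI00000005.1          0.120  19  0.120  19  0.137   4  0.118   0.119 N  0.450      SignalP-noTM
</PRE>
"""

for row in parse_signalp_short(SAMPLE):
    print(row["name"], row["D"], row["signal_peptide"])
```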