# Finding a Protein Motif
### Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
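
The shorthand maps almost directly onto regular-expression syntax, so one way to search for a motif is to translate it and let `re` do the matching. The sketch below is a minimal illustration, not the method used later in this notebook; the helper names `motif_to_regex` and `find_motif` are hypothetical. It assumes the motif contains only plain letters, `[...]` and `{...}`.

```python
import re

def motif_to_regex(motif):
    """Translate the [XY] / {X} shorthand into a Python regular expression."""
    # {X} ("anything except X") becomes the negated class [^X];
    # [XY] is already valid regex syntax and is left untouched.
    body = motif.replace("{", "[^").replace("}", "]")
    # A zero-width lookahead lets overlapping occurrences all be reported.
    return "(?=" + body + ")"

def find_motif(sequence, motif):
    """Return the 1-based start positions of every occurrence of the motif."""
    pattern = re.compile(motif_to_regex(motif))
    return [m.start() + 1 for m in pattern.finditer(sequence)]

print(motif_to_regex("N{P}[ST]{P}"))
print(find_motif("NNTSA", "N{P}[ST]{P}"))
```

The lookahead matters because two occurrences of the motif can overlap (as in `NNTSA`, where both `NNTS` and `NTSA` qualify); a plain pattern would consume the first match and skip the second.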

You can view the complete description and features of a particular protein in the UniProt database by inserting its access ID ("uniprot_id") into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
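
The two URL patterns can be built with a small helper. This is only a sketch: the function name `uniprot_urls` is hypothetical, and stripping everything after the first underscore rests on the assumption that IDs such as `P01045_KNH2_BOVIN` carry the primary accession before the underscore. The code cells below pass the ID through unchanged.

```python
def uniprot_urls(uniprot_id):
    """Return (entry_url, fasta_url) for a UniProt access ID."""
    # Assumption: the part before the first underscore is the accession.
    accession = uniprot_id.split("_", 1)[0]
    base = "http://www.uniprot.org/uniprot/" + accession
    return base, base + ".fasta"

entry_url, fasta_url = uniprot_urls("B5ZC00")
print(entry_url)
print(fasta_url)
```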

### Given: 
At most 15 UniProt Protein Database access IDs.

### Return: 
For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

In [51]:
import requests

In [77]:
def read_FASTA(data):
    """Return the sequence from FASTA text, dropping the '>' header line(s)."""
    lines = (l.strip() for l in data.splitlines())
    # Concatenate every non-header line into one sequence string.
    return "".join(line for line in lines if not line.startswith('>'))

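def get_FASTA(uniprot_id):
    """Download the FASTA record for a UniProt ID and return its sequence."""
    uri = 'http://www.uniprot.org/uniprot/{}.fasta'.format(uniprot_id)
    response = requests.get(uri)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return read_FASTA(response.text)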

In [91]:
ids = ['P01045_KNH2_BOVIN', 'P11171_41_HUMAN', 'O13188', 'P02725_GLP_PIG',
       'A3DF24', 'P22891_PRTZ_HUMAN', 'A2A2Y4', 'P02765_A2HS_HUMAN',
       'P07358_CO8B_HUMAN', 'P02750_A2GL_HUMAN', 'P07359_GPBA_HUMAN', 'A5F5B4']
strings = [get_FASTA(i) for i in ids]

In [92]:
for uniprot_id, seq in zip(ids, strings):
    # 1-based start of every N{P}[ST]{P} window; checking each index keeps overlapping hits.
    hits = [x + 1 for x in range(len(seq) - 3)
            if seq[x] == 'N' and seq[x+1] != 'P' and seq[x+2] in 'ST' and seq[x+3] != 'P']
    if hits:  # report only proteins that actually possess the motif
        print("")
        print(uniprot_id)
        print(*hits, end=" ")


P01045_KNH2_BOVIN
47 87 168 169 197 204 280 
P11171_41_HUMAN
258 281 358 
O13188
207 
P02725_GLP_PIG
16 19 39 
A3DF24
178 
P22891_PRTZ_HUMAN
99 225 233 306 332 
A2A2Y4
90 359 407 
P02765_A2HS_HUMAN
156 176 
P07358_CO8B_HUMAN
44 101 243 553 
P02750_A2GL_HUMAN
79 186 269 306 325 
P07359_GPBA_HUMAN
37 175 362 398 
A5F5B4
68 