# REQUESTS-0 - Exercises

**Requests library notebook**

Notebook with exercises for practicing the use of the requests library.

#1. Start using the 'requests' library

Start by choosing one of the following APIs:

**1. Chuck Norris API**
This is a free JSON API for hand-curated Chuck Norris facts. It also has Slack and Facebook Messenger integration! For example, you can retrieve a random Chuck Norris joke in JSON format:

https://api.chucknorris.io/jokes/random

**2. Numbers API**
An API for interesting facts about numbers. It provides trivia, math, date, and year facts. For example, you can look up date facts for February 29th. The interactive home page is:

http://numbersapi.com/#42

**3. Bored API**
This is something you could add to your personal website, for instance. The Bored API ensures that a user is never bored. When requested, it responds with a random activity for the user to do. You can even customize the type and the number of participants!

https://www.boredapi.com/

**4. Agify API**
How do you tell the age of someone from their name? Well, here is a fun little API that you can use. Agify predicts the age of a person given their name. It is free to use for up to 1000 requests/day. You can try out the following in your browser:

https://api.agify.io?name=michael

Write a small script to use them!
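As a starting point, here is a minimal sketch of such a script for the Chuck Norris API. It assumes the JSON response stores the joke text under a `value` key. The function name `random_joke` and its `get` argument (which lets you swap in a stand-in for `requests.get` when testing offline) are illustrative choices, not part of the API.

```python
import requests

JOKES_URL = "https://api.chucknorris.io/jokes/random"


def random_joke(get=requests.get):
    """Fetch one random joke and return its text."""
    response = get(JOKES_URL, timeout=10)
    # Raise an exception for 4xx/5xx answers instead of parsing an error page
    response.raise_for_status()
    # Assumption: the joke text lives under the "value" key of the JSON body
    return response.json()["value"]
```

Calling `print(random_joke())` in a new cell performs the request and prints the joke.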





#2. Basic library usage

1. Modify the following program to get the ‘id’ parameter from the user and show the result.

In [None]:
import requests

dna_id = input("Enter the identifier: ")
# Build the URL with the user-supplied ID (a single `id=` key)
url = f"https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id={dna_id}&style=raw"
retrieve = requests.get(url)
print(retrieve.text)


Enter the identifier: J00231
ERROR 11 Unable to connect to database [ena_sequence].



2. Modify the following program to show only the lines containing the organism and the molecule type for the sequence.


In [None]:
import requests

# Define the URL
ebi_url = 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231&style=raw'

# Search Dbfetch tool
response = requests.get(ebi_url)

# Split the response text into a list of lines so we can loop over it
salida = response.text.splitlines()
for line in salida:
    if line.startswith("ID "):  # ID line: includes the molecule type (mRNA here)
        print(f"Molecule type: {line}")
    if line.startswith("OS "):  # Organism species line
        print(line)

Molecule type: ID   J00231; SV 1; linear; mRNA; STD; HUM; 1089 BP.
OS   Homo sapiens (human)


#3. Passing parameters
----

3. One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using *get()*, you pass data to params as a dictionary.

The purpose of the params dictionary in the requests.get() function is to pass query parameters to the API. These parameters allow you to customize or filter the request to the API and determine what type of data is returned.

In [None]:
import requests

# Define the URL
ebi_url = 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch'

# Search Dbfetch tool
response = requests.get(
    ebi_url,
    # requests accepts a dictionary called `params` holding the query arguments
    params={'db': 'ena_sequence', 'id': 'J00231', 'style': 'raw'}
)

# Print the body of the response
print(response.text)

ID   J00231; SV 1; linear; mRNA; STD; HUM; 1089 BP.
XX
AC   J00231;
XX
DT   13-JUN-1985 (Rel. 06, Created)
DT   17-APR-2005 (Rel. 83, Last updated, Version 9)
XX
DE   Human Ig gamma3 heavy chain disease OMM protein mRNA.
XX
KW   C-region; gamma heavy chain disease protein;
KW   gamma3 heavy chain disease protein; heavy chain disease; hinge exon;
KW   immunoglobulin gamma-chain; immunoglobulin heavy chain;
KW   secreted immunoglobulin; V-region.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-1089
RX   DOI; 10.1073/pnas.79.10.3260.
RX   PUBMED; 6808505.
RA   Alexander A., Steinmetz M., Barritault D., Frangione B., Franklin E.C.,
RA   Hood L., Buxbaum J.N.;
RT   "gamma Heavy chain disease in man: cDNA sequence supports partial gene
RT   deletion model";
RL   Proc. Natl. Acad. Sci. U.S.A. 79(10):3260-3264(1982).
XX
DR   

The dictionary params={'db': 'ena_sequence', 'id': 'J00231', 'style': 'raw'} breaks down as follows:

**db**: Specifies the database you're querying. In this case, ena_sequence indicates you're querying the ENA (European Nucleotide Archive) sequence database.

**id**: Specifies the unique identifier of the sequence you want. In this case, 'J00231' is the ID of the sequence you're requesting.

**style**: Specifies the format in which you want the data. The raw style means that you want the raw sequence data, without additional formatting.
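To see how these three keys end up in the URL, the following sketch builds the request without sending it and prints the final address. It relies on `requests.Request(...).prepare()`, which assembles the URL locally, so no network access is needed.

```python
import requests

ebi_url = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"
params = {"db": "ena_sequence", "id": "J00231", "style": "raw"}

# Build the request without sending it, then inspect the final URL
prepared = requests.Request("GET", ebi_url, params=params).prepare()
print(prepared.url)
# https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231&style=raw
```

The base URL stays free of any query string: `requests` appends `?key=value&...` from the dictionary.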

4. Rewrite programs 1 and 2 to use the params argument of the *get()* function. First, copy your code for programs 1 and 2 into the following cells (don't modify the originals).

PROGRAM 1 with *.get(url, params=...)*

In [None]:
import requests

dna_id = input("Enter the identifier: ")
# Base URL only: the query string is built from `params`
url = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

retrieve = requests.get(url, params={"db": "ena_sequence", "id": dna_id, "style": "raw"})
print(retrieve.text)

Enter the identifier: J00231
ID   J00232; SV 1; linear; genomic DNA; STD; HUM; 88 BP.
XX
AC   J00232;
XX
DT   13-JUN-1985 (Rel. 06, Created)
DT   30-JUN-2006 (Rel. 88, Last updated, Version 8)
XX
DE   Human Ig germline heavy chain D-region gene, D4.
XX
KW   C-region; D-region; germline; immunoglobulin heavy chain; V-region.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-88
RX   DOI; 10.1038/294631a0.
RX   PUBMED; 7312051.
RA   Siebenlist U., Ravetch J.V., Korsmeyer S., Waldmann T., Leder P.;
RT   "Human immunoglobulin D segments encoded in tandem multigenic families";
RL   Nature 294(5842):631-635(1981).
XX
DR   MD5; a572029a56f20b326ed5fe890f9c7680.
DR   IMGT/LIGM; J00232.
XX
CC   Members of the D-region family are embedded in a 9 kb repeat unit.
CC   The probe used to isolate the four D-region genes was an ab

PROGRAM 2 with *.get(url, **params**=...)*

In [None]:
import requests

# Define the base URL (the query string is built from `params`)
ebi_url = 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch'

# Search Dbfetch tool
response = requests.get(ebi_url, params={"db": "ena_sequence", "id": "J00231", "style": "raw"})

# Split the response text into a list of lines so we can loop over it
salida = response.text.splitlines()
for line in salida:
    if line.startswith("ID "):  # ID line: includes the molecule type (mRNA here)
        print(f"Molecule type: {line}")
    if line.startswith("OS "):  # Organism species line
        print(line)

Molecule type: ID   J00231; SV 1; linear; mRNA; STD; HUM; 1089 BP.
OS   Homo sapiens (human)


5. Complete the following program to retrieve different sequences. Read each ID from 'prot_codes.txt' (one per line).

In [None]:
import requests

# Define the URL
urlbase = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

# Set parameters
params = {'db':'uniprotkb','id':'S4TR86','format':'fasta','style':'raw'}

# Make the request!
print(requests.get(urlbase, params=params).text)

>tr|S4TR86|S4TR86_9HEMI Cytochrome c oxidase subunit 1 (Fragment) OS=Graptocleptes sp. 00004431 OX=1276425 GN=COI PE=3 SV=1
LGTPGTFIGNDQIYNVFVTAHAFIMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNM
SFWLLPPSLTLLLISSIAEGGAGTGWTVYPPLSSNIAHSGAAVDLAIFSLHLAGVSSILG
AVNFISTIINMRPXGMSPERIPMFVWSVGITALLLLLSLPVLAGAITMLLTDRNFNTSFF
DPSGGGDPILYQHLFWFFGHPEVXILILPGFGLISHIIAMETGK



In [None]:
import requests

# Define the URL
urlbase = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

# Define the id extraction:
#Read each ID from 'prot_codes.txt' (one per line)
file = "../prot_codes.txt"
f=open(file)
for line in f: #each line is a dna id
    content=f.readline()
    #print(line)
    params = {'db':'uniprotkb','id':{line},'format':'fasta','style':'raw'}
    print(requests.get(urlbase, params=params).text)
f.close()
# Make the request!

>UNIPROT:1433X_MAIZE P29306 14-3-3-like protein (Fragment)
ILNSPDRACNLAKQAFDEAISELDSLGEESYKDSTLIMQLLXDNLTLWTSDTNEDGGDEI
K

>tr|S4TR87|S4TR87_9CAUD dUTP diphosphatase OS=Salmonella phage FSL SP-058 OX=1173761 GN=SP058_00140 PE=3 SV=1
MQVKLRVLPFNNPNMSVPARATEGSAGVDLRANTSEPFELKPGETKLIETGLAIHLDDVH
VAAMILPRSGLGHKHGVVLGNLTGLIDSDYQGELMVSLWNRSTEPFTVNPGDRIAQMVIV
PVMQPEFVVVDSFESTERGAGGFNSTGVK



Using `for line in f.readline():` in a single line returns many results! **Why?**
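One way to investigate is to look at what the loop variable actually holds. The sketch below uses `io.StringIO` as a stand-in for the file, with made-up placeholder IDs (`ID_A`, `ID_B`), so it runs without `prot_codes.txt`. `readline()` returns a single string, and iterating over a string yields one character at a time, so the loop body runs once per character of the first line rather than once per ID.

```python
import io

# Stand-in for the real file; the IDs are placeholders
fake_file = io.StringIO("ID_A\nID_B\n")

first_line = fake_file.readline()            # a single string: "ID_A\n"
loop_values = [item for item in first_line]  # what `for line in f.readline()` sees
print(loop_values)                           # ['I', 'D', '_', 'A', '\n']
```

Each of those single characters would be sent as a separate `id` value, which is not what the exercise intends.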

In [None]:
import requests

# Define the URL
urlbase = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

# Define the id extraction:
#Read each ID from 'prot_codes.txt' (one per line)
file = "../prot_codes.txt"
f=open(file)
for line in f.readline(): #in one line: Many results!
    #print(line)
    params = {'db':'uniprotkb','id':{line},'format':'fasta','style':'raw'}
    print(requests.get(urlbase, params=params).text)
f.close()

>sp|P17861|XBP1_HUMAN X-box-binding protein 1 OS=Homo sapiens OX=9606 GN=XBP1 PE=1 SV=2
MVVVAAAPNPADGTPKVLLLSGQPASAAGAPAGQALPLMVPAQRGASPEAASGGLPQARK
RQRLTHLSPEEKALRRKLKNRVAAQTARDRKKARMSELEQQVVDLEEENQKLLLENQLLR
EKTHGLVVENQELRQRLGMDALVAEEEAEAKGNEVRPVAGSAESAALRLRAPLQQVQAQL
SPLQNISPWILAVLTLQIQSLISCWAFWTTWTQSCSSNALPQSLPAWRSSQRSTQKDPVP
YQPPFLCQWGRHQPSWKPLMN
>sp|P50479|PDLI4_HUMAN PDZ and LIM domain protein 4 OS=Homo sapiens OX=9606 GN=PDLIM4 PE=1 SV=2
MPHSVTLRGPSPWGFRLVGGRDFSAPLTISRVHAGSKAALAALCPGDLIQAINGESTELM
THLEAQNRIKGCHDHLTLSVSRPEGRSWPSAPDDSKAQAHRIHIDPEIQDGSPTTSRRPS
GTGTGPEDGRPSLGSPYGQPPRFPVPHNGSSEATLPAQMSTLHVSPPPSADPARGLPRSR
DCRVDLGSEVYRMLREPAEPVAAEPKQSGSFRYLQGMLEAGEGGDWPGPGGPRNLKPTAS
KLGAPLSGLQGLPECTRCGHGIVGTIVKARDKLYHPECFMCSDCGLNLKQRGYFFLDERL
YCESHAKARVKPPEGYDVVAVYPNAKVELV
>sp|P78369|CLD10_HUMAN Claudin-10 OS=Homo sapiens OX=9606 GN=CLDN10 PE=1 SV=2
MASTASEIIAFMVSISGWVLVSSTLPTDYWKVSTIDGTVITTATYWANLWKACVTDSTGV
SNCKDFPSMLALDGYIQACRGLMIAAVSLGFFGSIFALFGMKCTKVGGSDKAKAKIACLA
GIVFILSGLCSMTG

###**WHY** do only 2 of the IDs listed in the **FILE** appear?
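A likely explanation, sketched with an in-memory stand-in for the file: in the loop that calls `f.readline()` inside `for line in f`, both operations advance the same file position, so every second line is consumed by `readline()` and never reaches `params`. The placeholder IDs and the helper names below are hypothetical.

```python
import io


def ids_with_extra_readline(handle):
    """Mimic the loop that calls readline() inside `for line in handle`."""
    seen = []
    for line in handle:
        handle.readline()          # silently consumes the next line
        seen.append(line.strip())
    return seen


def ids_one_per_line(handle):
    """Read every line once, dropping trailing newlines and blank lines."""
    return [line.strip() for line in handle if line.strip()]


sample = "ID_1\nID_2\nID_3\nID_4\n"
print(ids_with_extra_readline(io.StringIO(sample)))  # ['ID_1', 'ID_3']
print(ids_one_per_line(io.StringIO(sample)))         # ['ID_1', 'ID_2', 'ID_3', 'ID_4']
```

Passing `line.strip()` as the `id` value, instead of the set literal `{line}`, also avoids sending the trailing newline along with the ID.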

In [None]:
import os
from pathlib import Path

cwd = os.getcwd()
print(cwd)

cwp = Path.cwd()
print(cwp)


/content
/content


# Solutions (do NOT open! Yet)

**Exercise 1**. Get the ID from the user

It can be done with f-strings (Python 3.6+ only), with `str.format()`, or with old-school %-formatting. More info: https://realpython.com/python-f-strings/
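The formatting styles are interchangeable for this task. The short sketch below builds the same URL three ways and checks that the results match. The variable names are illustrative, and `J00231` is the example ID used throughout this notebook.

```python
seq_id = "J00231"
base = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence"

url_fstring = f"{base}&id={seq_id}&style=raw"            # f-string (Python 3.6+)
url_format = "{}&id={}&style=raw".format(base, seq_id)   # str.format()
url_percent = "%s&id=%s&style=raw" % (base, seq_id)      # %-formatting

print(url_fstring == url_format == url_percent)  # True
```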

In [None]:
import requests

# Example ID: J00231
seq_id = input("Type the desired ID, please: ")

# Define the URL
# Option 1: f-string (Python 3.6+)
ebi_url = f'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id={seq_id}&style=raw'
# Option 2: str.format(), also valid in Python 2.x (same result as option 1)
ebi_url = 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id={}&style=raw'.format(seq_id)

# Search Dbfetch tool
response = requests.get(ebi_url)

# Print the response body
print(response.text)

**Exercise 2**. Filter output

In [None]:
# 2. Modify the following program to show only the lines containing the organism and the molecule type for the sequence

import requests

# Define the URL
ebi_url = 'https://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=ena_sequence&id=J00231&style=raw'

# Search Dbfetch tool
response = requests.get(ebi_url).text

# Split the response text into a list of lines
split_response = response.split('\n')

# Method 1: line-by-line
for line in split_response:
    if 'organism' in line or 'mol_type' in line:
        print(line)

# Method 2: list comprehension
filtered_response = [line for line in split_response if ('organism' in line or 'mol_type' in line)]
print(filtered_response)

**Exercise 5**. Making multiple requests

In [None]:
import requests

params = {'db':'uniprotkb','id':'S4TR86','format':'fasta','style':'raw'}
urlbase = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"

# Retrieve many
with open('prot_codes.txt') as f:
    for line in f:
        params['id'] = line.strip()
        print(requests.get(urlbase,params=params).text)

FileNotFoundError: [Errno 2] No such file or directory: 'prot_codes.txt'
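The `FileNotFoundError` above means that `prot_codes.txt` is not in the current working directory. A minimal sketch of a more defensive reader follows. The helper name `read_ids` is hypothetical, and the path you pass depends on where the file sits relative to the notebook.

```python
from pathlib import Path


def read_ids(path):
    """Return the non-empty lines of `path`, or [] if the file is missing."""
    path = Path(path)
    if not path.is_file():
        # Show the absolute path that was tried and where Python is running
        print(f"Cannot find {path.resolve()} (working directory: {Path.cwd()})")
        return []
    with path.open() as handle:
        return [line.strip() for line in handle if line.strip()]
```

The loop then becomes `for prot_id in read_ids('prot_codes.txt'): ...`, and a missing file produces a readable message instead of a traceback.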