# Ensembl REST API – urllib2
HTTP is based on requests and responses: a client sends a request and the server answers with a response.
Using urllib2, we can make an HTTP request from Python and read the server's response.
Try the example below and see what it does:

In [None]:
import urllib2
request = urllib2.Request('http://www.python.org')
response = urllib2.urlopen(request)
html = response.read()
print html
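
A request object can also be inspected before it is sent, which makes the "request" half of the exchange visible. The sketch below is an addition to the example above, not part of the original exercise. The try/except import is an assumption made so that the same lines also run on Python 3, where urllib2 was split into urllib.request and urllib.error. Nothing is sent over the network.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 keeps the Request/urlopen names

# Build the request object without sending it
request = urllib2.Request('http://www.python.org')

# A request without a body uses the GET method
method = request.get_method()
url = request.get_full_url()
print("%s %s" % (method, url))
```

Only the call to urlopen actually contacts the server and returns the response.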

# Ensembl REST API – urllib2
Use urllib2 to:
	- Find the Ensembl ID of P53
		Print the content of the response
	
	- Parse the response
		Store the Ensembl gene ID in a variable

	- Request all transcripts of P53
		Use the Ensembl gene ID variable
		Print the content of the response

In [None]:
import urllib2
request = urllib2.Request('http://rest.ensembl.org/xrefs/symbol/homo_sapiens/P53?content-type=application/json')
response = urllib2.urlopen(request)
html = response.read()
print html

In [None]:
import re
stable_id = re.search(r'(ENSG[0-9]*)',html).group()
print re.findall('ENSG[0-9]*', html)
print stable_id
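
Two details of the regular expression are worth a closer look: `ENSG[0-9]*` also matches the bare letters ENSG without any digits, and re.search returns None when nothing matches, so calling .group() on the result would then fail. The following is a minimal sketch of a stricter variant. The sample string is hand-written and the identifiers in it are placeholders, not real server output.

```python
import re

# Hand-written sample shaped like a JSON response (placeholder identifiers)
sample = '[{"type":"gene","id":"ENSG00000000001"},{"type":"gene","id":"ENSG00000000002"}]'

# '+' requires at least one digit after the ENSG prefix
stable_ids = re.findall(r'ENSG[0-9]+', sample)
print(stable_ids)

# re.search returns None when nothing matches, so test before calling .group()
match = re.search(r'ENSG[0-9]+', 'no identifier here')
if match is None:
    print("No Ensembl gene ID found")
```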

In [None]:
request = urllib2.Request('http://rest.ensembl.org/overlap/id/'+ stable_id +'?feature=transcript;content-type=application/json')
response = urllib2.urlopen(request)
transcripts = response.read()
print transcripts

# Ensembl REST API – JSON – parsing
Try to understand the script below:

In [None]:
import json
json_string = '[{"key":"value","key_2":3.0},{"key_3":[2, 4]}]'
json_parsed = json.loads(json_string)
print "JSON_STRING =", json_string
print "JSON_PARSED =", json_parsed
print json_parsed[0]
print json_parsed[1]['key_3']
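
To see why the indexing above works, it can help to check which Python types json.loads produces. This is a small sketch using the same JSON string; the variable names other than json_string and json_parsed are illustrative.

```python
import json

json_string = '[{"key":"value","key_2":3.0},{"key_3":[2, 4]}]'
json_parsed = json.loads(json_string)

# JSON arrays become lists, JSON objects become dictionaries
outer_is_list = isinstance(json_parsed, list)
first_is_dict = isinstance(json_parsed[0], dict)
print("%s %s" % (outer_is_list, first_is_dict))

# Numbers keep their kind: 3.0 is a float, 2 and 4 are integers
number = json_parsed[0]['key_2']
total = sum(json_parsed[1]['key_3'])

# json.dumps goes the other way, from Python objects back to a JSON string
round_trip = json.loads(json.dumps(json_parsed))
print(round_trip == json_parsed)
```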

# Ensembl REST API – JSON exercise
Let's download a JSON file and parse it.

Download the file with wget or via the browser, and save it as homo_sapiens.json in the data directory:
	http://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json

Read the file with Python and parse it.

Print the following information:
- Assembly name and date
- Name and length of the chromosomes

In [None]:
# Read the file with Python and parse the file
import json
# json.load reads the whole file, even if the JSON spans several lines
with open("../data/homo_sapiens.json", "r") as handle:
    json_parsed = json.load(handle)

In [None]:
# Print the following information:
# Assembly name and date
print json_parsed["assembly_name"]
print json_parsed["assembly_date"]

In [None]:
# Name and length of the chromosomes
for region in json_parsed['top_level_region']:
    if region['coord_system'] == 'chromosome':
        print "%s\t%s" % (region['name'], region['length'])
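
The filtering step can be tried without the downloaded file. The sketch below uses a toy document with the same keys as the exercise (assembly_name, top_level_region, coord_system, name, length); all values in it are invented. It also sorts the chromosomes by length, which the exercise does not ask for.

```python
import json

# Toy document using the same keys as the exercise; the values are invented
sample = '''{"assembly_name": "TOY1", "assembly_date": "2000-01",
 "top_level_region": [
  {"coord_system": "chromosome", "name": "1", "length": 300},
  {"coord_system": "scaffold", "name": "S1", "length": 50},
  {"coord_system": "chromosome", "name": "2", "length": 400}]}'''
assembly = json.loads(sample)

# Keep only chromosomes, then order them from longest to shortest
chromosomes = [region for region in assembly['top_level_region']
               if region['coord_system'] == 'chromosome']
chromosomes.sort(key=lambda region: region['length'], reverse=True)

# One line per chromosome: name and length separated by a tab
lines = ["%s\t%d" % (region['name'], region['length']) for region in chromosomes]
print("\n".join(lines))
```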

# Ensembl REST API – urllib2 and JSON
Exercise: run the following script and try to understand it:

In [None]:
import urllib2
import json

server = "http://rest.ensembl.org"
endpoint = "/xrefs/symbol/homo_sapiens/BRCA2"
headers = {}
headers['Content-Type'] = 'application/json'

request = urllib2.Request(server + endpoint, headers=headers)
response = urllib2.urlopen(request)
content = response.read()
data = json.loads(content)

print data[0]['id']
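
The expression data[0]['id'] raises an IndexError when the parsed list is empty. Assuming the server answers an unknown symbol with an empty list, a small guard avoids the crash. The function name first_id and both sample strings are hypothetical stand-ins, not real server output.

```python
import json

def first_id(data):
    # Return the 'id' of the first entry, or None when the list is empty
    if not data:
        return None
    return data[0]['id']

# Hand-written stand-ins for parsed responses (placeholder identifier)
known = json.loads('[{"type": "gene", "id": "ENSG00000000001"}]')
unknown = json.loads('[]')

print(first_id(known))
print(first_id(unknown))
```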

# Ensembl REST API – Last exercise
Use urllib2 and json to make a small tool:
- Ask the user for a gene symbol (e.g. BRCA2)
- Find the Ensembl ID of that gene
- Print the gene symbol and the Ensembl ID

Request all transcripts of the gene using the Ensembl ID.
Print for each transcript (separated by a tab):
- ID
- Location = chr:start-end
- Biotype

Extra - Make functions for:
- REST requests -> EnsemblRestRequest(endpoint)
- Get Ensembl stable gene ID -> GetEnsemblGeneId(gene_symbol)
- Get transcripts -> GetTranscripts(gene_ensembl_id)


In [None]:
import urllib2
import json

def EnsemblRestRequest(endpoint):
    # Set headers
    headers = {}
    headers['Content-Type'] = 'application/json'
    # Create request
    request = urllib2.Request("http://rest.ensembl.org/" + endpoint, headers=headers)
    response = urllib2.urlopen(request)
    content = response.read()
    # Parse json content
    data = json.loads(content)
    return data

def GetEnsemblGeneId(gene_symbol):   
    return EnsemblRestRequest("xrefs/symbol/homo_sapiens/" + gene_symbol)[0]['id']
    
def GetTranscripts(gene_ensembl_id):
    return EnsemblRestRequest("overlap/id/" + gene_ensembl_id + "?feature=transcript")

In [None]:
gene_symbol = raw_input("Enter a gene symbol: ")

ensembl_id = GetEnsemblGeneId(gene_symbol)
print "The Ensembl stable ID of", gene_symbol, "=", ensembl_id

transcripts = GetTranscripts(ensembl_id)
for transcript in transcripts:
    # ID, location (chr:start-end) and biotype, separated by tabs
    location = "%s:%s-%s" % (transcript['seq_region_name'], transcript['start'], transcript['end'])
    print "\t".join([transcript['id'], location, transcript['biotype']])
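
Putting the line formatting in a function of its own makes it possible to check the tab-separated output without contacting the server. This is a minimal sketch: format_transcript is a hypothetical helper, and the example record is invented, using only the keys that appear in the exercise.

```python
def format_transcript(transcript):
    # ID, chr:start-end and biotype, separated by tabs
    location = "%s:%d-%d" % (transcript['seq_region_name'],
                             transcript['start'], transcript['end'])
    return "\t".join([transcript['id'], location, transcript['biotype']])

# Invented record with the keys used in the exercise (not real Ensembl data)
example = {'id': 'ENST00000000001', 'seq_region_name': '13',
           'start': 100, 'end': 200, 'biotype': 'protein_coding'}
print(format_transcript(example))
```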
