## Notebook purpose

This notebook shows how to obtain Steve and Mee Park's data from EDD (the Experiment Data Depot) through the REST API.

## Setup

Package imports for use with the rest of the notebook:

In [1]:
import io
import pandas

## Access to EDD database (boilerplate authentication)

This is the standard authentication against the REST API; it does not change from notebook to notebook:

In [2]:
import getpass
import requests

auth_url = 'https://edd.jbei.org/accounts/login/'
user = getpass.getuser()
client = requests.session()
# need to do an initial request to establish session CSRF token
csrf_response = client.get(auth_url)
csrf_response.raise_for_status()
csrf_token = csrf_response.cookies['csrftoken']
login_headers = {
    'Host': 'edd.jbei.org',
    'Referer': auth_url,
}
login_payload = {
    'csrfmiddlewaretoken': csrf_token,
    'login': user,
    'password': getpass.getpass(prompt=f'Password for {user}: '),
}
login_response = client.post(auth_url, data=login_payload, headers=login_headers)
# don't leave passwords lying around
del login_payload
login_response.raise_for_status()

Password for hgmartin:  ·················


## Obtain study id numbers for Steve's and Mee Park's studies

First, get the slugs for the studies of interest.
A slug is the last part of the study URL in EDD. For example, if the study URL is
"https://edd.jbei.org/s/substrate-consumption-by-p-putida-kt2440-05cd/",
the slug identifying that study is "substrate-consumption-by-p-putida-kt2440-05cd".
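
If you have the study URL rather than the slug, the rule above (take the last part of the URL path) can be sketched as a small helper. The function name `slug_from_study_url` is hypothetical and not part of EDD or its REST API; it only assumes the slug is the last non-empty path segment, as in the example URL.

```python
from urllib.parse import urlparse


def slug_from_study_url(url):
    """Return the last non-empty path segment of a study URL (hypothetical helper)."""
    # strip() tolerates stray whitespace copied along with the URL
    path = urlparse(url.strip()).path
    segments = [segment for segment in path.split('/') if segment]
    if not segments:
        raise ValueError(f'no path segments found in {url!r}')
    return segments[-1]


example_slug = slug_from_study_url(
    'https://edd.jbei.org/s/substrate-consumption-by-p-putida-kt2440-05cd/'
)
```

Here `example_slug` holds the same string that is assigned by hand to `slugs['Consumption']` in the next cell.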

In [3]:
slugs = {}
slugs['Proteomics']  = "global-proteomic-analysis-of-p-putida-kt2440"
slugs['Growth']      = "growth-kinetics-of-p-putida-kt2440"
slugs['Consumption'] = "substrate-consumption-by-p-putida-kt2440-05cd"

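Next, use the slugs to look up the study ids in EDD, then use each id to request that study's export as CSV and read it into a pandas data frame. The export may span several pages; each following page is found through the `Link` response header: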

In [4]:
import re
next_pattern = re.compile(r'<([^>]+)>; rel="next"')
dataframes = {}
for name in slugs.keys():
    lookup_response  = client.get(f'https://edd.jbei.org/rest/studies/?slug={slugs[name]}')  # REST API response for lookup
    lookup_response.raise_for_status()                                                       # Stop early on an HTTP error
    study_id         = lookup_response.json()["results"][0]["pk"]                            # Study identifier in EDD database
    export_response  = client.get(f'https://edd.jbei.org/rest/export/?study_id={study_id}')  # REST API response for export
    export_response.raise_for_status()                                                       # Stop early on an HTTP error
    export_csv       = export_response.content.decode('utf-8')                               # Decode response body as csv text
    dataframe        = pandas.read_csv(io.StringIO(export_csv))                              # Convert csv page into pandas dataframe
    # find any following pages
    while True:
        match = next_pattern.search(export_response.headers.get('Link', ''))
        if not match:
            break
        export_response = client.get(match.group(1))                            # get next page
        export_response.raise_for_status()                                      # stop early on an HTTP error
        export_csv = export_response.content.decode('utf-8')                    # load response as csv
        next_frame = pandas.read_csv(io.StringIO(export_csv))                   # load csv to pandas dataframe
        # DataFrame.append was removed in pandas 2.0; pandas.concat is the replacement
        dataframe = pandas.concat([dataframe, next_frame], ignore_index=True)   # merge with previous pages
    # Store compiled final dataframe
    dataframes[name] = dataframe
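
The page-following logic in the cell above can be looked at in isolation. The sketch below reuses the same regular expression to pull the `rel="next"` URL out of a `Link` header and walks the pages until no next link is left. The names `next_page_url`, `iter_page_urls`, and `get_link_header` are hypothetical, and the headers in `fake_headers` are made up for illustration; they are not real EDD responses.

```python
import re

NEXT_PATTERN = re.compile(r'<([^>]+)>; rel="next"')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if absent."""
    match = NEXT_PATTERN.search(link_header or '')
    return match.group(1) if match else None


def iter_page_urls(first_url, get_link_header):
    """Yield first_url, then each following page URL in order.

    get_link_header is an assumed callable mapping a URL to the Link
    header of its response (an empty string when there is none).
    """
    url = first_url
    while url:
        yield url
        url = next_page_url(get_link_header(url))


# Made-up Link headers standing in for real responses
fake_headers = {
    'https://example.org/export/?page=1': '<https://example.org/export/?page=2>; rel="next"',
    'https://example.org/export/?page=2': '',
}
page_urls = list(iter_page_urls(
    'https://example.org/export/?page=1',
    lambda url: fake_headers.get(url, ''),
))
```

In the notebook cell, the role of `get_link_header` is played by `client.get(url).headers.get('Link', '')`.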
    

## Pandas data frames for each of the studies in EDD

In [5]:
dataframes['Proteomics']

Unnamed: 0,Study ID,Study Name,Line ID,Line Name,Line Description,Protocol,Assay ID,Assay Name,Measurement Type,Compartment,Units,Value,Hours
0,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,GCSP1_PSEPK,0,,0.0,8.0
1,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88NZ5_PSEPK,0,,0.0,8.0
2,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88NQ3_PSEPK,0,,0.0,8.0
3,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88FM9_PSEPK,0,,0.0,8.0
4,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88FA5_PSEPK,0,,0.0,8.0
5,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88FH4_PSEPK,0,,1.2296,8.0
6,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88DF4_PSEPK,0,,0.0,8.0
7,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88NV0_PSEPK,0,,0.0,8.0
8,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,RL34_PSEPK,0,,0.0,8.0
9,54423,Global proteomic analysis of P. putida KT2440,54555,wt-Glc-8hr-R3,,,54590,wt-Glc-8hr-R3,Q88DY4_PSEPK,0,,0.0,8.0


In [6]:
dataframes['Growth']

Unnamed: 0,Study ID,Study Name,Line ID,Line Name,Line Description,Protocol,Assay ID,Assay Name,Measurement Type,Compartment,Units,Value,Hours
0,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.016,0.0
1,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.019,2.0
2,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.033,4.0
3,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.097,6.0
4,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.107,8.0
5,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.174,10.0
6,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.214,12.0
7,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.568,24.0
8,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.568,28.0
9,55158,Growth kinetics of P. putida KT2440,55237,wt-Glc-R1,,,55243,wt-Glc-R1,Optical Density,2,,0.568,48.0


In [7]:
dataframes['Consumption']

Unnamed: 0,Study ID,Study Name,Line ID,Line Name,Line Description,Protocol,Assay ID,Assay Name,Measurement Type,Compartment,Units,Value,Hours
0,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,4.5134,0.0
1,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,4.24245,2.0
2,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,3.92089,4.0
3,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,3.29229,6.0
4,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,2.772,8.0
5,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,1.67633,10.0
6,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,1.22069,12.0
7,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,0.0,24.0
8,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,0.0,28.0
9,55249,Substrate consumption by P. putida KT2440,55250,wt-Glc-R1,,,55258,wt-Glc-R1,D-Glucose,2,g/L,0.0,48.0
