# Download EDD Study From Jupyter Notebook
This notebook illustrates how to use Python to export an EDD study into a pandas dataframe for downstream analytics and processing in any bioinformatics workflow. It also includes a function that retrieves the study's metadata.

First, the edd-utils package is installed, and the functions needed to log in and export are imported from the edd_utils module.

In [1]:
# Install the edd-utils pip package into the current Jupyter kernel
import sys
!{sys.executable} -m pip install edd-utils

Collecting edd-utils
  Downloading edd_utils-0.0.6-py3-none-any.whl (5.0 kB)
Installing collected packages: edd-utils
Successfully installed edd-utils-0.0.6


In [54]:
from edd_utils import login, export_study

Each EDD study has a unique identifier called a *slug*. The slug is the string at the end of the study URL, between the last two slashes (``/``). We pass this string to the exporter to tell it which study to download.
Below is an example.

In [55]:
# Study to Download
study_slug = 'testreinhard'
slug=study_slug
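
If only the study URL is at hand, the slug can be pulled out of it programmatically. The following is a minimal sketch, not part of edd_utils: the helper name `slug_from_url` and the example URL (including its `/s/` path segment) are assumptions made for illustration, and the function simply returns the last non-empty path segment, which matches the "between the last two slashes" rule for URLs that end in a slash.

```python
from urllib.parse import urlparse


def slug_from_url(study_url):
    """Return the last non-empty path segment of a study URL (hypothetical helper)."""
    path = urlparse(study_url).path
    # Drop a trailing slash, then keep whatever follows the final remaining slash
    return path.rstrip('/').split('/')[-1]


# Assumed example URL; replace it with the address of your own study
example_slug = slug_from_url('https://edd.jbei.org/s/testreinhard/')
print(example_slug)
```

If the URL points at a sub-page of the study rather than the study itself, this sketch would return the sub-page name, so check the result before using it.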

If the desired EDD server is not `edd.jbei.org`, it should be specified (e.g. `public-edd.jbei.org`, `public-edd.agilebiofoundry.org`).

In [56]:
# EDD server
edd_server = 'edd.jbei.org'

Now we use the login function in edd_utils to **Log in to EDD** on the server chosen above (by default, edd.jbei.org).

In [59]:
session = login(edd_server=edd_server)

Password for RGentz: ········


Finally we **Download the Study** using the export_study function.  It returns a pandas dataframe that can be manipulated for downstream data analysis.

In [121]:
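try:
    df = export_study(session, study_slug, edd_server=edd_server)
except Exception as error:
    # Report the underlying error instead of silently hiding it
    print(f"Export failed ({error}). The slug and/or EDD password may be wrong. Please correct them before proceeding.")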

In [127]:
df.head()  # Preview the first rows of the exported study

Unnamed: 0,Study ID,Study Name,Line ID,Line Name,Line Description,Protocol,Assay ID,Assay Name,Formal Type,Measurement Type,Compartment,Units,Value,Hours
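
As an illustration of the downstream manipulation mentioned above, the long-format table can be reshaped so that each line becomes a column indexed by time. This is only a sketch: the dataframe `df_example` below is a synthetic stand-in that reuses four of the column names shown in the output (`Line Name`, `Measurement Type`, `Hours`, `Value`), and every value in it is made up.

```python
import pandas as pd

# Synthetic stand-in for the exported dataframe; all values are invented for illustration
df_example = pd.DataFrame({
    'Line Name': ['A', 'A', 'B', 'B'],
    'Measurement Type': ['Glucose', 'Glucose', 'Glucose', 'Glucose'],
    'Hours': [0.0, 24.0, 0.0, 24.0],
    'Value': [10.0, 4.0, 10.0, 6.5],
})

# Keep one measurement type, then pivot to one column per line and one row per time point
glucose = df_example[df_example['Measurement Type'] == 'Glucose']
wide = glucose.pivot_table(index='Hours', columns='Line Name', values='Value')
print(wide)
```

`pivot_table` averages duplicate entries by default, which is convenient when a line has replicate measurements at the same time point; use a different `aggfunc` if that is not what you want.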


The code below retrieves the metadata recorded for the study and returns it as a list of (name, value) tuples.

In [128]:
# Get metadata
import sys

import requests


def export_metadata(session, slug, edd_server='edd.jbei.org', verbose=True):
    '''Export metadata from EDD as a list of (name, value) tuples'''

    lookup_response = session.get(f'https://{edd_server}/rest/studies/?slug={slug}')

    # Report HTTP errors from the study lookup before parsing the response
    if lookup_response.status_code == requests.codes.forbidden:
        print('Access to EDD not granted.\n')
        sys.exit()
    elif lookup_response.status_code == requests.codes.not_found:
        print('EDD study was not found.\n')
        sys.exit()
    elif lookup_response.status_code == requests.codes.server_error:
        print('Server error.\n')
        sys.exit()
    elif not lookup_response.ok:
        print('An error with EDD export has occurred.\n')
        sys.exit()

    json_response = lookup_response.json()
    # Catch the error if the study slug is not found in edd_server
    try:
        study_id = json_response["results"][0]["pk"]
    except IndexError:
        print(f'Slug \'{slug}\' not found in {edd_server}.\n')
        sys.exit()
    # TODO: catch the error if the study is found but cannot be accessed by this user

    # Get the metadata values (taken from the last line in the returned results)
    export_response = session.get(f'https://{edd_server}/rest/lines/?study={study_id}')
    metadata = export_response.json()['results'][-1]["metadata"]

    # Get the metadata names one page at a time and merge them with the values
    output = []
    next_url = f'https://{edd_server}/rest/metadata_types/?study_id={study_id}'
    while next_url is not None:
        page = session.get(next_url).json()
        for i in page['results']:
            try:
                output.append((i["type_name"], metadata[str(i['pk'])]))
                if verbose:
                    print(i["type_name"], metadata[str(i['pk'])])
            except KeyError:
                # No value was entered for this field. EDD returns every metadata type
                # that exists anywhere in EDD, not only those present in this study.
                pass
        next_url = page["next"]  # None once the last page has been read
    return output


export_metadata(session, slug, edd_server=edd_server, verbose=False)

[('Flask Volume', '4'),
 ('Growth temperature', 'NA'),
 ('Date Grown', '6/25/19'),
 ('Date of harvest', '9/25/19'),
 ('Growth Site Type', 'Field'),
 ('Growth Site Location', 'Davis'),
 ('Growth Site Plot ID', '1'),
 ('Tissue type', 'Stem'),
 ('IL Name', 'Cholinium Phosphate'),
 ('IL Anion', 'Phosphate'),
 ('IL Cation', 'Cholinum'),
 ('IL Zwitterion', 'NU'),
 ('DES Name', 'NU'),
 ('DES Hbond acceptor', 'NU'),
 ('DES Hbond donor', 'NU'),
 ('Reactor Agitation', '1000'),
 ('Pretreatment Scale', '2g'),
 ('Pretreatment Temperature', '121'),
 ('Pretreatment Time', '3h'),
 ('Pretreatment IL %', '5%'),
 ('Pretreatment DES %', 'NU'),
 ('Biomass Loading', '15%'),
 ('Particle Size', '1mm'),
 ('Anti-solvent type', 'NU'),
 ('Total volume', 'NA'),
 ('Solids recovery', 'NU'),
 ('Number of washes', '0'),
 ('IL Recovery method', 'NU'),
 ('Saccharification scale', '100%'),
 ('Saccharification IL conc.', '5%'),
 ('Saccharification buffer type', 'pH adjustment'),
 ('Saccharification Buffer conc.', '1molar'
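
Because the result is a list of (name, value) tuples, it converts directly into a dictionary for lookup by name. The sketch below uses a handful of pairs copied from the output above. Treating `'NA'` and `'NU'` as placeholders for fields without a usable value is an assumption, and the variable names are hypothetical.

```python
# A few (name, value) pairs copied from the metadata output above
metadata_pairs = [
    ('Flask Volume', '4'),
    ('Growth temperature', 'NA'),
    ('Growth Site Location', 'Davis'),
    ('Tissue type', 'Stem'),
    ('IL Zwitterion', 'NU'),
]

# Look up values by metadata name
metadata_by_name = dict(metadata_pairs)
print(metadata_by_name['Tissue type'])

# Assumption: 'NA' and 'NU' mark fields that carry no usable value
placeholders = {'NA', 'NU'}
filled = {name: value for name, value in metadata_by_name.items()
          if value not in placeholders}
print(sorted(filled))
```

All values arrive as strings, so numeric fields such as `'Flask Volume'` need an explicit conversion before they are used in calculations.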