# Import, filter, and save EDD data
EDD holds a large amount of data that is not needed here (Biolector and proteomics). We download the studies, keep only the isoprenol measurements, and save them locally to save time later.

In [1]:
import edd_utils as eddu
import pandas as pd



## Download the data

These are the [Experiment Data Depot](https://pubs.acs.org/doi/full/10.1021/acssynbio.7b00204) server, the corresponding username, and the slugs (addresses) of the two studies to be downloaded:

In [2]:
study_slug_dbtl0 = 'corrected-crispri-automation-for-enhanced-isopreno'
study_slug_dbtl1 = 'crispri-automation-for-enhanced-isoprenol-producti'
edd_server   = 'edd.jbei.org'
username     = 'pckinnunen'

We try connecting to the server with our login and password:

In [3]:
try:
    session = eddu.login(edd_server=edd_server, user=username)
except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
    print('ERROR! Connection to EDD failed. We will try to load data from disk...')
else:
    print('OK! Connection to EDD successful. We will try to load data from EDD...')

Password for pckinnunen:  ········


OK! Connection to EDD successful. We will try to load data from EDD...


And then we try to export each study from the EDD instance:

In [4]:
try:
    df_dbtl0 = eddu.export_study(session, study_slug_dbtl0, edd_server=edd_server)
except (NameError, AttributeError, KeyError):
    print('ERROR! Not able to export DBTL0 study.')

100%|██████████| 1166748/1166748 [01:01<00:00, 18998.94it/s]


In [5]:
try:
    df_dbtl1 = eddu.export_study(session, study_slug_dbtl1, edd_server=edd_server)
except (NameError, AttributeError, KeyError):
    print('ERROR! Not able to export DBTL1 study.')

100%|██████████| 55584/55584 [00:03<00:00, 16442.04it/s]


There is a lot of data here; the Biolector data in particular takes up a lot of space.

Let's have a look at the different protocols (types of data) that are included in each study:

In [6]:
df_dbtl0['Protocol'].unique()

array(['Global Proteomics', 'Biolector', 'GC-FID'], dtype=object)

In [7]:
df_dbtl1['Protocol'].unique()

array(['GC-FID', 'Biolector'], dtype=object)
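
To see how much each protocol contributes, the rows per protocol can be counted before filtering. The following is a minimal sketch on a toy DataFrame: only the `Protocol` column name and the protocol labels come from the output above, while the `Value` column and the numbers are made up for illustration. On the real exports, the same call could be applied to `df_dbtl0` or `df_dbtl1`.

```python
import pandas as pd

# Toy stand-in for an exported study (hypothetical values;
# only the 'Protocol' column and its labels mirror the notebook).
toy = pd.DataFrame({
    'Protocol': ['Biolector', 'Biolector', 'Biolector', 'GC-FID', 'Global Proteomics'],
    'Value': [0.1, 0.2, 0.3, 150.0, 42.0],
})

# Number of rows contributed by each protocol, largest first
rows_per_protocol = toy['Protocol'].value_counts()
print(rows_per_protocol)
```

A count like this makes it easy to confirm which protocols dominate the export before deciding what to drop.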

In [8]:
df_dbtl0_isoprenol = df_dbtl0[df_dbtl0['Protocol'] == 'GC-FID']
df_dbtl1_isoprenol = df_dbtl1[df_dbtl1['Protocol'] == 'GC-FID']

In [9]:
df_dbtl0_isoprenol.to_pickle('./isoprenol_data/dbtl0_isoprenol.pkl')
df_dbtl1_isoprenol.to_pickle('./isoprenol_data/dbtl1_isoprenol.pkl')
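
Note that `to_pickle` does not create missing directories, so `./isoprenol_data/` has to exist before the cell above runs. The sketch below shows one way to create the folder and to reload a pickle in a later session instead of querying EDD again. It uses a toy DataFrame and a temporary directory as stand-ins (both are assumptions for illustration, not part of the original notebook).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for df_dbtl0_isoprenol (hypothetical values)
toy_isoprenol = pd.DataFrame({
    'Protocol': ['GC-FID', 'GC-FID'],
    'Value': [150.0, 175.0],
})

# A temporary directory stands in for './isoprenol_data'
out_dir = Path(tempfile.mkdtemp()) / 'isoprenol_data'
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

pkl_path = out_dir / 'toy_isoprenol.pkl'
toy_isoprenol.to_pickle(pkl_path)

# In a later session, load from disk instead of exporting from EDD again
reloaded = pd.read_pickle(pkl_path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.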