# Data Classes

**Overview**

There are five top-level groups in the `.json` file that contain information about a given *Node*. They are read into, and prepared by, the `Model` class. The top-level groups are:

+ nodeInformation
+ studyFactors
+ studySamples
+ assays
+ comments
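
As a rough illustration of the list above, the sketch below checks a node document for the five top-level groups. The document contents, the value types (object versus list), and the helper name `missing_groups` are assumptions made for this example; they are not part of the isadream package.

```python
import json

# Hypothetical, heavily trimmed node document. Real files hold far more
# detail, and the value types used here are assumptions.
EXAMPLE_NODE = """
{
  "nodeInformation": {"name": "example node"},
  "studyFactors": [],
  "studySamples": [],
  "assays": [],
  "comments": []
}
"""

# The five top-level groups listed above.
EXPECTED_GROUPS = [
    "nodeInformation",
    "studyFactors",
    "studySamples",
    "assays",
    "comments",
]


def missing_groups(raw_json):
    """Return the expected top-level groups that are absent from the document."""
    node = json.loads(raw_json)
    return [group for group in EXPECTED_GROUPS if group not in node]


print(missing_groups(EXAMPLE_NODE))  # → []
```

A check like this can run before a file is handed to `Model`, so that a missing group is reported by name rather than surfacing later as a `KeyError`.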

*Setting up auto-reloading of the isadream package.*

---

In [1]:
# Path hack to allow imports from the parent directory.
import sys
import os
sys.path.insert(0, os.path.abspath('../../'))

In [2]:
%load_ext autoreload
%autoreload 2

In [306]:
import isadream.isadream.model as IdreamModel
from isadream.isadream.model import SIPOS_DEMO
from isadream.isadream.model import normalize_to_dataframe
from isadream.isadream.model import load_csv
import itertools
import json
import pandas as pd

---

## The Assay Dataframe

The `Assay` is the lowest level of separation in metadata.

In [633]:
MODEL = IdreamModel.Model(SIPOS_DEMO)
# MODEL._Model__study_factor_df

In [634]:
MODEL.assay_metadata

Unnamed: 0_level_0,Unnamed: 1_level_0,comments,comments,samples,samples,samples,samples,samples,$id,description,experimentSubType,...,sources,sources,sources,species,species,studySampleFactors,studySampleFactors,studySampleFactors,body,name
Unnamed: 0_level_1,Unnamed: 1_level_1,body,name,AssaySampleFactors,AssaySampleFactors,AssaySampleFactors,name,species,NaN,NaN,NaN,...,name,species,species,speciesReference,stoichiometry,decimalValue,factorType,unitRef,NaN,NaN
Unnamed: 0_level_2,Unnamed: 1_level_2,NaN,NaN,csvColumnIndex,factorType,unitRef,NaN,stoichiometry,NaN,NaN,NaN,...,NaN,speciesReference,stoichiometry,NaN,NaN,NaN,NaN,NaN,NaN,NaN
dataFile,samples.species.speciesReference,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3
sipos_2006_talanta_fig_3_KOH.csv,K+,I manually pulled this points out with a web t...,Data extraction method.,0.0,Measurement Condition,Molar,Potassium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.
sipos_2006_talanta_fig_3_KOH.csv,OH-,I manually pulled this points out with a web t...,Data extraction method.,0.0,Measurement Condition,Molar,Potassium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.
sipos_2006_talanta_fig_3_LiOH.csv,Li+,,,0.0,Measurement Condition,Molar,Lithium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.
sipos_2006_talanta_fig_3_LiOH.csv,OH-,,,0.0,Measurement Condition,Molar,Lithium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.
sipos_2006_talanta_fig_3_NaOH.csv,Na+,,,0.0,Measurement Condition,Molar,Sodium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.
sipos_2006_talanta_fig_3_NaOH.csv,OH-,,,0.0,Measurement Condition,Molar,Sodium Hydroxide,1.0,https://lampdev02.pnl.gov/bigg006/idreamdrupal/,Extracted figures.,Al_NMR,...,Aluminum Wire,Al(III),1.0,Al(III),1.0,0.005,Measurement Condition,Molar,I manually pulled this points out with a web t...,Comment on Sipos 2006.


In [635]:
# MODEL.assay_metadata.select_dtypes(object)

In [636]:
# MODEL.assay_metadata.select_dtypes(float)

In [637]:
# MODEL.assay_metadata.select_dtypes(int)

In [638]:
# build_key_df(MODEL.assay_metadata)

In [639]:
# MODEL.csv_data

In [791]:
def retr_csv_columns(working_df):
    """Group the columns of `working_df` by their innermost column level.

    Returns a list of (label, sub-frame) pairs, one per distinct label in
    the last column level, such as 'csvColumnIndex' or 'RefValue'.
    """
    # Earlier attempt, kept for reference: select only the columns whose
    # innermost label is 'csvColumnIndex'.
    # csv_subset = working_df.xs('csvColumnIndex', axis=1, level=-1, drop_level=True)
    # csv_cols = csv_subset.columns.values
    # NOTE: groupby(..., axis=1) is deprecated since pandas 2.1.
    csv_groups = working_df.groupby(level=-1, axis=1)
    return list(csv_groups)
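
Grouping columns by their innermost level can also be done without `groupby(axis=1)`, which newer pandas releases deprecate. The following is a minimal sketch on a toy frame; the column labels are modelled on the assay metadata, but the frame and the helper name `group_by_innermost` are hypothetical.

```python
import pandas as pd

# Toy frame with three-level columns, loosely modelled on the assay metadata.
columns = pd.MultiIndex.from_tuples([
    ("samples", "AssaySampleFactors", "csvColumnIndex"),
    ("samples", "AssaySampleFactors", "unitRef"),
    ("studyFactors", "StudyFactor", "csvColumnIndex"),
])
toy = pd.DataFrame([[0, "Molar", 1]], columns=columns)


def group_by_innermost(frame):
    """Map each innermost column label to the sub-frame of matching columns."""
    labels = frame.columns.get_level_values(-1)
    # A boolean mask over the columns selects every column carrying the label.
    return {label: frame.loc[:, labels == label] for label in labels.unique()}


groups = group_by_innermost(toy)
print(sorted(groups))                   # → ['csvColumnIndex', 'unitRef']
print(groups["csvColumnIndex"].shape)   # → (1, 2)
```

Returning a dictionary instead of a list of pairs lets a caller look up one label, for example `groups["csvColumnIndex"]`, without indexing by position.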

In [796]:
retr_csv_columns(MODEL.assay_metadata)[0]

('RefValue',
                                                                               3  \
                                                                     StudyFactor   
                                                                        RefValue   
 dataFile                          samples.species.speciesReference                
 sipos_2006_talanta_fig_3_KOH.csv  K+                                [KAl(SO4)2]   
                                   OH-                               [KAl(SO4)2]   
 sipos_2006_talanta_fig_3_LiOH.csv Li+                               [KAl(SO4)2]   
                                   OH-                               [KAl(SO4)2]   
 sipos_2006_talanta_fig_3_NaOH.csv Na+                               [KAl(SO4)2]   
                                   OH-                               [KAl(SO4)2]   
 
                                                                                   sources  
                                                    

In [551]:
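def create_key_value(in_dataframe=MODEL.assay_df):
    """Build a frame keyed by (continuous column, row tuple) pairs.

    Note: the default argument is evaluated once, when the function is
    defined, so it refers to the MODEL that existed at that time.
    """
    working_df = in_dataframe.copy()

    # Split the columns into discrete (object dtype) and continuous (all others).
    columns = working_df.columns
    discrete = [x for x in columns if working_df[x].dtype == object]
    continuous = [x for x in columns if x not in discrete]

    value_dict = {}

    # Each key pairs a continuous column with one full row (index included).
    # Each value is the entire continuous column, so it repeats for every row.
    for cont_indexes in continuous:
        for row in working_df.itertuples():
            value_dict[cont_indexes, row] = working_df[cont_indexes]

    return pd.DataFrame(value_dict)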

In [65]:
create_key_value()

Unnamed: 0_level_0,samples.AssaySampleFactors.csvColumnIndex,samples.AssaySampleFactors.csvColumnIndex,samples.AssaySampleFactors.csvColumnIndex,samples.AssaySampleFactors.csvColumnIndex,samples.AssaySampleFactors.csvColumnIndex,samples.AssaySampleFactors.csvColumnIndex,samples.species.stoichiometry,samples.species.stoichiometry,samples.species.stoichiometry,samples.species.stoichiometry,samples.species.stoichiometry,samples.species.stoichiometry
Unnamed: 0_level_1,"(0, I manually pulled this points out with a web tool., Data extraction method., sipos_2006_talanta_fig_3_KOH.csv, 0, Measurement Condition, Molar, Potassium Hydroxide, K+, 1.0)","(1, I manually pulled this points out with a web tool., Data extraction method., sipos_2006_talanta_fig_3_KOH.csv, 0, Measurement Condition, Molar, Potassium Hydroxide, OH-, 1.0)","(2, nan, nan, sipos_2006_talanta_fig_3_LiOH.csv, 0, Measurement Condition, Molar, Lithium Hydroxide, Li+, 1.0)","(3, nan, nan, sipos_2006_talanta_fig_3_LiOH.csv, 0, Measurement Condition, Molar, Lithium Hydroxide, OH-, 1.0)","(4, nan, nan, sipos_2006_talanta_fig_3_NaOH.csv, 0, Measurement Condition, Molar, Sodium Hydroxide, Na+, 1.0)","(5, nan, nan, sipos_2006_talanta_fig_3_NaOH.csv, 0, Measurement Condition, Molar, Sodium Hydroxide, OH-, 1.0)","(0, I manually pulled this points out with a web tool., Data extraction method., sipos_2006_talanta_fig_3_KOH.csv, 0, Measurement Condition, Molar, Potassium Hydroxide, K+, 1.0)","(1, I manually pulled this points out with a web tool., Data extraction method., sipos_2006_talanta_fig_3_KOH.csv, 0, Measurement Condition, Molar, Potassium Hydroxide, OH-, 1.0)","(2, nan, nan, sipos_2006_talanta_fig_3_LiOH.csv, 0, Measurement Condition, Molar, Lithium Hydroxide, Li+, 1.0)","(3, nan, nan, sipos_2006_talanta_fig_3_LiOH.csv, 0, Measurement Condition, Molar, Lithium Hydroxide, OH-, 1.0)","(4, nan, nan, sipos_2006_talanta_fig_3_NaOH.csv, 0, Measurement Condition, Molar, Sodium Hydroxide, Na+, 1.0)","(5, nan, nan, sipos_2006_talanta_fig_3_NaOH.csv, 0, Measurement Condition, Molar, Sodium Hydroxide, OH-, 1.0)"
0,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
1,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
2,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
3,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
4,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
5,0,0,0,0,0,0,1.0,1.0,1.0,1.0,1.0,1.0
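
The output above repeats each continuous column in full under every row key. One possible alternative, sketched here on a toy frame, is a long-format table with a single scalar per (row key, column) pair. This is not the notebook's method; the frame, its column names, and the helper `key_value_records` are assumptions for illustration only.

```python
import pandas as pd

# Toy frame: two discrete (text) columns and one continuous (numeric) column.
toy = pd.DataFrame({
    "dataFile": ["a.csv", "b.csv"],
    "species": ["K+", "Na+"],
    "stoichiometry": [1.0, 2.0],
})


def key_value_records(frame):
    """Return one record per (row, continuous column) pair."""
    discrete = frame.select_dtypes(exclude="number").columns
    continuous = frame.select_dtypes(include="number").columns
    records = []
    for _, row in frame.iterrows():
        # The discrete values of the row act as its key.
        key = tuple(row[discrete])
        for column in continuous:
            records.append({"key": key, "column": column, "value": row[column]})
    return pd.DataFrame(records)


result = key_value_records(toy)
print(len(result))  # → 2
```

Each record then holds exactly one measurement, which tends to be easier to filter and join than a wide frame whose column labels embed whole rows.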


In [11]:
create_key_value(MODEL.study_sample_df)

Unnamed: 0_level_0,"(sources, materialCharacteristic, decimalValue)","(sources, species, stoichiometry)","(species, stoichiometry, nan)","(studySampleFactors, decimalValue, nan)"
Unnamed: 0_level_1,"(0, Aluminate Solution, Purity by Weight, 0.98, Material Property, Percent, Aluminum Wire, Al(III), 1.0, Al(III), 1.0, 0.005, Measurement Condition, Molar)","(0, Aluminate Solution, Purity by Weight, 0.98, Material Property, Percent, Aluminum Wire, Al(III), 1.0, Al(III), 1.0, 0.005, Measurement Condition, Molar)","(0, Aluminate Solution, Purity by Weight, 0.98, Material Property, Percent, Aluminum Wire, Al(III), 1.0, Al(III), 1.0, 0.005, Measurement Condition, Molar)","(0, Aluminate Solution, Purity by Weight, 0.98, Material Property, Percent, Aluminum Wire, Al(III), 1.0, Al(III), 1.0, 0.005, Measurement Condition, Molar)"
0,0.98,1.0,1.0,0.005


In [18]:
MODEL.study_factor_df

| factorType | RefValue | csvColumnIndex | decimalValue | unitRef |
|---|---|---|---|---|
| Measurement Condition | NaN | NaN | 25.0 | Celsius |
| Measurement | NaN | 1.0 | NaN | ppm |
| Measurement Condition | NaN | NaN | 78.204 | MHz |
| Measurement Reference | [KAl(SO4)2] | NaN | NaN | Reference Compound |


In [16]:
create_key_value(MODEL.study_factor_df)

Unnamed: 0_level_0,csvColumnIndex,csvColumnIndex,csvColumnIndex,csvColumnIndex,decimalValue,decimalValue,decimalValue,decimalValue
Unnamed: 0_level_1,"(Measurement Condition, nan, nan, 25.0, Celsius)","(Measurement, nan, 1.0, nan, ppm)","(Measurement Condition, nan, nan, 78.204, MHz)","(Measurement Reference, [KAl(SO4)2], nan, nan, Reference Compound)","(Measurement Condition, nan, nan, 25.0, Celsius)","(Measurement, nan, 1.0, nan, ppm)","(Measurement Condition, nan, nan, 78.204, MHz)","(Measurement Reference, [KAl(SO4)2], nan, nan, Reference Compound)"
factorType,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
Measurement Condition,,,,,25.0,25.0,25.0,25.0
Measurement,1.0,1.0,1.0,1.0,,,,
Measurement Condition,,,,,78.204,78.204,78.204,78.204
Measurement Reference,,,,,,,,
