# Visualization Design

By: *Tyler Biggs*

---

**Overview**

This notebook walks through the design of the visualizations. It should also serve as a reference for future custom visualizations.

In [1]:
%load_ext autoreload
%autoreload 2
from pprint import pprint

In [2]:
import pandas as pd
import numpy as np

import bokeh as bk
import bokeh.io
import bokeh.models
import bokeh.layouts
import bokeh.plotting
bokeh.io.output_notebook()

# import holoviews as hv
# hv.extension('bokeh')

In [3]:
# Path hack to allow imports from the parent directory.
import sys, os
sys.path.insert(0, os.path.abspath('../../'))

In [4]:
from isadream.isadream.models import utils
from isadream.isadream import io

---

## Dataflow

The data is transferred from the Drupal server as `.json` files. These files are placed into a directory as the user requests them; that is, all the datasets a user selects for a given visualization end up in one directory. Each `.json` file is then condensed into four dataframes.
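
As a rough illustration of the directory step, the sketch below collects every `.json` file in a folder using only the standard library. The helper `load_json_directory` and the placeholder file contents are hypothetical and not part of `isadream`; the notebook itself reads files with `io.read_idream_json`.

```python
import json
import tempfile
from pathlib import Path


def load_json_directory(directory):
    """Return {file name: parsed JSON} for every .json file in `directory`.

    Hypothetical helper for illustration only; not part of isadream.
    """
    return {
        path.name: json.loads(path.read_text())
        for path in sorted(Path(directory).glob("*.json"))
    }


# Demo with a throwaway directory holding two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.json", "b.json"):
        (Path(tmp) / name).write_text(json.dumps({"title": name}))
    datasets = load_json_directory(tmp)

print(sorted(datasets))
```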

In [5]:
# A demo json file is provided.
nmr_json_demo = utils.SIPOS_DEMO
demo_base_path = utils.BASE_PATH
print(nmr_json_demo, '\n', demo_base_path)

/home/tyler/git/isadream/isadream/demo_data/demo_json/sipos_2006_talanta_nmr_figs.json 
 /home/tyler/git/isadream/isadream/demo_data/


In [6]:
demo_json = io.read_idream_json(nmr_json_demo)
node = io.parse_json(demo_json)

---

## Viewing the data in each Assay (datafile) per .json

In [23]:
node_dict = dict()
display(id(node))
for assay in node.assays:
#     display(assay.as_dict)
    display(id(assay))
#     node_dict = dict(**node_dict, **{str(k): v for k, v in assay.as_dict.items()})
    

140151066574128

140151066573008

140151066123848

140151066137264

---

### Getting Subsets

In [28]:
df = pd.DataFrame.from_records(node.assays[0].column_data_source)
df.columns = pd.MultiIndex.from_tuples(df.columns)
df = df.T
# df.iloc(axis=1)[0]
df

Unnamed: 0,Unnamed: 1,0,1,2,3,4
"(Material_Property, Density, g/cm^3)","((Fake, 2.0), (Fake, 1.0))",1.05,1.05,1.05,1.05,1.05
"(Material_Property, Poor, Quality)","((Fake, 2.0), (Fake, 1.0))",Poor,Poor,Poor,Poor,Poor
"(Material_Property, Purity_by_Weight, Percent)","((Al(III), 1.0),)",0.98,0.98,0.98,0.98,0.98
"(Measurement, ppm)","((Al(III), 1.0),)",79.9,79.84,79.72,79.66,79.66
"(Measurement, ppm)","((Fake, 2.0), (Fake, 1.0))",79.9,79.84,79.72,79.66,79.66
"(Measurement, ppm)","((K+, 1.0), (OH-, 1.0))",79.9,79.84,79.72,79.66,79.66
"(Measurement_Condition, Molar)","((Al(III), 1.0),)",0.005,0.005,0.005,0.005,0.005
"(Measurement_Condition, Molar)","((Fake, 2.0), (Fake, 1.0))",0.006,0.006,0.006,0.006,0.006
"(Measurement_Condition, Molar)","((K+, 1.0), (OH-, 1.0))",2.93,4.92,6.85,9.13,10.71


In [9]:
molar_df = df.xs(('Measurement_Condition', 'Molar'))
molar_df

Unnamed: 0,0,1,2,3,4
"((Al(III), 1.0),)",0.005,0.005,0.005,0.005,0.005
"((Fake, 2.0), (Fake, 1.0))",0.006,0.006,0.006,0.006,0.006
"((K+, 1.0), (OH-, 1.0))",2.93,4.92,6.85,9.13,10.71


In [10]:
ppm_df = df.xs(('Measurement', 'ppm'))
ppm_df

Unnamed: 0,0,1,2,3,4
"((Al(III), 1.0),)",79.9,79.84,79.72,79.66,79.66
"((Fake, 2.0), (Fake, 1.0))",79.9,79.84,79.72,79.66,79.66
"((K+, 1.0), (OH-, 1.0))",79.9,79.84,79.72,79.66,79.66


**Goal**

Get friendlier formats for `ColumnDataSource`.

In [11]:
def build_array(factor, assay):
    
    assay_df = pd.DataFrame.from_records(assay.column_data_source)
    assay_df.columns = pd.MultiIndex.from_tuples(assay_df.columns)
    assay_df = assay_df.T
    
    factor_df = assay_df.xs(factor)
    factor_df = factor_df.T.melt(var_name='species', value_name=str(factor))
    factor_df = factor_df.set_index('species')
    
    return factor_df
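
The reshaping steps inside `build_array` (select one factor with `xs`, then melt from wide to long) can be tried on a small stand-in frame. This is a minimal sketch under simplifying assumptions: the toy index uses three plain string levels (`category`, `unit`, `species`) rather than the nested tuples of the real assay data, and the species labels are shortened.

```python
import pandas as pd

# Toy stand-in for one assay: rows are (category, unit, species),
# columns are sample numbers. Labels are simplified assumptions.
index = pd.MultiIndex.from_tuples(
    [
        ("Measurement", "ppm", "Al(III)"),
        ("Measurement_Condition", "Molar", "Al(III)"),
        ("Measurement_Condition", "Molar", "KOH"),
    ],
    names=["category", "unit", "species"],
)
toy_df = pd.DataFrame(
    [[79.90, 79.84, 79.72], [0.005, 0.005, 0.005], [2.93, 4.92, 6.85]],
    index=index,
)

# Select one factor; the two matched levels are dropped,
# leaving one row per species.
molar_wide = toy_df.xs(("Measurement_Condition", "Molar"))

# Wide -> long: one row per (species, sample) pair, indexed by species.
molar_long = (
    molar_wide.T
    .melt(var_name="species", value_name="Molar")
    .set_index("species")
)
print(molar_long)
```

The long format has a single value column per factor, which maps more directly onto the flat column layout that a `ColumnDataSource` expects.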

In [12]:
# for assay in node.assays:
#     display(build_array(('Measurement', 'ppm'), assay))

In [13]:
# for assay in node.assays:
#     display(build_array(('Measurement_Condition', 'Molar'), assay))

### Groupby

TODO...

In [14]:
# Groupby examples
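
Until this section is filled in, one possible direction is sketched here. It assumes a long-format frame with a `species` column and a `Molar` column (both names are illustrative) and summarizes each species with `groupby`; it is not the notebook's intended implementation.

```python
import pandas as pd

# Illustrative long-format data: one row per (species, sample) pair.
long_df = pd.DataFrame({
    "species": ["Al(III)", "Al(III)", "KOH", "KOH"],
    "Molar": [0.005, 0.005, 2.93, 4.92],
})

# One summary row per species.
summary = long_df.groupby("species")["Molar"].agg(["min", "max", "count"])
print(summary)
```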

---

## Bokeh Model

See the Bokeh reference on container properties: https://bokeh.pydata.org/en/latest/docs/reference/core/properties.html#container-properties

In [None]:
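# HoloViews is required for this cell.
import holoviews as hv
hv.extension('bokeh')

layout = []

for assay in node.assays:
    # Molar concentration on x, measured ppm on y.
    xs = build_array(('Measurement_Condition', 'Molar'), assay)
    ys = build_array(('Measurement', 'ppm'), assay)

    # Pass plain 1-D arrays so each assay yields one set of (x, y) points.
    layout.append(hv.Scatter((xs.values.ravel(), ys.values.ravel())))

hv.Layout(layout)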
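
For comparison, the same kind of scatter can be drawn with plain Bokeh, which is already imported at the top of this notebook. The sketch below is a minimal example, not the project's final model: the column names `molar` and `ppm` are assumptions, and the values are copied from the K+/OH- rows of the first demo assay shown earlier.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Values copied from the K+/OH- rows of the first demo assay.
# The column names "molar" and "ppm" are illustrative assumptions.
source = ColumnDataSource(data={
    "molar": [2.93, 4.92, 6.85, 9.13, 10.71],
    "ppm": [79.9, 79.84, 79.72, 79.66, 79.66],
})

plot = figure(
    title="ppm vs. molar concentration",
    x_axis_label="Molar",
    y_axis_label="ppm",
)

# The x and y strings are resolved against the columns of `source`.
renderer = plot.scatter(x="molar", y="ppm", source=source, size=8)

# In a notebook, after bokeh.io.output_notebook(), display with:
# bokeh.io.show(plot)
```

Keeping the data in a `ColumnDataSource` means several glyphs or widgets can share one source.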