# Jupyter Notebook

The code in this notebook is designed to assist with the analysis of data collected to tune free parameters for the UREC carbon capture device. Currently, the code produces dummy data for all of the free parameters, and the output of interest from the device is assumed to be the measured CO<sub>2</sub> concentration (in ppm) of the air coming out of the device. 

**Important Note:** I'm very much not an expert on the DACC process or the device constructed by the UREC team! I just like writing analysis code and making pretty plots :)
If anything about any of the variables here, or their units, ranges, etc. seems confusing or off, please just let me know and I'll try to improve things to make the analysis more representative of the actual setup and data-taking process. 

Having said that, I'm trying to keep the code as general as possible to allow for parameters to be easily added or removed from the analysis on-the-fly.

Currently, the code considers the following free parameters:

* **Basket configuration:** It's currently assumed that several different basket configurations will be tested. The different configurations can just be labelled with unique strings (e.g. `config_1`, `config_2`, etc.)
* **Spray duration (s):** The length of time that water is sprayed continuously onto the sorbent 
* **Spray interval (s):** The time interval between continuous sprays (i.e. between the beginning of one spray and the beginning of the next one). 
* **Fan on time (s):** The length of time that the fan is on in between sprays (**This is my understanding of what this variable means, but please correct me if wrong!!**)
* **Exposed surface area (cm<sup>2</sup>):** The measured surface area of sorbent exposed to the fan.

<hr>

## Bokeh Imports

The Python cell below performs the standard imports for the Bokeh package used for data visualization.

In [1]:
# Standard imports for bokeh visualization package
from bokeh.io import output_notebook, show
output_notebook()

## Making Dummy Data

The following cell makes dummy data for the CO<sub>2</sub> concentration and free parameters listed above. 

**Important note:** When working with actual data from the DACC machine, there will be no need to produce dummy data, so the following cell will not need to be run! Instead, the data will be read in from a CSV file, which I anticipate would be exported from an Excel spreadsheet.

### Data Samples

Five basket configurations are considered, currently labelled `{'config_1', 'config_2', ..., 'config_5'}`. For each of the other free parameters, 10 sample values are tested within a specified range. I then consider all possible combinations of the sample values for all the free parameters to form a complete set of samples.

I also add some random jitter to the sample locations, to reflect the fact that the sample values tested in practice are unlikely to follow a perfect grid. 
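
As a rough sketch of how the sampling described above could be generalized to any number of free parameters, the snippet below builds the full grid of combinations and then jitters it. The function name `make_jittered_grid`, the fixed seed, and the "fan on time" range are assumptions made up for illustration; only the spray duration and spray interval ranges match the dummy-data cell in this notebook.

```python
import numpy as np

def make_jittered_grid(param_ranges, n_samples=10, jitter_fraction=0.1, seed=0):
    """Return {parameter name: 1D array} covering all combinations of sample values.

    param_ranges maps each parameter name to its (low, high) tested range.
    (Hypothetical helper, not part of the notebook's existing code.)
    """
    rng = np.random.default_rng(seed)
    # Evenly spaced sample values for each parameter
    axes = [np.linspace(low, high, n_samples) for low, high in param_ranges.values()]
    # One mesh per parameter; flattening each mesh lists every combination
    meshes = np.meshgrid(*axes, indexing="ij")
    samples = {}
    for (name, (low, high)), mesh in zip(param_ranges.items(), meshes):
        flat = mesh.ravel()
        # Uniform jitter between 0 and jitter_fraction times the tested range
        jitter = rng.random(flat.size) * jitter_fraction * (high - low)
        samples[name] = flat + jitter
    return samples

# Ranges for illustration; the "fan on time" range is an assumption
example_ranges = {
    "spray duration": (1.0, 10.0),
    "spray interval": (10.0, 60.0),
    "fan on time": (5.0, 50.0),
}
grid = make_jittered_grid(example_ranges, n_samples=10)
print({name: values.shape for name, values in grid.items()})
```

With three parameters and 10 sample values each, every array holds 10 x 10 x 10 = 1000 entries, so the number of samples grows quickly as parameters are added.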

### Multi-dimensional Gaussian Function for CO<sub>2</sub> Concentration

Not knowing what the actual data will look like, the CO<sub>2</sub> concentration is constructed to follow a multi-dimensional Gaussian shape as a function of the free parameters, with a peak somewhere within the tested range of each free parameter.

The multi-dimensional Gaussian of $n$ dimensions is defined as:

$$f(x_1, \dots, x_n) = Ae^{-\big[(x_1-\mu_1)^2/(2\sigma_1^2) + \dots + (x_n-\mu_n)^2/(2\sigma_n^2)\big]}$$

where for a given free parameter $x_i$, the $\mu_i$ and $\sigma_i$ represent the central value and width, respectively, of the Gaussian peak in that parameter. For simplicity, I'm setting $A$ to 1 for the dummy data, since the scaling isn't too important.

For a given variable $x_i$, I consider several different central values $\mu_i$ for the Gaussian peak, depending on the basket configuration, but they're all located within the range of tested values. The peak width $\sigma_i$ is set to 1/4 of the range of tested values: $\sigma_i = \frac{1}{4}\big(\max\{x_{i, \text{tested}}\} - \min\{x_{i, \text{tested}}\}\big)$.
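
To keep things general, the Gaussian could also be written as one function that works for any number of free parameters. This is only a sketch: the name `gaussian_nd` and the peak location used in the example are assumptions, and the exponent uses the $2\sigma_i^2$ denominator that the dummy-data cell below uses.

```python
import numpy as np

def gaussian_nd(samples, centres, widths, amplitude=1.0):
    """Evaluate A * exp(-sum_i (x_i - mu_i)^2 / (2 * sigma_i^2)).

    samples, centres and widths are dictionaries keyed by parameter name.
    (Hypothetical helper, not part of the notebook's existing code.)
    """
    exponent = 0.0
    for name, x in samples.items():
        x = np.asarray(x, dtype=float)
        exponent = exponent + (x - centres[name]) ** 2 / (2.0 * widths[name] ** 2)
    return amplitude * np.exp(-exponent)

# Tested values as in the dummy-data cell
tested = {
    "spray duration": np.linspace(1, 10, 10),
    "spray interval": np.linspace(10, 60, 10),
}
# Widths set to 1/4 of each tested range
widths = {name: (x.max() - x.min()) / 4.0 for name, x in tested.items()}
# Peak location chosen arbitrarily for illustration
centres = {"spray duration": 4.0, "spray interval": 30.0}

# At the peak the function equals the amplitude A
peak_value = gaussian_nd({name: np.array([mu]) for name, mu in centres.items()}, centres, widths)

# One width away from the peak in a single parameter it drops to exp(-1/2)
one_sigma_value = gaussian_nd(
    {"spray duration": np.array([4.0 + widths["spray duration"]]),
     "spray interval": np.array([30.0])},
    centres, widths)
print(peak_value, one_sigma_value)
```

Adding a free parameter then only means adding one more key to each dictionary.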

In [78]:
############################################# Making dummy data #################################################

# Import numpy for array construction and manipulation
import numpy as np

# Make arrays of dummy data for spray duration, spray interval, and basket configuration.
#### For the dummy data, five basket configurations are tested, labelled "config_#".
spray_duration = np.linspace(1, 10, 10)
spray_interval = np.linspace(10, 60, 10)
basket_configs = ['config_1', 'config_2', 'config_3', 'config_4', 'config_5']
basket_config_indices = 1 + np.arange(len(basket_configs))

# Make a 2D mesh of the durations and spray intervals that covers all combinations of spray intervals and durations
spray_duration_mesh, spray_interval_mesh = np.meshgrid(spray_duration, spray_interval) 

# Flatten the 2D arrays of spray duration and interval to make 1D arrays with all combinations
spray_duration = spray_duration_mesh.ravel()
spray_interval = spray_interval_mesh.ravel()

# For each configuration tested, add some jitter to randomize the spray durations and intervals a bit
spray_durations = {}
spray_intervals = {}
for config in basket_configs:
    spray_durations[config] = spray_duration + np.random.rand(len(spray_duration)) * 0.1 * ( max(spray_duration) - min(spray_duration) )
    spray_intervals[config] = spray_interval + np.random.rand(len(spray_interval)) * 0.1 * ( max(spray_interval) - min(spray_interval) )

###### Construct dummy data for the CO2 ppm as 2D Gaussian functions of spray interval and spray duration. ######
# For each basket configuration, set a different peak location of the Gaussian function in spray duration and interval 
spray_duration_peaks = min(spray_duration) + basket_config_indices * \
                       ( max(spray_duration) - min(spray_duration) ) / (len(basket_config_indices) + 1.)
spray_interval_peaks = min(spray_interval) + basket_config_indices * \
                       ( max(spray_interval) - min(spray_interval) ) / (len(basket_config_indices) + 1.)

# Set the widths of the dummy spray duration and interval Gaussian peaks to 1/4 their total measured range
spray_duration_width = ( max(spray_duration) - min(spray_duration) ) / 4.
spray_interval_width = ( max(spray_interval) - min(spray_interval) ) / 4.

# Make the dummy 2D Gaussian functions for each configuration
CO2_ppms = {}
for i in range(len(basket_configs)):
    CO2_ppms[basket_configs[i]] = np.exp(-( \
                                            (spray_durations[basket_configs[i]] - spray_duration_peaks[i])**2 \
                                            / (2 * (spray_duration_width)**2) + \
                                            (spray_intervals[basket_configs[i]] - spray_interval_peaks[i])**2 \
                                            / (2 * (spray_interval_width)**2) \
                                          ) \
                                        )

## Saving the Dummy Data to a CSV File

The next cell saves the dummy data produced in the previous cell to a CSV (comma-separated-value) file. 

As with the previous cell, this will not need to be run when using actual data from the DACC machine, since in that case the CSV file will be exported from Excel and read in directly.

In [97]:
# Organize the data into a dictionary, with one key per variable, to make it easy to export to csv
data_to_export = {
    'configs': [],
    'spray duration': np.zeros(0),
    'spray interval': np.zeros(0),
    'CO2 ppm': np.zeros(0)
}

# Populate the dictionary with the dummy data, looping over all basket configurations
for config in basket_configs:
    data_to_export['configs'] += [config] * len(spray_durations[config])  # This makes a list of identical strings
    data_to_export['spray duration'] = np.append(data_to_export['spray duration'], spray_durations[config])
    data_to_export['spray interval'] = np.append(data_to_export['spray interval'], spray_intervals[config])
    data_to_export['CO2 ppm'] = np.append(data_to_export['CO2 ppm'], CO2_ppms[config])
    
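# Save the dictionary to a CSV file
import csv
# newline='' avoids blank rows on Windows; the with-block closes (and flushes) the file when done
with open('dummy_data.csv', 'w', newline='') as out_file:
    writer = csv.DictWriter(out_file, data_to_export.keys(), dialect='excel')
    writer.writeheader()
    # Write one row per sample; every column has the same length as 'configs'
    for i in range(len(data_to_export['configs'])):
        writer.writerow({key: data_to_export[key][i] for key in data_to_export.keys()})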

## Reading in the CSV Data

The next cell reads in the CSV data as a dictionary for analysis. If working with real data, you will need to update the name of the CSV file that gets read in.
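
One thing to keep in mind is that `csv.DictReader` returns every value as a string, so the numeric columns need converting before any analysis or plotting. Below is a minimal sketch of one way to do that. The helper name `read_columns` and the numbers in the in-memory example are made up for illustration; the column names follow the dummy CSV file written above.

```python
import csv
import io

def read_columns(file_obj, numeric_columns):
    """Read a CSV into {column name: list}, converting the named columns to float.

    (Hypothetical helper, not part of the notebook's existing code.)
    """
    reader = csv.DictReader(file_obj)
    # reader.fieldnames reads the header row without consuming any data rows
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value) if name in numeric_columns else value)
    return columns

# Small in-memory stand-in for the CSV file, with made-up values
example_csv = io.StringIO(
    "configs,spray duration,spray interval,CO2 ppm\n"
    "config_1,1.5,10.2,0.31\n"
    "config_2,2.5,20.4,0.62\n"
)
data = read_columns(example_csv,
                    numeric_columns={"spray duration", "spray interval", "CO2 ppm"})
print(data["configs"], data["CO2 ppm"])
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(filename, newline='')`.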

In [117]:
# Read in the CSV file. 
import csv 
import collections
#with open('dummy_data.csv', mode='r') as in_file:  ## UPDATE FILENAME 'dummy_data.csv' AS NEEDED! ##
#    variable_names = in_file.readline().split(',')
    
#print(variable_names)
input_data = collections.defaultdict(list)

# The following command gives the reader one dictionary per row in the CSV file, 
# where the keys for each row are variable names (eg. 'CO2 ppm') and the values are their 
# corresponding values for the given row
with open('dummy_data.csv', newline='') as in_file:
    reader = csv.DictReader(in_file, dialect='excel')
    #for row in reader: print(row)   # This printout can help to visualize the dictionaries contained in the reader

    # The following aggregates the dictionaries in the reader over all the rows to make a single dictionary,
    # for which each variable is paired with a single list representing the data in its column.
    # (Calling next(reader) to get the keys would drop the first data row, so the defaultdict is used instead.)
    for row in reader:
        for key, value in row.items():
            input_data[key].append(value)  # Note: all values are read in as strings
# print(input_data)

## Visualizing the Data

In [81]:
#### This cell uses the bokeh package to visualize the CO2 ppm as a function of spray interval and duration #####

# Import packages for model construction and plotting
from bokeh.plotting import figure
from bokeh.models import LinearColorMapper
from bokeh.models import ColumnDataSource
from bokeh.layouts import column
from bokeh.models import ColorBar
from bokeh.models import CustomJS, RadioButtonGroup

# Prepare 2D plot of CO2 ppm as a function of spray duration and interval for the first basket configuration
iconfig=0
config = basket_configs[iconfig]

# Prepare the data to be plotted, setting the colours of the points to be plotted according to the CO2 ppm
source = ColumnDataSource(data = dict(duration = spray_durations[config], interval = spray_intervals[config], colour = CO2_ppms[config]))

# Map the CO2 ppms onto a colour gradient
exp_cmap = LinearColorMapper(palette="Viridis256", 
                             low = min(CO2_ppms[config]), 
                             high = max(CO2_ppms[config]))

# Plot as a scatter plot, with the location of each point corresponding to the spray duration and interval
# and the colour corresponding to the measured CO2 ppm
p = figure(width=400, height=400, title="CO2 concentration (ppm)")
p.scatter('duration', 'interval', size=10, source=source, line_color=None,
          fill_color={"field":"colour", "transform":exp_cmap})
p.xaxis.axis_label = 'Spray Duration (s)'
p.yaxis.axis_label = 'Spray Interval (s)'

# Plot the colour bar
bar = ColorBar(color_mapper=exp_cmap, location=(0,0))
p.add_layout(bar, "left")

# Make a button group to update the CO2 ppm in the plot according to the configuration tested
button_group = RadioButtonGroup(labels=basket_configs, active=0)
update_button = CustomJS(args=dict(basket_configs=basket_configs, spray_durations=spray_durations, \
                                   spray_intervals=spray_intervals, CO2_ppms=CO2_ppms, source=source), \
                                   code="""
                                   console.log('radio_button_group: active=' + this.active)
                                   const duration = spray_durations[basket_configs[this.active]]
                                   const interval = spray_intervals[basket_configs[this.active]]
                                   const colour = CO2_ppms[basket_configs[this.active]]
                                   source.data = { duration, interval, colour }
                                   """
                        )

# Update the plot any time a button is clicked
button_group.js_on_click(update_button)

# Display the buttons and plot in one column 
show(column(button_group, p))

In [None]:
# In this cell, I want to visualize the CO2 ppm as a function of either spray duration or spray interval.
# In each case, it will be profiled to maximize over the variable that isn't displayed. 
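
One possible way to do the profiling described above is sketched below. Because of the jitter, the sample values don't sit on an exact grid, so this sketch bins the displayed variable and takes the largest CO2 ppm within each bin. The function name `profile_max`, the choice of equal-width bins, and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def profile_max(x, y, n_bins=10):
    """For each equal-width bin in x, return the bin centre and the largest y in that bin.

    (Hypothetical helper, not part of the notebook's existing code.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index of each point; clip so that the largest x lands in the last bin
    indices = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    # Empty bins are left as NaN
    profile = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = indices == b
        if in_bin.any():
            profile[b] = y[in_bin].max()
    return centres, profile

# Toy data: two samples at each of three spray durations
duration = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
co2 = np.array([0.2, 0.5, 0.9, 0.4, 0.1, 0.3])
centres, profile = profile_max(duration, co2, n_bins=3)
print(centres, profile)
```

For the dummy data, `x` would be `spray_durations[config]` (or `spray_intervals[config]`) and `y` would be `CO2_ppms[config]`, giving the best CO2 ppm reached at each value of the displayed variable regardless of the hidden one.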