# Notes

Hey Michael, I've gone for Dexpy as the Python package for DoE, as pyDOE requires a build from source.

Dexpy doesn't appear to have all the functionality we want, but maybe that's a good thing: since we're both kinda new to DoE, doing a bit more "from the bottom up" will help us learn.

I'm working on a simplified experiment design as placeholder.

In the end we will likely use a screening design to test as many factors as possible and identify the most influential ones, then run a response-surface design for the actual optimisation.

### Dexpy Docs:
https://statease.github.io/dexpy/example-coffee.html


**Correction**: Dexpy is ~incomplete~ barely started, with only one function in the docs actually implemented... errr, we might have to implement things ourselves or find an alternative.

In [10]:
import dexpy.factorial
import dexpy.power
import dexpy.design as design
from dexpy.factorial import build_full_factorial
import pandas as pd
import numpy as np
import statistics
import os

In [11]:
prefix = "ALTE009"

experiment_file_path = "/app/src/OT2_scripts/" + prefix

In [12]:
# Our current parameters
original_factors = {
    "lysate_aspirate_height_inc" : 0.4,
    "lysate_aspirate_rate" : 0.2,
    "lysate_dispense_rate" : 0.1,
    
    "substrates_aspirate_height_inc" : 0.7,
    "substrates_aspirate_rate" : 1,
    "substrates_dispense_rate" : 1
    }

I've taken our current parameters and straddled them to create a range. For simplicity, every parameter gets a range of around 5 possible values, generated in 0.1 increments counting down from its maximum, with zero and negative values dropped. E.g. "lysate_aspirate_height_inc" : "max": 0.6 has a range of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6.

In [13]:
# Maximum values 
factors = {
    "lysate_aspirate_height_inc" : {"max": 0.6},
    "lysate_aspirate_rate" : {"max": 0.4},
    "lysate_dispense_rate" : {"max": 0.5},
    
    "substrates_aspirate_height_inc" : {"max": 0.8},
    "substrates_aspirate_rate" : {"max": 1.2},
    "substrates_dispense_rate" : {"max": 1.2}
    }

In [14]:
def parameter_range_generator(maximum_value, increment, length_of_range):
    
    # generate a python list of floats by making a numpy array and then converting.
    parameter_range = list(np.arange(maximum_value - (length_of_range * increment) , maximum_value + increment, increment))
    
    # round values to 1 decimal place  to ensure exact floats
    parameter_range = [round(value, 1) for value in parameter_range]
    
    # drop negative values and zero from the range
    parameter_range = [value for value in parameter_range if value > 0]
    
    return parameter_range


# Use parameter_range_generator to append the parameter_range as a nested value of each parameter of the factor dictionary.
for parameter in factors:
    factors[parameter]["parameter_range"] = parameter_range_generator(factors[parameter]["max"], 0.1, 5)

factors

{'lysate_aspirate_height_inc': {'max': 0.6,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
 'lysate_aspirate_rate': {'max': 0.4, 'parameter_range': [0.1, 0.2, 0.3, 0.4]},
 'lysate_dispense_rate': {'max': 0.5,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5]},
 'substrates_aspirate_height_inc': {'max': 0.8,
  'parameter_range': [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]},
 'substrates_aspirate_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]},
 'substrates_dispense_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]}}
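The recorded output above holds between 4 and 7 values per parameter, and the two substrate rate ranges run one step past their max of 1.2 to 1.3. `np.arange` excludes its stop value and is known to behave inconsistently with non-integer steps. A minimal sketch of one alternative, assuming we want exactly five steps ending at the maximum: multiply the increment by an integer index instead. The function name `parameter_range_by_index` is hypothetical and not part of the notebook.

```python
def parameter_range_by_index(maximum_value, increment, number_of_values):
    """Hypothetical alternative: count down from the maximum by whole steps."""
    # multiply the step by an integer index instead of accumulating float steps
    values = [round(maximum_value - step * increment, 1)
              for step in reversed(range(number_of_values))]
    # drop negative values and zero, as in the generator above
    return [value for value in values if value > 0]


print(parameter_range_by_index(1.2, 0.1, 5))  # [0.8, 0.9, 1.0, 1.1, 1.2]
print(parameter_range_by_index(0.4, 0.1, 5))  # [0.1, 0.2, 0.3, 0.4]
```

Swapping this in would change every downstream number (design space size, centerpoints, minimum values), so the outputs recorded in this notebook would need regenerating.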

# Adding the minimum value

Adding the minimum value of the parameter_range to the dictionary for transparency when looking up later in the code.

In [15]:
# Store the first (smallest) value of each parameter_range as the "min" of that parameter in the factor dictionary.
for parameter in factors:
    factors[parameter]["min"] = factors[parameter]["parameter_range"][0]

factors

{'lysate_aspirate_height_inc': {'max': 0.6,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
  'min': 0.1},
 'lysate_aspirate_rate': {'max': 0.4,
  'parameter_range': [0.1, 0.2, 0.3, 0.4],
  'min': 0.1},
 'lysate_dispense_rate': {'max': 0.5,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5],
  'min': 0.1},
 'substrates_aspirate_height_inc': {'max': 0.8,
  'parameter_range': [0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
  'min': 0.3},
 'substrates_aspirate_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3],
  'min': 0.7},
 'substrates_dispense_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3],
  'min': 0.7}}

# Factor Metadata
Simply gets some metadata about the factors

In [16]:
# number of factors as an integer
number_of_factors = len(factors)

# a python list of the names of the factors by getting the dictionary keys
names_of_factors = list(factors.keys())

# Total Design Space
# the product of the lengths of the parameter ranges, i.e. the number of runs in a full grid over every level
Total_Design_Space = 1
for parameter in factors:
    Total_Design_Space = Total_Design_Space * len(factors[parameter]["parameter_range"])
Total_Design_Space

35280
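As a quick cross-check of that figure, the same product can be computed with `math.prod` (Python 3.8+). The lengths below are copied by hand from the recorded `parameter_range` output; the dictionary name `range_lengths` is only for this sketch.

```python
import math

# Range lengths copied from the recorded output of the factors dictionary
range_lengths = {
    "lysate_aspirate_height_inc": 6,
    "lysate_aspirate_rate": 4,
    "lysate_dispense_rate": 5,
    "substrates_aspirate_height_inc": 6,
    "substrates_aspirate_rate": 7,
    "substrates_dispense_rate": 7,
}

# A full grid needs one run per combination of levels,
# so its size is the product of the lengths.
total_design_space = math.prod(range_lengths.values())
print(total_design_space)  # 35280
```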

# Centerpoints

### Generating the centerpoints
If a parameter_range has an odd number of values, the middle value is used. For an even number of values, e.g. (0.2, 0.3, 0.4, 0.5), the higher of the two middle values (0.4) is used. This is to be a bit conservative and lower the chance of OT2 crashes.

In [17]:

# initialise a list to store the centerpoint values
centerpoint_list = []

# Also storing the parameter names at the same time.
# Dictionaries keep insertion order (Python 3.7+), so this just makes the pairing of names and values explicit.
parameter_name_list = []

for parameter in factors:

    # store the name
    parameter_name_list.append(parameter)

    # store the length to make the code clearer
    length_of_list = len(factors[parameter]["parameter_range"])

    # Integer division gives the middle index for an odd length (5 -> 2)
    # and the higher of the two middle indices for an even length (4 -> 2).
    middle_idx = length_of_list // 2

    # look up the value
    centerpoint_value = factors[parameter]["parameter_range"][middle_idx]

    # append to the list
    centerpoint_list.append(centerpoint_value)

# generate Pandas Series using both lists
centerpoint_series = pd.Series(centerpoint_list, index = parameter_name_list)
centerpoint_series

lysate_aspirate_height_inc        0.4
lysate_aspirate_rate              0.3
lysate_dispense_rate              0.3
substrates_aspirate_height_inc    0.6
substrates_aspirate_rate          1.0
substrates_dispense_rate          1.0
dtype: float64

In [21]:
from func_central_composite import *
central_composite = build_ccd(6, alpha='rotatable', center_points=1)
central_composite

  factor_data = factorial_runs.append(axial_runs)
  factor_data = factor_data.append(center_runs)


Unnamed: 0,X1,X2,X3,X4,X5,X6
0,-1.0,-1.0,-1.0,-1.0,-1.000000,-1.000000
1,-1.0,-1.0,-1.0,-1.0,-1.000000,1.000000
2,-1.0,-1.0,-1.0,-1.0,1.000000,-1.000000
3,-1.0,-1.0,-1.0,-1.0,1.000000,1.000000
4,-1.0,-1.0,-1.0,1.0,-1.000000,-1.000000
...,...,...,...,...,...,...
8,0.0,0.0,0.0,0.0,-2.828427,0.000000
9,0.0,0.0,0.0,0.0,2.828427,0.000000
10,0.0,0.0,0.0,0.0,0.000000,-2.828427
11,0.0,0.0,0.0,0.0,0.000000,2.828427


In [22]:
dexpy.build_simplex_lattice()

AttributeError: module 'dexpy' has no attribute 'build_simplex_lattice'

# Choosing and generating an experimental design dynamically


If we have 6x replicates then we can fit 64 samples on to a plate


![plan plan](img/plate_plan.png)

Placeholder design : **half factorial**? Important to fit on to one plate.

### Factorial run generation formula:

Number of runs = 2 ** (number of factors-1)

This is the IV on the chart

![Doe_resolution_image](img/doe_resolution_table.png)
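To make the run-count formula concrete, here is a minimal sketch of a half-fraction built by hand. It assumes the common construction of a full two-level factorial in the first k-1 factors, with the last factor generated as the product of the others. The function name `half_fraction` is hypothetical, and `dexpy.factorial.build_factorial` may pick a different generator.

```python
from itertools import product
from math import prod


def half_fraction(number_of_factors):
    """Coded (-1/+1) runs of a 2**(k-1) half-fraction design."""
    runs = []
    # full factorial in the first k-1 factors
    for base_run in product([-1, 1], repeat=number_of_factors - 1):
        # the last factor is generated as the product of the others
        runs.append(list(base_run) + [prod(base_run)])
    return runs


design = half_fraction(6)
print(len(design))            # 32, i.e. 2 ** (6 - 1)
print(len(design) + 1 <= 64)  # True: 32 runs plus one centerpoint fit in 64 samples
```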

In [None]:

# initialise experimental design
experimental_design = dexpy.factorial.build_factorial(number_of_factors, 2**(number_of_factors-1))

# label columns with factor names
experimental_design.columns = names_of_factors

experimental_design

# Converting encoding back to real values

I'll do this simply by replacing "1" with the maximum value and "-1" with the minimum value of each factor.

In [None]:
# iterate over the df columns
for parameter in names_of_factors:

    column = experimental_design[parameter]
    
    # then iterate over the series generated
    
    for index, value in column.items():
        
        
        if value == 1:
            
            # replace
            experimental_design.loc[index, parameter] = factors[parameter]["max"]
            
        elif value == -1:
            # replace
            experimental_design.loc[index, parameter] = factors[parameter]["min"]
            
        else:
            print("Error: Encoded value is neither 1 nor -1")

experimental_design
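The same decoding can be written per column rather than per cell. This is a minimal sketch using `Series.map` on a small made-up design. The names `coded_design`, `limits` and `real_design` and their values are for illustration only.

```python
import pandas as pd

# Hypothetical two-factor coded design and factor limits, for illustration only
coded_design = pd.DataFrame({
    "lysate_aspirate_rate": [-1, 1, -1, 1],
    "lysate_dispense_rate": [-1, -1, 1, 1],
})
limits = {
    "lysate_aspirate_rate": {"min": 0.1, "max": 0.4},
    "lysate_dispense_rate": {"min": 0.1, "max": 0.5},
}

real_design = pd.DataFrame(index=coded_design.index)
for parameter, column in coded_design.items():
    lookup = {-1: limits[parameter]["min"], 1: limits[parameter]["max"]}
    # Series.map leaves NaN for any coded value missing from the lookup
    real_design[parameter] = column.map(lookup)

# fail loudly if something other than -1 or 1 was present
assert not real_design.isna().any().any()
print(real_design)
```

Building a new frame also avoids writing float values into a column that may have been created with an integer dtype.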

# Appending the centerpoints as a row

In [None]:
# pd.concat treats a bare Series as a single column, so convert it to a one-row DataFrame with centerpoint_series.to_frame(1).T to line it up with the factor columns
experimental_design = pd.concat([experimental_design, centerpoint_series.to_frame(1).T], axis=0, ignore_index = True)
experimental_design
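A small sketch of what the `to_frame().T` step does, on made-up data (`factor_a`, `factor_b` and the values are hypothetical): the Series first becomes a single column indexed by factor name, and the transpose turns it into a single row whose columns match the design.

```python
import pandas as pd

# Hypothetical two-factor design and centerpoints, for illustration only
design = pd.DataFrame({"factor_a": [0.1, 0.4], "factor_b": [0.1, 0.5]})
centerpoints = pd.Series({"factor_a": 0.3, "factor_b": 0.3})

# to_frame() gives one column indexed by factor name; .T turns it into one row
centerpoint_row = centerpoints.to_frame().T

combined = pd.concat([design, centerpoint_row], axis=0, ignore_index=True)
print(combined.shape)  # (3, 2)
```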

# Aliasing

We want to remove rows that are duplicated and columns that are repeated, to ensure our experiment is as efficient as possible.

In [None]:
#rows
# creates a bool series of True/False row-is-duplicate and then filters the DF by dropping those rows
experimental_design = experimental_design[~experimental_design.duplicated()]

#columns
# Transpose the df to allow the same simple function to be used row-wise and then transpose back
experimental_design = experimental_design.T
experimental_design = experimental_design[~experimental_design.duplicated()]
experimental_design = experimental_design.T
experimental_design
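An equivalent way to write both steps, sketched on made-up data: `drop_duplicates` handles the rows, and a boolean mask from the transposed frame selects the columns without transposing the design back and forth. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical design with one repeated row and one repeated column
design = pd.DataFrame({
    "factor_a": [0.1, 0.4, 0.1],
    "factor_b": [0.2, 0.5, 0.2],
    "factor_c": [0.1, 0.4, 0.1],  # identical to factor_a
})

# rows: keep the first occurrence of each run
design = design.drop_duplicates()

# columns: flag repeats on the transpose, then select columns on the original frame
repeated_columns = design.T.duplicated()
design = design.loc[:, ~repeated_columns]

print(list(design.columns))  # ['factor_a', 'factor_b']
print(len(design))           # 2
```

Selecting with a mask leaves the original column dtypes untouched, which a double transpose does not guarantee when columns have mixed types.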

# Save to disk

The Experimental Design is now finished and will be exported as a CSV for a downstream parser to put the values into the _pipetting_settings.json file.

In [None]:
#change directory
os.chdir(experiment_file_path)

#save using the experimental prefix
experimental_design.to_csv(prefix+"_experimental_design.csv", header = True)
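A possible alternative to `os.chdir`, sketched with `pathlib`: build the full output path and pass it to `to_csv`, so the notebook's working directory stays where it was. A temporary directory stands in for the real experiment folder here, and the one-row design is made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

prefix = "ALTE009"
# A temporary directory stands in for the real experiment folder in this sketch
experiment_dir = Path(tempfile.mkdtemp()) / prefix
experiment_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical one-row design, for illustration only
design = pd.DataFrame({"lysate_aspirate_rate": [0.3], "lysate_dispense_rate": [0.3]})

# Build the full path instead of changing the working directory
output_path = experiment_dir / (prefix + "_experimental_design.csv")
design.to_csv(output_path, header=True)

print(output_path.name)  # ALTE009_experimental_design.csv
```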