# Notes

Hey Michael, I've gone for Dexpy as the py package for DoE as pyDOE requires a build from source.

Dexpy doesn't appear to have all the functionality we want, but that may be a good thing: since we're both fairly new to DoE, working a bit more "from the bottom up" should help us learn.

I'm working on a simplified experiment design as a placeholder.

In the end, we will likely use a screening design to test as many factors as possible and identify the most influential ones, then run a response-surface design on those for the actual optimisation.
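
As a rough illustration of why a screening design is cheaper, here is a minimal sketch of a two-level half-fraction for six factors, built by hand rather than with Dexpy. The function name `half_fraction` and the choice of generator (last factor = product of the others) are my own assumptions for illustration, not something we've settled on.

```python
from itertools import product
from math import prod


def half_fraction(number_of_factors):
    """Hypothetical helper: two-level half-fraction design in coded units (-1/+1).

    The first k-1 factors form a full factorial; the last factor is set to
    the product of the others, which halves the number of runs.
    """
    base_runs = product([-1, 1], repeat=number_of_factors - 1)
    return [run + (prod(run),) for run in base_runs]


screening_runs = half_fraction(6)
print(len(screening_runs))  # 32 runs instead of the 64 of a full factorial
```

The cost is that the sixth factor's main effect is confounded with the five-way interaction of the other factors, which is usually an acceptable trade when screening.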

### Dexpy Docs:
https://statease.github.io/dexpy/example-coffee.html

In [1]:
import dexpy.factorial
import dexpy.power
import pandas as pd
import numpy as np
import statistics

In [2]:
# Our current parameters
original_factors = {
    "lysate_aspirate_height_inc" : 0.4,
    "lysate_aspirate_rate" : 0.2,
    "lysate_dispense_rate" : 0.1,
    
    "substrates_aspirate_height_inc" : 0.7,
    "substrates_aspirate_rate" : 1,
    "substrates_dispense_rate" : 1
    }

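I've taken our current parameters and straddled them to create a range. For simplicity, every parameter's range is generated in 0.1 increments, counting down from a maximum value. The intent is 5 possible values per parameter, e.g. "lysate_aspirate_height_inc" : "max": 0.6 giving 0.2, 0.3, 0.4, 0.5, 0.6. Note that the generator below actually starts 5 increments below the maximum, so this example comes out as 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, and some ranges end up shorter or longer than that (see the output).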

In [3]:
# Maximum values 
factors = {
    "lysate_aspirate_height_inc" : {"max": 0.6},
    "lysate_aspirate_rate" : {"max": 0.4},
    "lysate_dispense_rate" : {"max": 0.5},
    
    "substrates_aspirate_height_inc" : {"max": 0.8},
    "substrates_aspirate_rate" : {"max": 1.2},
    "substrates_dispense_rate" : {"max": 1.2}
    }

In [4]:
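def parameter_range_generator(maximum_value, increment, length_of_range):
    
    # generate a python list of floats by making a numpy array and then converting.
    # the range starts length_of_range increments below the maximum, so it holds up to
    # length_of_range + 1 values. np.arange excludes its stop value in principle, but
    # floating-point rounding can let one value above the maximum through.
    parameter_range = list(np.arange(maximum_value - (length_of_range * increment) , maximum_value + increment, increment))
    
    # round values to 1 decimal place to remove floating-point noise (e.g. 0.30000000000000004)
    parameter_range = [round(value, 1) for value in parameter_range]
    
    # drop negative values and zero from the range
    parameter_range = [value for value in parameter_range if value > 0]
    
    return parameter_range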


# Use parameter_range_generator to append the parameter_range as a nested value of each parameter of the factor dictionary.
for parameter in factors:
    factors[parameter]["parameter_range"] = parameter_range_generator(factors[parameter]["max"], 0.1, 5)
    
factors

{'lysate_aspirate_height_inc': {'max': 0.6,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
 'lysate_aspirate_rate': {'max': 0.4, 'parameter_range': [0.1, 0.2, 0.3, 0.4]},
 'lysate_dispense_rate': {'max': 0.5,
  'parameter_range': [0.1, 0.2, 0.3, 0.4, 0.5]},
 'substrates_aspirate_height_inc': {'max': 0.8,
  'parameter_range': [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]},
 'substrates_aspirate_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]},
 'substrates_dispense_rate': {'max': 1.2,
  'parameter_range': [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]}}
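
Two things stand out in the output above: the ranges have different lengths, and the two substrates rates include 1.3 even though their max is 1.2. The second is most likely down to how np.arange handles a floating-point stop value. Below is a minimal sketch of an alternative that counts down from the maximum in whole steps, so it can never overshoot and never returns more than the requested number of values. `fixed_length_range` is a hypothetical name, and rounding to 1 decimal place assumes we stay with 0.1 increments.

```python
def fixed_length_range(maximum_value, increment, number_of_values):
    """Hypothetical alternative to parameter_range_generator.

    Counts down from maximum_value in whole steps, so the largest value is
    always the maximum itself and at most number_of_values are returned.
    """
    # step indices run from (number_of_values - 1) down to 0 so the list is ascending;
    # rounding to 1 decimal place assumes an increment of 0.1
    values = [round(maximum_value - step * increment, 1)
              for step in range(number_of_values - 1, -1, -1)]

    # drop negative values and zero, as in the original function
    return [value for value in values if value > 0]


print(fixed_length_range(0.6, 0.1, 5))
print(fixed_length_range(1.2, 0.1, 5))
```

With these inputs the first call gives 0.2 to 0.6 and the second 0.8 to 1.2, five values each. A max of 0.4 would still give only four values, because zero is dropped.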

# Factor Metadata
Simply gets some metadata about the factors

In [5]:
# number of factors as an integer
number_of_factors = len(factors)

# a python list of the names of the factors by getting the dictionary keys
names_of_factors = list(factors.keys())

# Total Design Space
# the product of the lengths of the parameter ranges (i.e. every combination of levels)
Total_Design_Space = 1
for parameter in factors:
    Total_Design_Space = Total_Design_Space * len(factors[parameter]["parameter_range"])
Total_Design_Space

35280
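
The total is simply the product of the range lengths. As a cross-check, here is a short sketch using `math.prod` (Python 3.8+) on the lengths read off the output above; the list is typed in by hand so the snippet runs on its own.

```python
from math import prod

# lengths of the six parameter ranges shown above, in dictionary order
range_lengths = [6, 4, 5, 6, 7, 7]

total_design_space = prod(range_lengths)
print(total_design_space)  # 35280
```

In the notebook itself the same thing could be written as `prod(len(f["parameter_range"]) for f in factors.values())`.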

# Centerpoints

### Generating the centerpoints
When a parameter_range has an odd number of values, the middle value is used. For an even number of values, e.g. (0.2, 0.3, 0.4, 0.5), the higher of the two middle values (0.4) is used. This is to be a bit cautious and lower the chance of OT2 crashes.

In [21]:

# initialise a list to store the centerpoint values
centerpoint_list = []

# Also storing the parameter names at the same time.
# dictionaries keep insertion order in Python 3.7+, but building both lists in the
# same loop keeps each name explicitly paired with its value.
parameter_name_list = []

for parameter in factors:
    
    # store the name
    parameter_name_list.append(parameter)
    
    # store the range and its length to make the code clearer
    parameter_range = factors[parameter]["parameter_range"]
    length_of_list = len(parameter_range)
    
    # For odd numbers..
    if (length_of_list % 2) != 0:
        # get the middle index by dividing the length by 2, adding 0.5 and taking away 1 (for python indexing)
        middle_idx = int(length_of_list/2 + 0.5) - 1
    
    # For even numbers..
    else:
        # get the index of the higher of the two middle values: divide the length by 2, add 0.5 and truncate
        middle_idx = int(length_of_list/2 + 0.5)
    
    # look up the value and append it to the list
    centerpoint_list.append(parameter_range[middle_idx])

# generate Pandas Series using both lists
centerpoint_series = pd.Series(centerpoint_list, index = parameter_name_list)
centerpoint_series

lysate_aspirate_height_inc        0.4
lysate_aspirate_rate              0.3
lysate_dispense_rate              0.3
substrates_aspirate_height_inc    0.6
substrates_aspirate_rate          1.0
substrates_dispense_rate          1.0
dtype: float64
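
Both branches of the odd/even rule reduce to the same integer division: for a list of length n, the chosen index is always `n // 2`. Here is a small sketch that checks that claim and shows the shorter form; `centerpoint_of` is a hypothetical helper name.

```python
def centerpoint_of(parameter_range):
    """Hypothetical helper: middle value, or the higher middle value for even lengths."""
    return parameter_range[len(parameter_range) // 2]


# check the shortcut against the two-branch index rule for a range of list lengths
for n in range(1, 21):
    two_branch_idx = int(n / 2 + 0.5) - 1 if n % 2 else int(n / 2 + 0.5)
    assert two_branch_idx == n // 2

print(centerpoint_of([0.2, 0.3, 0.4, 0.5]))       # 0.4
print(centerpoint_of([0.1, 0.2, 0.3, 0.4, 0.5]))  # 0.3
```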

# Choosing and generating an experimental design dynamically

Placeholder: full factorial

### Full factorial run generation formula:

Number of runs = 2 ** number of factors (two levels per factor). For our 6 factors that is 2 ** 6 = 64 runs.

![Doe_resolution_image](img/doe_resolution_table.png)
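
To make the formula concrete, a two-level full factorial can be enumerated with `itertools.product`. This is a sketch of the idea only, not how Dexpy builds its designs internally; the factor names are copied from above.

```python
from itertools import product

names_of_factors = [
    "lysate_aspirate_height_inc",
    "lysate_aspirate_rate",
    "lysate_dispense_rate",
    "substrates_aspirate_height_inc",
    "substrates_aspirate_rate",
    "substrates_dispense_rate",
]

# every combination of low (-1) and high (+1) across all factors
full_factorial_runs = list(product([-1, 1], repeat=len(names_of_factors)))

print(len(full_factorial_runs))  # 64, i.e. 2 ** 6
print(full_factorial_runs[0])    # all factors at their low level
print(full_factorial_runs[-1])   # all factors at their high level
```

Each extra factor doubles the run count, which is why a full factorial is only a placeholder here.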

In [7]:

# initialise experimental design
experimental_design = dexpy.factorial.build_factorial(number_of_factors, 2**number_of_factors)

# label columns with factor names
experimental_design.columns = names_of_factors

In [8]:
experimental_design

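|  | lysate_aspirate_height_inc | lysate_aspirate_rate | lysate_dispense_rate | substrates_aspirate_height_inc | substrates_aspirate_rate | substrates_dispense_rate |
|---|---|---|---|---|---|---|
| 0 | -1 | -1 | -1 | -1 | -1 | -1 |
| 1 | -1 | -1 | -1 | -1 | -1 | 1 |
| 2 | -1 | -1 | -1 | -1 | 1 | -1 |
| 3 | -1 | -1 | -1 | -1 | 1 | 1 |
| 4 | -1 | -1 | -1 | 1 | -1 | -1 |
| ... | ... | ... | ... | ... | ... | ... |
| 59 | 1 | 1 | 1 | -1 | 1 | 1 |
| 60 | 1 | 1 | 1 | 1 | -1 | -1 |
| 61 | 1 | 1 | 1 | 1 | -1 | 1 |
| 62 | 1 | 1 | 1 | 1 | 1 | -1 |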


# Aliasing

We want to remove rows that are duplicated and columns which are repeated, to ensure our experiment is as efficient as possible.

In [9]:
#rows
# creates a bool series of True/False row-is-duplicate and then filters the DF by dropping those rows
experimental_design = experimental_design[~experimental_design.duplicated()]

#columns
# Transpose the df to allow the same simple function to be used row-wise and then transpose back
experimental_design = experimental_design.T
experimental_design = experimental_design[~experimental_design.duplicated()]
experimental_design = experimental_design.T
experimental_design

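|  | lysate_aspirate_height_inc | lysate_aspirate_rate | lysate_dispense_rate | substrates_aspirate_height_inc | substrates_aspirate_rate | substrates_dispense_rate |
|---|---|---|---|---|---|---|
| 0 | -1 | -1 | -1 | -1 | -1 | -1 |
| 1 | -1 | -1 | -1 | -1 | -1 | 1 |
| 2 | -1 | -1 | -1 | -1 | 1 | -1 |
| 3 | -1 | -1 | -1 | -1 | 1 | 1 |
| 4 | -1 | -1 | -1 | 1 | -1 | -1 |
| ... | ... | ... | ... | ... | ... | ... |
| 59 | 1 | 1 | 1 | -1 | 1 | 1 |
| 60 | 1 | 1 | 1 | 1 | -1 | -1 |
| 61 | 1 | 1 | 1 | 1 | -1 | 1 |
| 62 | 1 | 1 | 1 | 1 | 1 | -1 |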
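
The filtered design is identical to the input, which is expected: a full factorial has no repeated runs, and its factor columns are mutually orthogonal, so none can be a copy of another. Here is a quick sketch of that check in plain Python, using an itertools-generated design as a stand-in for the Dexpy one (an assumption made so the snippet is self-contained).

```python
from itertools import combinations, product

runs = list(product([-1, 1], repeat=6))  # stand-in for the 64-run coded design
columns = list(zip(*runs))               # one tuple per factor

duplicate_rows = len(runs) - len(set(runs))
duplicate_columns = len(columns) - len(set(columns))

# two coded columns are orthogonal when their element-wise products sum to zero
all_orthogonal = all(
    sum(a * b for a, b in zip(col_1, col_2)) == 0
    for col_1, col_2 in combinations(columns, 2)
)

print(duplicate_rows, duplicate_columns, all_orthogonal)  # 0 0 True
```

Aliasing only becomes a real concern once we move to a fractional design, where some interaction columns coincide with other effects.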
