# Generating Design of Experiments 

This is not an optimized method, but it is good enough for a simple two-factor experiment like this one, as long as we validate the design as shown below.

#### Imports

In [6]:
import numpy as np
import pandas as pd

#### Define design parameters

- 2 factors with 3 and 4 levels
- 18 tanks available
- 2 replicates per condition

In [7]:
levels = [3, 4]  # number of levels for factor 1 and factor 2
conditions = 18 // 2  # 18 tanks / 2 replicates = 9 distinct conditions

#### Generate your DOE

We'll build the full factorial (3 x 4 = 12 conditions) with NumPy's `meshgrid`. Only 9 conditions fit in the available tanks, and because this is a fairly simple experiment with only 2 factors, we'll use the simplest way to pick a subset: draw 9 of the 12 conditions at random, with a fixed seed so the result is reproducible.

In [8]:
# Generate the full factorial design (3 levels x 4 levels = 12 conditions)
full_factorial = np.array(np.meshgrid(*[range(level) for level in levels])).T.reshape(
    -1, len(levels)
)

# Only 9 conditions can be run, so draw a random subset of the full factorial.
# This is simple random sampling without replacement, not a structured fractional design.
np.random.seed(105)
sampled_rows = np.random.choice(full_factorial.shape[0], size=conditions, replace=False)
design_subset = full_factorial[sampled_rows]


# Convert numeric levels to conditions for better readability
factor_1_conditions = [1, 4, 8]
factor_2_conditions = [10, 15, 20, 25]

# Map numeric levels to actual conditions
design_subset[:, 0] = [factor_1_conditions[int(x)] for x in design_subset[:, 0]]
design_subset[:, 1] = [factor_2_conditions[int(x)] for x in design_subset[:, 1]]

# Convert to DataFrame for better readability
design_df = pd.DataFrame(design_subset, columns=["Concentration A", "Concentration B"])

print("Experimental Design:\n", design_df)

Experimental Design:
    Concentration A  Concentration B
0                1               25
1                1               20
2                1               15
3                8               20
4                8               15
5                8               10
6                4               25
7                8               25
8                4               10
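Because the nine runs are drawn at random, it can be worth confirming that every level of each factor shows up at least once before moving on. The snippet below is a minimal sketch of that check; it hard-codes the nine runs printed above, and the helper name `level_counts` is made up for illustration.

```python
import numpy as np

# The nine runs printed above, as (Concentration A, Concentration B) pairs.
design = np.array([
    [1, 25], [1, 20], [1, 15],
    [8, 20], [8, 15], [8, 10],
    [4, 25], [8, 25], [4, 10],
])

factor_levels = {
    "Concentration A": [1, 4, 8],
    "Concentration B": [10, 15, 20, 25],
}

def level_counts(design, factor_levels):
    """Count how many runs use each level of each factor (hypothetical helper)."""
    counts = {}
    for col, (name, levels) in enumerate(factor_levels.items()):
        counts[name] = {lvl: int(np.sum(design[:, col] == lvl)) for lvl in levels}
    return counts

counts = level_counts(design, factor_levels)
for name, per_level in counts.items():
    print(name, per_level)
```

A level with a count of zero would mean the subset cannot say anything about that level, which would be a reason to redraw with a different seed.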


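#### Validate the DOE

This check is crude, but calculating the **D-efficiency** of our experimental design gives us some confidence that we are capturing as much information as possible with the runs we have. Here it is computed as |X'X|^(1/k) / N, where X is the design matrix (including the intercept), N is the number of runs, and k is the number of model parameters.

More complex experiments might require searching for the design with the highest possible D-efficiency; such a design is called **D-optimal**.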

In [9]:
# Convert factors to numerical values for X matrix
factor_1_numeric = pd.Categorical(design_df["Concentration A"]).codes
factor_2_numeric = pd.Categorical(design_df["Concentration B"]).codes

# Construct the design matrix X (including intercept)
X = np.vstack((np.ones(len(factor_1_numeric)), factor_1_numeric, factor_2_numeric)).T

# Calculate the determinant of X'X
XtX = np.dot(X.T, X)
det_XtX = np.linalg.det(XtX)

# Calculate D-efficiency
N = X.shape[0]  # Number of runs
k = X.shape[1]  # Number of parameters
d_efficiency = (det_XtX ** (1 / k)) / N
d_efficiency_percentage = d_efficiency * 100

print(f"D-efficiency: {d_efficiency_percentage:.2f}%")

D-efficiency: 99.54%
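For a design this small, the search for a higher-scoring subset can be done exhaustively, since there are only 220 ways to choose 9 conditions out of 12. The following is a rough sketch rather than a proper D-optimal algorithm: it assumes the same intercept-plus-main-effects model and the same 0-based level coding as the cell above, and the function name `d_efficiency` is an illustrative choice. Scores are only comparable between designs that use the same coding.

```python
from itertools import combinations, product

import numpy as np

def d_efficiency(coded_runs):
    """D-efficiency (as a fraction) for an intercept + main-effects model."""
    X = np.column_stack([np.ones(len(coded_runs)), coded_runs])
    n_runs, n_params = X.shape
    det = np.linalg.det(X.T @ X)
    if det <= 0:  # singular design: the model cannot be estimated
        return 0.0
    return det ** (1 / n_params) / n_runs

# All 12 candidate conditions, coded 0..2 for factor 1 and 0..3 for factor 2.
candidates = np.array(list(product(range(3), range(4))))

# Score every 9-run subset and keep the best one.
best_score, best_rows = -1.0, None
for rows in combinations(range(len(candidates)), 9):
    score = d_efficiency(candidates[list(rows)])
    if score > best_score:
        best_score, best_rows = score, rows

print(f"Best D-efficiency found: {best_score * 100:.2f}%")
print(candidates[list(best_rows)])
```

Exhaustive enumeration stops being practical quickly as the number of factors and levels grows, which is where dedicated exchange algorithms come in.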


#### Include Replicates & Randomize 

We want to run this experiment with 2 replicates per condition, and we also need to randomize the run order of the experimental design. Randomizing helps avoid introducing temporal or spatial biases.

In [10]:
experimental_df = pd.concat([design_df, design_df]).reset_index(drop=True)
randomized_rows = np.random.permutation(experimental_df.index)
randomized_df = experimental_df.loc[randomized_rows].reset_index(drop=True)
randomized_df

   Concentration A  Concentration B
0                1               25
1                8               25
2                8               20
3                8               10
4                1               15
5                1               20
6                1               20
7                4               10
8                4               10
9                8               15
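The shuffle above depends on the global NumPy seed set in an earlier cell, so re-running cells out of order changes the result. As a hedged alternative, the sketch below uses a local `np.random.default_rng` generator, then checks that each condition still appears exactly twice. The seed value and the tank numbering (row i goes to tank i + 1) are assumptions for illustration, not part of the original design.

```python
import numpy as np

# The nine conditions from the design above (Concentration A, Concentration B).
conditions = np.array([
    [1, 25], [1, 20], [1, 15],
    [8, 20], [8, 15], [8, 10],
    [4, 25], [8, 25], [4, 10],
])
n_replicates = 2

# A local generator keeps the shuffle reproducible regardless of earlier cells.
rng = np.random.default_rng(105)  # seed value is an arbitrary choice

replicated = np.repeat(conditions, n_replicates, axis=0)  # 18 rows
run_order = rng.permutation(len(replicated))
randomized = replicated[run_order]

# Hypothetical tank numbering: row i of the randomized plan goes to tank i + 1.
for tank, (conc_a, conc_b) in enumerate(randomized, start=1):
    print(f"Tank {tank:2d}: A={conc_a}, B={conc_b}")

# Sanity check: every condition still appears exactly n_replicates times.
unique_rows, counts = np.unique(randomized, axis=0, return_counts=True)
assert len(unique_rows) == len(conditions) and np.all(counts == n_replicates)
```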
