# 08 Design of experiments

Design of experiments (DOE, DOX, or experimental design) is an entire field of expertise, with dedicated courses offered, for example, in the EMSE department.

Experiments aim to test variations of a system under conditions that are hypothesized to create the variations.  In other words, an experiment is the study of a system output (_dependent/output variables_) by controlling the input conditions (_independent/input variables_) in a predetermined way.  Additionally, there are _control variables_ that must be kept constant or monitored to prevent external factors from affecting the experiment results.  In an experiment, the input variables are controlled by design.  In a quasi-experiment, natural conditions are monitored while the output variables are observed.

Therefore, the goal of an experiment is to set the range or discrete values of the input variables used to test the system, and to monitor the system output.

The main concerns of DOE are:
- Validity of results

Did I test the system over a range of input parameters for which it will be used?  Can I do a wind tunnel test of a model aircraft at the same Mach and Reynolds numbers expected in flight conditions?

''The first step is clearly defining what it is you're after, because without knowing that, you'll never get it.''
Halle Berry, actress

- Reliability

Are the results consistent?  It is necessary to perform repeat runs to test reliability.  Repeat runs make it possible to perform ensemble averaging and to identify potential outliers.

- Reproducibility

Based on the published report (or scientific manuscript), can another group reproduce the same experimental conditions and obtain similar results?

- Statistical significance

Measurements will have inherent variations and measurement uncertainty.  If we are dealing with a nondeterministic stationary system, datasets must be large enough to be statistically converged, so that the population statistics can be estimated with stated confidence levels.

_Question:_
What should be the minimum number of repeat runs one needs to perform?
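
One common way to approach this question, assuming the run-to-run scatter is roughly normal and its standard deviation $\sigma$ can be estimated from a few pilot runs, is to choose the number of runs $n$ so that the confidence interval on the mean has a target half-width $E$, i.e. $n \geq (z\,\sigma/E)^2$.  The sketch below is only illustrative: the function name `min_runs` and the numerical values are assumptions, not course data.

```python
import math
from statistics import NormalDist


def min_runs(sigma, half_width, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= half_width.

    Assumes normally distributed scatter and a known (pilot-run) sigma.
    """
    # Two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / half_width) ** 2)


# Hypothetical numbers: pilot runs suggest sigma = 0.5, and we want the
# mean known to within +/- 0.2 at 95% confidence.
n = min_runs(sigma=0.5, half_width=0.2)
print("minimum number of repeat runs:", n)
```

For small $n$, a Student-t quantile would be more appropriate than the normal $z$, so this estimate should be treated as a lower bound.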

## Design steps

Designing an experiment involves several steps that must be treated symbiotically.  Here is a simplified highlight of those steps.  Remember that when you have to conduct an experiment, you have incompressible parameters: your budget, timeline, and expertise.

- Define the problem

This is sometimes the hardest part. 

First we need to define the __output of the experiment__. This could be determining the operating range of an instrument, finding an optimum component in a system, or measuring the drag on an airfoil.


- Design the experiment
<img src="img/DesignExperiment.png" width="360">

- Construct the experiment

- Acquire data

- Analyze data

- Do confirmatory experiments (if necessary)

- Interpret and report results and methodology.  

Think about the wisdom pyramid introduced in the first lecture.  Running the experiment produces data.  Analyzing the data produces information.  Can we develop or test a model with the data?  Then we would have reached the knowledge step.  If we can test the validity of the model, then we would have wisdom.

If we can reach at least the knowledge step of the pyramid, we can answer the question "so what?"



## Test Matrix design
This module is focused on methodologies to determine the optimal number of experimental runs, i.e. the test matrix.  Here we are looking for the optimum output/dependent variable ($X$) as a function of input/independent variables ($a, b, c, ...$).

We are trying to determine the minimal number of runs to perform to save on cost and time.

The input parameters can be dependent on each other (i.e. $a$ affects $b$) or independent of each other (i.e. $a$ has no effect on $b$). Dependence/independence between the parameters will affect the design of the test matrix significantly.  This is best visualized with the graphs below for a function $X(a,b)$.

<img src="img/IndependentInputVar.png" width="360">
This plot is an _isocontour plot_.  The lines are contours along which $X$ is constant. With the _one parameter at a time_ approach, we keep $b$ constant and find the optimum for $a$.  Then, keeping $a$ at this value, we find the optimum for $b$.  We can then iterate, keeping $b$ at its optimal value and optimizing for $a$.  If the variables are independent of each other (as in the graph above), we are already at the optimum for $a$, so the iteration stops.


When the two variables are not independent of each other, we are in the case below.  Here the isocontour lines are tilted with respect to the axes.
<img src="img/DependentInputVar.png" width="360">
Here if we use the _one parameter at a time_ approach, we will first find a local optimum for $a$, keeping $b$ constant.  Then we keep $a$ constant and optimize for $b$.  When doing another iteration, we will reach a different optimal value for $a$, so we must continue iterating, which can be tedious.
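
The difference between the two cases can be sketched numerically.  The example below is a minimal illustration, not the function plotted in the figures: the quadratic test functions `X_indep` and `X_dep`, the search grid, and the starting point are all hypothetical choices.  It alternates one-dimensional searches on $a$ and $b$ and counts how many sweeps change the current point.

```python
def one_at_a_time(X, grid, a0, b0, max_sweeps=100):
    """Alternate 1-D grid searches on a, then b, until neither value changes."""
    a, b = a0, b0
    sweeps = 0
    for _ in range(max_sweeps):
        a_new = max(grid, key=lambda ai: X(ai, b))       # optimize a, b fixed
        b_new = max(grid, key=lambda bi: X(a_new, bi))   # optimize b, a fixed
        if a_new == a and b_new == b:
            break  # no change: the iteration has stalled
        a, b = a_new, b_new
        sweeps += 1
    return a, b, sweeps


def X_indep(a, b):
    # Hypothetical response with contours aligned with the axes, optimum at (1, 2)
    return -(a - 1) ** 2 - (b - 2) ** 2


def X_dep(a, b):
    # Same optimum, but the cross term tilts the contours (a and b interact)
    return -(a - 1) ** 2 - (b - 2) ** 2 - 1.5 * (a - 1) * (b - 2)


grid = [round(-3 + 0.1 * i, 1) for i in range(81)]  # levels from -3 to 5

a_i, b_i, sweeps_indep = one_at_a_time(X_indep, grid, a0=-2.0, b0=-2.0)
a_d, b_d, sweeps_dep = one_at_a_time(X_dep, grid, a0=-2.0, b0=-2.0)
print("independent:", a_i, b_i, "sweeps:", sweeps_indep)
print("dependent:  ", a_d, b_d, "sweeps:", sweeps_dep)
```

In the independent case a single sweep lands on the optimum, while the tilted contours require several sweeps, each of which costs a full set of runs along $a$ and along $b$.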

Changing _one parameter at a time_ can be a very tedious process and lead to a very large test matrix.  The Taguchi technique helps in achieving optimal matrices by changing all the parameters simultaneously.  


### Full factorial analysis
Let's use the following notation.  
- $P$: number of parameters
- $L$: number of levels each parameter is tested at

In a full factorial analysis, each of the $P$ parameters is tested at $L$ levels, and every combination of levels is run.  This leads to $N = L^P$ tests.

The graph below is for $P=3$, $L=2$.

<img src="img/DOE_Cube.png" width="300">

_Question_: define the matrix of experiment for a full factorial in this case.  

Here we have to do $N = 2^3 = 8$ runs.
\begin{array}{c |c|c|c| c}
Run\, \# & a & b & c & X \\
\hline
1 & 1 & 1 & 1& X_1\\
2 & 2 & 1 & 1& X_2\\
3 & 1 & 2 & 1& X_3\\
4 & 2 & 2 & 1& X_4\\
5 & 1 & 1 & 2& X_5\\
6 & 2 & 1 & 2& X_6\\
7 & 1 & 2 & 2& X_7\\
8 & 2 & 2 & 2& X_8\\
\hline
\end{array}
So in this case, we have to take a point at each corner of the cube.
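
The same matrix can be generated programmatically.  This is a minimal sketch using the standard-library `itertools.product`; the helper name `full_factorial` is an assumption made for illustration.

```python
from itertools import product


def full_factorial(P, L):
    """List all L**P runs; each run is a tuple of levels (1..L), one per parameter."""
    levels = range(1, L + 1)
    # product() varies its last entry fastest; reversing each tuple makes the
    # first parameter (a) vary fastest, matching the table above.
    return [combo[::-1] for combo in product(levels, repeat=P)]


runs = full_factorial(P=3, L=2)
print("N =", len(runs))
for i, (a, b, c) in enumerate(runs, start=1):
    print(i, a, b, c)
```

Because $N$ grows as $L^P$, the list becomes long quickly: `full_factorial(P=3, L=4)` already contains 64 runs.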

### Fractional factorial analysis
In fractional factorial analysis, one runs only a fraction of the full factorial analysis.  

Let's go back to the $P=3$, $L=2$ example above.  We can skip some test points and only do four runs $(X_1, X_2, X_3, X_4)$, illustrated in the figure below. Here the runs are represented with red dots.
<img src="img/DOE_CubeFrac.png" width="300">

We can still recover how $X$ is affected by $a$ alone by performing a level average.  Using the run numbering of the reduced matrix below, the level average for $a$ at level 1 is the average of all the tests for which $a$ is at level 1: $\overline{X}_{a1} = \frac{1}{2} (X_1 + X_2)$.  Likewise, the level average for $a$ at level 2 is $\overline{X}_{a2} = \frac{1}{2} (X_3 + X_4)$.

Here the matrix of experiment reduces to:
\begin{array}{c |c|c|c| c}
Run\, \# & a & b & c & X \\
\hline
1 & 1 & 1 & 1 & X_1\\
2 & 1 & 2 & 2 & X_2\\
3 & 2 & 1 & 2 & X_3\\
4 & 2 & 2 & 1 & X_4\\
\hline
\end{array}
### Taguchi design array


Taguchi design arrays are also referred to as test matrices, design arrays, or orthogonal arrays.

__Optimum Taguchi Design Array for fractional factorial analysis__
> Each level of each parameter appears the same number of times in the array.

> Repetitions of parameter-level combinations are minimized as much as possible.

_Question_: Check that the two test matrices we developed above for the $P = 3$, $L = 2$ case satisfy the design rules for Taguchi arrays.
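
A check of this kind can also be automated.  The sketch below makes one assumption about the second rule: "parameter-level combinations" is interpreted as the pair of levels taken by any two parameters in the same run.  The function name `check_array` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def check_array(runs, n_levels):
    """Return (balanced, max_pair_repeats) for a list of runs (tuples of levels)."""
    n_params = len(runs[0])

    # Rule 1: in every column, each level 1..n_levels appears equally often
    balanced = True
    for p in range(n_params):
        counts = Counter(run[p] for run in runs)
        if len({counts[level] for level in range(1, n_levels + 1)}) != 1:
            balanced = False

    # Rule 2 (assumed interpretation): largest number of times that the same
    # pair of levels occurs for any two parameters
    max_pair_repeats = max(
        max(Counter((run[p], run[q]) for run in runs).values())
        for p, q in combinations(range(n_params), 2)
    )
    return balanced, max_pair_repeats


# The two P = 3, L = 2 matrices from the sections above
full = [(1, 1, 1), (2, 1, 1), (1, 2, 1), (2, 2, 1),
        (1, 1, 2), (2, 1, 2), (1, 2, 2), (2, 2, 2)]
frac = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]

print("full factorial:      ", check_array(full, n_levels=2))
print("fractional factorial:", check_array(frac, n_levels=2))
```

Both arrays are balanced; the fractional array runs each pair of levels only once, whereas the full factorial repeats each pair twice.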



### Precompiled Taguchi design arrays
Here are design arrays that have been compiled already.  Use them when you need to design a test matrix.

<img src="img/TaguchiArrays.png" width="400">

### Examples

One student proposes this experimental design array for 3 parameters and 4 levels for each parameter, choosing to test each level of each parameter _twice_.

\begin{array}{c |c|c|c| c}
Run\, \# & a & b & c & X \\
\hline
1 & 1 & 1 & 1& X_1\\
2 & 1 & 2 & 3& X_2\\
\hline
3 & 2 & 3 & 4& X_3\\
4 & 2 & 4 & 2& X_4\\
\hline
5 & 3 & 1 & 2& X_5\\
6 & 3 & 2 & 3& X_6\\
\hline
7 & 4 & 2 & 3& X_7\\
8 & 4 & 4 & 4& X_8\\
\hline
\end{array}

In [6]:
print('Full factorial analysis will require:')
N=4**3
print('N = ', N, ' tests')

Full factorial analysis will require:
N =  64  tests


Explain why this array is not a proper Taguchi array. How would you fix it?

Another student has used the following Taguchi array with 3 parameters and 4 levels for each parameter, choosing to test each level of each parameter _twice_.  The resulting Taguchi array is valid, and the experimental results are given here:
\begin{array}{c |c|c|c| c}
Run\, \# & a & b & c & X \\
\hline
1 & 1 & 1 & 1& 1.51\\
2 & 1 & 2 & 4& 1.84\\
\hline
3 & 2 & 3 & 3& 2.11\\
4 & 2 & 4 & 2& 0.58\\
\hline
5 & 3 & 1 & 2& 2.35\\
6 & 3 & 2 & 3& 2.44\\
\hline
7 & 4 & 3 & 1& 1.40\\
8 & 4 & 4 & 4& 1.98\\
\hline
\end{array}

Calculate the level averages $\overline{X}_{b3}$ and $\overline{X}_{c1}$.

$\overline{X}_{b3} = \frac{1}{2}(X_3 + X_7)$

$\overline{X}_{c1} = \frac{1}{2}(X_1 + X_7)$

In [8]:
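# Measured outputs X_1..X_8 from the table above
X_1 = 1.51;  X_2 = 1.84; X_3 = 2.11; X_4 = 0.58; X_5 = 2.35; X_6 = 2.44; X_7 = 1.40; X_8 = 1.98

# Level averages: b = 3 occurs in runs 3 and 7, c = 1 occurs in runs 1 and 7
X_b3 = 1/2*(X_3 + X_7)
X_c1 = 1/2*(X_1 + X_7)
print(X_b3, X_c1)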

1.755 1.455


What about $\overline{X}_{b1}$, $\overline{X}_{b2}$, and $\overline{X}_{b4}$?

$\overline{X}_{b1} = \frac{1}{2}(X_1 + X_5)$

In [8]:
# Remaining level averages for b: b = 1 (runs 1, 5), b = 2 (runs 2, 6), b = 4 (runs 4, 8)
X_b1 = 1/2*(X_1 + X_5); X_b2 = 1/2*(X_2 + X_6); X_b4 = 1/2*(X_4 + X_8)
print(X_b1, X_b2, X_b3, X_b4)

1.9300000000000002 2.14 1.755 1.28
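
The level averages for all three parameters can be computed in one pass rather than one formula at a time.  The sketch below reuses the design array and measurements from the table above; the helper name `level_averages` and the list-of-tuples layout are assumptions made for illustration.

```python
from statistics import mean

# Design array (a, b, c) and measured X for runs 1..8, copied from the table above
design = [(1, 1, 1), (1, 2, 4), (2, 3, 3), (2, 4, 2),
          (3, 1, 2), (3, 2, 3), (4, 3, 1), (4, 4, 4)]
X = [1.51, 1.84, 2.11, 0.58, 2.35, 2.44, 1.40, 1.98]


def level_averages(design, X, n_levels):
    """averages[p][level - 1] = mean of X over the runs where parameter p is at `level`."""
    n_params = len(design[0])
    return [
        [mean(x for run, x in zip(design, X) if run[p] == level)
         for level in range(1, n_levels + 1)]
        for p in range(n_params)
    ]


averages = level_averages(design, X, n_levels=4)
for name, row in zip("abc", averages):
    print(name, [round(value, 3) for value in row])
```

The row for $b$ reproduces the values computed by hand above (1.93, 2.14, 1.755, 1.28), which is a quick consistency check on the helper.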
