© Copyright 2013–2014, Abraham Lee

© Copyright 2019, Dataiku

# Design of Experiments tutorial based on pyDOE

Copied from the pyDOE website: https://pythonhosted.org/pyDOE/index.html

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [2]:
import dataiku
from pyDOE2 import *

## Factorial design

### General Full-Factorial (fullfact)

This kind of design offers full flexibility as to the number of discrete levels for each factor in the design. Its usage is simple:



In [3]:
levels = np.array([2, 3], dtype=int)  # number of levels for each factor
fullfact(levels)

array([[0., 0.],
       [1., 0.],
       [0., 1.],
       [1., 1.],
       [0., 2.],
       [1., 2.]])

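where levels is an array of integers giving the number of levels of each factor.
As the output shows, the design matrix has one column per item in the input array and one row per combination of levels (here 2 × 3 = 6 rows). The levels in each column are numbered from 0.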
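To make the row ordering concrete, here is a minimal plain-Python sketch that enumerates the same combinations with the first factor varying fastest. It is only an illustration of the idea, not pyDOE2's implementation, and `full_factorial` is a name made up for this example.

```python
from itertools import product

def full_factorial(levels):
    """Enumerate every combination of levels, first factor varying fastest."""
    # product() varies its LAST argument fastest, so reverse on the way in
    # and reverse each combination again on the way out.
    ranges = [range(n) for n in reversed(levels)]
    return [list(reversed(combo)) for combo in product(*ranges)]

design = full_factorial([2, 3])
for row in design:
    print(row)
```

The number of rows is the product of the entries in `levels`, which is why full-factorial designs grow quickly as factors are added.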

### 2-Level Full-Factorial (ff2n)

This function is a convenience wrapper around fullfact that forces every factor to have two levels, coded as -1 and +1. You simply tell it how many factors to create a design for:




In [4]:
ff2n(3)

array([[-1., -1., -1.],
       [ 1., -1., -1.],
       [-1.,  1., -1.],
       [ 1.,  1., -1.],
       [-1., -1.,  1.],
       [ 1., -1.,  1.],
       [-1.,  1.,  1.],
       [ 1.,  1.,  1.]])
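The -1/+1 coding can be thought of as a recoding of a 0/1 full factorial through `2 * level - 1`. The sketch below assumes that view and uses a hypothetical helper name, `ff2n_sketch`; it is not the library's own code.

```python
from itertools import product

def ff2n_sketch(n):
    """Two-level full factorial in -1/+1 coding, first factor varying fastest."""
    return [
        [2 * level - 1 for level in reversed(combo)]  # recode 0 -> -1, 1 -> +1
        for combo in product((0, 1), repeat=n)
    ]

design = ff2n_sketch(3)
for row in design:
    print(row)
```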

### 2-Level Fractional-Factorial (fracfact)

This function requires a little more knowledge of how confounding will be allowed. Confounding means that some factor effects get muddled with interaction effects, so it is harder to distinguish between them.

Let’s assume that we just can’t afford (for whatever reason) the number of runs in a full-factorial design. We can systematically decide on a fraction of the full-factorial by allowing some of the factor main effects to be confounded with other factor interaction effects. This is done by defining an alias structure that defines, symbolically, these interactions. These alias structures are written like “C = AB” or “I = ABC”, or “AB = CD”, etc. These define how one column is related to the others.

For example, the alias “C = AB” (equivalently, “I = ABC”) indicates that there are three factors (A, B, and C) and that the main effect of factor C is confounded with the AB interaction effect. By extension, A is confounded with BC and B is confounded with AC. A full-factorial design with these three factors results in a design matrix with 8 runs, but we will assume that we can only afford 4 of those runs. To create this fractional design, we need a matrix with three columns, one each for A, B, and C, where the levels in the C column are now created by the product of the A and B columns.
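The aliasing described above can be checked with a few lines of plain Python. This is only an illustrative sketch of the column product, not how pyDOE2 builds the design, and the variable names are made up for the example.

```python
from itertools import product

# 2-level full factorial in A and B, with A varying fastest
base = [(a, b) for b, a in product((-1, 1), repeat=2)]

# Generated column: C is the elementwise product of A and B
design = [(a, b, a * b) for a, b in base]

# Because C = AB, the other two aliases follow automatically
a_equals_bc = all(a == b * c for a, b, c in design)
b_equals_ac = all(b == a * c for a, b, c in design)
print(design)
print(a_equals_bc, b_equals_ac)
```

Both checks hold on all four runs, which is exactly why the main effects cannot be separated from the two-factor interactions in this half-fraction.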

The input to fracfact is a generator string of symbolic characters (lowercase or uppercase, but not both) separated by spaces, like:

In [5]:
gen = 'a b ab'

This design would result in a 3-column matrix, where the third column is implicitly defined as "c = ab". This means that the factor in the third column is confounded with the interaction of the factors in the first two columns. The design ends up looking like this:

In [6]:
fracfact(gen)

array([[-1., -1.,  1.],
       [ 1., -1., -1.],
       [-1.,  1., -1.],
       [ 1.,  1.,  1.]])

Fractional factorial designs are usually specified using the notation $2^{k-p}$, where $k$ is the number of columns and $p$ is the number of effects that are confounded, so the design has $2^{k-p}$ runs. In terms of resolution level, higher is “better”. The above design would be considered a $2^{3-1}$ fractional factorial design, a 1/2-fraction design, or a Resolution III design (since the smallest alias “I=ABC” has three terms on the right-hand side). Another common design is the Resolution III, $2^{7-4}$ fractional factorial, which would be created using the following generator string:

In [7]:
fracfact('a b ab c ac bc abc')

array([[-1., -1.,  1., -1.,  1.,  1., -1.],
       [ 1., -1., -1., -1., -1.,  1.,  1.],
       [-1.,  1., -1., -1.,  1., -1.,  1.],
       [ 1.,  1.,  1., -1., -1., -1., -1.],
       [-1., -1.,  1.,  1., -1., -1.,  1.],
       [ 1., -1., -1.,  1.,  1., -1., -1.],
       [-1.,  1., -1.,  1., -1.,  1., -1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.]])

More sophisticated generator strings can be created using the “+” and “-” operators. The “-” operator swaps the levels of that column like this:

In [8]:
fracfact('a b -ab')

array([[-1., -1., -1.],
       [ 1., -1.,  1.],
       [-1.,  1.,  1.],
       [ 1.,  1., -1.]])
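To show how a generator string maps to columns, here is a small plain-Python sketch. It assumes that single-letter terms are the base factors (first one varying fastest) and that every other term is the signed product of the base columns it names. `fracfact_sketch` is a hypothetical helper written for this tutorial, not the library function.

```python
from itertools import product
from math import prod

def fracfact_sketch(gen):
    """Build a 2-level fractional factorial from a generator string."""
    terms = gen.split()
    # Base factors are the single-letter terms, in order of appearance
    base = [t.lstrip("+-") for t in terms if len(t.lstrip("+-")) == 1]
    rows = []
    for combo in product((-1, 1), repeat=len(base)):
        # product() varies the last position fastest; reverse it so that
        # the first base factor varies fastest instead
        level = dict(zip(base, reversed(combo)))
        row = []
        for t in terms:
            sign = -1 if t.startswith("-") else 1
            row.append(sign * prod(level[ch] for ch in t.lstrip("+-")))
        rows.append(row)
    return rows

for row in fracfact_sketch("a b -ab"):
    print(row)
```

With three base factors and seven terms, `fracfact_sketch("a b ab c ac bc abc")` gives 8 runs and 7 columns, in line with the $2^{7-4}$ notation.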

In order to reduce confounding, we can utilize the fold function:

In [9]:
m = fracfact('a b ab')
fold(m)

array([[-1., -1.,  1.],
       [ 1., -1., -1.],
       [-1.,  1., -1.],
       [ 1.,  1.,  1.],
       [ 1.,  1., -1.],
       [-1.,  1.,  1.],
       [ 1., -1.,  1.],
       [-1., -1., -1.]])

Applying the fold to all columns in the design breaks the alias chains between every main factor and two-factor interactions. This means that we can then estimate all the main effects clear of any two-factor interactions. Typically, when all columns are folded, this “upgrades” the resolution of the design.

By default, fold applies the level swapping to all columns, but we can fold specific columns (first column = 0), if desired, by supplying an array to the keyword columns:

In [10]:
fold(m, columns=[2])

array([[-1., -1.,  1.],
       [ 1., -1., -1.],
       [-1.,  1., -1.],
       [ 1.,  1.,  1.],
       [-1., -1., -1.],
       [ 1., -1.,  1.],
       [-1.,  1.,  1.],
       [ 1.,  1., -1.]])
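Folding amounts to appending a copy of the design with the signs of the chosen columns reversed. The sketch below assumes exactly that and uses a made-up helper name, `fold_sketch`. It also checks one consequence of a full fold: column C is no longer identical to the product of columns A and B.

```python
def fold_sketch(design, columns=None):
    """Append a mirrored copy of the design with the given columns negated."""
    ncols = len(design[0])
    cols = range(ncols) if columns is None else columns
    mirrored = [
        [-v if j in cols else v for j, v in enumerate(row)]
        for row in design
    ]
    return [list(row) for row in design] + mirrored

m = [[-1, -1, 1], [1, -1, -1], [-1, 1, -1], [1, 1, 1]]  # the 'a b ab' design

full_fold = fold_sketch(m)
partial_fold = fold_sketch(m, columns=[2])

# Sum of C * (A * B) over all runs: 4 before folding, 0 after a full fold
alias_before = sum(a * b * c for a, b, c in m)
alias_after = sum(a * b * c for a, b, c in full_fold)
print(alias_before, alias_after)
```

A sum of zero means that C is orthogonal to the AB product over the folded design, so the two effects can be estimated separately.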

#### Note

Care should be taken to decide the appropriate alias structure for your design and the effects that folding has on it.

### Plackett-Burman (pbdesign)

Another way to generate fractional-factorial designs is through the use of Plackett-Burman designs. These designs are unique in that the number of trial conditions (rows) expands by multiples of four (e.g. 4, 8, 12, etc.). The maximum number of columns allowed before a design increases the number of rows is always one less than the next higher multiple of four.

For example, I can use up to 3 factors in a design with 4 rows:

In [11]:
pbdesign(3)

array([[-1., -1.,  1.],
       [ 1., -1., -1.],
       [-1.,  1., -1.],
       [ 1.,  1.,  1.]])

But if I want to do 4 factors, the design needs to increase the number of rows up to the next multiple of four (8 in this case):

In [12]:
pbdesign(4)

array([[-1., -1.,  1., -1.],
       [ 1., -1., -1., -1.],
       [-1.,  1., -1., -1.],
       [ 1.,  1.,  1., -1.],
       [-1., -1.,  1.,  1.],
       [ 1., -1., -1.,  1.],
       [-1.,  1., -1.,  1.],
       [ 1.,  1.,  1.,  1.]])

Thus, an 8-run Plackett-Burman design can handle up to (8 - 1) = 7 factors.
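The run count implied by this rule is simple arithmetic: the smallest multiple of four that is strictly greater than the number of factors. The helper below, `pb_runs`, is a hypothetical name used only to illustrate that rule. Whether a design can actually be constructed for every such size is a separate question that this sketch does not address.

```python
def pb_runs(n_factors):
    """Smallest multiple of four strictly greater than n_factors."""
    return 4 * (n_factors // 4 + 1)

for n_factors in (3, 4, 7, 8):
    print(n_factors, "factors ->", pb_runs(n_factors), "runs")
```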

As a side note, it just so happens that the 7-factor Plackett-Burman design and the $2^{7-4}$ fractional factorial design are identical:


In [13]:
np.all(pbdesign(7)==fracfact('a b ab c ac bc abc'))

True

### More Information

If the user needs more information about appropriate designs, please consult the following articles on Wikipedia:

- [Factorial designs](http://en.wikipedia.org/wiki/Factorial_experiment)
- [Plackett-Burman designs](http://en.wikipedia.org/wiki/Plackett-Burman_design)

There is also a wealth of information on the [NIST](http://www.itl.nist.gov/div898/handbook/pri/pri.htm) website about the various design matrices that can be created as well as detailed information about designing/setting-up/running experiments in general.

Any questions, comments, bug-fixes, etc. can be forwarded to the author of the pyDOE package.

## Response Surface Designs

### Box-Behnken (bbdesign)

![Box-Behnken image](http://www.itl.nist.gov/div898/handbook/pri/section3/gifs/bb.gif)

Box-Behnken designs can be created using the following simple syntax:

In [14]:
n=3
bbdesign(n, center=1)

array([[-1., -1.,  0.],
       [ 1., -1.,  0.],
       [-1.,  1.,  0.],
       [ 1.,  1.,  0.],
       [-1.,  0., -1.],
       [ 1.,  0., -1.],
       [-1.,  0.,  1.],
       [ 1.,  0.,  1.],
       [ 0., -1., -1.],
       [ 0.,  1., -1.],
       [ 0., -1.,  1.],
       [ 0.,  1.,  1.],
       [ 0.,  0.,  0.]])

where n is the number of factors (at least 3 required) and center is the number of center points to include. If center is not given, a pre-determined number of center points is included automatically.
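For three factors, the structure of the output above can be reproduced by running a 2-level factorial on each pair of factors while holding the remaining factor at 0, then appending the center points. The following plain-Python sketch assumes that pairwise construction; `box_behnken_sketch` is a name invented for the example, and published Box-Behnken tables for larger numbers of factors may use different blocks, so treat it as illustrative only.

```python
from itertools import combinations, product

def box_behnken_sketch(n, center=1):
    """Pairwise 2-level factorials with other factors at 0, plus center runs."""
    rows = []
    for i, j in combinations(range(n), 2):
        # Unpack as (level_j, level_i) so that factor i varies fastest
        for level_j, level_i in product((-1, 1), repeat=2):
            row = [0] * n
            row[i], row[j] = level_i, level_j
            rows.append(row)
    rows.extend([0] * n for _ in range(center))
    return rows

design = box_behnken_sketch(3, center=1)
for row in design:
    print(row)
```

Note that no run sits at a corner of the cube: every non-center run keeps one factor at its middle level.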

### Central Composite (ccdesign)

![Central Composite image](http://www.itl.nist.gov/div898/handbook/pri/section3/gifs/fig5.gif)

Central composite designs can be created and customized using the syntax:


In [15]:
n = 3
ccdesign(n, center=(0, 1), alpha='r', face='cci')

array([[-0.59460356, -0.59460356, -0.59460356],
       [ 0.59460356, -0.59460356, -0.59460356],
       [-0.59460356,  0.59460356, -0.59460356],
       [ 0.59460356,  0.59460356, -0.59460356],
       [-0.59460356, -0.59460356,  0.59460356],
       [ 0.59460356, -0.59460356,  0.59460356],
       [-0.59460356,  0.59460356,  0.59460356],
       [ 0.59460356,  0.59460356,  0.59460356],
       [-1.        ,  0.        ,  0.        ],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        , -1.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        ],
       [ 0.        ,  0.        , -1.        ],
       [ 0.        ,  0.        ,  1.        ],
       [ 0.        ,  0.        ,  0.        ]])



where
- n is the number of factors,
- center is a 2-tuple of center points (one for the factorial block, one for the star block, default (4, 4)),
- alpha is either “orthogonal” (or “o”, default) or “rotatable” (or “r”),
- face is either “circumscribed” (or “ccc”, default), “inscribed” (or “cci”), or “faced” (or “ccf”).

![cc2 image](http://www.itl.nist.gov/div898/handbook/pri/section3/gifs/ccd2.gif)

The two optional keyword arguments alpha and face help describe how the variance in the quadratic approximation is distributed. Please see the NIST web pages if you are uncertain which options are suitable for your situation.

#### Note

‘ccc’ and ‘cci’ can be rotatable designs, but ‘ccf’ cannot.
If face is specified, while alpha is not, then the default value of alpha is ‘orthogonal’.
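The value 0.59460356 in the output above is consistent with the usual rotatability rule, in which alpha is the fourth root of the number of factorial points, combined with an inscribed design that shrinks the factorial points by 1/alpha so that the star points sit at ±1. A quick arithmetic check, assuming that rule (the variable names are illustrative):

```python
n_factors = 3
n_factorial_points = 2 ** n_factors      # 8 corner points for 3 factors

alpha = n_factorial_points ** 0.25       # rotatable star distance
inscribed_level = 1 / alpha              # corner coordinate after rescaling

print(round(alpha, 8), round(inscribed_level, 8))
```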

### More Information
If the user needs more information about appropriate designs, please consult the following articles on Wikipedia:

- [Box-Behnken designs](http://en.wikipedia.org/wiki/Box-Behnken_design)
- [Central composite designs](http://en.wikipedia.org/wiki/Central_composite_design)

There is also a wealth of information on the [NIST](http://www.itl.nist.gov/div898/handbook/pri/pri.htm) website about the various design matrices that can be created as well as detailed information about designing/setting-up/running experiments in general.

Any questions, comments, bug-fixes, etc. can be forwarded to the author of the package.

## Randomized Designs

### Latin-Hypercube (lhs)

![Latin-Hypercube image](https://pythonhosted.org/pyDOE/_images/lhs.png)

Latin-hypercube designs can be created using the following simple syntax:


In [16]:
n = 4
lhs(n, samples=10, criterion='center')

array([[0.25, 0.75, 0.65, 0.45],
       [0.05, 0.25, 0.35, 0.95],
       [0.15, 0.45, 0.05, 0.05],
       [0.85, 0.65, 0.75, 0.65],
       [0.75, 0.95, 0.95, 0.25],
       [0.55, 0.35, 0.45, 0.85],
       [0.95, 0.05, 0.85, 0.75],
       [0.65, 0.15, 0.15, 0.15],
       [0.35, 0.55, 0.25, 0.55],
       [0.45, 0.85, 0.55, 0.35]])

where

- `n`: an integer that designates the number of factors (required)
- `samples`: an integer that designates the number of sample points to generate for each factor (default: `n`)
- `criterion`: a string that tells `lhs` how to sample the points (default: `None`, which simply randomizes the points within the intervals):
  - `"center"` or `"c"`: center the points within the sampling intervals
  - `"maximin"` or `"m"`: maximize the minimum distance between points, but place the point in a randomized location within its interval
  - `"centermaximin"` or `"cm"`: same as `"maximin"`, but centered within the intervals
  - `"correlation"` or `"corr"`: minimize the maximum correlation coefficient

The output design scales all the variable ranges from zero to one, which can then be transformed as the user wishes, for example to a specific statistical distribution using the `ppf` (inverse cumulative distribution) function of a `scipy.stats.distributions` object.
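The defining property of a Latin hypercube is that each factor uses every one of its `samples` intervals exactly once. The sketch below illustrates this for the centered case with the standard library only. It is not pyDOE2's code, and the function name `lhs_center_sketch` and its `seed` argument are assumptions made for the example.

```python
import random

def lhs_center_sketch(n, samples, seed=None):
    """Centered Latin hypercube: each column is a shuffled set of midpoints."""
    rng = random.Random(seed)
    midpoints = [(i + 0.5) / samples for i in range(samples)]
    columns = []
    for _ in range(n):
        column = midpoints[:]       # one point per interval
        rng.shuffle(column)         # random pairing across factors
        columns.append(column)
    # Transpose the list of columns into a list of rows
    return [list(row) for row in zip(*columns)]

design = lhs_center_sketch(4, samples=10, seed=0)
for row in design:
    print(row)
```

Sorting any column recovers the same ten midpoints (0.05, 0.15, ..., 0.95), which is the stratification visible in the output above.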

For example, if I wanted to transform the uniform distribution of 5 samples to a normal distribution (mean=0, standard deviation=1), I would do something like:

In [17]:
from scipy.stats.distributions import norm
lhd = lhs(2, samples=5)
lhd = norm(loc=0, scale=1).ppf(lhd)  # this applies to both factors here

Graphically, each transformation would look like the following, going from the blue sampled points (from using lhs) to the green sampled points that are normally distributed:

![LHS custom distribution](https://pythonhosted.org/pyDOE/_images/lhs_custom_distribution.png)

#### Customizing with Statistical Distributions

Now, let’s say we want to transform these designs to be normally distributed with means = [1, 2, 3, 4] and standard deviations = [0.1, 0.5, 1, 0.25]:

In [18]:
from scipy.stats.distributions import norm

design = lhs(4, samples=10)
means = [1, 2, 3, 4]
stdvs = [0.1, 0.5, 1, 0.25]
for i in range(4):
    design[:, i] = norm(loc=means[i], scale=stdvs[i]).ppf(design[:, i])

design

array([[0.9344455 , 1.71150008, 2.63947157, 4.01253087],
       [0.95689517, 1.95756214, 4.08693223, 3.46280742],
       [0.90589406, 1.32548035, 0.69810848, 3.68744378],
       [1.02888701, 1.55931855, 3.52395375, 4.14825359],
       [1.00969689, 2.9871124 , 1.87219992, 3.84125315],
       [1.20057404, 2.00031489, 2.99116498, 3.92192039],
       [1.05781123, 2.57684236, 4.76172202, 4.32737695],
       [0.98936778, 2.31990058, 3.69554855, 4.07213995],
       [1.09154272, 2.25534676, 3.05944951, 3.96697131],
       [0.77283359, 1.74279091, 2.31553605, 4.31923263]])
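As a cross-check of what the inverse cumulative distribution does, the same per-column transformation can be sketched with the standard library's `statistics.NormalDist.inv_cdf`. The two-row `unit_design` below is a made-up input chosen so the results are easy to verify by eye; it is not output from `lhs`.

```python
from statistics import NormalDist

means = [1, 2, 3, 4]
stdvs = [0.1, 0.5, 1, 0.25]

# Hypothetical unit-interval design: two runs, four factors
unit_design = [
    [0.25, 0.50, 0.75, 0.50],
    [0.75, 0.50, 0.25, 0.50],
]

# Map each value through the inverse CDF of that column's normal distribution
transformed = [
    [NormalDist(mu, sigma).inv_cdf(u) for u, mu, sigma in zip(row, means, stdvs)]
    for row in unit_design
]
for row in transformed:
    print(row)
```

A value of 0.5 maps to the column mean, while 0.25 and 0.75 map to points placed symmetrically around it, scaled by the column's standard deviation.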

#### Note

Methods for “space-filling” designs and “orthogonal” designs are in the works, so stay tuned! However, simply increasing the samples reduces the need for these anyway.

### More Information

If the user needs more information about appropriate designs, please consult the following articles on Wikipedia:

- [Latin-Hypercube designs](http://en.wikipedia.org/wiki/Latin_hypercube_sampling)

There is also a wealth of information on the [NIST](http://www.itl.nist.gov/div898/handbook/pri/pri.htm) website about the various design matrices that can be created as well as detailed information about designing/setting-up/running experiments in general.

Any questions, comments, bug-fixes, etc. can be forwarded to the author of the pyDOE package.