# dexpy

[[code]](https://github.com/statease/dexpy)
[[doc]](https://statease.github.io/dexpy/)  
Dexpy is a package for constructing experimental designs, featuring screening, response surface, mixture and optimal designs.  
It is sadly no longer maintained and does not look fully polished, but is certainly useful as a resource, e.g. for the coordinate-exchange method or the hit-and-run sampler.

In [1]:
from util import compare_repos
compare_repos(["statease/dexpy"])

name,stars,forks,contributors,commits,open_issues,closed_issues,created,last_commit,license
dexpy,10,2,2,247,8,51,2016-09-17,2018-06-17,NOASSERTION


In [2]:
import dexpy
import dexpy.factorial
import dexpy.optimal

## Full factorial / fractional designs

In [3]:
# full factorial design for 3 factors with 2 levels 
dexpy.factorial.build_factorial(factor_count=3, run_count=2**3)

Unnamed: 0,X1,X2,X3
0,-1,-1,-1
1,-1,-1,1
2,-1,1,-1
3,-1,1,1
4,1,-1,-1
5,1,-1,1
6,1,1,-1
7,1,1,1


In [4]:
# fractional design
dexpy.factorial.build_factorial(factor_count=3, run_count=2**2)

Unnamed: 0,X1,X2,X3
0,-1,-1,1
1,-1,1,-1
2,1,-1,-1
3,1,1,1
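
In the half fraction above, every row satisfies X3 = X1 * X2, i.e. the third factor is generated from the first two. A minimal pure-Python sketch of that construction follows; the helper names `full_factorial` and `half_fraction` are hypothetical, and this is not necessarily how `build_factorial` picks its generators internally.

```python
from itertools import product

def full_factorial(factor_count):
    """All 2**factor_count combinations of the coded levels -1 and +1."""
    return [list(run) for run in product([-1, 1], repeat=factor_count)]

def half_fraction(factor_count):
    """Half fraction: the last factor is the product of all the others."""
    runs = []
    for base in full_factorial(factor_count - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + [last])
    return runs

for run in half_fraction(3):
    print(run)
```

For three factors this reproduces the four runs shown above, in the same order.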


In [6]:
import dexpy.ccd
dexpy.ccd.build_ccd(factor_count=3, alpha='rotatable', center_points=1)

Unnamed: 0,X1,X2,X3
0,-1.0,-1.0,-1.0
1,-1.0,-1.0,1.0
2,-1.0,1.0,-1.0
3,-1.0,1.0,1.0
4,1.0,-1.0,-1.0
5,1.0,-1.0,1.0
6,1.0,1.0,-1.0
7,1.0,1.0,1.0
0,-1.681793,0.0,0.0
1,1.681793,0.0,0.0
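
The output above appears to be cut off after the first two axial points. As a rough illustration of what a central composite design contains, the sketch below builds one by hand: factorial corners, axial points at a distance alpha on each axis, and center runs. It assumes the usual rotatable choice, alpha equal to the fourth root of the number of factorial runs; `central_composite` is a hypothetical helper, not part of dexpy.

```python
from itertools import product

def central_composite(factor_count, center_points=1):
    """Factorial corners, axial (star) points at +/-alpha, then center runs."""
    # Rotatable choice: alpha is the fourth root of the number of factorial runs.
    alpha = (2 ** factor_count) ** 0.25
    corners = [list(map(float, run)) for run in product([-1, 1], repeat=factor_count)]
    axial = []
    for i in range(factor_count):
        for sign in (-1.0, 1.0):
            point = [0.0] * factor_count
            point[i] = sign * alpha
            axial.append(point)
    centers = [[0.0] * factor_count for _ in range(center_points)]
    return corners + axial + centers

design = central_composite(3)
print(len(design))             # 8 corners + 6 axial + 1 center
print(round(design[8][0], 6))  # X1 of the first axial point, i.e. -alpha
```

For three factors alpha is 8 ** 0.25, which rounds to the 1.681793 seen in the dexpy output.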


## Mixture design

For mixtures, where the sum of components equals 1, dexpy offers a function to build a full lattice to estimate models of the given order, and a centroid design.

In [9]:
import dexpy.simplex_lattice
from dexpy.model import ModelOrder
dexpy.simplex_lattice.build_simplex_lattice(factor_count=3, model_order=ModelOrder.quadratic)

Unnamed: 0,X1,X2,X3
0.0,1.0,0.0,0.0
1.0,0.0,1.0,0.0
2.0,0.0,0.0,1.0
3.0,0.5,0.5,0.0
4.0,0.5,0.0,0.5
5.0,0.0,0.5,0.5
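
The lattice above can be reproduced without dexpy. The sketch assumes that a quadratic model corresponds to a lattice of degree 2, meaning every proportion is a multiple of 1/2; `simplex_lattice` is a hypothetical helper and the row order differs from dexpy's.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(factor_count, degree):
    """All blends whose proportions are multiples of 1/degree and sum to 1."""
    points = []
    for counts in product(range(degree + 1), repeat=factor_count):
        if sum(counts) == degree:
            points.append([Fraction(c, degree) for c in counts])
    return points

for point in simplex_lattice(3, 2):
    print([float(p) for p in point])
```

Exact fractions are used so that the sum-to-one constraint holds without rounding error.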


In [5]:
import dexpy.simplex_centroid
dexpy.simplex_centroid.build_simplex_centroid(factor_count=3)

Unnamed: 0,X1,X2,X3
0.0,1.0,0.0,0.0
1.0,0.0,1.0,0.0
2.0,0.0,0.0,1.0
3.0,0.5,0.5,0.0
4.0,0.5,0.0,0.5
5.0,0.0,0.5,0.5
6.0,0.333333,0.333333,0.333333
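
A simplex centroid design has one equal-parts blend for every non-empty subset of the components, which gives 2**3 - 1 = 7 runs for three components. A minimal sketch of that rule (the helper `simplex_centroid` is hypothetical and independent of dexpy):

```python
from itertools import combinations

def simplex_centroid(factor_count):
    """One equal-parts blend for every non-empty subset of components."""
    points = []
    for size in range(1, factor_count + 1):
        for subset in combinations(range(factor_count), size):
            point = [0.0] * factor_count
            for i in subset:
                point[i] = 1.0 / size
            points.append(point)
    return points

for point in simplex_centroid(3):
    print([round(p, 6) for p in point])
```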


## Optimal designs

Dexpy can build D-optimal designs for polynomial models, optimized using the coordinate-exchange algorithm.  
* The order of the polynomial must be given as a `model.ModelOrder` value (constant / linear / quadratic / cubic). Models with interaction effects only are not supported.
* The number of runs `run_count` has to be equal to or higher than the model rank.

Models that include discrete / categorical factors, and constraints on the factors, are not supported.

In [8]:
import dexpy.optimal
from dexpy.model import ModelOrder

dexpy.optimal.build_optimal(factor_count=1, order=ModelOrder.quadratic)

Unnamed: 0,X1
0,-1.0
1,1.0
2,-0.078066


In [None]:
design = dexpy.optimal.build_optimal(factor_count=2, order=ModelOrder.cubic)
design

## Coffee Taste Example
* https://github.com/statease/dexpy
* https://statease.github.io/dexpy/example-coffee.html
* https://www.statease.com/publications/newsletter/stat-teaser-09-16#article1

### Problem Description
A coffee taste test was conducted at the Stat-Ease office to improve the taste of the coffee.
We will look at 5 input factors:
 * Amount of Coffee (2.5 to 4.0 oz.)
 * Grind size (8-10mm)
 * Brew time (3.5 to 4.5 minutes)
 * Grind Type (burr vs blade)
 * Coffee beans (light vs dark)

With one output, or `response`, variable:
 * Average overall liking (1-9)

The liking is an average of the scores of a panel of 5 office coffee drinkers.

In [18]:
import dexpy
from dexpy import factorial, power
import pandas as pd
import numpy as np
import patsy
import statsmodels.formula.api as smf

column_names = ['amount', 'grind_size', 'brew_time', 'grind_type', 'beans']

### Full Factorial Design
A full factorial, that is, running all combinations of lows and highs, would take $2^5 = 32$ taste tests.
We want to add 8 center point runs to check for curvature, bringing the total number of runs up to 40. We can only do 3 per day, so as not to over-caffeinate our testers, and can only do the tests on days when all 5 testers are in the office. That means the test will probably take a month or so.

`dexpy.power.f_power` calculates the probability that the F-statistic exceeds its critical value (set by the significance level alpha) given an effect of some size (signal-to-noise ratio).

Here we calculate the power, assuming a signal to noise ratio of 2.

In [19]:
design = dexpy.factorial.build_factorial(5, 2**5)
design.columns = column_names
center_points = [
    [0, 0, 0, -1, -1],
    [0, 0, 0, -1, 1],
    [0, 0, 0, 1, -1],
    [0, 0, 0, 1, 1]
]
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
design = pd.concat([design, pd.DataFrame(2 * center_points, columns=design.columns)])

# estimate power
model = ' + '.join(design.columns)  # linear model
sn = 2.0  # signal to noise ratio
alpha = 0.05  # significance
est_power = dexpy.power.f_power(model, design, sn, alpha)
est_power.pop(0)  # remove intercept

print("\nEstimated power for full factorial:")
pd.DataFrame({'Power': est_power}, index=design.columns)


Estimated power for full factorial:


Unnamed: 0,Power
amount,0.999793
grind_size,0.999793
brew_time,0.999793
grind_type,0.999985
beans,0.999985
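
The small gap between the continuous and the categorical factors can be read off a standard power calculation for a single model term. The formulation below is the textbook one and is only assumed to match what `f_power` computes; the symbols $\Delta$, $\sigma$, $\lambda_j$, $n$ and $p$ are introduced here for illustration.

$$
\lambda_j = \frac{(\Delta/2)^2}{\sigma^2 \left[(X^\top X)^{-1}\right]_{jj}},
\qquad
\text{power}_j = P\left(F'_{1,\,n-p}(\lambda_j) > F_{1-\alpha;\,1,\,n-p}\right)
$$

Here $\Delta/\sigma$ is the signal-to-noise ratio, so a change of $\Delta$ between the low and high level corresponds to a coefficient of $\Delta/2$ in coded units. $F'$ denotes the noncentral F distribution, $n = 40$ is the number of runs and $p = 6$ the number of model terms including the intercept. The columns of this design are orthogonal, so $\left[(X^\top X)^{-1}\right]_{jj} = 1 / \sum_i x_{ij}^2$. With $\Delta/\sigma = 2$, a continuous factor is nonzero in only the 32 factorial runs, giving $\lambda_j = 32$, while a categorical factor is at $\pm 1$ in all 40 runs, giving $\lambda_j = 40$ and therefore slightly higher power.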


### Fractional Design

The power estimates for the full factorial mean we have at least a 99.97% chance of detecting a change of 2 in the taste rating, assuming a standard deviation of 1 for the experiment.
This is high enough that we decide to run a fraction instead and get the experiment done more quickly. We can create a $2^{5-1}$ fractional factorial, which has 16 runs, along with the 8 center points for a total of 24. As you can see, the power is still quite good.

In [20]:
design = dexpy.factorial.build_factorial(5, 2**(5-1))
design.columns = column_names
center_points = [
    [0, 0, 0, -1, -1],
    [0, 0, 0, -1, 1],
    [0, 0, 0, 1, -1],
    [0, 0, 0, 1, 1]
]
design = pd.concat([design, pd.DataFrame(2 * center_points, columns=design.columns)])

# estimate power
est_power = dexpy.power.f_power(model, design, sn, alpha)
est_power.pop(0) # remove intercept

print("\nPower for fractional factorial:")
pd.DataFrame({'Power': est_power}, index=design.columns)


Power for fractional factorial:


Unnamed: 0,Power
amount,0.965528
grind_size,0.965528
brew_time,0.965528
grind_type,0.99614
beans,0.99614


We can also check the power for the interaction model:

In [21]:
twofi_model = "(" + '+'.join(column_names) + ")**2"
print(twofi_model)
desc = patsy.ModelDesc.from_formula(twofi_model)
est_power = dexpy.power.f_power(twofi_model, design, sn, alpha)
est_power.pop(0) # remove intercept

print("\nPower for fractional factorial (2FI model):")
pd.DataFrame({'Power': est_power}, index=desc.describe().strip("~ ").split(" + "))

(amount+grind_size+brew_time+grind_type+beans)**2

Power for fractional factorial (2FI model):


Unnamed: 0,Power
amount,0.936745
grind_size,0.936745
brew_time,0.936745
grind_type,0.98912
beans,0.98912
amount:grind_size,0.936745
amount:brew_time,0.936745
amount:grind_type,0.936745
amount:beans,0.936745
grind_size:brew_time,0.936745


### Run the Experiment
We can build the $2^{5-1}$ design using `build_factorial`, then append the center point runs.
It is convenient to print out the design in actual values, rather than the coded -1 and +1 values, for when we make the coffee.

In [22]:
design = dexpy.factorial.build_factorial(5, 2**(5-1))
design.columns = column_names
center_points = [
    [0, 0, 0, -1, -1],
    [0, 0, 0, -1, 1],
    [0, 0, 0, 1, -1],
    [0, 0, 0, 1, 1]
]
design = pd.concat([design, pd.DataFrame(2 * center_points, columns=design.columns)])

actual_lows = { 'amount' : 2.5, 'grind_size' : 8, 'brew_time': 3.5,
                'grind_type': 'burr', 'beans': 'light' }
actual_highs = { 'amount' : 4, 'grind_size' : 10, 'brew_time': 4.5,
                 'grind_type': 'blade', 'beans': 'dark' }
actual_design = dexpy.design.coded_to_actual(design, actual_lows, actual_highs)
actual_design

Unnamed: 0,amount,grind_size,brew_time,grind_type,beans
0,2.5,8.0,3.5,burr,dark
1,2.5,8.0,3.5,blade,light
2,2.5,8.0,4.5,burr,light
3,2.5,8.0,4.5,blade,dark
4,2.5,10.0,3.5,burr,light
5,2.5,10.0,3.5,blade,dark
6,2.5,10.0,4.5,burr,dark
7,2.5,10.0,4.5,blade,light
8,4.0,8.0,3.5,burr,light
9,4.0,8.0,3.5,blade,dark
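
The conversion from coded to actual values is a linear rescaling for numeric factors and a simple lookup for two-level categorical ones. A minimal sketch for a single value follows; `to_actual` is a hypothetical helper and is not dexpy's `coded_to_actual`.

```python
def to_actual(coded, low, high):
    """Map a coded level (-1, 0, +1) to the actual setting of one factor."""
    if isinstance(low, str):
        # Categorical factor: only the two coded levels are meaningful.
        return {-1: low, 1: high}[coded]
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

print(to_actual(-1, 2.5, 4.0))        # low amount
print(to_actual(0, 2.5, 4.0))         # center-point amount
print(to_actual(1, 'burr', 'blade'))  # high grind_type
```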


### Results of the experiment
All that is left is to drink 24 pots of coffee and record the results. Note that, while the tables in this example are in a sorted order, the actual experiment was run in random order. This is done to reduce the possibility of incidental variables influencing the results. For example, if the office was cold during the first 8 runs, the testers may have rated the taste higher, since hot coffee is more pleasing in a cold environment. If the first 8 runs were the only runs where amount was at its low setting, as it is in the sorted table above, we would confound the low amount effect with the effect of the cold office, and incorrectly conclude that a lower amount of coffee is better.
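
The source does not show how the run order was drawn. One simple way to randomize it, sketched here with the standard library only, is to shuffle the row indices of the design; the seed is an arbitrary assumption so that the order can be reproduced.

```python
import random

run_count = 24                 # 16 factorial runs + 8 center points
rng = random.Random(2016)      # arbitrary seed, only for reproducibility
run_order = list(range(run_count))
rng.shuffle(run_order)         # run_order[i] is the design row brewed i-th

print(run_order[:3])           # rows for the first day's three pots
```

With a pandas design, `design.sample(frac=1)` would shuffle the rows in a similar way.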

In [23]:
design['taste_rating'] = [
    4.4, 2.6, 2.4, 8.6, 1.6, 2.8, 7.2, 3.4,
    6.8, 3.4, 3.8, 9.0, 5.2, 3.6, 8.2, 7.0,
    5.4, 6.8, 3.6, 5.4, 4.8, 6.2, 4.4, 5.8
]

### Fit 2-factor interaction model
The statsmodels package has an OLS fitting routine that takes a patsy formula.
We can reduce this model by keeping only the terms that have a p-value below 0.05 (the `P>|t|` column in the summary below).

In [24]:
lm = smf.ols("taste_rating ~" + twofi_model, data=design).fit()
lm.summary()

Dep. Variable:,taste_rating,R-squared:,0.959
Model:,OLS,Adj. R-squared:,0.883
Method:,Least Squares,F-statistic:,12.56
Date:,"Mon, 17 Feb 2020",Prob (F-statistic):,0.000589
Time:,15:28:07,Log-Likelihood:,-12.392
No. Observations:,24,AIC:,56.78
Df Residuals:,8,BIC:,75.63
Df Model:,15,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.1000,0.143,35.572,0.000,4.769,5.431
amount,0.8750,0.176,4.983,0.001,0.470,1.280
grind_size,-0.1250,0.176,-0.712,0.497,-0.530,0.280
brew_time,1.2000,0.176,6.834,0.000,0.795,1.605
grind_type,-0.1333,0.143,-0.930,0.380,-0.464,0.197
beans,0.4500,0.143,3.139,0.014,0.119,0.781
amount:grind_size,0.2500,0.176,1.424,0.192,-0.155,0.655
amount:brew_time,-0.0750,0.176,-0.427,0.681,-0.480,0.330
amount:grind_type,-0.1750,0.176,-0.997,0.348,-0.580,0.230

Omnibus:,6.634,Durbin-Watson:,2.36
Prob(Omnibus):,0.036,Jarque-Bera (JB):,4.674
Skew:,0.757,Prob(JB):,0.0966
Kurtosis:,4.543,Cond. No.,1.22
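
The term selection can be sketched as a filter on the p-values. The values below are copied from the rows of the coefficient table that are visible above; the table is cut off, so the remaining interaction terms are missing. The second helper adds the parent main effects of any interaction that is kept (model hierarchy), which is presumably why `grind_size` stays in the reduced model despite its high p-value. Both helper names are hypothetical.

```python
# p-values from the visible rows of the summary table (remaining rows are cut off)
p_values = {
    'amount': 0.001, 'grind_size': 0.497, 'brew_time': 0.000,
    'grind_type': 0.380, 'beans': 0.014, 'amount:grind_size': 0.192,
    'amount:brew_time': 0.681, 'amount:grind_type': 0.348,
}

def significant_terms(p_values, threshold=0.05):
    """Terms whose p-value is below the threshold, in table order."""
    return [term for term, p in p_values.items() if p < threshold]

def with_parents(terms):
    """Add the main effects of every kept interaction (model hierarchy)."""
    kept = list(terms)
    for term in terms:
        for parent in term.split(':'):
            if parent not in kept:
                kept.append(parent)
    return kept

print(significant_terms(p_values))
print(with_parents(['amount', 'brew_time', 'beans', 'grind_size:beans']))
```

With a fitted statsmodels result, the same filter can be applied to `lm.pvalues` directly.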


### Fit reduced model

In [25]:
reduced_model = "amount + grind_size + brew_time + beans + grind_size:beans"
lm = smf.ols("taste_rating ~" + reduced_model, data=design).fit()
lm.summary()

Dep. Variable:,taste_rating,R-squared:,0.44
Model:,OLS,Adj. R-squared:,0.285
Method:,Least Squares,F-statistic:,2.831
Date:,"Mon, 17 Feb 2020",Prob (F-statistic):,0.0467
Time:,15:28:08,Log-Likelihood:,-43.837
No. Observations:,24,AIC:,99.67
Df Residuals:,18,BIC:,106.7
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Intercept,5.1000,0.354,14.394,0.000,4.356,5.844
amount,0.8750,0.434,2.016,0.059,-0.037,1.787
grind_size,-0.1250,0.434,-0.288,0.777,-1.037,0.787
brew_time,1.2000,0.434,2.765,0.013,0.288,2.112
beans,0.4500,0.354,1.270,0.220,-0.294,1.194
grind_size:beans,0.3750,0.434,0.864,0.399,-0.537,1.287

Omnibus:,1.335,Durbin-Watson:,2.746
Prob(Omnibus):,0.513,Jarque-Bera (JB):,0.567
Skew:,-0.367,Prob(JB):,0.753
Kurtosis:,3.164,Cond. No.,1.22
