# Design of Experiment

In [1]:
from pyDOE import lhs
import pandas as pd
import statsmodels.api as sm
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

## Experiment generation

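We use pyDOE's Latin hypercube sampling (`lhs`) to generate samples spread evenly over the experiment space. Each of the 11 factors is split into 400 equal-width intervals, every interval is used exactly once per factor, and `criterion='center'` places each sample at the midpoint of its interval (https://pythonhosted.org/pyDOE/randomized.html#randomized).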

In [2]:
a = lhs(11, samples=400, criterion='center')
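To make the sampling scheme concrete, here is a minimal NumPy sketch of a centered Latin hypercube. It is not pyDOE's implementation: `centered_lhs` and its `seed` argument are hypothetical names introduced for illustration, and the sketch only reproduces the property described in the pyDOE documentation (one midpoint per interval and per factor).

```python
import numpy as np

def centered_lhs(n_dims, n_samples, seed=0):
    """Hypothetical helper: centered Latin hypercube in [0, 1]^n_dims."""
    rng = np.random.default_rng(seed)
    # Midpoints of the n_samples equal-width intervals of [0, 1].
    centers = (np.arange(n_samples) + 0.5) / n_samples
    # Shuffle the midpoints independently for every dimension.
    return np.column_stack([rng.permutation(centers) for _ in range(n_dims)])

design = centered_lhs(n_dims=11, n_samples=400)

# Each column should hit every one of the 400 intervals exactly once.
strata = np.floor(design * 400).astype(int)
one_per_stratum = all(len(set(column)) == 400 for column in strata.T)
print(design.shape, one_per_stratum)
```

Because every factor is covered interval by interval, the marginal distribution of each column is uniform by construction, which is harder to guarantee with 400 purely random draws.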

We then format the samples so that they can be fed to the website https://arnaud-legrand.shinyapps.io/design_of_experiments/?user_f604

In [3]:
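# Format every value with three decimals, one list of strings per sample.
samples = [["%.3f" % value for value in row] for row in a]
# Print one comma-separated sample per line, ready to paste into the website.
for row in samples:
    print(','.join(row))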

0.284,0.051,0.219,0.674,0.734,0.664,0.569,0.284,0.971,0.129,0.409
0.966,0.321,0.639,0.434,0.301,0.609,0.964,0.214,0.629,0.589,0.621
0.774,0.316,0.501,0.554,0.701,0.164,0.289,0.861,0.621,0.009,0.556
0.959,0.759,0.426,0.016,0.761,0.124,0.949,0.071,0.461,0.206,0.156
0.791,0.586,0.951,0.949,0.884,0.001,0.044,0.724,0.331,0.739,0.966
0.604,0.934,0.214,0.641,0.489,0.749,0.541,0.716,0.739,0.604,0.756
0.716,0.044,0.479,0.371,0.349,0.091,0.589,0.641,0.974,0.579,0.429
0.679,0.156,0.766,0.574,0.329,0.556,0.204,0.319,0.434,0.094,0.796
0.831,0.421,0.126,0.571,0.451,0.244,0.274,0.661,0.184,0.436,0.234
0.346,0.191,0.891,0.216,0.626,0.239,0.856,0.931,0.096,0.841,0.284
0.654,0.481,0.961,0.984,0.891,0.621,0.181,0.289,0.886,0.861,0.131
0.871,0.921,0.094,0.149,0.524,0.781,0.626,0.499,0.561,0.019,0.826
0.319,0.551,0.131,0.321,0.956,0.031,0.976,0.196,0.646,0.631,0.119
0.534,0.906,0.489,0.979,0.506,0.694,0.216,0.229,0.446,0.879,0.296
0.466,0.999,0.109,0.841,0.156,0.509,0.404,0.001,0.936,0.989,0.809
0.326,0.37
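
The same comma-separated layout can probably be produced in one call with `np.savetxt`. The sketch below is an alternative, not what the notebook ran: `demo` is a small made-up array standing in for the design matrix `a`, and the text is written to an in-memory buffer instead of being printed row by row.

```python
import io
import numpy as np

# Stand-in for the design matrix `a`; these values are illustrative only.
demo = np.array([[0.28375, 0.05125],
                 [0.96625, 0.32125]])

# Write the rows as comma-separated values with three decimals.
buffer = io.StringIO()
np.savetxt(buffer, demo, fmt="%.3f", delimiter=",")
text = buffer.getvalue()
print(text, end="")
```

Passing a file name instead of the buffer would write the same lines to disk.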

## Result analysis

In [4]:
dataset = pd.read_csv("user_f604")

In [5]:
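# Response variable.
y = dataset['y']
# Predictors: every column except the response and the timestamp.
x = dataset.drop(columns=['y', 'Date'])
# Add an intercept column, which statsmodels does not include by default.
x = sm.add_constant(x)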

In [6]:
model = sm.OLS(y, x).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.582
Model:                            OLS   Adj. R-squared:                  0.576
Method:                 Least Squares   F-statistic:                     99.76
Date:                Wed, 10 Jan 2024   Prob (F-statistic):          4.46e-141
Time:                        17:52:42   Log-Likelihood:                -683.25
No. Observations:                 800   AIC:                             1391.
Df Residuals:                     788   BIC:                             1447.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.1606      0.123      9.448      0.0

From the linear regression, several parameters appear to have no statistically significant linear effect on the outcome: x2, x3, x5, x6, x8, x10 and x11 all have a p-value (the P>|t| column) above 5%.
Let's drop them and fit the regression again on the remaining factors.
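
To illustrate where the P>|t| column comes from and how the 5% filter works, here is a self-contained sketch on synthetic data (not the experiment's data). It fits ordinary least squares with NumPy, derives t statistics from the standard errors, and approximates the two-sided p-values with a normal distribution instead of the Student t distribution, which is an assumption that is reasonable only because the number of observations is large. All variable names are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_obs = 800

# Synthetic stand-in data: three factors, only the first one drives the response.
factors = rng.uniform(size=(n_obs, 3))
response = 2.0 * factors[:, 0] + rng.normal(scale=0.5, size=n_obs)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n_obs), factors])
coef, *_ = np.linalg.lstsq(design, response, rcond=None)

# Standard errors from the residual variance and the diagonal of (X'X)^-1.
residuals = response - design @ coef
dof = n_obs - design.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
t_stats = coef / std_err

# Two-sided p-values under a normal approximation (assumption: large dof).
p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats])

# Keep the factors (index 0 is the intercept) whose p-value is below 5%.
names = ["const", "x1", "x2", "x3"]
kept = [name for name, p in zip(names[1:], p_values[1:]) if p < 0.05]
print(kept)
```

With a 5% threshold, a factor with no real effect still passes the filter about one time in twenty, so a selection made this way is best confirmed by refitting, as done in the next cell.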

In [11]:
x2 = dataset[['x1', 'x4', 'x7', 'x9']]
x2 = sm.add_constant(x2)
model2 = sm.OLS(y, x2).fit()
print(model2.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.578
Model:                            OLS   Adj. R-squared:                  0.576
Method:                 Least Squares   F-statistic:                     272.7
Date:                Wed, 10 Jan 2024   Prob (F-statistic):          1.74e-147
Time:                        18:00:30   Log-Likelihood:                -686.67
No. Observations:                 800   AIC:                             1383.
Df Residuals:                     795   BIC:                             1407.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.1676      0.074     15.795      0.0
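
As a sanity check on the two summaries, the adjusted R² and the AIC can be recomputed from the reported figures. The sketch below uses the textbook formulas; the helper names are hypothetical, and counting the intercept among the AIC parameters is an assumption that happens to agree with the reported values up to rounding.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R²: penalizes R² for the number of predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def aic(log_likelihood, n_params):
    """Akaike information criterion; n_params counts the intercept (assumption)."""
    return 2 * n_params - 2 * log_likelihood

# Values copied from the two OLS summaries (800 observations each).
adj_full = adjusted_r2(0.582, n_obs=800, n_predictors=11)
adj_reduced = adjusted_r2(0.578, n_obs=800, n_predictors=4)
aic_full = aic(-683.25, n_params=12)
aic_reduced = aic(-686.67, n_params=5)

print(f"adjusted R²: full={adj_full:.3f}, reduced={adj_reduced:.3f}")
print(f"AIC:         full={aic_full:.1f}, reduced={aic_reduced:.1f}")
```

Both models land on the same adjusted R² of 0.576 while the reduced model has the lower AIC, which supports dropping the seven factors: they add parameters without improving the fit.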

The R² looks rather poor, which suggests that the linear model is not a very good fit. A little prior knowledge about what the simulat