
# Exercise 2
In a process for the production of metal laminates, 100 sequential measurements of laminate width were collected (time series ‘A’ from “Statistical Control by Monitoring and Feedback Adjustment”, Box and Luceño, J. Wiley).


Identify and fit a model for the data.

<t1 style="color:red">
In a future class:
  
Design an SCC control chart and an FVC control chart  
</t1>


In [None]:
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

# Import the dataset
data = pd.read_csv('ESE4_ex2.csv')

# Inspect the dataset
data.head()

In [None]:
# Plot the data 
plt.plot(data['EXE2'], 'o-')
plt.xlabel('Index')
plt.ylabel('EXE2')
plt.title('Time series plot of EXE2')
plt.grid()
plt.show()

> Let's check for time dependence in the data with the runs test and the ACF/PACF.
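
> As a rough illustration of what the runs test computes, here is a minimal sketch written from the textbook formulas (runs above and below the mean, normal approximation, no continuity correction). The function name `runs_test_about_mean` and the simulated series are assumptions made for this example only; the notebook itself uses `runstest_1samp` from statsmodels.

```python
import math
import numpy as np

def runs_test_about_mean(x):
    """Two-sided runs test about the mean (normal approximation, no correction)."""
    x = np.asarray(x, dtype=float)
    above = x >= x.mean()                 # True where the point is at or above the mean
    n1 = int(above.sum())                 # number of points at or above the mean
    n2 = int((~above).sum())              # number of points below the mean
    n = n1 + n2
    # A new run starts every time the indicator changes value
    runs = 1 + int(np.sum(above[1:] != above[:-1]))
    expected = 2.0 * n1 * n2 / n + 1.0
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1.0))
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p_value

# Simulated data (hypothetical, not the laminate series)
rng = np.random.default_rng(0)
white_noise = rng.normal(size=100)
random_walk = np.cumsum(white_noise)

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    z, p = runs_test_about_mean(series)
    print(f"{name}: z = {z:.2f}, p-value = {p:.3f}")
```

> Too few runs (negative z) suggest positive autocorrelation or a drifting mean; too many runs (positive z) suggest an alternating pattern.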

In [None]:
# Import the necessary libraries for the runs test
from statsmodels.sandbox.stats.runs import runstest_1samp

_, pval_runs = runstest_1samp(data['EXE2'], correction=False)
print('Runs test p-value = {:.3f}'.format(pval_runs))

# Plot the acf and pacf using the statsmodels library
import statsmodels.graphics.tsaplots as sgt

fig, ax = plt.subplots(2, 1)
sgt.plot_acf(data['EXE2'], lags = int(len(data)/3), zero=False, ax=ax[0])
fig.subplots_adjust(hspace=0.5)
sgt.plot_pacf(data['EXE2'], lags = int(len(data)/3), zero=False, ax=ax[1], method = 'ywm')
plt.show()

> The process is NON-STATIONARY.
>
> Let's try to apply the difference operator.
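
> A minimal sketch of what the difference operator does, using a simulated random walk (an assumption for illustration, not the laminate data): `diff(1)` computes $Y_t - Y_{t-1}$, so the first value is missing and the differences of a random walk recover the underlying noise.

```python
import numpy as np
import pandas as pd

# Simulated random walk (hypothetical data): Y_t = Y_{t-1} + noise_t
rng = np.random.default_rng(0)
noise = rng.normal(size=200)
walk = pd.Series(np.cumsum(noise))

diff1 = walk.diff(1)              # first difference; the first entry is NaN
manual = walk - walk.shift(1)     # the same operation written out explicitly

print(diff1.head())
print("diff(1) matches Y_t - Y_{t-1}:", np.allclose(diff1[1:], manual[1:]))
print("differences recover the noise:", np.allclose(diff1[1:], noise[1:]))
```

> This is why the first row of the differenced column has to be skipped before running tests on it.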

In [None]:
data['diff1'] = data['EXE2'].diff(1)

plt.plot(data['diff1'], 'o-')
plt.xlabel('Index')
plt.ylabel('DIFF 1')
plt.title('Time series plot of DIFF 1')
plt.grid()
plt.show()

> Let's check again for time dependence with the runs test and the ACF/PACF, this time on the differenced data (DIFF1).

In [None]:
_, pval_runs = runstest_1samp(data['diff1'][1:], correction=False)
print('Runs test p-value = {:.3f}'.format(pval_runs))

fig, ax = plt.subplots(2, 1)
sgt.plot_acf(data['diff1'][1:], lags = int(len(data)/3), zero=False, ax=ax[0])
fig.subplots_adjust(hspace=0.5)
sgt.plot_pacf(data['diff1'][1:], lags = int(len(data)/3), zero=False, ax=ax[1], method = 'ywm')
plt.show()

> After differencing, the most suitable model seems to be an MA(1). Thus the model to fit is ARIMA(0,1,1).
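
> For reference, the signature of an MA(1) process is an ACF that cuts off after lag 1. The sketch below simulates an MA(1) written as $W_t = \epsilon_t - \theta \epsilon_{t-1}$ (the value $\theta = 0.6$ and the helper `sample_acf` are assumptions for this example) and compares the sample ACF with the theoretical value $\rho_1 = -\theta / (1 + \theta^2)$, $\rho_k = 0$ for $k > 1$.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(nlags + 1)])

theta = 0.6                      # hypothetical MA coefficient
rng = np.random.default_rng(42)
eps = rng.normal(size=5001)
w = eps[1:] - theta * eps[:-1]   # simulated MA(1): W_t = eps_t - theta * eps_{t-1}

r = sample_acf(w, nlags=5)
rho1_theory = -theta / (1 + theta ** 2)

print(f"theoretical rho_1 = {rho1_theory:.3f}")
for k in range(1, 6):
    print(f"lag {k}: sample ACF = {r[k]: .3f}")
```

> With a long simulated series the lag-1 value sits close to the theoretical one and the higher lags stay near zero, which is the pattern to look for in the ACF of the differenced data.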

In [None]:
# Import the library used to fit the ARIMA model
import qda

> The function `qda.ARIMA()` requires as inputs:
> 1. The dataframe with the data.
> 2. The `order` parameter, i.e., the $(p, d, q)$ of the model: $AR(p)$, $I(d)$, $MA(q)$.
> 3. The `add_constant` parameter, i.e. the presence of a constant term in the model:
>    - `False`, for no constant term.
>    - `True`, for a constant term.

In [None]:
# fit model ARIMA with constant term
x = data['EXE2']
model = qda.ARIMA(x, order=(0,1,1), add_constant = True) 

qda.ARIMAsummary(model)

> The calculated ARIMA model is in the form:
>
> $Y_t - Y_{t-1} = \nabla Y_t = \mu - \theta_{1}  \epsilon_{t-1} + \epsilon_t $
>
> The constant term has a p-value of 0.169, so it is not significant at the usual 5% level. Let's remove it by setting `add_constant=False`.

In [None]:
# fit model ARIMA without constant term
x = data['EXE2']
model = qda.ARIMA(x, order=(0,1,1), add_constant=False) # ARIMA(p,d,q), no constant term

qda.ARIMAsummary(model)

> The calculated ARIMA model is in the form:
>
> $Y_t - Y_{t-1} = \nabla Y_t = - \theta_{1}  \epsilon_{t-1} + \epsilon_t $
>
> Let's check the assumptions on the residuals

In [None]:
# Extract the residuals, skipping the first value (not a meaningful residual after differencing)
residuals = model.resid[1:]

# Perform the Shapiro-Wilk test
_, pval_SW = stats.shapiro(residuals)
print('Shapiro-Wilk test p-value = %.3f' % pval_SW)

# Plot the qqplot
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()

In [None]:
fig, ax = plt.subplots(2, 1)
sgt.plot_acf(residuals, lags = int(len(data)/3), zero=False, ax=ax[0])
fig.subplots_adjust(hspace=0.5)
sgt.plot_pacf(residuals, lags = int(len(data)/3), zero=False, ax=ax[1], method = 'ywm')
plt.show()

Try at home: run the Bartlett test and the Ljung-Box (LBQ) test on the ARIMA model residuals.
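
> As a starting point, here is a minimal sketch of the Bartlett test on a single autocorrelation and of the LBQ test, written from the textbook formulas. The helper names, the `n_params` argument (number of fitted ARMA coefficients to subtract from the degrees of freedom, 0 by default) and the simulated series are assumptions for this example; in the notebook the input would be `residuals`.

```python
import numpy as np
from scipy import stats

def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(nlags + 1)])

def bartlett_test(x, lag, alpha=0.05):
    """Reject 'rho_lag = 0' if |r_lag| exceeds z_{alpha/2} / sqrt(n)."""
    n = len(x)
    r_k = sample_acf(x, lag)[lag]
    limit = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n)
    return r_k, limit, abs(r_k) > limit

def ljung_box(x, nlags, n_params=0):
    """LBQ statistic Q = n(n+2) * sum_k r_k^2 / (n-k), compared with a chi-square."""
    n = len(x)
    r = sample_acf(x, nlags)[1:]
    q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, nlags + 1)))
    p_value = stats.chi2.sf(q, nlags - n_params)
    return q, p_value

# Simulated series (hypothetical): independent noise and a strongly correlated walk
rng = np.random.default_rng(3)
noise = rng.normal(size=100)
walk = np.cumsum(noise)

for name, series in [("noise", noise), ("walk", walk)]:
    r1, limit, reject = bartlett_test(series, lag=1)
    q, p = ljung_box(series, nlags=10)
    print(f"{name}: r_1 = {r1:.3f} (limit {limit:.3f}, reject = {reject}), "
          f"LBQ = {q:.1f}, p-value = {p:.3f}")
```

> For the residuals of an adequate model, neither test should reject at the chosen significance level.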

In [None]:
fig, axs = plt.subplots(2, 2)
fig.suptitle('Residual Plots')
fig.subplots_adjust(hspace=0.5)
# Normal probability plot of the residuals
stats.probplot(residuals, dist="norm", plot=axs[0,0])
axs[0,0].set_title('Normal probability plot')
# Residuals versus fitted values
axs[0,1].scatter(model.fittedvalues[1:], residuals)
axs[0,1].set_title('Versus Fits')
# Histogram of the residuals
axs[1,0].hist(residuals)
axs[1,0].set_title('Histogram')
# Residuals in time order
axs[1,1].plot(np.arange(1, len(residuals)+1), residuals, 'o-')
axs[1,1].set_title('Versus Order')
plt.show()

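> The residuals show no evidence against normality or independence, so the ARIMA(0,1,1) model without constant term is adequate.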