<h1 style="color:orange">EXERCISE CLASS 2 - SPC for iid data </h1>

### EXERCISE 1

The gears in wind turbine gearboxes are essential for converting rotational energy from the turbine blades into electrical energy. Data in `gears_phase1.csv` represent measurements of the gear diameters. These diameters are sequentially sampled in groups of n = 5 from the manufacturing process and must meet a tolerance of 24 ± 1 mm to ensure reliable performance and durability.
1. Assuming that the distribution of the gear diameters is unknown, design an Xbar-R chart to verify if the process is in control. For any out-of-control points detected in Phase I, assume that an assignable cause is found. Check if data contained in `gears_phase2.csv` is in control.
2. Redesign the X-bar and R chart in order to achieve in both the charts an Average Run Length (ARL0) of 1000 (assuming that the normal approximation applies for both of them).
3. Determine the operating characteristic curve (OC) for the X-bar chart (by using K=3 and expressing the shift of the mean in standard deviation units)
4. Determine the corresponding ARL curve. 
5. Estimate the standard deviation through the statistic $R$ (consider original Phase I data).
6. Design the confidence interval on the process mean that corresponds to the control limits computed in point 1 (consider original Phase I data).
7. Verify if the process is in control by using an X-bar and S chart. For any out-of-control points detected in Phase I, assume that an assignable cause is found. Check if data contained in `gears_phase2.csv` is in control.
8. Knowing that the gear diameter is distributed as a normal distribution with mean 24 mm and standard deviation 0.26 mm, design an Xbar and S chart. For any out-of-control points detected, assume that an assignable cause is found. Check if Phase II data is in control.



### Point 1 - Xbar-R charts
Assuming that the distribution of the gear diameters is unknown, design an Xbar-R chart to verify if the process is in control. For any out-of-control points detected in Phase I, assume that an assignable cause is found. Check if data contained in `gears_phase2.csv` is in control.



In [None]:
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import qdatoolkit as qda

# Import the dataset
phase_1 = pd.read_csv('../Data/gears_phase1.csv')

# Inspect the dataset
phase_1.head()

> Inspect the data by plotting the individual data points.

In [None]:
# Scatter plot of each measurement column against the sample index
for col in ['x1', 'x2', 'x3', 'x4', 'x5']:
    plt.plot(phase_1[col], linestyle='none', marker='o', label=col)
# place the legend outside the plot
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()

> There are no evident patterns or outliers, except for one extreme value.

> Verify the assumption of normality, assuming all the data are from the same population.

In [None]:
# Stack the data into a single column
phase_1_stack = phase_1.stack()

# We can use the qdatoolkit module to verify the normality of the data
sw_statistic, sw_pvalue = qda.Assumptions(phase_1_stack).normality()

# Plot the histogram of the data
phase_1_stack.hist(bins=10)


> At a 5% significance level (0.05), we fail to reject the null hypothesis of the Shapiro-Wilk test, i.e. normality. Note that the single extreme value is responsible for the borderline result.
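> For reference, the same Shapiro-Wilk decision can be reproduced directly with SciPy's `scipy.stats.shapiro`. The sketch below uses hypothetical simulated diameters, not the gear data, just to show the decision rule at α = 0.05.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the stacked Phase I data (e.g. 20 samples x 5 gears)
rng = np.random.default_rng(42)
diameters = rng.normal(loc=24.0, scale=0.25, size=100)

# Shapiro-Wilk test: H0 = the data come from a normal distribution
sw_statistic, sw_pvalue = stats.shapiro(diameters)

alpha = 0.05
reject_h0 = sw_pvalue < alpha
print(f"W = {sw_statistic:.3f}, p-value = {sw_pvalue:.3f}")
print("Reject normality" if reject_h0 else "Fail to reject normality")
```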

> We could also check randomness, but that requires knowing the order of the measurements within each sample.

> Let's design the Xbar-R chart.
>
> Let's compute the mean and the range for each sample. 
> 
> *Note: we need to apply the mean and range functions to each row of the data frame.*

In [None]:
# Make a copy of the data
XR = phase_1.copy()
# Add a column with the mean of each row (sample)
XR['sample_mean'] = phase_1.mean(axis=1)
# Add a column with the range of each row, computed on the measurement columns only
XR['sample_range'] = phase_1.max(axis=1) - phase_1.min(axis=1)

# Inspect the dataset
XR.head()

> Now compute the grand mean and the mean of the ranges.

In [None]:
Xbar_mean = XR['sample_mean'].mean()
R_mean = XR['sample_range'].mean()

print('Mean of the sample mean: %.3f' % Xbar_mean)
print('Mean of the sample range: %.3f' % R_mean)

> Since there is no constraint on the choice of Type I error $\alpha$, we can set K = 3 ($\alpha$ = 0.0027)
>
> Remember the formulas for the control limits for unknown parameters.
>
> **$\mathbf{\bar{X}}$ chart**:
> - $UCL = \overline{\overline{X}} + A_2(n) \overline{R}$
> - $CL = \overline{\overline{X}}$
> - $LCL = \overline{\overline{X}} - A_2(n) \overline{R}$
>
> **$R$ chart**:
> - $UCL = D_4(n) \overline{R}$
> - $CL = \overline{R}$
> - $LCL = D_3(n) \overline{R}$

![image](images/FactorsforConstructingVariableControlCharts.png)
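> The tabulated factors are not independent constants: they follow from $d_2(n)$ and $d_3(n)$, the mean and standard deviation of the relative range $R/\sigma$ for normal data. A short check for $n = 5$ and $K = 3$, assuming the standard table values $d_2 = 2.326$ and $d_3 = 0.864$:

```python
import numpy as np

# Standard table values for subgroups of size n = 5
n = 5
d2 = 2.326  # E[R] / sigma for normal data
d3 = 0.864  # sd(R) / sigma for normal data
K = 3

# X-bar chart half-width: K * sigma_hat / sqrt(n), with sigma_hat = R_bar / d2
A2 = K / (d2 * np.sqrt(n))
# R chart: R_bar +/- K * (d3 / d2) * R_bar, with a negative LCL truncated to 0
D3 = max(0.0, 1 - K * d3 / d2)
D4 = 1 + K * d3 / d2

print(f"A2 = {A2:.3f}, D3 = {D3:.3f}, D4 = {D4:.3f}")
```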

In [None]:
n = 5
A2 = 0.577
D3 = 0
D4 = 2.114

# Now we can compute the CL, UCL and LCL for Xbar and R
XR['Xbar_CL'] = Xbar_mean
XR['Xbar_UCL'] = Xbar_mean + A2 * R_mean
XR['Xbar_LCL'] = Xbar_mean - A2 * R_mean

XR['R_CL'] = R_mean
XR['R_UCL'] = D4 * R_mean
XR['R_LCL'] = D3 * R_mean

# Inspect the dataset
XR.head()

> Add two columns to store the violations of the control limits.

In [None]:
XR['Xbar_TEST1'] = np.where((XR['sample_mean'] > XR['Xbar_UCL']) | 
                (XR['sample_mean'] < XR['Xbar_LCL']), XR['sample_mean'], np.nan)
XR['R_TEST1'] = np.where((XR['sample_range'] > XR['R_UCL']) | 
                (XR['sample_range'] < XR['R_LCL']), XR['sample_range'], np.nan)

> Now plot the limits and the data in the charts.

In [None]:
# Plot the Xbar chart
plt.title('Xbar chart')
plt.plot(XR['sample_mean'], color='b', linestyle='--', marker='o')
plt.plot(XR['Xbar_UCL'], color='r')
plt.plot(XR['Xbar_CL'], color='g')
plt.plot(XR['Xbar_LCL'], color='r')
plt.ylabel('Sample mean')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XR)+.5, XR['Xbar_UCL'].iloc[0], 'UCL = {:.3f}'.format(XR['Xbar_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XR)+.5, XR['Xbar_CL'].iloc[0], 'CL = {:.3f}'.format(XR['Xbar_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XR)+.5, XR['Xbar_LCL'].iloc[0], 'LCL = {:.3f}'.format(XR['Xbar_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XR['Xbar_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.xlim(-1, len(XR))
plt.show()

In [None]:
# Plot the R chart
plt.title('R chart')
plt.plot(XR['sample_range'], color='b', linestyle='--', marker='o')
plt.plot(XR['R_UCL'], color='r')
plt.plot(XR['R_CL'], color='g')
plt.plot(XR['R_LCL'], color='r')
plt.ylabel('Sample range')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XR)+.5, XR['R_UCL'].iloc[0], 'UCL = {:.3f}'.format(XR['R_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XR)+.5, XR['R_CL'].iloc[0], 'CL = {:.3f}'.format(XR['R_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XR)+.5, XR['R_LCL'].iloc[0], 'LCL = {:.3f}'.format(XR['R_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XR['R_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.xlim(-1, len(XR))
plt.show()

> One sample is signalled as out-of-control in both the X-bar and R charts. According to the problem statement, whenever an out-of-control sample is detected in Phase I we must assume that an assignable cause has been identified. Consequently, we cannot attribute this alarm to chance (i.e. a false alarm): we must remove the sample and recalculate the control limits.

In [None]:
# Let's find the array of indexes corresponding to OOC points (one in this case)

# Find the rows where Xbar_TEST1 or R_TEST1 is different from NaN
OOC_idx = np.where(XR['Xbar_TEST1'].notnull() | XR['R_TEST1'].notnull())[0]
# Print the index of the OOC points
print('The index of the OOC point is: {}'.format(OOC_idx))

In [None]:
# Let's substitute the OOC points with NaN

# make a copy of the data
phase_1_cleaned = phase_1.copy()
# replace the OOC point with NaN
phase_1_cleaned.iloc[OOC_idx] = np.nan

# Inspect the dataset
phase_1_cleaned.head(7)
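> Replacing the OOC sample with NaN (instead of deleting the row) works with pandas because its reductions skip NaN by default (`skipna=True`), so the blanked sample drops out of both the grand mean and the mean range. A minimal sketch on a hypothetical 3-sample dataset (how qdatoolkit handles NaN rows internally is not shown here):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 samples of size 3; sample 1 plays the role of the OOC sample
df = pd.DataFrame({'x1': [24.0, 30.0, 24.2],
                   'x2': [24.1, 31.0, 23.8],
                   'x3': [23.9, 29.0, 24.0]})
df_cleaned = df.copy()
df_cleaned.iloc[[1]] = np.nan  # blank out the whole OOC row

sample_mean = df_cleaned.mean(axis=1)                            # NaN for row 1
sample_range = df_cleaned.max(axis=1) - df_cleaned.min(axis=1)   # NaN for row 1

# Series.mean() skips NaN, so only samples 0 and 2 contribute
grand_mean = sample_mean.mean()
mean_range = sample_range.mean()
print(grand_mean, mean_range)
```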

> Once the OOC sample has been removed, we must recompute the control limits. This time we use the qdatoolkit module.

In [None]:
XR_cleaned = qda.ControlCharts.XbarR(phase_1_cleaned)

In [None]:
# Note that XR_cleaned is a dataframe
XR_cleaned.head()

> With the recomputed limits, no sample falls outside the control limits: the Phase I process is in control.

> Let's proceed with Phase II. We need to import the new observations and compare them to the control limits established during Phase I.

In [None]:
# Import the dataset
phase_2 = pd.read_csv('../Data/gears_phase2.csv')

# Inspect the dataset
phase_2.head()

In [None]:
# Concatenate phase_2 data to phase_1_cleaned
phase_1_2 = pd.concat([phase_1_cleaned, phase_2], ignore_index=True)

# Plot the control chart with qdatoolkit
XR_phase_1_2 = qda.ControlCharts.XbarR(phase_1_2, subset_size=len(phase_1_cleaned)) # subset_size is the number of samples on which the control limits are computed

> The R chart signals an out-of-control point, indicating that one sample is showing unusual variability. Since we are in Phase II, we should investigate the process to determine if it is a false alarm or if there is an assignable cause that requires intervention.
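> Conceptually, Phase II freezes the limits at their Phase I values and only compares the new samples against them (per the code comment above, `subset_size` marks the samples used to compute the limits). A minimal hand-rolled sketch of that logic for the X-bar-R chart with K = 3, on hypothetical simulated data; `phase2_alarms` is an illustrative helper, not a qdatoolkit function.

```python
import numpy as np

A2, D3, D4 = 0.577, 0.0, 2.114  # table factors for n = 5, K = 3

def phase2_alarms(phase1, phase2):
    """Hypothetical helper: estimate limits on phase1 only, then flag phase2 samples."""
    phase1, phase2 = np.asarray(phase1, float), np.asarray(phase2, float)
    xbarbar = phase1.mean(axis=1).mean()
    rbar = np.ptp(phase1, axis=1).mean()
    xbar2 = phase2.mean(axis=1)
    r2 = np.ptp(phase2, axis=1)
    xbar_alarm = (xbar2 > xbarbar + A2 * rbar) | (xbar2 < xbarbar - A2 * rbar)
    r_alarm = (r2 > D4 * rbar) | (r2 < D3 * rbar)
    return xbar_alarm, r_alarm

rng = np.random.default_rng(7)
phase1 = rng.normal(24.0, 0.2, size=(20, 5))
phase2 = rng.normal(24.0, 0.2, size=(5, 5))
phase2[2] = [23.0, 24.0, 24.0, 24.0, 25.0]  # one sample with inflated spread, same mean

xbar_alarm, r_alarm = phase2_alarms(phase1, phase2)
print("X-bar alarms at:", np.flatnonzero(xbar_alarm))
print("R alarms at:", np.flatnonzero(r_alarm))
```

> The spread-only disturbance in sample 2 is caught by the R chart but not by the X-bar chart, which mirrors the situation described above.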

### Point 2 - Redesign the Xbar-R charts
Redesign the X-bar and R chart in order to achieve in both the charts an Average Run Length (ARL0) of 1000 (assuming that the normal approximation applies for both of them).

> Find the Type I error ($\alpha$) for which the Average Run Length (ARL) is 1000:
>$$\alpha = \frac{1}{ARL_{0}}$$

In [None]:
# Compute the new alpha value
ARL = 1000
alpha = 1/ARL

print('alpha = %.3f' % (alpha))

> Assuming that the normal approximation applies for both charts, we need to find the value of K such that $\alpha = 0.001$:
> $$K = z_{\alpha/2}$$

In [None]:
# Compute the new K_alpha value
alpha = 0.001
K_alpha = stats.norm.ppf(1-alpha/2)

print('K = %.3f' % K_alpha)
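> The relation between $K$, $\alpha$ and $ARL_0$ can be inverted both ways under the normal approximation. A short sketch (the helper names are illustrative) showing that K = 3 corresponds to $ARL_0 \approx 370$, while $ARL_0 = 1000$ requires $K \approx 3.29$:

```python
import numpy as np
from scipy import stats

def alpha_from_K(K):
    """Two-sided false-alarm probability of K-sigma limits (normal approximation)."""
    return 2 * stats.norm.sf(K)

def K_from_ARL0(arl0):
    """K such that the in-control ARL equals arl0, using alpha = 1 / ARL0."""
    return stats.norm.ppf(1 - 1 / (2 * arl0))

K_1000 = K_from_ARL0(1000)
arl0_K3 = 1 / alpha_from_K(3)
print(f"K for ARL0 = 1000: {K_1000:.4f}")
print(f"ARL0 for K = 3:    {arl0_K3:.1f}")
```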

> Now let's design the control charts with the new value of K. 
>
> Remember the formulas for the control limits for $K \neq 3$.
>
> **$\mathbf{\bar{X}}$ chart**:
> - $UCL = \overline{\overline{X}} + z_{\alpha/2} \frac{1}{d_2 \sqrt{n}} \overline{R}$
> - $CL = \overline{\overline{X}}$
> - $LCL = \overline{\overline{X}} - z_{\alpha/2} \frac{1}{d_2 \sqrt{n}} \overline{R}$
>
> **$R$ chart**:
> - $UCL = \overline{R} + z_{\alpha/2} \frac{d_3}{d_2} \overline{R}$
> - $CL = \overline{R}$
> - $LCL = max(0;\ \overline{R} - z_{\alpha/2} \frac{d_3}{d_2} \overline{R})$
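> With K = 3 these formulas reduce to the tabulated $A_2$, $D_3$, $D_4$ factors. A minimal sketch of the general-K version on hypothetical simulated data, assuming the table values $d_2(5) = 2.326$ and $d_3(5) = 0.864$; `xbar_r_limits` is an illustrative helper, not part of qdatoolkit.

```python
import numpy as np

def xbar_r_limits(samples, K, d2, d3):
    """(LCL, CL, UCL) of the X-bar and R charts for a generic K.

    samples: 2D array with one row per subgroup (n columns); sigma is estimated as R_bar / d2.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    xbarbar = samples.mean(axis=1).mean()
    rbar = np.ptp(samples, axis=1).mean()

    half_xbar = K * rbar / (d2 * np.sqrt(n))
    half_r = K * (d3 / d2) * rbar
    xbar = (xbarbar - half_xbar, xbarbar, xbarbar + half_xbar)
    r = (max(0.0, rbar - half_r), rbar, rbar + half_r)  # negative LCL truncated to 0
    return xbar, r

rng = np.random.default_rng(0)
data = rng.normal(24.0, 0.25, size=(25, 5))  # hypothetical Phase I data

xbar_3, r_3 = xbar_r_limits(data, K=3, d2=2.326, d3=0.864)
xbar_k, r_k = xbar_r_limits(data, K=3.2905, d2=2.326, d3=0.864)
print("K=3      X-bar:", np.round(xbar_3, 3), " R:", np.round(r_3, 3))
print("K=3.2905 X-bar:", np.round(xbar_k, 3), " R:", np.round(r_k, 3))
```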

In [None]:
# We can use the same function again. This time we need to specify the new K_alpha value
XR = qda.ControlCharts.XbarR(phase_1, K = K_alpha)

> Note that with alpha = 0.001 neither the X-bar nor the R chart shows out-of-control samples, so no sample needs to be removed from Phase I. These wider limits signal fewer false alarms, but they are also slower to detect genuine (non-random) process changes.

> Let's proceed with Phase II, using the limits computed on the full Phase I dataset.

In [None]:
# Concatenate the Phase I and Phase II data
phase_1_2 = pd.concat([phase_1, phase_2], ignore_index=True)

# Plot the control chart with qdatoolkit
XR_phase_1_2 = qda.ControlCharts.XbarR(phase_1_2, subset_size=len(phase_1), K=K_alpha)

> The observation previously signaled as out-of-control in Phase II (using alpha = 0.0027) is now in-control (using alpha = 0.001).

> The choice of the tolerated Type I error depends on the balance between detecting true out-of-control conditions and minimizing false alarms. A lower alpha (e.g., 0.001) reduces false alarms but may miss subtle process variations. The decision should be based on the specific requirements and tolerance for risk in the process being monitored.
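> The trade-off can be quantified with the OC/ARL formulas of Points 3 and 4. A sketch under the normal approximation with n = 5, comparing K = 3 with the K of Point 2 for a mean shift of one standard deviation ($\delta = 1$): the in-control ARL rises from about 370 to 1000, but the out-of-control ARL also grows, i.e. the shift takes longer to detect on average.

```python
import numpy as np
from scipy import stats

def beta_xbar(delta, n, K):
    """Type II error of an X-bar chart with K-sigma limits for a shift of delta sigmas."""
    shift = delta * np.sqrt(n)
    return stats.norm.cdf(K - shift) - stats.norm.cdf(-K - shift)

n, delta = 5, 1.0
K_values = {'K=3': 3.0, 'ARL0=1000': stats.norm.ppf(1 - 0.001 / 2)}
results = {}
for label, K in K_values.items():
    arl0 = 1 / (1 - beta_xbar(0.0, n, K))    # in-control ARL
    arl1 = 1 / (1 - beta_xbar(delta, n, K))  # ARL after a 1-sigma shift
    results[label] = (arl0, arl1)
    print(f"{label:>10}: K = {K:.3f}, ARL0 = {arl0:7.1f}, ARL1 = {arl1:.2f}")
```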

### Point 3 - OC curve
Determine the operating characteristic curve (OC) for the X-bar chart (by using $K=3$ and expressing the shift of the mean in standard deviation units).

> To determine the OC curve, we need to compute the Type II error probability $\beta$ for each value of the mean shift, expressed in standard deviation units.
> 
> We are testing the null hypothesis $H_0$ that the sample mean $\bar{X}$ is normally distributed with mean $\mu_0$ and variance $\sigma^2 / n$.
> $$H_0: \bar{X} \sim N(\mu_0, \sigma^2 / n)$$
>
> The alternative hypothesis is that the sample mean is normally distributed with mean $\mu_1$ and variance $\sigma^2 / n$.
> $$H_1: \bar{X} \sim N(\mu_1, \sigma^2 / n)$$
>
> So $\beta$ is the probability of not rejecting $H_0$ when $H_1$ is true.
> $$\beta = P(LCL \leq \bar{X} \leq UCL | H_1)$$
> $$\beta = P(Z \leq \frac{UCL - \mu_1}{\sigma / \sqrt{n}}) - P(Z \leq \frac{LCL - \mu_1}{\sigma / \sqrt{n}})$$
> If we define $\delta = (\mu_1 - \mu_0) / \sigma$, we can write:
> $$\beta = P(Z \leq 3 - \delta \sqrt{n}) - P(Z \leq -3 - \delta \sqrt{n})$$

In [None]:
# Define a range of values for the shift delta (in standard deviation units)
delta = np.linspace(0, 4, 100)
# Compute the corresponding beta values
beta = stats.norm.cdf(3 - delta*np.sqrt(n)) - stats.norm.cdf(-3 - delta*np.sqrt(n))

# Plot the beta values
plt.plot(delta, beta)
plt.xlabel('Delta')
plt.ylabel('Beta')
plt.title('Operating characteristic curve')
plt.show()
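> Reading the curve at a few points makes it concrete: with n = 5 and K = 3, a single sample misses a $1\sigma$ shift about 78% of the time, and a $2\sigma$ shift about 7% of the time. A small sketch; `oc_beta` is an illustrative helper that works for scalars and arrays.

```python
import numpy as np
from scipy import stats

def oc_beta(delta, n=5, K=3.0):
    """P(LCL <= Xbar <= UCL) after a mean shift of delta sigmas (X-bar chart, K-sigma limits)."""
    return stats.norm.cdf(K - delta * np.sqrt(n)) - stats.norm.cdf(-K - delta * np.sqrt(n))

for d in (0.0, 1.0, 2.0):
    print(f"delta = {d:.0f}: beta = {float(oc_beta(d)):.4f}")
```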

### Point 4 - ARL curve
Determine the corresponding ARL curve. 

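> For a given shift, each sample signals independently with probability $1-\beta$, so the number of samples until the first alarm is geometric and the ARL curve is: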
> $$ARL = \frac{1}{1-\beta}$$

In [None]:
# Compute ARL using the previous values of beta
ARL = 1/(1-beta)

# Plot the ARL values
plt.plot(delta, ARL)
plt.xlabel('Delta')
plt.ylabel('ARL')
plt.title('Average run length')
plt.show()
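> The formula $ARL = 1/(1-\beta)$ can be checked by simulation. A sketch for a $1\sigma$ shift with n = 5 and K = 3, treating the limits as known (the estimation error of Phase I limits is ignored) and working with the standardized sample mean $Z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n}) \sim N(\delta\sqrt{n}, 1)$:

```python
import numpy as np
from scipy import stats

n, K, delta = 5, 3.0, 1.0
rng = np.random.default_rng(123)

# Each row is one simulated run of consecutive standardized sample means after the shift
runs, horizon = 10_000, 150
z = rng.normal(loc=delta * np.sqrt(n), scale=1.0, size=(runs, horizon))
alarm = np.abs(z) > K
assert alarm.any(axis=1).all(), "increase horizon: some runs never signalled"

# Run length = 1-based position of the first sample outside the limits
run_length = alarm.argmax(axis=1) + 1
arl_simulated = run_length.mean()

beta = stats.norm.cdf(K - delta * np.sqrt(n)) - stats.norm.cdf(-K - delta * np.sqrt(n))
arl_theory = 1 / (1 - beta)
print(f"simulated ARL = {arl_simulated:.2f}, theoretical ARL = {arl_theory:.2f}")
```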

### Point 5 - Estimate the stdev
Estimate the standard deviation through the statistic $R$ (consider original Phase I data).

> The standard deviation is estimated through the statistic $R$ as:
> $$\hat{\sigma} = \frac{\bar{R}}{d_2(n)}$$

In [None]:
# You can use the function `getd2` from `qda.constants` to get the value of d2(n)

d2 = qda.constants.getd2(n)
sigma_hat = R_mean / d2
print('Sigma_hat = %.3f' % sigma_hat)
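> The constant $d_2(n)$ is the expected range of $n$ independent standard normal variables, $E[R]/\sigma$. As a cross-check of the tabulated value, it can be computed by numerical integration of $E[R] = \int_{-\infty}^{\infty} \left[1 - \Phi(x)^n - (1-\Phi(x))^n\right] dx$. The `R_bar` below is a hypothetical value, not the one computed from the gear data.

```python
import numpy as np
from scipy import integrate, stats

def d2_numeric(n):
    """Expected range of n iid standard normal variables: d2(n) = E[R] / sigma."""
    # E[R] = integral of 1 - Phi(x)^n - (1 - Phi(x))^n over the real line
    integrand = lambda x: 1 - stats.norm.cdf(x) ** n - stats.norm.sf(x) ** n
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return value

d2_5 = d2_numeric(5)
R_bar = 0.5  # hypothetical mean range [mm]
sigma_hat = R_bar / d2_5
print(f"d2(5) = {d2_5:.3f}, sigma_hat = {sigma_hat:.3f}")
```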

### Point 6 - Control limits and confidence interval
Design the confidence interval on the process mean that corresponds to the control limits computed in Point 1 (consider original Phase I data).

> The confidence interval corresponding to the control limits computed in Point 1 uses: 
> - $n = 5$
> - $\alpha = 0.0027$
> - $\hat{\sigma} = 0.222$ (computed from the data)
> - $\overline{\overline{X}} = 24.033$ (computed from the data)
>
> Remember the formula of the confidence interval (assuming that $\hat{\sigma}$ is the true population standard deviation):
> $$\overline{\overline{X}} - z_{\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}} \leq \mu \leq \overline{\overline{X}} + z_{\alpha/2} \frac{\hat{\sigma}}{\sqrt{n}}$$

In [None]:
# You can compute the CI using the formula or using the `interval` function from the `stats.norm` package.
alpha = 0.0027

CI = stats.norm.interval(1-alpha, loc=Xbar_mean, scale=sigma_hat/np.sqrt(n))
print('CI = (%.3f, %.3f)' % CI)
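> The interval and the X-bar chart limits of Point 1 coincide because $z_{0.0027/2} \approx 3$ and $A_2(n) = 3/(d_2\sqrt{n})$, so $A_2 \overline{R} = 3\hat{\sigma}/\sqrt{n}$. A quick numerical check with hypothetical values of the grand mean and of the mean range:

```python
import numpy as np
from scipy import stats

n, alpha, d2 = 5, 0.0027, 2.326
xbarbar, R_bar = 24.0, 0.5  # hypothetical grand mean and mean range [mm]

# Confidence interval with sigma_hat = R_bar / d2 treated as the true sigma
sigma_hat = R_bar / d2
ci_low, ci_high = stats.norm.interval(1 - alpha, loc=xbarbar, scale=sigma_hat / np.sqrt(n))

# X-bar chart limits with K = 3
A2 = 3 / (d2 * np.sqrt(n))
lcl, ucl = xbarbar - A2 * R_bar, xbarbar + A2 * R_bar
print(f"CI     = ({ci_low:.4f}, {ci_high:.4f})")
print(f"limits = ({lcl:.4f}, {ucl:.4f})")
```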

### Point 7 - Xbar-S charts
Verify if the process is in control by using an X-bar and S chart. For any out-of-control points detected in Phase I, assume that an assignable cause is found. Check if data contained in `gears_phase2.csv` is in control.

> We have already checked the normality of the data in Point 1, so we can directly design the X-bar and S charts.
>
> Since there is no constraint on the choice of Type I error $\alpha$, we can set K = 3 ($\alpha$ = 0.0027)

![image](images/Schart1.png)

![image](images/Schart2.png)
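> The S chart factors used in the next cells follow from the bias-correction constant $c_4(n) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$, for which $E[S] = c_4\sigma$ under normality. A short sketch that recomputes them for $n = 5$ and $K = 3$ without the toolkit (the helper `c4` is illustrative):

```python
import numpy as np
from scipy.special import gammaln

def c4(n):
    """Bias-correction constant: E[S] = c4(n) * sigma for normal samples of size n."""
    return np.sqrt(2 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))

n, K = 5, 3
c = c4(n)
A3 = K / (c * np.sqrt(n))
B3 = max(0.0, 1 - K * np.sqrt(1 - c**2) / c)  # negative LCL truncated to 0
B4 = 1 + K * np.sqrt(1 - c**2) / c
print(f"c4 = {c:.4f}, A3 = {A3:.3f}, B3 = {B3:.3f}, B4 = {B4:.3f}")
```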

> Let's compute the mean and the standard deviation for each sample. 
> 
> *Note: we need to apply the mean and std functions to each row of the data frame.*

In [None]:
# Make a copy of the data
XS = phase_1.copy()
# Add a column with the mean of the rows
XS['sample_mean'] = phase_1.mean(axis=1)
# Add a column with the standard deviation of the rows (pandas default: ddof=1)
XS['sample_std'] = phase_1.std(axis=1)

# Inspect the dataset
XS.head()
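> The S chart constants ($c_4$, $B_3$, $B_4$) assume the sample standard deviation with divisor $n-1$. `DataFrame.std` uses `ddof=1` by default, which is what is needed here, whereas `np.std` defaults to `ddof=0`. A small check on a hypothetical sample:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame([[23.8, 24.0, 24.1, 24.3, 23.9]])  # one hypothetical sample
values = sample.to_numpy()[0]

s_pandas = sample.std(axis=1).iloc[0]      # pandas default: ddof=1 (divisor n - 1)
s_numpy_default = np.std(values)           # NumPy default: ddof=0 (divisor n)
s_numpy_ddof1 = np.std(values, ddof=1)     # matches pandas
print(s_pandas, s_numpy_default, s_numpy_ddof1)
```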

> Now compute the grand mean and the mean of the standard deviations.

In [None]:
Xbar_mean = XS['sample_mean'].mean()
S_mean = XS['sample_std'].mean()

print('Mean of the sample mean: %.3f' % Xbar_mean)
print('Mean of the sample standard deviations: %.3f' % S_mean)

In [None]:
n = 5
K = 3
c4 = qda.constants.getc4(n)
A3 = K / (c4 * np.sqrt(n))
B3 = np.maximum(1 - K * np.sqrt(1 - c4**2) / c4, 0)
B4 = 1 + K * np.sqrt(1 - c4**2) / c4

# Now we can compute the CL, UCL and LCL for Xbar and S
XS['Xbar_CL'] = Xbar_mean
XS['Xbar_UCL'] = Xbar_mean + A3 * S_mean
XS['Xbar_LCL'] = Xbar_mean - A3 * S_mean

XS['S_CL'] = S_mean
XS['S_UCL'] = B4 * S_mean
XS['S_LCL'] = B3 * S_mean

# Inspect the dataset
XS.head()

> Add two columns to store the violations of the control limits.

In [None]:
XS['Xbar_TEST1'] = np.where((XS['sample_mean'] > XS['Xbar_UCL']) | 
                (XS['sample_mean'] < XS['Xbar_LCL']), XS['sample_mean'], np.nan)
XS['S_TEST1'] = np.where((XS['sample_std'] > XS['S_UCL']) | 
                (XS['sample_std'] < XS['S_LCL']), XS['sample_std'], np.nan)

> Now plot the limits and the data in the charts.

In [None]:
# Plot the Xbar chart
plt.title('Xbar chart')
plt.plot(XS['sample_mean'], color='b', linestyle='--', marker='o')
plt.plot(XS['Xbar_UCL'], color='r')
plt.plot(XS['Xbar_CL'], color='g')
plt.plot(XS['Xbar_LCL'], color='r')
plt.ylabel('Sample mean')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XS)+.5, XS['Xbar_UCL'].iloc[0], 'UCL = {:.3f}'.format(XS['Xbar_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['Xbar_CL'].iloc[0], 'CL = {:.3f}'.format(XS['Xbar_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['Xbar_LCL'].iloc[0], 'LCL = {:.3f}'.format(XS['Xbar_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XS['Xbar_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.show()

In [None]:
# Plot the S chart
plt.title('S chart')
plt.plot(XS['sample_std'], color='b', linestyle='--', marker='o')
plt.plot(XS['S_UCL'], color='r')
plt.plot(XS['S_CL'], color='g')
plt.plot(XS['S_LCL'], color='r')
plt.ylabel('Sample S')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XS)+.5, XS['S_UCL'].iloc[0], 'UCL = {:.3f}'.format(XS['S_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['S_CL'].iloc[0], 'CL = {:.3f}'.format(XS['S_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['S_LCL'].iloc[0], 'LCL = {:.3f}'.format(XS['S_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XS['S_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.show()

> The X-bar chart signals an out-of-control point (observation 5), the same one detected by the X-bar chart in Point 1. However, unlike the R chart, the S chart does not signal an alarm for observation 5. This is likely because S is computed from all the observations in each sample, whereas the range depends only on the smallest and largest values and is therefore highly sensitive to a single extreme value within the sample.

> Let's recompute the control limits on the Phase I dataset, cleaned of the out-of-control point.

In [None]:
XS_phase_1_cleaned = qda.ControlCharts.XbarS(phase_1_cleaned)

> Let's add the Phase II data in the charts using the qdatoolkit module.

In [None]:
# Let's concatenate the phase_1_cleaned and phase_2 data
phase_1_2 = pd.concat([phase_1_cleaned, phase_2], ignore_index=True)

# Compute the Xbar-S chart
XS_phase_1_2 = qda.ControlCharts.XbarS(phase_1_2, subset_size=len(phase_1_cleaned)) # subset_size is the number of samples on which the control limits are computed

> The S chart signals an out-of-control point, the same detected by the R chart, indicating that one sample is showing unusual variability. As already discussed, we should investigate the process to determine if it is a false alarm or if there is an assignable cause that requires intervention.

### Point 8 - Xbar-S charts (known distribution)
Knowing that the gear diameter ($X$) is distributed as a normal distribution with mean 24 mm and standard deviation 0.26 mm, $X∼N(\mu=24,\sigma=0.26)$, design an Xbar and S chart. For any out-of-control points detected, assume that an assignable cause is found. Check if Phase II data is in control.

> Since there is no constraint on the choice of Type I error $\alpha$, we can set K = 3 ($\alpha$ = 0.0027)
>
> Remember the formulas for the control limits for known parameters (K=3):
>
>**$\mathbf{\bar{X}}$ chart**:
>- $UCL = \mu + K \frac{\sigma}{\sqrt{n}} = \mu + A(n) \sigma$
>- $CL = \mu$
>- $LCL = \mu - K \frac{\sigma}{\sqrt{n}} = \mu - A(n) \sigma$
>
>**$S$ chart**:
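>- $UCL = B_6(n) \sigma$
>- $CL = c_4(n) \sigma$
>- $LCL = B_5(n) \sigma$
>
> where $B_6(n) = c_4(n) + K\sqrt{1 - c_4(n)^2}$ and $B_5(n) = \max\left(0,\ c_4(n) - K\sqrt{1 - c_4(n)^2}\right)$.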
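> A minimal sketch of the known-parameter limits for $\mu = 24$ mm, $\sigma = 0.26$ mm and $n = 5$, with $c_4$ computed from its Gamma-function expression instead of `qda.constants.getc4`; `xbar_s_limits_known` is an illustrative helper, not a qdatoolkit function.

```python
import numpy as np
from scipy.special import gammaln

def xbar_s_limits_known(mu, sigma, n, K=3.0):
    """(LCL, CL, UCL) for X-bar and S charts when mu and sigma are known (normal data)."""
    c4 = np.sqrt(2 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))
    A = K / np.sqrt(n)
    B5 = max(0.0, c4 - K * np.sqrt(1 - c4**2))
    B6 = c4 + K * np.sqrt(1 - c4**2)
    xbar = (mu - A * sigma, mu, mu + A * sigma)
    s = (B5 * sigma, c4 * sigma, B6 * sigma)  # B5 and B6 already include c4
    return xbar, s, B5, B6

xbar_lim, s_lim, B5, B6 = xbar_s_limits_known(mu=24.0, sigma=0.26, n=5)
print("X-bar limits:", np.round(xbar_lim, 3))
print("S limits:    ", np.round(s_lim, 3), f"(B5 = {B5:.3f}, B6 = {B6:.3f})")
```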

In [None]:
# Make a copy of the data
XS = phase_1.copy()
# Add a column with the mean of the rows
XS['sample_mean'] = phase_1.mean(axis=1)
# Add a column with the standard deviation of the rows (pandas default: ddof=1)
XS['sample_std'] = phase_1.std(axis=1)

# Inspect the dataset
XS.head()

In [None]:
mu = 24
sigma = 0.26
n = 5
K = 3
A = K / np.sqrt(n)
c4 = qda.constants.getc4(n)
B5 = np.maximum(c4 - K * np.sqrt(1 - c4**2), 0)
B6 = c4 + K * np.sqrt(1 - c4**2)

# Now we can compute the CL, UCL and LCL for Xbar and S
XS['Xbar_CL'] = mu
XS['Xbar_UCL'] = mu + A * sigma
XS['Xbar_LCL'] = mu - A * sigma

XS['S_CL'] = c4 * sigma
XS['S_UCL'] = B6 * sigma
XS['S_LCL'] = B5 * sigma

# Inspect the dataset
XS.head()

> Add two columns to store the violations of the control limits.

In [None]:
XS['Xbar_TEST1'] = np.where((XS['sample_mean'] > XS['Xbar_UCL']) | 
                (XS['sample_mean'] < XS['Xbar_LCL']), XS['sample_mean'], np.nan)
XS['S_TEST1'] = np.where((XS['sample_std'] > XS['S_UCL']) | 
                (XS['sample_std'] < XS['S_LCL']), XS['sample_std'], np.nan)

> Now plot the limits and the data in the charts.

In [None]:
# Plot the Xbar chart
plt.title('Xbar chart')
plt.plot(XS['sample_mean'], color='b', linestyle='--', marker='o')
plt.plot(XS['Xbar_UCL'], color='r')
plt.plot(XS['Xbar_CL'], color='g')
plt.plot(XS['Xbar_LCL'], color='r')
plt.ylabel('Sample mean')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XS)+.5, XS['Xbar_UCL'].iloc[0], 'UCL = {:.3f}'.format(XS['Xbar_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['Xbar_CL'].iloc[0], 'CL = {:.3f}'.format(XS['Xbar_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['Xbar_LCL'].iloc[0], 'LCL = {:.3f}'.format(XS['Xbar_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XS['Xbar_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.show()

In [None]:
# Plot the S chart
plt.title('S chart')
plt.plot(XS['sample_std'], color='b', linestyle='--', marker='o')
plt.plot(XS['S_UCL'], color='r')
plt.plot(XS['S_CL'], color='g')
plt.plot(XS['S_LCL'], color='r')
plt.ylabel('Sample S')
plt.xlabel('Sample number')
# add the values of the control limits on the right side of the plot
plt.text(len(XS)+.5, XS['S_UCL'].iloc[0], 'UCL = {:.3f}'.format(XS['S_UCL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['S_CL'].iloc[0], 'CL = {:.3f}'.format(XS['S_CL'].iloc[0]), verticalalignment='center')
plt.text(len(XS)+.5, XS['S_LCL'].iloc[0], 'LCL = {:.3f}'.format(XS['S_LCL'].iloc[0]), verticalalignment='center')
# highlight the points that violate the alarm rules
plt.plot(XS['S_TEST1'], linestyle='none', marker='s', color='r', markersize=10)
plt.show()

> Although one point is close to the limit in the X-bar chart, no sample is out of control when the charts are designed with the known parameters.

> We can obtain the same result with qdatoolkit.

In [None]:
# Compute the Xbar-S chart with known parameters using qdatoolkit
XS = qda.ControlCharts.XbarS(phase_1, K = 3, mean=mu, sigma=sigma)

> Let's add Phase II data in the plots.

In [None]:
# Concatenate the Phase I and Phase II data
phase_1_2 = pd.concat([phase_1, phase_2], ignore_index=True)

# Plot the control chart with qdatoolkit
XS_phase_1_2 = qda.ControlCharts.XbarS(phase_1_2, subset_size=len(phase_1), K=3, mean=mu, sigma=sigma)

> The process is in control.