<center><h2><span style="font-weight:bolder; color:#52b69a; font-size:85%">Quality Control & Production Quality Prediction</span></h2></center>

<a id="content"></a>
<div style="border-radius:20px; padding: 15px; font-size:110%; text-align:left">

<center><h2><span style="font-weight:bolder; color:black; font-size:70%">    Table of Contents:</span></h2></center>

 *  **[- | Introduction](#in)**
 *  **[- | Statistical Process Control](#sp)**
 *  **[- | Control Charts](#cc)**
 *  **[- | Prediction](#aboutds)**

<a id="in"></a>
# <p style="background-color:#52b69a;font-family:newtimeroman;font-size:100%;color:black;text-align:center;border-radius:15px 50px; padding:7px;border: 1px solid black;">Introduction</p>

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
I'm writing this notebook to demonstrate how control charts are implemented for process monitoring, using random data to illustrate their practical application. I also explore predicting the quality of products in a production dataset.

First, I define control charts and explain their role in ensuring process stability and identifying variation. Then I use random data to demonstrate how control charts work in practice.

Finally, I shift the focus to quality prediction on a production dataset. Using a predictive model (a decision tree regressor), I forecast product quality from process measurements, with the aim of providing insights that help optimize production processes proactively.

<a id="sp"></a>
# <p style="background-color:#52b69a;font-family:newtimeroman;font-size:100%;color:black;text-align:center;border-radius:15px 50px; padding:7px;border: 1px solid black;">Statistical Process Control</p>

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
Statistical Process Control (SPC) 📊🔍 is a method used in manufacturing and other industries to monitor and control processes to ensure they operate efficiently and produce products of consistent quality. It involves using statistical methods to monitor and control the variation in a process.

Key elements of SPC include 🔑:
- Collecting data from the process
- Plotting the data on control charts 📈
- Analyzing the data for trends, patterns, and unusual variations
- Making data-driven decisions to improve and maintain process quality

Some common tools used in SPC include control charts, histograms, Pareto charts, and scatter diagrams. SPC helps in identifying process variability, detecting issues early, reducing defects, and achieving process stability and improvement over time. It is an essential tool in quality management 🛠️👍.
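The core idea behind every control chart can be sketched in a few lines: estimate a centre line and 3-sigma limits from a baseline period assumed to be in control, then flag later points that fall outside them. This is a minimal sketch; the helper names and sample numbers are illustrative, and sigma here is the plain sample standard deviation, whereas the chart classes in this notebook estimate variability from subgroup ranges, standard deviations, or moving ranges.

```python
import numpy as np

def three_sigma_limits(baseline):
    """Return (LCL, CL, UCL) from the baseline mean and sample standard deviation."""
    baseline = np.asarray(baseline, dtype=float)
    center = baseline.mean()
    sigma = baseline.std(ddof=1)
    return center - 3 * sigma, center, center + 3 * sigma

def out_of_control(points, lcl, ucl):
    """Indices of the points that fall outside [lcl, ucl]."""
    points = np.asarray(points, dtype=float)
    return np.flatnonzero((points < lcl) | (points > ucl))

baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]   # assumed in-control period
lcl, cl, ucl = three_sigma_limits(baseline)
new_points = [10.0, 10.3, 11.5, 9.7]
print(out_of_control(new_points, lcl, ucl))  # → [2]
```

Only the third new point (11.5) lies outside the limits of roughly 9.61 to 10.39, so it is the one that would trigger an investigation.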

<a id="cc"></a>
# <p style="background-color:#52b69a;font-family:newtimeroman;font-size:100%;color:black;text-align:center;border-radius:15px 50px; padding:7px;border: 1px solid black;">Control Charts</p>

In [None]:
!pip install GaugeRnR

In [None]:
import datetime
import GaugeRnR
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
# Modelling and evaluation
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_score, train_test_split

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
A Pareto chart combines a bar graph, with bars sorted from most to least frequent, and a line graph of the cumulative percentage, in order to highlight the most important factors in a set. The left vertical axis shows the frequency of occurrence, while the right vertical axis shows the cumulative percentage of the total occurrences. 📊

In business and quality management, Pareto charts are often used to identify the most significant factors that contribute to a particular effect or issue. They help in focusing efforts on areas where the greatest improvements can be made to have the most significant impact. 💡
    
- It's often used to prioritize issues or identify the most significant factors contributing to a problem. 📉
- The Pareto principle states that roughly 80% of effects come from 20% of the causes, thus highlighting the most critical areas for improvement. 🎯
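The 80/20 reading of a Pareto chart can be made explicit by sorting the categories and cutting at the point where the cumulative share first reaches 80%. A minimal sketch reusing the medication-error counts from this notebook; the helper `vital_few` and the 80% threshold are illustrative choices, not part of any library.

```python
import numpy as np

def vital_few(categories, counts, threshold=80.0):
    """Sort categories by count (descending) and return those needed to reach
    `threshold` percent of the total, plus the cumulative percentages."""
    order = np.argsort(counts)[::-1]                     # largest count first
    sorted_counts = np.asarray(counts, dtype=float)[order]
    cum_pct = np.cumsum(sorted_counts) / sorted_counts.sum() * 100
    # first position where the cumulative share reaches the threshold
    k = int(np.searchsorted(cum_pct, threshold)) + 1
    return [categories[i] for i in order[:k]], cum_pct

problems = ['Technique error', 'Wrong time', 'Wrong calculation', 'Wrong patient', 'Over dose',
            'Under dose', 'Wrong route', 'Duplicated drugs',
            'Wrong drug', 'Wrong IV rate', 'Dose missed', 'Unauthorised drug']
values = [3, 83, 16, 53, 59, 7, 27, 9, 76, 4, 92, 1]
names, cum_pct = vital_few(problems, values)
print(names)  # → ['Dose missed', 'Wrong time', 'Wrong drug', 'Over dose', 'Wrong patient']
```

Here 5 of the 12 error types account for about 84% of all recorded errors, which is where improvement effort would be focused first.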

In [None]:
# Pareto Chart
def Pareto(problems, values):
    # Sort categories from most to least frequent and compute the cumulative share
    df = pd.DataFrame({'Values': values}, index=problems)
    df = df.sort_values(by='Values', ascending=False)
    df["cumpercentage"] = df["Values"].cumsum() / df["Values"].sum() * 100
    fig, ax = plt.subplots(figsize=(15, 5))
    ax.bar(df.index, df["Values"], color="C0")              # frequencies on the left axis
    ax.set_ylabel("Frequency")
    ax.tick_params(axis="x", labelrotation=90)
    ax2 = ax.twinx()                                           # cumulative % on the right axis
    ax2.plot(df.index, df["cumpercentage"], color="C9", marker=".")
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax.set_title("Pareto Diagram")
    ax.grid(axis="y")
    plt.show()

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
The X̄-R (X-bar and R) control chart is a powerful tool for monitoring process stability over time: the X̄ chart tracks subgroup averages and the R chart tracks subgroup ranges. 📊

Each point on the chart represents one subgroup, typically collected at one point in time, which lets us spot trends, shifts, and patterns indicating process variation. The centerline (the overall average) and the control limits help us determine whether the process is in control. The Upper Control Limit (UCL) and the Lower Control Limit (LCL) bound the variation expected from common causes alone; a point outside them is treated as abnormal, pointing to a special cause in the relevant process. 📈

When deviations from these limits occur, it indicates that intervention may be necessary to maintain process stability and consistency. 🛠️ 

- This chart is essential for ensuring quality and efficiency in processes. 🚦
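The limits of an X̄-R chart follow the standard formulas UCL = X̿ + A2·R̄, CL = X̿, LCL = X̿ - A2·R̄ for the averages, and UCL = D4·R̄, CL = R̄, LCL = D3·R̄ for the ranges. A minimal sketch that computes them without plotting, assuming subgroups of size 5 (A2 = 0.577, D3 = 0, D4 = 2.115, the same constants passed to `XR` later in this notebook); the function name and toy data are hypothetical.

```python
import numpy as np

def xbar_r_limits(subgroups, A2, D3, D4):
    """X-bar and R chart limits as (LCL, CL, UCL) tuples for equal-size subgroups."""
    subgroups = np.asarray(subgroups, dtype=float)
    xbar = subgroups.mean(axis=1)                         # subgroup averages
    r = subgroups.max(axis=1) - subgroups.min(axis=1)     # subgroup ranges
    xbarbar, rbar = xbar.mean(), r.mean()
    x_limits = (xbarbar - A2 * rbar, xbarbar, xbarbar + A2 * rbar)
    r_limits = (D3 * rbar, rbar, D4 * rbar)
    return x_limits, r_limits

# Constants for subgroups of size n = 5
x_lim, r_lim = xbar_r_limits(
    [[10, 12, 11, 13, 9], [11, 11, 12, 10, 11], [9, 10, 12, 11, 13]],
    A2=0.577, D3=0.0, D4=2.115,
)
print(x_lim, r_lim)
```

All three toy subgroups average 11, and the ranges 4, 2 and 4 give R̄ = 10/3, so the X̄ limits sit about 1.92 either side of 11.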

In [None]:
class XR:
    """X-bar / R chart for an array of shape (number_of_subgroups, subgroup_size)."""
    def fit(self,data):
        self.sample_size = len(data[0])
        self.number_of_sample = len(data)
        # subgroup averages (X-bar) and ranges (R), stored as column vectors
        self.X = data.mean(axis=1).reshape(-1, 1)
        self.R = (data.max(axis=1) - data.min(axis=1)).reshape(-1, 1)
        self.data = data
    def ControlChart(self,A2,D3,D4):
        # X-bar chart: grand mean +/- A2 * average range
        ucl_X   = self.X.mean() + A2*self.R.mean()
        cl_X    = self.X.mean()
        lcl_X   = self.X.mean() - A2*self.R.mean()
        # R chart: D3 and D4 times the average range
        ucl_R   = D4*self.R.mean()
        cl_R    = self.R.mean()
        lcl_R   = D3*self.R.mean()
        plt.figure(figsize=(15,5))
        plt.title("Boxplot for {} Observations\nSample Size {}".format(len(self.data),len(self.data[0])))
        plt.boxplot(self.data.T)
        plt.show()
        plt.figure(figsize=(15,5))
        plt.plot(self.X,marker="o",color="k",label="X")
        plt.plot([ucl_X]*len(self.X),color="r",label="UCL={}".format(ucl_X.round(2)))
        plt.plot([cl_X]*len(self.X),color="c",label="CL={}".format(cl_X.round(2)))
        plt.plot([lcl_X]*len(self.X),color="r",label="LCL={}".format(lcl_X.round(2)))
        plt.title("X Chart")
        plt.xticks(np.arange(len(self.data)))
        plt.legend()
        plt.show()
        plt.figure(figsize=(15,5))
        plt.plot(self.R,marker="o",color="k",label="R")
        plt.plot([ucl_R]*len(self.X),color="r",label="UCL={}".format(ucl_R.round(2)))
        plt.plot([cl_R]*len(self.X),color="c",label="CL={}".format(cl_R.round(2)))
        plt.plot([lcl_R]*len(self.X),color="r",label="LCL={}".format(lcl_R.round(2)))
        plt.title("R Control Chart")
        plt.xticks(np.arange(len(self.data)))
        plt.legend()
        plt.show()
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(x=self.X)
        plt.title("Boxplot Of X")
        plt.xlabel("X")
        plt.subplot(1,2,2)
        plt.boxplot(x=self.R)
        plt.title("Boxplot Of R")
        plt.xlabel("R")
        plt.show()

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
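An X̄-S control chart plots subgroup averages together with subgroup standard deviations (S). It works like the X̄-R chart but is commonly preferred for larger subgroups (roughly n > 10), where the standard deviation estimates variability more efficiently than the range. Several other control charts are also widely used: 📉 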

- The Individuals Chart (I-chart) displays individual data points over time. It is used to monitor the average of the process. 📊
  
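- The Moving Range Chart (MR-chart) displays the absolute difference between consecutive data points. It is used to monitor the variability within the process; together, the I-chart and MR-chart form the Individuals and Moving Range (I-MR) chart. 📍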

- The P-chart (for proportion) is used to monitor the proportion of nonconforming units or defects in a process over time. It is commonly used when dealing with attributes data. 📋

- The U-chart, also known as the defects-per-unit chart, monitors count data (the number of defects) per inspection unit when each sample contains more than one unit; it can also handle sample sizes that vary. ❕

- An exponentially weighted moving average (EWMA) chart is a type of control chart used to monitor small shifts in the process mean. 🔰

These charts are particularly useful in detecting shifts or trends in a process that may indicate special causes of variation. By analyzing the data and determining control limits, quality professionals can determine if a process is in control or if there are issues that need to be addressed. 📌
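Both charts for individual observations can be computed without plotting. For the I-MR pair, sigma is estimated as MR̄/d2 with d2 = 1.128, so the individuals limits are x̄ ± 3·MR̄/d2 and the MR chart runs from 0 to D4·MR̄ with D4 = 3.267. For the EWMA chart, z_i = λ·x_i + (1 - λ)·z_(i-1) starting from z_0 = μ0, with limits μ0 ± L·σ·√(λ/(2 - λ)·[1 - (1 - λ)^(2i)]) that widen towards a steady value. A minimal sketch under these textbook formulas; the function names and numbers are hypothetical toy data.

```python
import numpy as np

# Textbook constants for moving ranges of two consecutive observations
D2 = 1.128
D4_MR = 3.267

def imr_limits(x):
    """Individuals and moving-range limits as (LCL, CL, UCL) tuples."""
    x = np.asarray(x, dtype=float)
    mr = np.abs(np.diff(x))                  # |x_i - x_(i-1)|, length n - 1
    xbar, mrbar = x.mean(), mr.mean()
    sigma_hat = mrbar / D2                   # short-term sigma estimate
    i_limits = (xbar - 3 * sigma_hat, xbar, xbar + 3 * sigma_hat)
    mr_limits = (0.0, mrbar, D4_MR * mrbar)  # D3 = 0 for ranges of two points
    return i_limits, mr_limits

def ewma_chart(x, lam, mu0, sigma, L):
    """EWMA statistic z_i and its time-varying lower and upper limits."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    prev = mu0                               # z_0 is the target mean
    for k, xk in enumerate(x):
        prev = lam * xk + (1 - lam) * prev
        z[k] = prev
    i = np.arange(1, len(x) + 1)
    half = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
    return z, mu0 - half, mu0 + half

i_lim, mr_lim = imr_limits([10, 12, 11, 13, 12])
z, lcl, ucl = ewma_chart([10.5, 9.8, 11.0], lam=0.1, mu0=10, sigma=1, L=2.7)
print(i_lim, mr_lim)
print(z, ucl)
```

With λ = 0.1 and L = 2.7 (the values used with `EWMA` later in this notebook), the first upper limit is 10.27 and the limits approach μ0 ± 2.7·√(0.1/1.9) ≈ μ0 ± 0.62 as i grows.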

<img src="https://sixsigmastudyguide.com/wp-content/uploads/2021/02/p8.png" width="1200" height="400">
<img src="https://sixsigmastudyguide.com/wp-content/uploads/2021/02/u3.png" width="1200" height="400">
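The attribute charts pictured above use binomial and Poisson standard deviations: for a p-chart the limits are p̄ ± 3·√(p̄(1 - p̄)/n), and for a u-chart ū ± 3·√(ū/n_i), where ū is total defects divided by total units inspected. A negative lower limit is replaced by 0 because proportions and defect rates cannot be negative. A minimal sketch with hypothetical function names; the u-chart version computes one pair of limits per sample, so it also handles varying sample sizes.

```python
import numpy as np

def p_chart_limits(defectives, n):
    """p-chart (LCL, CL, UCL) for a constant sample size n."""
    p = np.asarray(defectives, dtype=float) / n
    pbar = p.mean()
    half = 3 * np.sqrt(pbar * (1 - pbar) / n)
    return max(0.0, pbar - half), pbar, pbar + half

def u_chart_limits(defects, units):
    """u-chart limits: arrays of per-sample LCL/UCL and the centre line ubar."""
    c = np.asarray(defects, dtype=float)
    n = np.asarray(units, dtype=float)
    ubar = c.sum() / n.sum()
    half = 3 * np.sqrt(ubar / n)              # wider limits for smaller samples
    return np.maximum(0.0, ubar - half), ubar, ubar + half

lcl_p, pbar, ucl_p = p_chart_limits([5, 8, 4, 7, 6], n=50)
lcl_u, ubar, ucl_u = u_chart_limits([3, 5, 2], [10, 20, 10])
print(lcl_p, pbar, ucl_p)
print(lcl_u, ubar, ucl_u)
```

In both toy examples the raw lower limit is negative, so it is clipped to 0 and only the upper limit can signal.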

In [None]:
class XS:
    """X-bar / S chart for an array of shape (number_of_subgroups, subgroup_size)."""
    def fit(self,data):
        self.sample_size = len(data[0])
        self.number_of_sample = len(data)
        # subgroup averages and sample standard deviations (ddof=1, as the A3/B3/B4 constants assume)
        self.X = data.mean(axis=1).reshape(-1, 1)
        self.S = data.std(axis=1, ddof=1).reshape(-1, 1)
        self.data = data
    def ControlChart(self,A3,B3,B4):
        # X-bar chart: grand mean +/- A3 * average S
        ucl_X   = self.X.mean() + A3*self.S.mean()
        cl_X    = self.X.mean()
        lcl_X   = self.X.mean() - A3*self.S.mean()
        # S chart: B3 and B4 times the average S
        ucl_S   = B4*self.S.mean()
        cl_S    = self.S.mean()
        lcl_S   = B3*self.S.mean()
        plt.figure(figsize=(15,5))
        plt.title("Boxplot for {} Observations\nSample Size {}".format(len(self.data),len(self.data[0])))
        plt.boxplot(self.data.T)
        plt.show()
        plt.figure(figsize=(15,5))
        plt.plot(self.X,marker=".",color="k",label="X")
        plt.plot([ucl_X]*len(self.X),color="r",label="UCL={}".format(ucl_X.round(2)))
        plt.plot([cl_X]*len(self.X),color="c",label="CL={}".format(cl_X.round(2)))
        plt.plot([lcl_X]*len(self.X),color="r",label="LCL={}".format(lcl_X.round(2)))
        plt.title("X Chart")
        plt.xticks(np.arange(len(self.data)))
        plt.legend()
        plt.show()
        plt.figure(figsize=(15,5))
        plt.plot(self.S,marker=".",color="k",label="S")
        plt.plot([ucl_S]*len(self.X),color="r",label="UCL={}".format(ucl_S.round(2)))
        plt.plot([cl_S]*len(self.X),color="c",label="CL={}".format(cl_S.round(2)))
        plt.plot([lcl_S]*len(self.X),color="r",label="LCL={}".format(lcl_S.round(2)))
        plt.title("S Control Chart")
        plt.xticks(np.arange(len(self.data)))
        plt.legend()
        plt.show() 
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(x=self.X)
        plt.title("Boxplot of X")
        plt.xlabel("X")
        plt.subplot(1,2,2)
        plt.boxplot(x=self.S)
        plt.title("Boxplot of S")
        plt.xlabel("S")
        plt.show()

In [None]:
class MR:
    """Individuals (X) and moving-range (MR) chart for a one-dimensional series."""
    def fit(self,data):
        self.X = np.asarray(data, dtype=float)
        self.number_of_sample = len(self.X)
        # moving ranges |x_i - x_(i-1)|, one fewer than the number of observations
        self.mR = np.abs(np.diff(self.X)).reshape(-1, 1)
    def ControlChart(self,d2,D4,D3):
        # individuals limits: mean +/- 3 * MR-bar / d2 (sigma estimated as MR-bar / d2)
        ucl_X   = self.X.mean() + (3/d2)*self.mR.mean()
        cl_X    = self.X.mean()
        lcl_X   = self.X.mean() - (3/d2)*self.mR.mean()
        # moving-range chart limits
        ucl_mR   = D4*self.mR.mean()
        cl_mR    = self.mR.mean()
        lcl_mR   = D3*self.mR.mean()
        plt.figure(figsize=(15,5))
        plt.plot(self.X,marker=".",color="k",label="X")
        plt.plot([ucl_X]*len(self.X),color="r",label="UCL={}".format(ucl_X.round(2)))
        plt.plot([cl_X]*len(self.X),color="c",label="CL={}".format(cl_X.round(2)))
        plt.plot([lcl_X]*len(self.X),color="r",label="LCL={}".format(lcl_X.round(2)))
        plt.title("X Chart")
        plt.xticks(np.arange(len(self.X)))
        plt.legend()
        plt.show()
        plt.figure(figsize=(15,5))
        plt.plot(self.mR ,marker=".",color="k",label="MR")
        # there are n - 1 moving ranges, so the limit lines span len(self.mR) points
        plt.plot([ucl_mR]*len(self.mR),color="r",label="UCL={}".format(ucl_mR.round(2)))
        plt.plot([cl_mR]*len(self.mR),color="c",label="CL={}".format(cl_mR.round(2)))
        plt.plot([lcl_mR]*len(self.mR),color="r",label="LCL={}".format(lcl_mR.round(2)))
        plt.title("MR Control Chart")
        plt.xticks(np.arange(len(self.X)))
        plt.legend()
        plt.show()
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(x=self.X)
        plt.title("Boxplot of X")
        plt.xlabel("X")
        plt.subplot(1,2,2)
        plt.boxplot(x=self.mR )
        plt.title("Boxplot of MR")
        plt.xlabel("MR")
        plt.show()

In [None]:
class p:
    def fit(self,D,n):
        self.D = D
        self.n = n 
        self.p = np.zeros(len(self.D)) 
        for i in range(len(D)):
            self.p[i] = D[i] / n
    def ControlChart(self):
        m      = len(self.p)
        p_mean = self.p.sum() / m
        # 3-sigma binomial limits; a negative LCL is set to 0 since a proportion cannot be negative
        sigma_p = np.sqrt((p_mean*(1-p_mean))/(self.n))
        ucl = p_mean + 3*sigma_p
        cl  = p_mean
        lcl = max(0.0, p_mean - 3*sigma_p)
        plt.figure(figsize=(15,5))
        plt.plot(self.p,marker=".",color="k",label="$p_i$")
        plt.plot([ucl]*(len(self.D)),label="UCL",color="r")
        plt.plot([cl]*(len(self.D)),label="CL",color="k",alpha=0.4)
        plt.plot([lcl]*(len(self.D)),label="LCL",color="r")
        plt.legend(loc="best")
        plt.xticks(np.arange(len(self.D)))
        plt.title("P Control Chart")
        plt.show()
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(self.D,vert=False)
        plt.title("Boxplot of Data")
        plt.xlabel("Data")
        plt.subplot(1,2,2)
        plt.hist(self.D,bins=int(len(self.D)/3),density=True,color="#52b69a")
        plt.ylabel("Density")
        plt.title("Histogram of Data")
        plt.show()

In [None]:
class u:
    def fit(self,c,n):
        self.c      = c
        self.n      = n
        self.u_mean = sum(self.c) / sum(self.n)
        self.value  = np.array(c) /  np.array(n)
    def ControlChart(self):
        # 3-sigma Poisson limits based on the average sample size; a negative LCL is set to 0
        sigma_u = np.sqrt(self.u_mean/np.mean(self.n))
        ucl = self.u_mean + 3 * sigma_u
        cl  = self.u_mean
        lcl = max(0.0, self.u_mean - 3 * sigma_u)
        plt.figure(figsize=(15,5))
        plt.plot(self.value,marker=".",color="k",label="$u_i$")
        plt.plot([ucl]*len(self.n),color="r",label="UCL={}".format(round(ucl, 2)))
        plt.plot([cl]*len(self.n),color="k",alpha=0.4,label="CL={}".format(round(cl, 2)))
        plt.plot([lcl]*len(self.n),color="r",label="LCL={}".format(round(lcl, 2)))
        plt.xticks(np.arange(len(self.c)))
        plt.legend(loc="best")
        plt.title("u - Control Chart")
        plt.show()
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(self.c,vert=False)
        plt.title("Boxplot of Data")
        plt.xlabel("Data")
        plt.subplot(1,2,2)
        plt.hist(self.c,bins=int(len(self.c)/3),density=True,color="#52b69a")
        plt.ylabel("Density")
        plt.title("Histogram of Data")
        plt.show()

In [None]:
class EWMA:
    def fit(self,data,lamda,mean):
        self.X     = np.asarray(data, dtype=float)
        self.z     = np.zeros(len(self.X))
        self.lamda = lamda
        self.mean  = mean
        # z_i = lamda * x_i + (1 - lamda) * z_(i-1), starting from z_0 = target mean
        self.z[0]  = self.lamda*self.X[0] + (1-self.lamda)*self.mean
        for i in range(1,len(self.z)):
            self.z[i] = self.lamda*self.X[i] + (1-self.lamda)*self.z[i-1]
    def ControlChart(self,L,sigma):
        # time-varying limits: mean +/- L*sigma*sqrt(lamda/(2-lamda) * (1-(1-lamda)^(2i)))
        I = np.arange(1,len(self.X)+1)
        half_width = L*sigma*np.sqrt((self.lamda/(2-self.lamda))*(1-(1-self.lamda)**(2*I)))
        ucl = self.mean + half_width
        lcl = self.mean - half_width
        plt.figure(figsize=(15,5))
        plt.plot(self.z,marker=".",color="k",label="$Z_i$")
        plt.plot([self.mean]*len(self.X),color="k",alpha=0.35)
        plt.plot(ucl,color="r",label="UCL {}".format(ucl[-1].round(2)))
        plt.plot(lcl,color="r",label="LCL {}".format(lcl[-1].round(2)))
        plt.title("EWMA Control Chart")
        plt.legend(loc="upper left")
        plt.show()
        plt.figure(figsize=(15,5))
        plt.subplot(1,2,1)
        plt.boxplot(self.X,vert=False)
        plt.title("Boxplot of Data")
        plt.xlabel("Data")
        plt.subplot(1,2,2)
        plt.hist(self.X,bins=int(len(self.X)/3),density=True,color="#52b69a")
        plt.ylabel("Density")
        plt.title("Histogram of Data")
        plt.show()

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
Using random data to illustrate the use of control charts:

In [None]:
problems=['Technique error', 'Wrong time', 'Wrong calculation', 'Wrong patient', 'Over dose', 
            'Under dose', 'Wrong route', 'Duplicated drugs' ,
            'Wrong drug', 'Wrong IV rate', 'Dose missed' , 'Unauthorised drug']
values=[3, 83, 16, 53, 59, 7, 27, 9, 76 , 4 , 92 , 1]
c = np.array([2,3,8,1,1,4,1,4,5,1,8,2,4,3,4,1,8,3,7,4])
n = np.array([50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50,
       50, 50, 50])
data = np.array([[57, 46, 62, 23, 19],
       [52, 49, 42, 60, 32],
       [64, 53, 33, 20, 32],
       [46, 61, 55, 24, 44],
       [26, 47, 21, 62, 48],
       [36, 64, 63, 42, 38],
       [22, 52, 44, 49, 43],
       [56, 38, 56, 44, 46],
       [52, 33, 40, 30, 65],
       [57, 55, 30, 35, 46],
       [53, 24, 63, 49, 43],
       [24, 33, 38, 67, 24],
       [65, 36, 32, 48, 35],
       [38, 61, 48, 43, 38],
       [68, 42, 21, 29, 43],
       [60, 48, 44, 19, 60],
       [43, 28, 32, 65, 22],
       [57, 47, 69, 56, 24],
       [31, 29, 48, 63, 42],
       [39, 68, 20, 51, 26]])
data2 = np.array([150.01, 150.03,150.02,150.03,150.05,150.04,149.99,150.01,149.98,149.98,149.99,150.00,
150.01,150.03,150.03,150.04,149.99,150.04,149.95,150.02,150.03,149.99,149.98,149.99,
150.01,150.02,150.00,150.01,150.02,149.99,150.00,150.00,150.02,150.00,150.01,149.94,
150.02,150.03,150.03,150.03,150.02,150.02,149.97,150.00,149.97,150.01,150.00,149.98,
150.00,150.01,150.00,150.05,150.03,150.02,150.00,150.01,150.00,149.96,149.98,150.02,
150.00,150.03,150.03,150.02,150.00,150.05,149.99,150.00,149.99,149.97,150.01,149.98,
149.99,150.04,150.02,150.04,149.99,150.00,149.99,150.01,150.01,150.00,149.99,150.02,
150.02,150.01,150.01,150.04,150.01,150.07,150.00,149.95,149.98,149.97,150.03,149.94,
150.04,150.05,150.05,150.03,150.01,150.04,150.02,149.99,149.96,149.99,149.98,149.98,
150.04,150.03,149.98,150.02,150.03,150.06,149.96,149.99,149.97,150.00,150.01,149.97])
data3 = np.array([ 9.86309233,  9.84000103, 10.97886276,  9.50805567,  9.79770921,
       10.3763538 , 10.77708283, 10.91984387, 10.58749389, 10.55658341,
       10.56227153,  9.23660779, 10.66084511, 10.12406454,  9.22176616,
       10.23525939,  9.63873061, 10.63521265,  9.34684212,  9.74626569,
        9.55167571,  9.203874  ,  9.11321254,  9.28478856, 10.21514137,
       10.93835811,  9.00417726, 10.20495895, 10.12245382,  9.46752498])
data5 = np.array(           
    [[[37, 38, 37],   
      [42, 42, 43],   
      [30, 31, 31],  
      [42, 43, 42],    
      [28, 30, 29],
      [42, 42, 43],   
      [25, 26, 27],  
      [40, 40, 40],    
      [25, 25, 25],
      [35, 34, 34]],   
     [[41, 41, 40],   
      [42, 42, 42],   
      [31, 31, 31],   
      [43, 43, 43],    
      [29, 30, 29],
      [45, 45, 45],
      [28, 28, 30],   
      [43, 42, 42],   
      [27, 29, 28],   
      [35, 35, 34]],   
     [[41, 42, 41],   
      [43, 42, 43],   
      [29, 30, 28],   
      [42, 42, 42],    
      [31, 29, 29],
      [44, 46, 45],
      [29, 27, 27],   
      [43, 43, 41],   
      [26, 26, 26],   
      [35, 34, 35]]])
data4 = np.array(          
    [[[3.29, 3.41, 3.64], 
      [2.44, 2.32, 2.42], 
      [4.34, 4.17, 4.27],
      [3.47, 3.5, 3.64],  
      [2.2, 2.08, 2.16]], 
     [[3.08, 3.25, 3.07], 
      [2.53, 1.78, 2.32],  
      [4.19, 3.94, 4.34],  
      [3.01, 4.03, 3.2],   
      [2.44, 1.8, 1.72]],  
     [[3.04, 2.89, 2.85],   
      [1.62, 1.87, 2.04],  
      [3.88, 4.09, 3.67], 
      [3.14, 3.2, 3.11],    
      [1.54, 1.93, 1.55]]]) 
data6 =  np.array([12,15,8,10,4,7,16,9,14,10,5,6,17,12,22,8,10,5,13,11,20,18,24,15,9,12,7,13,9,6])

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
Pareto Chart:

In [None]:
import warnings
warnings.filterwarnings("ignore")
# Reuse the medication-error data defined above, sorted from most to least frequent
df = pd.DataFrame({'Type of Medication error': values}, index=problems)
df = df.sort_values(by='Type of Medication error', ascending=False)
df['Percent of Errors'] = df['Type of Medication error'] / df['Type of Medication error'].sum() * 100
df['Cumulative Percent'] = df['Percent of Errors'].cumsum()
sns.set_style("whitegrid")
fig, ax1 = plt.subplots(figsize=(10, 6))
# Counts as bars on the left axis
sns.barplot(x=df.index, y=df['Type of Medication error'], color='cadetblue', ax=ax1)
ax1.set_xlabel('Type of Medication Error')
ax1.set_ylabel('Count')
plt.setp(ax1.get_xticklabels(), rotation=45, ha='right')
# Cumulative percentage as a line on a secondary right axis, aligned with the bar positions
ax2 = ax1.twinx()
ax2.plot(range(len(df)), df['Cumulative Percent'], color='teal', marker='o', linestyle='--')
ax2.set_ylabel('Cumulative Percentage (%)')
ax2.set_ylim(0, 105)
ax2.grid(False)
ax1.set_title('Pareto Chart of Medication Errors')
plt.tight_layout()
plt.show()

In [None]:
Pareto(problems,values)

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
MR Chart:

In [None]:
chart = MR()
chart.fit(data2)
chart.ControlChart(d2 = 1.128,D3 = 0 ,D4 = 3.267)

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
XR Chart:

In [None]:
chart = XR()
chart.fit(data)
chart.ControlChart(A2 = 0.577,D3 = 0 ,D4 = 2.115)

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
XS Chart:

In [None]:
chart = XS()
chart.fit(data)
chart.ControlChart(A3 = 1.427 ,B3 = 0 ,B4 = 2.089)

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
P Chart:

In [None]:
chart = p()
chart.fit(D=data6,n=50)
chart.ControlChart()

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">

U Chart:

In [None]:
chart = u()
chart.fit(c,n)
chart.ControlChart()

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">
    
EWMA Chart:

In [None]:
chart = EWMA()
chart.fit(data=data3,lamda=0.1,mean=10)
chart.ControlChart(L=2.7,sigma=1)

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
A Gauge R&R (repeatability and reproducibility) study checks how much of the observed variation comes from the measurement system rather than from the parts themselves. Several operators (appraisers) measure the same parts several times; `data4` is a 3 × 5 × 3 array, read here as 3 operators, 5 parts and 3 repeated measurements. 📊

The operators box plot draws one box per operator, summarising all of that operator's measurements: the box spans the interquartile range and the line inside it marks the median. 🔍 

Comparing the boxes shows whether some operators systematically measure higher or lower than others (a reproducibility problem), while an unusually wide box or outliers for one operator hint at poor repeatability. The parts box plot does the same per part; in a capable measurement system, the differences between parts should dominate. 🔢 

Overall, these box plots are a quick visual complement to the numerical summary printed by `g.summary()`. 📈 
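The numbers behind the operators and parts box plots can be reproduced with plain NumPy, assuming the data are laid out as (operators, parts, trials) like `data4`. A minimal sketch: operator means expose reproducibility differences, part means show the part-to-part spread, and the pooled variance of the repeated trials estimates repeatability (for a balanced design it equals the ANOVA error mean square). The function `gauge_summary` is hypothetical and is not part of the `GaugeRnR` package; the toy array is illustrative.

```python
import numpy as np

def gauge_summary(data):
    """data: array of shape (operators, parts, trials).
    Returns per-operator means, per-part means and the pooled
    within-cell (repeatability) variance."""
    data = np.asarray(data, dtype=float)
    op_means = data.mean(axis=(1, 2))       # one value per operator
    part_means = data.mean(axis=(0, 2))     # one value per part
    cell_var = data.var(axis=2, ddof=1)     # variance of repeated trials in each operator/part cell
    repeatability_var = cell_var.mean()     # pooled over equally sized cells
    return op_means, part_means, repeatability_var

toy = [[[1, 3], [5, 5]],     # operator 1: part 1, part 2
       [[2, 2], [6, 8]]]     # operator 2: part 1, part 2
op, part, rep = gauge_summary(toy)
print(op, part, rep)  # → [3.5 4.5] [2. 6.] 1.0
```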

In [None]:
g = GaugeRnR.GaugeRnR(data4)
g.calculate()
print(g.summary())

In [None]:
g.creatPartsBoxPlot()

In [None]:
g.creatOperatorsBoxPlot()

<a id="aboutds"></a>
# <p style="background-color:#52b69a;font-family:newtimeroman;font-size:100%;color:black;text-align:center;border-radius:15px 50px; padding:7px;border: 1px solid black;">Prediction</p>

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">
    
<h3 align="left"><font color='#52b69a'>Context: </font></h3>
    
You need to build a model that, on the basis of data arriving every minute, determines the quality of products produced on a roasting machine.
    
<h3 align="left"><font color='#52b69a'>Content: </font></h3>

The roasting machine is a unit consisting of 5 chambers of equal size, each with 3 temperature sensors. In addition, data on the height of the raw material layer and its moisture content were collected for this task. Layer height and moisture are measured when the raw material enters the machine, and the raw material takes one hour to pass through the kiln.
   
<h3 align="left"><font color='#52b69a'>Acknowledgements:</font></h3>

Product quality is measured in the laboratory on samples taken every hour; the results of these analyses are in the file data_Y.csv. The file records the sampling time, and each sample is taken at the exit of the roasting machine.

<h3 align="left"><font color='#52b69a'>Inspiration:</font></h3>

As agreed with the customer, the model is evaluated with the MAE metric. To evaluate it, predictions must be generated for the period specified in the file sample_submission.csv (5808 predictions).
    
The machine has 5 chambers, each equipped with 3 temperature sensors, plus layer-height and moisture measurements taken at the inlet. Placing several temperature sensors in each chamber makes it possible to monitor temperature variation across different parts of the chamber, which is what even roasting depends on. Moisture content indicates the roasting conditions required, while layer height reflects the volume of material, which can affect the roasting process and signal changes in the raw material.

Seeds are sampled every hour, and because the material takes one hour to pass through the machine, the quality of each sample depends on the process measurements from the previous hour. To analyze the quality measurements effectively, it could be beneficial to build a process matrix by reshaping the minute-level data so that each hourly sample becomes one row containing all of that hour's readings. This would significantly increase the dimensionality of each row (60 minutes × 17 measurements).
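A minimal sketch of that reshaping, assuming minute-level readings with a `date_time` column and hourly quality samples stamped at the end of the hour they describe. The sensor names `T1`/`T2` and the synthetic data are hypothetical; each quality sample becomes one row holding the 60 preceding minutes of every sensor, laid out sensor by sensor.

```python
import numpy as np
import pandas as pd

def build_process_matrix(X, Y, sensor_cols):
    """One row per quality sample: the 60 minute-level readings of each sensor
    from the hour before the sample, flattened side by side."""
    X = X.set_index("date_time").sort_index()
    rows, targets = [], []
    for t, q in zip(Y["date_time"], Y["quality"]):
        # label slicing on a DatetimeIndex includes both ends: t-60min .. t-1min
        window = X.loc[t - pd.Timedelta(hours=1): t - pd.Timedelta(minutes=1), sensor_cols]
        if len(window) != 60:                       # skip hours with missing minutes
            continue
        rows.append(window.to_numpy().T.ravel())    # sensor 1 minutes, then sensor 2, ...
        targets.append(q)
    return np.vstack(rows), np.array(targets)

# Hypothetical demo data: 3 hours of minute readings from two sensors
idx = pd.date_range("2020-01-01 00:00", periods=180, freq="min")
X_demo = pd.DataFrame({"date_time": idx, "T1": np.arange(180.0), "T2": np.arange(180.0) * 2})
Y_demo = pd.DataFrame({
    "date_time": pd.to_datetime(["2020-01-01 01:00", "2020-01-01 02:00", "2020-01-01 03:00"]),
    "quality": [400.0, 410.0, 405.0],
})
M, y = build_process_matrix(X_demo, Y_demo, ["T1", "T2"])
print(M.shape)  # → (3, 120)
```

With 17 measurements per minute, the same construction would give 1020 columns per hourly sample.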

In [None]:
X = pd.read_csv("/kaggle/input/production-quality/data_X.csv")
submission = pd.read_csv("/kaggle/input/production-quality/sample_submission.csv")
Y = pd.read_csv("/kaggle/input/production-quality/data_Y.csv")

In [None]:
X.columns, Y.columns, submission.columns

In [None]:
X.shape, Y.shape, submission.shape

In [None]:
X.info(), Y.info(), submission.info()

In [None]:
plt.figure(figsize=(22,4))
sns.heatmap((X.isna().sum()).to_frame(name='').T,cmap='GnBu', annot=True,
             fmt='0.0f').set_title('Count of Missing Values', fontsize=18)
plt.show()

In [None]:
X.describe()[1:].T.style.background_gradient(cmap='GnBu', axis=1)

In [None]:
plt.figure(figsize=(22,4))
sns.heatmap((Y.isna().sum()).to_frame(name='').T,cmap='GnBu', annot=True,
             fmt='0.0f').set_title('Count of Missing Values', fontsize=18)
plt.show()

In [None]:
Y.describe()[1:].T.style.background_gradient(cmap='GnBu', axis=1)

In [None]:
plt.figure(figsize=(22,4))
sns.heatmap((submission.isna().sum()).to_frame(name='').T,cmap='GnBu', annot=True,
             fmt='0.0f').set_title('Count of Missing Values', fontsize=18)
plt.show()

In [None]:
submission.describe()[1:].T.style.background_gradient(cmap='GnBu', axis=1)

In [None]:
X["date_time"] = pd.to_datetime(X["date_time"])
# hour key "dd-mm-YYYY-HH", shared by all minute readings within the same hour
X["date_hour"] = X["date_time"].dt.strftime("%d-%m-%Y-%H")
X

In [None]:
# Each lab sample describes material that entered one hour earlier, so key it by the previous hour
Y["date_shifted"] = (pd.to_datetime(Y["date_time"]) - datetime.timedelta(hours=1)).dt.strftime("%d-%m-%Y-%H")

In [None]:
training = pd.merge(X,Y[["date_shifted","quality"]],left_on="date_hour",right_on="date_shifted",how="inner")
training

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:170%; text-align:Center">

DecisionTree:

In [None]:
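# Key submission rows by the previous hour, consistent with how the training labels were aligned
submission["date_hour"] = (pd.to_datetime(submission["date_time"]) - datetime.timedelta(hours=1)).dt.strftime("%d-%m-%Y-%H")
validation = pd.merge(X,submission[["date_hour","quality"]],left_on="date_hour",right_on="date_hour",how="inner")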

In [None]:
from sklearn.tree import DecisionTreeRegressor
dt_model = DecisionTreeRegressor(random_state=0)
X_val = training.drop(["date_hour", "date_shifted", "quality"], axis=1)
y_val = training["quality"]

In [None]:
X_val['date_time'] = X_val['date_time'].astype('int64')

In [None]:
cv_dt = cross_val_score(dt_model,X_val,y_val,cv=5,scoring=('neg_mean_absolute_error'),error_score='raise')

In [None]:
print("Average Decision Tree Cross Validation MAE: {0}".format(np.abs(cv_dt.mean())))
print("Best Decision Tree Cross Validation MAE: {0}".format(np.abs(cv_dt.max())))

In [None]:
X_train,X_test,y_train,y_test = train_test_split(X_val, y_val, test_size=0.15, random_state=0)
dt_model.fit(X_train,y_train)

In [None]:
predictions_dt = dt_model.predict(X_test)
results = pd.DataFrame()
results["True"] = y_test
results["Predicted DecisionTree"]=predictions_dt
results

In [None]:
print("DecisionTree Regressor MAE: {0}".format(mean_absolute_error(y_test,predictions_dt)))

<div style="border-radius:10px; border:#52b69a solid; padding: 15px; background-color:#d0f4de
            ; font-size:110%; text-align:left">

In statistics and machine learning, the Mean Absolute Error (MAE) is the average absolute difference between predicted and actual values, expressed in the same units as the target. It tells us how far off the predictions are on average: a low MAE means the predictions are close to the actual values, while a high MAE implies larger errors. 📊
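Written out, MAE = (1/n) Σ |y_i - ŷ_i|. The minimal NumPy sketch below (the `mae` helper is hypothetical) should match `sklearn.metrics.mean_absolute_error`; note that `cross_val_score` with `scoring='neg_mean_absolute_error'` returns the negated value, which is why the notebook applies `np.abs` to the cross-validation scores.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_true - y_pred|, in the target's units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

print(mae([400, 420, 390], [405, 420, 380]))  # (5 + 0 + 10) / 3 → 5.0
```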

In [None]:
model = DecisionTreeRegressor(random_state=0)
model.fit(X_val,y_val)

In [None]:
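X_sub = validation[X_val.columns].assign(date_time=lambda d: d["date_time"].astype("int64"))  # same features as training
validation["prediction"] = model.predict(X_sub)
# one value per submission row: the mean of that hour's minute-level predictions
submission["quality"] = submission["date_hour"].map(validation.groupby("date_hour")["prediction"].mean())
submission = submission[["date_time","quality"]]
submission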

In [None]:
X.hist(figsize=(35, 30));

In [None]:
sns.pairplot(data=X,palette='GnBu');