![manufacturing gears](manufacturing.jpg)

Manufacturing any product is like putting together a puzzle: the product is pieced together step by step, so keeping a close eye on the process is important.

For this project, you're supporting a team that wants to improve how they monitor and control a manufacturing process. The goal is to implement a more methodical approach known as statistical process control (SPC). SPC is an established strategy that uses measurement data to judge whether a process is performing as expected. The process is adjusted only when measurements fall outside an acceptable range.

This acceptable range is defined by an upper control limit (UCL) and a lower control limit (LCL), the formulas for which are:

$ucl = avg\_height + 3 * \frac{stddev\_height}{\sqrt{5}}$

$lcl = avg\_height - 3 * \frac{stddev\_height}{\sqrt{5}}$

The UCL defines the highest acceptable height for the parts, while the LCL defines the lowest acceptable height for the parts. Ideally, parts should fall between the two limits.
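As a quick arithmetic check of the formulas, the sketch below computes both limits for one window of five heights. The helper name `control_limits` is hypothetical. It assumes the standard deviation is the sample (n - 1) version, which is what PostgreSQL's `STDDEV` returns. The input values are the first five Op-1 heights from the data shown further down.

```python
import math
import statistics


def control_limits(heights: list[float]) -> tuple[float, float]:
    """Return (lcl, ucl) for one window of measurements (hypothetical helper).

    Uses the sample standard deviation and divides by sqrt(len(heights)),
    which equals sqrt(5) for the 5-row windows used in this project.
    """
    avg = statistics.mean(heights)
    sd = statistics.stdev(heights)  # sample (n - 1) standard deviation
    margin = 3 * sd / math.sqrt(len(heights))
    return avg - margin, avg + margin


# First five Op-1 heights from the manufacturing_parts table
window = [19.69, 19.63, 21.51, 18.60, 19.46]
lcl, ucl = control_limits(window)
print(f"lcl={lcl:.6f} ucl={ucl:.6f}")
# lcl=18.352088 ucl=21.203912
```

These values agree with the first Op-1 row of the query output later in the notebook. That row has an average of 19.778 and a standard deviation of about 1.0628.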

Using SQL window functions and nested queries, you'll analyze historical manufacturing data to define this acceptable range. You'll also identify any points in the process that fall outside it and therefore require adjustment. This helps keep the process running smoothly and consistently producing high-quality parts.

## The data
The data is available in the `manufacturing_parts` table which has the following fields:
- `item_no`: the item number
- `length`: the length of the item made
- `width`: the width of the item made
- `height`: the height of the item made
- `operator`: the operating machine

Import modules

In [1]:
# SQL Engine imports
from dotenv import load_dotenv
import os
import psycopg2
from sqlalchemy import create_engine
from sqlalchemy.sql import text
import warnings
warnings.filterwarnings("ignore")

# Python data analysis imports
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

Initialize SQL

In [2]:
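# override=True lets values in .env take precedence over variables already set
# in the shell; without it, the shell's own USER variable (set on most Unix
# systems) would silently win over the USER entry in .env.
load_dotenv(override=True)
user = os.environ.get("USER")
pw = os.environ.get("PASS")
db = os.environ.get("DB")
host = os.environ.get("HOST")
api = os.environ.get("API")
port = 5432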

In [3]:
uri = f"postgresql+psycopg2://{user}:{pw}@{host}:{port}/{db}"
alchemyEngine = create_engine(uri)
conn = alchemyEngine.connect()
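The f-string above inserts the password verbatim. A password containing characters such as `@`, `:` or `/` would therefore corrupt the URI. One option is to percent-encode the credentials with `urllib.parse.quote_plus`, which is the approach SQLAlchemy's documentation suggests for special characters in passwords. The sketch below uses a hypothetical `build_uri` helper and placeholder credentials.

```python
from urllib.parse import quote_plus


def build_uri(user: str, pw: str, host: str, db: str, port: int = 5432) -> str:
    """Build a postgresql+psycopg2 URI with percent-encoded credentials (hypothetical helper)."""
    # quote_plus escapes reserved characters, e.g. '@' -> '%40', ':' -> '%3A', '/' -> '%2F'
    return f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(pw)}@{host}:{port}/{db}"


# Placeholder credentials, for illustration only
print(build_uri("analyst", "p@ss:w/rd", "localhost", "factory"))
# postgresql+psycopg2://analyst:p%40ss%3Aw%2Frd@localhost:5432/factory
```

With this helper, `create_engine(build_uri(user, pw, host, db, port))` would take the place of the f-string.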

Load data

In [4]:
df = pd.read_csv('parts.csv')
df.to_sql('manufacturing_parts', conn, if_exists='replace', index=False)

500

In [6]:
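def query(stmt: str) -> pd.DataFrame:
    """Execute a SQL statement and return the results as a pandas DataFrame.

    Parameters
    ----------
    stmt : str
        The SQL statement to execute.

    Returns
    -------
    pd.DataFrame
        The query results.
    """
    # Pass the SQLAlchemy connection itself: pandas only supports SQLAlchemy
    # connectables (or sqlite3), and the raw DBAPI connection (conn.connection)
    # triggers a UserWarning. No `global` is needed to read `conn`.
    return pd.read_sql_query(stmt, conn)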

Exploring the data

In [7]:
query('''SELECT * FROM manufacturing_parts''')

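|     | item_no | length | width | height | operator |
|----:|--------:|-------:|------:|-------:|:---------|
| 0   | 1   | 102.67 | 49.53 | 19.69 | Op-1  |
| 1   | 2   | 102.50 | 51.42 | 19.63 | Op-1  |
| 2   | 3   | 95.37  | 52.25 | 21.51 | Op-1  |
| 3   | 4   | 94.77  | 49.24 | 18.60 | Op-1  |
| 4   | 5   | 104.26 | 47.90 | 19.46 | Op-1  |
| ... | ... | ...    | ...   | ...   | ...   |
| 495 | 496 | 101.24 | 49.03 | 20.96 | Op-20 |
| 496 | 497 | 98.37  | 52.12 | 19.68 | Op-20 |
| 497 | 498 | 96.49  | 48.78 | 19.19 | Op-20 |
| 498 | 499 | 94.16  | 48.39 | 21.60 | Op-20 |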
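Each control limit needs a full 5-row window. The first four parts of every operator therefore never appear in the final output, and an operator with fewer than five parts would contribute no rows at all. A small pandas check can count parts per operator and how many complete windows each yields. It is sketched below with a hypothetical `window_coverage` helper on made-up data rather than the real table.

```python
import pandas as pd


def window_coverage(parts: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Count parts per operator and the complete windows each yields (hypothetical helper)."""
    counts = parts.groupby("operator").size().rename("n_parts").to_frame()
    # The first (window - 1) rows of each operator lack a full window
    counts["complete_windows"] = (counts["n_parts"] - (window - 1)).clip(lower=0)
    return counts


# Made-up example: Op-A has 7 parts, Op-B only 3
demo = pd.DataFrame({"operator": ["Op-A"] * 7 + ["Op-B"] * 3,
                     "height": [20.0] * 10})
print(window_coverage(demo))
```

Applied to the real `df`, the sum of `complete_windows` should equal the number of rows the final query returns. Assuming 20 operators (Op-1 to Op-20), each with at least five parts, that is 500 - 20 × 4 = 420 rows.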


# TASK:

Analyze the `manufacturing_parts` table and determine whether the manufacturing process is performing within acceptable control limits:
- Create an alert that flags whether the height of a product is within the control limits for each operator, using the formulas provided in the workbook.
- The final query should return the following fields: `operator`, `row_number`, `height`, `avg_height`, `stddev_height`, `ucl`, `lcl` and `alert`, ordered by `item_no`.
- The `alert` column will be your boolean flag.
- Use a window function of length 5 to calculate the control limits, considering rows up to and including the current row; incomplete windows should be excluded from the final query output.
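The following sketch shows how the `ROWS BETWEEN 4 PRECEDING AND CURRENT ROW` frame and the `row_number >= 5` filter behave. It runs a trimmed-down version of the query against an in-memory SQLite database with made-up heights. SQLite is swapped in only because it ships with Python and supports window functions (version 3.25 and later). It has no built-in `STDDEV`, so only the rolling average is shown. The table name `parts` and the variable `demo_conn` are hypothetical and chosen so they do not clash with the notebook's `conn`.

```python
import sqlite3

# In-memory database with made-up data (not the real manufacturing_parts table)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE parts (item_no INTEGER, operator TEXT, height REAL)")
rows = [(i, "Op-A", float(i)) for i in range(1, 8)]            # items 1-7, heights 1-7
rows += [(7 + i, "Op-B", float(9 + i)) for i in range(1, 7)]   # items 8-13, heights 10-15
demo_conn.executemany("INSERT INTO parts VALUES (?, ?, ?)", rows)

sql = """
WITH w AS (
    SELECT
        operator,
        item_no,
        height,
        ROW_NUMBER() OVER (PARTITION BY operator ORDER BY item_no) AS row_number,
        AVG(height) OVER (PARTITION BY operator ORDER BY item_no
                          ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height
    FROM parts
)
SELECT operator, row_number, height, avg_height
FROM w
WHERE row_number >= 5  -- keep complete 5-row windows only
ORDER BY item_no
"""
result = demo_conn.execute(sql).fetchall()
for row in result:
    print(row)
# ('Op-A', 5, 5.0, 3.0)
# ('Op-A', 6, 6.0, 4.0)
# ('Op-A', 7, 7.0, 5.0)
# ('Op-B', 5, 14.0, 12.0)
# ('Op-B', 6, 15.0, 13.0)
```

The row numbers restart for each operator because of `PARTITION BY`. Each average covers only the current row and the four before it within that operator.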

In [9]:
df = query('''
    WITH avg_std_cte AS (
        SELECT
            operator,
            item_no,
            -- A frame clause has no effect on ROW_NUMBER (the output's row
            -- numbers run past 5), so none is given here
            ROW_NUMBER() OVER (PARTITION BY operator ORDER BY item_no) AS row_number,
            height,
            AVG(height) OVER (PARTITION BY operator ORDER BY item_no
                              ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS avg_height,
            -- In PostgreSQL, STDDEV is an alias for the sample standard deviation (stddev_samp)
            STDDEV(height) OVER (PARTITION BY operator ORDER BY item_no
                                 ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS stddev_height
        FROM manufacturing_parts
    ),
    lcl_ucl_cte AS (
        SELECT
            operator,
            item_no,
            row_number,
            height,
            avg_height,
            stddev_height,
            avg_height + 3 * (stddev_height / SQRT(5)) AS ucl,
            avg_height - 3 * (stddev_height / SQRT(5)) AS lcl
        FROM avg_std_cte
    )
    SELECT
        operator,
        row_number,
        height,
        avg_height,
        stddev_height,
        ucl,
        lcl,
        -- TRUE when the height falls outside [lcl, ucl]
        CASE WHEN height BETWEEN lcl AND ucl THEN FALSE ELSE TRUE END AS alert
    FROM lcl_ucl_cte
    WHERE row_number >= 5  -- keep only complete 5-row windows
    ORDER BY item_no
''')
df

|     | operator | row_number | height | avg_height | stddev_height | ucl       | lcl       | alert |
|----:|:---------|-----------:|-------:|-----------:|--------------:|----------:|----------:|:------|
| 0   | Op-1  | 5   | 19.46 | 19.778 | 1.062812 | 21.203912 | 18.352088 | False |
| 1   | Op-1  | 6   | 20.36 | 19.912 | 1.090812 | 21.375477 | 18.448523 | False |
| 2   | Op-1  | 7   | 20.22 | 20.030 | 1.084574 | 21.485108 | 18.574892 | False |
| 3   | Op-1  | 8   | 21.03 | 19.934 | 0.931225 | 21.183369 | 18.684631 | False |
| 4   | Op-1  | 9   | 19.78 | 20.170 | 0.598832 | 20.973418 | 19.366582 | False |
| ... | ...   | ... | ...   | ...    | ...      | ...       | ...       | ...   |
| 415 | Op-20 | 17  | 20.96 | 20.370 | 0.853698 | 21.515356 | 19.224644 | False |
| 416 | Op-20 | 18  | 19.68 | 20.362 | 0.861464 | 21.517775 | 19.206225 | False |
| 417 | Op-20 | 19  | 19.19 | 20.098 | 0.996454 | 21.434883 | 18.761117 | False |
| 418 | Op-20 | 20  | 21.60 | 20.146 | 1.075119 | 21.588423 | 18.703577 | True  |
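The same limits can also be computed in pandas with a per-operator rolling window, as an independent cross-check of the SQL. This sketch rests on two assumptions. First, the hypothetical helper `spc_limits` receives the raw parts table. Second, pandas' `rolling(...).std()` (sample standard deviation, `ddof=1` by default) corresponds to PostgreSQL's `STDDEV`. The example feeds it Op-1's first seven heights as they appear in the outputs above, so its three rows should match the first three rows of the query result.

```python
import numpy as np
import pandas as pd


def spc_limits(parts: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Rolling control limits per operator, mirroring the SQL query (hypothetical sketch)."""
    out = parts.sort_values("item_no")[["operator", "item_no", "height"]].copy()
    by_op = out.groupby("operator")["height"]
    out["row_number"] = by_op.cumcount() + 1
    # Trailing window of `window` rows including the current row; NaN until the window is full
    out["avg_height"] = by_op.transform(lambda s: s.rolling(window).mean())
    out["stddev_height"] = by_op.transform(lambda s: s.rolling(window).std())  # ddof=1
    # sqrt(window) equals the sqrt(5) in the formulas when window=5
    margin = 3 * out["stddev_height"] / np.sqrt(window)
    out["ucl"] = out["avg_height"] + margin
    out["lcl"] = out["avg_height"] - margin
    # Series.between is inclusive on both ends, like SQL BETWEEN
    out["alert"] = ~out["height"].between(out["lcl"], out["ucl"])
    # Drop incomplete windows, as WHERE row_number >= 5 does
    return out[out["row_number"] >= window].reset_index(drop=True)


demo_parts = pd.DataFrame({
    "item_no": range(1, 8),
    "operator": ["Op-1"] * 7,
    "height": [19.69, 19.63, 21.51, 18.60, 19.46, 20.36, 20.22],
})
check = spc_limits(demo_parts)
print(check[["row_number", "avg_height", "ucl", "lcl", "alert"]].round(6))
```

Running `spc_limits` on the full `parts.csv` DataFrame and comparing it with the query result column by column would be one way to confirm both implementations agree.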


Number of rows where the height is within the control limits (`alert` = False) and outside them (`alert` = True)

In [10]:
df['alert'].value_counts()

alert
False    363
True      57
Name: count, dtype: int64
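The overall counts show how often heights breach the limits (57 of 420 rows, about 13.6%), but not where. A possible next step is to break the alerts down by operator so that adjustments can be targeted. It is sketched below with a hypothetical `alerts_by_operator` helper and made-up rows.

```python
import pandas as pd


def alerts_by_operator(results: pd.DataFrame) -> pd.DataFrame:
    """Count alerts and compute the alert rate per operator (hypothetical helper)."""
    summary = results.groupby("operator")["alert"].agg(
        n_rows="count",      # rows with a complete window
        n_alerts="sum",      # True counts as 1
        alert_rate="mean",   # share of rows flagged
    )
    return summary.sort_values("alert_rate", ascending=False)


demo_results = pd.DataFrame({
    "operator": ["Op-A", "Op-A", "Op-A", "Op-A", "Op-B", "Op-B"],
    "alert": [False, True, True, False, False, False],
})
print(alerts_by_operator(demo_results))
```

On the notebook's data, `alerts_by_operator(df)` would rank the operators by how often their parts fall outside the control limits.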