<a href="https://www.kaggle.com/code/absndus/data-science-portfolio-detecting-defect-notebook?scriptVersionId=134553641" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

## Data Science Portfolio - Detecting Product Defects Using Statistical Functions Notebook ##

### Created by: Albert Schultz ###

### Date Created: 05/23/2023 ###

### Version: 1.00 ###

### Executive Summary ###
This notebook explores product defect counts and the probability of defects over a 24-hour period, using statistical functions from the SciPy and NumPy Python libraries.

## Table of Contents ##

1. [Introduction](#1.-Introduction)
2. [Vision and Goals](#2.-Vision-and-Goals)
3. [Exploration of Aspects of the Probability of Average of 9 Defects Throughout the Day](#3.-Exploration-of-Aspects-of-the-Probability-of-Average-of-9-Defects-Throughout-the-Day)
4. [Summary](#4.-Summary)

## 1. Introduction ##

In this test scenario, I am in charge of monitoring the number of defective products from a specific factory. I have been told that the number of defects on a given day follows the Poisson distribution with the rate parameter (lambda) equal to 9. I want to get a feel for what it means to follow the Poisson(9) distribution. I remember that the Poisson distribution is special because the rate parameter represents the expected value of the distribution, so in this case, the expected value of the Poisson(9) distribution is 9 defects per day.

I will investigate certain attributes of the Poisson(9) distribution to get an intuition for how many defective objects I should expect to see in a given amount of time. I will also practice and apply what I know about the Poisson distribution on a practice data set.
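
As a quick sanity check on the claim that the rate parameter is also the expected value, here is a minimal sketch using SciPy's built-in moment helpers. The variable names are my own for illustration; the only input taken from the scenario is the rate of 9.

```python
from scipy import stats

lam = 9  # rate parameter from the scenario above

# For a Poisson distribution, the mean and the variance both equal lambda,
# so the standard deviation is sqrt(lambda).
mean = stats.poisson.mean(lam)
variance = stats.poisson.var(lam)
std_dev = stats.poisson.std(lam)

print(f"mean = {mean}, variance = {variance}, std dev = {std_dev}")
```

A standard deviation of 3 suggests that daily counts a few defects above or below 9 should be routine rather than alarming.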

**Initialize the notebook: import the library modules and list any input files available to this project.**

In [100]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import scipy.stats as stats # Import the needed statistical functions for use in this notebook. 
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## 2. Vision and Goals ##

This notebook walks step by step through several probability calculations to make sense of a Poisson(9) distribution, where 9 defects are expected per day.

**Vision:** To give a clear picture of how likely different daily defect counts are when the expected rate (lambda) is 9 defects per day.

**Goals:**
1. Work through the steps below to understand the probabilities around seeing 9 defects (lambda) in a day.

## 3. Exploration of Aspects of the Probability of Average of 9 Defects Throughout the Day ##

In this section, I walk through the probabilities associated with an expected 9 defects per 24-hour period.

1. Create a variable called **lam** that represents the rate parameter of our distribution and set it to an average of 9 defects expected per day in the products throughout the manufacturing process.  

In [101]:
lam = 9

2. To find how often I might observe the exact expected number of defects of 9, perform the calculation using the **probability mass function (pmf)** formula. 

In [102]:
# Use the probability mass function (pmf) to find the probability of exactly 9 defects, then multiply by 100 to express it as a percentage.
expected_probability_lam = '{:.2f}'.format(stats.poisson.pmf(lam, lam) * 100)

# Print the probability of observing exactly 9 defects.
print(f"The expected probability of getting 9 defects in products throughout the 24 hour period exactly is {expected_probability_lam}%.")

The expected probability of getting 9 defects in products throughout the 24 hour period exactly is 13.18%.
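
To see where that number comes from, the Poisson pmf can also be evaluated directly from its formula, P(X = k) = e^(-λ) λ^k / k!. The sketch below is only an illustration (the names `k` and `pmf_by_hand` are mine) and compares the hand calculation with SciPy's result.

```python
import math

from scipy import stats

lam = 9  # expected defects per day
k = 9    # number of defects whose probability I want

# Poisson pmf written out from its definition: exp(-lam) * lam**k / k!
pmf_by_hand = math.exp(-lam) * lam**k / math.factorial(k)

# The same quantity from SciPy, for comparison.
pmf_scipy = stats.poisson.pmf(k, lam)

print(f"by hand: {pmf_by_hand * 100:.2f}%  scipy: {pmf_scipy * 100:.2f}%")
```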


3. Using the **cumulative distribution function (CDF)**, calculate and print the probability of a day with 4 or fewer defects when 9 defects are expected.

In [103]:
# Use the cumulative distribution function (CDF) to find the probability of 4 or fewer defects when the average (lambda) is 9, then multiply by 100 to express it as a percentage.
expected_few_cdf_lam = '{:.2f}'.format(stats.poisson.cdf(4, lam) * 100)

# Print the probability of 4 or fewer defects in a day.
print(f"The probability of 4 or fewer defects in a day, given an average of 9 defects per day, is {expected_few_cdf_lam}%.")

The probability of 4 or fewer defects in a day, given an average of 9 defects per day, is 5.50%.
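
For a discrete distribution, the CDF at 4 is simply the sum of the pmf values for 0, 1, 2, 3, and 4 defects. The following sketch (with illustrative names of my own) checks that the two approaches agree.

```python
import numpy as np
from scipy import stats

lam = 9  # expected defects per day

# Every defect count from 0 through 4 inclusive.
ks = np.arange(0, 5)

# Summing the individual pmf values gives P(X <= 4).
cdf_by_sum = stats.poisson.pmf(ks, lam).sum()

# The same probability from the built-in CDF.
cdf_scipy = stats.poisson.cdf(4, lam)

print(f"sum of pmf: {cdf_by_sum * 100:.2f}%  cdf: {cdf_scipy * 100:.2f}%")
```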


4. The manager said that having more than 9 defects on any given day is considered a bad day. Calculate and print the probability of having more than 9 defects on a given day.

In [104]:
# Calculate the probability of more than 9 manufacturing defects as 1 - P(X <= 9), then multiply by 100 to express it as a percentage.
prob_m_9_defects = '{:.2f}'.format((1 - stats.poisson.cdf(9, lam)) * 100)

#Print out the probability of getting more than 9 defects in any given day. 
print(f"The likely probability of having more than 9 defects throughout the day is approx {prob_m_9_defects}%.")

The likely probability of having more than 9 defects throughout the day is approx 41.26%.
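
SciPy also exposes the upper tail directly through the survival function, `sf`, which is defined as 1 - CDF. A small sketch comparing the two forms (variable names are illustrative):

```python
from scipy import stats

lam = 9  # expected defects per day

# P(X > 9) computed as the complement of the CDF, as in the cell above.
p_complement = 1 - stats.poisson.cdf(9, lam)

# P(X > 9) from the survival function, which is 1 - cdf by definition.
p_sf = stats.poisson.sf(9, lam)

print(f"1 - cdf: {p_complement * 100:.2f}%  sf: {p_sf * 100:.2f}%")
```

For tail probabilities this far from zero the two are interchangeable; `sf` mainly helps when the tail is so small that subtracting from 1 loses precision.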


5. Create a new variable called **year_defects** that holds 365 random values drawn from the Poisson(9) distribution.

In [105]:
# Draw 365 simulated daily defect counts, one per day of the year, from the Poisson(9) distribution.
year_defects = stats.poisson.rvs(lam, size = 365)

6. Look at the new dataset **year_defects** to see the first 20 values of the set. 

In [106]:
print(year_defects[0:20])

[ 9 10 12 10  6 10 12 10  8  5  9  5  7 13  9  8 10  9 10 11]
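
Because `rvs` draws random values, the numbers above change on every run. If repeatable results are wanted, one option is to pass a seeded generator through the `random_state` argument. This is a sketch only: the seed value 42 and the name `year_defects_seeded` are arbitrary choices of mine, not part of the original run.

```python
import numpy as np
from scipy import stats

lam = 9  # expected defects per day

# A seeded generator makes the simulated year repeatable (seed is arbitrary).
rng = np.random.default_rng(42)
year_defects_seeded = stats.poisson.rvs(lam, size=365, random_state=rng)

# Show the first 20 simulated days.
print(year_defects_seeded[:20])
```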


7. If I instead expected 7 defects on a given day, how many defects might I see over the course of 365 days? Simulate 365 daily counts from the Poisson(7) distribution.

In [107]:
# Draw 365 simulated daily defect counts from a Poisson distribution with a rate of 7.
expected_defects_7 = stats.poisson.rvs(7, size = 365)

# Print the simulated daily defect counts for the 365 days.
print(expected_defects_7)

[ 9  8  5 17  9  4  3  7  6  5  9 10  8  6 10  2  9  6  9  4  4  7 10  6
  9  9  7  6  6  5  2  6  4 10  7  5  6  8 12  2 10  7 13  6  5  8  9  7
  4  7  3 10  6  6  8 10  6  8  6  6  9  9  7 10  7 11  2  5  4  9  7  8
 11  6  6  9  3  4  5  6  3  0  4  4  4  5 13  3  8  6  6  7 12  6  9  7
  8  4  9  7  8  6  4  7  6  7  7  6 13  8 13  3  8  8 16  5  6  6  4  7
  6  2  5 10  7 10  8  6  7  7  7  6 11  7  9  4  7  9 11  7  4  5  6  8
  5  8  9  6 17  4  6  4  6  4  5 10 11  9  4 11  4  7  5  6  9  6  8  5
  6  8  8 16  8  4  9  6  8  8  6  7  9  8  9  9  2  4  8 10  1  5  5 11
  9  3  3 12 11  6  7  9  5  2  4  5 10  5  6 13  5  9 10  9  7  4  7 10
  4  6  5  7  7  5  6  4  5  9  6  8  4  7  8  5  6  4  7  5  6  4  5  9
  4 10  9  6  5 12 12  6  9  3  6  6 10  9  8  7  5  6  3  9  5  8  5  7
  8  4  8 10  8 16  7  5  8  6  8  9  8  3  5  8  9 10 12  7  5  5  3  9
  7  4  8  7  8  9  9  4  9  5  4  5  5  7  5  7 10  7  9 10  8 12  6  7
  6  5  7  4  2  4  9  7 10 10  6 15  7  9 12  7  9

8. Calculate and print the total sum of the data set of year_defects. How does this compare to the total number of defects we expected over 365 days?

In [108]:
# Sum the simulated daily counts to get the yearly totals for each rate.
year_defects_9_f = '{:.0f}'.format(sum(year_defects))
year_defects_7_f = '{:.0f}'.format(sum(expected_defects_7))
print(f"The total sum of the yearly total defects is approximately {year_defects_9_f} products for expected defects of 9 products per day.")
print(f"The total sum of the yearly total defects for 7 expected defects per day is {year_defects_7_f} defected products over the course of the year.")

The total sum of the yearly total defects is approximately 3263 products for expected defects of 9 products per day.
The total sum of the yearly total defects for 7 expected defects per day is 2559 defected products over the course of the year.
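
For comparison, the expected yearly total is just the daily rate times the number of days. The sketch below also estimates how far a simulated total might reasonably stray, under the assumption (mine, not stated in the scenario) that days are independent, in which case the yearly total is itself Poisson with rate lambda × 365.

```python
import math

days = 365
lam_9, lam_7 = 9, 7  # the two daily rates used above

# Expected yearly totals: rate per day times number of days.
expected_total_9 = lam_9 * days
expected_total_7 = lam_7 * days

# Assuming independent days, the yearly total is Poisson(lam * days),
# so its standard deviation is the square root of its mean.
std_total_9 = math.sqrt(expected_total_9)
std_total_7 = math.sqrt(expected_total_7)

print(f"Poisson(9): expect {expected_total_9} +/- {std_total_9:.0f} defects per year")
print(f"Poisson(7): expect {expected_total_7} +/- {std_total_7:.0f} defects per year")
```

The simulated totals of 3263 and 2559 both land within one standard deviation of these expected values.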


9. Calculate and print the average number of defects per day in the simulated Poisson(9) dataset.

In [109]:
# Calculate the average defects per day directly from the simulated dataset.
avg_defects_per_day_365 = '{:.0f}'.format(np.mean(year_defects))

#Print the avg defects per day of the 365 days with the expected defects of 9 per day. 
print(f"The average defects per day whose expected defects is 9 defects per day throughout the 365 days is {avg_defects_per_day_365} products.")

The average defects per day whose expected defects is 9 defects per day throughout the 365 days is 9 products.
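
The sample average lands close to 9 here because 365 days is a reasonably large sample. To illustrate how the average tends to settle toward lambda as the sample grows, here is a sketch that repeats the simulation for several sample sizes. The sizes and the seed are arbitrary choices of mine.

```python
import numpy as np
from scipy import stats

lam = 9  # expected defects per day
rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability

# Record the sample mean for increasingly large simulated samples.
sample_means = {}
for n_days in (7, 30, 365, 10_000):
    sample = stats.poisson.rvs(lam, size=n_days, random_state=rng)
    sample_means[n_days] = sample.mean()
    print(f"{n_days:>6} days: mean = {sample_means[n_days]:.2f}")
```

Small samples can drift noticeably from 9, while the largest sample should sit very close to it.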


10. Print out the max value of defects in any given day from the year_defects dataset.

In [110]:
print(f"The max defects in any given day of the year of expected 9 defects per day is approximately {year_defects.max()} items.")

The max defects in any given day of the year of expected 9 defects per day is approximately 21 items.


11. Calculate and print the probability of observing more than 7 defects on a given day under the Poisson(7) distribution.

In [111]:
# Find the probability of observing more than 7 defects under the Poisson(7) distribution, as 1 - P(X <= 7).
max_distribution_prob_7 = '{:.2f}'.format((1 - stats.poisson.cdf(7, 7))*100)

#Print the probability below. 
print(f"The probability of observing the value greater than 7 defects is {max_distribution_prob_7}%.")

The probability of observing the value greater than 7 defects is 40.13%.
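
A related question is how unusual the worst simulated day was: the probability of observing the maximum value or more under Poisson(9). Since "at least the maximum" means "more than the maximum minus one", the survival function can be evaluated at max - 1. This sketch hard-codes the maximum of 21 printed in step 10; in the notebook, `year_defects.max()` could be used in its place.

```python
from scipy import stats

lam = 9            # expected defects per day
observed_max = 21  # worst day from the simulated year shown above

# P(X >= observed_max) equals P(X > observed_max - 1),
# which is the survival function evaluated at observed_max - 1.
p_at_least_max = stats.poisson.sf(observed_max - 1, lam)

print(f"P(X >= {observed_max}) = {p_at_least_max * 100:.4f}%")
```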


## 4. Summary ##

In this short notebook lab, I used SciPy's Poisson functions to find the probability of seeing exactly the expected number of defects, fewer than expected, and more than expected, for rates of 9 and 7 defects per day. I also simulated a year of daily defect counts to see how individual days vary around the expected value, while the yearly total and the daily average stay close to what lambda predicts once the sample is large.