# Detecting Product Defects with Probability

This project is about monitoring the number of defective products from a specific factory. The number of defects on a given day follows the Poisson distribution with the rate parameter (lambda) equal to 7. We want to get a feel for what it means to follow the Poisson(7) distribution. As a reminder, the Poisson distribution is special because the rate parameter is also the expected value of the distribution, so in this case the expected value of the Poisson(7) distribution is 7 defects per day.

We will investigate certain attributes of the Poisson(7) distribution to get an intuition for how many defective objects we should expect to see in a given amount of time. We will also practice and apply knowledge about the Poisson distribution on a practice data set that we will simulate ourselves.

In [4]:
# importing necessary libraries
import pandas as pd
import numpy as np
import scipy.stats as stats

# setting options
pd.set_option('display.max_columns', None)
np.set_printoptions(suppress=True, precision = 2)

## Distribution in Theory

In [5]:
# task 1: Create a variable called lam that represents the rate parameter of our distribution
lam = 7

In [6]:
# task 2: Calculate and print the probability of observing exactly lam defects on a given day
# Using the Probability Mass Function
prob_lam = stats.poisson.pmf(lam, lam)
print(f"The probability of observing exactly {lam} defects on a given day is: {prob_lam:.4f}")

The probability of observing exactly 7 defects on a given day is: 0.1490
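
As a cross-check, the same value can be reproduced from the Poisson probability mass function, P(X = k) = lambda^k * e^(-lambda) / k!. The sketch below uses only the standard library; the helper name `poisson_pmf` is illustrative and is not part of the notebook.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed from the closed-form PMF."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 7
prob_lam_manual = poisson_pmf(lam, lam)
print(f"{prob_lam_manual:.4f}")  # agrees with the 0.1490 printed above
```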


In [7]:
# task 3: Calculate and print the probability of having 4 or fewer defects on a given day
# Using the Cumulative Distribution Function
prob_4_or_fewer = stats.poisson.cdf(4, lam)
print(f"The probability of having 4 or fewer defects on a given day is: {prob_4_or_fewer:.4f}")

The probability of having 4 or fewer defects on a given day is: 0.1730


In [8]:
# task 4: Calculate and print the probability of having more than 9 defects on a given day
# Complement rule: P(X > 9) = 1 - P(X <= 9)
prob_more_than_9 = 1 - stats.poisson.cdf(9, lam)
print(f"The probability of having more than 9 defects on a given day is: {prob_more_than_9:.4f}")

The probability of having more than 9 defects on a given day is: 0.1695
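
Both CDF-based results come from summing PMF terms: the CDF at k adds P(X = 0) through P(X = k), and "more than 9" is the complement of "9 or fewer". A minimal standard-library sketch of that reasoning follows; `poisson_pmf` and `poisson_cdf` are hypothetical helper names introduced here for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): sum of the PMF over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 7
p_4_or_fewer = poisson_cdf(4, lam)       # P(X <= 4)
p_more_than_9 = 1 - poisson_cdf(9, lam)  # P(X > 9) = 1 - P(X <= 9)

print(f"P(X <= 4) = {p_4_or_fewer:.4f}")
print(f"P(X > 9)  = {p_more_than_9:.4f}")
```

SciPy also exposes the upper tail directly as `stats.poisson.sf(9, lam)`, which is equivalent to `1 - stats.poisson.cdf(9, lam)`.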


## Distribution in Practice

In [9]:
# task 5: Create a variable called year_defects that has 365 random values from the Poisson distribution
# Using the .rvs() method
year_defects = stats.poisson.rvs(lam, size = 365)
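
Because `.rvs()` draws fresh random values on every run, the numbers printed in the following cells will change between executions. One way to make a run repeatable, assuming a fixed seed is acceptable for the exercise, is to pass `random_state`. The seed value 42 below is arbitrary and was not used to produce the outputs shown in this notebook.

```python
import numpy as np
import scipy.stats as stats

lam = 7
seed = 42  # arbitrary illustrative seed, not the one behind the notebook's outputs

# Passing the same integer seed yields the same 365 draws each time
sample_a = stats.poisson.rvs(lam, size=365, random_state=seed)
sample_b = stats.poisson.rvs(lam, size=365, random_state=seed)

print(np.array_equal(sample_a, sample_b))
```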

In [10]:
# task 6: Print the first 20 values in this data set
print(f"The first 20 values in year_defects data set are:\n{year_defects[:20]}")

The first 20 values in year_defects data set are:
[ 3 11  8  2  5  9  3 11  4  7  6  5  5  5  8 10  7  9  2  7]


In [11]:
# task 7: The total number of defects we would expect over 365 days
total_expect_defects = lam * 365
print(f"The total number of defects we would expect over 365 days is: {total_expect_defects}")

The total number of defects we would expect over 365 days is: 2555


In [12]:
# task 8: Calculate and print the total sum of the data set year_defects
total_defects = np.sum(year_defects)
print(f"The total sum of the data set year_defects is: {total_defects}")

The total sum of the data set year_defects is: 2493


The total number of defects we observed (2493) is less than the expected total number of defects (2555).

In [13]:
# task 9: Calculate and print the average number of defects per day from our simulated dataset
print(f"The average number of defects per day from our simulated dataset is: {year_defects.mean():.2f}")
print(f"The expected value of defects per day is: {lam:.2f}")

The average number of defects per day from our simulated dataset is: 6.83
The expected value of defects per day is: 7.00


The simulated average of 6.83 defects per day is close to the expected value of 7 from the Poisson(7) distribution.
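
To judge how close 6.83 is to 7, it helps to compare the gap with the sampling variability. For a Poisson distribution the variance equals the rate, so the mean of n independent days has a standard error of sqrt(lambda / n). The sketch below plugs in the values printed above; the variable names are illustrative.

```python
import math

lam = 7
n_days = 365
observed_total = 2493  # total printed for task 8 above

observed_mean = observed_total / n_days
standard_error = math.sqrt(lam / n_days)  # Var(X) = lam for a Poisson variable
z_score = (observed_mean - lam) / standard_error

print(f"Standard error of the daily mean: {standard_error:.3f}")
print(f"Observed mean is {z_score:.2f} standard errors from {lam}")
```

A gap within about two standard errors is the kind of difference that random sampling alone typically produces.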

In [14]:
# task 10: Print the maximum value of year_defects
defects_max = np.max(year_defects)
print(f"The maximum value of year_defects is: {defects_max}")

The maximum value of year_defects is: 20


In [15]:
# task 11: Calculate and print the probability of observing that maximum value or more from the Poisson(7) distribution
# P(X >= m) = 1 - P(X <= m - 1), so the CDF is evaluated at defects_max - 1
prob_max_or_more = 1 - stats.poisson.cdf(defects_max - 1, lam)
print(f"The probability of observing that maximum value or more is: {prob_max_or_more:.4f}")

The probability of observing that maximum value or more is: 0.0000
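
A probability that prints as 0.0000 at four decimal places is very small but not exactly zero. The sketch below, using only the standard library and the maximum of 20 printed above, shows the tail probability in scientific notation and the chance of seeing at least one such day in a year of independent days. The helper and variable names are illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 7
defects_max = 20  # maximum printed for task 10 above
n_days = 365

# P(X >= m) = 1 - P(X <= m - 1); range(defects_max) covers 0 .. m - 1
p_max_or_more = 1 - sum(poisson_pmf(k, lam) for k in range(defects_max))

# Chance that at least one of n_days independent days reaches the maximum
p_at_least_once = 1 - (1 - p_max_or_more) ** n_days

print(f"P(X >= {defects_max}) = {p_max_or_more:.2e}")
print(f"P(at least one such day in {n_days} days) = {p_at_least_once:.4f}")
```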


## Extra

task 12: We want to know how many defects in a given day would put us in the 90th percentile of the Poisson(7) distribution.

In [21]:
defects_90_percentile = stats.poisson.ppf(0.9, lam)
print(f"On at least 90% of days, we will observe {defects_90_percentile} or fewer defects")

On at least 90% of days, we will observe 10.0 or fewer defects
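
For a discrete distribution, the percent point function returns the smallest count k whose cumulative probability P(X <= k) reaches the requested level. A minimal standard-library sketch of that definition follows; `poisson_ppf` is a hypothetical helper written for illustration, not SciPy's implementation.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_ppf(q: float, lam: float) -> int:
    """Smallest integer k with P(X <= k) >= q."""
    k = 0
    cumulative = poisson_pmf(0, lam)
    while cumulative < q:
        k += 1
        cumulative += poisson_pmf(k, lam)
    return k

lam = 7
k_90 = poisson_ppf(0.9, lam)
print(k_90)  # 10, the same count that stats.poisson.ppf(0.9, lam) reports as 10.0
```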


In [39]:
# task 13: Calculate the proportion of the simulated dataset year_defects that is greater than or equal to defects_90_percentile
# Comparing the array to a scalar gives a boolean mask; summing it counts the True values
count_90ppf_or_more = np.sum(year_defects >= defects_90_percentile)
print(f"The number of values in the dataset that are greater than or equal to the 90th percentile value is: {count_90ppf_or_more}")

prop_90ppf_or_more = count_90ppf_or_more / len(year_defects)
print(f"The proportion of the simulated dataset year_defects that is greater than or equal to defects_90_percentile is: {prop_90ppf_or_more:.2f}")

The number of values in the dataset that are greater than or equal to the 90th percentile value is: 61
The proportion of the simulated dataset year_defects that is greater than or equal to defects_90_percentile is: 0.17
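
The simulated proportion lands near 0.17 rather than 0.10 because the Poisson distribution is discrete: the value 10 itself carries a sizeable probability, so P(X >= 10) is noticeably larger than P(X > 10). The standard-library sketch below compares the two theoretical tails; the helper names are illustrative.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): sum of the PMF over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 7
threshold = 10  # 90th percentile value printed for task 12 above

p_ge_threshold = 1 - poisson_cdf(threshold - 1, lam)  # P(X >= 10)
p_gt_threshold = 1 - poisson_cdf(threshold, lam)      # P(X > 10)

print(f"P(X >= {threshold}) = {p_ge_threshold:.4f}")
print(f"P(X >  {threshold}) = {p_gt_threshold:.4f}")
```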
