# Product Defects

You are in charge of monitoring the number of defective products from a specific factory. You’ve been told that the number of defects on a given day follows the Poisson distribution with the rate parameter (lambda) equal to 7. You’re new here, so you want to get a feel for what it means to follow the Poisson(7) distribution. You remember that the Poisson distribution is special because the rate parameter represents the expected value of the distribution, so in this case, the expected value of the Poisson(7) distribution is 7 defects per day.

You will investigate certain attributes of the Poisson(7) distribution to get an intuition for how many defective objects you should expect to see in a given amount of time. You will also practice and apply what you know about the Poisson distribution on a practice data set that you will simulate yourself.


## Tasks

### Distribution in Theory

1. Create a variable called `lam` that represents the rate parameter of our distribution.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If the average number of defects was 9, we would do:

    ```python
    lam = 9
    ```
    </details>

2. You know that the rate parameter of a Poisson distribution is equal to the expected value. So in our factory, the rate parameter would equal the expected number of defects on a given day. You are curious about how often we might observe the exact expected number of defects.

    Calculate and print the probability of observing exactly `lam` defects on a given day.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    We could do the following to calculate the probability of having exactly `val` defects on a given day:

    ```python
    print(stats.poisson.pmf(val, val))
    ```
    </details>

3. Our boss said that having 4 or fewer defects on a given day is an exceptionally good day. You are curious about how often that might happen.

    Calculate and print the probability of having one of these days.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    The probability of having 2 or fewer defects can be calculated by doing:

    ```python
    stats.poisson.cdf(2, lam)
    ```
    </details>

4. On the other hand, our boss said that having more than 9 defects on any given day is considered a bad day.

    Calculate and print the probability of having one of these bad days.

    <details>
        <summary>Stuck? Get a hint</summary>

    The probability of having more than 6 defects can be calculated by doing:

    ```python
    1 - stats.poisson.cdf(6, lam)
    ```
    </details>
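
The hints above rely on three related calculations: the probability of an exact count, the probability of a count or fewer, and the probability of more than a count. As a rough sketch of how they fit together, the snippet below compares a hand-written Poisson formula, `exp(-rate) * rate**k / k!`, with `stats.poisson.pmf()`, rebuilds a CDF value by summing PMF values, and checks that `stats.poisson.sf()` (the survival function) matches `1 - cdf`. The names `example_rate` and `poisson_pmf_by_hand` are made up for illustration, and the rate of 9 is a hypothetical value rather than the factory's rate.

```python
import math
import scipy.stats as stats

example_rate = 9  # hypothetical rate, chosen only for illustration


def poisson_pmf_by_hand(k, rate):
    """Poisson probability of exactly k events: exp(-rate) * rate**k / k!"""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Exactly 9 defects: the formula and scipy should agree.
pmf_manual = poisson_pmf_by_hand(9, example_rate)
pmf_scipy = stats.poisson.pmf(9, example_rate)
print(pmf_manual, pmf_scipy)

# 2 or fewer defects: the CDF is the sum of the PMF at 0, 1, and 2.
cdf_manual = sum(poisson_pmf_by_hand(k, example_rate) for k in range(3))
cdf_scipy = stats.poisson.cdf(2, example_rate)
print(cdf_manual, cdf_scipy)

# More than 6 defects: 1 - cdf(6) and sf(6) describe the same tail.
tail_complement = 1 - stats.poisson.cdf(6, example_rate)
tail_sf = stats.poisson.sf(6, example_rate)
print(tail_complement, tail_sf)
```

Each pair of printed values should match up to floating-point rounding. Note that `1 - cdf(6)` and `sf(6)` both exclude 6 itself, which is why they answer "more than 6" rather than "6 or more".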

### Distribution in Practice

5. You’ve familiarized yourself a little with how the Poisson distribution works in theory by calculating different probabilities. Now let’s see what this might look like in practice.

    Create a variable called `year_defects` that has 365 random values from the Poisson distribution.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If we wanted to simulate 30 days' worth of data, we could use the following:

    ```python
    month_defects = stats.poisson.rvs(lam, size = 30)
    ```
    </details>

6. Let’s take a look at our new dataset. Print the first 20 values in this data set.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    We can print the first 7 values from `year_defects` by doing:

    ```python
    print(year_defects[0:7])
    ```
    </details>

7. If we expect 7 defects on a given day, what is the total number of defects we would expect over 365 days?

    Calculate and print this value to the output terminal.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If we expect `lam` defects on a given day, the number of defects we would expect over a 30-day month would be:

    ```python
    print(lam * 30)
    ```
    </details>

8. Calculate and print the total sum of the data set `year_defects`. How does this compare to the total number of defects we expected over 365 days?

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If we had data for a month in a variable called `month_defects`, we could count the total number of defects by doing:

    ```python
    sum(month_defects)
    ```
    </details>

9. Calculate and print the average number of defects per day from our simulated dataset.

    How does this compare to the expected average number of defects each day that we know from the given rate parameter of the Poisson distribution?

    <details>
        <summary>Stuck? Get a hint</summary>
    
    We can use the `np.mean()` function to calculate the average number of defects across all days. The resulting value should be pretty close to the expected average from the Poisson distribution.
    </details>

10. You’re worried about what the highest number of defects in a single day might be because that would be a hectic day.

    Print the maximum value of `year_defects`.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If the data set were called `month_defects`, we could print the maximum value by using:

    ```python
    print(month_defects.max())
    ```
    </details>

11. Wow, it would probably be super busy if there were that many defects on a single day. Hopefully, it is a rare event!

    Calculate and print the probability of observing that maximum value or more from the Poisson(7) distribution.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If our maximum value is 30 from a Poisson(10) distribution, we could calculate the probability of observing that value or greater using:

    ```python
    1 - stats.poisson.cdf(29, 10)
    ```

    Note that we pass 1 less than the maximum value. `stats.poisson.cdf(29, 10)` is the probability of 29 or fewer defects, so subtracting it from 1 leaves the probability of 30 or more, which includes the maximum value itself.
    </details>
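
To tie the simulation steps together, here is a small sketch using a hypothetical Poisson(10) month instead of the factory's data. It assumes the optional `random_state` argument of `stats.poisson.rvs()` to make the simulated values repeatable, looks at how the sample mean behaves for a few sample sizes, and checks the "1 less than the maximum" rule from the last hint. The names `example_rate`, `month_again`, and `sample_means` are illustrative only.

```python
import numpy as np
import scipy.stats as stats

example_rate = 10  # hypothetical rate, not the factory's rate

# Passing the same random_state reproduces the same simulated month.
month_defects = stats.poisson.rvs(example_rate, size=30, random_state=0)
month_again = stats.poisson.rvs(example_rate, size=30, random_state=0)
same_draws = np.array_equal(month_defects, month_again)
print(same_draws)

# Sample means tend to settle near the rate as the sample grows.
sample_means = {}
for n_days in (30, 365, 100_000):
    sample = stats.poisson.rvs(example_rate, size=n_days, random_state=1)
    sample_means[n_days] = sample.mean()
    print(n_days, sample_means[n_days])

# "Maximum or more" needs the CDF evaluated at one below the maximum.
month_max = int(month_defects.max())
p_at_least_max = 1 - stats.poisson.cdf(month_max - 1, example_rate)
p_above_max = 1 - stats.poisson.cdf(month_max, example_rate)
p_exactly_max = stats.poisson.pmf(month_max, example_rate)

# The two tails differ by exactly the probability of the maximum itself.
print(p_at_least_max, p_above_max + p_exactly_max)
```

Without `random_state`, every run produces a different data set, which is why the simulated totals and averages in this project will not match anyone else's exactly.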

### Extra

12. Congratulations! At this point, you have explored the Poisson distribution and even worked with some simulated data. We have a couple of extra tasks if you would like an extra challenge. Feel free to try them out or move on to the next topic!

    Let’s say we want to know how many defects in a given day would put us in the 90th percentile of the Poisson(7) distribution. One way we could calculate this is by using the following method:

    ```python
    stats.poisson.ppf(percentile, lam)
    ```

    `percentile` is equal to the desired percentile (a decimal between 0 and 1), and `lam` is the rate parameter of the Poisson distribution. This function is essentially the inverse of the CDF: it returns the smallest number of defects whose CDF value is at least `percentile`.

    Use this method to calculate and print the number of defects that would put us in the 90th percentile for a given day. In other words, on at least 90% of days, we will observe this number of defects or fewer.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If we wanted to calculate what value puts us in the 65th percentile of the Poisson(9) distribution, we could use:

    ```python
    stats.poisson.ppf(0.65, 9)
    ```

    Remember to use a print statement.
    </details>

13. Now let’s see what proportion of our simulated dataset `year_defects` is greater than or equal to the number we calculated in the previous step.

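    By definition of a percentile, we might expect roughly 1 - 0.90, or about 10%, of days to be in this range. Because the Poisson distribution is discrete and this range includes the percentile value itself, the true probability is somewhat larger: about 17% for the Poisson(7) distribution.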

    To calculate this:

    1. Count the number of values in the dataset that are greater than or equal to the 90th percentile value.
    2. Divide this number by the length of the dataset.

    Click the hint if you want to see an example calculation.

    <details>
        <summary>Stuck? Get a hint</summary>
    
    If the value that would put us in the 90th percentile is 20, we could use:

    ```python
    sum(year_defects >= 20)/len(year_defects)
    ```

    The numerator tells us how many values in `year_defects` are greater than or equal to 20. Then we divide this value by the number of observations in the data set, which is 365, to get the proportion of days that have 20 or more defects.
    </details>
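
Because the Poisson distribution only takes whole-number values, the percentile value rarely splits the distribution at exactly the requested probability. The sketch below illustrates this with the hypothetical Poisson(9) distribution and 65th percentile from the hint; the names `example_rate`, `q`, and `k` are illustrative only. It checks that the value returned by `ppf` is the smallest count whose CDF reaches `q`, and that the probability of seeing that count or more is larger than `1 - q`.

```python
import scipy.stats as stats

example_rate = 9  # hypothetical rate, taken from the hint above
q = 0.65          # hypothetical percentile, taken from the hint above

# ppf returns a float, so convert it to an int for readability.
k = int(stats.poisson.ppf(q, example_rate))

# k is the smallest count whose CDF is at least q.
cdf_at_k = stats.poisson.cdf(k, example_rate)
cdf_below_k = stats.poisson.cdf(k - 1, example_rate)
print(k, cdf_below_k, cdf_at_k)

# Probability of k or more defects: sf(k - 1) is the same as 1 - cdf(k - 1).
p_at_least_k = stats.poisson.sf(k - 1, example_rate)
print(p_at_least_k, 1 - q)
```

The same reasoning applies to a simulated data set: the proportion of days at or above the percentile value should land near `p_at_least_k`, not near `1 - q`.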


```python
import scipy.stats as stats
import numpy as np

### Task Group 1 ###
## Task 1:

## Task 2:

## Task 3:

## Task 4:


### Task Group 2 ###
## Task 5:

## Task 6:

## Task 7:

## Task 8:

## Task 9:

## Task 10:

## Task 11:


### Extra Bonus ###
# Task 12

# Task 13
```

### Solution

```python
import scipy.stats as stats
import numpy as np

### Task Group 1 ###
## Task 1:
lam = 7

## Task 2:
print(stats.poisson.pmf(lam, lam))

## Task 3:
print(stats.poisson.cdf(4, lam))

## Task 4:
print(1 - stats.poisson.cdf(9, lam))


### Task Group 2 ###
## Task 5:
year_defects = stats.poisson.rvs(lam, size=365)

## Task 6:
print(year_defects[:20])

## Task 7:
print(365 * lam)

## Task 8:
print(sum(year_defects))

## Task 9:
print(year_defects.mean())

## Task 10:
print(year_defects.max())

## Task 11:
print(1 - stats.poisson.cdf(year_defects.max() - 1, lam))


### Extra Bonus ###
# Task 12
print(stats.poisson.ppf(0.9, lam))

# Task 13
print(sum(year_defects >= stats.poisson.ppf(0.9, lam)) / len(year_defects))
```

```
0.14900277967433773
0.17299160788207146
0.16950406276132668
[ 5  5  3 13  5  7  5 10  4  4  6  5  8  9 10  5  4  9  5  4]
2555
2583
7.076712328767123
16
0.0024065803473980463
10.0
0.19452054794520549
```
