
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


1. A company claims that their email marketing campaign has a 15% click-through rate. If you randomly select 100 people to receive the email, what is the probability that exactly 20 will click through to the website?
2. A researcher is investigating whether a new medication improves patient outcomes. The medication has a success rate of 75%. If the researcher enrolls 50 patients in the study, what is the probability that fewer than 35 will have a positive outcome?

3. A website offers a premium subscription service with a 20% sign-up rate. If you randomly select 500 visitors to the website, what is the probability that between 90 and 110 will sign up for the premium service?

4. A school district is investigating the effectiveness of a new reading program. The program has a success rate of 70%. If the district enrolls 200 students in the program, what is the probability that more than 140 will show significant improvement in reading skills?


5. A factory produces electronic components with a defect rate of 5%. If a shipment of 200 components is sent out, what is the probability that fewer than 10 will be defective?

6. A survey shows that 70% of people prefer chocolate ice cream over vanilla ice cream. If you randomly survey one person, what is the probability that they prefer vanilla ice cream?


7. A software company releases a new product with a bug rate of 2%. If 10,000 copies of the product are sold, what is the probability that at least 250 will have a bug?

8. According to data from the National Center for Health Statistics (NCHS), the average height for adult men aged 20 years and over in the United States is approximately 69.2 inches with a standard deviation of approximately 2.9 inches. If you randomly select a sample of 50 adult men aged 20 years and over, what is the probability that the sample mean height is greater than 70 inches?


Central Limit Theorem:
9. A company claims that the average salary of its employees is 75,000 with a standard deviation of 10,000. If you randomly select 100 employees, what is the probability that the sample mean salary is less than 72,500?

10. A restaurant claims that the average wait time for a table is 15 minutes with a standard deviation of 3 minutes. If you randomly survey 50 customers, what is the probability that the sample mean wait time is greater than 16 minutes?

Bernoulli and Binomial Distributions:


1. A company claims that their email marketing campaign has a 15% click-through rate. If you randomly select 100 people to receive the email, what is the probability that exactly 20 will click through to the website?

- Solution 1:

This is a binomial probability problem. The probability of exactly 20 people clicking through to the website out of 100 people who received the email can be calculated using the binomial formula:

`P(X=k) = (n choose k) * p^k * (1-p)^(n-k)`

where n is the number of trials (100 in this case), k is the number of successes (20 in this case), and p is the probability of success on a single trial (0.15 in this case).

Substituting these values into the formula, we get:

`P(X=20) = (100 choose 20) * 0.15^20 * 0.85^80 ≈ 0.04`

So, the probability that exactly 20 out of 100 people will click through to the website is approximately 0.04 or about 4%.

In [4]:
from scipy.stats import binom
from scipy.special import comb

n = 100
k = 20
p = 0.15

# binom.pmf(k, n, p) returns P(X = k): the probability of exactly k successes
# in n independent Bernoulli trials, each with success probability p.
# This is the probability mass function of the binomial distribution.
prob = binom.pmf(k, n, p)
print(prob)

# MANUAL METHOD
# P(x = 20) = (100 choose 20) * 0.15^20 * 0.85^80
result = comb(n, k, exact = True) * ((p)**k) * (1 - p)**(n-k)
print(result)

0.04022449066141772
0.04022449066141756


2. A researcher is investigating whether a new medication improves patient outcomes. The medication has a success rate of 75%. If the researcher enrolls 50 patients in the study, what is the probability that fewer than 35 will have a positive outcome?

In [7]:
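n = 50
k = 35          # "Fewer than 35" means X <= 34
p = 0.75

# P(X < 35) = P(X <= 34), so evaluate the CDF at k - 1.
# binom.pmf(k, n, p) would give P(X = 35) instead, which answers a different question.
prob = binom.cdf(k - 1, n, p)
print(prob)

# Manual approach: sum the PMF terms for 0, 1, ..., 34
result = sum(comb(n, i, exact = True) * (p**i) * (1-p)**(n-i) for i in range(k))
print(result)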


3. A website offers a premium subscription service with a 20% sign-up rate. If you randomly select 500 visitors to the website, what is the probability that between 90 and 110 will sign up for the premium service?


- Solution 3:


This is another binomial probability problem. The probability of between 90 and 110 visitors signing up for the premium service out of 500 visitors to the website can be calculated using the cumulative distribution function (CDF) of the binomial distribution.

The CDF gives the probability that the number of successes in `n` independent trials is less than or equal to a given value `k`. In this case, `n` is the number of visitors to the website (500), `k1` is the minimum number of visitors who sign up for the premium service (90), `k2` is the maximum number of visitors who sign up for the premium service (110), and `p` is the probability of success on a single trial (0.20).

The probability that between `k1` and `k2` visitors will sign up for the premium service out of `n` visitors to the website can be calculated as:

`P(k1 ≤ X ≤ k2) = P(X ≤ k2) - P(X < k1) = F(k2) - F(k1-1)`

where F(k) is the CDF of the binomial distribution with parameters `n` and `p` at `k`.

Here's a Python code snippet that uses the `scipy.stats.binom` distribution object to calculate this probability:



In [12]:
n = 500
k1 = 90
k2 = 110
p = 0.20

prob = binom.cdf(k2, n, p) - binom.cdf(k1-1, n, p)
print(prob)

# MANUAL
def binom_cdf_manual(k, n, p):
    """Return P(X <= k) by summing the binomial PMF from 0 to k."""
    result = 0
    for i in range(k + 1):
        result += comb(n, i, exact = True) * (p**i) * (1 - p)**(n - i)
    return result
print(binom_cdf_manual(k2, n, p) - binom_cdf_manual(k1 - 1, n, p))

0.759748160785157
0.7597481607851779
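
As a cross-check on the CDF difference above, the same interval probability can be obtained by summing the PMF over every integer from 90 to 110. This is a minimal sketch; the names `ks`, `prob_sum`, and `prob_cdf` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import binom

n, p = 500, 0.20
k1, k2 = 90, 110

# Every integer count in the closed interval [k1, k2]
ks = np.arange(k1, k2 + 1)

# Sum of P(X = k) over the interval
prob_sum = binom.pmf(ks, n, p).sum()

# Same quantity via the CDF: F(k2) - F(k1 - 1)
prob_cdf = binom.cdf(k2, n, p) - binom.cdf(k1 - 1, n, p)

print(prob_sum, prob_cdf)
```

Both numbers should agree up to floating-point rounding, which confirms that subtracting `F(k1 - 1)` rather than `F(k1)` keeps the endpoint 90 inside the interval.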


4. A school district is investigating the effectiveness of a new reading program. The program has a success rate of 70%. If the district enrolls 200 students in the program, what is the probability that more than 140 will show significant improvement in reading skills?

In [15]:
n = 200
k1 = 140        # "More than 140" means X >= 141
p = 0.70

# P(X > 140) = 1 - P(X <= 140)
prob = 1 - binom.cdf(k1, n, p)
print(prob)

0.4733474593659296
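
An upper-tail probability such as "more than 140" can also be written with the survival function, `binom.sf(k, n, p)`, which returns `P(X > k)` directly. The sketch below compares the two forms; the variable names are illustrative.

```python
from scipy.stats import binom

n, p, k = 200, 0.70, 140

# Complement of the CDF: 1 - P(X <= 140)
prob_cdf = 1 - binom.cdf(k, n, p)

# Survival function: P(X > 140), computed directly
prob_sf = binom.sf(k, n, p)

print(prob_cdf, prob_sf)
```

The two results match here. For very small tail probabilities, `sf` tends to be the safer choice because it avoids subtracting a value close to 1 from 1.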



5. A factory produces electronic components with a defect rate of 5%. If a shipment of 200 components is sent out, what is the probability that fewer than 10 will be defective?

In [17]:
n = 200
# k = 10
k = 9          # Fewer than 10
p = 0.05

prob = binom.cdf(k, n, p)
print(prob)

0.45470980868081556


6. A survey shows that 70% of people prefer chocolate ice cream over vanilla ice cream. If you randomly survey one person, what is the probability that they prefer vanilla ice cream?

- Solution 6:
This is a Bernoulli trial with p = 0.7. The probability of the person preferring vanilla ice cream is 1 - p = 0.3.
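
The same answer can be read off `scipy.stats.bernoulli`. The sketch below assumes the outcome is encoded as 1 for chocolate and 0 for vanilla, and that every respondent prefers one of the two; the encoding and variable names are illustrative choices, not part of the original notebook.

```python
from scipy.stats import bernoulli

p_chocolate = 0.7  # P(outcome = 1), i.e. the person prefers chocolate

# ASSUMPTION: vanilla is encoded as outcome 0, so P(vanilla) = P(X = 0) = 1 - p
p_vanilla = bernoulli.pmf(0, p_chocolate)

print(p_vanilla)
```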

7. A software company releases a new product with a bug rate of 2%. If 10,000 copies of the product are sold, what is the probability that at least 250 will have a bug?

### ✅ Correct Interpretation for “At least 250”

### **Meaning of the phrase**

> **“At least 250”** means `X ≥ 250`.

For a **discrete** random variable such as the binomial:

`P(X ≥ 250) = 1 - P(X ≤ 249)`

So the correct expression is:

`P(X ≥ 250) = 1 - binom.cdf(249, n, p)`

---

### ❌ Why `1 - binom.cdf(250, n, p)` is incorrect

If you compute:

```python
1 - binom.cdf(250, n, p)
```

You are calculating:

`1 - P(X ≤ 250) = P(X ≥ 251)`

This **excludes 250**, so the probability is **shifted by one** and therefore incorrect.

---

### ✅ Correct Code (Using k = 250)

### ✔️ Method 1 — Using CDF

```python
prob = 1 - binom.cdf(k - 1, n, p)
# Computes: 1 - P(X <= 249) = P(X >= 250)
```

### ✔️ Method 2 — Using Survival Function (Cleaner)

```python
prob = binom.sf(k - 1, n, p)
# Directly computes P(X >= 250)
```


In [23]:
n = 10000
k = 250         # At least 250
p = 0.02

# 1
# prob = 1 - binom.cdf(k, n, p)   INCORRECT

# 2
# prob = 1 - binom.cdf(k-1, n, p)
# OR
prob = binom.sf(k - 1, n, p)
print(prob)

0.0003167183372775443
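
To see the size of the off-by-one error, the sketch below computes both tails and checks that they differ by exactly the probability of the excluded value, `P(X = 250)`. The variable names are illustrative.

```python
from scipy.stats import binom

n, p, k = 10000, 0.02, 250

at_least_k = binom.sf(k - 1, n, p)   # P(X >= 250), the intended quantity
more_than_k = binom.sf(k, n, p)      # P(X >= 251), the off-by-one version

# The gap between the two tails is the mass at k itself
gap = at_least_k - more_than_k
pmf_k = binom.pmf(k, n, p)

print(at_least_k, more_than_k)
print(gap, pmf_k)
```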


8. According to data from the National Center for Health Statistics (NCHS), the average height for adult men aged 20 years and over in the United States is approximately 69.2 inches with a standard deviation of approximately 2.9 inches. If you randomly select a sample of 50 adult men aged 20 years and over, what is the probability that the sample mean height is greater than 70 inches?

- Solution 8:
This problem can be solved using the central limit theorem, which states that the distribution of sample means approaches a normal distribution as the sample size increases, whatever the shape of the population distribution, provided the population has a finite variance.

In this case, the `population mean height is 69.2 inches` and the `population standard deviation is 2.9 inches.` For a random sample of 50 adult men, the sample mean height is approximately normally distributed, with mean equal to the population mean (69.2) and standard deviation (the standard error of the mean) equal to the population standard deviation divided by the square root of the sample size: `2.9 / sqrt(50) ≈ 0.41`.

We can use this to calculate the `probability that the sample mean height is greater than 70 inches.` This is the probability that a standard normal variable Z is greater than (70 - 69.2) / 0.41 ≈ 1.95.

---

Z = (X̄ − μ) / (σ / √n)
<br>= (70 − 69.2) / (2.9 / √50)
<br>= (70 − 69.2) / 0.41
<br>= 0.8 / 0.41
<br>≈ 1.95

---


Using a standard normal table or a mathematical software package such as Python’s scipy library, we can find that this probability is approximately 0.026 or about 2.6%.

In [24]:
from scipy.stats import norm

mu = 69.2
sigma = 2.9
n = 50
x = 70

z = (x - mu) / (sigma/(n ** 0.5))

# Greater than 70
prob = 1 - norm.cdf(z)
print(prob)

0.025549978630102443
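
The analytic result can be sanity-checked with a small Monte Carlo simulation. This is only a sketch: the problem does not state the shape of the height distribution, so the code assumes a normal population, and the repetition count and seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, n = 69.2, 2.9, 50
reps = 100_000  # number of simulated samples (arbitrary choice)

rng = np.random.default_rng(0)

# ASSUMPTION: individual heights follow a normal population distribution
samples = rng.normal(mu, sigma, size=(reps, n))

# One sample mean per simulated sample of 50 men
sample_means = samples.mean(axis=1)

# Fraction of sample means above 70 inches
simulated = (sample_means > 70).mean()

# Analytic value from the normal sampling distribution of the mean
analytic = norm.sf((70 - mu) / (sigma / np.sqrt(n)))

print(simulated, analytic)
```

The simulated fraction should land close to the analytic value, with the remaining difference due to sampling noise in the simulation.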


Central Limit Theorem:
9. A company claims that the average salary of its employees is 75,000 with a standard deviation of 10,000. If you randomly select 100 employees, what is the probability that the sample mean salary is less than 72,500?

In [25]:
x = 72500
mu = 75000
sigma = 10000
n = 100

z = (x - mu) / (sigma/(n ** 0.5))
prob = norm.cdf(z)
print(prob)

0.006209665325776132


10. A restaurant claims that the average wait time for a table is 15 minutes with a standard deviation of 3 minutes. If you randomly survey 50 customers, what is the probability that the sample mean wait time is greater than 16 minutes?

In [27]:
mu = 15
x = 16
sigma = 3
n = 50

z = (x - mu) / (sigma/(n ** 0.5))
prob = 1 - norm.cdf(z)
print(prob)

0.009211062727049524
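
Questions 8 to 10 all follow the same pattern, so the calculation can be wrapped in one small helper. The function `sample_mean_prob` and its `tail` parameter are hypothetical names introduced here for illustration; the sketch relies on the same normal approximation for the sample mean used above.

```python
from math import sqrt
from scipy.stats import norm

def sample_mean_prob(x, mu, sigma, n, tail="lower"):
    """Approximate P(sample mean < x) for tail='lower' or P(sample mean > x) for tail='upper'."""
    # Sampling distribution of the mean: normal with standard error sigma / sqrt(n)
    sampling_dist = norm(loc=mu, scale=sigma / sqrt(n))
    if tail == "lower":
        return sampling_dist.cdf(x)
    if tail == "upper":
        return sampling_dist.sf(x)
    raise ValueError("tail must be 'lower' or 'upper'")

q8 = sample_mean_prob(70, 69.2, 2.9, 50, tail="upper")
q9 = sample_mean_prob(72500, 75000, 10000, 100, tail="lower")
q10 = sample_mean_prob(16, 15, 3, 50, tail="upper")

print(q8, q9, q10)
```

Passing `loc` and `scale` to `norm` removes the explicit z-score step, since SciPy standardizes internally.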
