# Common Probability Distributions: Detailed Explanations

This section explains each distribution in detail: its theoretical background, its real-world applications, and its connection to the examples in this notebook.

---

## 1. Binomial Distribution

### Definition
The binomial distribution models the number of successes in a fixed number of trials, where:
- Each trial is independent.
- Each trial results in one of two possible outcomes (success or failure).
- The probability of success remains constant across trials.

### Parameters
- **n**: Number of trials.
- **p**: Probability of success in each trial.
- **X**: Not a parameter but the random variable itself: the number of successes observed, taking values 0, 1, ..., n.

### Probability Mass Function (PMF)
P(X = k) = C(n, k) * (p^k) * (1 - p)^(n - k)

Where:
- C(n, k) = n! / (k! * (n - k)!)
- k: Number of successes
- n: Number of trials
- p: Probability of success

### Real-World Applications
1. **Quality Control**: Number of defective items in a batch.
2. **Survey Responses**: Number of people agreeing to a statement in a poll.
3. **Sports**: Number of successful free throws in basketball.

### Notebook Example: Product Quality Testing
- **Scenario**: Testing 20 light bulbs with a 5% defect rate.
- **Goal**:
  - Compute P(X = 2): Probability of finding exactly 2 defective bulbs.
  - Compute P(X ≤ 3): Cumulative probability of finding at most 3 defective bulbs.
- **Significance**: Helps predict the likelihood of defects during quality control checks.
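
The two probabilities in this example can be computed directly from the PMF above. The following is a minimal sketch using only the Python standard library; the helper names `binomial_pmf` and `binomial_cdf` are illustrative and are not taken from the notebook, which may well use a library such as `scipy.stats` instead.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), straight from the PMF formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k), obtained by summing the PMF over 0, 1, ..., k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Values from the scenario: 20 bulbs, 5% defect rate
n_bulbs, defect_rate = 20, 0.05

p_exactly_2 = binomial_pmf(2, n_bulbs, defect_rate)
p_at_most_3 = binomial_cdf(3, n_bulbs, defect_rate)

print(f"P(X = 2)  = {p_exactly_2:.4f}")  # about 0.1887
print(f"P(X <= 3) = {p_at_most_3:.4f}")  # about 0.9841
```

If SciPy is available, `scipy.stats.binom.pmf(2, 20, 0.05)` and `scipy.stats.binom.cdf(3, 20, 0.05)` should give the same values.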

---

## 2. Poisson Distribution

### Definition
The Poisson distribution models the number of events occurring in a fixed interval of time or space, assuming:
- Events occur independently.
- Events occur at a constant average rate (λ).

### Parameters
- **λ**: The average number of events in the interval.

### Probability Mass Function (PMF)
P(X = k) = (λ^k * e^(-λ)) / k!

Where:
- λ: Average rate of events
- k: Number of events
- e: Euler's number (~2.718)

### Real-World Applications
1. **Call Centers**: Number of customer calls received in an hour.
2. **Website Traffic**: Number of user complaints or requests per day.
3. **Manufacturing**: Number of machine failures in a shift.

### Notebook Example: Website Traffic
- **Scenario**: Average of 3 complaints per day.
- **Goal**:
  - Compute P(X = 5): Probability of receiving exactly 5 complaints.
  - Compute P(X ≤ 4): Cumulative probability of receiving at most 4 complaints.
- **Significance**: Helps determine staffing needs and prioritize resource allocation.
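
As a rough illustration, the Poisson PMF above translates into a few lines of standard-library Python. The function names `poisson_pmf` and `poisson_cdf` are assumptions made for this sketch, not names from the notebook.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), obtained by summing the PMF over 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Value from the scenario: an average of 3 complaints per day
avg_complaints = 3

p_exactly_5 = poisson_pmf(5, avg_complaints)
p_at_most_4 = poisson_cdf(4, avg_complaints)

print(f"P(X = 5)  = {p_exactly_5:.4f}")  # about 0.1008
print(f"P(X <= 4) = {p_at_most_4:.4f}")  # about 0.8153
```

With SciPy installed, `scipy.stats.poisson.pmf(5, 3)` and `scipy.stats.poisson.cdf(4, 3)` are the equivalent calls.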

---

## 3. Normal Distribution

### Definition
The normal distribution is a continuous probability distribution that models data clustered around a mean value. It is symmetric about the mean, so the mean, median, and mode are all equal.

### Parameters
- **μ**: Mean (central value).
- **σ**: Standard deviation (spread or variability).

### Probability Density Function (PDF)
f(x) = (1 / (σ * sqrt(2 * π))) * e^(-(x - μ)^2 / (2 * σ^2))

Where:
- μ: Mean
- σ: Standard deviation
- x: Value of the random variable
- π: Pi (~3.14159)
- e: Euler's number

### Real-World Applications
1. **Human Characteristics**: Heights, weights, IQ scores.
2. **Employee Productivity**: Task completion times.
3. **Stock Market**: Daily stock returns (a common approximation; real returns tend to have heavier tails).

### Notebook Example: Employee Productivity
- **Scenario**: Average task time of 50 minutes with a standard deviation of 10 minutes.
- **Goal**:
  - Compute P(40 ≤ X ≤ 60): Probability of task completion between 40 and 60 minutes.
  - Compute P(X > 70): Probability of task taking more than 70 minutes.
- **Significance**: Helps in workload planning and understanding typical task durations.
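
Range probabilities for a normal distribution come from differences of its cumulative distribution function (CDF) rather than from the PDF directly. One way to sketch this example, assuming Python 3.8 or later, is with `statistics.NormalDist` from the standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Values from the scenario: mean 50 minutes, standard deviation 10 minutes
task_time = NormalDist(mu=50, sigma=10)

# P(40 <= X <= 60) is the CDF at 60 minus the CDF at 40
p_between_40_and_60 = task_time.cdf(60) - task_time.cdf(40)

# P(X > 70) is the complement of the CDF at 70
p_over_70 = 1 - task_time.cdf(70)

print(f"P(40 <= X <= 60) = {p_between_40_and_60:.4f}")  # about 0.6827
print(f"P(X > 70)        = {p_over_70:.4f}")            # about 0.0228
```

The first result is the familiar "about 68% within one standard deviation" rule, since 40 and 60 are exactly μ − σ and μ + σ. The SciPy equivalent would be `scipy.stats.norm.cdf(x, loc=50, scale=10)`.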

---

## 4. Exponential Distribution

### Definition
The exponential distribution models the time between events in a Poisson process. It is often used to model waiting times and assumes that:
- Events occur continuously and independently.
- The rate (λ) of events is constant.

### Parameters
- **λ**: The rate of events per unit time.

### Probability Density Function (PDF)
f(x) = λ * e^(-λ * x), for x ≥ 0

### Cumulative Distribution Function (CDF)
F(x) = 1 - e^(-λ * x), for x ≥ 0
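
For reference, the CDF follows from integrating the PDF from 0 to x, and its complement gives the probability of waiting longer than x:

$$
F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x},
\qquad
P(X > x) = 1 - F(x) = e^{-\lambda x}.
$$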

### Real-World Applications
1. **Customer Service**: Time between customer arrivals at a counter.
2. **Network Traffic**: Time between data packet arrivals.
3. **Manufacturing**: Time until a machine breaks down.

### Notebook Example: Customer Support Response Times
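- **Scenario**: Average response time of 10 minutes per ticket. Because the mean of an exponential distribution is 1/λ, the rate is λ = 1/10 = 0.1 per minute.
- **Goal**:
  - Compute P(X ≤ 5): Probability of responding within 5 minutes.
  - Compute P(X > 15): Probability that a response takes more than 15 minutes.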
- **Significance**: Helps in understanding and improving service efficiency.
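
A minimal sketch of this example using the CDF above. It assumes the 10-minute average is the mean of the distribution, so the rate is λ = 1/10 per minute; the helper name `exponential_cdf` is illustrative and does not come from the notebook.

```python
from math import exp


def exponential_cdf(x: float, lam: float) -> float:
    """P(X <= x) for X ~ Exponential(rate=lam); zero for negative x."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


# Scenario: mean response time of 10 minutes, so rate = 1 / mean
mean_response_minutes = 10
rate = 1 / mean_response_minutes

p_within_5 = exponential_cdf(5, rate)
p_over_15 = 1 - exponential_cdf(15, rate)  # equals exp(-rate * 15)

print(f"P(X <= 5) = {p_within_5:.4f}")  # about 0.3935
print(f"P(X > 15) = {p_over_15:.4f}")   # about 0.2231
```

Note that SciPy parameterizes this distribution by its scale (the mean, 1/λ) rather than by the rate, so the equivalent call would be `scipy.stats.expon.cdf(5, scale=10)`.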

---

## Summary of Distributions

| Distribution    | Notebook Example   | Quantity Modeled                                |
|-----------------|--------------------|-------------------------------------------------|
| **Binomial**    | Defective bulbs    | Number of successes in a fixed number of trials |
| **Poisson**     | Website complaints | Count of events in a fixed interval             |
| **Normal**      | Task times         | Continuous values clustered around a mean       |
| **Exponential** | Response times     | Waiting time until the next event               |

---

These explanations provide the theoretical foundation and practical relevance for each distribution, helping you connect the computations to real-world scenarios.


```python
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Rename columns for clarity
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
# Map the integer class labels (0, 1, 2) to their species names
df['species'] = df['species'].apply(lambda x: iris.target_names[x])

# Display the first few rows
print("Dataset Overview:")
print(df.head())
```

```text
Dataset Overview:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
```
