A probability distribution is a mathematical function that describes how the values in a dataset are spread out or arranged. It helps us understand patterns in the data and calculate the probability of particular values occurring.

### **Types of Distributions**

**Discrete Distribution**

A discrete distribution is used for discrete data, where values are distinct and countable (typically whole numbers).

Example: Number of website visits per day (0, 1, 2, … but not 2.5).
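A discrete distribution can be estimated directly from data by counting how often each distinct value occurs and dividing by the total. The visit counts below are made-up sample data used only for illustration, not real measurements.

```python
import numpy as np

# Hypothetical daily visit counts over 10 days (illustrative data only)
visits = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4, 2])

# Count each distinct value, then convert counts to relative frequencies
values, counts = np.unique(visits, return_counts=True)
probabilities = counts / counts.sum()

for v, p in zip(values, probabilities):
    print(f"P(visits = {v}) = {p:.2f}")
```

Because every observed value gets its own probability and the probabilities sum to 1, this table of relative frequencies is an empirical discrete distribution.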

Common Discrete Distributions:
- Bernoulli Distribution: A single trial with a binary outcome (success/failure, 1/0).
- Binomial Distribution: Number of successes in a fixed number of independent trials (e.g., the number of heads when flipping a coin 10 times).
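Both distributions are available in `scipy.stats`. The sketch below assumes a fair coin (`p = 0.5`) and 10 flips, matching the example above, and evaluates the probability mass function (PMF) for single outcomes and the cumulative distribution function (CDF) for "at most" questions.

```python
from scipy.stats import bernoulli, binom

# Bernoulli: one trial with success probability p
p = 0.5
p_success = bernoulli.pmf(1, p)  # probability of outcome 1 (success)
p_failure = bernoulli.pmf(0, p)  # probability of outcome 0 (failure)

# Binomial: number of successes in n independent Bernoulli trials
n = 10
prob_5_heads = binom.pmf(5, n, p)     # exactly 5 heads in 10 flips = 252/1024
prob_at_most_3 = binom.cdf(3, n, p)   # 3 or fewer heads = 176/1024

print(f"Bernoulli P(1) = {p_success}, P(0) = {p_failure}")
print(f"Binomial  P(X = 5)  = {prob_5_heads:.4f}")
print(f"Binomial  P(X <= 3) = {prob_at_most_3:.4f}")
```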

**Continuous Distribution**

A continuous distribution is used for continuous data, where values can take any number within a range.

Example: Heights of students (160.5 cm, 170.2 cm, etc.).

Common Continuous Distributions:
- Normal Distribution (Gaussian Distribution): Bell-shaped curve, common in nature (e.g., heights, IQ scores).
- Exponential Distribution: Time between random events (e.g., time until a car accident occurs).
- Uniform Distribution: All values within a range are equally likely (e.g., a random number drawn anywhere between 0 and 10; rolling a fair die is the discrete version).
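For a continuous distribution, the probability of any single exact value is 0, so probabilities are computed for intervals as areas under the probability density function (PDF), usually via the CDF. The sketch below assumes, for illustration, an exponential waiting time with a mean of 20 minutes and a uniform distribution on [0, 10]; note that scipy's `uniform` is parameterized by `loc` (start) and `scale` (width).

```python
from scipy.stats import expon, uniform

# Exponential: assume events occur on average every 20 minutes (scipy's scale = mean)
p_within_10 = expon.cdf(10, scale=20)  # P(next event within 10 minutes) = 1 - exp(-0.5)

# Uniform on [0, 10]: covers [loc, loc + scale]
density = uniform.pdf(3.7, loc=0, scale=10)  # constant density 1/10 inside the range
p_between = uniform.cdf(7, loc=0, scale=10) - uniform.cdf(2, loc=0, scale=10)  # P(2 <= X <= 7)

print(f"Exponential P(X <= 10) = {p_within_10:.4f}")
print(f"Uniform density at 3.7 = {density:.2f}")
print(f"Uniform P(2 <= X <= 7) = {p_between:.2f}")
```

The uniform density (0.1) is not a probability by itself; only after multiplying by an interval width (here 5) does it give a probability (0.5).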

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom, norm

# Create a figure with two side-by-side panels
plt.figure(figsize=(12, 6))

# Discrete distribution: Binomial (number of heads in 10 coin flips)
n, p = 10, 0.5  # 10 trials, 50% success probability
x_binom = np.arange(0, n + 1)  # possible outcomes: 0 to 10 successes
y_binom = binom.pmf(x_binom, n, p)
plt.subplot(1, 2, 1)
plt.bar(x_binom, y_binom, color='blue', alpha=0.7)
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title('Discrete Distribution')

# Continuous distribution: standard normal (mean 0, std dev 1)
x_norm = np.linspace(-3, 3, 100)
y_norm = norm.pdf(x_norm, 0, 1)  # Mean=0, Std Dev=1
plt.subplot(1, 2, 2)
plt.plot(x_norm, y_norm, color='red')
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Continuous Distribution')

plt.tight_layout()
plt.show()
```


**Uniform Distribution (Evenly Spread Data)**

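Every value in the range is equally likely.

Example: Rolling a fair die (1, 2, 3, 4, 5, and 6 each have probability 1/6) is the discrete case; a random number drawn anywhere between 0 and 10 is the continuous case.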
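The die example is the discrete form of the uniform distribution. The sketch below checks that each face has probability 1/6, both from scipy's `randint` (whose upper bound is exclusive, so faces 1 to 6 use `randint(1, 7)`) and from simulated rolls. The seed and the number of rolls are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import randint

# Discrete uniform over the faces 1..6 (upper bound 7 is exclusive)
die = randint(1, 7)
theoretical = [die.pmf(face) for face in range(1, 7)]  # each entry is 1/6

# Simulate rolls and compare observed frequencies with 1/6
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=60_000)  # high is exclusive here too
faces, counts = np.unique(rolls, return_counts=True)
freqs = counts / rolls.size

print("theoretical:", [round(float(t), 4) for t in theoretical])
print("simulated:  ", dict(zip(faces.tolist(), freqs.round(3).tolist())))
```

With many rolls the simulated frequencies should cluster close to 1/6 (about 0.167), which is the "evenly spread" shape the histogram of the continuous version also shows.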

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.uniform(0, 10, 1000)  # Generating 1000 random numbers between 0 and 10
plt.hist(data, bins=20, color='blue', alpha=0.7)
plt.title('Uniform Distribution')
plt.show()
```

**Normal Distribution (Bell Curve)**

Common in real-world data (e.g., heights, IQ scores). It follows the 68-95-99.7 rule:
- About 68% of values lie within 1 standard deviation (σ) of the mean (μ).
- About 95% lie within 2σ.
- About 99.7% lie within 3σ.
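The rule can be checked numerically. The sketch below computes the exact probability within k standard deviations from the normal CDF (the value does not depend on μ or σ), then compares it with a simulated sample that uses the same mean of 50 and standard deviation of 10 as the histogram example; the seed and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Exact coverage within k standard deviations, using the standard normal
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

# Empirical check on simulated data (mean 50, std dev 10)
rng = np.random.default_rng(0)
mu, sigma = 50, 10
sample = rng.normal(mu, sigma, size=100_000)
empirical = {k: np.mean(np.abs(sample - mu) <= k * sigma) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(f"within {k} sigma: exact {coverage[k]:.4f}, simulated {empirical[k]:.4f}")
```

The exact values print as 0.6827, 0.9545, and 0.9973, which is where the rounded 68-95-99.7 figures come from.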

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

data = np.random.normal(50, 10, 1000)  # Mean = 50, Std Dev = 10
# density=True scales the histogram so it is comparable with the PDF curve
plt.hist(data, bins=30, density=True, alpha=0.6, color='red')

x = np.linspace(20, 80, 100)  # mean ± 3 std devs
plt.plot(x, norm.pdf(x, 50, 10), color='black')  # Theoretical normal curve
plt.title('Normal Distribution')
plt.show()
```

**Skewed Distribution**

- Right-skewed (Positive Skew): Tail is on the right (e.g., salaries, housing prices).
- Left-skewed (Negative Skew): Tail is on the left (e.g., test scores where most students score high).

Key Differences:

|Feature|Left-Skewed (Negative Skew)|Right-Skewed (Positive Skew)|
|---|---|---|
|Peak (Mode)|On the right|On the left|
|Tail Direction|Extends left (lower values)|Extends right (higher values)|
|Mean vs. Median|Usually Mean < Median|Usually Mean > Median|
|Example|Test scores where most students score high|Income distribution where a few people earn very high salaries|
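The skew direction and the mean-versus-median relationship in the table can be measured rather than eyeballed. This sketch uses the same two distributions as the plotting code (a Beta(5, 2) scaled to 100 and an exponential with scale 20), with a larger sample for stable estimates, and computes the sample skewness with `scipy.stats.skew`: negative values indicate left skew, positive values right skew.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)

# Beta(5, 2) is left-skewed; the exponential distribution is right-skewed
left = rng.beta(5, 2, size=10_000) * 100
right = rng.exponential(scale=20, size=10_000)

for name, data in [("left-skewed", left), ("right-skewed", right)]:
    print(f"{name:13s} skewness={skew(data):+.2f} "
          f"mean={np.mean(data):.1f} median={np.median(data):.1f}")
```

For these two distributions the mean sits on the tail side of the median, as the table describes; this is a common pattern rather than a rule that holds for every skewed distribution.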

```python
import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Generate left-skewed data (Beta distribution)
left_skewed_data = np.random.beta(a=5, b=2, size=1000) * 100  # Scale to 100

# Generate right-skewed data (Exponential distribution)
right_skewed_data = np.random.exponential(scale=20, size=1000)  # Scale controls spread

# Create figure for side-by-side histograms
plt.figure(figsize=(12, 5))

# Left-Skewed Histogram (Negative Skew)
plt.subplot(1, 2, 1)
plt.hist(left_skewed_data, bins=30, color='purple', alpha=0.7, edgecolor='black')
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Left-Skewed Distribution")

# Right-Skewed Histogram (Positive Skew)
plt.subplot(1, 2, 2)
plt.hist(right_skewed_data, bins=30, color='orange', alpha=0.7, edgecolor='black')
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Right-Skewed Distribution")

# Show plots
plt.tight_layout()
plt.show()
```
