
# **Statistical Distributions**

This notebook covers key statistical concepts:  
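- **Normal Distribution**: Symmetric bell shape; a good model for many natural measurements, such as heights and measurement errors.  
- **Log-Normal Distribution**: Strictly positive, right-skewed values whose logarithm is normally distributed; used in financial modeling (e.g., asset prices).  
- **Power Law Distribution**: Heavy-tailed, so rare, extreme events account for a large share of the total outcome.  
- **Pareto Distribution**: A power law with a minimum value; the basis of the "80/20 rule" (e.g., 20% of the population holding 80% of the wealth) and common in economics.  
- **Central Limit Theorem (CLT)**: Explains why the sampling distribution of the mean approaches a normal distribution as the sample size grows, whatever the shape of the population (provided its variance is finite).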
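The properties claimed in the list above can be checked numerically. The sketch below is a minimal illustration using NumPy only. The helper names (`skewness`, `pareto_top_share`), the seed, and the parameter values are illustrative assumptions, not part of the original notebook. These include σ = 1 for the log-normal and α = log 5 / log 4 ≈ 1.16 for the Pareto. The Pareto check uses the closed-form top-share formula p^(1 - 1/α) for a classical Pareto with α > 1 rather than sampling, because sample estimates converge very slowly when α is close to 1.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed (assumption) for reproducible output


def skewness(x: np.ndarray) -> float:
    """Sample skewness (hypothetical helper): mean of cubed z-scores, biased estimator."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


# Log-normal: the exponential of a normal variable, so every value is positive.
lognormal = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
print("log-normal min > 0:", lognormal.min() > 0)
# Mean above median signals a long right tail (theory: mean e^0.5 ≈ 1.65, median 1).
print(f"log-normal mean {lognormal.mean():.3f} vs median {np.median(lognormal):.3f}")


def pareto_top_share(alpha: float, top_fraction: float) -> float:
    """Hypothetical helper: share of the total held by the top `top_fraction`
    of a classical Pareto(alpha) population, valid for alpha > 1."""
    return top_fraction ** (1.0 - 1.0 / alpha)


alpha_80_20 = np.log(5) / np.log(4)  # shape parameter that yields exactly the 80/20 split
print(f"alpha = {alpha_80_20:.3f}, top 20% share = {pareto_top_share(alpha_80_20, 0.2):.3f}")

# CLT: means of skewed (exponential) samples are far more symmetric than the raw data.
raw = rng.exponential(scale=1.0, size=20_000)
sample_means = rng.exponential(scale=1.0, size=(20_000, 30)).mean(axis=1)  # 20,000 means of 30 draws
print(f"skewness of raw exponential data: {skewness(raw):.2f}")           # theory: 2
print(f"skewness of means of 30 draws:    {skewness(sample_means):.2f}")  # theory: 2/sqrt(30) ≈ 0.37
```

The skewness of a mean of n independent draws shrinks like 1/√n, which is one concrete way to see the CLT at work.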

### **Objective**
Visualize and understand these distributions and the CLT using Python.


```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the plot style before drawing so it applies to this figure
sns.set_style("whitegrid")

# Parameters of the normal distribution
mean = 0             # Mean of the distribution
std_dev = 1          # Standard deviation
sample_size = 10000  # Number of random samples to generate

# Draw random samples; the fixed seed makes the results reproducible
rng = np.random.default_rng(42)
data = rng.normal(mean, std_dev, sample_size)

# Calculate the boundaries for the Empirical Rule (68-95-99.7 rule)
one_std = (mean - std_dev, mean + std_dev)            # Boundaries for 1σ
two_std = (mean - 2 * std_dev, mean + 2 * std_dev)    # Boundaries for 2σ
three_std = (mean - 3 * std_dev, mean + 3 * std_dev)  # Boundaries for 3σ

plt.figure(figsize=(12, 6))  # Set the size of the plot

# Shade the Empirical Rule regions, widest first so the narrower bands stay visible on top
plt.axvspan(*three_std, color='red', alpha=0.2, label='≈99.7% within 3σ')
plt.axvspan(*two_std, color='yellow', alpha=0.3, label='≈95% within 2σ')
plt.axvspan(*one_std, color='green', alpha=0.3, label='≈68% within 1σ')

# Histogram scaled to a density, with a Kernel Density Estimate (KDE) curve, drawn over the shading
sns.histplot(data, kde=True, bins=50, color='skyblue', label='Data', stat="density", linewidth=0)

# Add title, axis labels, legend and a light grid
plt.title("Normal Distribution with Empirical Rule", fontsize=18, fontweight='bold', pad=20)
plt.xlabel("Value", fontsize=14, labelpad=10)
plt.ylabel("Density", fontsize=14, labelpad=10)
plt.legend(fontsize=12, loc='upper right')
plt.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.6)

# Show the plot
plt.show()

# Percentage of samples inside each band, to check the data against the Empirical Rule
within_1_std = np.mean((data >= one_std[0]) & (data <= one_std[1])) * 100      # Percentage within 1σ
within_2_std = np.mean((data >= two_std[0]) & (data <= two_std[1])) * 100      # Percentage within 2σ
within_3_std = np.mean((data >= three_std[0]) & (data <= three_std[1])) * 100  # Percentage within 3σ

# Output the calculated percentages
print(f"Data within 1σ: {within_1_std:.2f}%")
print(f"Data within 2σ: {within_2_std:.2f}%")
print(f"Data within 3σ: {within_3_std:.2f}%")
```

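The 68-95-99.7 figures are rounded. The exact probability that a normal value lies within k standard deviations of its mean is erf(k/√2), which gives about 68.27%, 95.45% and 99.73%. The sketch below compares those exact values with the observed percentages. It assumes a fresh, seeded standard normal sample rather than reusing any earlier array, and `expected_within` is a hypothetical helper name.

```python
import math
import numpy as np


def expected_within(k: float) -> float:
    """Exact probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


rng = np.random.default_rng(0)  # arbitrary seed (assumption) for reproducibility
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)

for k in (1, 2, 3):
    observed = np.mean(np.abs(sample) <= k)  # fraction inside [-k, k], since mean is 0 and std is 1
    print(f"within {k}σ: observed {observed * 100:.2f}%, expected {expected_within(k) * 100:.2f}%")
```

With 10,000 samples the observed percentages typically land within about one percentage point of the exact values, so small deviations from 68/95/99.7 are expected sampling noise.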