# 06 - Statistical Functions

This notebook covers NumPy's statistical functions for data analysis.

## What You'll Learn
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (std, var, range)
- Percentiles and quantiles
- Correlation and covariance

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Basic Statistics

In [None]:
data = np.array([23, 45, 67, 12, 89, 34, 56, 78, 90, 11])
print(f"Data: {data}")
print(f"\nSum: {np.sum(data)}")
print(f"Mean: {np.mean(data)}")
print(f"Median: {np.median(data)}")
print(f"Min: {np.min(data)}")
print(f"Max: {np.max(data)}")
print(f"Range: {np.ptp(data)}")
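
NumPy has no dedicated mode function, so the mode listed under central tendency needs a small workaround. The sketch below is one possible approach, built on `np.unique` with `return_counts=True`; the `values` array is a hypothetical sample chosen only so that one value repeats.

```python
import numpy as np

# Hypothetical sample with a repeated value, used only for illustration
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# np.unique returns the sorted distinct values and how often each occurs
unique_values, counts = np.unique(values, return_counts=True)

# The mode is the value with the highest count; argmax returns the first
# (smallest) value if several values tie for the highest count
mode = unique_values[np.argmax(counts)]
print(f"Mode: {mode} (appears {counts.max()} times)")
```

For data with ties or for multi-dimensional input, `scipy.stats.mode` may be a better fit than this sketch.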

## Measures of Dispersion

In [None]:
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(f"Data: {data}")
print(f"\nVariance: {np.var(data):.4f}")
print(f"Standard Deviation: {np.std(data):.4f}")
print("\nWith Bessel's correction (ddof=1):")
print(f"Sample Variance: {np.var(data, ddof=1):.4f}")
print(f"Sample Std Dev: {np.std(data, ddof=1):.4f}")
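
To make the effect of `ddof` concrete, the following sketch recomputes both variances by hand and compares them with NumPy's results. It reuses the same data as the cell above; the variable names are illustrative only.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
n = data.size

# Squared distance of each value from the mean
squared_deviations = (data - data.mean()) ** 2

# Population variance divides by n (NumPy's default, ddof=0)
population_var = squared_deviations.sum() / n
# Sample variance divides by n - 1 (Bessel's correction, ddof=1)
sample_var = squared_deviations.sum() / (n - 1)

print(f"Population variance: {population_var:.4f}")
print(f"Sample variance: {sample_var:.4f}")
print(f"Matches np.var: {np.isclose(population_var, np.var(data))}")
print(f"Matches np.var(ddof=1): {np.isclose(sample_var, np.var(data, ddof=1))}")
```

In general, `ddof=1` is the usual choice when the data is a sample drawn from a larger population, and the default `ddof=0` when the data is the whole population.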

## Percentiles and Quantiles

In [None]:
data = np.arange(1, 101)
print(f"25th percentile (Q1): {np.percentile(data, 25)}")
print(f"50th percentile (Q2/Median): {np.percentile(data, 50)}")
print(f"75th percentile (Q3): {np.percentile(data, 75)}")
print(f"\nMultiple percentiles: {np.percentile(data, [10, 25, 50, 75, 90])}")
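
The cell above uses only `np.percentile`. As a small sketch of the quantile side of this section, `np.quantile` takes fractions in the range 0 to 1 instead of percentages, and the interquartile range (IQR) follows directly from Q1 and Q3. The variable names here are illustrative.

```python
import numpy as np

data = np.arange(1, 101)

# np.quantile(data, 0.25) is equivalent to np.percentile(data, 25)
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# The interquartile range covers the middle 50% of the data
iqr = q3 - q1

print(f"Q1: {q1}, Q2: {q2}, Q3: {q3}")
print(f"IQR: {iqr}")
```

Both functions interpolate linearly between neighboring values by default, which is why the quartiles of the integers 1 to 100 are not whole numbers.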

## Statistics Along Axes

In [None]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"2D Array:\n{arr_2d}")
print(f"\nGlobal mean: {np.mean(arr_2d)}")
print(f"Mean by column (axis=0): {np.mean(arr_2d, axis=0)}")
print(f"Mean by row (axis=1): {np.mean(arr_2d, axis=1)}")
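
A common follow-up to per-axis statistics is subtracting them from the original array. The sketch below, offered as one way to do this, uses `keepdims=True` so the row means keep a shape that broadcasts against the 2D array.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# keepdims=True keeps the reduced axis with length 1: shape (3, 1), not (3,)
row_means = np.mean(arr_2d, axis=1, keepdims=True)

# Broadcasting subtracts each row's mean from every element in that row
centered = arr_2d - row_means

print(f"Row means shape: {row_means.shape}")
print(f"Row-centered array:\n{centered}")
```

Without `keepdims=True`, the result has shape `(3,)` and would be subtracted column-wise instead, which is not what row centering needs.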

## Correlation and Covariance

In [None]:
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

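# np.corrcoef returns the full 2x2 correlation matrix; the off-diagonal entry is r(x, y)
print(f"Correlation matrix:\n{np.corrcoef(x, y)}")
print(f"Correlation coefficient: {np.corrcoef(x, y)[0, 1]:.4f}")
print(f"\nCovariance matrix:\n{np.cov(x, y)}")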
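
Correlation and covariance are closely linked: the Pearson correlation coefficient is the covariance divided by the product of the two standard deviations. The sketch below checks that relationship on the same `x` and `y`. It uses `ddof=1` for the standard deviations because `np.cov` normalizes by `n - 1` by default.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

# Off-diagonal entry of the covariance matrix is cov(x, y)
cov_xy = np.cov(x, y)[0, 1]

# Pearson r = cov(x, y) / (std(x) * std(y)), with matching ddof
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(f"Covariance: {cov_xy:.4f}")
print(f"Pearson r (manual): {r:.4f}")
print(f"Pearson r (np.corrcoef): {np.corrcoef(x, y)[0, 1]:.4f}")
```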

## Visualization

In [None]:
np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].hist(data, bins=30, edgecolor='black', alpha=0.7)
axes[0].axvline(np.mean(data), color='red', label=f'Mean: {np.mean(data):.2f}')
axes[0].axvline(np.median(data), color='green', label=f'Median: {np.median(data):.2f}')
axes[0].legend()
axes[0].set_title('Distribution with Mean and Median')

axes[1].boxplot(data)
axes[1].set_title('Box Plot')

plt.tight_layout()
plt.show()
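
The box plot can be tied back to the percentile functions from earlier. As a rough sketch, assuming Matplotlib's default whisker setting of 1.5 times the IQR, the following computes the quartiles and the fences beyond which points are drawn as outliers. The data is regenerated with the same seed as the cell above.

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(loc=50, scale=15, size=1000)

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are plotted individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
n_outliers = np.sum((data < lower_fence) | (data > upper_fence))

print(f"Q1: {q1:.2f}, Median: {median:.2f}, Q3: {q3:.2f}")
print(f"Fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"Points outside the fences: {n_outliers}")
```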

## Summary

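Key functions:
- `np.sum()`, `np.min()`, `np.max()`, `np.ptp()` - Totals, extremes, and range
- `np.mean()`, `np.median()` - Central tendency
- `np.std()`, `np.var()` - Dispersion (pass `ddof=1` for sample statistics)
- `np.percentile()` - Percentiles
- `np.corrcoef()`, `np.cov()` - Relationships between variables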

## Exercises

1. Calculate mean, median, and std of a random array
2. Find the quartiles (Q1, Q2, Q3) of a dataset
3. Calculate statistics along different axes of a 2D array
4. Compute correlation between two variables

In [None]:
# Your exercises here
