# Statistics Fundamentals - Practice Notebook

**Course**: IIT Madras Foundation Level - Statistics  
**Date**: November 14, 2025  
**Topics**: Descriptive Statistics, Visualization, Basic Probability

---

## Setup: Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
print("Libraries imported successfully!")

## 1. Measures of Central Tendency

In [None]:
# Sample data: test scores
scores = np.array([78, 85, 92, 88, 75, 95, 82, 88, 90, 85])

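# Calculate measures
mean = np.mean(scores)
median = np.median(scores)
# Note: 85 and 88 both occur twice; stats.mode reports only the smallest tied value.
# The keepdims argument requires SciPy 1.9 or later.
mode_result = stats.mode(scores, keepdims=True)
mode = mode_result.mode[0]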

print(f"Scores: {scores}")
print(f"\nMean: {mean:.2f}")
print(f"Median: {median:.2f}")
print(f"Mode: {mode}")
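The `scores` array above has two values tied for most frequent, and a single reported mode hides that. The following is a minimal sketch, not part of the original notebook, of one way to list every mode using `np.unique`. The name `all_modes` is introduced here only for illustration.

```python
import numpy as np

scores = np.array([78, 85, 92, 88, 75, 95, 82, 88, 90, 85])

# Count how often each distinct value occurs (values come back sorted)
values, counts = np.unique(scores, return_counts=True)

# Keep every value whose count equals the highest count
all_modes = values[counts == counts.max()]

print(f"All modes: {all_modes.tolist()}")
```

If the data has a single most frequent value, `all_modes` contains just that one element, so the same code covers both cases.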

### Practice Exercise 1:
Given the following data: [23, 45, 67, 45, 89, 34, 45, 56, 78]
Calculate mean, median, and mode.

In [None]:
# Your code here

## 2. Measures of Dispersion

In [None]:
# Using the same scores data
variance = np.var(scores, ddof=1)  # Sample variance
std_dev = np.std(scores, ddof=1)   # Sample standard deviation
range_val = np.max(scores) - np.min(scores)

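# Quartiles and IQR
# np.percentile interpolates linearly by default, so results can differ
# slightly from quartiles computed by hand with other textbook conventions.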
q1 = np.percentile(scores, 25)
q2 = np.percentile(scores, 50)  # Same as median
q3 = np.percentile(scores, 75)
iqr = q3 - q1

print(f"Variance: {variance:.2f}")
print(f"Standard Deviation: {std_dev:.2f}")
print(f"Range: {range_val}")
print(f"\nQ1 (25th percentile): {q1}")
print(f"Q2 (50th percentile/Median): {q2}")
print(f"Q3 (75th percentile): {q3}")
print(f"IQR: {iqr}")
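To make the role of `ddof=1` concrete, here is a small sketch that computes the variance directly from its definition and compares it with NumPy's result. The variable names (`deviations`, `sample_var`, `population_var`) are illustrative and do not appear in the cell above.

```python
import numpy as np

scores = np.array([78, 85, 92, 88, 75, 95, 82, 88, 90, 85])
n = scores.size

# Deviation of each score from the mean
deviations = scores - scores.mean()
sum_sq = np.sum(deviations ** 2)

# Sample variance divides by n - 1 (this is what ddof=1 means)
sample_var = sum_sq / (n - 1)

# Population variance divides by n (NumPy's default, ddof=0)
population_var = sum_sq / n

print(f"Sample variance (manual):     {sample_var:.2f}")
print(f"Sample variance (np.var):     {np.var(scores, ddof=1):.2f}")
print(f"Population variance (manual): {population_var:.2f}")
print(f"Population variance (np.var): {np.var(scores):.2f}")
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.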

### Practice Exercise 2:
For the data [10, 20, 30, 40, 50, 60, 70, 80, 90], calculate variance, standard deviation, and IQR.

In [None]:
# Your code here

## 3. Data Visualization - Histograms

In [None]:
# Generate sample data (seeded so the plots are reproducible)
np.random.seed(42)
data = np.random.normal(100, 15, 1000)  # Normal distribution: mean=100, std=15

plt.figure(figsize=(12, 5))

# Histogram
plt.subplot(1, 2, 1)
plt.hist(data, bins=30, edgecolor='black', alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.axvline(np.mean(data), color='red', linestyle='--', label='Mean')
plt.axvline(np.median(data), color='green', linestyle='--', label='Median')
plt.legend()

# Box plot
plt.subplot(1, 2, 2)
plt.boxplot(data)
plt.ylabel('Value')
plt.title('Box Plot')

plt.tight_layout()
plt.show()
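Random data changes on every run unless the generator is seeded. As a hedged sketch, the snippet below uses NumPy's `default_rng` interface (an alternative to the `np.random.normal` call in the cell above) and checks that the sample statistics land near the requested parameters. The seed value 42 and the name `rng` are arbitrary choices made for this example.

```python
import numpy as np

# A seeded generator returns the same numbers on every run
rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=1000)

# A second generator with the same seed reproduces the data exactly
rng_again = np.random.default_rng(42)
data_again = rng_again.normal(loc=100, scale=15, size=1000)

print(f"Sample mean: {data.mean():.2f} (requested 100)")
print(f"Sample std:  {data.std(ddof=1):.2f} (requested 15)")
print(f"Identical on re-run: {np.array_equal(data, data_again)}")
```

The sample mean and standard deviation will be close to, but not exactly, 100 and 15, which is the sampling variability the histogram displays.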

### Practice Exercise 3:
Create a histogram and box plot for randomly generated data with mean=50 and std=10.

In [None]:
# Your code here

## 4. Working with DataFrames (Pandas)

In [None]:
# Create a sample dataset
data = {
    'Student': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'Math': [78, 85, 92, 88, 75, 95, 82, 88, 90, 85],
    'Statistics': [82, 88, 85, 90, 78, 92, 85, 87, 88, 90],
    'Python': [90, 85, 88, 92, 80, 95, 88, 90, 92, 87]
}

df = pd.DataFrame(data)
print("Student Scores:")
print(df)

print("\nDescriptive Statistics:")
print(df.describe())

### Practice Exercise 4:
Add a new column 'Total' that sums Math, Statistics, and Python scores for each student.

In [None]:
# Your code here

## 5. Correlation Analysis

In [None]:
# Calculate correlation matrix
correlation_matrix = df[['Math', 'Statistics', 'Python']].corr()

print("Correlation Matrix:")
print(correlation_matrix)

# Visualize correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
            square=True, linewidths=1)
plt.title('Correlation Heatmap')
plt.show()
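Each entry in the correlation matrix is a Pearson correlation coefficient, which is the default method of `DataFrame.corr()`. The sketch below recomputes the Math/Python entry from the definition using plain NumPy and compares it against `np.corrcoef`. The array names `math_scores` and `python_scores` are introduced here and copy the values from the DataFrame above.

```python
import numpy as np

math_scores = np.array([78, 85, 92, 88, 75, 95, 82, 88, 90, 85])
python_scores = np.array([90, 85, 88, 92, 80, 95, 88, 90, 92, 87])

# Deviations from each subject's mean
dx = math_scores - math_scores.mean()
dy = python_scores - python_scores.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# NumPy's built-in result for comparison
r_numpy = np.corrcoef(math_scores, python_scores)[0, 1]

print(f"Pearson r (manual): {r_manual:.4f}")
print(f"Pearson r (NumPy):  {r_numpy:.4f}")
```

The value always lies between -1 and 1. A value near 1 means students who score high in one subject tend to score high in the other.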

### Practice Exercise 5:
Create a scatter plot between Math and Python scores to visualize their relationship.

In [None]:
# Your code here

## 6. Basic Probability

In [None]:
# Simulate coin flips
n_flips = 1000
coin_flips = np.random.choice(['Heads', 'Tails'], size=n_flips)

heads_count = np.sum(coin_flips == 'Heads')
tails_count = np.sum(coin_flips == 'Tails')

print(f"Number of flips: {n_flips}")
print(f"Heads: {heads_count} ({heads_count/n_flips*100:.2f}%)")
print(f"Tails: {tails_count} ({tails_count/n_flips*100:.2f}%)")

# Visualize
plt.bar(['Heads', 'Tails'], [heads_count, tails_count])
plt.ylabel('Count')
plt.title(f'Coin Flip Results ({n_flips} flips)')
plt.axhline(n_flips/2, color='red', linestyle='--', label=f'Expected ({n_flips // 2})')
plt.legend()
plt.show()
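The simulation above reports only the final counts. As an illustrative sketch, the snippet below tracks the running proportion of heads to show how it settles toward 0.5 as the number of flips grows. Encoding heads as 1 and tails as 0, the seed, and the checkpoint values are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips = 10_000

# 1 = heads, 0 = tails; the upper bound 2 is exclusive
flips = rng.integers(0, 2, size=n_flips)

# Proportion of heads after each flip: cumulative heads / flips so far
running_prop = np.cumsum(flips) / np.arange(1, n_flips + 1)

for n in (10, 100, 1_000, 10_000):
    print(f"After {n:>6} flips: proportion of heads = {running_prop[n - 1]:.4f}")
```

Early proportions can sit far from 0.5, while later ones usually stay close to it. This is the behaviour described by the law of large numbers.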

### Practice Exercise 6:
Simulate rolling a die 1000 times and plot the frequency of each outcome.

In [None]:
# Your code here

---

## Notes Section

**Key Concepts:**
- 

**Important Formulas:**
- 

**Questions/Doubts:**
- 

**Next Topics to Study:**
- 