## Overview

Python's **Standard Library** is a collection of modules that come pre-installed with Python. These modules provide functionality for common tasks without requiring external package installation.

In this notebook, we'll explore modules particularly useful for data analysis:

1. **`math`** - Mathematical functions and constants
2. **`random`** - Pseudo-random number generation
3. **`statistics`** - Statistical functions
4. **`datetime`** - Date and time manipulation
5. **`platform`** - System information
6. **`collections`** - Specialized container datatypes

A working knowledge of these modules provides a solid foundation before moving on to third-party libraries such as NumPy and pandas.

## 1. The `math` Module

### Mathematical Functions and Constants

In [None]:
import math

# Explore what's available
public_items = [item for item in dir(math) if not item.startswith('_')]
print(f"Number of public items in math: {len(public_items)}")
print(f"\nFirst 15 items: {public_items[:15]}")

### Common Constants

In [None]:
import math

print("Mathematical Constants:")
print(f"π (pi): {math.pi}")
print(f"e (Euler's number): {math.e}")
print(f"τ (tau = 2π): {math.tau}")
print(f"∞ (infinity): {math.inf}")
print(f"NaN (Not a Number): {math.nan}")

### Basic Operations

In [None]:
import math

# Power and roots
print("Power and Roots:")
print(f"sqrt(25) = {math.sqrt(25)}")
print(f"pow(2, 3) = {math.pow(2, 3)}")
print(f"exp(1) = {math.exp(1)}")

# Logarithms
print("\nLogarithms:")
print(f"log(e) = {math.log(math.e)}")
print(f"log10(100) = {math.log10(100)}")
print(f"log2(8) = {math.log2(8)}")

# Rounding
print("\nRounding:")
print(f"ceil(4.3) = {math.ceil(4.3)}")
print(f"floor(4.7) = {math.floor(4.7)}")
print(f"trunc(4.7) = {math.trunc(4.7)}")

### Trigonometric Functions

In [None]:
import math

# Angles in radians
angle_rad = math.pi / 4  # 45 degrees

print(f"Angle: {angle_rad} radians = {math.degrees(angle_rad)} degrees")
print(f"\nsin(π/4) = {math.sin(angle_rad):.4f}")
print(f"cos(π/4) = {math.cos(angle_rad):.4f}")
print(f"tan(π/4) = {math.tan(angle_rad):.4f}")

# Convert degrees to radians
angle_deg = 60
angle_rad = math.radians(angle_deg)
print(f"\nsin({angle_deg}°) = {math.sin(angle_rad):.4f}")

### Data Analysis Example: Distance Calculation

In [None]:
import math

def euclidean_distance(point1, point2):
    """Calculate Euclidean distance between two points."""
    x1, y1 = point1
    x2, y2 = point2
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Store locations (coordinates)
warehouse = (0, 0)
store_a = (3, 4)
store_b = (6, 8)
store_c = (5, 12)

# Calculate distances
distances = {
    'Store A': euclidean_distance(warehouse, store_a),
    'Store B': euclidean_distance(warehouse, store_b),
    'Store C': euclidean_distance(warehouse, store_c)
}

print("Distances from warehouse:")
for store, dist in distances.items():
    print(f"{store}: {dist:.2f} km")

# Find closest store
closest = min(distances, key=distances.get)
print(f"\nClosest store: {closest} ({distances[closest]:.2f} km)")
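
The standard library also ships ready-made helpers for this calculation. The sketch below, assuming Python 3.8 or later, checks the hand-written formula against `math.dist` and `math.hypot`. The `manual_distance` name is only illustrative and is not part of the notebook.

```python
import math

def manual_distance(point1, point2):
    """Hand-written 2-D Euclidean distance (same formula as above)."""
    x1, y1 = point1
    x2, y2 = point2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

warehouse = (0, 0)
store_a = (3, 4)

by_hand = manual_distance(warehouse, store_a)
# math.dist accepts two points with any (matching) number of dimensions
by_dist = math.dist(warehouse, store_a)
# math.hypot takes the coordinate differences directly
by_hypot = math.hypot(store_a[0] - warehouse[0], store_a[1] - warehouse[1])

print(f"By hand: {by_hand}, math.dist: {by_dist}, math.hypot: {by_hypot}")
```

`math.dist` is usually the most convenient choice when the points may later gain a third coordinate, since the hand-written version only unpacks two.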

## 2. The `random` Module

### Understanding Pseudo-Randomness

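A deterministic algorithm cannot produce truly random numbers. The `random` module uses a **pseudo-random number generator** (the Mersenne Twister): an algorithm whose output looks random but is fully deterministic.

The sequence is determined by a **seed** value: the same seed always produces the same sequence. When no seed is given, Python seeds the generator from the operating system's randomness source (or the current time as a fallback), which is why unseeded runs differ.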

In [None]:
import random

# Without seed - different each time
print("Random numbers without seed:")
for i in range(5):
    print(random.random())

In [None]:
import random

# With seed - reproducible
print("First run with seed(42):")
random.seed(42)
for i in range(5):
    print(random.random())

print("\nSecond run with same seed(42):")
random.seed(42)
for i in range(5):
    print(random.random())

print("\nNotice: Both sequences are identical!")
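
Calling `random.seed()` resets the module-wide generator, which every other cell in the notebook shares. As one possible alternative, the sketch below creates independent `random.Random` instances, each with its own seed and state. The variable names are illustrative.

```python
import random

# Two independent generators seeded with the same value
rng_a = random.Random(42)
rng_b = random.Random(42)

seq_a = [rng_a.random() for _ in range(5)]
seq_b = [rng_b.random() for _ in range(5)]
print(f"Same seed, same sequence: {seq_a == seq_b}")

# A generator with a different seed follows its own sequence
rng_c = random.Random(7)
seq_c = [rng_c.random() for _ in range(5)]
print(f"Different seed, same sequence: {seq_a == seq_c}")
```

Because each instance keeps its own state, drawing from `rng_a` does not disturb `rng_b` or the global functions such as `random.random()`.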

### Generating Random Numbers

In [None]:
import random

# Float in the half-open interval [0.0, 1.0): 0.0 is possible, 1.0 is not
print(f"random.random(): {random.random()}")

# Float in a range
print(f"random.uniform(10, 20): {random.uniform(10, 20)}")

# Integer in a range [a, b] (inclusive)
print(f"random.randint(1, 10): {random.randint(1, 10)}")

# Integer from range (like range() function)
print(f"random.randrange(0, 100, 5): {random.randrange(0, 100, 5)}")

### Random Selections

In [None]:
import random

colors = ['red', 'green', 'blue', 'yellow', 'purple']

# Choose one item
print(f"random.choice(): {random.choice(colors)}")

# Choose multiple items (with replacement)
print(f"random.choices(k=3): {random.choices(colors, k=3)}")

# Choose multiple unique items (without replacement)
print(f"random.sample(k=3): {random.sample(colors, k=3)}")

# Shuffle in place
deck = list(range(1, 11))
print(f"\nOriginal: {deck}")
random.shuffle(deck)
print(f"Shuffled: {deck}")

### Random Distributions

In [None]:
import random

# Gaussian (normal) distribution
mean = 100
std_dev = 15
print("Normal distribution samples (IQ scores):")
for i in range(5):
    iq = random.gauss(mean, std_dev)
    print(f"  {iq:.1f}")

# Exponential distribution
lambda_val = 1/5  # mean = 5
print("\nExponential distribution samples (time between events):")
for i in range(5):
    wait_time = random.expovariate(lambda_val)  # avoid shadowing the name `time`
    print(f"  {wait_time:.2f} minutes")

### Data Analysis Example: Simulation

In [None]:
import random

# Simulate dice rolling to verify probability
random.seed(42)  # For reproducibility

def roll_dice(n_rolls=1000):
    """Simulate rolling two dice and count sum occurrences."""
    results = {i: 0 for i in range(2, 13)}  # Possible sums: 2-12
    
    for _ in range(n_rolls):
        die1 = random.randint(1, 6)
        die2 = random.randint(1, 6)
        total = die1 + die2
        results[total] += 1
    
    return results

# Run simulation
rolls = 10000
results = roll_dice(rolls)

print(f"Dice Rolling Simulation ({rolls} rolls)")
print("=" * 40)
for total, count in results.items():
    probability = (count / rolls) * 100
    bar = '█' * int(probability * 2)
    print(f"Sum {total:2d}: {bar} {probability:.1f}%")

print("\nNotice: 7 is the most common sum (16.7% theoretical)")
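
The theoretical figure of 16.7% can be derived rather than taken on trust. The sketch below enumerates all 36 equally likely outcomes of two fair dice and computes the exact probability of each sum. It uses `itertools.product` and `fractions.Fraction`, two standard-library tools not otherwise covered in this notebook.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes
outcomes = list(product(range(1, 7), repeat=2))
sum_counts = Counter(d1 + d2 for d1, d2 in outcomes)

# Exact probability of each sum as a fraction of the 36 outcomes
theoretical = {
    total: Fraction(count, len(outcomes))
    for total, count in sorted(sum_counts.items())
}

for total, prob in theoretical.items():
    print(f"Sum {total:2d}: {str(prob):>5} = {float(prob) * 100:.1f}%")
```

Six of the 36 outcomes add up to 7, more than for any other sum, which gives the probability 6/36 = 1/6.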

## 3. The `statistics` Module

### Central Tendency Measures

In [None]:
import statistics as stats

# Test scores dataset
scores = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]

print("Central Tendency Measures:")
print(f"Mean: {stats.mean(scores):.2f}")
print(f"Median: {stats.median(scores):.2f}")
print(f"Mode: {stats.mode([1, 2, 2, 3, 3, 3, 4])}")

# When several values tie for most common, multimode returns all of them
print(f"\nMultiple modes: {stats.multimode([1, 1, 2, 2, 3])}")

### Dispersion Measures

In [None]:
import statistics as stats

scores = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]

print("Dispersion Measures:")
print(f"Variance (population): {stats.pvariance(scores):.2f}")
print(f"Variance (sample): {stats.variance(scores):.2f}")
print(f"Std Dev (population): {stats.pstdev(scores):.2f}")
print(f"Std Dev (sample): {stats.stdev(scores):.2f}")

# Range (manual calculation)
data_range = max(scores) - min(scores)
print(f"Range: {data_range}")

### Understanding Population vs Sample

**Population**: All possible data points (use `pvariance`, `pstdev`)

**Sample**: Subset of population (use `variance`, `stdev`)

Sample variance divides by `n-1` instead of `n` (Bessel's correction), which makes it an unbiased estimate of the population variance.

In [None]:
import statistics as stats

data = [10, 20, 30, 40, 50]

print("Comparing Population vs Sample statistics:")
print(f"\nPopulation variance (÷n): {stats.pvariance(data):.2f}")
print(f"Sample variance (÷n-1): {stats.variance(data):.2f}")
print(f"\nPopulation std dev: {stats.pstdev(data):.2f}")
print(f"Sample std dev: {stats.stdev(data):.2f}")
print("\nSample estimates are larger (more conservative)")
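
To see exactly where the two divisors enter, here is a minimal hand-rolled sketch of both variances, checked against the `statistics` functions. The variable names are illustrative.

```python
import math
import statistics as stats

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean (shared by both formulas)
squared_dev = sum((x - mean) ** 2 for x in data)

population_var = squared_dev / n      # divide by n
sample_var = squared_dev / (n - 1)    # divide by n - 1 (Bessel's correction)

print(f"By hand: population={population_var:.2f}, sample={sample_var:.2f}")
print(f"Matches pvariance: {math.isclose(population_var, stats.pvariance(data))}")
print(f"Matches variance:  {math.isclose(sample_var, stats.variance(data))}")
```

The numerator is identical in both cases. Only the divisor changes, so the gap between the two results shrinks as `n` grows.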

### Quantiles and Quartiles

In [None]:
import statistics as stats

data = [15, 20, 35, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print("Quantile Analysis:")
print(f"Median (50th percentile): {stats.median(data)}")
# quantiles(n=4) returns the three cut points in one call
q1, q2, q3 = stats.quantiles(data, n=4)
print("\nQuartiles:")
print(f"Q1 (25th percentile): {q1}")
print(f"Q2 (50th percentile): {q2}")
print(f"Q3 (75th percentile): {q3}")

# Deciles (10 equal parts)
deciles = stats.quantiles(data, n=10)
print(f"\n10th percentile: {deciles[0]}")
print(f"90th percentile: {deciles[-1]}")
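
Quartiles are not uniquely defined, and different tools can report slightly different values for the same data. `statistics.quantiles` exposes this through its `method` parameter. The sketch below compares the default `"exclusive"` method with `"inclusive"` on the same dataset.

```python
import statistics as stats

data = [15, 20, 35, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Default: treats the data as a sample whose population may hold more extreme values
q_exclusive = stats.quantiles(data, n=4, method="exclusive")

# Alternative: treats the data as already containing the minimum and maximum
q_inclusive = stats.quantiles(data, n=4, method="inclusive")

print(f"Exclusive (default): {q_exclusive}")
print(f"Inclusive:           {q_inclusive}")
```

The median agrees under both methods for this dataset, while Q1 and Q3 differ. When comparing results with another tool, it helps to check which convention that tool uses.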

### Data Analysis Example: Grade Analysis

In [None]:
import statistics as stats

# Student grades from two classes
class_a = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]
class_b = [65, 70, 98, 55, 88, 92, 60, 95, 75, 58, 85, 90]

def analyze_grades(grades, class_name):
    """Comprehensive grade analysis."""
    print(f"\n{class_name} Analysis:")
    print("=" * 40)
    print(f"Mean: {stats.mean(grades):.2f}")
    print(f"Median: {stats.median(grades):.2f}")
    print(f"Std Dev: {stats.stdev(grades):.2f}")
    print(f"Range: {max(grades) - min(grades)}")
    
    # Quartiles
    q = stats.quantiles(grades, n=4)
    print(f"\nQuartiles:")
    print(f"  Q1 (25%): {q[0]:.2f}")
    print(f"  Q2 (50%): {q[1]:.2f}")
    print(f"  Q3 (75%): {q[2]:.2f}")
    print(f"  IQR: {q[2] - q[0]:.2f}")

analyze_grades(class_a, "Class A")
analyze_grades(class_b, "Class B")

print("\n" + "=" * 40)
print("Interpretation:")
print("Class A: More consistent (lower std dev)")
print("Class B: Higher variability (some very high and low grades)")

## 4. The `datetime` Module

### Working with Dates and Times

In [None]:
from datetime import datetime, date, time, timedelta

# Current date and time
now = datetime.now()
print(f"Current datetime: {now}")
print(f"Current date: {date.today()}")

# Creating specific dates
birthday = date(1990, 5, 15)
print(f"\nBirthday: {birthday}")

# Creating specific times
meeting = time(14, 30, 0)
print(f"Meeting time: {meeting}")

# Creating specific datetime
event = datetime(2024, 12, 31, 23, 59, 59)
print(f"New Year's Eve: {event}")

### Date Arithmetic

In [None]:
from datetime import datetime, timedelta

today = datetime.now()
print(f"Today: {today.strftime('%Y-%m-%d %H:%M:%S')}")

# Add/subtract time
tomorrow = today + timedelta(days=1)
print(f"Tomorrow: {tomorrow.strftime('%Y-%m-%d')}")

next_week = today + timedelta(weeks=1)
print(f"Next week: {next_week.strftime('%Y-%m-%d')}")

three_hours_ago = today - timedelta(hours=3)
print(f"3 hours ago: {three_hours_ago.strftime('%H:%M:%S')}")

# Calculate difference (target the next January 1 so the result is never negative)
new_year = datetime(today.year + 1, 1, 1)
time_left = new_year - today
print(f"\nDays until {new_year.year}: {time_left.days}")

### Formatting Dates

In [None]:
from datetime import datetime

now = datetime.now()

# Different formats using strftime
print("Date Formatting Examples:")
print(f"ISO format: {now.isoformat()}")
print(f"US format: {now.strftime('%m/%d/%Y')}")
print(f"European format: {now.strftime('%d/%m/%Y')}")
print(f"Long format: {now.strftime('%B %d, %Y')}")
print(f"With time: {now.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"12-hour format: {now.strftime('%I:%M:%S %p')}")
print(f"Day of week: {now.strftime('%A')}")

### Parsing Date Strings

In [None]:
from datetime import datetime

# Parse strings into datetime objects
date_str1 = "2024-03-15"
date1 = datetime.strptime(date_str1, "%Y-%m-%d")
print(f"Parsed: {date1}")

date_str2 = "March 15, 2024"
date2 = datetime.strptime(date_str2, "%B %d, %Y")
print(f"Parsed: {date2}")

date_str3 = "15/03/2024 14:30"
date3 = datetime.strptime(date_str3, "%d/%m/%Y %H:%M")
print(f"Parsed: {date3}")

### Data Analysis Example: Sales Date Analysis

In [None]:
from datetime import datetime, timedelta
import random

# Simulate sales data with dates
random.seed(42)
base_date = datetime(2024, 1, 1)

sales_data = []
for i in range(30):
    sale_date = base_date + timedelta(days=i)
    amount = random.uniform(100, 500)
    sales_data.append({
        'date': sale_date,
        'amount': amount
    })

# Analyze by week
weekly_sales = {}
for sale in sales_data:
    week_num = sale['date'].isocalendar()[1]  # Get week number
    if week_num not in weekly_sales:
        weekly_sales[week_num] = []
    weekly_sales[week_num].append(sale['amount'])

print("Weekly Sales Analysis (January 2024)")
print("=" * 40)
for week, amounts in sorted(weekly_sales.items()):
    total = sum(amounts)
    avg = total / len(amounts)
    print(f"Week {week}: ${total:.2f} total, ${avg:.2f} average")

# Find best and worst days
best_day = max(sales_data, key=lambda x: x['amount'])
worst_day = min(sales_data, key=lambda x: x['amount'])

print(f"\nBest day: {best_day['date'].strftime('%B %d')} (${best_day['amount']:.2f})")
print(f"Worst day: {worst_day['date'].strftime('%B %d')} (${worst_day['amount']:.2f})")

## 5. The `platform` Module

### System Information

In [None]:
import platform

print("System Information:")
print("=" * 40)
print(f"System: {platform.system()}")
print(f"Release: {platform.release()}")
print(f"Version: {platform.version()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")
print(f"\nPython Information:")
print(f"Implementation: {platform.python_implementation()}")
print(f"Version: {platform.python_version()}")
print(f"Compiler: {platform.python_compiler()}")

### Practical Use: Environment Logging

In [None]:
import platform
from datetime import datetime

def log_environment():
    """Log analysis environment for reproducibility."""
    print("Analysis Environment Report")
    print("=" * 50)
    print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Python: {platform.python_version()}")
    print(f"OS: {platform.system()} {platform.release()}")
    print(f"Architecture: {platform.machine()}")
    print("=" * 50)

log_environment()

## 6. The `collections` Module

### Counter: Counting Hashable Objects

In [None]:
from collections import Counter

# Customer satisfaction ratings
ratings = ['Excellent', 'Good', 'Good', 'Excellent', 'Poor', 'Good', 
           'Excellent', 'Fair', 'Good', 'Excellent', 'Poor', 'Good']

# Count occurrences
rating_counts = Counter(ratings)
print("Rating Counts:")
for rating, count in rating_counts.most_common():
    print(f"  {rating}: {count}")

# Most common
print(f"\nMost common: {rating_counts.most_common(1)}")

# Total
print(f"Total ratings: {sum(rating_counts.values())}")

### defaultdict: Dictionary with Default Values

In [None]:
from collections import defaultdict

# Group sales by region
sales = [
    ('North', 1000),
    ('South', 1500),
    ('North', 1200),
    ('East', 800),
    ('South', 1300),
    ('North', 1100)
]

# Using defaultdict - no need to check if key exists
regional_sales = defaultdict(list)
for region, amount in sales:
    regional_sales[region].append(amount)

print("Regional Sales:")
for region, amounts in regional_sales.items():
    total = sum(amounts)
    avg = total / len(amounts)
    print(f"{region}: ${total} total, ${avg:.2f} average")

### namedtuple: Readable Tuple with Named Fields

In [None]:
from collections import namedtuple
import statistics as stats

# Define a named tuple for data points
DataPoint = namedtuple('DataPoint', ['date', 'temperature', 'humidity'])

# Create data
weather_data = [
    DataPoint('2024-01-01', 22.5, 65),
    DataPoint('2024-01-02', 24.1, 68),
    DataPoint('2024-01-03', 23.8, 70),
    DataPoint('2024-01-04', 25.2, 72),
    DataPoint('2024-01-05', 24.7, 69)
]

# Access by name (much clearer than index!)
temperatures = [point.temperature for point in weather_data]
humidities = [point.humidity for point in weather_data]

print("Weather Analysis:")
print(f"Average temperature: {stats.mean(temperatures):.2f}°C")
print(f"Average humidity: {stats.mean(humidities):.1f}%")

# Much more readable than:
# temperatures = [point[1] for point in weather_data]

## Comprehensive Example: Data Pipeline

In [None]:
import random
import statistics as stats
import math
from datetime import datetime, timedelta
from collections import Counter, namedtuple
import platform

# Setup
random.seed(42)
Transaction = namedtuple('Transaction', ['date', 'amount', 'category'])

# Generate simulated transaction data
categories = ['Food', 'Transport', 'Entertainment', 'Shopping']
transactions = []
base_date = datetime(2024, 1, 1)

for i in range(50):
    # Named txn_date so it does not shadow datetime.date imported in an earlier cell
    txn_date = base_date + timedelta(days=random.randint(0, 30))
    amount = random.gauss(50, 20)  # Normal distribution
    amount = max(10, amount)  # Minimum $10
    category = random.choice(categories)
    transactions.append(Transaction(txn_date, amount, category))

# Analysis
print("Transaction Analysis Report")
print("=" * 50)
print(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python: {platform.python_version()}")
print("=" * 50)

# Overall statistics
amounts = [t.amount for t in transactions]
print(f"\nOverall Statistics:")
print(f"Total transactions: {len(transactions)}")
print(f"Total amount: ${sum(amounts):.2f}")
print(f"Mean: ${stats.mean(amounts):.2f}")
print(f"Median: ${stats.median(amounts):.2f}")
print(f"Std Dev: ${stats.stdev(amounts):.2f}")

# Category breakdown
category_counts = Counter(t.category for t in transactions)
print(f"\nCategory Breakdown:")
for category, count in category_counts.most_common():
    cat_amounts = [t.amount for t in transactions if t.category == category]
    cat_total = sum(cat_amounts)
    cat_avg = stats.mean(cat_amounts)
    print(f"  {category}: {count} transactions, ${cat_total:.2f} total, ${cat_avg:.2f} avg")

# Date range
dates = [t.date for t in transactions]
date_range = max(dates) - min(dates)
print(f"\nDate Range:")
print(f"From: {min(dates).strftime('%Y-%m-%d')}")
print(f"To: {max(dates).strftime('%Y-%m-%d')}")
print(f"Span: {date_range.days} days")

# Extreme values
max_transaction = max(transactions, key=lambda t: t.amount)
min_transaction = min(transactions, key=lambda t: t.amount)
print(f"\nExtreme Values:")
print(f"Largest: ${max_transaction.amount:.2f} ({max_transaction.category})")
print(f"Smallest: ${min_transaction.amount:.2f} ({min_transaction.category})")

## Exercises

### Exercise 1: Comprehensive Statistics

Given this dataset of product prices, calculate:
1. All central tendency measures (mean, median, mode)
2. All dispersion measures (variance, std dev, range, IQR)
3. The coefficient of variation (CV = std_dev / mean * 100)
4. Identify outliers using the IQR method (values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR)

Dataset: `[45, 50, 52, 48, 51, 49, 47, 53, 46, 50, 120, 48, 51, 49, 52]`

In [None]:
# Your solution here

### Exercise 2: Random Sampling Simulation

Simulate a quality control process:
1. Generate 1000 product weights with a normal distribution (mean=500g, std_dev=10g)
2. Randomly sample 30 products
3. Calculate the sample mean and compare it to the population mean
4. Repeat this sampling 100 times and report how often the sample mean is within 5g of the true mean
5. Use `random.seed(42)` for reproducibility

In [None]:
# Your solution here

### Exercise 3: Date-Based Analysis

You have employee work logs:
```python
logs = [
    "2024-01-15 09:00",
    "2024-01-15 17:30",
    "2024-01-16 08:45",
    "2024-01-16 17:15",
    "2024-01-17 09:15",
    "2024-01-17 18:00"
]
```

Calculate:
1. Hours worked each day (assume logs are in/out pairs)
2. Total hours worked
3. Average hours per day
4. Which day had the longest work hours?

In [None]:
# Your solution here

### Exercise 4: Collections Power

Analyze this customer purchase data:
```python
purchases = [
    ('Alice', 'Book', 15.99),
    ('Bob', 'Electronics', 299.99),
    ('Alice', 'Book', 12.50),
    ('Charlie', 'Clothing', 45.00),
    ('Bob', 'Electronics', 450.00),
    ('Alice', 'Clothing', 60.00),
]
```

Using `Counter`, `defaultdict`, and `namedtuple`:
1. Find the most active customer (most purchases)
2. Calculate total spending per customer
3. Find the most popular category
4. Calculate average purchase amount per category

In [None]:
# Your solution here

## Key Takeaways

| Module | Primary Use | Key Functions |
|--------|-------------|---------------|
| **math** | Mathematical operations | `sqrt()`, `log()`, `ceil()`, `floor()`, `pi`, `e` |
| **random** | Random number generation | `random()`, `randint()`, `choice()`, `sample()`, `seed()` |
| **statistics** | Statistical analysis | `mean()`, `median()`, `stdev()`, `variance()`, `quantiles()` |
| **datetime** | Date/time operations | `datetime.now()`, `timedelta()`, `strftime()`, `strptime()` |
| **platform** | System information | `system()`, `python_version()`, `processor()` |
| **collections** | Specialized containers | `Counter()`, `defaultdict()`, `namedtuple()` |

**Important Concepts:**
- Use `random.seed()` for reproducible random numbers
- Match the function to the data: `pvariance`/`pstdev` for a full population, `variance`/`stdev` for a sample
- Use `timedelta` for `datetime` arithmetic
- Use `Counter` for frequency analysis
- Use `namedtuple` for readable data structures

## What's Next?

In the next notebook, we'll learn how to **create your own modules**: organizing code into reusable files, understanding `__name__` and `__main__`, and best practices for module design.