## Overview

Python's **Standard Library** is a collection of modules that come pre-installed with Python. These modules provide functionality for common tasks without requiring external package installation.

In this notebook, we'll explore modules particularly useful for data analysis:

1. **`math`** - Mathematical functions and constants
2. **`random`** - Pseudo-random number generation
3. **`statistics`** - Statistical functions
4. **`datetime`** - Date and time manipulation
5. **`platform`** - System information
6. **`collections`** - Specialized container datatypes

These modules provide a solid foundation before moving on to more advanced libraries such as NumPy and pandas.

## 1. The `math` Module

### Mathematical Functions and Constants

In [1]:
import math

# Explore what's available
public_items = [item for item in dir(math) if not item.startswith('_')]
print(f"Number of public items in math: {len(public_items)}")
print(f"\nFirst 15 items: {public_items[:15]}")

Number of public items in math: 62

First 15 items: ['acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'cbrt', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist']


### Common Constants

In [2]:
import math

print("Mathematical Constants:")
print(f"π (pi): {math.pi}")
print(f"e (Euler's number): {math.e}")
print(f"τ (tau = 2π): {math.tau}")
print(f"∞ (infinity): {math.inf}")
print(f"NaN (Not a Number): {math.nan}")

Mathematical Constants:
π (pi): 3.141592653589793
e (Euler's number): 2.718281828459045
τ (tau = 2π): 6.283185307179586
∞ (infinity): inf
NaN (Not a Number): nan
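
The special values `math.inf` and `math.nan` do not behave like ordinary floats in comparisons, which matters when cleaning data. The following is a minimal sketch; the `readings` list is made up for illustration.

```python
import math

# NaN never compares equal to anything, including itself
nan_equals_itself = math.nan == math.nan

# Hypothetical sensor readings containing special values
readings = [1.5, math.nan, 3.0, math.inf]

# math.isfinite() is False for both NaN and infinity
finite_readings = [x for x in readings if math.isfinite(x)]

print(f"nan == nan: {nan_equals_itself}")
print(f"isnan(nan): {math.isnan(math.nan)}")
print(f"isinf(inf): {math.isinf(math.inf)}")
print(f"Finite readings: {finite_readings}")
```

Testing with `math.isnan()` rather than `==` is the safer habit, because an equality check against NaN always fails.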


### Basic Operations

In [3]:
import math

# Power and roots
print("Power and Roots:")
print(f"sqrt(25) = {math.sqrt(25)}")
print(f"pow(2, 3) = {math.pow(2, 3)}")
print(f"exp(1) = {math.exp(1)}")

# Logarithms
print("\nLogarithms:")
print(f"log(e) = {math.log(math.e)}")
print(f"log10(100) = {math.log10(100)}")
print(f"log2(8) = {math.log2(8)}")

# Rounding
print("\nRounding:")
print(f"ceil(4.3) = {math.ceil(4.3)}")
print(f"floor(4.7) = {math.floor(4.7)}")
print(f"trunc(4.7) = {math.trunc(4.7)}")

Power and Roots:
sqrt(25) = 5.0
pow(2, 3) = 8.0
exp(1) = 2.718281828459045

Logarithms:
log(e) = 1.0
log10(100) = 2.0
log2(8) = 3.0

Rounding:
ceil(4.3) = 5
floor(4.7) = 4
trunc(4.7) = 4
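
For positive numbers `floor` and `trunc` give the same result, so the difference only shows up with negative values. A short sketch with an arbitrarily chosen input:

```python
import math

value = -4.7

floor_result = math.floor(value)   # rounds toward negative infinity
ceil_result = math.ceil(value)     # rounds toward positive infinity
trunc_result = math.trunc(value)   # drops the fractional part (rounds toward zero)

print(f"floor({value}) = {floor_result}")
print(f"ceil({value})  = {ceil_result}")
print(f"trunc({value}) = {trunc_result}")
```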


### Trigonometric Functions

In [4]:
import math

# Angles in radians
angle_rad = math.pi / 4  # 45 degrees

print(f"Angle: {angle_rad} radians = {math.degrees(angle_rad)} degrees")
print(f"\nsin(π/4) = {math.sin(angle_rad):.4f}")
print(f"cos(π/4) = {math.cos(angle_rad):.4f}")
print(f"tan(π/4) = {math.tan(angle_rad):.4f}")

# Convert degrees to radians
angle_deg = 60
angle_rad = math.radians(angle_deg)
print(f"\nsin({angle_deg}°) = {math.sin(angle_rad):.4f}")

Angle: 0.7853981633974483 radians = 45.0 degrees

sin(π/4) = 0.7071
cos(π/4) = 0.7071
tan(π/4) = 1.0000

sin(60°) = 0.8660


### Data Analysis Example: Distance Calculation

In [5]:
import math

def euclidean_distance(point1, point2):
    """Calculate Euclidean distance between two points."""
    x1, y1 = point1
    x2, y2 = point2
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Store locations (coordinates)
warehouse = (0, 0)
store_a = (3, 4)
store_b = (6, 8)
store_c = (5, 12)

# Calculate distances
distances = {
    'Store A': euclidean_distance(warehouse, store_a),
    'Store B': euclidean_distance(warehouse, store_b),
    'Store C': euclidean_distance(warehouse, store_c)
}

print("Distances from warehouse:")
for store, dist in distances.items():
    print(f"{store}: {dist:.2f} km")

# Find closest store
closest = min(distances, key=distances.get)
print(f"\nClosest store: {closest} ({distances[closest]:.2f} km)")

Distances from warehouse:
Store A: 5.00 km
Store B: 10.00 km
Store C: 13.00 km

Closest store: Store A (5.00 km)
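
The hand-written formula above is useful for seeing how the calculation works. The `math` module also provides `math.dist` (listed in the `dir(math)` output earlier, available from Python 3.8) and `math.hypot`, which should give the same result. A brief sketch reusing two of the coordinates:

```python
import math

warehouse = (0, 0)
store_a = (3, 4)

# math.dist takes two points with the same number of dimensions
d_dist = math.dist(warehouse, store_a)

# math.hypot takes the coordinate differences directly
d_hypot = math.hypot(store_a[0] - warehouse[0], store_a[1] - warehouse[1])

print(f"math.dist:  {d_dist}")
print(f"math.hypot: {d_hypot}")
```

Unlike the two-dimensional helper above, `math.dist` also works for points with three or more coordinates.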


## 2. The `random` Module

### Understanding Pseudo-Randomness

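A deterministic algorithm cannot produce truly random numbers. Python's `random` module uses a **pseudo-random number generator** (the Mersenne Twister): an algorithm that produces a sequence of numbers that appears random but is fully deterministic. This is fine for simulation and sampling, but not for security purposes such as passwords or tokens (use the `secrets` module for those).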

The sequence is determined by a **seed** value. Same seed → same sequence.

In [6]:
import random

# Without seed - different each time
print("Random numbers without seed:")
for i in range(5):
    print(random.random())

Random numbers without seed:
0.656996972656612
0.12944255913777258
0.5059116365763967
0.5758586667095938
0.4652624598660635


In [7]:
import random

# With seed - reproducible
print("First run with seed(42):")
random.seed(42)
for i in range(5):
    print(random.random())

print("\nSecond run with same seed(42):")
random.seed(42)
for i in range(5):
    print(random.random())

print("\nNotice: Both sequences are identical!")

First run with seed(42):
0.6394267984578837
0.025010755222666936
0.27502931836911926
0.22321073814882275
0.7364712141640124

Second run with same seed(42):
0.6394267984578837
0.025010755222666936
0.27502931836911926
0.22321073814882275
0.7364712141640124

Notice: Both sequences are identical!
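
`random.seed()` sets the state of a single generator shared by the whole program. When separate parts of an analysis need their own reproducible streams, one option is to create independent `random.Random` instances. A minimal sketch (the names `rng_a` and `rng_b` are illustrative):

```python
import random

# Each Random instance keeps its own state, separate from the module-level generator
rng_a = random.Random(42)
rng_b = random.Random(42)

seq_a = [rng_a.random() for _ in range(3)]

# A call on the shared module-level generator does not disturb rng_b
random.random()
seq_b = [rng_b.random() for _ in range(3)]

print(f"seq_a: {seq_a}")
print(f"seq_b: {seq_b}")
print(f"Sequences match: {seq_a == seq_b}")
```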


### Generating Random Numbers

In [8]:
import random

# Float in the half-open interval [0.0, 1.0)
print(f"random.random(): {random.random()}")

# Float between a and b
print(f"random.uniform(10, 20): {random.uniform(10, 20)}")

# Integer in a range [a, b] (both endpoints included)
print(f"random.randint(1, 10): {random.randint(1, 10)}")

# Integer from range(start, stop, step); stop is excluded, as with range()
print(f"random.randrange(0, 100, 5): {random.randrange(0, 100, 5)}")

random.random(): 0.6766994874229113
random.uniform(10, 20): 18.921795677048454
random.randint(1, 10): 2
random.randrange(0, 100, 5): 90


### Random Selections

In [9]:
import random

colors = ['red', 'green', 'blue', 'yellow', 'purple']

# Choose one item
print(f"random.choice(): {random.choice(colors)}")

# Choose multiple items (with replacement)
print(f"random.choices(k=3): {random.choices(colors, k=3)}")

# Choose multiple unique items (without replacement)
print(f"random.sample(k=3): {random.sample(colors, k=3)}")

# Shuffle in place
deck = list(range(1, 11))
print(f"\nOriginal: {deck}")
random.shuffle(deck)
print(f"Shuffled: {deck}")

random.choice(): yellow
random.choices(k=3): ['red', 'red', 'green']
random.sample(k=3): ['purple', 'red', 'blue']

Original: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Shuffled: [8, 6, 1, 3, 5, 10, 2, 7, 9, 4]


### Random Distributions

In [10]:
import random

# Gaussian (normal) distribution
mean = 100
std_dev = 15
print("Normal distribution samples (IQ scores):")
for i in range(5):
    iq = random.gauss(mean, std_dev)
    print(f"  {iq:.1f}")

# Exponential distribution
lambda_val = 1/5  # mean = 5
print("\nExponential distribution samples (time between events):")
for i in range(5):
    wait = random.expovariate(lambda_val)  # avoid shadowing the name `time`
    print(f"  {wait:.2f} minutes")

Normal distribution samples (IQ scores):
  95.6
  87.0
  121.1
  131.2
  96.6

Exponential distribution samples (time between events):
  0.51 minutes
  9.40 minutes
  4.63 minutes
  8.23 minutes
  6.54 minutes
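
Five samples are too few to see whether the parameters behave as expected. One way to check, sketched below under the assumption that 10,000 draws is enough for a rough comparison, is to draw many samples and compare their mean and standard deviation with the targets.

```python
import random
import statistics as stats

rng = random.Random(42)  # seeded so the check is repeatable

n_samples = 10_000
iq_samples = [rng.gauss(100, 15) for _ in range(n_samples)]
wait_samples = [rng.expovariate(1 / 5) for _ in range(n_samples)]

iq_mean = stats.mean(iq_samples)
iq_stdev = stats.stdev(iq_samples)
wait_mean = stats.mean(wait_samples)

print(f"gauss mean:       {iq_mean:.1f} (target 100)")
print(f"gauss stdev:      {iq_stdev:.1f} (target 15)")
print(f"expovariate mean: {wait_mean:.2f} (target 5)")
```

The results will not match the targets exactly, but they should land close to them. Note that `expovariate` takes the rate (1 divided by the desired mean), not the mean itself.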


### Data Analysis Example: Simulation

In [11]:
import random

# Simulate dice rolling to verify probability
random.seed(42)  # For reproducibility

def roll_dice(n_rolls=1000):
    """Simulate rolling two dice and count sum occurrences."""
    results = {i: 0 for i in range(2, 13)}  # Possible sums: 2-12
    
    for _ in range(n_rolls):
        die1 = random.randint(1, 6)
        die2 = random.randint(1, 6)
        total = die1 + die2
        results[total] += 1
    
    return results

# Run simulation
rolls = 10000
results = roll_dice(rolls)

print(f"Dice Rolling Simulation ({rolls} rolls)")
print("=" * 40)
for total, count in results.items():
    probability = (count / rolls) * 100
    bar = '█' * int(probability * 2)
    print(f"Sum {total:2d}: {bar} {probability:.1f}%")

print("\nNotice: 7 is the most common sum (16.7% theoretical)")

Dice Rolling Simulation (10000 rolls)
========================================
Sum  2: █████ 2.7%
Sum  3: ██████████ 5.4%
Sum  4: ████████████████ 8.2%
Sum  5: █████████████████████ 10.9%
Sum  6: ███████████████████████████ 13.6%
Sum  7: ██████████████████████████████████ 17.0%
Sum  8: ███████████████████████████ 13.9%
Sum  9: ██████████████████████ 11.2%
Sum 10: ████████████████ 8.4%
Sum 11: ███████████ 5.6%
Sum 12: ██████ 3.1%

Notice: 7 is the most common sum (16.7% theoretical)
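
The theoretical figures the simulation is compared against can be derived by enumerating all 36 equally likely outcomes of two dice. The sketch below uses `Counter`, `Fraction`, and `itertools.product` from the standard library to do this exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes produce each sum
sum_counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))

# Exact probability of each sum as a fraction
theoretical = {
    total: Fraction(count, 36)
    for total, count in sorted(sum_counts.items())
}

for total, prob in theoretical.items():
    print(f"Sum {total:2d}: {str(prob):>5} = {float(prob) * 100:.1f}%")
```

A sum of 7 can be formed in six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), which gives 6/36 = 1/6, or about 16.7%.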


## 3. The `statistics` Module

### Central Tendency Measures

In [12]:
import statistics as stats

# Test scores dataset
scores = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]

print("Central Tendency Measures:")
print(f"Mean: {stats.mean(scores):.2f}")
print(f"Median: {stats.median(scores):.2f}")
print(f"Mode: {stats.mode([1, 2, 2, 3, 3, 3, 4])}")

# When several values tie for most common, multimode returns all of them
print(f"\nMultiple modes: {stats.multimode([1, 1, 2, 2, 3])}")

Central Tendency Measures:
Mean: 86.42
Median: 87.50
Mode: 3

Multiple modes: [1, 2]


### Dispersion Measures

In [13]:
import statistics as stats

scores = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]

print("Dispersion Measures:")
print(f"Variance (population): {stats.pvariance(scores):.2f}")
print(f"Variance (sample): {stats.variance(scores):.2f}")
print(f"Std Dev (population): {stats.pstdev(scores):.2f}")
print(f"Std Dev (sample): {stats.stdev(scores):.2f}")

# Range (manual calculation)
data_range = max(scores) - min(scores)
print(f"Range: {data_range}")

Dispersion Measures:
Variance (population): 29.58
Variance (sample): 32.27
Std Dev (population): 5.44
Std Dev (sample): 5.68
Range: 19


### Understanding Population vs Sample

**Population**: All possible data points (use `pvariance`, `pstdev`)

**Sample**: Subset of population (use `variance`, `stdev`)

Sample variance divides the sum of squared deviations by `n-1` instead of `n` (Bessel's correction), which makes it an unbiased estimate of the population variance.

In [14]:
import statistics as stats

data = [10, 20, 30, 40, 50]

print("Comparing Population vs Sample statistics:")
print(f"\nPopulation variance (÷n): {stats.pvariance(data):.2f}")
print(f"Sample variance (÷n-1): {stats.variance(data):.2f}")
print(f"\nPopulation std dev: {stats.pstdev(data):.2f}")
print(f"Sample std dev: {stats.stdev(data):.2f}")
print("\nSample estimates are larger (more conservative)")

Comparing Population vs Sample statistics:

Population variance (÷n): 200.00
Sample variance (÷n-1): 250.00

Population std dev: 14.14
Sample std dev: 15.81

Sample estimates are larger (more conservative)
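
The `÷n` versus `÷n-1` distinction can be checked by hand. This sketch recomputes both variances for the same five values and compares them with the `statistics` functions.

```python
import math
import statistics as stats

data = [10, 20, 30, 40, 50]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

pop_var = ss / n            # population: divide by n
sample_var = ss / (n - 1)   # sample: divide by n - 1 (Bessel's correction)

print(f"Mean: {mean}")
print(f"Sum of squared deviations: {ss}")
print(f"Population variance: {pop_var} (pvariance: {stats.pvariance(data)})")
print(f"Sample variance:     {sample_var} (variance:  {stats.variance(data)})")
```

The standard deviations are the square roots of these two values, which is why `stdev` is also slightly larger than `pstdev`.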


### Quantiles and Quartiles

In [15]:
import statistics as stats

data = [15, 20, 35, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print("Quantile Analysis:")
print(f"Median (50th percentile): {stats.median(data)}")
q1, q2, q3 = stats.quantiles(data, n=4)  # compute the quartiles once
print("\nQuartiles:")
print(f"Q1 (25th percentile): {q1}")
print(f"Q2 (50th percentile): {q2}")
print(f"Q3 (75th percentile): {q3}")

# Deciles (10 equal parts)
deciles = stats.quantiles(data, n=10)
print(f"\n10th percentile: {deciles[0]}")
print(f"90th percentile: {deciles[-1]}")

Quantile Analysis:
Median (50th percentile): 65

Quartiles:
Q1 (25th percentile): 40.0
Q2 (50th percentile): 65.0
Q3 (75th percentile): 85.0

10th percentile: 18.0
90th percentile: 97.0
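
`stats.quantiles` accepts a `method` argument. The default, `'exclusive'`, treats the data as a sample from a larger population; `'inclusive'` treats the data as the whole population, with the minimum and maximum as the 0th and 100th percentiles. The sketch below compares the two on the same data; which one is appropriate depends on how the data were collected.

```python
import statistics as stats

data = [15, 20, 35, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Default method: data is a sample that may not contain the extremes
q_exclusive = stats.quantiles(data, n=4, method='exclusive')

# Alternative: data is treated as the complete population
q_inclusive = stats.quantiles(data, n=4, method='inclusive')

print(f"Exclusive (default): {q_exclusive}")
print(f"Inclusive:           {q_inclusive}")
```

The inclusive quartiles sit slightly closer to the median, so the two methods can give different interquartile ranges for the same data.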


### Data Analysis Example: Grade Analysis

In [16]:
import statistics as stats

# Student grades from two classes
class_a = [78, 85, 92, 88, 76, 95, 89, 84, 91, 87, 82, 90]
class_b = [65, 70, 98, 55, 88, 92, 60, 95, 75, 58, 85, 90]

def analyze_grades(grades, class_name):
    """Comprehensive grade analysis."""
    print(f"\n{class_name} Analysis:")
    print("=" * 40)
    print(f"Mean: {stats.mean(grades):.2f}")
    print(f"Median: {stats.median(grades):.2f}")
    print(f"Std Dev: {stats.stdev(grades):.2f}")
    print(f"Range: {max(grades) - min(grades)}")
    
    # Quartiles
    q = stats.quantiles(grades, n=4)
    print(f"\nQuartiles:")
    print(f"  Q1 (25%): {q[0]:.2f}")
    print(f"  Q2 (50%): {q[1]:.2f}")
    print(f"  Q3 (75%): {q[2]:.2f}")
    print(f"  IQR: {q[2] - q[0]:.2f}")

analyze_grades(class_a, "Class A")
analyze_grades(class_b, "Class B")

print("\n" + "=" * 40)
print("Interpretation:")
print("Class A: More consistent (lower std dev)")
print("Class B: Higher variability (some very high and low grades)")


Class A Analysis:
========================================
Mean: 86.42
Median: 87.50
Std Dev: 5.68
Range: 19

Quartiles:
  Q1 (25%): 82.50
  Q2 (50%): 87.50
  Q3 (75%): 90.75
  IQR: 8.25

Class B Analysis:
========================================
Mean: 77.58
Median: 80.00
Std Dev: 15.58
Range: 43

Quartiles:
  Q1 (25%): 61.25
  Q2 (50%): 80.00
  Q3 (75%): 91.50
  IQR: 30.25

========================================
Interpretation:
Class A: More consistent (lower std dev)
Class B: Higher variability (some very high and low grades)


## 4. The `datetime` Module

### Working with Dates and Times

In [17]:
from datetime import datetime, date, time, timedelta

# Current date and time
now = datetime.now()
print(f"Current datetime: {now}")
print(f"Current date: {date.today()}")

# Creating specific dates
birthday = date(1990, 5, 15)
print(f"\nBirthday: {birthday}")

# Creating specific times
meeting = time(14, 30, 0)
print(f"Meeting time: {meeting}")

# Creating specific datetime
event = datetime(2024, 12, 31, 23, 59, 59)
print(f"New Year's Eve: {event}")

Current datetime: 2025-12-18 16:54:32.911121
Current date: 2025-12-18

Birthday: 1990-05-15
Meeting time: 14:30:00
New Year's Eve: 2024-12-31 23:59:59


### Date Arithmetic

In [18]:
from datetime import datetime, timedelta

today = datetime.now()
print(f"Today: {today.strftime('%Y-%m-%d %H:%M:%S')}")

# Add/subtract time
tomorrow = today + timedelta(days=1)
print(f"Tomorrow: {tomorrow.strftime('%Y-%m-%d')}")

next_week = today + timedelta(weeks=1)
print(f"Next week: {next_week.strftime('%Y-%m-%d')}")

three_hours_ago = today - timedelta(hours=3)
print(f"3 hours ago: {three_hours_ago.strftime('%H:%M:%S')}")

# Calculate the difference to the upcoming New Year
new_year = datetime(today.year + 1, 1, 1)
time_left = new_year - today
print(f"\nDays until {new_year.year}: {time_left.days}")

Today: 2025-12-18 16:54:32
Tomorrow: 2025-12-19
Next week: 2025-12-25
3 hours ago: 13:54:32

Days until 2026: 13
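
Subtracting two `datetime` objects gives a `timedelta`, whose `.days` attribute holds whole days only. For durations that include partial days, `total_seconds()` gives the full length. A small sketch with two fixed, arbitrarily chosen timestamps:

```python
from datetime import datetime

start = datetime(2024, 1, 1, 18, 0)
end = datetime(2024, 1, 3, 6, 0)   # 1 day and 12 hours after start

gap = end - start

whole_days = gap.days                      # whole days only
leftover_seconds = gap.seconds             # remainder within the last partial day
total_hours = gap.total_seconds() / 3600   # full duration in hours

print(f"gap: {gap}")
print(f"gap.days: {whole_days}")
print(f"gap.seconds: {leftover_seconds}")
print(f"total hours: {total_hours}")
```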


### Formatting Dates

In [19]:
from datetime import datetime

now = datetime.now()

# Different formats using strftime
print("Date Formatting Examples:")
print(f"ISO format: {now.isoformat()}")
print(f"US format: {now.strftime('%m/%d/%Y')}")
print(f"European format: {now.strftime('%d/%m/%Y')}")
print(f"Long format: {now.strftime('%B %d, %Y')}")
print(f"With time: {now.strftime('%Y-%m-%d %H:%M:%S')}")
print(f"12-hour format: {now.strftime('%I:%M:%S %p')}")
print(f"Day of week: {now.strftime('%A')}")

Date Formatting Examples:
ISO format: 2025-12-18T16:54:32.929736
US format: 12/18/2025
European format: 18/12/2025
Long format: December 18, 2025
With time: 2025-12-18 16:54:32
12-hour format: 04:54:32 PM
Day of week: Thursday


### Parsing Date Strings

In [20]:
from datetime import datetime

# Parse strings into datetime objects
date_str1 = "2024-03-15"
date1 = datetime.strptime(date_str1, "%Y-%m-%d")
print(f"Parsed: {date1}")

date_str2 = "March 15, 2024"
date2 = datetime.strptime(date_str2, "%B %d, %Y")
print(f"Parsed: {date2}")

date_str3 = "15/03/2024 14:30"
date3 = datetime.strptime(date_str3, "%d/%m/%Y %H:%M")
print(f"Parsed: {date3}")

Parsed: 2024-03-15 00:00:00
Parsed: 2024-03-15 00:00:00
Parsed: 2024-03-15 14:30:00
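
`strptime` raises `ValueError` when a string does not match the format, which is common with messy real-world data. One possible approach, sketched here with a hypothetical helper named `parse_date`, is to try several formats in turn and return `None` when none match.

```python
from datetime import date, datetime

def parse_date(text, formats=("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")):
    """Try each format in turn; return None if none match (hypothetical helper)."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    return None

print(parse_date("2024-03-15"))
print(parse_date("15/03/2024"))
print(parse_date("not a date"))

# ISO 8601 date strings can also be parsed without a format string
print(date.fromisoformat("2024-03-15"))
```

The order of `formats` matters for ambiguous strings such as `"03/04/2024"`, which could be read as day/month or month/day, so the list should reflect what is known about the data source.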


### Data Analysis Example: Sales Date Analysis

In [21]:
from datetime import datetime, timedelta
import random

# Simulate sales data with dates
random.seed(42)
base_date = datetime(2024, 1, 1)

sales_data = []
for i in range(30):
    sale_date = base_date + timedelta(days=i)
    amount = random.uniform(100, 500)
    sales_data.append({
        'date': sale_date,
        'amount': amount
    })

# Analyze by week
weekly_sales = {}
for sale in sales_data:
    week_num = sale['date'].isocalendar()[1]  # Get week number
    if week_num not in weekly_sales:
        weekly_sales[week_num] = []
    weekly_sales[week_num].append(sale['amount'])

print("Weekly Sales Analysis (January 2024)")
print("=" * 40)
for week, amounts in sorted(weekly_sales.items()):
    total = sum(amounts)
    avg = total / len(amounts)
    print(f"Week {week}: ${total:.2f} total, ${avg:.2f} average")

# Find best and worst days
best_day = max(sales_data, key=lambda x: x['amount'])
worst_day = min(sales_data, key=lambda x: x['amount'])

print(f"\nBest day: {best_day['date'].strftime('%B %d')} (${best_day['amount']:.2f})")
print(f"Worst day: {worst_day['date'].strftime('%B %d')} (${worst_day['amount']:.2f})")

Weekly Sales Analysis (January 2024)
========================================
Week 1: $2087.21 total, $298.17 average
Week 2: $1295.21 total, $185.03 average
Week 3: $2150.51 total, $307.22 average
Week 4: $1770.86 total, $252.98 average
Week 5: $780.49 total, $390.24 average

Best day: January 25 ($482.89)
Worst day: January 20 ($102.60)
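
Grouping by week number alone works for a single month, but ISO week numbers repeat every year and can cross calendar-year boundaries. For longer date ranges, one option is to group by the ISO year and week together. The sketch below uses made-up sales records and `defaultdict` to illustrate this.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records that span a year boundary
sales = [
    (date(2024, 1, 1), 120.0),    # ISO week 1 of 2024
    (date(2024, 12, 30), 200.0),  # already ISO week 1 of 2025
    (date(2025, 1, 2), 80.0),     # ISO week 1 of 2025
]

weekly_totals = defaultdict(float)
for sale_date, amount in sales:
    # isocalendar() returns (ISO year, ISO week, ISO weekday)
    iso_year, iso_week, _ = sale_date.isocalendar()
    weekly_totals[(iso_year, iso_week)] += amount

for (iso_year, iso_week), total in sorted(weekly_totals.items()):
    print(f"{iso_year}-W{iso_week:02d}: ${total:.2f}")
```

Keyed on the week number alone, all three records would land in the same "week 1" bucket even though they are almost a year apart.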


## 5. The `platform` Module

### System Information

In [22]:
import platform

print("System Information:")
print("=" * 40)
print(f"System: {platform.system()}")
print(f"Release: {platform.release()}")
print(f"Version: {platform.version()}")
print(f"Machine: {platform.machine()}")
print(f"Processor: {platform.processor()}")
print(f"\nPython Information:")
print(f"Implementation: {platform.python_implementation()}")
print(f"Version: {platform.python_version()}")
print(f"Compiler: {platform.python_compiler()}")

System Information:
========================================
System: Windows
Release: 11
Version: 10.0.26100
Machine: AMD64
Processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel

Python Information:
Implementation: CPython
Version: 3.13.9
Compiler: MSC v.1944 64 bit (AMD64)


### Practical Use: Environment Logging

In [23]:
import platform
from datetime import datetime

def log_environment():
    """Log analysis environment for reproducibility."""
    print("Analysis Environment Report")
    print("=" * 50)
    print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Python: {platform.python_version()}")
    print(f"OS: {platform.system()} {platform.release()}")
    print(f"Architecture: {platform.machine()}")
    print("=" * 50)

log_environment()

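Analysis Environment Report
==================================================
Date: 2025-12-18 16:54:32
Python: 3.13.9
OS: Windows 11
Architecture: AMD64
==================================================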

## Exercises

### Exercise 1: Comprehensive Statistics

Given this dataset of product prices, calculate:
1. All central tendency measures (mean, median, mode)
2. All dispersion measures (variance, std dev, range, IQR)
3. The coefficient of variation (CV = std_dev / mean * 100)
4. Identify outliers using the IQR method (values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR)

Dataset: `[45, 50, 52, 48, 51, 49, 47, 53, 46, 50, 120, 48, 51, 49, 52]`

In [28]:
# Your solution here

### Exercise 2: Random Sampling Simulation

Simulate a quality control process:
1. Generate 1000 product weights with a normal distribution (mean=500g, std_dev=10g)
2. Randomly sample 30 products
3. Calculate the sample mean and compare it to the population mean
4. Repeat this sampling 100 times and count how often the sample mean is within 5g of the true mean
5. Use `random.seed(42)` for reproducibility

In [29]:
# Your solution here

### Exercise 3: Date-Based Analysis

You have employee work logs:
```python
logs = [
    "2024-01-15 09:00",
    "2024-01-15 17:30",
    "2024-01-16 08:45",
    "2024-01-16 17:15",
    "2024-01-17 09:15",
    "2024-01-17 18:00"
]
```

Calculate:
1. Hours worked each day (assume logs are in/out pairs)
2. Total hours worked
3. Average hours per day
4. Which day had the longest work hours?

In [30]:
# Your solution here

## Key Takeaways

| Module | Primary Use | Key Functions |
|--------|-------------|---------------|
| **math** | Mathematical operations | `sqrt()`, `log()`, `ceil()`, `floor()`, `pi`, `e` |
| **random** | Random number generation | `random()`, `randint()`, `choice()`, `sample()`, `seed()` |
| **statistics** | Statistical analysis | `mean()`, `median()`, `stdev()`, `variance()`, `quantiles()` |
| **datetime** | Date/time operations | `datetime.now()`, `timedelta()`, `strftime()`, `strptime()` |
| **platform** | System information | `system()`, `python_version()`, `processor()` |

**Important Concepts:**
- Use `random.seed()` for reproducible random numbers
- Understand population vs sample statistics
- Use `timedelta` for `datetime` arithmetic
- Use `Counter` for frequency analysis
- Use `namedtuple` for readable data structures