# DTSA 5001: Probability Theory: Foundation for Data Science

## Course Overview and Quick Reference Guide

This notebook is a quick reference for the key concepts, techniques, and code covered in this course, organized week by week.

### Course Objectives
- Understanding fundamental probability concepts
- Applying probability theory to data science problems
- Working with probability distributions
- Implementing probabilistic models

In [None]:
# Import common libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

# Display settings
%matplotlib inline
sns.set_theme()  # replaces plt.style.use('seaborn'), a style name that newer Matplotlib releases no longer accept
np.random.seed(42)

## Week 1: Introduction to Probability

### Key Concepts
- 

### Important Terms
- 

### Code Examples

In [None]:
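def calculate_probability(favorable_outcomes: int, total_outcomes: int) -> float:
    """Return favorable / total, the probability of an event when all outcomes are equally likely"""
    if total_outcomes <= 0 or not 0 <= favorable_outcomes <= total_outcomes:
        raise ValueError("need total_outcomes > 0 and 0 <= favorable_outcomes <= total_outcomes")
    return favorable_outcomes / total_outcomes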

def simulate_coin_flips(n: int) -> dict:
    """Simulate n fair coin flips and return the observed proportions of heads and tails"""
    flips = np.random.choice(['H', 'T'], size=n)
    return {
        'heads': np.mean(flips == 'H'),
        'tails': np.mean(flips == 'T')
    }
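
A simulated frequency can be checked against a probability obtained by counting outcomes. The sketch below does this for the event "two fair dice sum to 7". The example and the helper name `estimate_sum_probability` are illustrative assumptions, not course material.

```python
import numpy as np

def estimate_sum_probability(target: int, n_rolls: int, seed: int = 0) -> float:
    """Estimate P(sum of two fair dice == target) by simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # rng.integers excludes the upper bound, so faces 1..6 need high=7
    rolls = rng.integers(1, 7, size=(n_rolls, 2))
    return float(np.mean(rolls.sum(axis=1) == target))

# Exact value by counting: 6 of the 36 equally likely outcomes sum to 7
exact = 6 / 36
estimate = estimate_sum_probability(target=7, n_rolls=100_000)
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

With 100,000 rolls the simulated value should land close to the exact one, though the two will not match digit for digit.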

## Week 2: Discrete Probability Distributions

### Key Concepts
- 

### Important Distributions
- Binomial(n, p): number of successes in n independent trials, each succeeding with probability p

### Code Examples

In [None]:
def plot_binomial_distribution(n: int, p: float):
    """Plot the probability mass function of a Binomial(n, p) distribution"""
    x = np.arange(0, n+1)
    pmf = stats.binom.pmf(x, n, p)
    
    plt.figure(figsize=(10, 6))
    plt.bar(x, pmf)
    plt.title(f'Binomial Distribution (n={n}, p={p})')
    plt.xlabel('Number of Successes')
    plt.ylabel('Probability')
    plt.show()
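
Besides plotting the PMF, it is often useful to read off individual probabilities and moments. The following is a minimal sketch using `scipy.stats.binom`, with `n = 10` and `p = 0.5` chosen purely for illustration.

```python
from scipy import stats

n, p = 10, 0.5  # illustrative values, not taken from the course

# P(X = 5): mass at exactly 5 successes (252/1024)
p_exactly_5 = float(stats.binom.pmf(5, n, p))
# P(X <= 3): cumulative probability (176/1024)
p_at_most_3 = float(stats.binom.cdf(3, n, p))
# P(X >= 8) = P(X > 7), via the survival function (56/1024)
p_at_least_8 = float(stats.binom.sf(7, n, p))

# Mean and variance; theory gives n*p and n*p*(1-p)
mean, var = (float(m) for m in stats.binom.stats(n, p, moments="mv"))

print(p_exactly_5, p_at_most_3, p_at_least_8)
print(f"mean = {mean}, variance = {var}")
```

Note that `sf(k)` returns P(X > k), so "at least 8" is `sf(7)`, not `sf(8)`.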

## Week 3: Continuous Probability Distributions

### Key Concepts
- 

### Important Distributions
- Normal(μ, σ): symmetric bell-shaped density with mean μ and standard deviation σ

### Code Examples

In [None]:
def plot_normal_distribution(mu: float, sigma: float):
    """Plot the probability density function of a Normal(mu, sigma) distribution"""
    x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
    pdf = stats.norm.pdf(x, mu, sigma)
    
    plt.figure(figsize=(10, 6))
    plt.plot(x, pdf)
    plt.title(f'Normal Distribution (μ={mu}, σ={sigma})')
    plt.xlabel('Value')
    plt.ylabel('Probability Density')
    plt.show()
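
For a continuous distribution, the height of the density is not a probability; probabilities are areas under the curve, computed as differences of the CDF. The sketch below illustrates this for a normal distribution. The parameters and the helper name `normal_interval_probability` are assumptions made for the example.

```python
from scipy import stats

mu, sigma = 100.0, 15.0  # illustrative parameters

def normal_interval_probability(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ Normal(mu, sigma), computed as CDF(b) - CDF(a)."""
    return float(stats.norm.cdf(b, mu, sigma) - stats.norm.cdf(a, mu, sigma))

# Probability of falling within 1 and 2 standard deviations of the mean
within_1sd = normal_interval_probability(mu - sigma, mu + sigma, mu, sigma)
within_2sd = normal_interval_probability(mu - 2 * sigma, mu + 2 * sigma, mu, sigma)

# Inverse CDF: the value below which 97.5% of the distribution lies
q975 = float(stats.norm.ppf(0.975, mu, sigma))

print(f"within 1 sd: {within_1sd:.4f}, within 2 sd: {within_2sd:.4f}")
print(f"97.5th percentile: {q975:.2f}")
```

The two interval probabilities come out near 0.68 and 0.95, which is the familiar empirical rule.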

## Week 4: Joint Probability and Independence

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
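from typing import Optional

def calculate_joint_probability(prob_a: float, prob_b: float, independent: bool = True,
                                prob_b_given_a: Optional[float] = None) -> float:
    """Calculate P(A and B); dependent events also need P(B|A) via prob_b_given_a"""
    if independent:
        return prob_a * prob_b
    if prob_b_given_a is None:
        raise ValueError("dependent events require prob_b_given_a, i.e. P(B|A)")
    # Multiplication rule for dependent events: P(A and B) = P(A) * P(B|A)
    return prob_a * prob_b_given_a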
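
Independence can also be checked directly from a joint probability table: two discrete variables are independent exactly when every joint entry equals the product of its marginals. The sketch below uses a made-up 2x2 table, and `is_independent` is a hypothetical helper.

```python
import numpy as np

# Hypothetical joint distribution of X (rows) and Y (columns); entries sum to 1
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)  # marginal of X: sum across each row
p_y = joint.sum(axis=0)  # marginal of Y: sum down each column

def is_independent(joint: np.ndarray, tol: float = 1e-9) -> bool:
    """True if every joint entry matches the product of its marginals (within tol)."""
    marginal_x = joint.sum(axis=1)
    marginal_y = joint.sum(axis=0)
    return bool(np.allclose(joint, np.outer(marginal_x, marginal_y), atol=tol))

dependent_result = is_independent(joint)
# A table built as an outer product of marginals is independent by construction
independent_result = is_independent(np.outer([0.3, 0.7], [0.4, 0.6]))

print(p_x, p_y)
print(dependent_result, independent_result)
```

In the first table P(X=0, Y=0) is 0.10 while the product of marginals is 0.3 * 0.4 = 0.12, so the variables are dependent.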

## Week 5: Conditional Probability

### Key Concepts
- 

### Important Formulas
- Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)

### Code Examples

In [None]:
def bayes_theorem(prior: float, likelihood: float, evidence: float) -> float:
    """Calculate the posterior P(H|E) = P(E|H) * P(H) / P(E); evidence is P(E) and must be positive"""
    return (likelihood * prior) / evidence
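
In practice the evidence term is rarely given directly; it is usually built from the law of total probability. The sketch below works through a diagnostic-test example under assumed numbers (1% prevalence, 95% sensitivity, 90% specificity). The function name `posterior_from_test` is hypothetical.

```python
def posterior_from_test(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' theorem (hypothetical helper)."""
    false_positive_rate = 1.0 - specificity
    # Law of total probability:
    # P(+) = P(+ | disease) P(disease) + P(+ | no disease) P(no disease)
    evidence = sensitivity * prevalence + false_positive_rate * (1.0 - prevalence)
    return sensitivity * prevalence / evidence

# Assumed, illustrative inputs
posterior = posterior_from_test(prevalence=0.01, sensitivity=0.95, specificity=0.90)
print(f"P(disease | positive) = {posterior:.4f}")
```

Even with a fairly accurate test, the posterior stays below 10% here because the condition is rare, so false positives outnumber true positives.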

## Week 6: Random Variables

### Key Concepts
- 

### Important Properties
- 

### Code Examples

In [None]:
def calculate_expectation(values: np.ndarray, probabilities: np.ndarray) -> float:
    """Calculate expected value of a random variable"""
    return np.sum(values * probabilities)

def calculate_variance(values: np.ndarray, probabilities: np.ndarray) -> float:
    """Calculate variance of a random variable"""
    exp = calculate_expectation(values, probabilities)
    return calculate_expectation((values - exp)**2, probabilities)
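
As a worked check of these definitions, the sketch below computes the mean and variance of a fair six-sided die, then verifies the shortcut Var(X) = E[X²] - (E[X])² and the rules for a linear transformation Y = aX + b. The die and the constants `a`, `b` are illustrative choices.

```python
import numpy as np

values = np.arange(1, 7)            # faces 1..6
probabilities = np.full(6, 1 / 6)   # each face equally likely

mean = np.sum(values * probabilities)                     # E[X] = 3.5
variance = np.sum((values - mean) ** 2 * probabilities)   # Var(X) = 35/12

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
variance_shortcut = np.sum(values ** 2 * probabilities) - mean ** 2

# Linear transformation Y = aX + b: E[Y] = a E[X] + b, Var(Y) = a^2 Var(X)
a, b = 2.0, 1.0
y = a * values + b
mean_y = np.sum(y * probabilities)
var_y = np.sum((y - mean_y) ** 2 * probabilities)

print(mean, variance, variance_shortcut)
print(mean_y, var_y)
```

The shift `b` changes the mean but not the variance, while the scale `a` enters the variance squared.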

## Week 7: Law of Large Numbers and CLT

### Key Concepts
- 

### Important Theorems
- Law of Large Numbers
- Central Limit Theorem (CLT)

### Code Examples

In [None]:
from typing import List  # needed for the List[int] annotation below

def demonstrate_law_of_large_numbers(n_samples: List[int], true_mean: float):
    """Demonstrate the law of large numbers"""
    sample_means = [np.mean(np.random.normal(true_mean, 1, n)) for n in n_samples]
    
    plt.figure(figsize=(10, 6))
    plt.plot(n_samples, sample_means)
    plt.axhline(y=true_mean, color='r', linestyle='--')
    plt.xscale('log')
    plt.title('Law of Large Numbers Demonstration')
    plt.xlabel('Sample Size')
    plt.ylabel('Sample Mean')
    plt.show()
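
The Central Limit Theorem can be checked numerically in a similar spirit. The sketch below draws many samples from a skewed Exponential(1) distribution and compares the spread of their means with the CLT prediction σ/√n. The distribution, sample sizes, and the helper name `sample_means` are assumptions chosen for the illustration.

```python
import numpy as np

def sample_means(n: int, n_repeats: int, seed: int = 0) -> np.ndarray:
    """Means of n_repeats independent samples of size n from Exponential(scale=1)."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=1.0, size=(n_repeats, n)).mean(axis=1)

n = 50
means = sample_means(n=n, n_repeats=20_000)

# Exponential(1) has mean 1 and standard deviation 1, so the CLT predicts
# sample means approximately Normal(1, 1/sqrt(n))
predicted_se = 1.0 / np.sqrt(n)
observed_se = means.std(ddof=1)

# Fraction of sample means within 1.96 standard errors of the true mean;
# this should be near 0.95 if the normal approximation is reasonable
coverage = float(np.mean(np.abs(means - 1.0) <= 1.96 * predicted_se))

print(f"mean of means = {means.mean():.4f}")
print(f"observed SE = {observed_se:.4f}, predicted SE = {predicted_se:.4f}")
print(f"coverage = {coverage:.3f}")
```

A histogram of `means` would look roughly bell-shaped even though the underlying exponential distribution is strongly skewed.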

## Week 8: Applications in Data Science

### Key Concepts
- 

### Important Applications
- 

### Code Examples

In [None]:
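from typing import Callable, Tuple

def bootstrap_confidence_interval(data: np.ndarray, statistic: Callable[[np.ndarray], float],
                                  n_bootstrap: int = 1000, confidence: float = 0.95) -> Tuple[float, float]:
    """Calculate a percentile bootstrap confidence interval for statistic(data)"""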
    bootstrap_stats = [statistic(np.random.choice(data, size=len(data), replace=True)) 
                      for _ in range(n_bootstrap)]
    
    lower = np.percentile(bootstrap_stats, (1 - confidence) * 100 / 2)
    upper = np.percentile(bootstrap_stats, (1 + confidence) * 100 / 2)
    
    return lower, upper
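
A usage sketch for the percentile bootstrap follows. It is self-contained, so it defines its own variant, `bootstrap_ci`, which takes a seed and uses NumPy's `Generator` API for reproducibility; that variant and the synthetic data are assumptions for illustration rather than part of the course code.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_bootstrap=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for statistic(data) (hypothetical variant)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Each row holds n indices drawn with replacement: one bootstrap resample
    idx = rng.integers(0, n, size=(n_bootstrap, n))
    boot_stats = np.array([statistic(data[row]) for row in idx])
    alpha = 1.0 - confidence
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)

# Synthetic data: 200 draws from Normal(5, 2)
data = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)

lower, upper = bootstrap_ci(data, np.mean)
print(f"sample mean = {data.mean():.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")

# The same function works for statistics without a simple standard-error formula
median_lower, median_upper = bootstrap_ci(data, np.median)
print(f"sample median = {np.median(data):.3f}, 95% CI = ({median_lower:.3f}, {median_upper:.3f})")
```

For the mean, the interval width should be comparable to the normal-theory value 2 × 1.96 × s/√n, which gives a quick sanity check on the resampling.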

## Additional Resources and References

### Useful Libraries
- NumPy: Numerical computations
- SciPy: Statistical functions
- Matplotlib: Visualization
- Seaborn: Statistical visualization

### External Links
- Course materials
- Probability theory resources
- Practice problems

### Personal Notes
- Key formulas
- Important theorems
- Common distributions