# Section 1: Introduction and Fundamentals

#### PyData London 2025 - Bayesian Time Series Analysis with PyMC

---

## Motivation for Bayesian Time Series Analysis

Traditional time series analysis has served us well for decades, but it suffers from a fundamental limitation: **uncertainty is often treated as an afterthought**. When we build an ARIMA model or apply exponential smoothing, we typically get point forecasts accompanied by intervals that rest on strong distributional assumptions (for example, Gaussian errors) and that may not reflect the true uncertainty in our predictions.

**Bayesian time series analysis fundamentally changes this perspective** by treating uncertainty as a first-class citizen throughout the entire modeling process. Instead of obtaining single-valued predictions, we get complete **probability distributions** over all possible future values, conditioned on our data and our modeling assumptions.

### Handling Uncertainty

Consider predicting next month's sales. A classical approach might tell us "we expect 1,000 units ± 100." But what does this really mean? Are we 90% confident? 95%? What's the shape of this uncertainty—is it symmetric or skewed?

A Bayesian approach instead tells us: "There's a 20% chance sales will be below 950 units, a 50% chance they'll be between 950 and 1,050, and a 30% chance they'll exceed 1,050." This **probabilistic language** is much more natural for decision-making under uncertainty.
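
As a rough illustration of where such statements come from, the sketch below reads them off a set of simulated draws. The `sales_draws` array is a hypothetical stand-in for posterior predictive samples (generated here from a normal distribution purely for illustration), so the printed probabilities are not expected to match the 20/50/30 split quoted above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for posterior predictive draws of next month's sales.
# In a real analysis these would come from a fitted model, not from rng.normal.
sales_draws = rng.normal(loc=1000.0, scale=60.0, size=10_000)

# Probabilities are just the fraction of draws falling in each region
p_below = np.mean(sales_draws < 950)
p_between = np.mean((sales_draws >= 950) & (sales_draws <= 1050))
p_above = np.mean(sales_draws > 1050)

# A 90% central interval is read directly from the quantiles of the draws
lower, upper = np.quantile(sales_draws, [0.05, 0.95])

print(f"P(sales < 950)          = {p_below:.2f}")
print(f"P(950 <= sales <= 1050) = {p_between:.2f}")
print(f"P(sales > 1050)         = {p_above:.2f}")
print(f"90% interval: [{lower:.0f}, {upper:.0f}]")
```

Because every statement is a summary of the same set of draws, asymmetric or skewed uncertainty needs no special handling: the quantiles and region probabilities simply reflect whatever shape the draws have.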

### Incorporating Prior Knowledge

Time series data often come with rich domain knowledge. We might know that:
- Sales are typically higher in December due to holiday shopping
- Stock volatility tends to cluster (high volatility periods are followed by more high volatility)
- Economic indicators don't change dramatically from day to day

**Bayesian methods provide a principled way to incorporate this knowledge** through prior distributions, rather than treating each dataset as if we know nothing about the world.
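
To make "incorporating knowledge through a prior" concrete, here is a minimal sketch using the textbook conjugate normal-normal update with known observation noise. All numbers (`prior_mean`, `prior_sd`, `obs_sd`, and the `december_uplifts` array) are hypothetical values chosen for illustration, not estimates from any dataset in this tutorial.

```python
import numpy as np

# Hypothetical prior belief about the December sales uplift (standardized units)
prior_mean, prior_sd = 0.5, 0.25

# Hypothetical observed December uplifts, with an assumed known noise sd
obs_sd = 0.5
december_uplifts = np.array([0.9, 0.4, 0.7, 1.1, 0.6])
n = december_uplifts.size

# Conjugate normal-normal update: precisions add, and the posterior mean
# is the precision-weighted average of the prior mean and the data mean.
prior_prec = 1.0 / prior_sd**2
data_prec = n / obs_sd**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * december_uplifts.mean()) / post_prec
post_sd = post_prec**-0.5

print(f"Prior:       mean={prior_mean:.3f}, sd={prior_sd:.3f}")
print(f"Data mean:   {december_uplifts.mean():.3f} (n={n})")
print(f"Posterior:   mean={post_mean:.3f}, sd={post_sd:.3f}")
```

The posterior mean lands between the prior mean and the sample mean, and the posterior standard deviation is smaller than the prior's: the prior contributes information, and the data refine it.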

### Flexibility for Complex Patterns

Bayesian approaches allow us to naturally handle:
- **Missing data**: Missing observations can be treated as unknown quantities and inferred alongside the model parameters, rather than imputed or interpolated in a separate step
- **Irregular spacing**: No need to artificially regularize timestamps
- **Multiple seasonality**: Hierarchical models naturally accommodate daily, weekly, and yearly patterns
- **Changing relationships**: Time-varying parameters are natural in the Bayesian framework

In [1]:
# Import necessary libraries for Section 1
import numpy as np
import polars as pl
import matplotlib.pyplot as plt
import pymc as pm
import arviz as az
import warnings

# Configure plotting
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)
RANDOM_SEED = 42

print("📊 Libraries loaded successfully!")
print("PyMC version:", pm.__version__)
print("ArviZ version:", az.__version__)
print("NumPy version:", np.__version__)
print("Polars version:", pl.__version__)

📊 Libraries loaded successfully!
PyMC version: 5.22.0
ArviZ version: 0.21.0
NumPy version: 2.2.5
Polars version: 1.27.1


## Key Characteristics of Time Series Data

Before diving into Bayesian methods, let's establish a solid foundation in time series fundamentals. Understanding these characteristics is crucial for building appropriate models.

### Trend, Seasonality, and Noise Components

Most time series can be decomposed into several interpretable components:

1. **Trend ($T_t$)**: The long-term direction of the data
2. **Seasonality ($S_t$)**: Regular, predictable patterns that repeat over fixed periods
3. **Cyclical patterns ($C_t$)**: Longer-term fluctuations without fixed periods
4. **Irregular/Noise ($\epsilon_t$)**: Random fluctuations that cannot be explained by the other components

We can write this **decomposition** as either:

**Additive**: $y_t = T_t + S_t + C_t + \epsilon_t$

**Multiplicative**: $y_t = T_t \times S_t \times C_t \times \epsilon_t$
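
The two forms can be sketched on synthetic data as follows. The component shapes and magnitudes are hypothetical, and the cyclical term $C_t$ is omitted for brevity. The sketch also checks that taking logs turns the multiplicative form into an additive one.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months = 120
t = np.arange(n_months)

# Hypothetical components (cyclical term C_t omitted for brevity)
trend = 100.0 + 0.5 * t
seasonal = 10.0 * np.sin(2 * np.pi * t / 12)
noise = rng.normal(0.0, 2.0, size=n_months)

# Additive: the seasonal swing has the same size at every trend level
y_additive = trend + seasonal + noise

# Multiplicative: seasonality and noise act as factors around 1,
# so the seasonal swing grows with the trend
seasonal_factor = 1.0 + 0.1 * np.sin(2 * np.pi * t / 12)
noise_factor = np.exp(rng.normal(0.0, 0.02, size=n_months))
y_multiplicative = trend * seasonal_factor * noise_factor

# Compare the noise-free seasonal swing in the first and last year
add_signal = trend + seasonal
mult_signal = trend * seasonal_factor
add_swing = (np.ptp(add_signal[:12]), np.ptp(add_signal[-12:]))
mult_swing = (np.ptp(mult_signal[:12]), np.ptp(mult_signal[-12:]))
print(f"Additive swing (first, last year):       {add_swing[0]:.1f}, {add_swing[1]:.1f}")
print(f"Multiplicative swing (first, last year): {mult_swing[0]:.1f}, {mult_swing[1]:.1f}")

# Taking logs converts the multiplicative form into an additive one
log_sum = np.log(trend) + np.log(seasonal_factor) + np.log(noise_factor)
logs_match = np.allclose(np.log(y_multiplicative), log_sum)
print("log(y) equals the sum of log components:", logs_match)
```

This is one reason a log transformation is often applied before fitting an additive model to data whose seasonal amplitude grows with its level.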

### Stationarity and Autocorrelation

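The defining characteristic of time series data is **temporal dependence**: observations that are close in time are typically more similar than observations that are far apart. This violates the independence assumption that underlies many standard statistical methods, so specialized approaches are needed. A series is called (weakly) **stationary** when its mean, variance, and autocovariance structure do not change over time; many classical models assume this, often after a transformation such as differencing.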

**Mathematically**, we can express this as:

$$\text{Corr}(y_t, y_{t+h}) \neq 0$$

for some lag $h > 0$. This **autocorrelation** is what makes time series prediction possible: if future values were completely unrelated to past values, the history of the series would tell us nothing beyond its long-run average behavior.
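
A minimal sketch of estimating this quantity from data is shown below. The helper `sample_acf` is a hypothetical name written for this example (libraries such as statsmodels provide their own versions), and the AR(1) coefficient `phi = 0.8` is an arbitrary illustrative choice.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    denom = np.sum(centered**2)
    return np.array([
        np.sum(centered[h:] * centered[:len(y) - h]) / denom
        for h in range(max_lag + 1)
    ])

rng = np.random.default_rng(42)
n = 2000

# White noise: no temporal dependence, so autocorrelation should be near 0
white_noise = rng.normal(size=n)

# AR(1) process y_t = phi * y_{t-1} + e_t, with a hypothetical phi = 0.8
phi = 0.8
shocks = rng.normal(size=n)
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = phi * ar1[t - 1] + shocks[t]

acf_noise = sample_acf(white_noise, max_lag=5)
acf_ar1 = sample_acf(ar1, max_lag=5)

print("White noise ACF:", np.round(acf_noise, 2))
print("AR(1) ACF:      ", np.round(acf_ar1, 2))
```

For the AR(1) series the estimates should decay roughly like $\phi^h$, while for white noise they should hover near zero at every lag above 0.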

In [2]:
# Load and explore the births dataset - a classic time series example
# Handle null values in the data
births_data = pl.read_csv('../data/births.csv', null_values=['null', 'NA', '', 'NULL'])

# Filter out rows with null days if any exist
births_data = births_data.filter(pl.col('day').is_not_null())

# Aggregate to monthly data
monthly_births = (births_data
    .group_by(['year', 'month'])
    .agg(pl.col('births').sum())
    .sort(['year', 'month'])
)

# Create valid dates using the first day of each month
monthly_births = monthly_births.with_columns([
    pl.date(pl.col('year'), pl.col('month'), 1).alias('date')
])

# Focus on 1970 onward for clarity (the 1990 upper bound is not binding: the data end in 1988)
births_subset = (monthly_births
    .filter((pl.col('year') >= 1970) & (pl.col('year') <= 1990))
    .with_row_index('index')
)

print(f"📈 Births Dataset Overview:")
print(f"   • Total months: {births_subset.height}")
print(f"   • Date range: {births_subset['year'].min()} to {births_subset['year'].max()}")
print(f"   • Monthly births range: {births_subset['births'].min():,} to {births_subset['births'].max():,}")
print(f"   • Average monthly births: {births_subset['births'].mean():.0f}")
print(f"   • Standard deviation: {births_subset['births'].std():.0f}")

# Display the first few observations
print(f"\n📋 First few observations:")
print(births_subset.select(['year', 'month', 'births']).head(10))

📈 Births Dataset Overview:
   • Total months: 228
   • Date range: 1970 to 1988
   • Monthly births range: 237,302 to 354,599
   • Average monthly births: 293389
   • Standard deviation: 25082

📋 First few observations:
shape: (10, 3)
┌──────┬───────┬────────┐
│ year ┆ month ┆ births │
│ ---  ┆ ---   ┆ ---    │
│ i64  ┆ i64   ┆ i64    │
╞══════╪═══════╪════════╡
│ 1970 ┆ 1     ┆ 302278 │
│ 1970 ┆ 2     ┆ 281488 │
│ 1970 ┆ 3     ┆ 307448 │
│ 1970 ┆ 4     ┆ 287090 │
│ 1970 ┆ 5     ┆ 298140 │
│ 1970 ┆ 6     ┆ 303378 │
│ 1970 ┆ 7     ┆ 330452 │
│ 1970 ┆ 8     ┆ 331326 │
│ 1970 ┆ 9     ┆ 332496 │
│ 1970 ┆ 10    ┆ 324422 │
└──────┴───────┴────────┘


## Data Preprocessing Techniques

Before building Bayesian models, proper data preprocessing is essential. This section demonstrates key preprocessing techniques that prepare time series data for effective modeling.

### Why Preprocessing Matters

Time series preprocessing serves several critical purposes:

1. **Numerical Stability**: Standardization helps MCMC samplers converge more reliably
2. **Prior Specification**: Normalized data makes it easier to specify reasonable priors
3. **Model Interpretation**: Standardized coefficients are easier to interpret and compare
4. **Computational Efficiency**: Well-scaled data leads to faster sampling

### Common Preprocessing Techniques

1. **Standardization**: $(x - \mu) / \sigma$ - Centers data at 0 with unit variance
2. **Min-Max Normalization**: $(x - \min) / (\max - \min)$ - Scales to [0,1] range
3. **Log Transformation**: $\log(x)$ - Stabilizes variance and turns exponential growth into a linear trend (requires $x > 0$)
4. **Differencing**: $x_t - x_{t-1}$ - Removes a linear trend and often helps induce stationarity
5. **Seasonal Decomposition**: Separates trend, seasonal, and residual components
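
Differencing from the list above can be sketched on its own with NumPy. The `series` below is a hypothetical noise-free monthly series (linear trend plus a fixed 12-month pattern), chosen so that the effect of each operation is easy to verify.

```python
import numpy as np

t = np.arange(48)

# Hypothetical monthly series: linear trend plus a fixed 12-month pattern
series = 200.0 + 3.0 * t + 15.0 * np.sin(2 * np.pi * t / 12)

# First difference x_t - x_{t-1}: removes the level, leaving the slope
# plus the month-to-month seasonal changes
first_diff = np.diff(series)

# Seasonal difference x_t - x_{t-12}: the 12-month pattern cancels,
# leaving only 12 months' worth of trend (3.0 * 12 = 36)
seasonal_diff = series[12:] - series[:-12]

# Differencing is invertible if the starting value is kept
reconstructed = series[0] + np.cumsum(first_diff)

print(f"Lengths: original={series.size}, first diff={first_diff.size}, "
      f"seasonal diff={seasonal_diff.size}")
print(f"Mean first difference: {first_diff.mean():.2f}")
print(f"Seasonal difference is constant at 36: {np.allclose(seasonal_diff, 36.0)}")
```

Each differencing step shortens the series (by 1 observation for a first difference and by 12 for a 12-month seasonal difference), which is worth keeping in mind when aligning the result with dates.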

In [3]:
# Demonstrate different preprocessing transformations
original_data = births_subset['births'].to_numpy()

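# 1. Standardization (most common for Bayesian modeling)
# Note: NumPy's .std() defaults to ddof=0 (population formula), so it differs
# slightly from the Polars .std() reported earlier, which defaults to ddof=1.
standardized = (original_data - original_data.mean()) / original_data.std()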

# 2. Min-Max normalization  
min_max_norm = (original_data - original_data.min()) / (original_data.max() - original_data.min())

# 3. Log transformation
log_transform = np.log(original_data)

print("📊 Preprocessing Results:")
print(f"   • Original data: mean={original_data.mean():.0f}, std={original_data.std():.0f}")
print(f"   • Standardized: mean={standardized.mean():.3f}, std={standardized.std():.3f}")
print(f"   • Min-Max norm: min={min_max_norm.min():.3f}, max={min_max_norm.max():.3f}")
print(f"   • Log transform: mean={log_transform.mean():.3f}, std={log_transform.std():.3f}")

# Choose standardized data for modeling (most common choice)
births_standardized = standardized
print(f"\n✅ **Selected preprocessing**: Standardized data (mean=0, std=1)")
print(f"   This choice provides:")
print(f"   • Numerical stability for MCMC sampling")
print(f"   • Easy interpretation of parameters")
print(f"   • Natural scale for prior specification")

📊 Preprocessing Results:
   • Original data: mean=293389, std=25027
   • Standardized: mean=-0.000, std=1.000
   • Min-Max norm: min=0.000, max=1.000
   • Log transform: mean=12.586, std=0.086

✅ **Selected preprocessing**: Standardized data (mean=0, std=1)
   This choice provides:
   • Numerical stability for MCMC sampling
   • Easy interpretation of parameters
   • Natural scale for prior specification
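
Since models fitted to standardized data produce results in standardized units, it helps to have the inverse transformation at hand. The sketch below hard-codes the rounded mean and standard deviation printed in the output above; the helper names `to_standardized` and `to_original_scale` are hypothetical and introduced only for this example.

```python
import numpy as np

# Rounded summary statistics copied from the preprocessing output above
births_mean = 293_389.0
births_std = 25_027.0

def to_standardized(x):
    """Map monthly births to standardized units."""
    return (np.asarray(x, dtype=float) - births_mean) / births_std

def to_original_scale(z):
    """Map standardized values back to monthly births."""
    return np.asarray(z, dtype=float) * births_std + births_mean

# What does a range of roughly +/- 2 standardized units mean in births?
z_values = np.array([-2.0, 0.0, 2.0])
births_values = to_original_scale(z_values)
for z, b in zip(z_values, births_values):
    print(f"z = {z:+.1f}  ->  {b:,.0f} births per month")

# Round trip: standardizing and back-transforming recovers the input
round_trip_ok = np.allclose(to_original_scale(to_standardized(births_values)), births_values)
print("Round trip recovers the original values:", round_trip_ok)
```

Translating a few standardized values back to births in this way is also a quick sanity check on priors: a prior that looks harmless in standardized units should still imply plausible monthly birth counts.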


## Summary

In this section, we've covered:

1. **Motivation for Bayesian time series analysis**: Uncertainty quantification, prior knowledge incorporation, and flexibility
2. **Key time series characteristics**: Trend, seasonality, autocorrelation, and stationarity
3. **Essential preprocessing techniques**: Standardization, normalization, and transformations

**Next**: In Section 2, we'll dive into the fundamentals of Bayesian inference and learn how to use PyMC for time series modeling.

---

**Key Takeaways**:
- Bayesian methods treat uncertainty as a first-class citizen
- Time series data have unique characteristics that require specialized modeling approaches
- Proper preprocessing is essential for successful Bayesian time series modeling
- Standardization is a sensible default preprocessing choice for many Bayesian models