[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/EN/Seminar_Notebooks/chapter0_seminar_notebook.ipynb)

---

# Chapter 0: Seminar - Exercises and Practice

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Seminar Objectives

In this seminar, you will:
1. Practice calculating exponential smoothing forecasts by hand
2. Apply decomposition methods to real data
3. Evaluate forecast accuracy with different metrics
4. Compare exponential smoothing methods
5. Understand trend and seasonality handling

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

import yfinance as yf
from statsmodels.tsa.seasonal import seasonal_decompose, STL
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing
from statsmodels.tsa.filters.hp_filter import hpfilter
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Plotting style - clean, professional
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['savefig.facecolor'] = 'none'
plt.rcParams['savefig.transparent'] = True
plt.rcParams['legend.frameon'] = False
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False

# Colors
BLUE = '#1A3A6E'
RED = '#DC3545'
GREEN = '#2E7D32'
ORANGE = '#E67E22'

print("Setup complete!")

---
# Part 1: Multiple Choice Quiz

Answer the following questions. Run the cell after each answer to check if you're correct.

### Quiz 1: Time Series Basics

**Question:** Which of the following is NOT a characteristic of time series data?

- A) Observations are ordered in time
- B) Consecutive observations are typically correlated
- C) Observations are independent and identically distributed
- D) The data has a natural temporal ordering

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz1_answer = ''  # <-- Enter your answer here

# Check answer
if quiz1_answer.upper() == 'C':
    print("CORRECT! Time series observations are typically DEPENDENT (autocorrelated), not i.i.d.")
    print("This temporal dependence is what makes time series analysis unique.")
elif quiz1_answer:
    print("Incorrect. Try again!")
    print("Hint: What assumption is violated in time series that holds in cross-sectional data?")

### Quiz 2: Decomposition

**Question:** When should you use multiplicative decomposition instead of additive?

- A) When the seasonal pattern has constant amplitude
- B) When the variance of the series is stable over time
- C) When the seasonal fluctuations grow proportionally with the level
- D) When the time series has no trend component

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz2_answer = ''  # <-- Enter your answer here

# Check answer
if quiz2_answer.upper() == 'C':
    print("CORRECT! In multiplicative decomposition X = T * S * e,")
    print("the seasonal component S is a ratio, so the absolute effect scales with the level.")
    print("Use when you see 'fan-shaped' patterns where variance increases with mean.")
elif quiz2_answer:
    print("Incorrect. Try again!")
    print("Hint: Think about what happens to seasonal peaks as the series level increases.")

### Quiz 3: Exponential Smoothing

**Question:** In Simple Exponential Smoothing with $\alpha = 0.9$, what happens?

- A) Forecasts are very smooth and stable
- B) Recent observations have very little weight
- C) Forecasts react quickly to recent changes
- D) The forecast is essentially a long-term average

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz3_answer = ''  # <-- Enter your answer here

# Check answer
if quiz3_answer.upper() == 'C':
    print("CORRECT! With alpha = 0.9: forecast = 0.9 * X_t + 0.1 * previous_forecast")
    print("This means 90% weight on the most recent observation!")
    print("High alpha = reactive. Low alpha = smooth.")
elif quiz3_answer:
    print("Incorrect. Try again!")
    print("Hint: What does a high alpha mean for the weight on the most recent observation?")

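The 90% figure in the check above comes from unrolling the SES recursion: the observation $k$ periods back receives weight $\alpha(1-\alpha)^k$. Here is a minimal sketch of those weights; the helper name `ses_weights` is made up for this note and is not a library function.

```python
def ses_weights(alpha, n_lags):
    """Weight SES places on the observation k periods back, for k = 0..n_lags-1."""
    return [alpha * (1 - alpha) ** k for k in range(n_lags)]

for alpha in (0.1, 0.9):
    weights = ses_weights(alpha, 5)
    print(f"alpha={alpha}: " + ", ".join(f"{w:.4f}" for w in weights))
```

With $\alpha = 0.9$ the weights collapse almost immediately (0.9, 0.09, 0.009, ...), while with $\alpha = 0.1$ they decay slowly, which is why the forecast behaves like a long-run average.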
### Quiz 4: Holt-Winters Parameters

**Question:** In Holt-Winters exponential smoothing, what does the gamma ($\gamma$) parameter control?

- A) The level smoothing
- B) The trend smoothing
- C) The seasonal smoothing
- D) The error variance

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz4_answer = ''  # <-- Enter your answer here

# Check answer
if quiz4_answer.upper() == 'C':
    print("CORRECT! In Holt-Winters: alpha controls level, beta controls trend, gamma controls seasonality.")
    print("A high gamma means the seasonal pattern adapts quickly to recent seasonal changes.")
    print("A low gamma means the seasonal pattern is more stable over time.")
elif quiz4_answer:
    print("Incorrect. Try again!")
    print("Hint: Holt-Winters has three parameters: alpha (level), beta (trend), and gamma (?)")

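To see what $\gamma$ does in isolation, a single update step of additive Holt-Winters can be written out by hand. This is a sketch assuming the standard textbook component form; the function name and the numbers are illustrative only, and a library implementation may parameterize the model differently.

```python
def holt_winters_additive_step(y, level, trend, seasonal_lag, alpha, beta, gamma):
    """One update of additive Holt-Winters (textbook component form).

    seasonal_lag is the seasonal term from one full season ago.
    Returns the updated (level, trend, seasonal) triple.
    """
    new_level = alpha * (y - seasonal_lag) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_seasonal = gamma * (y - level - trend) + (1 - gamma) * seasonal_lag
    return new_level, new_trend, new_seasonal

# Illustrative numbers (not taken from the seminar data)
state = dict(y=115.0, level=100.0, trend=2.0, seasonal_lag=10.0, alpha=0.5, beta=0.5)
for gamma in (0.0, 0.5, 1.0):
    _, _, seasonal = holt_winters_additive_step(gamma=gamma, **state)
    print(f"gamma={gamma}: updated seasonal term = {seasonal:.1f}")
```

With `gamma=0.0` the seasonal term stays at its old value of 10; with `gamma=1.0` it is replaced entirely by the newly observed seasonal deviation.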
### Quiz 5: Forecast Accuracy Metrics (RMSE vs MAE)

**Question:** When comparing RMSE and MAE for forecast evaluation, which statement is correct?

- A) RMSE is always smaller than MAE
- B) RMSE penalizes large errors more heavily than MAE
- C) MAE is more sensitive to outliers than RMSE
- D) RMSE and MAE always give the same ranking of models

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz5_answer = ''  # <-- Enter your answer here

# Check answer
if quiz5_answer.upper() == 'B':
    print("CORRECT! RMSE squares the errors before averaging, so large errors are penalized more.")
    print("RMSE >= MAE always (equality only when all absolute errors are equal).")
    print("Use RMSE when large errors are particularly undesirable.")
elif quiz5_answer:
    print("Incorrect. Try again!")
    print("Hint: What happens when you square a large error vs a small error?")

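A small numeric illustration of the point above, using two made-up error vectors with the same MAE. The helper functions are written out by hand so the arithmetic is visible.

```python
import math

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

equal_errors = [5, -5, 5, -5]       # all absolute errors are equal
one_large_miss = [1, -1, 1, -17]    # same total absolute error, one large miss

for name, errs in [("equal errors", equal_errors), ("one large miss", one_large_miss)]:
    print(f"{name:>14}: MAE={mae(errs):.2f}  RMSE={rmse(errs):.2f}")
```

Both vectors have MAE = 5, but only the second has RMSE above 5, because squaring gives the single large miss most of the weight.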
### Quiz 6: Cross-validation for Time Series

**Question:** Why can't we use standard k-fold cross-validation for time series data?

- A) Time series data is too large
- B) It violates the temporal ordering and causes data leakage
- C) Cross-validation only works for classification
- D) Time series data has no variance

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz6_answer = ''  # <-- Enter your answer here

# Check answer
if quiz6_answer.upper() == 'B':
    print("CORRECT! Standard k-fold CV ignores temporal ordering: most folds are trained on observations that come after the test fold.")
    print("Using future data to predict the past causes data leakage and overly optimistic results.")
    print("Use time series CV: expanding window or rolling window validation instead.")
elif quiz6_answer:
    print("Incorrect. Try again!")
    print("Hint: What happens when you shuffle time series data and use future values to predict past?")

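The expanding-window scheme mentioned in the check can be sketched in a few lines. The generator below is a hand-rolled illustration (its name and arguments are assumptions made for this note, not a library API); it only produces index lists so that the ordering is easy to inspect.

```python
def expanding_window_splits(n_obs, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs.

    The training window always starts at 0 and grows by `horizon`
    each round, so training data always precedes the test data.
    """
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield list(range(train_end)), list(range(train_end, train_end + horizon))
        train_end += horizon

for train_idx, test_idx in expanding_window_splits(n_obs=12, initial_train=6, horizon=2):
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx}")
```

Every test index is larger than every training index in the same split, which is exactly the property that shuffled or ordinary k-fold splits do not guarantee.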
### Quiz 7: Variance Stabilization

**Question:** A time series shows increasing variance as the level increases (heteroscedasticity). Which transformation is most appropriate?

- A) First differencing
- B) Log transformation
- C) Adding a constant
- D) Seasonal differencing

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz7_answer = ''  # <-- Enter your answer here

# Check answer
if quiz7_answer.upper() == 'B':
    print("CORRECT! Log transformation stabilizes variance when it increases with the level.")
    print("If Var(X) is proportional to E[X]^2, then Var(log X) becomes approximately constant.")
    print("This is common in financial and economic data.")
elif quiz7_answer:
    print("Incorrect. Try again!")
    print("Hint: Which transformation converts multiplicative relationships to additive?")

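A minimal sketch of why the log works, using made-up numbers: a fixed multiplicative seasonal pattern applied to a level that doubles each year. On the original scale the seasonal swing doubles with the level; on the log scale it is the same every year.

```python
import numpy as np

season = np.array([0.8, 1.0, 1.2, 1.0])     # multiplicative seasonal factors (illustrative)
levels = np.array([100.0, 200.0, 400.0])    # one level per year, doubling each year
x = np.outer(levels, season)                # rows = years, columns = quarters

swing_raw = x.max(axis=1) - x.min(axis=1)
swing_log = np.log(x).max(axis=1) - np.log(x).min(axis=1)

print("Seasonal swing per year, original scale:", np.round(swing_raw, 2))
print("Seasonal swing per year, log scale:     ", np.round(swing_log, 4))
```

The log-scale swing equals $\log(1.2) - \log(0.8) = \log(1.5)$ in every year, because the log turns the product level × season into a sum.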
### Quiz 8: Trend Estimation

**Question:** Which method is most appropriate for estimating a non-linear trend in a time series?

- A) Simple moving average with window size 3
- B) Linear regression on time
- C) LOESS (locally weighted regression)
- D) First differencing

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz8_answer = ''  # <-- Enter your answer here

# Check answer
if quiz8_answer.upper() == 'C':
    print("CORRECT! LOESS (Locally Estimated Scatterplot Smoothing) is ideal for non-linear trends.")
    print("It fits local polynomials to subsets of data, adapting to changing curvature.")
    print("Linear regression assumes a straight line; differencing removes trends but doesn't estimate them.")
elif quiz8_answer:
    print("Incorrect. Try again!")
    print("Hint: Which method can adapt to curves and changes in trend direction?")

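The contrast between a straight-line fit and LOESS can be checked on simulated data where the true trend is known. This is a sketch under stated assumptions: the half-sine trend, the noise level, and `frac=0.3` are arbitrary choices for illustration. It uses `lowess` from `statsmodels.nonparametric.smoothers_lowess`.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
t = np.arange(100, dtype=float)
true_trend = 10 * np.sin(np.pi * t / 100)            # curved (non-linear) trend
y = true_trend + rng.normal(scale=1.0, size=t.size)  # trend plus noise

# Straight-line fit: cannot follow the curvature
slope, intercept = np.polyfit(t, y, deg=1)
linear_fit = intercept + slope * t

# LOWESS fit: frac is the share of points used in each local regression
loess_fit = lowess(y, t, frac=0.3, return_sorted=False)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

print(f"RMSE vs true trend, linear: {rmse(linear_fit, true_trend):.2f}")
print(f"RMSE vs true trend, LOESS:  {rmse(loess_fit, true_trend):.2f}")
```

A smaller `frac` follows the data more closely but lets more noise through; a larger one gives a smoother, stiffer curve.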
### Quiz 9: Seasonal Adjustment

**Question:** What is the purpose of seasonal adjustment in time series analysis?

- A) To remove the trend component
- B) To make the series non-stationary
- C) To remove recurring seasonal patterns for better trend analysis
- D) To increase the variance of the series

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz9_answer = ''  # <-- Enter your answer here

# Check answer
if quiz9_answer.upper() == 'C':
    print("CORRECT! Seasonal adjustment removes predictable seasonal patterns from data.")
    print("This makes it easier to see the underlying trend and irregular movements.")
    print("Seasonally adjusted data is often published for economic indicators like GDP and unemployment.")
elif quiz9_answer:
    print("Incorrect. Try again!")
    print("Hint: Why would economists want to 'remove' the Christmas shopping spike from retail sales?")

### Quiz 10: Forecast Horizon

**Question:** As the forecast horizon increases, what typically happens to forecast accuracy?

- A) Accuracy improves because more data is used
- B) Accuracy decreases because uncertainty accumulates
- C) Accuracy stays constant for stationary series
- D) Accuracy improves due to mean reversion

In [None]:
# Enter your answer: 'A', 'B', 'C', or 'D'
quiz10_answer = ''  # <-- Enter your answer here

# Check answer
if quiz10_answer.upper() == 'B':
    print("CORRECT! Forecast uncertainty grows with the horizon.")
    print("Each step ahead compounds the error from previous forecasts.")
    print("This is why confidence intervals 'fan out' as h increases.")
elif quiz10_answer:
    print("Incorrect. Try again!")
    print("Hint: What happens to prediction intervals as you forecast further ahead?")

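The "fanning out" can be made concrete with a simulation. This is a sketch assuming a random walk with standard normal shocks, for which the naive forecast is the last observed value and the $h$-step error is the sum of the next $h$ shocks, so its standard deviation is $\sigma\sqrt{h}$. The number of paths and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, n_paths, max_h = 1.0, 20000, 10

# Each row holds the future shocks of one simulated path
shocks = rng.normal(scale=sigma, size=(n_paths, max_h))

# h-step error of the naive forecast = cumulative sum of the first h shocks
errors = np.cumsum(shocks, axis=1)
simulated_sd = errors.std(axis=0)

for h in (1, 5, 10):
    print(f"h={h:2d}: simulated sd={simulated_sd[h - 1]:.2f}, theory={sigma * np.sqrt(h):.2f}")
```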
---
# Part 2: True/False Questions

In [None]:
# Answer each statement with True or False
tf_answers = {
    1: None,  # "Multiplicative decomposition is always better than additive."
    2: None,  # "Holt-Winters is appropriate for data with no seasonality."
    3: None,  # "You should always use the test set for hyperparameter tuning."
    4: None,  # "Log transformation can stabilize variance in a time series."
    5: None,  # "Moving average with a larger window produces smoother trends."
    6: None,  # "MAPE is undefined when actual values are zero."
}

# Enter your answers below (True or False)
tf_answers[1] = None  # Multiplicative always better
tf_answers[2] = None  # Holt-Winters for non-seasonal
tf_answers[3] = None  # Use test set for tuning
tf_answers[4] = None  # Log stabilizes variance
tf_answers[5] = None  # Larger window = smoother
tf_answers[6] = None  # MAPE undefined at zero

In [None]:
# Check your answers
correct_answers = {1: False, 2: False, 3: False, 4: True, 5: True, 6: True}
explanations = {
    1: "FALSE: Use multiplicative when seasonal amplitude grows with level, additive when constant.",
    2: "FALSE: Holt-Winters adds a seasonal component. For non-seasonal data use SES (no trend) or Holt's method (trend).",
    3: "FALSE: Use VALIDATION set for tuning. Test set is for FINAL evaluation only!",
    4: "TRUE: Log transformation converts multiplicative to additive, stabilizing variance.",
    5: "TRUE: Larger windows average over more observations, producing smoother trends.",
    6: "TRUE: MAPE = 100 * mean(|error / actual|), so it involves division by zero whenever an actual value is 0."
}

score = 0
for q, correct in correct_answers.items():
    user_ans = tf_answers[q]
    if user_ans is None:
        status = "NOT ANSWERED"
    elif user_ans == correct:
        status = "CORRECT"
        score += 1
    else:
        status = "INCORRECT"
    print(f"Q{q}: {status}")
    if user_ans is not None:
        print(f"   {explanations[q]}")
    print()

print(f"\nScore: {score}/6")

---
# Part 3: Calculation Exercises

## Exercise 1: Simple Exponential Smoothing by Hand

Given the following data and $\alpha = 0.3$:

| t | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| $X_t$ | 10 | 12 | 11 | 14 | 13 |

Starting with $\hat{X}_1 = X_1 = 10$, calculate the forecasts.

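The recursion for this exercise, with the first step worked out. The error-correction form on the right is an algebraic rearrangement, not an extra assumption. Because the initial forecast equals the first observation, the first update leaves the forecast unchanged.

$$
\hat{X}_{t+1} = \alpha X_t + (1-\alpha)\,\hat{X}_t = \hat{X}_t + \alpha\,(X_t - \hat{X}_t)
$$

$$
\hat{X}_2 = 0.3 \cdot 10 + 0.7 \cdot 10 = 10
$$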
In [None]:
# Data
X = [10, 12, 11, 14, 13]
alpha = 0.3

# YOUR TASK: Fill in the forecasts
# Formula: X_hat[t+1] = alpha * X[t] + (1-alpha) * X_hat[t]

X_hat = [10]  # Start with X_hat[1] = 10

# Calculate X_hat[2]
X_hat_2 = None  # <-- Calculate this

# Calculate X_hat[3]
X_hat_3 = None  # <-- Calculate this

# Calculate X_hat[4]
X_hat_4 = None  # <-- Calculate this

# Calculate X_hat[5]
X_hat_5 = None  # <-- Calculate this

# Calculate X_hat[6] (forecast for next period)
X_hat_6 = None  # <-- Calculate this

print("Your answers:")
print(f"X_hat[2] = {X_hat_2}")
print(f"X_hat[3] = {X_hat_3}")
print(f"X_hat[4] = {X_hat_4}")
print(f"X_hat[5] = {X_hat_5}")
print(f"X_hat[6] = {X_hat_6}")

In [None]:
# SOLUTION - Run this to check your answers
print("SOLUTION:")
print("="*50)

X = [10, 12, 11, 14, 13]
alpha = 0.3
X_hat_sol = [10]  # X_hat[1] = X[1] = 10

for t in range(len(X)):
    next_forecast = alpha * X[t] + (1 - alpha) * X_hat_sol[-1]
    X_hat_sol.append(round(next_forecast, 2))
    if t < len(X) - 1:
        print(f"X_hat[{t+2}] = {alpha} * {X[t]} + {1-alpha} * {X_hat_sol[t]:.2f} = {next_forecast:.2f}")
    else:
        print(f"X_hat[{t+2}] = {alpha} * {X[t]} + {1-alpha} * {X_hat_sol[t]:.2f} = {next_forecast:.2f} (Forecast)")

# One-step-ahead errors for t = 2..5 (t = 1 is skipped because X_hat[1] was set equal to X[1])
errors = [X[i] - X_hat_sol[i] for i in range(1, len(X))]
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(np.array(errors)**2))

print(f"\nErrors: {[round(e, 2) for e in errors]}")
print(f"MAE: {mae:.3f}")
print(f"RMSE: {rmse:.3f}")

## Exercise 2: Seasonal Index Calculation

Given quarterly sales data for 2 years:

| Year | Q1 | Q2 | Q3 | Q4 |
|------|-----|-----|-----|-----|
| 2023 | 80 | 120 | 140 | 160 |
| 2024 | 100 | 140 | 160 | 200 |

Calculate the seasonal indices using the ratio-to-moving-average method (multiplicative).

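For quarterly data the window length is even, so the moving average is centered by averaging two adjacent 4-quarter means (a 2×4-MA). The formula and the first value from the table above are worked out here; $\text{CMA}_t$ is notation introduced for this note. The first two and last two quarters have no centered average, so eight observations yield only four ratios.

$$
\text{CMA}_t = \frac{0.5\,X_{t-2} + X_{t-1} + X_t + X_{t+1} + 0.5\,X_{t+2}}{4}
$$

$$
\text{CMA}_3 = \frac{0.5 \cdot 80 + 120 + 140 + 160 + 0.5 \cdot 100}{4} = 127.5,
\qquad
\frac{X_3}{\text{CMA}_3} = \frac{140}{127.5} \approx 1.098
$$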
In [None]:
# Data
sales = [80, 120, 140, 160, 100, 140, 160, 200]
quarters = ['Q1', 'Q2', 'Q3', 'Q4'] * 2

# YOUR TASK:
# Step 1: Calculate centered moving average (window=4)
# Step 2: Calculate ratio: actual / CMA
# Step 3: Average the ratios for each quarter
# Step 4: Normalize so they sum to 4 (or average to 1)

# Calculate the seasonal indices for each quarter
Q1_index = None  # <-- Calculate
Q2_index = None  # <-- Calculate
Q3_index = None  # <-- Calculate
Q4_index = None  # <-- Calculate

print("Your seasonal indices:")
print(f"Q1: {Q1_index}")
print(f"Q2: {Q2_index}")
print(f"Q3: {Q3_index}")
print(f"Q4: {Q4_index}")

In [None]:
# SOLUTION
print("SOLUTION:")
print("="*50)

import pandas as pd
sales_series = pd.Series(sales)

# 2x4-MA (centered moving average) for quarterly data:
# average two consecutive trailing 4-quarter means, then shift back by
# 2 periods so each value sits at the center of the 5 quarters it covers
ma4 = sales_series.rolling(window=4, center=False).mean()
cma = ma4.rolling(window=2, center=False).mean().shift(-2)

# Calculate seasonal ratios
ratios = sales_series / cma

print("Centered Moving Averages and Ratios:")
for i in range(len(sales)):
    cma_val = f"{cma.iloc[i]:.2f}" if pd.notna(cma.iloc[i]) else 'N/A'
    ratio_val = f"{ratios.iloc[i]:.4f}" if pd.notna(ratios.iloc[i]) else 'N/A'
    print(f"  {quarters[i]}: Sales={sales[i]}, CMA={cma_val}, Ratio={ratio_val}")

# Average ratios by quarter
q_ratios = {}
for q in ['Q1', 'Q2', 'Q3', 'Q4']:
    q_vals = [ratios.iloc[i] for i in range(len(quarters)) if quarters[i] == q and pd.notna(ratios.iloc[i])]
    q_ratios[q] = np.mean(q_vals) if q_vals else np.nan

# Normalize so the indices average to 1 (sum to 4)
total = sum([v for v in q_ratios.values() if pd.notna(v)])
n_valid = sum([1 for v in q_ratios.values() if pd.notna(v)])
adjustment = n_valid / total if total > 0 else 1

print("\nSeasonal Indices (hand calculation, normalized):")
for q, v in q_ratios.items():
    print(f"{q}: {v * adjustment:.3f}")

# Cross-check with statsmodels
print("\nSeasonal Indices (statsmodels):")
from statsmodels.tsa.seasonal import seasonal_decompose
idx = pd.date_range('2023-01-01', periods=8, freq='QS')
sales_ts = pd.Series(sales, index=idx)
decomp = seasonal_decompose(sales_ts, model='multiplicative', period=4)

indices = decomp.seasonal[:4]
print(f"Q1: {indices.iloc[0]:.3f}")
print(f"Q2: {indices.iloc[1]:.3f}")
print(f"Q3: {indices.iloc[2]:.3f}")
print(f"Q4: {indices.iloc[3]:.3f}")
print(f"\nInterpretation: Q1 is {(1-indices.iloc[0])*100:.1f}% below average, Q4 is {(indices.iloc[3]-1)*100:.1f}% above average")

## Exercise 3: Error Metrics Calculation

Given actual and forecast values:

| t | Actual | Forecast |
|---|--------|----------|
| 1 | 100 | 95 |
| 2 | 110 | 105 |
| 3 | 90 | 100 |
| 4 | 120 | 115 |

Calculate MAE, MSE, RMSE, and MAPE.

In [None]:
# Data
actual = np.array([100, 110, 90, 120])
forecast = np.array([95, 105, 100, 115])

# YOUR TASK: Calculate error metrics
# Formulas:
# MAE = mean(|actual - forecast|)
# MSE = mean((actual - forecast)^2)
# RMSE = sqrt(MSE)
# MAPE = 100 * mean(|actual - forecast| / |actual|)

errors = actual - forecast
print(f"Errors: {errors}")

MAE = None  # <-- Calculate
MSE = None  # <-- Calculate
RMSE = None  # <-- Calculate
MAPE = None  # <-- Calculate

print(f"\nYour answers:")
print(f"MAE = {MAE}")
print(f"MSE = {MSE}")
print(f"RMSE = {RMSE}")
print(f"MAPE = {MAPE}%")

In [None]:
# SOLUTION
print("SOLUTION:")
print("="*50)

errors = actual - forecast
print(f"Errors: {errors}")
print(f"Absolute Errors: {np.abs(errors)}")
print(f"Squared Errors: {errors**2}")

MAE_sol = np.mean(np.abs(errors))
MSE_sol = np.mean(errors**2)
RMSE_sol = np.sqrt(MSE_sol)
MAPE_sol = 100 * np.mean(np.abs(errors) / actual)

print(f"\nMAE = mean(|5, 5, 10, 5|) = {MAE_sol}")
print(f"MSE = mean(25, 25, 100, 25) = {MSE_sol}")
print(f"RMSE = sqrt({MSE_sol}) = {RMSE_sol:.2f}")
print(f"MAPE = 100 * mean(5/100, 5/110, 10/90, 5/120) = {MAPE_sol:.2f}%")

---
# Part 4: Python Coding Exercises

## Exercise 4: Load and Decompose Real Data

In [None]:
# TASK: Perform STL decomposition on airline passengers data

# Load data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
airline = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
airline.columns = ['Passengers']

# Step 1: Apply STL decomposition with period=12
# YOUR CODE HERE
# stl = STL(...)  # <-- Complete this
# result = stl.fit()


# Step 2: Plot all four components (original, trend, seasonal, residual)
# YOUR CODE HERE


# Step 3: Calculate what percentage of variance is explained by trend
# Hint: Compare Var(trend) to Var(original)
# YOUR CODE HERE

In [None]:
# SOLUTION
print("SOLUTION:")
print("="*50)

# STL decomposition
stl = STL(airline['Passengers'], period=12, robust=True)
result = stl.fit()

# Plot
fig, axes = plt.subplots(4, 1, figsize=(12, 10))

axes[0].plot(airline.index, airline['Passengers'], color=BLUE, label='Original')
axes[0].set_title('Original', fontweight='bold')
axes[0].legend(loc='upper left')

axes[1].plot(airline.index, result.trend, color=GREEN, label='Trend')
axes[1].set_title('Trend', fontweight='bold')
axes[1].legend(loc='upper left')

axes[2].plot(airline.index, result.seasonal, color=ORANGE, label='Seasonal')
axes[2].set_title('Seasonal', fontweight='bold')
axes[2].legend(loc='upper left')

axes[3].plot(airline.index, result.resid, color=RED, label='Residual')
axes[3].set_title('Residual', fontweight='bold')
axes[3].legend(loc='upper left')

plt.tight_layout()
plt.show()

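# Variance ratio (a rough measure: trend, seasonal, and residual are not exactly
# uncorrelated, so the component shares need not add up to 100%)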
var_original = airline['Passengers'].var()
var_trend = result.trend.var()
pct_explained = (var_trend / var_original) * 100

print(f"\nVariance of original: {var_original:.2f}")
print(f"Variance of trend: {var_trend:.2f}")
print(f"Percentage explained by trend: {pct_explained:.1f}%")

## Exercise 5: Forecast Comparison

In [None]:
# TASK: Compare SES, Holt, and Holt-Winters on airline data

# Split data
train = airline[:'1958']
test = airline['1959':]

# Step 1: Fit Simple Exponential Smoothing
# ses = SimpleExpSmoothing(train['Passengers']).fit()
# ses_forecast = ses.forecast(len(test))


# Step 2: Fit Holt's method (trend='add')
# YOUR CODE HERE


# Step 3: Fit Holt-Winters with multiplicative seasonality
# YOUR CODE HERE


# Step 4: Calculate RMSE for each method
# YOUR CODE HERE


# Step 5: Plot all forecasts vs actual
# YOUR CODE HERE

In [None]:
# SOLUTION

# Fit models
ses = SimpleExpSmoothing(train['Passengers']).fit()
holt = ExponentialSmoothing(train['Passengers'], trend='add', seasonal=None).fit()
hw = ExponentialSmoothing(train['Passengers'], trend='add', 
                          seasonal='mul', seasonal_periods=12).fit()

# Forecasts
h = len(test)
ses_fc = ses.forecast(h)
holt_fc = holt.forecast(h)
hw_fc = hw.forecast(h)

# Calculate RMSE (named actual_test so the `actual` array from Exercise 3 is not overwritten)
actual_test = test['Passengers'].values
rmse_ses = np.sqrt(mean_squared_error(actual_test, ses_fc))
rmse_holt = np.sqrt(mean_squared_error(actual_test, holt_fc))
rmse_hw = np.sqrt(mean_squared_error(actual_test, hw_fc))

# Plot
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(train.index, train['Passengers'], color=BLUE, label='Training')
ax.plot(test.index, test['Passengers'], color='gray', linewidth=2, label='Actual')
ax.plot(test.index, ses_fc, color=RED, linestyle='--', label=f'SES (RMSE={rmse_ses:.1f})')
ax.plot(test.index, holt_fc, color=ORANGE, linestyle='--', label=f'Holt (RMSE={rmse_holt:.1f})')
ax.plot(test.index, hw_fc, color=GREEN, linestyle='--', label=f'HW (RMSE={rmse_hw:.1f})')
ax.axvline(x=train.index[-1], color='black', linestyle='-', alpha=0.3)
ax.set_title('Forecast Comparison', fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Passengers')
ax.legend(loc='upper left')
plt.tight_layout()
plt.show()

print("SOLUTION:")
print("="*50)
print(f"\nRMSE Comparison:")
print(f"  SES:          {rmse_ses:.2f}")
print(f"  Holt:         {rmse_holt:.2f}")
print(f"  Holt-Winters: {rmse_hw:.2f}")
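rmse_by_model = {'SES': rmse_ses, 'Holt': rmse_holt, 'Holt-Winters': rmse_hw}
best_model = min(rmse_by_model, key=rmse_by_model.get)
print(f"\nBest Model (lowest test RMSE): {best_model}")
print("Holt-Winters is the only one of the three that models both trend and seasonality.")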

## Exercise 6: Effect of Smoothing Parameters

In [None]:
# TASK: Explore how alpha affects SES forecasts

# Create simple synthetic data
np.random.seed(42)
n = 50
synthetic = pd.Series(50 + np.cumsum(np.random.randn(n) * 2),
                      index=pd.date_range('2020-01-01', periods=n, freq='D'))

# Test different alpha values
alphas = [0.1, 0.3, 0.5, 0.9]

# YOUR TASK:
# 1. Fit SES with each alpha value
# 2. Plot the fitted values for each
# 3. Observe the differences

In [None]:
# SOLUTION
fig, axes = plt.subplots(2, 2, figsize=(14, 8))
axes = axes.flatten()

for i, alpha in enumerate(alphas):
    model = SimpleExpSmoothing(synthetic).fit(smoothing_level=alpha, optimized=False)
    fitted = model.fittedvalues
    
    axes[i].plot(synthetic.index, synthetic, color='gray', linewidth=1, alpha=0.7, label='Actual')
    axes[i].plot(fitted.index, fitted, color=BLUE, linewidth=2, label=f'SES (alpha={alpha})')
    axes[i].set_title(f'alpha = {alpha}', fontweight='bold')
    axes[i].legend(loc='upper left')

plt.suptitle('Effect of Smoothing Parameter on SES', fontweight='bold', fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("Observation:")
print("- Low alpha (0.1): Very smooth, slow to react to changes")
print("- High alpha (0.9): Very reactive, follows data closely")

---
# Part 5: Discussion Questions

Write your answers in the markdown cells below.

### Discussion 1

**Scenario:** You are analyzing monthly sales data for a retail company. The data shows clear seasonality (high sales in December) and an upward trend. The seasonal peaks have been getting larger over time.

**Questions:**
1. Should you use additive or multiplicative decomposition? Why?
2. Which exponential smoothing method would you recommend?
3. How would you evaluate your forecast model?

**Your Answer:**

*Write your answer here...*

### Discussion 2

**Scenario:** Your colleague suggests using RMSE to compare two forecast models. Model A has RMSE = 5 on one dataset, and Model B has RMSE = 50 on a different dataset. Your colleague concludes Model A is better.

**Questions:**
1. What is wrong with this comparison?
2. How should you properly compare models?
3. Which metric would be more appropriate for comparing across different scales?

**Your Answer:**

*Write your answer here...*

---
# Summary

## Key Takeaways from Today's Seminar

1. **Choose decomposition wisely** - multiplicative when seasonal amplitude grows with level
2. **Understand smoothing parameters** - high $\alpha$ = reactive, low $\alpha$ = smooth
3. **Match method to data** - SES (no trend/seasonality), Holt (trend), Holt-Winters (both)
4. **Proper evaluation** - never tune on test set, use validation set
5. **Error metrics matter** - RMSE penalizes large errors, MAPE is scale-independent

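Takeaway 5 can be checked numerically: rescaling a series (for example, reporting sales in units instead of thousands) changes RMSE by the same factor but leaves MAPE untouched. This sketch reuses the actual and forecast values from Exercise 3; the names `y_true` and `y_pred` are chosen here to avoid overwriting notebook variables.

```python
import numpy as np

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mape(actual, forecast):
    return float(100 * np.mean(np.abs((actual - forecast) / actual)))

y_true = np.array([100.0, 110.0, 90.0, 120.0])
y_pred = np.array([95.0, 105.0, 100.0, 115.0])

for scale in (1, 1000):
    a, f = y_true * scale, y_pred * scale
    print(f"scale x{scale:<4}: RMSE={rmse(a, f):.2f}  MAPE={mape(a, f):.2f}%")
```

This is why an RMSE of 5 on one dataset and 50 on another says nothing by itself about which model is better.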
## Next Seminar
Stochastic processes, stationarity, and unit root testing