# 03_model_regression – Weekly Bookings Forecast (Regression)

## Objectives
- Train baseline and simple regression models to forecast weekly bookings by region.
- Establish benchmark metrics (MAE, MAPE, R²).
- Produce evaluation plots (actual vs predicted, residuals).
- Assess whether performance meets business requirements (KPIs).
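
The three metrics named above can be written out directly. The block below is a minimal sketch on made-up numbers, not project data, and `regression_report` is a hypothetical helper name. The scikit-learn functions imported in this notebook should compute the same quantities; note that scikit-learn's MAPE is returned as a fraction rather than a percentage, and that MAPE becomes unstable when the actual value is zero or close to it.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE, MAPE (as a fraction) and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "MAE": abs_err.mean(),
        "MAPE": (abs_err / np.abs(y_true)).mean(),  # assumes no zeros in y_true
        "R2": 1.0 - ss_res / ss_tot,
    }

# Toy values for illustration only (hypothetical, not project data)
y_true = [100.0, 120.0, 80.0, 100.0]
y_pred = [110.0, 110.0, 90.0, 100.0]
report = regression_report(y_true, y_pred)
print(report)
```

On these toy values the MAE is 7.5 bookings and R² is 0.625. Reporting MAE alongside MAPE helps when regions differ a lot in volume, since MAPE weights small-volume weeks more heavily.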

## Inputs
- `data/processed/train_regression.csv`
- `data/processed/test_regression.csv`

## Outputs
- Baseline metrics
- Linear/ElasticNet model metrics
- Evaluation plots saved to `reports/figures/`
- (Advanced boosted model + hyperparameter tuning will be added in Part 2)


In [None]:
import pandas as pd
import numpy as np
from pathlib import Path

from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, r2_score
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

import matplotlib.pyplot as plt
import seaborn as sns

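# Project root is taken to be the parent of the current working directory
BASE_DIR = Path("..").resolve()
DATA = BASE_DIR / "data" / "processed"
FIG_DIR = BASE_DIR / "reports" / "figures"
FIG_DIR.mkdir(parents=True, exist_ok=True)  # create the figures folder if it is missing

sns.set_theme(style="whitegrid")  # sns.set is a legacy alias of set_theme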


In [None]:
train = pd.read_csv(DATA / "train_regression.csv", parse_dates=["week_start"])
test = pd.read_csv(DATA / "test_regression.csv", parse_dates=["week_start"])

# Preview both frames; display() renders each as a table rather than a tuple repr
display(train.head())
display(test.head())
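

For a forecasting task it is usually worth confirming that the test weeks come strictly after the training weeks, so that the evaluation does not leak future information. The sketch below assumes the split is meant to be chronological on `week_start`, which the notebook does not state explicitly. `split_is_chronological` is a hypothetical helper and the dates are toy values.

```python
import pandas as pd

def split_is_chronological(train_df, test_df, date_col="week_start"):
    """Return True when every training week precedes every test week."""
    return bool(train_df[date_col].max() < test_df[date_col].min())

# Toy frames for illustration only (hypothetical dates, not project data)
toy_train = pd.DataFrame({"week_start": pd.to_datetime(["2023-01-02", "2023-01-09"])})
toy_test = pd.DataFrame({"week_start": pd.to_datetime(["2023-01-16", "2023-01-23"])})

print(split_is_chronological(toy_train, toy_test))
```

The same call with the real `train` and `test` frames would flag an overlapping or shuffled split before any model is fitted.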


In [None]:
TARGET = "bookings_count"

FEATURES = [
    "region",
    "week_number",
    "month",
    "is_bank_holiday_week",
    "is_peak_winter",
    "mean_temp_c",
    "precip_mm",
    "snowfall_flag",
    "wind_speed_kph",
    "visibility_km",
    "lag_1w_bookings",
    "lag_4w_mean",
    "lag_52w_bookings"
]

X_train = train[FEATURES]
y_train = train[TARGET]
X_test = test[FEATURES]
y_test = test[TARGET]
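

Before fitting, a quick check that every selected column exists and a count of missing values per column can catch problems early. Lag features in particular often contain NaN for the earliest weeks of each series, although whether that applies to these files is an assumption. `check_features` below is a hypothetical helper, shown on a toy frame rather than the project data.

```python
import numpy as np
import pandas as pd

def check_features(df, features, target):
    """Raise KeyError if an expected column is absent; return NaN counts per column."""
    expected = list(features) + [target]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return df[expected].isna().sum()

# Toy frame for illustration only (hypothetical values, not project data)
toy = pd.DataFrame({
    "region": ["north", "north", "south"],
    "lag_1w_bookings": [np.nan, 40.0, 55.0],
    "bookings_count": [40, 42, 57],
})

na_counts = check_features(toy, ["region", "lag_1w_bookings"], "bookings_count")
print(na_counts[na_counts > 0])  # only columns that contain missing values
```

With the notebook's own names, the call would be `check_features(train, FEATURES, TARGET)`, repeated for `test`. Plain linear models in scikit-learn do not accept NaN inputs, so any non-zero count would need imputing or dropping first.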
