# EDA & Insights – Winter Mountain Tour Demand & Cancellations

## Objectives
- Explore seasonal booking patterns by region.
- Quantify the uplift from bank holiday weeks and peak winter.
- Understand how weather and calendar features relate to bookings.
- Explore cancellation patterns, especially under severe weather.
- Generate figures and narrative insights to support hypotheses H1–H3.

## Inputs
- `data/processed/weekly_bookings_regression.csv`
- `data/processed/bookings_for_classification.csv`

## Outputs
- EDA figures saved to `reports/figures/`
- Written insights to be reused in the README and dashboard.


In [None]:
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

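# Project root is taken to be the parent of the current working directory,
# so the notebook must be run from a folder one level below the root.
BASE_DIR = Path("..").resolve()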
DATA_PROCESSED = BASE_DIR / "data" / "processed"
FIG_DIR = BASE_DIR / "reports" / "figures"

FIG_DIR.mkdir(parents=True, exist_ok=True)

# set_theme is the current name; sns.set is kept only as an alias.
sns.set_theme(style="whitegrid")


In [None]:
weekly = pd.read_csv(
    DATA_PROCESSED / "weekly_bookings_regression.csv",
    parse_dates=["week_start"]
)

bookings_clf = pd.read_csv(
    DATA_PROCESSED / "bookings_for_classification.csv",
    parse_dates=["tour_date", "booking_date", "week_start"]
)

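# Show each preview as its own table; a bare tuple renders as plain text.
from IPython.display import display

display(weekly.head())
display(bookings_clf.head())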


In [None]:
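# One line per region; sort by date so each line is drawn chronologically.
plt.figure(figsize=(14, 6))
for region, grp in weekly.groupby("region"):
    grp = grp.sort_values("week_start")
    plt.plot(grp["week_start"], grp["bookings_count"], label=region)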

plt.title("Weekly Bookings Over Time by Region")
plt.xlabel("Week start")
plt.ylabel("Bookings count")
plt.legend()
plt.tight_layout()

fig_path = FIG_DIR / "weekly_bookings_by_region.png"
plt.savefig(fig_path, dpi=120)
fig_path


### Insight 1 – Seasonality Patterns by Region (supports H1, H3)

The weekly bookings series show clear **seasonal peaks** in the winter months
and around known holiday periods. Some regions, such as the Lake District, have a
consistently higher baseline than others and surge visibly in peak weeks.
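
One way to check the visual impression of winter peaks numerically is to average bookings by calendar month within each region. The sketch below is illustrative only: it assumes the `region`, `week_start` and `bookings_count` columns used in the plot above, and it runs on synthetic data (`demo`, with a made-up "Region B") rather than on the real `weekly` frame. The helper name `monthly_profile` is hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weekly bookings per region (rows) and calendar month (columns)."""
    with_month = df.assign(month=df["week_start"].dt.month)
    return with_month.pivot_table(
        index="region",
        columns="month",
        values="bookings_count",
        aggfunc="mean",
    )


# Synthetic stand-in for `weekly`: two years of Mondays with a winter peak.
# The baselines (120, 80) and amplitude (40) are invented for illustration.
weeks = pd.date_range("2023-01-02", periods=104, freq="W-MON")
seasonal = np.cos(2 * np.pi * weeks.dayofyear.to_numpy() / 365.25)
demo = pd.concat(
    [
        pd.DataFrame(
            {
                "region": name,
                "week_start": weeks,
                "bookings_count": base + 40 * seasonal,
            }
        )
        for name, base in [("Lake District", 120), ("Region B", 80)]
    ],
    ignore_index=True,
)

profile = monthly_profile(demo)
print(profile.round(1))
```

Calling `monthly_profile(weekly)` on the real data would give one row per region; winter columns that sit well above the summer columns would back up the seasonal reading of the plot.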

The seasonal pattern in the plot is consistent with two of the hypotheses:
- Seasonality and holidays are key drivers of demand (H1).
- Recent history (lagged demand) is likely to be informative, as peaks and troughs
  tend to persist over short time windows (H3).
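
The persistence claim behind H3 can be probed with a per-region lag-1 autocorrelation of weekly bookings. This is a minimal sketch, not the notebook's own analysis: it assumes the `region`, `week_start` and `bookings_count` columns, uses a hypothetical helper `lag_autocorr`, and is demonstrated on synthetic data with an invented "Region B".

```python
import numpy as np
import pandas as pd


def lag_autocorr(df: pd.DataFrame, lag: int = 1) -> pd.Series:
    """Autocorrelation of weekly bookings at the given lag, one value per region."""
    ordered = df.sort_values(["region", "week_start"])
    return ordered.groupby("region")["bookings_count"].apply(
        lambda s: s.autocorr(lag=lag)
    )


# Synthetic stand-in for `weekly`: a winter-peaked cycle plus seeded noise.
# Baselines, amplitude and noise level are invented for illustration.
rng = np.random.default_rng(0)
weeks = pd.date_range("2023-01-02", periods=104, freq="W-MON")
seasonal = np.cos(2 * np.pi * weeks.dayofyear.to_numpy() / 365.25)
demo = pd.concat(
    [
        pd.DataFrame(
            {
                "region": name,
                "week_start": weeks,
                "bookings_count": base + 40 * seasonal + rng.normal(0, 5, len(weeks)),
            }
        )
        for name, base in [("Lake District", 120), ("Region B", 80)]
    ],
    ignore_index=True,
)

acf = lag_autocorr(demo, lag=1)
print(acf.round(3))
```

Values close to 1 from `lag_autocorr(weekly)` would suggest that last week's bookings carry information about this week's. On raw counts, part of that correlation comes from the seasonal cycle itself, so repeating the check on deseasonalised values would be a stricter test.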
