# Ultimate Ski Holiday 2026 - Data Analysis
## Inter-Uni Datathon 2025 - Allianz Challenge

**Objective**: Identify the optimal week and ski resort for the ultimate ski holiday in 2026

**Key Considerations**:
- Visitor numbers and crowd levels
- Weather patterns and snow conditions
- Prices and value for money
- Unique features and accessibility of each resort

## 1. Setup and Data Loading

In [None]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Set style for better looking plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
# Load the datasets
file_path = "2025 Allianz Datathon Dataset.xlsx"

# Load the visitation and climate sheets
visitation_data = pd.read_excel(file_path, sheet_name="Visitation Data")
climate_data = pd.read_excel(file_path, sheet_name="Climate Data")

print("Dataset loaded successfully!")
print(f"Visitation data shape: {visitation_data.shape}")
print(f"Climate data shape: {climate_data.shape}")

## 2. Initial Data Exploration

In [None]:
# Display basic information about visitation data
print("VISITATION DATA OVERVIEW:")
print("=" * 40)
print(visitation_data.head())
print("\nData Info:")
visitation_data.info()  # info() prints its own report and returns None
print("\nSummary Statistics:")
print(visitation_data.describe())

In [None]:
# Display basic information about climate data
print("CLIMATE DATA OVERVIEW:")
print("=" * 40)
print(climate_data.head())
print("\nData Info:")
climate_data.info()  # info() prints its own report and returns None
print("\nSummary Statistics:")
print(climate_data.describe())

In [None]:
# Check for missing values
print("MISSING VALUES ANALYSIS:")
print("=" * 40)
print("\nVisitation Data Missing Values:")
print(visitation_data.isnull().sum())
print("\nClimate Data Missing Values:")
print(climate_data.isnull().sum())

## 3. Data Cleaning and Preparation

In [None]:
# Get resort names from visitation data (excluding Year and Week columns)
resort_columns = [col for col in visitation_data.columns if col not in ['Year', 'Week']]
print("Available resorts:")
for i, resort in enumerate(resort_columns, 1):
    print(f"{i}. {resort}")

print(f"\nTotal resorts: {len(resort_columns)}")
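
Because each resort sits in its own column, later per-resort grouping and plotting is usually easier on a long table. The following is a minimal sketch of that reshape, assuming the `Year`/`Week`/resort-column layout used above; `toy_visitation` and its visitor counts are made-up stand-ins, not values from the dataset.

```python
import pandas as pd

# Toy stand-in for visitation_data; the visitor counts are invented for illustration.
toy_visitation = pd.DataFrame({
    "Year": [2023, 2023, 2024, 2024],
    "Week": [1, 2, 1, 2],
    "Thredbo": [100, 150, 120, 180],
    "Perisher": [200, 250, 220, 280],
})

# Every column other than Year and Week is treated as a resort, as in the cell above.
toy_resorts = [c for c in toy_visitation.columns if c not in ("Year", "Week")]

# Wide -> long: one row per (Year, Week, Resort) combination.
visitation_long = toy_visitation.melt(
    id_vars=["Year", "Week"],
    value_vars=toy_resorts,
    var_name="Resort",
    value_name="Visitors",
)

print(visitation_long.head())
```

The same call should work on the real `visitation_data` with `resort_columns` passed as `value_vars`.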

In [None]:
# Create a mapping of weather stations to resorts based on README information
weather_station_mapping = {
    71032: 'Thredbo',  # Thredbo AWS
    71075: 'Perisher',  # Perisher AWS - also covers Charlotte Pass
    72161: 'Selwyn',  # Cabramurra SMHEA AWS - nearest station to Selwyn
    83024: 'Mt. Buller',  # Mount Buller - also covers Mt. Stirling
    83084: 'Falls Creek',  # Falls Creek
    83085: 'Mt. Hotham',  # Mount Hotham
    85291: 'Mt. Baw Baw'   # Mount Baw Baw
}

# Add resort names to climate data
climate_data['Resort'] = climate_data['Bureau of Meteorology station number'].map(weather_station_mapping)

print("Weather station to resort mapping:")
for station, resort in weather_station_mapping.items():
    count = climate_data[climate_data['Bureau of Meteorology station number'] == station].shape[0]
    print(f"Station {station} -> {resort}: {count} records")
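
`Series.map` leaves `NaN` wherever a station number is missing from the dictionary, so it may be worth checking for unmapped stations before relying on the `Resort` column. A small sketch of that check on a toy frame follows; station `99999` is a made-up ID used only to trigger the unmapped case.

```python
import pandas as pd

station_col = "Bureau of Meteorology station number"

# Toy stand-in for climate_data; 99999 is an invented, unmapped station number.
toy_climate = pd.DataFrame({station_col: [71032, 71032, 83084, 99999]})
toy_mapping = {71032: "Thredbo", 83084: "Falls Creek"}

toy_climate["Resort"] = toy_climate[station_col].map(toy_mapping)

# Rows whose station number had no entry in the mapping end up with NaN.
unmapped = toy_climate.loc[toy_climate["Resort"].isna(), station_col].unique().tolist()
print(f"Unmapped station numbers: {unmapped}")
```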

In [None]:
# Create date column for climate data
climate_data['Date'] = pd.to_datetime(climate_data[['Year', 'Month', 'Day']])

print("Climate data date range:")
print(f"From: {climate_data['Date'].min()}")
print(f"To: {climate_data['Date'].max()}")
print(f"Total days: {climate_data['Date'].nunique()}")

## 4. Ski Season Week Mapping

In [None]:
# Define approximate ski season dates for each year (based on the 2024 example in the README)
# Week 1 starts on 8 June; Week 15 ends on 20 September
import datetime

def get_ski_season_dates(year):
    """Return {week: {'start': date, 'end': date}} for the 15-week ski season of `year`."""
    # Approximation: every season is assumed to start on 8 June, so the
    # weekday on which Week 1 begins differs from year to year
    start_date = datetime.date(int(year), 6, 8)

    week_dates = {}
    for week in range(1, 16):
        week_start = start_date + datetime.timedelta(weeks=week - 1)
        week_end = week_start + datetime.timedelta(days=6)
        week_dates[week] = {'start': week_start, 'end': week_end}

    return week_dates

# Generate ski season dates for all years in our data
years_in_data = sorted(visitation_data['Year'].dropna().astype(int).unique().tolist())
ski_seasons = {}

for year in years_in_data:
    ski_seasons[year] = get_ski_season_dates(year)

print("Ski season mapping created for years:", years_in_data)
print("\nExample - 2024 Week 1:", ski_seasons[2024][1])
print("Example - 2024 Week 15:", ski_seasons[2024][15])
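
To line daily climate records up with the weekly visitation figures, each date needs a ski-season week number. The sketch below shows one way this could be done, reusing the same approximate 8 June start as `get_ski_season_dates`; `toy_climate`, the `max_temp_c` column name, and the temperatures are hypothetical stand-ins, not fields confirmed in the dataset.

```python
import pandas as pd

# Toy stand-in for climate_data; max_temp_c is a hypothetical column with invented values.
toy_climate = pd.DataFrame({
    "Date": pd.to_datetime(["2024-06-07", "2024-06-08", "2024-06-14",
                            "2024-06-15", "2024-09-20", "2024-09-21"]),
    "max_temp_c": [5.0, 4.0, 3.0, 2.0, 6.0, 7.0],
})

# Same approximation as get_ski_season_dates: the season starts on 8 June each year.
season_start = pd.to_datetime(
    pd.DataFrame({"year": toy_climate["Date"].dt.year, "month": 6, "day": 8})
)

# Whole days since the season start, converted to a 1-based week number.
days_in = (toy_climate["Date"] - season_start).dt.days
week = days_in // 7 + 1

# Dates outside the 15-week window (105 days) get <NA> instead of a week number.
in_season = days_in.between(0, 15 * 7 - 1)
toy_climate["Week"] = week.where(in_season).astype("Int64")

# Weekly average of the daily values, ready to join against weekly visitation.
weekly = toy_climate.dropna(subset=["Week"]).groupby("Week")["max_temp_c"].mean()
print(toy_climate)
print(weekly)
```

On the real data the grouping would presumably also include `Year` and `Resort`, so that each resort-week gets its own average.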

## 5. Next Steps Preview

The following analysis sections will be implemented:

### Phase 2: Core Analysis
- Weather-Visitation Correlation Analysis
- Peak vs Off-Peak Week Identification
- Resort Performance Comparison
- Historical Trend Analysis (2014-2024)

### Phase 3: External Data Integration
- Flight price research and integration
- Accommodation cost analysis
- Resort pricing and amenities
- Special events and school holidays

### Phase 4: 2026 Predictions
- Weather forecasting for 2026 ski season
- Visitor demand predictions
- Price trend projections

### Phase 5: Recommendation Engine
- Multi-criteria decision analysis
- Optimal week and resort selection
- Trade-off analysis and sensitivity testing
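
As a rough illustration of what the multi-criteria step in Phase 5 could look like, here is a minimal weighted-sum sketch with min-max normalisation. The candidate options, criterion names, scores, and weights are all invented placeholders, not results from this analysis, and weighted-sum scoring is only one of several possible methods.

```python
import pandas as pd

# Invented candidate scores for illustration only; not derived from the dataset.
candidates = pd.DataFrame({
    "option": ["Resort A, Week 5", "Resort B, Week 9", "Resort C, Week 7"],
    "snow_depth_cm": [80.0, 120.0, 100.0],    # higher is better
    "visitors": [30000.0, 50000.0, 20000.0],  # lower is better
    "price_aud": [900.0, 1100.0, 1000.0],     # lower is better
}).set_index("option")

# Hypothetical weights that sum to 1.
weights = {"snow_depth_cm": 0.5, "visitors": 0.3, "price_aud": 0.2}
higher_is_better = {"snow_depth_cm": True, "visitors": False, "price_aud": False}

# Min-max normalise each criterion to the range [0, 1].
normalised = (candidates - candidates.min()) / (candidates.max() - candidates.min())

# Flip the criteria where a lower raw value is preferable.
for col, good in higher_is_better.items():
    if not good:
        normalised[col] = 1.0 - normalised[col]

# Weighted sum per option, then rank from best to worst.
scores = sum(normalised[col] * w for col, w in weights.items())
ranking = scores.sort_values(ascending=False)
print(ranking)
```

Min-max normalisation divides by the range of each criterion, so a criterion with identical values across all options would need special handling. Varying `weights` and re-ranking is a simple way to approach the sensitivity testing listed above.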

In [None]:
# Summarize the prepared data for the next phase (nothing is written to disk here)
print("Data preparation completed!")
print("\nDatasets ready for analysis:")
print(f"- Visitation data: {visitation_data.shape[0]} records across {len(resort_columns)} resorts")
print(f"- Climate data: {climate_data.shape[0]} records from {climate_data['Bureau of Meteorology station number'].nunique()} weather stations")
print(f"- Time period: {visitation_data['Year'].min()}-{visitation_data['Year'].max()} (visitation)")
print(f"- Climate period: {climate_data['Year'].min()}-{climate_data['Year'].max()} (weather)")