# Flight Rerouting Analysis at Dublin Airport

This project investigates flight rerouting events at Dublin Airport and their relationship to local weather conditions. The analysis combines flight activity data with historical and forecast weather data from Met Éireann to identify trends, visualise reroute reasons, and project future rerouting probabilities.

## Setup Instructions - Import Libraries

This notebook requires the following libraries:

In [None]:
# Setup: imports, paths and basic config
import json  # for any config files
from pathlib import Path  # for path management
import numpy as np  # numerical operations
import pandas as pd  # data manipulation
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # enhanced plotting
import plotly.express as px  # interactive plotting

from sklearn.model_selection import train_test_split, cross_val_score  # model validation
from sklearn.linear_model import LogisticRegression  # example model
from sklearn.ensemble import GradientBoostingClassifier  # example model
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix  # model evaluation
import joblib  # model persistence

# Plotting style
sns.set_theme(style='whitegrid')

# Project root: use the current directory, or the nearest ancestor folder named "project"
ROOT = Path.cwd().resolve()
if ROOT.name != "project":
    # climb up until we find a "project" folder; if none exists, keep the current directory
    for parent in ROOT.parents:
        if parent.name == "project":
            ROOT = parent
            break

# Define key directories inside project
DATA_DIR = ROOT / "data"
OUTPUT_DIR = ROOT / "outputs"
MODEL_DIR = ROOT / "models"
DOCS_DIR = ROOT / "docs"

# Ensure directories exist
for path in [DATA_DIR, OUTPUT_DIR, MODEL_DIR, DOCS_DIR]:
    path.mkdir(parents=True, exist_ok=True)

print(f"Project root: {ROOT}")
print(f"Data directory: {DATA_DIR}")
print(f"Output directory: {OUTPUT_DIR}")
print(f"Model directory: {MODEL_DIR}")
print(f"Docs directory: {DOCS_DIR}")


Project root: C:\Users\eCron\OneDrive\Documents\ATU_CourseWork\Programming For Data Analytics\programming-for-data-analytics\big-project
Data directory: C:\Users\eCron\OneDrive\Documents\ATU_CourseWork\Programming For Data Analytics\programming-for-data-analytics\big-project\data
Output directory: C:\Users\eCron\OneDrive\Documents\ATU_CourseWork\Programming For Data Analytics\programming-for-data-analytics\big-project\outputs
Model directory: C:\Users\eCron\OneDrive\Documents\ATU_CourseWork\Programming For Data Analytics\programming-for-data-analytics\big-project\models
Docs directory: C:\Users\eCron\OneDrive\Documents\ATU_CourseWork\Programming For Data Analytics\programming-for-data-analytics\big-project\docs


### Step 3 – Utilise Helper Functions for Dublin Airport Data Processing

This section defines a set of reusable helper functions that simplify common tasks in the project.  
They are designed specifically to support the analysis of **Dublin Airport flight activity and weather data** by handling messy inputs and preparing clean datasets for exploration and modelling.

The functions help with:

- ✅ Detecting and parsing inconsistent datetime formats in flight and weather logs  
- ✅ Standardising and cleaning temperature columns from Met Éireann datasets  
- ✅ Loading and preparing Dublin Airport daily weather data from local CSV files  
- ✅ Defining Irish seasonal boundaries for rerouting analysis (Winter, Spring, Summer, Autumn)  
- ✅ Filtering weather data for a custom date range to align with flight events  
- ✅ Validating user-provided date inputs for reproducible analysis  
- ✅ Detecting header rows in raw CSV files downloaded from dashboards  

Each helper is **modular**: it performs one clear task and can be reused across notebooks and scripts.  
This improves readability, reduces duplication, and supports good programming practices for the final project.

📌 *Tip: These helpers are written to be beginner-friendly, with comments explaining their purpose and logic. They make it easier to align flight activity with weather conditions when investigating rerouting events.*

📖 References:  
- [Real Python – Python Helper Functions](https://realpython.com/defining-your-own-python-function/)  
- [GeeksforGeeks – Python Helper Functions](https://www.geeksforgeeks.org/python-helper-functions/)  
- [Wikipedia – DRY Principle (Don't Repeat Yourself)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
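
As a rough illustration of the alignment mentioned in the tip, the sketch below joins a few flight events to daily weather rows by calendar date. The column names (`flight_time`, `rerouted`, `rain`, `wdsp`) and every value are invented for this example and are not taken from the project data, so the real datasets may need different keys.

```python
import pandas as pd

# Toy flight events (hypothetical columns and values, for illustration only)
flights = pd.DataFrame({
    "flight_time": pd.to_datetime(
        ["2024-01-01 07:15", "2024-01-01 18:40", "2024-01-02 09:05"]
    ),
    "rerouted": [0, 1, 0],
})

# Toy daily weather, one row per calendar date (also hypothetical)
weather = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "rain": [0.4, 2.1],
    "wdsp": [11.0, 14.5],
})

# Reduce each flight timestamp to midnight of its day so it can match a daily row
flights["date"] = flights["flight_time"].dt.normalize()

# Left join keeps every flight; validate raises if weather has duplicate dates
merged = flights.merge(weather, on="date", how="left", validate="many_to_one")
print(merged[["flight_time", "rerouted", "rain", "wdsp"]])
```

A left join is used so that flights on days with no weather record are kept, with missing weather values, rather than silently dropped.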


In [None]:
# 📂 Helper Functions for Dublin Airport Project
# These functions help with parsing dates, cleaning weather data, handling temperature columns,
# defining Irish seasons, preparing data ranges, and detecting CSV headers.
# Keep them in one cell so they are easy to reuse across the notebook.

import pandas as pd
import warnings
from pathlib import Path

# 📅 Detect the most likely datetime format from sample strings
def detect_datetime_format(samples, formats, dayfirst=True):
    """
    Try each format and return the one that matches at least 70% of samples.
    Helps ensure consistent parsing of date strings.
    """
    for fmt in formats:
        parsed = pd.to_datetime(samples, format=fmt, dayfirst=dayfirst, errors='coerce')
        if parsed.notna().sum() >= max(1, int(len(samples) * 0.7)):
            return fmt
    return None

# 📅 Parse a datetime column using format detection or fallback
def parse_datetime_column(df, date_col, candidate_formats=None, dayfirst=True):
    """
    Parse a datetime column using known formats.
    Falls back to flexible parsing if none match.
    """
    if candidate_formats is None:
        candidate_formats = [
            '%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M', '%d-%b-%Y %H:%M',
            '%d/%m/%Y %H:%M:%S', '%d/%m/%Y %H:%M', '%d-%m-%Y %H:%M',
            '%d %b %Y %H:%M', '%d %B %Y %H:%M',
        ]

    sample_vals = df[date_col].dropna().astype(str).head(80).tolist()
    chosen_fmt = detect_datetime_format(sample_vals, candidate_formats, dayfirst=dayfirst)

    if chosen_fmt:
        print(f"✅ Detected datetime format: {chosen_fmt}")
        return pd.to_datetime(df[date_col], format=chosen_fmt, dayfirst=dayfirst, errors='coerce')
    else:
        print("⚠️ No single format matched. Falling back to flexible parsing.")
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', message='Could not infer format')
            return pd.to_datetime(df[date_col], dayfirst=dayfirst, errors='coerce')

# 🌡️ Ensure temperature column is numeric and available as 'temp'
def parse_temperature_column(df, col_name='temp'):
    """
    Convert the temperature column to numeric and store it in a 'temp' column.
    If no exact match, look for any column containing 'temp'.
    Note: the DataFrame is modified in place and also returned.
    """
    if col_name not in df.columns:
        col_name = next((c for c in df.columns if 'temp' in c.lower()), None)
        if col_name is None:
            raise KeyError("No temperature column found.")
    df['temp'] = pd.to_numeric(df[col_name], errors='coerce')
    return df

# 📂 Load cleaned weather data from local CSV
def load_cleaned_weather_data(filepath="data/dublin_airport_daily.csv"):
    """
    Load weather dataset from CSV and strip spaces from column names.
    """
    df = pd.read_csv(filepath, low_memory=False)
    df.columns = df.columns.str.strip()
    return df

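# 🍂 Define Irish seasonal boundaries for a given year
def define_irish_seasons(year=2025):
    """
    Return start and end dates for Irish meteorological seasons.
    Winter runs from 1 December of the previous year to the end of February.
    """
    # Derive the end of February from 1 March so that 29 February is included in leap years
    winter_end = pd.Timestamp(f"{year}-03-01") - pd.Timedelta(minutes=1)
    data = [
        ("Winter", pd.Timestamp(f"{year-1}-12-01"), winter_end),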
        ("Spring", pd.Timestamp(f"{year}-03-01"), pd.Timestamp(f"{year}-05-31 23:59")),
        ("Summer", pd.Timestamp(f"{year}-06-01"), pd.Timestamp(f"{year}-08-31 23:59")),
        ("Autumn", pd.Timestamp(f"{year}-09-01"), pd.Timestamp(f"{year}-11-30 23:59")),
    ]
    return pd.DataFrame(data, columns=["season", "start", "end"])

# 📊 Filter and prepare temperature data for a custom date range
def prepare_temperature_data(df, start_date, end_date):
    """
    Filter weather data to a date range and add useful time features.
    """
    df = df.copy()
    df.columns = df.columns.str.strip()

    if 'date' not in df.columns:
        raise KeyError("Expected 'date' column not found.")

    # Try parsing with a common format, fallback to flexible parsing
    try:
        df['datetime'] = pd.to_datetime(df['date'], format='%d-%b-%Y %H:%M', errors='raise')
    except Exception:
        df['datetime'] = pd.to_datetime(df['date'], errors='coerce')

    df = df.dropna(subset=['datetime'])
    mask = (df['datetime'] >= start_date) & (df['datetime'] <= end_date)
    range_df = df.loc[mask].copy()

    # Add date and hour columns for plotting
    range_df['date'] = range_df['datetime'].dt.date
    range_df['hour'] = range_df['datetime'].dt.strftime('%H:%M')

    return range_df.sort_values('datetime').reset_index(drop=True)

# 📆 Convert user input strings into a validated date range
def get_custom_range(start_str, end_str):
    """
    Convert string inputs into datetime objects and validate order.
    """
    try:
        start = pd.to_datetime(start_str)
        end = pd.to_datetime(end_str)
        if start > end:
            raise ValueError("Start date must be before end date.")
        return start, end
    except Exception as e:
        print(f"❌ Invalid date range: {e}")
        return None, None

# 🔍 Detect the header row in a CSV file
def detect_header(lines):
    """
    Detect the most likely header row in a CSV file.
    Looks for lines starting with 'date' or 'station' and containing commas.
    """
    for i, line in enumerate(lines):
        line_lower = line.strip().lower()
        if (line_lower.startswith("station") or line_lower.startswith("date")) and "," in line:
            columns = line.split(",")
            if len(columns) > 5:  # Header rows usually have multiple columns
                return i
    print("⚠️ Warning: header row not found. Defaulting to first line.")
    return 0
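
To group rerouting events by season, each timestamp also needs a season label. The sketch below is one possible approach: it uses a month lookup that follows the same boundaries as `define_irish_seasons` (December to February is Winter, and so on). The names `MONTH_TO_SEASON` and `label_season` are hypothetical and are not part of the helpers above.

```python
import pandas as pd

# Month-to-season lookup using the same month boundaries as define_irish_seasons
MONTH_TO_SEASON = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}

def label_season(datetimes):
    """Hypothetical helper: map each timestamp to its season name."""
    datetimes = pd.to_datetime(pd.Series(datetimes))
    return datetimes.dt.month.map(MONTH_TO_SEASON)

# Includes 29 February in a leap year, which a fixed 28 February cut-off would miss
sample = ["2024-02-29 12:00", "2025-03-01 00:00", "2025-12-01 06:30"]
labels = label_season(sample)
print(labels.tolist())  # ['Winter', 'Spring', 'Winter']
```

Labelling by month avoids having to build a separate boundary table for every year in the data, at the cost of not exposing explicit start and end timestamps.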


In [None]:
# 📂 Step 4 – Download Dublin Airport Daily Data CSV and Detect Header Row

from pathlib import Path
import requests

# --- Define output path for cleaned CSV ---
DATA_PATH = Path("data/dublin_airport_daily.csv")

# --- Download raw CSV from Met Éireann (Dublin Airport Daily Data) ---
url = "https://cli.fusio.net/cli/climate_data/webdata/dly532.csv"
response = requests.get(url, timeout=30)  # timeout so the cell cannot hang indefinitely

# ✅ Check for successful response
if response.status_code != 200:
    raise RuntimeError(f"❌ Failed to download data: HTTP {response.status_code}")

# --- Split response into lines ---
lines = response.text.splitlines()

# --- Detect header row using helper function ---
header_index = detect_header(lines)

# ✅ Confirm detected header row
print(f"✅ Header row detected at line {header_index}:")
print(lines[header_index])
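
Once the header row is known, the remaining lines can be parsed into a DataFrame. The sketch below shows one way this might be done, using a small invented stand-in for `response.text` so that it runs without a network call. The preamble lines, column names and values are assumptions made for illustration and do not reproduce the real Met Éireann file. The header search here is a simplified stand-in for `detect_header`.

```python
import io
import pandas as pd

# Toy stand-in for response.text.splitlines(): preamble, then header, then data rows
lines = [
    "Station Name: EXAMPLE",
    "Station Height: 0 M",
    "date,maxtp,mintp,rain,wdsp,sun",
    "01-Jan-2024,9.1,3.2,0.4,11.0,1.5",
    "02-Jan-2024,8.4,1.9,2.1,14.5,0.0",
]

# Simplified header search: first line that begins with "date,"
header_index = next(
    i for i, line in enumerate(lines) if line.strip().lower().startswith("date,")
)

# Keep the header and everything after it, then let pandas parse the CSV text
csv_text = "\n".join(lines[header_index:])
df = pd.read_csv(io.StringIO(csv_text))
df.columns = df.columns.str.strip()

# Parse the date column with an explicit format; unparseable values become NaT
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")
print(df.dtypes)
```

Slicing the list of lines before parsing keeps the header index unambiguous, since it refers to the same line numbering that was used to detect it.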
