# Exploratory Data Analysis (EDA)
## WiDS Global Datathon – Wildfire Prediction

This notebook explores the structure of the wildfire dataset used to predict
whether and when a wildfire will threaten an evacuation zone. All analysis
focuses on information available within the first five hours after ignition.

## Data Loading

We begin by loading the training data, test data, and metadata file provided
by the competition.


In [14]:
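import pandas as pd
import numpy as np

# Competition files, read from the working directory
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
metadata = pd.read_csv("metaData.csv")

# Compare (rows, columns) of the training and test sets
train.shape, test.shape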


((221, 37), (95, 35))
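The test set has two fewer columns than the training set. Presumably these are the target columns `time_to_hit_hours` and `event`, which appear at the end of the training column listing. A set difference confirms which columns are missing. The helper below is hypothetical, and it runs on small made-up stand-in frames so it works without the competition files.

```python
import pandas as pd


def columns_missing_from(reference: pd.DataFrame, other: pd.DataFrame) -> list[str]:
    """Return columns present in `reference` but absent from `other`, in reference order."""
    other_cols = set(other.columns)
    return [col for col in reference.columns if col not in other_cols]


# Hypothetical stand-ins; in the notebook, pass the real `train` and `test` frames.
train_demo = pd.DataFrame(
    {
        "event_id": [1, 2],
        "area_first_ha": [3.0, 5.0],
        "time_to_hit_hours": [10.0, 40.0],
        "event": [1, 0],
    }
)
test_demo = pd.DataFrame({"event_id": [3], "area_first_ha": [4.0]})

missing = columns_missing_from(train_demo, test_demo)
print(missing)  # ['time_to_hit_hours', 'event']
```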

## Feature Overview

This section inspects the available features in the training dataset,
including fire growth metrics, spatial movement indicators, and temporal
context variables.


In [15]:
train.columns




Index(['event_id', 'num_perimeters_0_5h', 'dt_first_last_0_5h',
       'low_temporal_resolution_0_5h', 'area_first_ha', 'area_growth_abs_0_5h',
       'area_growth_rel_0_5h', 'area_growth_rate_ha_per_h', 'log1p_area_first',
       'log1p_growth', 'log_area_ratio_0_5h', 'relative_growth_0_5h',
       'radial_growth_m', 'radial_growth_rate_m_per_h',
       'centroid_displacement_m', 'centroid_speed_m_per_h',
       'spread_bearing_deg', 'spread_bearing_sin', 'spread_bearing_cos',
       'dist_min_ci_0_5h', 'dist_std_ci_0_5h', 'dist_change_ci_0_5h',
       'dist_slope_ci_0_5h', 'closing_speed_m_per_h',
       'closing_speed_abs_m_per_h', 'projected_advance_m',
       'dist_accel_m_per_h2', 'dist_fit_r2_0_5h', 'alignment_cos',
       'alignment_abs', 'cross_track_component', 'along_track_speed',
       'event_start_hour', 'event_start_dayofweek', 'event_start_month',
       'time_to_hit_hours', 'event'],
      dtype='object')
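The column names roughly sort into the feature families described above (fire growth, spatial movement, temporal context), plus a few that fit none of them cleanly. One way to tabulate this is to assign each column to a family by keyword. The keyword rules and the `group_features` helper are hypothetical. They are inferred from the names alone, not taken from `metaData.csv`, and `radial_growth_*` is counted as growth here.

```python
from collections import defaultdict

# Hypothetical keyword rules inferred from column names (not an official schema).
# Rules are checked in order and the first match wins; unmatched columns go to "other".
FAMILY_RULES = [
    ("movement", ("centroid", "spread_bearing", "dist_", "closing_speed",
                  "projected_advance", "alignment", "cross_track", "along_track")),
    ("growth", ("area", "growth")),
    ("temporal", ("event_start_",)),
]
# Identifier and target columns are not features, so they are skipped.
NON_FEATURES = {"event_id", "time_to_hit_hours", "event"}


def group_features(columns) -> dict[str, list[str]]:
    """Map each feature family to the columns whose names match its keywords."""
    groups: dict[str, list[str]] = defaultdict(list)
    for col in columns:
        if col in NON_FEATURES:
            continue
        family = next(
            (name for name, keys in FAMILY_RULES if any(key in col for key in keys)),
            "other",
        )
        groups[family].append(col)
    return dict(groups)


# A few real column names from the listing above; in the notebook,
# call group_features(train.columns) instead.
sample_columns = [
    "event_id", "area_first_ha", "log1p_growth", "radial_growth_m",
    "centroid_speed_m_per_h", "dist_min_ci_0_5h", "event_start_hour",
    "num_perimeters_0_5h", "event",
]
groups = group_features(sample_columns)
for family, cols in groups.items():
    print(family, cols)
```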

## Target Variable Investigation

The dataset does not include separate binary labels for each prediction horizon.
Instead, targets are defined using a time-to-event formulation.


In [16]:
# Search for any column whose name mentions a horizon (12, 24, 48, or 72)
[col for col in train.columns if any(h in col for h in ("12", "24", "48", "72"))]

[]

## Time-to-Event Labels

The variable `event` indicates whether a wildfire ever threatens an evacuation zone.
The variable `time_to_hit_hours` represents the number of hours after ignition at
which that threat occurs. These variables are used to derive binary targets for
multiple prediction horizons.
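Before deriving horizon labels, it helps to check how many fires have `event == 1` and where their `time_to_hit_hours` values fall relative to the 12, 24, 48, and 72 hour cutoffs. The helper name `summarize_events` and the toy values below are hypothetical. Whatever `time_to_hit_hours` holds for `event == 0` rows is ignored here.

```python
import pandas as pd

HORIZONS = (12, 24, 48, 72)


def summarize_events(df: pd.DataFrame, horizons=HORIZONS) -> dict:
    """Event rate, plus the share of all fires that hit within each horizon."""
    summary = {"n_fires": len(df), "event_rate": df["event"].mean()}
    hit = df["event"] == 1
    for h in horizons:
        # Only fires with event == 1 can count as a hit within h hours.
        summary[f"hit_within_{h}h"] = (hit & (df["time_to_hit_hours"] <= h)).mean()
    return summary


# Made-up toy data: three fires that hit, one censored fire (NaN time).
toy = pd.DataFrame(
    {
        "event": [1, 1, 1, 0],
        "time_to_hit_hours": [6.0, 30.0, 60.0, float("nan")],
    }
)
summary = summarize_events(toy)
print(summary)
```

On the real data, `summarize_events(train)` gives the positive rate at each horizon, which shows how imbalanced each binary target will be.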


## Prediction Horizons (12h, 24h, 48h, 72h)

The competition requires predicting the probability that a wildfire will threaten an evacuation zone within each of several future time horizons. Rather than providing a separate label for each horizon, the dataset uses a time-to-event representation.

Binary targets for each horizon can be derived as follows:
- A fire is considered a positive case for horizon H if:
  - `event == 1`, and
  - `time_to_hit_hours <= H`
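Here is the same rule as an indicator for fire $i$, with $E_i$ denoting `event` and $T_i$ denoting `time_to_hit_hours`:

$$
y_i^{(H)} = \mathbb{1}\left[\, E_i = 1 \;\wedge\; T_i \le H \,\right], \qquad H \in \{12, 24, 48, 72\}
$$

Since $T_i \le 12$ implies $T_i \le 24$, and so on, the labels satisfy $y_i^{(12)} \le y_i^{(24)} \le y_i^{(48)} \le y_i^{(72)}$ for every fire.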

Fires with `event == 0` are treated as right-censored observations: no threat was recorded while they were observed, so their true time to hit (if any) is unknown. This formulation allows a model to produce calibrated probabilities across multiple time horizons using a shared representation.
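The horizon rule translates directly into pandas. The sketch below assumes `event` is coded 0/1. It treats every `event == 0` fire as a negative at all four horizons, which is a simplification that only holds if censored fires were observed for at least 72 hours. The function names and toy values are hypothetical. The block also shows one common post-processing step, a running maximum across horizons, which makes each fire's predicted probabilities non-decreasing in H. This is a general modeling choice, not something the competition is stated to require.

```python
import numpy as np
import pandas as pd

HORIZONS = (12, 24, 48, 72)


def horizon_labels(df: pd.DataFrame, horizons=HORIZONS) -> pd.DataFrame:
    """One 0/1 column per horizon: 1 if event == 1 and time_to_hit_hours <= H.

    Assumption: event == 0 fires are labelled 0 at every horizon. A NaN
    time_to_hit_hours compares as False, so it also yields 0.
    """
    hit = df["event"] == 1
    return pd.DataFrame(
        {f"y_{h}h": (hit & (df["time_to_hit_hours"] <= h)).astype(int) for h in horizons},
        index=df.index,
    )


def enforce_monotone(probs: np.ndarray) -> np.ndarray:
    """Running maximum along the horizon axis (columns ordered 12h..72h)."""
    return np.maximum.accumulate(probs, axis=1)


# Made-up toy data: hits at 6h, 30h and 60h, plus one censored fire.
toy = pd.DataFrame(
    {
        "event": [1, 1, 1, 0],
        "time_to_hit_hours": [6.0, 30.0, 60.0, np.nan],
    }
)
labels = horizon_labels(toy)
print(labels)

# Hypothetical raw model outputs whose 24h value dips below the 12h value.
raw_preds = np.array([[0.20, 0.15, 0.40, 0.35],
                      [0.05, 0.10, 0.30, 0.60]])
smoothed = enforce_monotone(raw_preds)
print(smoothed)
```

Taking a running maximum only ever raises later horizons. Whether that helps calibration on this data would need to be checked against a validation split.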