# Global Disaster Response Analysis (2018–2024)

## Abstract
We analyze 50,000 global disaster records from 2018 to 2024 to examine how response characteristics relate to recovery duration. Using exploratory data analysis and a supervised regression model, we investigate associations between aid amount, response time, response efficiency, and recovery days.

## Motivation
Understanding disaster recovery timelines is critical for effective disaster management and resource allocation. While many studies focus on disaster severity, fewer examine how response characteristics relate to recovery duration at a global scale.

## Data
We use the Global Disaster Response Analysis dataset (2018–2024) from Kaggle. The dataset contains 50,000 records and 12 variables, including disaster type, country, severity index, casualties, economic loss, response time, aid amount, response efficiency score, and recovery days.

## Research Questions
- How does recovery time vary across disaster types?
- What is the relationship between aid amount, response metrics, and recovery duration?
- Can recovery duration be predicted using available response and impact variables?
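As a rough illustration of how the first two questions could be approached, the sketch below summarizes recovery time per disaster type and computes a simple correlation with aid amount. The data are a tiny made-up sample, and the column names (`disaster_type`, `aid_amount_usd`, `recovery_days`) are assumptions, not the dataset's actual headers.

```python
import pandas as pd

# Tiny made-up sample; column names are hypothetical stand-ins for the dataset's.
df = pd.DataFrame({
    "disaster_type": ["flood", "flood", "earthquake", "earthquake", "wildfire", "wildfire"],
    "aid_amount_usd": [100_000, 250_000, 400_000, 300_000, 150_000, 200_000],
    "recovery_days": [30, 40, 70, 60, 45, 55],
})

# Question 1: how recovery time varies across disaster types.
by_type = df.groupby("disaster_type")["recovery_days"].agg(["count", "median", "mean"])

# Question 2: linear association between aid amount and recovery duration.
aid_corr = df["aid_amount_usd"].corr(df["recovery_days"])

print(by_type)
print(f"Pearson correlation (aid vs. recovery days): {aid_corr:.2f}")
```

A correlation computed this way describes association only; it does not show that aid shortens or lengthens recovery.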

## Methods
We first perform exploratory data analysis to visualize relationships between key variables. We then frame recovery time prediction as a regression problem and fit ridge, lasso, and random forest regressors on standardized numeric features and encoded categorical variables, comparing them against a baseline that always predicts the mean.
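The preprocessing described above (standardized numeric features, encoded categorical variables) can be sketched with a scikit-learn pipeline. This is a minimal illustration, not the notebook's actual code: the data are synthetic, the column names are hypothetical, and `Ridge` is used only because it tops the reported metrics; any scikit-learn regressor, such as `KNeighborsRegressor`, could be swapped in.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the dataset's.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "severity_index": rng.uniform(1, 10, n),
    "response_time_hours": rng.uniform(1, 72, n),
    "aid_amount_usd": rng.uniform(1e4, 1e6, n),
    "disaster_type": rng.choice(["flood", "earthquake", "wildfire"], n),
})
type_offset = df["disaster_type"].map({"flood": 0.0, "earthquake": 10.0, "wildfire": 5.0})
df["recovery_days"] = (
    5 * df["severity_index"]
    + 0.2 * df["response_time_hours"]
    + type_offset
    + rng.normal(0, 3, n)
)

numeric_cols = ["severity_index", "response_time_hours", "aid_amount_usd"]
categorical_cols = ["disaster_type"]

# Standardize numeric columns and one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([("preprocess", preprocess), ("regressor", Ridge(alpha=1.0))])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols],
    df["recovery_days"],
    test_size=0.2,
    random_state=0,
)
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R² on the held-out split
print(f"R2 on synthetic test split: {test_r2:.3f}")
```

Keeping the scaler and encoder inside the pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into the model.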



In [1]:
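import pandas as pd

# The first CSV column is the saved DataFrame index, so read it as the index
# rather than as an "Unnamed: 0" data column.
metrics = pd.read_csv("outputs/ml/metrics.csv", index_col=0)
metrics

| model | RMSE (days) | MAE (days) | R² |
| --- | --- | --- | --- |
| ridge | 4.978634 | 3.970246 | 0.939425 |
| random_forest | 5.093356 | 4.059898 | 0.936602 |
| lasso | 5.099404 | 4.062804 | 0.936451 |
| baseline_mean | 20.228813 | 16.355676 | -2.5e-05 |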


We evaluate model performance using RMSE, MAE, and R². RMSE and MAE are reported in days; R² is unitless.
Among the evaluated models, Ridge regression achieves the lowest RMSE, indicating the best
predictive accuracy on the held-out test set.
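For reference, the three metrics can be computed directly from true and predicted values. The sketch below uses plain Python and made-up numbers; `regression_metrics` is a hypothetical helper, not a function from the analysis notebooks.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R² for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Made-up recovery times (days) for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

model_metrics = regression_metrics(y_true, y_pred)

# A constant prediction equal to the mean of y_true gives R² = 0 by construction.
mean_pred = [sum(y_true) / len(y_true)] * len(y_true)
baseline_metrics = regression_metrics(y_true, mean_pred)

print(model_metrics)
print(baseline_metrics)
```

When the constant is the training-set mean instead of the test-set mean, R² on the test set is typically slightly below zero, which is consistent with the `baseline_mean` row above.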

The median recovery time in the test data is approximately 50 days, with values ranging from
2 to 111 days. Ridge's RMSE of roughly 5 days is therefore about 10% of the median, and far
below the mean baseline's RMSE of about 20 days. With R² near 0.94, the models capture most of
the variance in recovery time using the available covariates, although individual predictions
can still be off by several days.
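To put the reported errors in context, the short calculation below relates the ridge RMSE from the metrics table to the approximate median recovery time and to the mean baseline. The numbers are copied from the table and text above; the variable names are illustrative.

```python
# Values copied from the metrics table and the surrounding text.
ridge_rmse = 4.978634        # days
baseline_rmse = 20.228813    # days, mean-prediction baseline
median_recovery_days = 50    # approximate median quoted in the text

# RMSE as a fraction of a typical recovery time.
relative_error = ridge_rmse / median_recovery_days

# Fractional reduction in RMSE compared with always predicting the mean.
rmse_reduction = 1 - ridge_rmse / baseline_rmse

print(f"Ridge RMSE as a share of the median: {relative_error:.1%}")
print(f"RMSE reduction vs. mean baseline: {rmse_reduction:.1%}")
```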


## Results
This section summarizes the main descriptive patterns from EDA and the predictive performance of our models. Figures and summary tables are loaded from outputs generated in the analysis notebooks.


In [1]:
from IPython.display import Image, display
from pathlib import Path

# Display up to three saved EDA figures (sorted by filename) if the directory exists
p = Path("outputs/eda")
if p.exists():
    for img_path in sorted(p.glob("*.png"))[:3]:
        display(Image(filename=str(img_path)))
else:
    print("No outputs/eda directory found yet. Run eda.ipynb and save figures to outputs/eda/ first.")
