<a href="https://colab.research.google.com/github/JDCurry/fema-disaster-prediction/blob/main/notebooks/FEMA_Disaster_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FEMA Disaster Data Analysis

This notebook explores the FEMA Disaster Declarations dataset to understand patterns and inform our prediction model.

## 1. Setup and Data Loading

First, let's set up our environment and fetch the data.

In [None]:
# Install required packages
!pip install pandas numpy matplotlib seaborn scikit-learn requests

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests

# Set plotting style
sns.set_theme(style='whitegrid')
sns.set_palette("husl")

In [None]:
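# Fetch data directly from the OpenFEMA API.
# By default the API returns only the first 1,000 records per request, so
# page through the full dataset with the $top and $skip query parameters.
def fetch_fema_data(page_size=1000, timeout=60):
    print("Fetching data from FEMA API...")
    url = "https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries"
    records = []
    skip = 0
    while True:
        params = {"$top": page_size, "$skip": skip}
        response = requests.get(url, params=params, timeout=timeout)
        if response.status_code != 200:
            raise Exception(f"Failed to fetch data: {response.status_code}")
        page = response.json()["DisasterDeclarationsSummaries"]
        records.extend(page)
        # A short (or empty) page means we have reached the end of the dataset
        if len(page) < page_size:
            break
        skip += page_size
    return pd.DataFrame(records)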

# Load the data
try:
    df = fetch_fema_data()
    print("Data successfully loaded!")

    # Display basic information
    print("\nDataset Shape:", df.shape)
    print("\nColumns:", df.columns.tolist())
    print("\nData Types:\n", df.dtypes)
    print("\nMissing Values:\n", df.isnull().sum())
except Exception as e:
    print(f"Error loading data: {str(e)}")
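In the `DisasterDeclarationsSummaries` dataset, each row describes one designated area (usually a county) named in a declaration, so a single disaster appears on several rows that share a `disasterNumber`. Counting rows therefore measures declared areas, not declarations. The sketch below, run on a small made-up DataFrame rather than the API response, contrasts the two counts. `count_declarations` is a hypothetical helper, not part of pandas or the FEMA API.

```python
import pandas as pd

# Made-up sample: disaster 1 covers three Texas counties, disaster 2 one more
# Texas county, and disaster 3 two Florida counties.
sample = pd.DataFrame({
    "disasterNumber": [1, 1, 1, 2, 3, 3],
    "state": ["TX", "TX", "TX", "TX", "FL", "FL"],
    "incidentType": ["Hurricane", "Hurricane", "Hurricane", "Flood", "Hurricane", "Hurricane"],
})


def count_declarations(frame, by):
    """Hypothetical helper: count unique declarations (distinct disasterNumber
    values) per group, rather than designated-area rows."""
    return frame.groupby(by)["disasterNumber"].nunique().sort_values(ascending=False)


# Row counts: one row per designated area
rows_per_state = sample["state"].value_counts()
# Declaration counts: one per distinct disasterNumber
declarations_per_state = count_declarations(sample, "state")

print(rows_per_state)
print(declarations_per_state)
```

On the real data, `count_declarations(df, 'state')` would give declarations per state; for a yearly time series, group by `df['declarationDate'].dt.year` and call `.sort_index()` on the result before plotting, since the helper sorts by count.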

## 2. Temporal Analysis

In [None]:
# Convert date columns
date_columns = ['declarationDate', 'incidentBeginDate', 'incidentEndDate']
for col in date_columns:
    df[col] = pd.to_datetime(df[col])

# Analyze disasters over time.
# Each row is one designated area (e.g., a county), so count unique
# disasterNumber values to get the number of declarations.
plt.figure(figsize=(15, 6))
df.groupby(df['declarationDate'].dt.year)['disasterNumber'].nunique().plot(kind='line')
plt.title('Number of Disaster Declarations by Year')
plt.xlabel('Year')
plt.ylabel('Number of Declarations')
plt.show()

# Seasonal patterns
plt.figure(figsize=(15, 6))
df.groupby(df['declarationDate'].dt.month)['disasterNumber'].nunique().plot(kind='bar')
plt.title('Disaster Declarations by Month')
plt.xlabel('Month')
plt.ylabel('Number of Declarations')
plt.xticks(rotation=0)
plt.show()
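The seasonal risk scoring mentioned in the conclusions is not defined in this notebook. One simple way such a score could be derived, offered only as a hypothetical sketch and not as the scoring used in the main application, is the fraction of each incident type's declarations that fall in each calendar month, using `declarationDate` as the monthly chart above does. A value of 0.5 for Hurricane in month 9 would mean half of the hurricane declarations were made in September. The sample data and the `seasonal_risk_scores` name are made up.

```python
import pandas as pd

# Made-up sample: one row per unique disaster
sample = pd.DataFrame({
    "disasterNumber": [1, 2, 3, 4, 5],
    "incidentType": ["Hurricane", "Hurricane", "Hurricane", "Hurricane", "Severe Storm"],
    "declarationDate": pd.to_datetime(
        ["2020-08-30", "2020-09-15", "2021-09-02", "2022-10-01", "2022-05-20"]
    ),
})


def seasonal_risk_scores(frame):
    """Hypothetical seasonal score: for each incident type, the fraction of its
    declarations made in each calendar month (each row sums to 1)."""
    # Keep one row per declaration so multi-county disasters are not over-weighted
    unique = frame.drop_duplicates("disasterNumber")
    months = unique["declarationDate"].dt.month
    counts = pd.crosstab(unique["incidentType"], months)
    # Make sure all 12 months appear, even those with no declarations
    counts = counts.reindex(columns=range(1, 13), fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)


scores = seasonal_risk_scores(sample)
print(scores.round(2))
```

Using `incidentBeginDate` instead of `declarationDate` would tie the score to when incidents start rather than when they were declared; which is more suitable depends on how the score is used.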

## 3. Geographic Analysis

In [None]:
# State-level analysis
plt.figure(figsize=(15, 8))
state_counts = df['state'].value_counts().head(15)
sns.barplot(x=state_counts.values, y=state_counts.index)
plt.title('Top 15 States by Number of Disaster Declarations')
plt.xlabel('Number of Declarations')
plt.ylabel('State')
plt.show()

# Region-level analysis
plt.figure(figsize=(12, 6))
df['region'].value_counts().plot(kind='bar')
plt.title('Disaster Declarations by FEMA Region')
plt.xlabel('Region')
plt.ylabel('Number of Declarations')
plt.xticks(rotation=45)
plt.show()

# Create a heatmap of disaster types by region
pivot_table = pd.crosstab(df['region'], df['incidentType'])
plt.figure(figsize=(15, 8))
sns.heatmap(pivot_table, annot=True, fmt='d', cmap='YlOrRd')
plt.title('Distribution of Disaster Types by Region')
plt.show()

# Print summary statistics
print("\nGeographic Analysis Summary:")
print(f"Number of states and territories affected: {df['state'].nunique()}")
print(f"Number of regions: {df['region'].nunique()}")
print("\nTop 5 states by disaster declarations:")
print(df['state'].value_counts().head().to_frame())
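The heatmap above shows raw counts, so regions and incident types with many records dominate the color scale, which makes it hard to compare the mix of disasters between regions. Passing `normalize='index'` to `pd.crosstab` divides each row by its total, so each region's row sums to 1. A sketch on a made-up sample:

```python
import pandas as pd

# Made-up sample: region 4 has more records than region 10
sample = pd.DataFrame({
    "region": [4, 4, 4, 4, 10, 10],
    "incidentType": ["Hurricane", "Hurricane", "Hurricane", "Flood", "Flood", "Fire"],
})

# Share of each incident type within each region (each row sums to 1)
region_mix = pd.crosstab(sample["region"], sample["incidentType"], normalize="index")
print(region_mix.round(2))
```

On the real data this would be `pd.crosstab(df['region'], df['incidentType'], normalize='index')`, plotted with `fmt='.2f'` instead of `fmt='d'`, since the cells are now fractions.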

## 4. Disaster Type Analysis

In [None]:
# Distribution of disaster types
plt.figure(figsize=(12, 6))
disaster_counts = df['incidentType'].value_counts()
sns.barplot(x=disaster_counts.values, y=disaster_counts.index)
plt.title('Distribution of Disaster Types')
plt.xlabel('Number of Declarations')
plt.show()

# Compute each incident's duration in days; incidents without an end date
# (e.g., still ongoing) get NaN and are left out of the box plot
df['incident_duration'] = (df['incidentEndDate'] - df['incidentBeginDate']).dt.days

plt.figure(figsize=(12, 6))
sns.boxplot(x='incidentType', y='incident_duration', data=df)
plt.title('Incident Duration by Disaster Type')
plt.ylabel('Incident Duration (days)')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

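# Temporal patterns by disaster type.
# DataFrame.plot opens a new figure unless it is given an axes, so create the axes first.
# fill_value=0 records years with no declarations of a type as 0 rather than NaN,
# which would otherwise leave gaps in the lines.
fig, ax = plt.subplots(figsize=(15, 6))
annual_counts = df.groupby([df['declarationDate'].dt.year, 'incidentType']).size().unstack(fill_value=0)
annual_counts.plot(kind='line', marker='o', ax=ax)
plt.title('Disaster Types Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Declarations')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()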

# Print summary statistics
print("\nDisaster Type Analysis Summary:")
print(f"Total number of disaster types: {df['incidentType'].nunique()}")
print("\nDisaster type distribution:")
print(df['incidentType'].value_counts().to_frame())
print("\nAverage duration by disaster type (days):")
print(df.groupby('incidentType')['incident_duration'].mean().round(1).sort_values(ascending=False))
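The summary above reports mean durations, which a few very long incidents can pull far above the typical value. Incidents with no `incidentEndDate` also contribute `NaN` durations. Reporting the median and the number of incidents with a known duration alongside the mean gives a fuller picture. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample; the last incident has no end date yet (still ongoing)
sample = pd.DataFrame({
    "incidentType": ["Flood", "Flood", "Flood", "Fire", "Fire"],
    "incidentBeginDate": pd.to_datetime(
        ["2021-01-01", "2021-03-01", "2021-06-01", "2021-07-01", "2021-08-01"]
    ),
    "incidentEndDate": pd.to_datetime(
        ["2021-01-03", "2021-03-05", "2021-09-09", "2021-07-11", None]
    ),
})

# A missing end date (NaT) yields a NaN duration, which count/median/mean skip
sample["incident_duration"] = (
    sample["incidentEndDate"] - sample["incidentBeginDate"]
).dt.days

duration_summary = (
    sample.groupby("incidentType")["incident_duration"]
    .agg(["count", "median", "mean"])
    .round(1)
)
print(duration_summary)
```

Here a single 100-day flood lifts the Flood mean well above its 4-day median. On the real data the same aggregation would be `df.groupby('incidentType')['incident_duration'].agg(['count', 'median', 'mean'])`.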

## Conclusions

This analysis examined:
1. Temporal patterns in disaster declarations, by year and by month
2. The geographic distribution of different disaster types across states and FEMA regions
3. Seasonal patterns that bear on the effectiveness of our seasonal risk scoring
4. Which features look most useful for prediction

These insights have been incorporated into our prediction model in the main application.