**Project Title:**

DisasterGuard: AI-Powered Natural Disaster Prediction System

**Problem Statement:**

Natural disasters such as floods, earthquakes, and wildfires pose serious threats to life and infrastructure. Accurate predictions are essential for timely preparation and risk reduction. This project aims to develop a machine learning system that predicts the most likely disaster type for a given region from historical, climatic, and geographic data.

**Description:**

“DisasterGuard” trains a multi-class classification model on a dataset covering 15 disaster types. Input features include location, climate data, and historical disaster records. The model predicts the most likely disaster type and can optionally return a probability for each of the 15 types. The system can be deployed behind a dashboard to help authorities and communities make informed decisions and prepare effectively.
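
As a rough illustration of the "predicted type plus per-type probabilities" idea, the sketch below fits a random forest on a tiny synthetic table and reads the class probabilities back out. Everything in it is an assumption made for illustration: the feature names (`latitude`, `longitude`, `year`), the three example labels, the random values, and the helper `rank_disaster_types` are hypothetical and are not taken from the project's dataset or final model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder


def rank_disaster_types(model, label_encoder, features):
    """Return one row per sample with one probability column per disaster type.

    Hypothetical helper: it only wraps predict_proba and restores the
    original string labels as column names.
    """
    proba = model.predict_proba(features)
    class_names = label_encoder.inverse_transform(model.classes_)
    return pd.DataFrame(proba, columns=class_names, index=features.index)


# Synthetic stand-in for the real data (random values, illustration only).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "latitude": rng.uniform(-60, 60, 300),
    "longitude": rng.uniform(-180, 180, 300),
    "year": rng.integers(1900, 2021, 300),
})
toy_labels = rng.choice(["Drought", "Earthquake", "Flood"], size=300)

# Encode string labels as integers for the classifier.
encoder = LabelEncoder()
y = encoder.fit_transform(toy_labels)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(toy, y)

# Probabilities for the first three rows, and the most likely type per row.
probabilities = rank_disaster_types(clf, encoder, toy.head(3))
most_likely = probabilities.idxmax(axis=1)
print(probabilities.round(2))
print(most_likely)
```

Because the labels here are random, the probabilities carry no real signal. The point is only the shape of the output: one column per disaster type, each row summing to 1, with `idxmax` giving the single most likely type.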

In [1]:
# Data handling
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
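# XGBoost is an optional alternative; skip it if the package is not installed
try:
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None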
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Model saving/loading
import joblib

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

# Inline plots
%matplotlib inline


In [5]:
import pandas as pd

# Load the raw disaster records
df = pd.read_csv("../data/Natural disaster.csv")

# Preview the first five rows
print(df.head())

   Year   Seq Glide Disaster Group Disaster Subgroup      Disaster Type  \
0  1900  9002   NaN        Natural    Climatological            Drought   
1  1900  9001   NaN        Natural    Climatological            Drought   
2  1902    12   NaN        Natural       Geophysical         Earthquake   
3  1902     3   NaN        Natural       Geophysical  Volcanic activity   
4  1902    10   NaN        Natural       Geophysical  Volcanic activity   

  Disaster Subtype Disaster Subsubtype   Event Name     Country  ...  \
0          Drought                 NaN          NaN  Cabo Verde  ...   
1          Drought                 NaN          NaN       India  ...   
2  Ground movement                 NaN          NaN   Guatemala  ...   
3         Ash fall                 NaN  Santa Maria   Guatemala  ...   
4         Ash fall                 NaN  Santa Maria   Guatemala  ...   

  No Affected No Homeless Total Affected Insured Damages ('000 US$)  \
0         NaN         NaN            NaN     

In [28]:
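# df.info() prints its summary directly and returns None, so it is not wrapped in print()
df.info()
print(df.describe())      # summary statistics for the numeric columns
print(df.isnull().sum())  # missing-value count per column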

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16126 entries, 0 to 16125
Data columns (total 45 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Year                        16126 non-null  int64  
 1   Seq                         16126 non-null  int64  
 2   Glide                       1581 non-null   object 
 3   Disaster Group              16126 non-null  object 
 4   Disaster Subgroup           16126 non-null  object 
 5   Disaster Type               16126 non-null  object 
 6   Disaster Subtype            13016 non-null  object 
 7   Disaster Subsubtype         1077 non-null   object 
 8   Event Name                  3861 non-null   object 
 9   Country                     16126 non-null  object 
 10  ISO                         16126 non-null  object 
 11  Region                      16126 non-null  object 
 12  Continent                   16126 non-null  object 
 13  Location                    143