### Project Title
RiskScope: Global Disaster Prediction

### Problem Statement
This project uses the World Risk Index dataset and a simple regression model to identify which regions of the world are most at risk from natural disasters such as floods and storms. It considers how exposed a region is, how poor or unprepared its population is, and how risk has changed over time, and uses these factors to predict a disaster risk score, the World Risk Index (WRI).

### Description 
The dataset contains the World Risk Index (WRI) and its component scores for regions worldwide: Exposure (likelihood of being hit by a disaster), Vulnerability, Susceptibility (poverty, weak infrastructure), Lack of Coping Capabilities, and Lack of Adaptive Capacities, together with the Year of each record. Using a simple regression model, we predict WRI to pinpoint high-risk regions and the factors that drive their risk. The model and accompanying charts are intended to help governments and communities plan better defenses, such as stronger shelters or emergency training, to reduce disaster impacts. The categorical columns (WRI Category, Exposure Category, Vulnerability Category, Susceptibility Category) can serve as a sanity check on the predictions: a region with a high predicted WRI should also fall into a high WRI Category.
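
As a rough illustration of the approach described above, the sketch below fits a linear regression that predicts WRI from four component scores. The ten rows are copied from the first rows of the dataset as displayed by `df.head(10)`; the tiny sample, the choice of `LinearRegression`, and the helper name `fit_wri_model` are assumptions made for illustration, not the notebook's final model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = [
    "Exposure",
    "Susceptibility",
    "Lack of Coping Capabilities",
    "Lack of Adaptive Capacities",
]

# First ten rows of the dataset (2011), copied from the df.head(10) output.
rows = [
    ("Vanuatu", 32.00, 56.33, 37.14, 79.34, 53.96),
    ("Tonga", 29.08, 56.04, 28.94, 81.80, 44.97),
    ("Philippinen", 24.32, 45.09, 34.99, 82.78, 44.01),
    ("Salomonen", 23.51, 36.40, 44.11, 85.95, 63.74),
    ("Guatemala", 20.88, 38.42, 35.36, 77.83, 49.87),
    ("Bangladesch", 17.45, 27.52, 44.96, 86.49, 58.77),
    ("Timor-Leste", 17.45, 25.97, 52.42, 89.16, 59.93),
    ("Costa Rica", 16.74, 42.39, 21.96, 63.39, 33.14),
    ("Kambodscha", 16.58, 26.66, 48.28, 86.43, 51.81),
    ("El Salvador", 16.49, 32.18, 30.55, 75.35, 47.82),
]
sample = pd.DataFrame(rows, columns=["Region", "WRI"] + FEATURES)


def fit_wri_model(frame):
    """Hypothetical helper: fit an ordinary least squares model of WRI on FEATURES."""
    model = LinearRegression()
    model.fit(frame[FEATURES], frame["WRI"])
    return model


model = fit_wri_model(sample)

# Coefficients give a first, rough view of which factors move the prediction most.
coefficients = pd.Series(model.coef_, index=FEATURES)
print(coefficients.sort_values(ascending=False))

# In-sample fit only; with ten rows this says nothing about generalization.
r2_in_sample = model.score(sample[FEATURES], sample["WRI"])
print("In-sample R^2:", round(r2_in_sample, 3))
```

On the full dataset the same pattern would be applied after a train/test split, so that the reported fit reflects unseen rows rather than the training sample.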

In [2]:
# Required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
from yellowbrick.regressor import PredictionError  # Optional visualizer
import shap  # Optional explainer
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/global-disaster-risk-index-time-series-dataset/world_risk_index.csv


In [3]:
# Import the dataset
df = pd.read_csv("/kaggle/input/global-disaster-risk-index-time-series-dataset/world_risk_index.csv")
df.head(10)

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
0,Vanuatu,32.0,56.33,56.81,37.14,79.34,53.96,2011,Very High,Very High,High,High
1,Tonga,29.08,56.04,51.9,28.94,81.8,44.97,2011,Very High,Very High,Medium,Medium
2,Philippinen,24.32,45.09,53.93,34.99,82.78,44.01,2011,Very High,Very High,High,High
3,Salomonen,23.51,36.4,64.6,44.11,85.95,63.74,2011,Very High,Very High,Very High,High
4,Guatemala,20.88,38.42,54.35,35.36,77.83,49.87,2011,Very High,Very High,High,High
5,Bangladesch,17.45,27.52,63.41,44.96,86.49,58.77,2011,Very High,Very High,Very High,High
6,Timor-Leste,17.45,25.97,67.17,52.42,89.16,59.93,2011,Very High,Very High,Very High,Very High
7,Costa Rica,16.74,42.39,39.5,21.96,63.39,33.14,2011,Very High,Very High,Low,Low
8,Kambodscha,16.58,26.66,62.18,48.28,86.43,51.81,2011,Very High,Very High,High,High
9,El Salvador,16.49,32.18,51.24,30.55,75.35,47.82,2011,Very High,Very High,Medium,Medium
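
The rows shown above appear consistent with the index being built as a product of its components: WRI matches Exposure × Vulnerability / 100, and Vulnerability matches the mean of Susceptibility, Lack of Coping Capabilities, and Lack of Adaptive Capacities. The check below uses four rows copied from that output and a rounding tolerance of 0.02. Whether the relationship holds for every row and year of the full dataset is an assumption that should be verified on `df` itself.

```python
import numpy as np
import pandas as pd

# Four rows copied from the df.head(10) output above.
rows = pd.DataFrame({
    "Region": ["Vanuatu", "Tonga", "Philippinen", "Costa Rica"],
    "WRI": [32.00, 29.08, 24.32, 16.74],
    "Exposure": [56.33, 56.04, 45.09, 42.39],
    "Vulnerability": [56.81, 51.90, 53.93, 39.50],
    "Susceptibility": [37.14, 28.94, 34.99, 21.96],
    "Lack of Coping Capabilities": [79.34, 81.80, 82.78, 63.39],
    "Lack of Adaptive Capacities": [53.96, 44.97, 44.01, 33.14],
})

# Rebuild WRI as the product of two percentages.
wri_rebuilt = rows["Exposure"] * rows["Vulnerability"] / 100

# Rebuild Vulnerability as the mean of its three sub-scores.
vulnerability_parts = [
    "Susceptibility",
    "Lack of Coping Capabilities",
    "Lack of Adaptive Capacities",
]
vulnerability_rebuilt = rows[vulnerability_parts].mean(axis=1)

# Allow for the two-decimal rounding of the published values.
wri_matches = np.allclose(wri_rebuilt, rows["WRI"], atol=0.02, rtol=0)
vulnerability_matches = np.allclose(
    vulnerability_rebuilt, rows["Vulnerability"], atol=0.02, rtol=0
)
print("WRI = Exposure * Vulnerability / 100:", wri_matches)
print("Vulnerability = mean of sub-scores:", vulnerability_matches)
```

If this structure holds across the dataset, a model that receives both Exposure and Vulnerability can nearly reconstruct WRI, which is worth keeping in mind when interpreting a very high regression fit.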


In [4]:
# Explore and understand the dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1917 entries, 0 to 1916
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Region                        1917 non-null   object 
 1   WRI                           1917 non-null   float64
 2   Exposure                      1917 non-null   float64
 3   Vulnerability                 1917 non-null   float64
 4   Susceptibility                1917 non-null   float64
 5   Lack of Coping Capabilities   1917 non-null   float64
 6    Lack of Adaptive Capacities  1916 non-null   float64
 7   Year                          1917 non-null   int64  
 8   Exposure Category             1917 non-null   object 
 9   WRI Category                  1916 non-null   object 
 10  Vulnerability Category        1913 non-null   object 
 11  Susceptibility Category       1917 non-null   object 
dtypes: float64(6), int64(1), object(5)
memory usage: 179.8+ KB
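
In the `df.info()` output above, the column ` Lack of Adaptive Capacities` appears to carry a leading space in its name, which would make `df["Lack of Adaptive Capacities"]` raise a `KeyError`. A minimal sketch of one way to normalize the names, shown on a small stand-in frame (the frame `raw` and its two rows are illustrative, with values taken from the first rows of the dataset):

```python
import pandas as pd

# Stand-in frame reproducing the suspected leading space in one column name.
raw = pd.DataFrame({
    "Lack of Coping Capabilities": [79.34, 81.80],
    " Lack of Adaptive Capacities": [53.96, 44.97],  # note the leading space
})

# Strip surrounding whitespace from every column label.
cleaned = raw.rename(columns=str.strip)

print(list(raw.columns))
print(list(cleaned.columns))
```

Applied to the real data, the same call would be `df = df.rename(columns=str.strip)` right after loading the CSV.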


In [5]:
# Statistical description of the dataset
df.describe()

Unnamed: 0,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year
count,1917.0,1917.0,1917.0,1917.0,1917.0,1916.0,1917.0
mean,7.551763,15.388336,48.075759,30.739431,70.438289,43.090511,2016.049557
std,5.553257,10.240135,13.835666,15.66703,15.038854,13.551156,3.182045
min,0.02,0.05,14.31,8.26,31.59,11.16,2011.0
25%,3.74,10.16,37.04,17.79,59.33,33.1925,2013.0
50%,6.52,12.76,47.1,25.4,74.23,43.08,2016.0
75%,9.4,16.45,60.06,42.64,83.0,53.065,2019.0
max,56.71,99.88,76.47,70.83,94.36,76.11,2021.0


In [6]:
# Check for null values in each column
df.isnull().sum()

Region                          0
WRI                             0
Exposure                        0
Vulnerability                   0
Susceptibility                  0
Lack of Coping Capabilities     0
 Lack of Adaptive Capacities    1
Year                            0
Exposure Category               0
WRI Category                    1
Vulnerability Category          4
Susceptibility Category         0
dtype: int64
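
The counts above show only a handful of missing entries (one numeric value and five category labels out of 1917 rows), so either dropping the affected rows or filling the numeric gap is a plausible choice. The sketch below shows both options on a toy frame; the position of the missing values in `toy` is made up for illustration, and which option suits the real data is a judgment call rather than something the source prescribes.

```python
import numpy as np
import pandas as pd

# Toy frame: numeric values come from the dataset's first rows,
# but the placement of the gaps is hypothetical.
toy = pd.DataFrame({
    "WRI": [32.00, 29.08, 24.32, 20.88],
    "Lack of Adaptive Capacities": [53.96, np.nan, 44.01, 49.87],
    "Vulnerability Category": ["High", "Medium", None, "High"],
})

# Option A: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option B: fill the numeric gap with the column median and leave
# the category labels untouched (they are not model inputs here).
numeric_col = "Lack of Adaptive Capacities"
filled = toy.copy()
filled[numeric_col] = filled[numeric_col].fillna(filled[numeric_col].median())

print("Rows after dropping:", len(dropped))
print("Remaining numeric gaps after filling:", int(filled[numeric_col].isna().sum()))
```

Dropping is the simpler option when so few rows are affected; median filling keeps every region-year in the training data at the cost of one imputed value.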