# World Disaster Risk Analysis 

In [47]:
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

import math as mt
import pandas as pd
from scipy.stats import norm

## Import Data

In [48]:
wri = pd.read_csv('Data/world_risk_index.csv')
wri.head()

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
0,Vanuatu,32.0,56.33,56.81,37.14,79.34,53.96,2011,Very High,Very High,High,High
1,Tonga,29.08,56.04,51.9,28.94,81.8,44.97,2011,Very High,Very High,Medium,Medium
2,Philippinen,24.32,45.09,53.93,34.99,82.78,44.01,2011,Very High,Very High,High,High
3,Salomonen,23.51,36.4,64.6,44.11,85.95,63.74,2011,Very High,Very High,Very High,High
4,Guatemala,20.88,38.42,54.35,35.36,77.83,49.87,2011,Very High,Very High,High,High


### Background: 

The WorldRiskIndex is a statistical model that provides an assessment of the latent risk of 193 countries falling victim to a humanitarian disaster caused by extreme natural events and the negative impacts of climate change. Based on peer-reviewed concepts of risk, hazard and vulnerability, it is assumed that disaster risks are not solely shaped by the occurrence, intensity, and duration of extreme natural events, but that social factors, political conditions, and economic structures are equally responsible for whether disasters occur in the context of extreme natural events. Accordingly, both main spheres of disaster risk, exposure and vulnerability, are treated as equals.

The WorldRiskIndex was initially developed in 2011 by the United Nations University Institute for Environment and Human Security (UNU-EHS) for Bündnis Entwicklung Hilft as a model with 27 indicators to analytically link and relate the two spheres of disaster risks – exposure to natural hazards such as earthquakes, storms or droughts, and societal capacities to respond to these kinds of events. The methodology of the WorldRiskIndex has been continuously revised and developed by the Institute for International Law of Peace and Armed Conflict (IFHV) since 2018. In 2022, a new, fully revised model of the WorldRiskIndex was published, enabling more accurate analyses by incorporating more than 100 high-quality indicators, new data sources, and more robust statistical methods, thus finally replacing the previously used model.

source: https://data.humdata.org/dataset/worldriskindex

### Description of the data: 

* Region: Name of the region.
* WRI: World Risk Score of the region.
* Exposure: Risk/exposure to natural hazards such as earthquakes, hurricanes, floods, droughts, and sea level rise.
* Vulnerability: Vulnerability depending on infrastructure, nutrition, housing situation, and economic framework conditions.
* Susceptibility: Susceptibility depending on infrastructure, nutrition, housing situation, and economic framework conditions.
* Lack of Coping Capabilities: Lack of coping capacities, depending on governance, preparedness and early warning, medical care, and social and material security.
* Lack of Adaptive Capacities: Lack of adaptive capacities related to upcoming natural events, climate change, and other challenges.
* Year: Year the data describes.
* WRI Category: WRI Category for the given WRI Score.
* Exposure Category: Exposure Category for the given Exposure Score.
* Vulnerability Category: Vulnerability Category for the given Vulnerability Score.
* Susceptibility Category: Susceptibility Category for the given Susceptibility Score.

source:  https://www.kaggle.com/datasets/tr1gg3rtrash/global-disaster-risk-index-time-series-dataset/data

### Observations:

The data collected spans from 2011 to 2021. Based on the description, the two primary spheres of disaster risk are exposure and vulnerability, which are treated as equally important.

* __Exposure__ is influenced by the occurrence, intensity, and duration of extreme natural events.
* __Vulnerability__ is shaped by social factors, political conditions, and economic structures.

It will be interesting to explore the WRI of countries, with a focus on its relationship to exposure and vulnerability.
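
As a quick sanity check on that relationship, the five rows printed by `wri.head()` are consistent with the WRI being the product of exposure and vulnerability, divided by 100. The sketch below treats that formula as an assumption and tests it only on those five rows, which are copied by hand into a small frame named `head` (a hypothetical name); it is not a statement of the official methodology.

```python
import numpy as np
import pandas as pd

# First five rows, copied from the wri.head() output above
head = pd.DataFrame({
    'Region': ['Vanuatu', 'Tonga', 'Philippinen', 'Salomonen', 'Guatemala'],
    'WRI': [32.0, 29.08, 24.32, 23.51, 20.88],
    'Exposure': [56.33, 56.04, 45.09, 36.4, 38.42],
    'Vulnerability': [56.81, 51.9, 53.93, 64.6, 54.35],
})

# Assumed relationship: WRI = Exposure * Vulnerability / 100
head['WRI_check'] = head['Exposure'] * head['Vulnerability'] / 100

# Compare with the published score, allowing for two-decimal rounding
matches = np.isclose(head['WRI'], head['WRI_check'], atol=0.01)
print(head[['Region', 'WRI', 'WRI_check']].round(2))
print(matches.all())
```

If the same check is run on the full `wri` frame, rows where it fails are worth inspecting before any modelling.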


## Data Cleaning

In [49]:
wri.isnull().sum()

Region                          0
WRI                             0
Exposure                        0
Vulnerability                   0
Susceptibility                  0
Lack of Coping Capabilities     0
 Lack of Adaptive Capacities    1
Year                            0
Exposure Category               0
WRI Category                    1
Vulnerability Category          4
Susceptibility Category         0
dtype: int64

In [50]:
wri[wri.duplicated()]

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category


In [51]:
wri.rename(columns={' Lack of Adaptive Capacities': 'Lack of Adaptive Capacities'}, inplace=True)
wri.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1917 entries, 0 to 1916
Data columns (total 12 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Region                       1917 non-null   object 
 1   WRI                          1917 non-null   float64
 2   Exposure                     1917 non-null   float64
 3   Vulnerability                1917 non-null   float64
 4   Susceptibility               1917 non-null   float64
 5   Lack of Coping Capabilities  1917 non-null   float64
 6   Lack of Adaptive Capacities  1916 non-null   float64
 7   Year                         1917 non-null   int64  
 8   Exposure Category            1917 non-null   object 
 9   WRI Category                 1916 non-null   object 
 10  Vulnerability Category       1913 non-null   object 
 11  Susceptibility Category      1917 non-null   object 
dtypes: float64(6), int64(1), object(5)
memory usage: 179.8+ KB
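
The rename above fixes one column by name. A more general option, sketched here on a toy frame, is to strip whitespace from every column label at once. The `'Year '` label with a trailing space is a hypothetical example and does not occur in this dataset.

```python
import pandas as pd

# Toy frame: one label copied from the dataset, one hypothetical ('Year ')
toy = pd.DataFrame(columns=['Region', ' Lack of Adaptive Capacities', 'Year '])

# Strip leading and trailing whitespace from all column labels
toy.columns = toy.columns.str.strip()
print(list(toy.columns))
```

Applied to `wri` right after `pd.read_csv`, this would make the explicit rename unnecessary.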


In [52]:
wri.isnull().sum()

Region                         0
WRI                            0
Exposure                       0
Vulnerability                  0
Susceptibility                 0
Lack of Coping Capabilities    0
Lack of Adaptive Capacities    1
Year                           0
Exposure Category              0
WRI Category                   1
Vulnerability Category         4
Susceptibility Category        0
dtype: int64

To handle the missing "Category" values, we will assign each one based on the minimum and maximum scores observed for every category. For the missing quantitative value, we will substitute the mean of that column for the same year.

In [53]:
#show the location of the nulls
rows_with_nulls = wri[wri.isnull().any(axis=1)]
rows_with_nulls

Unnamed: 0,Region,WRI,Exposure,Vulnerability,Susceptibility,Lack of Coping Capabilities,Lack of Adaptive Capacities,Year,Exposure Category,WRI Category,Vulnerability Category,Susceptibility Category
1193,Österreich,2.87,13.18,21.75,13.63,39.27,12.34,2019,Medium,Very Low,,Very Low
1202,Deutschland,2.43,11.51,21.11,14.3,36.44,12.6,2019,Low,Very Low,,Very Low
1205,Norwegen,2.34,10.6,22.06,13.29,39.21,13.68,2019,Low,Very Low,,Very Low
1292,Föd. Staaten v. Mikronesien,7.59,14.95,50.77,31.79,72.13,48.39,2020,High,,High,High
1858,Korea Republic of 4.59,14.89,30.82,14.31,46.55,31.59,,2016,Very High,Very High,,High
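
Row 1858 differs from the others: its Region value ends in a number ("Korea Republic of 4.59"), which suggests the row may be misaligned in the source file. A minimal sketch for flagging such rows, assuming that genuine region names never contain digits, is shown below on a toy frame.

```python
import pandas as pd

# Toy frame with region names copied from the output above
toy = pd.DataFrame({
    'Region': ['Österreich', 'Deutschland', 'Korea Republic of 4.59'],
})

# Flag rows whose region name contains a digit (assumption: real names do not)
suspect = toy[toy['Region'].str.contains(r'\d', regex=True)]
print(suspect)
```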


In [54]:
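#Calculate Min and Max Values for Each Category
data_2019 = wri[wri['Year'] == 2019] #year of the rows being filled (not used below)
data_2020 = wri[wri['Year'] == 2020] #defined here so the groupby runs on a fresh kernel
#the ranges are computed from the 2020 subset, which is what the output below shows
category_stats = data_2020.groupby('Vulnerability Category')['Vulnerability'].agg(['min', 'max']).reset_index() #to dataframe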
category_stats = category_stats.sort_values(by='min')
category_stats

Unnamed: 0,Vulnerability Category,min,max
4,Very Low,22.81,33.9
1,Low,34.14,42.1
2,Medium,42.39,48.04
0,High,48.13,60.6
3,Very High,61.5,76.34


In [55]:
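#Fill in the missing values
#Vulnerability scores 21.75, 21.11 and 22.06 lie below the 'Very Low' minimum (22.81), so 'Very Low' is the nearest category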
wri.loc[1193,'Vulnerability Category'] = 'Very Low'
wri.loc[1202,'Vulnerability Category'] = 'Very Low'
wri.loc[1205,'Vulnerability Category'] = 'Very Low'

In [56]:
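#Calculate Min and Max Values for Each Category
data_2016 = wri[wri['Year'] == 2016] #reused below for the mean of 'Lack of Adaptive Capacities'
#the ranges are again computed from the 2020 subset, so this table repeats the previous one
category_stats = data_2020.groupby('Vulnerability Category')['Vulnerability'].agg(['min', 'max']).reset_index() #to dataframe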
category_stats = category_stats.sort_values(by='min')
category_stats

Unnamed: 0,Vulnerability Category,min,max
4,Very Low,22.81,33.9
1,Low,34.14,42.1
2,Medium,42.39,48.04
0,High,48.13,60.6
3,Very High,61.5,76.34


In [57]:
#Row 1858: the parsed Vulnerability score (14.31) is below the 'Very Low' minimum (22.81)
wri.loc[1858,'Vulnerability Category'] = 'Very Low'

In [58]:
#Calculate Min and Max Values for Each Category
data_2020 = wri[wri['Year'] == 2020]
category_stats = data_2020.groupby('WRI Category')['WRI'].agg(['min', 'max']).reset_index() #to dataframe
category_stats = category_stats.sort_values(by='min')
category_stats

Unnamed: 0,WRI Category,min,max
4,Very Low,0.31,3.14
1,Low,3.3,5.66
2,Medium,5.68,7.57
0,High,7.71,10.51
3,Very High,10.76,49.74


In [59]:
#Fill in the missing value
#WRI 7.59 lies between the Medium maximum (7.57) and the High minimum (7.71); Medium is nearer
wri.loc[1292,'WRI Category'] = 'Medium'
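
The manual fills above pick the category whose observed range is closest to the score. As a sketch, that rule could be written as a small helper. `nearest_category` and `wri_ranges` are hypothetical names introduced for illustration, and the ranges are copied by hand from the 2020 WRI table shown above.

```python
import pandas as pd

# Category ranges copied from the 2020 WRI table shown above
wri_ranges = pd.DataFrame({
    'WRI Category': ['Very Low', 'Low', 'Medium', 'High', 'Very High'],
    'min': [0.31, 3.30, 5.68, 7.71, 10.76],
    'max': [3.14, 5.66, 7.57, 10.51, 49.74],
})

def nearest_category(value, ranges, label_col):
    """Return the label whose [min, max] range is closest to value."""
    # Distance is 0 inside a range, otherwise the gap to the nearer bound
    below = (ranges['min'] - value).clip(lower=0)
    above = (value - ranges['max']).clip(lower=0)
    distance = below + above
    return ranges.loc[distance.idxmin(), label_col]

# Row 1292 has a WRI of 7.59, which falls in the gap between Medium and High
print(nearest_category(7.59, wri_ranges, 'WRI Category'))
```

For 7.59 the helper returns Medium, the same label assigned by hand. Nearest-range assignment is only one possible rule; the source's actual bin edges are not known from this data.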

In [60]:
#Replace the missing value with the 2016 mean (mean() skips NaN by default)
mean_lack_adaptive = data_2016['Lack of Adaptive Capacities'].mean()
wri.loc[1858,'Lack of Adaptive Capacities'] = mean_lack_adaptive
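
The same per-year mean imputation can be expressed without hard-coding the row index. The sketch below uses a toy frame with made-up numbers (an assumption for illustration, not values from the dataset) and fills each missing entry with the mean of its own year.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in 2016 (numbers are illustrative only)
toy = pd.DataFrame({
    'Year': [2016, 2016, 2016, 2017, 2017],
    'Lack of Adaptive Capacities': [30.0, np.nan, 50.0, 20.0, 40.0],
})
col = 'Lack of Adaptive Capacities'

# Mean of each year, broadcast back to the original rows (NaN is ignored)
year_mean = toy.groupby('Year')[col].transform('mean')

# Fill only the missing entries with the mean of their own year
toy[col] = toy[col].fillna(year_mean)
print(toy)
```

This scales to any number of missing values, at the cost of the same potential bias as the single-row fill.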

In [61]:
wri.isnull().sum()

Region                         0
WRI                            0
Exposure                       0
Vulnerability                  0
Susceptibility                 0
Lack of Coping Capabilities    0
Lack of Adaptive Capacities    0
Year                           0
Exposure Category              0
WRI Category                   0
Vulnerability Category         0
Susceptibility Category        0
dtype: int64

We have cleaned our dataset, making sure there are no null or duplicate values. 

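__Note:__ Some rows with a missing category have scores outside the min-max range observed for any category (for example, a WRI of 7.59 lies between the Medium maximum of 7.57 and the High minimum of 7.71), suggesting they may have been treated as outliers; we assigned the nearest category in those cases. Row 1858 also appears to be misaligned in the source file: its Region reads "Korea Republic of 4.59", so its numeric values may be shifted by one column. Furthermore, filling the missing "Lack of Adaptive Capacities" value with the mean could introduce bias. Unless a later model requires a complete dataset, it may be more appropriate to drop the rows with missing values to avoid distorting the analysis.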
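
The alternative mentioned in the note, dropping incomplete rows instead of filling them, can be sketched as follows. The toy frame uses placeholder region names and partly made-up values (assumptions for illustration). In the real dataset this would remove the five affected rows out of 1917, provided it is run before any values are filled.

```python
import numpy as np
import pandas as pd

# Toy frame with two incomplete rows (placeholder regions, illustrative values)
toy = pd.DataFrame({
    'Region': ['A', 'B', 'C', 'D'],
    'WRI': [2.87, 7.59, 4.59, 10.0],
    'Lack of Adaptive Capacities': [12.34, 48.39, np.nan, 50.0],
    'WRI Category': ['Very Low', None, 'Low', 'High'],
})

# Keep only rows with no missing value in any column
complete = toy.dropna()
print(len(toy) - len(complete), 'rows dropped')
```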

## Exploratory Data Analysis