
# **Exploratory Time Series Analysis of Global Disasters (2000–2025)**

# **Introduction**

Flood disasters represent one of the most significant natural hazards worldwide, causing extensive loss of life, displacement of populations, and substantial economic damage. Globally, floods account for a large proportion of disaster-related losses, particularly in low- and middle-income countries where vulnerability and exposure are high (United Nations Office for Disaster Risk Reduction [UNDRR], 2022). In recent decades, the increasing frequency and severity of flood events have been closely linked to climate change, rapid urbanization, and environmental degradation, which intensify extreme precipitation and river overflow events (Intergovernmental Panel on Climate Change [IPCC], 2023).

This study aims to conduct an exploratory time series analysis of global flood disasters from 2000 to 2025 using data obtained from the EM-DAT International Disaster Database. EM-DAT is a widely used and authoritative global disaster database that systematically records the occurrence and impacts of natural disasters worldwide (Centre for Research on the Epidemiology of Disasters [CRED], 2024). The analysis focuses on identifying temporal trends, variations, and relationships between key flood impact indicators, including mortality, affected population, and economic damage. The findings are intended to support evidence-based disaster risk management and climate adaptation planning for policymakers and relevant stakeholders.

Dataset Source (RAW DATA):
https://www.emdat.be/

# **Problem Statement**

Despite growing awareness of flood risks, many regions continue to experience severe flood impacts with substantial human and economic losses. Previous studies have shown that inadequate preparedness, rapid urban expansion in flood-prone areas, and climate-induced extreme weather events significantly increase flood vulnerability (World Bank, 2021). Understanding long-term flood trends and their impacts is therefore essential for improving disaster preparedness, mitigation strategies, and resilience planning.

This study addresses the following research questions:

1. What are the temporal trends in flood occurrence from 2000 to 2025?

2. How have flood-related deaths and affected populations changed over time?

3. Which countries experience the highest frequency of flood disasters?

4. What relationships exist between flood-related deaths, affected populations, and economic damage?

5. How can these findings support disaster risk reduction and climate adaptation planning, as emphasized in global disaster risk frameworks (UNDRR, 2022)?

In [1]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")


In [4]:
# Upload Dataset

from google.colab import files

uploaded = files.upload()


Saving Asia_public_emdat_custom_request_2025-12-31_8e655a9a-5b34-45f3-addd-7731d263abd4.xlsx to Asia_public_emdat_custom_request_2025-12-31_8e655a9a-5b34-45f3-addd-7731d263abd4.xlsx


In [5]:
# Read Dataset

filename = list(uploaded.keys())[0]

# Read Excel file
df = pd.read_excel(filename)

df.head()


Unnamed: 0,DisNo.,Historic,Classification Key,Disaster Group,Disaster Subgroup,Disaster Type,Disaster Subtype,External IDs,Event Name,ISO,...,"Reconstruction Costs, Adjusted ('000 US$)",Insured Damage ('000 US$),"Insured Damage, Adjusted ('000 US$)",Total Damage ('000 US$),"Total Damage, Adjusted ('000 US$)",CPI,Admin Units,GADM Admin Units,Entry Date,Last Update
0,2010-0562-IDN,No,nat-geo-vol-ash,Natural,Geophysical,Volcanic activity,Ash fall,GLIDE:VO-2010-000214,Mt. Merapi,IDN,...,,,,,,69.513293,"[{""adm2_code"":17985,""adm2_name"":""Sleman""},{""ad...","[{""gid_2"":""IDN.10.14_1"",""migration_date"":""2025...",2014-07-28,2025-12-20
1,2022-0418-IDN,No,nat-hyd-flo-flo,Natural,Hydrological,Flood,Flood (General),,,IDN,...,,,,,,93.294607,,,2022-07-12,2023-09-26
2,2022-0707-PHL,No,nat-met-sto-tro,Natural,Meteorological,Storm,Tropical cyclone,GLIDE:TC-2022-000352,Storm 'Nalgae' (Paeng),PHL,...,,,,45569.0,48844.0,93.294607,,,2022-10-28,2023-09-26
3,2022-0736-IDN,No,nat-hyd-flo-flo,Natural,Hydrological,Flood,Flood (General),,,IDN,...,,,,,,93.294607,,,2022-11-14,2023-09-26
4,2000-0108-IDN,No,nat-bio-epi-vir,Natural,Biological,Epidemic,Viral disease,,Dengue fever,IDN,...,,,,,,54.895152,,,2003-07-01,2023-09-25


In [6]:
# Data understanding

df.info()

df.isnull().sum()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1381 entries, 0 to 1380
Data columns (total 47 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   DisNo.                                     1381 non-null   object 
 1   Historic                                   1381 non-null   object 
 2   Classification Key                         1381 non-null   object 
 3   Disaster Group                             1381 non-null   object 
 4   Disaster Subgroup                          1381 non-null   object 
 5   Disaster Type                              1381 non-null   object 
 6   Disaster Subtype                           1381 non-null   object 
 7   External IDs                               550 non-null    object 
 8   Event Name                                 425 non-null    object 
 9   ISO                                        1381 non-null   object 
 10  Country                 

Unnamed: 0,0
DisNo.,0
Historic,0
Classification Key,0
Disaster Group,0
Disaster Subgroup,0
Disaster Type,0
Disaster Subtype,0
External IDs,831
Event Name,956
ISO,0


In [12]:
# Data Cleaning

# Reload the dataset to ensure original columns are available
df = pd.read_excel(filename)

# Select relevant columns
df = df[
    ['Start Year', 'Country', 'Total Deaths', 'Total Affected', "Total Damage ('000 US$)"]
].copy() # Use .copy() to ensure we are working on a new DataFrame

# Rename columns
df.columns = ['Year', 'Country', 'Deaths', 'Affected', 'Damage']

# Convert data type
df['Year'] = df['Year'].astype(int)

# Handle missing values
df.loc[:, ['Deaths', 'Affected', 'Damage']] = df[['Deaths', 'Affected', 'Damage']].fillna(0)

# Filter study period
df = df[(df['Year'] >= 2000) & (df['Year'] <= 2025)]

df.head()

Unnamed: 0,Year,Country,Deaths,Affected,Damage
0,2010,Indonesia,322.0,137140.0,0.0
1,2022,Indonesia,7.0,3126.0,0.0
2,2022,Philippines,158.0,3323291.0,45569.0
3,2022,Indonesia,3.0,7000.0,0.0
4,2000,Indonesia,10.0,1516.0,0.0


# **Data Cleaning Explanation**

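The raw EM-DAT extract contains missing values, which is typical of large-scale
disaster databases. Cleaning involved selecting the relevant variables, renaming
the columns, converting `Year` to an integer, replacing missing values with zeros,
and filtering to the study period (2000–2025). Two caveats apply. The cell does not
filter on `Disaster Type`, so the cleaned table still contains every disaster type
in the extract (its first row is the 2010 Mt. Merapi eruption, not a flood).
Zero-filling also treats unreported deaths, affected counts, and damage as zero,
which pulls averages down and can distort correlations.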
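
A flood-only table needs one extra step before the column selection. The sketch below is one possible way to do it, assuming the `Disaster Type` column uses the label `Flood` as in the preview of the raw data. The helper name `clean_floods` and the toy rows are hypothetical and are not EM-DAT values.

```python
import pandas as pd

# Raw EM-DAT column names mapped to the short names used in this notebook.
COLUMNS = {
    "Start Year": "Year",
    "Country": "Country",
    "Total Deaths": "Deaths",
    "Total Affected": "Affected",
    "Total Damage ('000 US$)": "Damage",
}


def clean_floods(raw: pd.DataFrame, start: int = 2000, end: int = 2025) -> pd.DataFrame:
    """Keep flood records only, then repeat the cleaning steps of the cell above."""
    floods = raw[raw["Disaster Type"] == "Flood"]
    out = floods[list(COLUMNS)].rename(columns=COLUMNS).copy()
    out["Year"] = out["Year"].astype(int)
    impact = ["Deaths", "Affected", "Damage"]
    # Same zero-fill as the notebook; unreported values become 0.
    out[impact] = out[impact].fillna(0)
    return out[out["Year"].between(start, end)].reset_index(drop=True)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Disaster Type": ["Flood", "Storm", "Flood", "Flood"],
    "Start Year": [2001, 2005, 1999, 2020],
    "Country": ["A", "B", "A", "C"],
    "Total Deaths": [5.0, 12.0, 1.0, None],
    "Total Affected": [100.0, None, 50.0, 3000.0],
    "Total Damage ('000 US$)": [None, 10.0, None, 250.0],
})
floods = clean_floods(toy)
print(floods)
```

On the real data the call would be `clean_floods(pd.read_excel(filename))`. The storm row and the 1999 flood are dropped, leaving two rows.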


# **Findings and Discussion**

The time series analysis reveals an overall increasing trend in the number of recorded flood events, particularly after 2010. This pattern is consistent with global assessments that report an increase in hydrometeorological disasters due to climate change and increased exposure of populations and assets (IPCC, 2023; UNDRR, 2022). The observed rise in flood frequency highlights the growing impact of extreme rainfall events and changing climate patterns.
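
The cell behind this trend is not shown in the notebook. A minimal sketch of how annual event counts and a simple linear trend could be computed is given here, assuming a cleaned table with a `Year` column and one row per event. The function names and the toy data are hypothetical, and a least-squares slope is only one of several ways to summarise a trend.

```python
import numpy as np
import pandas as pd


def events_per_year(df: pd.DataFrame, start: int = 2000, end: int = 2025) -> pd.Series:
    """Count records per year, with explicit zeros for years without records."""
    counts = df.groupby("Year").size()
    full_range = pd.RangeIndex(start, end + 1, name="Year")
    return counts.reindex(full_range, fill_value=0).rename("Events")


def linear_slope(series: pd.Series) -> float:
    """Least-squares slope of the series against its index (events per year)."""
    x = series.index.to_numpy(dtype=float)
    x = x - x.mean()  # centring keeps the fit well conditioned; the slope is unchanged
    slope, _intercept = np.polyfit(x, series.to_numpy(dtype=float), 1)
    return float(slope)


# Made-up event years for illustration only.
toy = pd.DataFrame({"Year": [2000, 2001, 2001, 2003, 2003, 2003]})
counts = events_per_year(toy, 2000, 2003)
print(counts)
print("slope:", linear_slope(counts))
```

Reindexing over the full year range matters because a year with no recorded events would otherwise be missing from the series rather than counted as zero.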

Although flood-related deaths fluctuate annually, several years exhibit sharp spikes in mortality, indicating the occurrence of extreme flood events with severe consequences. Similar patterns have been reported in global disaster studies, which note that a small number of high-impact flood events often account for a large proportion of total disaster-related fatalities (World Bank, 2021).
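
Annual totals and spike years of this kind could be derived as sketched below, assuming the cleaned columns `Year`, `Deaths`, and `Affected`. The spike rule (a year exceeding three times the median year) is a hypothetical threshold chosen for illustration and is not taken from the notebook. The toy numbers are made up.

```python
import pandas as pd


def yearly_impacts(df: pd.DataFrame) -> pd.DataFrame:
    """Sum deaths and affected population per year."""
    return df.groupby("Year")[["Deaths", "Affected"]].sum()


def spike_years(series: pd.Series, factor: float = 3.0) -> list:
    """Years whose total exceeds `factor` times the median yearly total."""
    threshold = factor * series.median()
    return series[series > threshold].index.tolist()


# Made-up records for illustration only.
toy = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002, 2003, 2004],
    "Deaths": [10.0, 5.0, 20.0, 400.0, 12.0, 18.0],
    "Affected": [1000.0, 500.0, 800.0, 90000.0, 700.0, 1200.0],
})
totals = yearly_impacts(toy)
print(totals)
print("death spikes:", spike_years(totals["Deaths"]))
```

The median is used as the baseline because a few extreme years would inflate a mean-based threshold and hide the very spikes being looked for.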

Furthermore, countries experiencing frequent flood events tend to face recurring socio-economic losses, reflecting structural vulnerabilities such as high population density, insufficient drainage infrastructure, and limited disaster preparedness capacity. The correlation analysis demonstrates positive relationships between flood-related deaths, affected populations, and economic damage, suggesting that severe flood events often result in compounded human and economic impacts. These findings align with previous research emphasizing the need for integrated flood risk management and early warning systems (UNDRR, 2022).
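
The country ranking and the correlation analysis are not shown as code in the notebook. One way they could be reproduced is sketched here, assuming the cleaned columns `Country`, `Deaths`, `Affected`, and `Damage`. The function names and toy values are hypothetical. The rank-based option is an addition for comparison: impact figures are heavily skewed, so a Pearson coefficient can be dominated by a few extreme events.

```python
import pandas as pd

IMPACT = ["Deaths", "Affected", "Damage"]


def top_countries(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Number of recorded events per country, most frequent first."""
    return df["Country"].value_counts().head(n)


def impact_correlations(df: pd.DataFrame, use_ranks: bool = False) -> pd.DataFrame:
    """Pearson correlation matrix of the impact columns.

    With use_ranks=True the values are replaced by their ranks first,
    which yields the Spearman rank correlation.
    """
    values = df[IMPACT].rank() if use_ranks else df[IMPACT]
    return values.corr()


# Made-up records for illustration only.
toy = pd.DataFrame({
    "Country": ["A", "A", "B", "A", "C", "B"],
    "Deaths": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "Affected": [10.0, 20.0, 400.0, 500.0, 9000.0, 10000.0],
    "Damage": [0.0, 0.0, 5.0, 100.0, 7.0, 9.0],
})
top = top_countries(toy, n=2)
pearson = impact_correlations(toy)
rank_corr = impact_correlations(toy, use_ranks=True)
print(top)
print(pearson.round(2))
print(rank_corr.round(2))
```

In the toy data `Affected` rises monotonically with `Deaths` but not linearly, so the rank correlation is exactly 1 while the Pearson value is lower. A positive coefficient in either matrix describes association only and does not show that one impact causes another.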


# **Conclusion**

This study provides a comprehensive exploratory time series analysis of global flood disasters from 2000 to 2025 using EM-DAT data. The findings indicate an increasing frequency of flood events and substantial variability in their human and economic impacts over time. These results reinforce existing evidence that climate change and increased exposure are key drivers of rising flood risks worldwide (IPCC, 2023; World Bank, 2021).

The study underscores the importance of data-driven decision-making in disaster risk management and climate adaptation planning, particularly for countries that are highly vulnerable to flooding. Future research may extend this analysis by incorporating climate variables, socio-economic indicators, or predictive modeling approaches to further enhance flood risk assessment and preparedness strategies, as recommended by global disaster risk reduction frameworks (UNDRR, 2022).

# **References**

Centre for Research on the Epidemiology of Disasters. (2024). *EM-DAT: The International Disaster Database*. https://www.emdat.be/

Intergovernmental Panel on Climate Change. (2023). *Sixth assessment report: Impacts, adaptation and vulnerability*. IPCC.

United Nations Office for Disaster Risk Reduction. (2022). *Global assessment report on disaster risk reduction*. UNDRR.

World Bank. (2021). *Climate change and disaster risk management*. World Bank Publications.