# Climate Risk and Disaster Management Analysis
## Data Analysis Project

This notebook analyzes natural disaster data to understand the patterns and impacts of climate-related disasters worldwide. We'll explore how often natural disasters occur and how they affect different regions.

## 1. Import Required Libraries

First, we'll import all the necessary Python libraries for our analysis:

In [2]:
# Check whether the packages are installed; install them if any import fails
try:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    print("All required packages are already installed!")
except ImportError:
    print("Installing required packages...")
    import subprocess
    import sys
    # pip.main() was removed in pip 10; call pip through the running interpreter instead
    subprocess.check_call([sys.executable, '-m', 'pip', 'install',
                           'pandas', 'numpy', 'matplotlib', 'seaborn'])
    print("Packages installed successfully!")

All required packages are already installed!


In [5]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot styling: reset matplotlib to its defaults, then apply the seaborn theme
plt.style.use('default')
sns.set_theme()

# Display all columns in dataframe
pd.set_option('display.max_columns', None)

## 2. Load the Dataset

We'll be using the Natural Disasters Dataset located in the DISASTERS folder. We have two datasets available:
- natural_disasters.csv
- data2.csv

Let's start by analyzing the main dataset (natural_disasters.csv).
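
The text mentions a DISASTERS folder, while the loading cell expects the file next to the notebook. A minimal sketch of a lookup that tolerates either layout is shown here; `find_dataset` and its `search_dirs` default are hypothetical helpers, not part of the original notebook, and the demo runs in a throwaway directory with a made-up two-line file.

```python
import tempfile
from pathlib import Path


def find_dataset(filename, search_dirs=(".", "DISASTERS")):
    """Return the first existing path for filename, or None if it is not found."""
    for folder in search_dirs:
        candidate = Path(folder) / filename
        if candidate.is_file():
            return candidate
    return None


# Demo in a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "DISASTERS"
    data_dir.mkdir()
    # Placeholder content only; this is not the real dataset.
    (data_dir / "natural_disasters.csv").write_text("Year,Seq\n1900,9002\n")

    found = find_dataset("natural_disasters.csv", search_dirs=(tmp, data_dir))
    missing = find_dataset("not_there.csv", search_dirs=(tmp, data_dir))
    found_name = found.name if found else None
    found_folder = found.parent.name if found else None

print(found_name, found_folder, missing)
```

In the notebook, the returned path could be passed straight to `pd.read_csv`, with a clear error raised when the helper returns `None`.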

In [None]:
# Set the file path to the dataset
file_path = 'natural_disasters.csv'  # Dataset should be in the same folder as notebook

# Load the dataset
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
print("First few rows of the dataset:")
display(df.head())

First few rows of the dataset:


Unnamed: 0,Year,Seq,Glide,Disaster Group,Disaster Subgroup,Disaster Type,Disaster Subtype,Disaster Subsubtype,Event Name,Country,ISO,Region,Continent,Location,Origin,Associated Dis,Associated Dis2,OFDA Response,Appeal,Declaration,Aid Contribution,Dis Mag Value,Dis Mag Scale,Latitude,Longitude,Local Time,River Basin,Start Year,Start Month,Start Day,End Year,End Month,End Day,Total Deaths,No Injured,No Affected,No Homeless,Total Affected,Insured Damages ('000 US$),Total Damages ('000 US$),CPI,Adm Level,Admin1 Code,Admin2 Code,Geo Locations
0,1900,9002,,Natural,Climatological,Drought,Drought,,,Cabo Verde,CPV,Western Africa,Africa,Countrywide,,Famine,,,No,No,,,Km2,,,,,1900,,,1900,,,11000.0,,,,,,,3.221647,,,,
1,1900,9001,,Natural,Climatological,Drought,Drought,,,India,IND,Southern Asia,Asia,Bengal,,,,,No,No,,,Km2,,,,,1900,,,1900,,,1250000.0,,,,,,,3.221647,,,,
2,1902,12,,Natural,Geophysical,Earthquake,Ground movement,,,Guatemala,GTM,Central America,Americas,"Quezaltenango, San Marcos",,Tsunami/Tidal wave,,,,,,8.0,Richter,14.0,-91.0,20:20,,1902,4.0,18.0,1902,4.0,18.0,2000.0,,,,,,25000.0,3.350513,,,,
3,1902,3,,Natural,Geophysical,Volcanic activity,Ash fall,,Santa Maria,Guatemala,GTM,Central America,Americas,,,,,,,,,,,,,,,1902,4.0,8.0,1902,4.0,8.0,1000.0,,,,,,,3.350513,,,,
4,1902,10,,Natural,Geophysical,Volcanic activity,Ash fall,,Santa Maria,Guatemala,GTM,Central America,Americas,,,,,,,,,,,,,,,1902,10.0,24.0,1902,10.0,24.0,6000.0,,,,,,,3.350513,,,,


## 3. Explore the Dataset with .info()

The `.info()` method reports important details about our dataset:
- Total number of rows (entries)
- Column names
- Data type of each column (int64, float64, object, etc.)
- Number of non-null values in each column
- Memory usage

This helps us understand:
1. How big our dataset is
2. What kind of data is in each column
3. If any columns have missing data
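
As a small illustration of the points above, the sketch below runs `.info()` on a made-up three-row frame and then reads the same facts programmatically. The frame `toy` and its values are assumptions for demonstration only.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical miniature frame; the values are illustrative, not real records.
toy = pd.DataFrame(
    {
        "Year": [1900, 1902, 1902],
        "Country": ["Cabo Verde", "Guatemala", "Guatemala"],
        "Total Deaths": [11000.0, 2000.0, np.nan],
    }
)

# .info() prints to stdout by default; passing buf captures the report as text.
buffer = io.StringIO()
toy.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The same facts are available as regular Python objects.
n_rows, n_cols = toy.shape                 # how big the dataset is
dtypes = toy.dtypes.astype(str).to_dict()  # what kind of data each column holds
non_null = toy.count().to_dict()           # non-null values per column
print(n_rows, n_cols, dtypes, non_null)
```

A column whose non-null count is lower than the row count, like `Total Deaths` here, contains missing data.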

In [10]:
# Display basic information about the dataset
print("\nDataset Information:")
print("-------------------")
df.info()


Dataset Information:
-------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16126 entries, 0 to 16125
Data columns (total 45 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Year                        16126 non-null  int64  
 1   Seq                         16126 non-null  int64  
 2   Glide                       1581 non-null   object 
 3   Disaster Group              16126 non-null  object 
 4   Disaster Subgroup           16126 non-null  object 
 5   Disaster Type               16126 non-null  object 
 6   Disaster Subtype            13016 non-null  object 
 7   Disaster Subsubtype         1077 non-null   object 
 8   Event Name                  3861 non-null   object 
 9   Country                     16126 non-null  object 
 10  ISO                         16126 non-null  object 
 11  Region                      16126 non-null  object 
 12  Continent                   16126 non-null  ob

## 4. Check for Missing Values with .isnull().sum()

The `.isnull().sum()` method chain helps us find missing data:
- Shows each column name
- Counts how many missing values (NaN or None) are in each column
- A value of 0 means no missing data in that column
- Higher numbers indicate missing data that we might need to handle

This is important because:
1. Missing data can affect our analysis
2. We might need to fill or remove missing values
3. It helps us assess data quality
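
The options in point 2 can be sketched on a small made-up frame. The 50% threshold and the median fill are arbitrary choices for illustration, not recommendations from the notebook, and `toy` is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps; values are illustrative only.
toy = pd.DataFrame(
    {
        "Disaster Type": ["Drought", "Earthquake", "Flood", "Storm"],
        "Total Deaths": [11000.0, np.nan, 36.0, np.nan],
        "Glide": [None, None, None, "hypothetical-id"],
    }
)

missing_count = toy.isnull().sum()   # number of missing values per column
missing_share = toy.isnull().mean()  # fraction of rows missing per column

# Option 1: drop columns that are mostly empty (threshold chosen arbitrarily).
mostly_empty = missing_share[missing_share > 0.5].index
trimmed = toy.drop(columns=mostly_empty)

# Option 2: drop rows that lack a value the analysis depends on.
with_deaths = toy.dropna(subset=["Total Deaths"])

# Option 3: fill gaps; the median is only one of several possible choices.
filled = toy["Total Deaths"].fillna(toy["Total Deaths"].median())

print(missing_count)
print(trimmed.columns.tolist(), len(with_deaths), filled.tolist())
```

Which option fits depends on the question being asked: filling death counts with a median, for example, would distort totals, so dropping or flagging may be safer there.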

In [11]:
# Check for missing values
print("Missing Values Count:")
print("--------------------")
print(df.isnull().sum())

Missing Values Count:
--------------------
Year                              0
Seq                               0
Glide                         14545
Disaster Group                    0
Disaster Subgroup                 0
Disaster Type                     0
Disaster Subtype               3110
Disaster Subsubtype           15049
Event Name                    12265
Country                           0
ISO                               0
Region                            0
Continent                         0
Location                       1792
Origin                        12332
Associated Dis                12778
Associated Dis2               15419
OFDA Response                 14432
Appeal                        13557
Declaration                   12870
Aid Contribution              15449
Dis Mag Value                 11180
Dis Mag Scale                  1190
Latitude                      13397
Longitude                     13394
Local Time                    15023
River Basin          

## 5. Basic Statistical Summary with .describe()

The `.describe()` method gives us summary statistics for the numerical columns:
- count: number of non-null values
- mean: average value
- std: standard deviation (spread of data)
- min: minimum value
- 25%: first quartile
- 50%: median (middle value)
- 75%: third quartile
- max: maximum value

This helps us understand:
1. The range of our data
2. If there are unusual values (outliers)
3. How the data is distributed
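
As a rough illustration of points 2 and 3, the sketch below takes the quartiles from `.describe()` and flags unusually large values with the common 1.5 × IQR rule of thumb. That rule is an assumption added here, not something the notebook applies, and the series is made up.

```python
import pandas as pd

# Made-up death counts with one extreme event, to mimic a skewed column.
deaths = pd.Series([6, 20, 63, 12, 35, 300000], name="Total Deaths")

summary = deaths.describe()
q1, median, q3 = summary["25%"], summary["50%"], summary["75%"]
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

# Rule of thumb: values more than 1.5 * IQR above the third quartile are flagged.
upper_fence = q3 + 1.5 * iqr
outliers = deaths[deaths > upper_fence]

# A mean far above the median is a quick sign of a right-skewed distribution.
print(f"mean={summary['mean']:.1f}, median={median:.1f}")
print(outliers.tolist())
```

Flagged values are not necessarily errors; in disaster data a single catastrophic event can legitimately dwarf the rest.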

In [12]:
# Generate descriptive statistics
print("Descriptive Statistics:")
print("----------------------")
display(df.describe())

Descriptive Statistics:
----------------------


Unnamed: 0,Year,Seq,Aid Contribution,Dis Mag Value,Start Year,Start Month,Start Day,End Year,End Month,End Day,Total Deaths,No Injured,No Affected,No Homeless,Total Affected,Insured Damages ('000 US$),Total Damages ('000 US$),CPI
count,16126.0,16126.0,677.0,4946.0,16126.0,15739.0,12498.0,16126.0,15418.0,12570.0,11413.0,3895.0,9220.0,2430.0,11617.0,1096.0,5245.0,15811.0
mean,1996.76479,714.78482,125413.6,47350.38,1996.77837,6.444374,15.233957,1996.835607,6.576728,15.77502,2842.866,2621.102,882361.2,73293.14,716508.8,798651.4,724783.5,63.215103
std,20.159065,1929.635089,2997875.0,309424.2,20.15571,3.393965,8.953821,20.14301,3.352965,8.865486,68605.95,34403.43,8573913.0,523005.8,7718598.0,3057638.0,4723131.0,26.734285
min,1900.0,1.0,1.0,-57.0,1900.0,1.0,1.0,1900.0,1.0,1.0,1.0,1.0,1.0,3.0,1.0,34.0,2.0,3.221647
25%,1989.0,93.0,175.0,7.0,1989.0,4.0,7.0,1989.0,4.0,8.0,6.0,14.0,1244.75,572.5,650.0,50000.0,8300.0,45.692897
50%,2001.0,270.0,721.0,151.5,2001.0,7.0,15.0,2001.0,7.0,16.0,20.0,50.0,10000.0,3000.0,5965.0,172500.0,60000.0,68.415379
75%,2011.0,486.0,3511.0,11296.5,2011.0,9.0,23.0,2011.0,9.0,23.0,63.0,200.0,91823.0,17500.0,58255.0,500000.0,317300.0,84.252733
max,2021.0,9881.0,78000000.0,13025870.0,2021.0,12.0,31.0,2021.0,12.0,31.0,3700000.0,1800000.0,330000000.0,15850000.0,330000000.0,60000000.0,210000000.0,100.0


## Data Exploration Analysis

Based on the exploratory data analysis performed above, we can observe several key aspects of our natural disasters dataset:

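### Dataset Structure
- `.info()` reports 16,126 rows and 45 columns
- `Year` and `Seq` are stored as int64, while text columns such as `Country` and `Region` are stored as object
- Identifier and classification columns (`Year`, `Seq`, `Disaster Group`, `Disaster Type`, `Country`, `ISO`, `Region`, `Continent`) are fully populated

### Data Quality Assessment
- `.isnull().sum()` shows that many descriptive columns are mostly empty: `Glide` is missing in 14,545 rows, `Disaster Subsubtype` in 15,049, and `Aid Contribution` in 15,449
- Coordinates are sparse: `Latitude` is missing in 13,397 rows and `Longitude` in 13,394
- `Disaster Subtype` (3,110 missing) and `Location` (1,792 missing) are comparatively complete

### Statistical Overview
- `.describe()` shows that events span 1900 to 2021, with a median year of 2001
- Impact columns are strongly right-skewed: `Total Deaths` has a median of 20, a mean of about 2,843, and a maximum of 3,700,000
- `Dis Mag Value` has a minimum of -57 and mixes measurement scales (for example Km2 and Richter in `Dis Mag Scale`), so its summary statistics should be read per disaster type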

This initial exploration provides the foundation for further in-depth analysis of climate-related disaster patterns and their impacts.

## Analysis of Secondary Dataset (data2.csv)

To ensure comprehensive analysis of all available data, let's examine our second dataset using the same systematic approach.

In [13]:
# Load the second dataset
file_path_2 = 'data2.csv'  # Dataset should be in the same folder as notebook
df2 = pd.read_csv(file_path_2)

# Display the first few rows of the second dataset
print("\nFirst few rows of data2.csv:")
print("---------------------------")
display(df2.head())


First few rows of data2.csv:
---------------------------


Unnamed: 0,Dis No,Year,Seq,Glide,Disaster Group,Disaster Subgroup,Disaster Type,Disaster Subtype,Disaster Subsubtype,Event Name,Country,ISO,Region,Continent,Location,Origin,Associated Dis,Associated Dis2,OFDA Response,Appeal,Declaration,Aid Contribution,Dis Mag Value,Dis Mag Scale,Latitude,Longitude,Local Time,River Basin,Start Year,Start Month,Start Day,End Year,End Month,End Day,Total Deaths,No Injured,No Affected,No Homeless,Total Affected,Reconstruction Costs ('000 US$),Insured Damages ('000 US$),Total Damages ('000 US$),CPI,Adm Level,Admin1 Code,Admin2 Code,Geo Locations
0,1970-0013-ARG,1970,13,,Natural,Hydrological,Flood,,,,Argentina,ARG,South America,Americas,Mendoza,,,,,,,,,Km2,,,,,1970,1.0,4.0,1970,1.0,4.0,36.0,,,,,,,25000.0,15.001282,,,,
1,1970-0109-AUS,1970,109,,Natural,Meteorological,Storm,Tropical cyclone,,Ada,Australia,AUS,Australia and New Zealand,Oceania,Queensland,,,,,,,,,Kph,,,,,1970,1.0,,1970,1.0,,13.0,,,,,,,72475.0,15.001282,,,,
2,1970-0044-BEN,1970,44,,Natural,Hydrological,Flood,,,,Benin,BEN,Western Africa,Africa,Atacora region,,,,Yes,,,,,Km2,,,,,1970,9.0,,1970,9.0,,,,,,,,,200.0,15.001282,,,,
3,1970-0063-BGD,1970,63,,Natural,Meteorological,Storm,Tropical cyclone,,,Bangladesh,BGD,Southern Asia,Asia,"Khulna, Chittagong",,,,Yes,,,,,Kph,,,,,1970,11.0,12.0,1970,11.0,12.0,300000.0,,3648000.0,,3648000.0,,,86400.0,15.001282,,,,
4,1970-0026-BGD,1970,26,,Natural,Meteorological,Storm,,,,Bangladesh,BGD,Southern Asia,Asia,,,,,,,,,,Kph,,,,,1970,4.0,13.0,1970,4.0,13.0,17.0,,110.0,,110.0,,,,15.001282,,,,


In [None]:
# Display information about the second dataset
print("\nDataset Information (data2.csv):")
print("------------------------------")
df2.info()

In [None]:
# Check for missing values in the second dataset
print("\nMissing Values Count (data2.csv):")
print("-------------------------------")
print(df2.isnull().sum())

In [None]:
# Generate descriptive statistics for the second dataset
print("\nDescriptive Statistics (data2.csv):")
print("--------------------------------")
display(df2.describe())
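
The two previews suggest that data2.csv carries columns the main dataset's preview does not show, namely `Dis No` and `Reconstruction Costs ('000 US$)`. A minimal sketch of checking this systematically follows; `compare_columns` is a hypothetical helper, and the empty frames `main` and `second` stand in for `df` and `df2` using only a handful of the column names.

```python
import pandas as pd


def compare_columns(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Return the columns shared by both frames and those unique to each."""
    left_cols, right_cols = list(left.columns), list(right.columns)
    return {
        "shared": [c for c in left_cols if c in right_cols],
        "only_left": [c for c in left_cols if c not in right_cols],
        "only_right": [c for c in right_cols if c not in left_cols],
    }


# Stand-ins for df and df2 with a small subset of columns (no rows needed).
main = pd.DataFrame(columns=["Year", "Seq", "Country", "Total Deaths"])
second = pd.DataFrame(
    columns=["Dis No", "Year", "Seq", "Country", "Total Deaths",
             "Reconstruction Costs ('000 US$)"]
)

result = compare_columns(main, second)
print(result["only_right"])
```

In the notebook this would be called as `compare_columns(df, df2)`; any columns unique to one file need to be handled before the two datasets are combined or compared.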