<h1><p style="text-align:center">Infrastructure Damage Prediction and Resource Optimization</p></h1>

# Business Problem

After a major earthquake, governments, NGOs, and environmental agencies must allocate rescue teams, prioritize inspections, distribute medical supplies, and assign reconstruction funds. Inspecting every building immediately is impossible, and the resources these organizations can commit are rarely sufficient, while every delay costs both lives and money.

As a data scientist, I was tasked with the objectives below.

# Business Objectives

Develop a predictive model that:
1. Identifies buildings at high risk of severe damage
2. Prioritizes emergency response
3. Optimizes allocation of relief resources

# Analytical Objectives

1. Extract, process, and explore data.
2. Build a binary classification predictive model to identify buildings at high risk of severe damage.
3. Use the model to conduct a business impact analysis.
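Objective 2 implies turning the five damage grades into a two-class target. The sketch below shows one way to do that on a toy column. The cutoff (Grade 4 and above counts as severe) and the names `SEVERE_THRESHOLD`, `to_severe_flag`, and `severe_damage` are assumptions made for illustration; the project may draw the line elsewhere.

```python
import pandas as pd

# Toy stand-in for the damage_grade column; labels mirror the "Grade N" format in the survey.
grades = pd.Series(
    ["Grade 1", "Grade 2", "Grade 3", "Grade 4", "Grade 5", None],
    name="damage_grade",
)

# Assumption: "severe" means Grade 4 or 5. This cutoff is illustrative only.
SEVERE_THRESHOLD = 4


def to_severe_flag(damage_grade: pd.Series, threshold: int = SEVERE_THRESHOLD) -> pd.Series:
    """Map 'Grade N' strings to 1 (severe) or 0 (not severe); missing grades stay missing."""
    # Pull the digit out of each label and convert it to a number (NaN where missing).
    numeric = pd.to_numeric(damage_grade.str.extract(r"(\d+)", expand=False))
    flag = (numeric >= threshold).astype("Int64")
    # Keep missing grades as <NA> instead of silently labelling them 0.
    return flag.where(numeric.notna()).rename("severe_damage")


severe = to_severe_flag(grades)
print(severe.tolist())
```

Keeping missing grades as missing, rather than coding them as 0, leaves the decision about those rows explicit.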

# Data Overview

The dataset for this project was extracted from a relational database and is based on a Nepal earthquake survey. It records structural information about each building and the damage grade assigned to it after the earthquake. The data covers buildings from four districts in Nepal: Kavrepalenchok, Ramaechhap, Sindhupalchok, and Gorkha, represented in the dataset by district_id values 1, 2, 3, and 4, respectively.
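Since district_id is a numeric code, a readable label can help in plots and summaries. A minimal sketch, using the mapping described above on a few made-up rows; `DISTRICT_NAMES` and `district_name` are names introduced here for illustration.

```python
import pandas as pd

# Mapping taken from the description above; spellings follow the text.
DISTRICT_NAMES = {
    1: "Kavrepalenchok",
    2: "Ramaechhap",
    3: "Sindhupalchok",
    4: "Gorkha",
}

# Illustrative rows only, not real survey data.
toy = pd.DataFrame({"district_id": [1, 2, 3, 4, 4]})

# Add a label column next to the numeric code; unknown codes would become NaN.
toy["district_name"] = toy["district_id"].map(DISTRICT_NAMES)
print(toy["district_name"].value_counts())
```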

In [1]:
# Core data and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Candidate classifiers
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier

# Hyperparameter search and evaluation metrics
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import (
    confusion_matrix, accuracy_score, precision_score, recall_score,
    f1_score, precision_recall_curve, roc_curve, auc,
)

# Resampling utilities for imbalanced classes
import imblearn.over_sampling as OS
import imblearn.under_sampling as US

In [8]:
# Load the survey data into pandas, using b_id as the index
df = pd.read_csv("../data/nepal_earthquake_damage.csv", index_col="b_id")
df.head()

b_id,district_id,building_id,count_floors_pre_eq,count_floors_post_eq,age_building,plinth_area_sq_ft,height_ft_pre_eq,height_ft_post_eq,land_surface_condition,foundation_type,roof_type,ground_floor_type,other_floor_type,position,plan_configuration,condition_post_eq,superstructure,damage_grade
56,1,56,2,2,40,322,18,18,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,TImber/Bamboo-Mud,Not attached,Rectangular,Damaged-Not used,"Stone, mud mortar",Grade 2
63,1,63,2,2,1,437,16,16,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,Timber-Planck,Not attached,Rectangular,Not damaged,"Stone, mud mortar",Grade 1
97,1,97,2,2,22,420,16,16,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,TImber/Bamboo-Mud,Not attached,Rectangular,Damaged-Not used,"Stone, mud mortar",Grade 2
99,1,99,2,2,50,242,16,16,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,TImber/Bamboo-Mud,Not attached,Rectangular,Damaged-Not used,"Stone, mud mortar",Grade 4
115,1,115,2,2,12,308,16,16,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,Timber-Planck,Not attached,Rectangular,Damaged-Not used,"Stone, mud mortar",Grade 3


In [9]:
df.tail()

b_id,district_id,building_id,count_floors_pre_eq,count_floors_post_eq,age_building,plinth_area_sq_ft,height_ft_pre_eq,height_ft_post_eq,land_surface_condition,foundation_type,roof_type,ground_floor_type,other_floor_type,position,plan_configuration,condition_post_eq,superstructure,damage_grade
234774,4,234774,2,0,45,336,18,0,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Heavy roof,Mud,Timber-Planck,Attached-1 side,Rectangular,Damaged-Rubble Clear-New building built,"Stone, mud mortar",Grade 5
234808,4,234808,2,0,70,255,18,0,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,Timber-Planck,Not attached,Rectangular,Damaged-Rubble Clear-New building built,"Stone, mud mortar",Grade 5
234821,4,234821,2,0,11,552,18,0,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Heavy roof,Mud,Timber-Planck,Attached-1 side,Rectangular,Damaged-Rubble Clear-New building built,"Stone, mud mortar",Grade 5
234828,4,234828,2,0,35,598,18,0,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Heavy roof,Mud,Timber-Planck,Not attached,Rectangular,Damaged-Rubble Clear-New building built,"Stone, mud mortar",Grade 5
234835,4,234835,2,0,12,840,18,0,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Heavy roof,Mud,Timber-Planck,Not attached,Rectangular,Damaged-Rubble clear,"Stone, mud mortar",Grade 5
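The rows above include three columns recorded after the earthquake: count_floors_post_eq, height_ft_post_eq, and condition_post_eq. If the model is meant to score buildings before anyone has inspected them, keeping these columns could leak the outcome into the features. A minimal sketch, assuming every column whose name contains `post_eq` should be set aside; the variable names are illustrative.

```python
import pandas as pd

# Empty frame with a subset of the real column names, just to demonstrate the selection.
toy = pd.DataFrame(
    columns=[
        "count_floors_pre_eq",
        "count_floors_post_eq",
        "age_building",
        "height_ft_pre_eq",
        "height_ft_post_eq",
        "condition_post_eq",
        "damage_grade",
    ]
)

# Assumption: the "post_eq" substring marks information only known after the event.
post_eq_cols = [col for col in toy.columns if "post_eq" in col]

# Features that would be available before an inspection takes place.
pre_event = toy.drop(columns=post_eq_cols)
print(post_eq_cols)
print(list(pre_event.columns))
```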


In [10]:
# inspect dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 234835 entries, 56 to 234835
Data columns (total 18 columns):
 #   Column                  Non-Null Count   Dtype 
---  ------                  --------------   ----- 
 0   district_id             234835 non-null  int64 
 1   building_id             234835 non-null  int64 
 2   count_floors_pre_eq     234835 non-null  int64 
 3   count_floors_post_eq    234835 non-null  int64 
 4   age_building            234835 non-null  int64 
 5   plinth_area_sq_ft       234835 non-null  int64 
 6   height_ft_pre_eq        234835 non-null  int64 
 7   height_ft_post_eq       234835 non-null  int64 
 8   land_surface_condition  234835 non-null  object
 9   foundation_type         234835 non-null  object
 10  roof_type               234835 non-null  object
 11  ground_floor_type       234835 non-null  object
 12  other_floor_type        234835 non-null  object
 13  position                234834 non-null  object
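 14  plan_configuration      234834 non-null  object
 15  condition_post_eq       234835 non-null  object
 16  superstructure          234835 non-null  object
 17  damage_grade            234824 non-null  object
dtypes: int64(8), object(10)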
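The non-null counts above show that position and plan_configuration each fall one short of the 234835 rows. Reading that off the listing is error-prone, so a direct count of missing values per column is a useful companion. The sketch below runs on a toy frame with made-up rows; on the real data the same two lines would apply to `df`.

```python
import pandas as pd

# Illustrative rows only; None marks a missing value.
toy = pd.DataFrame(
    {
        "position": ["Not attached", None, "Attached-1 side"],
        "plan_configuration": ["Rectangular", "Rectangular", None],
        "age_building": [40, 1, 22],
    }
)

# Count missing values per column and keep only the columns that have any.
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```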

In [11]:
# Check for duplicated rows
df.duplicated().sum()

np.int64(0)
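One caveat on the check above: `duplicated()` compares every column, and each row carries its own building_id, so the result can be zero even if two buildings share identical attributes. The sketch below, on made-up rows, repeats the check with the identifier excluded and also confirms that building_id mirrors the b_id index, as the first rows of the data suggest. Whether that holds for the full dataset is an assumption to verify.

```python
import pandas as pd

# Illustrative rows: the first and third share the same attributes but differ in id.
toy = pd.DataFrame(
    {"building_id": [56, 63, 97], "age_building": [40, 1, 40]},
    index=pd.Index([56, 63, 97], name="b_id"),
)

# With the identifier included, no row looks duplicated.
dupes_with_id = int(toy.duplicated().sum())

# Excluding the identifier reveals rows with identical attributes.
dupes_without_id = int(toy.drop(columns="building_id").duplicated().sum())

# Check that the index is unique and that building_id repeats it.
index_is_unique = toy.index.is_unique
id_mirrors_index = bool((toy["building_id"].to_numpy() == toy.index.to_numpy()).all())

print(dupes_with_id, dupes_without_id, index_is_unique, id_mirrors_index)
```

Identical attributes do not necessarily mean a data error, since many buildings may genuinely look alike in the survey; the count is only a prompt for closer inspection.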

In [13]:
df.describe()

statistic,district_id,building_id,count_floors_pre_eq,count_floors_post_eq,age_building,plinth_area_sq_ft,height_ft_pre_eq,height_ft_post_eq
count,234835.0,234835.0,234835.0,234835.0,234835.0,234835.0,234835.0,234835.0
mean,2.77893,117418.0,2.088603,1.534456,26.950987,402.474623,16.085566,11.875385
std,1.037302,67791.16957,0.609112,1.031819,71.030354,195.838405,5.181656,8.281453
min,1.0,1.0,1.0,0.0,0.0,70.0,6.0,0.0
25%,2.0,58709.5,2.0,1.0,10.0,287.0,13.0,6.0
50%,3.0,117418.0,2.0,2.0,18.0,371.0,15.0,14.0
75%,4.0,176126.5,2.0,2.0,30.0,475.0,18.0,18.0
max,4.0,234835.0,9.0,9.0,999.0,4995.0,99.0,99.0
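In the summary above, age_building reaches 999 while its 75th percentile is 30, and both height columns top out at 99. Values like these may be placeholders rather than real measurements. That reading is an assumption, not something the survey confirms here. A minimal sketch for measuring how common such a value is and how much it shifts a robust statistic, using made-up ages and a hypothetical `SENTINEL_AGE` constant:

```python
import pandas as pd

# Illustrative ages only, not real survey data.
toy = pd.DataFrame({"age_building": [10, 18, 30, 999, 999, 45]})

# Assumption: 999 is treated as a possible placeholder value.
SENTINEL_AGE = 999

is_sentinel = toy["age_building"] == SENTINEL_AGE
n_sentinel = int(is_sentinel.sum())
share_sentinel = is_sentinel.mean()

# Compare the median with and without the suspect rows.
median_all = toy["age_building"].median()
median_without = toy.loc[~is_sentinel, "age_building"].median()

print(n_sentinel, round(share_sentinel, 3), median_all, median_without)
```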


In [14]:
df.describe(include="object")

statistic,land_surface_condition,foundation_type,roof_type,ground_floor_type,other_floor_type,position,plan_configuration,condition_post_eq,superstructure,damage_grade
count,234835,234835,234835,234835,234835,234834,234834,234835,234835,234824
unique,3,5,3,5,4,4,10,8,11,5
top,Flat,Mud mortar-Stone/Brick,Bamboo/Timber-Light roof,Mud,TImber/Bamboo-Mud,Not attached,Rectangular,Damaged-Not used,"Stone, mud mortar",Grade 4
freq,192510,214958,163976,200634,164167,193760,226806,85037,196841,69795
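The summary above reports only the most frequent damage grade (Grade 4, with 69795 of 234824 labelled rows, roughly 30%). Before modelling, it helps to see the full distribution and the number of unlabelled rows. A minimal sketch on a toy column; the values are made up for illustration.

```python
import pandas as pd

# Illustrative labels only; None marks a building without a recorded grade.
grades = pd.Series(
    ["Grade 4", "Grade 5", "Grade 4", "Grade 2", "Grade 1", None],
    name="damage_grade",
)

# Counts per grade (missing values excluded), ordered by grade label.
counts = grades.value_counts().sort_index()

# Share of each grade among labelled rows.
shares = (counts / counts.sum()).round(3)

# Rows with no grade at all, reported separately.
n_missing = int(grades.isna().sum())

print(shares)
print("missing:", n_missing)
```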
