# Richter's Predictor: Modeling Earthquake Damage Using Neural Networks
_Hosted By DrivenData_

The dataset mainly consists of information on the buildings' structure and their legal ownership. Each row in the dataset represents a specific building in the region that was hit by the Gorkha earthquake.

We're trying to predict the ordinal variable `damage_grade`, which represents the level of damage to a building hit by the earthquake. There are three damage grades:

 1. represents low damage
 2. represents a medium amount of damage
 3. represents almost complete destruction

The level of damage is an ordinal variable, meaning that the ordering of the grades matters. The task can therefore be framed as either a classification problem or an ordinal regression problem.
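
To make the two framings concrete, here is a minimal sketch of how the grades could be encoded as targets for each one. The sample grades are made up for illustration, and the cumulative "is the grade greater than k?" encoding is only one common way to set up ordinal regression; it is not necessarily the approach taken in this notebook.

```python
import numpy as np

# Hypothetical sample of damage grades (1 = low, 2 = medium, 3 = near-complete destruction)
damage_grade = np.array([1, 3, 2, 2, 1])

# Classification framing: shift to zero-based class indices, then one-hot encode.
class_index = damage_grade - 1
one_hot = np.eye(3, dtype=int)[class_index]

# Ordinal framing (assumed cumulative encoding): one binary target per threshold,
# answering "grade > 1?" and "grade > 2?". Grade 2 becomes [1, 0], grade 3 becomes [1, 1].
thresholds = np.array([1, 2])
ordinal_targets = (damage_grade[:, None] > thresholds).astype(int)

print(one_hot)
print(ordinal_targets)
```

The one-hot targets treat the three grades as unrelated categories, while the cumulative targets keep the information that grade 3 is "further along" than grade 2.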
 
To measure the performance of our algorithms, we'll use the _F1 score_, which balances the precision and recall of a classifier. The F1 score is traditionally used to evaluate a binary classifier, but since we have three possible labels, we will use a variant called the _micro-averaged F1 score_, which pools the true positives, false positives, and false negatives across all classes before computing precision and recall.
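
As an illustration of the micro-averaged F1 score, the sketch below pools the per-class counts and then computes a single precision, recall, and F1. The helper name `micro_f1` and the toy labels are assumptions made for this example, not part of the competition code.

```python
import numpy as np

def micro_f1(y_true, y_pred, labels=(1, 2, 3)):
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute F1 once."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = fp = fn = 0
    for label in labels:
        tp += int(np.sum((y_pred == label) & (y_true == label)))
        fp += int(np.sum((y_pred == label) & (y_true != label)))
        fn += int(np.sum((y_pred != label) & (y_true == label)))
    if tp == 0:
        return 0.0  # no correct predictions at all; avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 4 of the 6 predictions are correct.
y_true = [1, 2, 3, 2, 2, 3]
y_pred = [1, 2, 2, 2, 3, 3]
score = micro_f1(y_true, y_pred)
print(round(score, 4))  # 0.6667
```

When every building receives exactly one predicted grade, each wrong prediction counts as one false positive and one false negative, so the micro-averaged F1 score equals plain accuracy (4/6 in the toy example).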
 
 - [Loading data](#Loading-data)
 - [Model training](#Model-training)
 - [Performance metric for DrivenData competition](#Performance-metric-for-DrivenData-competition)

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
from time import time
from pathlib import Path, PurePath
%matplotlib inline

In [3]:
print(f'TensorFlow version: {tensorflow.__version__}')
print(f'Keras version: {keras.__version__}')

TensorFlow version: 2.1.0
Keras version: 2.2.4-tf


In [6]:
project_root_dir = Path('/Users/angelo/Programming/data/modeling-earthquake-damage')
train_values_file = project_root_dir / 'train_values.csv'
train_labels_file = project_root_dir / 'train_labels.csv'
test_values_file = project_root_dir / 'test_values.csv'

In [7]:
train_values_df = pd.read_csv(train_values_file, index_col='building_id')
test_values_df = pd.read_csv(test_values_file, index_col='building_id')
train_labels_df = pd.read_csv(train_labels_file, index_col='building_id')

In [8]:
print(f'train_values_df.shape: {train_values_df.shape}')
print(f'train_labels_df.shape: {train_labels_df.shape}')
print(f'test_values_df.shape: {test_values_df.shape}')

train_values_df.shape: (260601, 38)
train_labels_df.shape: (260601, 1)
test_values_df.shape: (86868, 38)
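
The matching row counts suggest that the training values and labels describe the same buildings, but equal shapes alone do not guarantee that the `building_id` indexes line up row by row. A minimal sketch of such a check follows. The helper `frames_are_aligned` and the toy frames are hypothetical and stand in for the real DataFrames.

```python
import pandas as pd

def frames_are_aligned(values_df, labels_df):
    """Return True when both frames share the same index values in the same order."""
    return values_df.index.equals(labels_df.index)

# Toy stand-ins for the real frames (made-up values, for illustration only)
toy_ids = pd.Index([101, 102, 103], name='building_id')
toy_values = pd.DataFrame({'age': [10, 25, 5]}, index=toy_ids)
toy_labels = pd.DataFrame({'damage_grade': [2, 3, 1]}, index=toy_ids)

# Reordering the labels breaks the row-by-row correspondence.
shuffled_labels = toy_labels.sort_values('damage_grade')

aligned = frames_are_aligned(toy_values, toy_labels)
misaligned = frames_are_aligned(toy_values, shuffled_labels)
print(aligned, misaligned)
```

On the real data, the same check would be `frames_are_aligned(train_values_df, train_labels_df)`. If it returned `False`, reordering the labels with `train_labels_df.reindex(train_values_df.index)` would be one way to restore the correspondence.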
