### Labels

We're trying to predict the ordinal variable damage_grade, which represents the level of damage to a building that was hit by the earthquake. Each building has exactly one grade, so this is a single-label problem with three ordered classes. There are 3 grades of damage:<br><br>

1) 1 represents low damage<br>
2) 2 represents a medium amount of damage<br>
3) 3 represents almost complete destruction<br>
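
Since damage_grade is a single column with three ordered values, checking the class balance is a natural first step. The sketch below uses a small hand-made frame as a stand-in for train_labels.csv; the values are illustrative assumptions, not real data.

```python
import pandas as pd

# Toy stand-in for train_labels.csv (values are made up for illustration).
labels = pd.DataFrame({
    "building_id": [101, 102, 103, 104, 105, 106],
    "damage_grade": [1, 2, 2, 3, 2, 3],
})

# Share of each damage grade, ordered from grade 1 to grade 3.
grade_share = (
    labels["damage_grade"]
    .value_counts(normalize=True)
    .sort_index()
)
print(grade_share)
```

On the real labels, a strongly uneven split would be a reason to look at per-class scores rather than plain accuracy.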

### Features

The dataset mainly consists of information on the buildings' structure and their legal ownership. Each row in the dataset represents a specific building in the region that was hit by the Gorkha earthquake.

There are 39 columns in this dataset, where the building_id column is a unique and random identifier. The remaining 38 features are described in the section below. Categorical variables have been obfuscated with random lowercase ASCII characters. The appearance of the same character in distinct columns does not imply the same original value.

### Description

* geo_level_1_id, geo_level_2_id, geo_level_3_id (type: int): geographic region in which the building exists, from largest (level 1) to most specific sub-region (level 3). Possible values: level 1: 0-30, level 2: 0-1427, level 3: 0-12567.<br><br>
* count_floors_pre_eq (type: int): number of floors in the building before the earthquake.<br><br>
* age (type: int): age of the building in years.<br><br>
* area_percentage (type: int): normalized area of the building footprint.<br><br>
* height_percentage (type: int): normalized height of the building.<br><br>
* land_surface_condition (type: categorical): surface condition of the land where the building was built. Possible values: n, o, t.<br><br>
* foundation_type (type: categorical): type of foundation used while building. Possible values: h, i, r, u, w.<br><br>
* roof_type (type: categorical): type of roof used while building. Possible values: n, q, x.<br><br>
* ground_floor_type (type: categorical): type of the ground floor. Possible values: f, m, v, x, z.<br><br>
* other_floor_type (type: categorical): type of construction used in the floors above the ground floor (except the roof). Possible values: j, q, s, x.<br><br>
* position (type: categorical): position of the building. Possible values: j, o, s, t.<br><br>
* plan_configuration (type: categorical): building plan configuration. Possible values: a, c, d, f, m, n, o, q, s, u.<br><br>

* has_superstructure_adobe_mud (type: binary): flag variable that indicates if the superstructure was made of Adobe/Mud.<br><br>
* has_superstructure_mud_mortar_stone (type: binary): flag variable that indicates if the superstructure was made of Mud Mortar - Stone.<br><br>
* has_superstructure_stone_flag (type: binary): flag variable that indicates if the superstructure was made of Stone.<br><br>
* has_superstructure_cement_mortar_stone (type: binary): flag variable that indicates if the superstructure was made of Cement Mortar - Stone.<br><br>
* has_superstructure_mud_mortar_brick (type: binary): flag variable that indicates if the superstructure was made of Mud Mortar - Brick.<br><br>
* has_superstructure_cement_mortar_brick (type: binary): flag variable that indicates if the superstructure was made of Cement Mortar - Brick.<br><br>
* has_superstructure_timber (type: binary): flag variable that indicates if the superstructure was made of Timber.<br><br>
* has_superstructure_bamboo (type: binary): flag variable that indicates if the superstructure was made of Bamboo.<br><br>
* has_superstructure_rc_non_engineered (type: binary): flag variable that indicates if the superstructure was made of non-engineered reinforced concrete.<br><br>
* has_superstructure_rc_engineered (type: binary): flag variable that indicates if the superstructure was made of engineered reinforced concrete.<br><br>
* has_superstructure_other (type: binary): flag variable that indicates if the superstructure was made of any other material.<br><br>

* legal_ownership_status (type: categorical): legal ownership status of the land where the building was built. Possible values: a, r, v, w.<br><br>
* count_families (type: int): number of families that live in the building.<br><br>

* has_secondary_use (type: binary): flag variable that indicates if the building was used for any secondary purpose.<br><br>
* has_secondary_use_agriculture (type: binary): flag variable that indicates if the building was used for agricultural purposes.<br><br>
* has_secondary_use_hotel (type: binary): flag variable that indicates if the building was used as a hotel.<br><br>
* has_secondary_use_rental (type: binary): flag variable that indicates if the building was used for rental purposes.<br><br>
* has_secondary_use_institution (type: binary): flag variable that indicates if the building was used as a location of any institution.<br><br>
* has_secondary_use_school (type: binary): flag variable that indicates if the building was used as a school.<br><br>
* has_secondary_use_industry (type: binary): flag variable that indicates if the building was used for industrial purposes.<br><br>
* has_secondary_use_health_post (type: binary): flag variable that indicates if the building was used as a health post.<br><br>
* has_secondary_use_gov_office (type: binary): flag variable that indicates if the building was used as a government office.<br><br>
* has_secondary_use_use_police (type: binary): flag variable that indicates if the building was used as a police station.<br><br>
* has_secondary_use_other (type: binary): flag variable that indicates if the building was secondarily used for other purposes.<br><br>
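
The binary flags above share common prefixes, which makes them easy to select as a group. The sketch below counts how many superstructure materials each building reports. It uses a toy frame with made-up values and assumes that a building may set more than one has_superstructure_* flag.

```python
import pandas as pd

# Toy frame with a few of the columns described above (values are made up).
buildings = pd.DataFrame({
    "has_superstructure_adobe_mud": [0, 1, 0],
    "has_superstructure_timber": [1, 1, 0],
    "has_secondary_use": [0, 1, 0],
    "has_secondary_use_hotel": [0, 1, 0],
    "age": [10, 25, 5],
})

# Select every superstructure flag by its shared prefix.
superstructure_cols = [
    col for col in buildings.columns if col.startswith("has_superstructure_")
]

# Number of superstructure materials flagged for each building.
n_materials = buildings[superstructure_cols].sum(axis=1)
print(n_materials.tolist())
```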

### Importing Required Modules

In [136]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Import the estimators used below by name instead of using wildcard imports.
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

### Importing Datasets

In [137]:
train_input = pd.read_csv("train_values.csv")
train_output = pd.read_csv("train_labels.csv")

test_input = pd.read_csv("test_values.csv")
submission_format = pd.read_csv("submission_format.csv")

In [138]:
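# Assumes the first four columns are building_id and the three geo_level_*_id columns.
# All four are dropped, so the model does not see the geographic ids.
id_columns = train_input.columns.values[0:4]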

X_train = train_input.drop(columns=id_columns)
X_test = test_input.drop(columns=id_columns)
y_train = train_output.drop(columns="building_id")

In [139]:
# Columns stored as strings are the obfuscated categorical features.
categorical_features = X_train.select_dtypes(include="object").columns

for col in categorical_features:
    # Map each category to an integer code, in sorted order of the training values.
    codes = {value: code for code, value in enumerate(np.sort(X_train[col].unique()))}
    X_train[col] = X_train[col].map(codes)
    # Categories that never appear in the training data become NaN in the test set.
    X_test[col] = X_test[col].map(codes)
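
The manual mapping in the cell above can also be expressed with scikit-learn's OrdinalEncoder, which learns the categories from the training data and can assign an explicit code to categories it has not seen. This is a sketch on toy data, not the notebook's method; the unseen category "z" and the choice of -1 as its code are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data: "z" is a hypothetical category that never appears in training.
train = pd.DataFrame({"roof_type": ["n", "q", "x", "n"]})
test = pd.DataFrame({"roof_type": ["q", "x", "z"]})

# Unknown categories are encoded as -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = encoder.fit_transform(train)  # learn categories from train only
test_codes = encoder.transform(test)        # reuse the same mapping for test

print(train_codes.ravel().tolist())
print(test_codes.ravel().tolist())
```

Either way, the integer codes carry an order that the original obfuscated letters do not have, which tree-based models tolerate better than linear ones.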

In [140]:
mms = MinMaxScaler()
# Fit the scaler on the training data only, then reuse its ranges for the test data.
X_train = pd.DataFrame(mms.fit_transform(X_train), columns=X_train.columns)
X_test = pd.DataFrame(mms.transform(X_test), columns=X_test.columns)
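
Min-max scaling depends on which rows supply the minimum and maximum. The sketch below, on made-up numbers, shows how the same test values end up on different scales depending on whether the scaler reuses the training ranges or is refitted on the test rows.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data for illustration.
train = np.array([[0.0], [50.0], [100.0]])
test = np.array([[25.0], [75.0]])

# Option 1: learn the range from train and reuse it for test.
scaler = MinMaxScaler().fit(train)
test_shared = scaler.transform(test)

# Option 2: refit on test, which puts test on its own 0-1 scale.
test_refit = MinMaxScaler().fit_transform(test)

print(test_shared.ravel())  # values relative to the training range
print(test_refit.ravel())   # values relative to the test range only
```

With a shared range, a scaled value means the same thing in both sets, which is what a model trained on the scaled training data expects.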

In [141]:
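rfc = RandomForestClassifier()
# Pass the labels as a 1-D array; a single-column DataFrame triggers a shape warning.
rfc.fit(X_train, y_train.values.ravel())
y_test = rfc.predict(X_test)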
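
The cell above trains on all labeled rows, so it gives no estimate of how well the model generalizes. A minimal sketch of a hold-out check follows. It runs on synthetic three-class data from make_classification rather than the real features, and the micro-averaged F1 score is an assumed choice of metric, not something stated in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
y = y + 1  # shift classes from 0..2 to 1..3 to mimic damage_grade

# Hold out a quarter of the rows, keeping the class proportions.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_fit, y_fit)
val_pred = model.predict(X_val)

# Micro-averaged F1 on the held-out rows.
val_f1 = f1_score(y_val, val_pred, average="micro")
print(f"validation micro-F1: {val_f1:.3f}")
```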

In [142]:
df_final = pd.DataFrame(columns=train_output.columns)
df_final["building_id"] = test_input["building_id"].copy()
df_final["damage_grade"] = y_test

In [143]:
df_final

| | building_id | damage_grade |
|---|---|---|
| 0 | 300051 | 3 |
| 1 | 99355 | 2 |
| 2 | 890251 | 2 |
| 3 | 745817 | 1 |
| 4 | 421793 | 2 |
| ... | ... | ... |
| 86863 | 310028 | 2 |
| 86864 | 663567 | 2 |
| 86865 | 1049160 | 2 |
| 86866 | 442785 | 3 |

In [144]:
df_final.to_csv("amith_submission.csv",index=False,header=True)
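
Before uploading, it can help to compare the submission against the provided template. The helper below is a hypothetical sketch: submission_problems is not part of any library, and the two toy frames stand in for submission_format.csv and the generated file.

```python
import pandas as pd

# Hypothetical stand-ins for submission_format.csv and the generated submission.
template = pd.DataFrame(
    {"building_id": [300051, 99355, 890251], "damage_grade": [1, 1, 1]}
)
submission = pd.DataFrame(
    {"building_id": [300051, 99355, 890251], "damage_grade": [3, 2, 2]}
)

def submission_problems(submission, template):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if list(submission.columns) != list(template.columns):
        problems.append("column names or order differ from the template")
    elif not submission["building_id"].equals(template["building_id"]):
        problems.append("building_id values or order differ from the template")
    if "damage_grade" in submission and not submission["damage_grade"].isin([1, 2, 3]).all():
        problems.append("damage_grade contains values outside 1, 2, 3")
    return problems

print(submission_problems(submission, template))
```

In the notebook, the same check could be called with df_final and submission_format in place of the toy frames.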