# Milestone 4 - Additional Algorithm: CatBoost

This notebook corresponds to the fourth and final stage of the Machine Learning final project, as part of the Copernicus Master in Digital Earth, in the Data Science track at University of South Brittany, Vannes, France by Candela Sol PELLIZA & Rajeswari PARASA.

In this milestone we present a new algorithm, CatBoost...
### COMPLETE INTRO


## 1. Presenting the new algorithm: CatBoost

### 1.a. Literal Description
For this final step of the project we introduce the CatBoost algorithm as a promising technique for predicting house prices. 
# COMPLETE

### 1.b. Hyperparameters
# COMPLETE

## 2. Data Loading and Preprocessing

To start the project, we will upload the original raw dataset. Different from the previous notebooks, in which we used directly the preprocessed and splited training and test datasets, here the preprocessing step will be carried on again. This is because, as it was said before, one of the main advantadges of CatBoost algorithm is the fact that it can deal with categorical variables. Therefore, we need to re-apply the data processing steps to our dataset, avoiding the encoding of categorical variables. 

Moreover, some other modifications to the porcessing steps are applied according to the feedback of Milestone 2, such as data normalisation, which are discussed in detail in the corresponding section.

### 2.1. Loading Libraries & Importing Original Dataset

In [14]:
#Importing Libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_selector as selector
from sklearn.preprocessing import OrdinalEncoder
import matplotlib.pyplot as plt
import warnings

In [34]:
#Setting pandas to show all the columns
pd.set_option('display.max_columns', None)

In [15]:
#Importing  and visualizing the dataset
data = pd.read_csv('OpenData/Ames.csv')
data.head()

Unnamed: 0,Order,PID,area,price,MS.SubClass,MS.Zoning,Lot.Frontage,Lot.Area,Street,Alley,...,Screen.Porch,Pool.Area,Pool.QC,Fence,Misc.Feature,Misc.Val,Mo.Sold,Yr.Sold,Sale.Type,Sale.Condition
0,1,526301100,1656,215000,20,RL,141.0,31770,Pave,,...,0,0,,,,0,5,2010,WD,Normal
1,2,526350040,896,105000,20,RH,80.0,11622,Pave,,...,120,0,,MnPrv,,0,6,2010,WD,Normal
2,3,526351010,1329,172000,20,RL,81.0,14267,Pave,,...,0,0,,,Gar2,12500,6,2010,WD,Normal
3,4,526353030,2110,244000,20,RL,93.0,11160,Pave,,...,0,0,,,,0,4,2010,WD,Normal
4,5,527105010,1629,189900,60,RL,74.0,13830,Pave,,...,0,0,,MnPrv,,0,3,2010,WD,Normal


### 2.2. First Preprocessing Steps

#### 2.2.a. Renaming variables
In order to get a better and uniform handling of the variables, the columns of the original dataset are renamed, adopting the Pascal case convention (capitalizing the first letter of every word, including the first one). The abreviations for long words are kept the same as in the original dataset.

In [16]:
#Create a dictionary with the old and new variable's names
RenameMapping = {
    'area': 'BldgArea',
    'price': 'SoldPrice',
    'MS.SubClass': 'MSSubClass',
    'MS.Zoning': 'MSZoning',
    'Lot.Frontage': 'LotFrontage',
    'Lot.Area': 'LotArea',
    'Lot.Shape': 'LotShape',
    'Land.Contour': 'LandContour',
    'Lot.Config': 'LotConfig',
    'Land.Slope': 'LandSlope',
    'Condition.1': 'Condition1',
    'Condition.2': 'Condition2',
    'Bldg.Type': 'BldgType',
    'House.Style': 'HouseStyle',
    'Overall.Qual': 'OverallQual',
    'Overall.Cond': 'OverallCond',
    'Year.Built': 'YearBuilt',
    'Year.Remod.Add': 'YearRemodAdd',
    'Roof.Style': 'RoofStyle',
    'Roof.Matl': 'RoofMatl',
    'Exterior.1st': 'Exterior1st',
    'Exterior.2nd': 'Exterior2nd',
    'Mas.Vnr.Type': 'MasVnrType',
    'Mas.Vnr.Area': 'MasVnrArea',
    'Exter.Qual': 'ExterQual',
    'Exter.Cond': 'ExterCond',
    'Bsmt.Qual': 'BsmtQual',
    'Bsmt.Cond': 'BsmtCond',
    'Bsmt.Exposure': 'BsmtExposure',
    'BsmtFin.Type.1': 'BsmtFinType1',
    'BsmtFin.SF.1': 'BsmtFinSF1',
    'BsmtFin.Type.2': 'BsmtFinType2',
    'BsmtFin.SF.2': 'BsmtFinSF2',
    'Bsmt.Unf.SF': 'BsmtUnfSF',
    'Total.Bsmt.SF': 'TotalBsmtSF',
    'Heating.QC': 'HeatingQual',
    'Central.Air': 'CentralAir',
    '1st.Flr.SF': '1stFlrSF',
    '2nd.Flr.SF': '2ndFlrSF',
    'Low.Qual.Fin.SF': 'LowQualFinSF',
    'Bsmt.Full.Bath': 'BsmtFullBath',
    'Bsmt.Half.Bath': 'BsmtHalfBath',
    'Full.Bath': 'FullBath',
    'Half.Bath': 'HalfBath',
    'Kitchen.Qual': 'KitchenQual',
    'TotRms.AbvGrd': 'TotRmsAbvGrd',
    'Fireplaces': 'Fireplaces',
    'Fireplace.Qu': 'FireplaceQu',
    'Garage.Type': 'GarageType',
    'Garage.Yr.Blt': 'GarageYrBlt',
    'Garage.Finish': 'GarageFinish',
    'Garage.Cars': 'GarageCars',
    'Garage.Area': 'GarageArea',
    'Garage.Qual': 'GarageQual',
    'Garage.Cond': 'GarageCond',
    'Paved.Drive': 'PavedDrive',
    'Wood.Deck.SF': 'WoodDeckSF',
    'Open.Porch.SF': 'OpenPorchSF',
    'Enclosed.Porch': 'EnclosedPorchSF',
    '3Ssn.Porch': '3SsnPorchSF',
    'Screen.Porch': 'ScreenPorchSF',
    'Pool.Area': 'PoolArea',
    'Pool.QC': 'PoolQual',
    'Misc.Feature': 'MiscFeature',
    'Misc.Val': 'MiscVal',
    'Mo.Sold': 'MoSold',
    'Yr.Sold': 'YrSold',
    'Sale.Type': 'SaleType',
    'Sale.Condition': 'SaleCondition',
    'X1st.Flr.SF': 'X1FloorSF',
    'X2nd.Flr.SF': 'X2FloorSF',
    'X3Ssn.Porch': '3SsnPorchSF',
    'Kitchen.AbvGr': 'KitchenAbvGr',
    'Bedroom.AbvGr': 'BedroomAbvGr',
    }

#Applying the name change
data.rename(columns=RenameMapping, inplace=True)

Moreover, we will also remove the "order" column, considering that it is just an index column withou any significative meaning, and the dataset has a meaningful identificator given by the column "PID"

In [17]:
#Dropping 'Order' column
data = data.drop('Order', axis=1)

#### 2.2.b. Encoding Ordinal and Binary Variables
While CatBoost algorithm can weffectively handle categorical variables, for the case of ordinal and binary varibles we decided to rely on a controlled process. For these cases, we will follow the same process explained in Milestone 1 for converting ordinal and binary "string" variables into numerical ones. For more detailed explanations on each case, refer to Milestone 1 notebook.

In [18]:
# Lists of variables by type
ordinal = ['LotShape', 'Utilities', 'LandSlope', 'ExterQual', 'ExterCond', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'HeatingQual', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageFinish', 'GarageQual', 'GarageCond', 'PoolQual', 'Fence', 'PavedDrive' ]
nominal = ['MSSubClass', 'MSZoning', 'LandContour', 'LotConfig', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'Foundation', 'Heating', 'Electrical', 'GarageType', 'MiscFeature', 'SaleType', 'SaleCondition']
binary = ['Street', 'CentralAir']
other = ['Alley']

##### 2.2.b.1. Encoding Ordinal Variables

In [19]:
# Mapping dictionary
variable_mappings = {
    'LotShape': {'Reg': 4, 'IR1': 3, 'IR2': 2, 'IR3': 1},
    'Utilities': {'AllPub': 4, 'NoSewr': 3, 'NoSeWa': 2, 'ELO': 1},
    'LandSlope': {'Gtl': 1, 'Mod': 2, 'Sev': 3},
    'ExterQual': {'Ex': 5, 'Gd': 4, 'Ta': 3, 'Fa': 2, 'Po': 1},
    'ExterCond': {'Ex': 5, 'Gd': 4, 'Ta': 3, 'Fa': 2, 'Po': 1},
    'BsmtQual': {'Ex': 6, 'Gd': 5, 'Ta': 4, 'Fa': 3, 'Po': 1, 'NA': 0},
    'BsmtCond': {'Ex': 6, 'Gd': 5, 'Ta': 4, 'Fa': 3, 'Po': 1, 'NA': 0},
    'BsmtExposure': {'Ex': 5, 'Gd': 4, 'Ta': 3, 'Fa': 2, 'Po': 1},
    'BsmtFinType1': {'GLQ': 6, 'ALQ': 5, 'BLQ': 4, 'Rec': 3, 'Lwq': 2, 'Unf': 1, 'Na': 0},
    'BsmtFinType2': {'GLQ': 6, 'ALQ': 5, 'BLQ': 4, 'Rec': 3, 'Lwq': 2, 'Unf': 1, 'Na': 0},
    'HeatingQual': {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1},
    'KitchenQual': {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1},
    'Functional': {'Typ': 8, 'Min1': 7, 'Min2': 6, 'Mod': 5, 'Maj1': 4, 'Maj2': 3, 'Sev': 2, 'Sal': 1},
    'FireplaceQu': {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1, 'NA': 0},
    'GarageFinish': {'Fin': 3, 'RFn': 2, 'Unf': 1, 'NA': 0},
    'GarageQual': {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1, 'NA': 0},
    'GarageCond': {'Ex': 5, 'Gd': 4, 'TA': 3, 'Fa': 2, 'Po': 1, 'NA': 0},
    'PoolQual': {'Ex': 4, 'Gd': 3, 'TA': 2, 'Fa': 1, 'Na': 0},
    'Fence': {'GdPrv': 4, 'MnPrv': 3, 'GdWo': 2, 'MnWw': 1, 'NA': 0},
    'PavedDrive': {'N': 0, 'P': 1, 'Y': 2}
}

# List of columns to map
columns_to_map = variable_mappings.keys()

def apply_mappings(data):
    for column in columns_to_map:
        data[column] = data[column].map(variable_mappings[column])

# Applying changes to dataset
apply_mappings(data)

# Iterate through the columns in the 'ordinal' list and encode NA values as 0
for column in ordinal:
    data[column].fillna(0, inplace=True)

##### 2.2.b.2. Encoding Binary Variables

In [20]:
#Convert binary variables into numerical
Street = {'Grvl': 0, 'Pave': 1}
CentralAir = {'N': 0, 'Y': 1}

#Applying changes to ataset
data['Street'] = data['Street'].map(Street)
data['CentralAir'] = data['CentralAir'].map(CentralAir)


For the case of the 'Alley' variable, we also follow what was already discussed in Milestone 1. Given that the variable has 3 categories, 2 indicating different types of alley material (which a small amunt of positive rows each one), and the third type indicating that the house doesn't have an alley, the variable is also coverted to a binary variable, indicating the existence or not of an alley.

In [21]:
#Encode
Alley = {'Grvl': 1, 'Pave': 1, 'NA': 0}

#Applying changes to dataset
data['Alley'] = data['Alley'].map(Alley)

#Encode NA values as 0
data['Alley'].fillna(0, inplace=True)


### 2.3. Data Split into Train and Test

Following the generally agreed good practices on Machine Learning models treatment, the remaining preprocessing steps, related to NA values handling, are performed after the data splitting is done. This workflow assures that there is no data leakage occuring between the training and test sets in the case that the values of existing rows are used to fill missing values (ex: if filling NA with column mean).

The data splitting is done following the same workflow already explained in Milestone 2, in which we demonstrated the importance of permorfiming a neighborhood-based splitting, due to an unbalanced spatial distribution. We also apply the same rows dropping based on the small number of samples in certain neighbirhoods. For more details on this regard, please refer to the mentioned notebook.

In [22]:
# Drop the lines for Landmrk and GrnHill neighborhoods
neighb_todrop = ['Landmrk', 'GrnHill']
data = data[~data['Neighborhood'].isin(neighb_todrop)]

# Divide data into train and test with stratified split
data_train, data_test= train_test_split(data, test_size=0.2, random_state=33, stratify=data['Neighborhood'])

### 2.4. Handling NA Values
The workflow for NA values handling also follows the same process already analyzed, explained and established for each variable in Milestone 1. for more details, please refer to the mentioned notebook.

In [23]:
#Filling NAs on training
data_train['MiscFeature'].fillna('None', inplace=True)
data_train['GarageType'].fillna('None', inplace=True)

#Filling NAs on test
data_test['MiscFeature'].fillna('None', inplace=True)
data_test['GarageType'].fillna('None', inplace=True)

In [24]:
columns_to_check = ['MasVnrType' , 'BsmtHalfBath', 'BsmtFullBath', 'GarageCars', 'Electrical', 'GarageArea']

#  Drop rows with NaN values in specific columns in training
data_train.dropna(subset=columns_to_check, inplace=True)

# Drop rows with NaN values in specific columns in test
data_test.dropna(subset=columns_to_check, inplace=True)

In [25]:
## LotFrontage VARIABLE

#Fill Na values with column mean - training
mean_LotFrontage_train = data_train['LotFrontage'].mean()
data_train['LotFrontage'].fillna(mean_LotFrontage_train, inplace=True)

#Fill Na values with column mean - test
mean_LotFrontage_test = data_test['LotFrontage'].mean()
data_test['LotFrontage'].fillna(mean_LotFrontage_test, inplace=True)

In [28]:
## GarageYrBlt VARIABLE

#Filling GarageYrBlt NA values with YearBuilt in training
data_train['GarageYrBlt'].fillna(data_train['YearBuilt'], inplace=True)

#Filling GarageYrBlt NA values with YearBuilt in test
data_test['GarageYrBlt'].fillna(data_test['YearBuilt'], inplace=True)

Finally, we check that there is not any NA value in both train and test datasets and we visualize how the datasets looks like after preprocessing.

In [31]:
#Check NAs in test
data_test.isna().any().any()

False

In [30]:
#Check NAs in test
data_test.isna().any().any()

False

In [36]:
# Previsualize train dataset
data_train.head()

Unnamed: 0,PID,BldgArea,SoldPrice,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQual,CentralAir,Electrical,X1FloorSF,X2FloorSF,LowQualFinSF,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorchSF,3SsnPorchSF,ScreenPorchSF,PoolArea,PoolQual,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
2891,916225130,2519,335000,60,RL,42.0,26178,1,0.0,3,Lvl,4,Inside,2,Timber,Norm,Norm,1Fam,2Story,7,5,1989,1990,Hip,CompShg,MetalSd,MetalSd,BrkFace,293.0,4.0,0.0,PConc,5.0,0.0,4.0,6.0,965.0,1.0,0.0,245.0,1210.0,GasA,5,1,SBrkr,1238,1281,0,1.0,0.0,2,1,4,1,4,9,8,2,4.0,Attchd,1989.0,2.0,2.0,628.0,3.0,3.0,2,320,27,0,0,0,0,0.0,0.0,,0,4,2006,WD,Normal
125,534427010,1728,84900,90,RL,98.0,13260,1,0.0,3,Lvl,4,Inside,1,NAmes,Norm,Norm,Duplex,1Story,5,6,1962,2001,Hip,CompShg,HdBoard,HdBoard,BrkFace,144.0,0.0,0.0,CBlock,0.0,0.0,0.0,4.0,1500.0,1.0,0.0,228.0,1728.0,GasA,3,1,SBrkr,1728,0,0,2.0,0.0,2,0,6,2,3,10,8,0,0.0,,1962.0,0.0,0.0,0.0,0.0,0.0,2,0,0,0,0,0,0,0.0,0.0,,0,1,2010,Oth,Abnorml
1234,535150210,1098,135000,20,RL,74.0,7390,1,0.0,3,Lvl,4,Inside,1,NAmes,Norm,Norm,1Fam,1Story,5,7,1955,1955,Hip,CompShg,Wd Sdng,Wd Sdng,BrkFace,151.0,0.0,0.0,CBlock,0.0,0.0,0.0,5.0,902.0,1.0,0.0,196.0,1098.0,GasA,3,1,SBrkr,1098,0,0,1.0,0.0,1,0,3,1,3,6,8,0,0.0,Attchd,1955.0,1.0,1.0,260.0,3.0,3.0,2,0,0,0,0,0,0,0.0,0.0,,0,7,2008,WD,Normal
2169,908103090,1088,110000,20,RL,67.0,8308,1,0.0,4,Lvl,4,Inside,1,Edwards,Norm,Norm,1Fam,1Story,4,6,1963,1963,Gable,CompShg,VinylSd,VinylSd,Stone,20.0,0.0,4.0,CBlock,0.0,0.0,0.0,4.0,132.0,0.0,841.0,115.0,1088.0,GasA,3,1,SBrkr,1088,0,0,0.0,0.0,1,0,2,1,3,4,8,0,0.0,Detchd,2002.0,1.0,2.0,520.0,3.0,3.0,1,0,0,0,0,0,0,0.0,0.0,,0,6,2007,COD,Normal
1029,527359030,1444,159900,20,RL,80.0,10400,1,0.0,4,Lvl,4,Inside,1,NWAmes,Norm,Norm,1Fam,1Story,6,5,1976,1976,Gable,CompShg,HdBoard,HdBoard,BrkFace,120.0,0.0,0.0,CBlock,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1444.0,1444.0,GasA,3,1,SBrkr,1444,0,0,0.0,0.0,2,0,2,1,3,5,8,1,4.0,Attchd,1976.0,1.0,2.0,473.0,3.0,3.0,2,0,24,0,0,0,0,0.0,2.0,,0,4,2008,WD,Normal


In [35]:
# Previsualize test dataset
data_test.head()

Unnamed: 0,PID,BldgArea,SoldPrice,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQual,CentralAir,Electrical,X1FloorSF,X2FloorSF,LowQualFinSF,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorchSF,3SsnPorchSF,ScreenPorchSF,PoolArea,PoolQual,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
2510,533221080,1524,166000,160,FV,75.956284,2998,1,0.0,4,Lvl,4,Inside,1,Somerst,Norm,Norm,TwnhsE,2Story,6,5,2000,2000,Gable,CompShg,MetalSd,MetalSd,BrkFace,513.0,4.0,0.0,PConc,5.0,0.0,0.0,6.0,353.0,1.0,0.0,403.0,756.0,GasA,5,1,SBrkr,768,756,0,0.0,0.0,2,1,2,1,4,4,8,0,0.0,Detchd,2000.0,1.0,2.0,440.0,3.0,3.0,2,0,32,0,0,0,0,0.0,0.0,,0,6,2006,WD,Normal
430,528108140,2020,402861,20,RL,94.0,12220,1,0.0,4,Lvl,4,Inside,1,NridgHt,Norm,Norm,1Fam,1Story,10,5,2009,2009,Hip,CompShg,CemntBd,CmentBd,BrkFace,305.0,5.0,0.0,CBlock,6.0,0.0,0.0,6.0,1436.0,1.0,0.0,570.0,2006.0,GasA,5,1,SBrkr,2020,0,0,1.0,0.0,2,1,3,1,5,9,8,1,4.0,Attchd,2009.0,3.0,3.0,900.0,3.0,3.0,2,156,54,0,0,0,0,0.0,0.0,,0,9,2009,New,Partial
2900,916477010,1960,320000,20,RL,95.0,13618,1,0.0,4,Lvl,4,Corner,1,Timber,Norm,Norm,1Fam,1Story,8,5,2005,2006,Gable,CompShg,VinylSd,VinylSd,Stone,198.0,4.0,0.0,PConc,6.0,5.0,0.0,6.0,1350.0,1.0,0.0,378.0,1728.0,GasA,5,1,SBrkr,1960,0,0,1.0,0.0,2,0,3,1,4,8,8,2,4.0,Attchd,2005.0,3.0,3.0,714.0,3.0,3.0,2,172,38,0,0,0,0,0.0,0.0,,0,11,2006,New,Partial
1641,527256030,2234,441929,20,RL,85.0,14082,1,0.0,3,HLS,4,Inside,1,StoneBr,Norm,Norm,1Fam,1Story,8,5,2006,2006,Hip,CompShg,VinylSd,VinylSd,BrkFace,945.0,4.0,0.0,PConc,6.0,5.0,4.0,6.0,1558.0,1.0,0.0,662.0,2220.0,GasA,5,1,SBrkr,2234,0,0,1.0,0.0,1,1,1,1,4,7,8,1,4.0,Attchd,2006.0,2.0,2.0,724.0,3.0,3.0,2,390,80,0,0,0,0,0.0,0.0,,0,1,2007,WD,Normal
2713,905107070,1400,149500,60,RL,74.0,7844,1,0.0,4,Lvl,4,Inside,1,Sawyer,Norm,Norm,1Fam,2Story,6,7,1978,1978,Hip,CompShg,HdBoard,HdBoard,BrkFace,203.0,0.0,0.0,CBlock,0.0,0.0,0.0,5.0,209.0,1.0,0.0,463.0,672.0,GasA,3,1,SBrkr,672,728,0,0.0,0.0,1,1,3,1,3,6,8,1,3.0,Attchd,1978.0,3.0,2.0,440.0,3.0,3.0,2,0,0,0,0,0,0,0.0,0.0,,0,3,2006,WD,Normal


# References
- Notebook GitHub: https://github.com/anantgupta129/CatBoost-in-Python-ML/tree/master
- Catboost paper: https://arxiv.org/pdf/1706.09516.pdf
- Catboost paper 2: http://learningsys.org/nips17/assets/papers/paper_11.pdf
- Catboost library website: https://catboost.ai/
- Comprehensive notebook on catboost feature encoding paramters tunning: https://github.com/catboost/catboost/blob/master/catboost/tutorials/categorical_features/categorical_features_parameters.ipynb