## Dataset

#### Importing pandas

In [1]:
import pandas as pd

#### Loading the dataset as a pandas DataFrame

The dataset used is the [Iowa Housing Dataset](http://jse.amstat.org/v19n3/decock.pdf), a public dataset that can be used for exploration and research. <br/>
The training and testing datasets are already provided as separate files.

In [2]:
X_full = pd.read_csv('./dataset/train.csv', index_col='Id')
X_test_full = pd.read_csv('./dataset/test.csv', index_col='Id')
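
As a small illustration of what `index_col='Id'` does when loading, the sketch below reads a tiny in-memory CSV instead of the real files. The three rows and the variable names `csv_text` and `df` are made up for this example and are not part of the notebook.

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for train.csv (illustrative rows only).
csv_text = """Id,LotArea,YearBuilt
619,11694,2007
871,6600,1962
93,13360,1921
"""

# index_col='Id' turns the Id column into the row index
# instead of keeping it as an ordinary feature column.
df = pd.read_csv(io.StringIO(csv_text), index_col='Id')

print(df.index.name)           # name of the index
print(list(df.columns))        # remaining columns
print(df.loc[871, 'LotArea'])  # rows can now be looked up by Id
```

Because `Id` becomes the index, it is not accidentally used as a feature later on.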

#### Defining the ```target``` and ```features```

In [3]:
y = X_full.SalePrice

features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

X = X_full[features].copy()
X_test = X_test_full[features].copy()

#### Setting aside part of the training dataset as a validation dataset

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.8, test_size=0.2, random_state=0)
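
To make the effect of `train_size=0.8`, `test_size=0.2`, and `random_state=0` concrete, here is a minimal sketch on a ten-row toy dataset. The names `toy_X`, `toy_y`, and the values in them are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten toy rows; the values are arbitrary and only serve to count rows.
toy_X = pd.DataFrame({'LotArea': range(1000, 1010)})
toy_y = pd.Series(range(10), name='SalePrice')

X_tr, X_va, y_tr, y_va = train_test_split(
    toy_X, toy_y, train_size=0.8, test_size=0.2, random_state=0
)

# 80% of the rows go to training, 20% to validation.
print(len(X_tr), len(X_va))

# Features and target stay aligned: they share the same row labels.
print(list(X_tr.index) == list(y_tr.index))

# Fixing random_state makes the split reproducible.
X_tr2, X_va2, _, _ = train_test_split(
    toy_X, toy_y, train_size=0.8, test_size=0.2, random_state=0
)
print(list(X_va.index) == list(X_va2.index))
```

Fixing `random_state` matters here because every model is later compared on the same validation rows.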

#### Displaying the dataset

In [5]:
X_train.head()

| Id | LotArea | YearBuilt | 1stFlrSF | 2ndFlrSF | FullBath | BedroomAbvGr | TotRmsAbvGrd |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 619 | 11694 | 2007 | 1828 | 0 | 2 | 3 | 9 |
| 871 | 6600 | 1962 | 894 | 0 | 1 | 2 | 5 |
| 93 | 13360 | 1921 | 964 | 0 | 1 | 2 | 5 |
| 818 | 13265 | 2002 | 1689 | 0 | 2 | 3 | 7 |
| 303 | 13704 | 2001 | 1541 | 0 | 2 | 3 | 6 |


## Model

#### Importing RandomForestRegressor

In [6]:
from sklearn.ensemble import RandomForestRegressor

#### Preparing several models with different configurations

In [7]:
model_1 = RandomForestRegressor(n_estimators=50, random_state=0)
model_2 = RandomForestRegressor(n_estimators=100, random_state=0)
model_3 = RandomForestRegressor(n_estimators=100, criterion='absolute_error', random_state=0)  # 'mae' was renamed to 'absolute_error' in scikit-learn 1.0
model_4 = RandomForestRegressor(n_estimators=200, min_samples_split=20, random_state=0)
model_5 = RandomForestRegressor(n_estimators=100, max_depth=7, random_state=0)

models = [model_1, model_2, model_3, model_4, model_5]

#### Measuring each model's performance with MAE

In [8]:
from sklearn.metrics import mean_absolute_error

In [13]:
def score_model(model, X_t=X_train, X_v=X_val, y_t=y_train, y_v=y_val):
    model.fit(X_t, y_t)
    y_hat = model.predict(X_v)
    return mean_absolute_error(y_v, y_hat)
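
For reference, MAE is the mean of the absolute differences between the true and predicted values. The sketch below computes it by hand on three made-up prices and compares the result with `mean_absolute_error`; the arrays `y_true` and `y_pred` are hypothetical and unrelated to the housing data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up true and predicted prices, for illustration only.
y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

# Absolute errors are 10, 10 and 30; MAE is their mean.
manual_mae = np.mean(np.abs(y_true - y_pred))
sklearn_mae = mean_absolute_error(y_true, y_pred)

print(manual_mae, sklearn_mae)  # both are about 16.67
```

Since MAE is expressed in the same unit as the target, the scores reported for each model can be read directly as an average error in sale price.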

In [15]:
for i, model in enumerate(models, start=1):
    mae = score_model(model)
    print(f'Model {i} MAE: {int(mae)}')

Model 1 MAE: 24015
Model 2 MAE: 23740
Model 3 MAE: 23528
Model 4 MAE: 23996
Model 5 MAE: 23706
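
One possible way to read these results is to pick the configuration with the lowest MAE. The sketch below does this using the numbers printed above; the dictionary `reported_mae` and the variable names are assumptions for illustration, not part of the notebook.

```python
# MAE values copied from the output above (truncated to integers there).
reported_mae = {
    'Model 1': 24015,
    'Model 2': 23740,
    'Model 3': 23528,
    'Model 4': 23996,
    'Model 5': 23706,
}

# Lower MAE is better, so take the key with the smallest value.
best_name = min(reported_mae, key=reported_mae.get)
spread = max(reported_mae.values()) - min(reported_mae.values())

print(best_name, reported_mae[best_name])  # Model 3 23528
print(spread)                              # 487
```

The gap between the best and worst score is under 500, and it comes from a single train/validation split, so the ranking should be treated as indicative rather than conclusive.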
