# STEP 5: Model Building

* Splitting the data

* Defining the validation function

* Defining the base models and the stacking model

* Calculating the scores of the base models

* Fitting the models

* Blending Models


* ### **Splitting the data**

In [121]:
X = train
Y = y_train

# Partition the dataset into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=0)
print("X_train : " + str(X_train.shape))
print("X_test : " + str(X_test.shape))
print("y_train : " + str(y_train.shape))
print("y_test : " + str(y_test.shape))

X_train : (975, 222)
X_test : (481, 222)
y_train : (975,)
y_test : (481,)
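
The shapes above follow from how the split sizes are rounded. As a rough check, assuming `train_test_split` rounds the test share up and gives the remainder to the training set (my understanding of its behavior for a float `test_size`; the helper `split_sizes` is hypothetical and not part of scikit-learn):

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever is left.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# 975 + 481 rows in total, as reported by the shapes printed above
n_train, n_test = split_sizes(975 + 481, 0.33)
print(n_train, n_test)
```

With a 33% hold-out, roughly a third of the rows are set aside before any model is fit.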


* ### **Validation function**

In [57]:
n_folds = 5

def rmsle_cv(model):
    # Pass the KFold object itself as cv: .get_n_splits() returns only the
    # integer number of folds, which drops shuffle and random_state.
    kf = KFold(n_folds, shuffle=True, random_state=42)
    rmse = np.sqrt(-cross_val_score(model, train.values, y_train, scoring="neg_mean_squared_error", cv=kf))
    return rmse
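
A detail that is easy to miss when writing a CV helper like this one: `KFold(...).get_n_splits(...)` returns only the number of folds as an integer, and an integer `cv` makes `cross_val_score` fall back to unshuffled folds for a regressor. The sketch below illustrates the difference on a made-up 10-row array (`X_demo` is an assumption for illustration, not data from this notebook).

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(20).reshape(10, 2)  # toy data with 10 rows

kf = KFold(n_splits=5, shuffle=True, random_state=42)
n_splits = kf.get_n_splits(X_demo)  # just the integer 5; shuffle settings are not carried along

# Unshuffled folds (what an integer cv amounts to for a regressor) are contiguous blocks
plain_first_test = next(KFold(n_splits=5).split(X_demo))[1]

# Shuffled folds still use every row exactly once across the 5 test sets
shuffled_test_rows = np.sort(np.concatenate([test for _, test in kf.split(X_demo)]))

print(n_splits, plain_first_test, shuffled_test_rows)
```

Passing the `KFold` object itself as `cv` keeps the shuffling and the seed.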

In [122]:
kfolds = KFold(n_splits=10, shuffle=True, random_state=42)

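def rmsle(y, y_pred):
    # Plain RMSE; it matches RMSLE on the price scale only if y is the log-transformed target
    return np.sqrt(mean_squared_error(y, y_pred))

def cv_rmse(model, X=X):
    # Cross-validated RMSE over the folds in kfolds; uses the global target Y
    rmse = np.sqrt(-cross_val_score(model, X, Y, scoring="neg_mean_squared_error", cv=kfolds))
    return rmse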
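
The function is named `rmsle` although it computes a plain RMSE. The two coincide when the target has already been passed through `log1p`, which the later `np.expm1` call suggests but this section does not show. A small check of that identity in plain Python, using invented prices (all names and numbers below are illustrative assumptions):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def rmsle_raw(prices, predicted_prices):
    """RMSLE computed directly on raw (untransformed) values."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(prices, predicted_prices)]
    return math.sqrt(sum(squared) / len(squared))

prices = [208500.0, 181500.0, 223500.0]      # invented sale prices
predicted = [200000.0, 190000.0, 230000.0]   # invented predictions

log_prices = [math.log1p(v) for v in prices]
log_predicted = [math.log1p(v) for v in predicted]

# RMSE on the log scale and RMSLE on the raw scale are the same quantity
print(rmse(log_prices, log_predicted), rmsle_raw(prices, predicted))
```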

* ### **Base models**

In [123]:
alphas_r = [12.3, 14.5, 14.6, 14.7, 14.8, 14.9, 15, 15.1, 15.2, 15.3, 15.4, 15.5]
alphas1 = [1.0, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 0.0008]
alphas2 = [0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 1.0]
l1ratio_en = [0.6, 0.8, 0.85, 0.9, 0.95, 0.99, 1]

* **Ridge**

In [124]:
Ridge = make_pipeline(RobustScaler(), RidgeCV(alphas=alphas_r, cv=kfolds))
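
After fitting, it can be useful to check which `alpha` the cross-validation picked, for example to see whether it sits at the edge of the grid. A minimal sketch on synthetic data (the data, the short grid, and the `demo_` names are assumptions for illustration; `make_pipeline` names each step after its lowercased class name):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.3, size=120)

demo_alphas = [0.1, 1.0, 10.0]
demo_folds = KFold(n_splits=5, shuffle=True, random_state=42)
demo_ridge = make_pipeline(RobustScaler(), RidgeCV(alphas=demo_alphas, cv=demo_folds))
demo_ridge.fit(X_demo, y_demo)

# The fitted RidgeCV step exposes the selected regularization strength as alpha_
chosen_alpha = demo_ridge.named_steps["ridgecv"].alpha_
print("selected alpha:", chosen_alpha)
```

If the selected value is the smallest or largest entry, widening the grid in that direction may be worth a try.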

* **Lasso**

In [125]:
Lasso = make_pipeline(RobustScaler(), LassoCV(alphas=alphas1, max_iter=2000, cv=kfolds, random_state=45))

* **ElasticNet**

In [126]:
# max_iter is written as an integer; recent scikit-learn releases reject a float such as 1e7
ElasNet = make_pipeline(RobustScaler(), ElasticNetCV(alphas=alphas2, max_iter=10000000, cv=kfolds, l1_ratio=l1ratio_en))


* **Gradient Boosting Regressor**

In [127]:
GBR = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.05, max_depth=6,
                                min_samples_split=10, min_samples_leaf=15,
                                max_features='sqrt', loss='huber', random_state=45)

* **XGB Regressor**

In [128]:
# 'reg:linear' is deprecated in XGBoost in favor of the equivalent 'reg:squarederror'
XGBoost = xgb.XGBRegressor(objective='reg:squarederror', colsample_bytree=0.3, learning_rate=0.01,
                           max_depth=5, alpha=10, n_estimators=3400)

* **Support Vector Regressor**

In [129]:
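# Note: this rebinds the name SVR from the scikit-learn class to the pipeline,
# so re-running this cell requires re-importing SVR first.
SVR = make_pipeline(RobustScaler(), SVR(C=20, epsilon=0.008, gamma=0.0003))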

* **LightGBM**

In [130]:
LGBM = LGBMRegressor(objective='regression',
                     num_leaves=5,
                     learning_rate=0.05,
                     n_estimators=5000,
                     max_bin=200,
                     bagging_fraction=0.75,
                     bagging_freq=5,
                     bagging_seed=7,
                     feature_fraction=0.2,
                     feature_fraction_seed=7,
                     verbose=-1)

### Stacked Regressor

In [131]:
Stack_reg = StackingCVRegressor(regressors=(Ridge, Lasso, ElasNet, GBR, LGBM),
                                meta_regressor=XGBoost,
                                use_features_in_secondary=True)
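
To make the idea behind the stacked regressor concrete, here is a hand-rolled sketch of cross-validated stacking using only scikit-learn. It is not mlxtend's implementation: the base models, the linear meta-model, and the synthetic data are stand-ins chosen for brevity. Appending the original features to the meta-model's inputs is meant to mirror what `use_features_in_secondary=True` asks for.

```python
import numpy as np
from sklearn import linear_model as lm
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

base_models = [lm.Ridge(alpha=1.0), lm.Lasso(alpha=0.01)]
folds = KFold(n_splits=5, shuffle=True, random_state=42)

# Level 1: each base model predicts only rows it was not trained on
oof_preds = np.column_stack(
    [cross_val_predict(model, X_demo, y_demo, cv=folds) for model in base_models]
)

# Level 2: the meta-model sees the original features plus the out-of-fold predictions
meta_inputs = np.hstack([X_demo, oof_preds])
meta_model = lm.LinearRegression().fit(meta_inputs, y_demo)

# For new data, the base models are refit on all rows and feed the meta-model
for model in base_models:
    model.fit(X_demo, y_demo)

def stacked_predict(X_new):
    base_preds = np.column_stack([model.predict(X_new) for model in base_models])
    return meta_model.predict(np.hstack([X_new, base_preds]))

print(stacked_predict(X_demo[:3]))
```

Training the meta-model on out-of-fold predictions keeps it from simply rewarding whichever base model overfits the training rows most.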

* ### **Calculating the Scores of Base models**

In [132]:

score = cv_rmse(Ridge)  # computed but never reported: the next line overwrites it
score = cv_rmse(Lasso)
print("LASSO: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))


Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.7863473588197483, tolerance: 0.01834309762588616


Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.5345952394811135, tolerance: 0.0181213487378757


Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.7864717851423801, tolerance: 0.01861794492996395



LASSO: 0.1112 (0.0106)



In [133]:

score = cv_rmse(ElasNet)
print("Elastic Net: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))


Elastic Net: 0.1112 (0.0106)



In [134]:

score = cv_rmse(SVR)
print("SVR: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))

SVR: 0.1122 (0.0117)



In [135]:
score = cv_rmse(XGBoost)
print("XGBoost: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))

XGBoost: 0.1373 (0.0128)



In [136]:

score = cv_rmse(GBR)
print("GBR: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))

GBR: 0.1140 (0.0151)



In [137]:

score = cv_rmse(LGBM)
print("LGBM: {:.4f} ({:.4f})\n".format(score.mean(), score.std()))

LGBM: 0.1176 (0.0130)



* ### **Fitting the models**

In [138]:
print('stacking_model')

stacking_model = Stack_reg.fit(np.array(X), np.array(Y))

stacking_model


In [139]:

print('ElasticNet')

ElasNet_model = ElasNet.fit(X, Y)

ElasticNet


In [140]:
print('lasso')
lasso_model = Lasso.fit(X, Y)

lasso


In [141]:
print('Ridge')
Ridge_model = Ridge.fit(X, Y)

Ridge


In [142]:

print('Svr')
SVR_model = SVR.fit(X, Y)

Svr


In [143]:
print('GradientBoosting')
GBR_model = GBR.fit(X, Y)

GradientBoosting


In [144]:
print('xgboost')

XGBoost_model = XGBoost.fit(X, Y)

xgboost


In [145]:
print('lightgbm')
LGBM_model = LGBM.fit(X, Y)

lightgbm


* ### **Blending Models**

In [146]:
def blend_models_predict(X):
    # Weighted average of the fitted models; the weights sum to 1.0
    return ((0.1 * ElasNet_model.predict(X)) +
            (0.05 * lasso_model.predict(X)) +
            (0.1 * Ridge_model.predict(X)) +
            (0.1 * SVR_model.predict(X)) +
            (0.1 * GBR_model.predict(X)) +
            (0.15 * XGBoost_model.predict(X)) +
            (0.1 * LGBM_model.predict(X)) +
            (0.3 * stacking_model.predict(np.array(X))))
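
A blend like this is a proper weighted average only if the weights sum to 1. A small check in plain Python using the weights from the function above; the prediction values and the `blend` helper are invented for illustration:

```python
import math

weights = {
    "ElasNet": 0.1, "Lasso": 0.05, "Ridge": 0.1, "SVR": 0.1,
    "GBR": 0.1, "XGBoost": 0.15, "LGBM": 0.1, "Stack": 0.3,
}
weight_total = sum(weights.values())

def blend(predictions, weights):
    """Weighted average of per-model prediction lists (all the same length)."""
    n_rows = len(next(iter(predictions.values())))
    return [sum(weights[name] * predictions[name][i] for name in weights)
            for i in range(n_rows)]

# Hypothetical log-scale predictions for two rows; every model agrees here,
# so the blend should return the same values if the weights sum to 1
toy_predictions = {name: [12.0, 11.5] for name in weights}
blended = blend(toy_predictions, weights)
print(weight_total, blended)
```

Keeping the weights in one dictionary makes it harder to change one of them and forget to rebalance the rest.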

In [147]:
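# Scored on the same rows the models were fit on, so this is a training error, not a validation estimate
rmsle(Y, blend_models_predict(X))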

0.07454261205317905
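
Because the blend is scored on the rows it was trained on, the figure above is likely optimistic. The `X_train`/`X_test` split created earlier could serve as a held-out check. A self-contained illustration of the gap on synthetic data, using a single unpruned decision tree as a stand-in for the blended models (the data and all names below are assumptions):

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo[:, 0] + rng.normal(scale=0.5, size=300)  # signal plus noise

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=0)

# An unpruned tree memorizes its training rows, which exaggerates the effect
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

train_rmse = np.sqrt(mean_squared_error(y_tr, tree.predict(X_tr)))
test_rmse = np.sqrt(mean_squared_error(y_te, tree.predict(X_te)))
print("train RMSE:", train_rmse, "held-out RMSE:", test_rmse)
```

The same pattern applies to the blend: fit every model on `X_train`, `y_train` and score `blend_models_predict(X_test)` against `y_test`.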

### Submission

In [None]:
submission_results = pd.read_csv("../input/house-prices-advanced-regression-techniques/sample_submission.csv")

In [None]:
submission_results.iloc[:,1] = np.floor(np.expm1(blend_models_predict(test)))

submission_results.to_csv('submission_results.csv', index=False)
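
`np.expm1` undoes a `log1p` transform, so the assignment above presumably maps log-scale predictions back to prices; the forward transform itself is not shown in this section. A quick round trip with Python's `math` equivalents (the price is an invented example):

```python
import math

price = 208500.0
log_price = math.log1p(price)      # forward transform: log(1 + price)
recovered = math.expm1(log_price)  # inverse transform: exp(x) - 1

# The round trip is exact only up to floating-point error, so floor() may
# land one unit below the intended value, while round() does not.
print(recovered, math.floor(recovered), round(recovered))
```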

[Stacking Models for Improved Predictions](https://www.kdnuggets.com/2017/02/stacking-models-imropved-predictions.html)