In [None]:
from warnings import filterwarnings
filterwarnings('ignore')

# **Predicting Audi Car Prices**
<span id="0"></span>
1. [Overview](#1)
1. [Importing Modules](#2)
1. [Defining an Evaluation Table](#3)
1. [Creating a Model Evaluation Function and Adjusted $R^{2}$ Function](#4)
1. [Reading the Dataset](#5)
1. [Take a quick look at the data set](#6)
1. [Preprocessing and Visualization](#7)
    * [Handling The Categorical Features](#8)
    * [Visualizing](#9)
    * [Scaling](#10)    
    * [Splitting data into training set and test set](#11)
1. [Linear Regression](#12)
1. [Regularized Linear Models](#13)
    * [Ridge Regression](#14)
    * [Lasso Regression](#15)
    * [Elastic Net](#16)
1. [Polynomial Regression](#17)
1. [Support Vector Machine (SVM)](#18)  
1. [Decision Tree Regressor](#19)
1. [Random Forest Regressor](#20)    
1. [AdaBoost](#21)
1. [Gradient Boosting](#22)
1. [XGBoost (Extreme Gradient Boosting)](#23)
1. [Voting Regressor](#24)
1. [Evaluation Table](#25)
1. [Conclusion](#26)
1. [Resources](#27)

# <span id="1"></span> **Overview**
<hr/>

Welcome to my kernel. I am learning machine learning and wrote this kernel as practice. In it, I use various regression models to predict Audi car prices and briefly explain each model I use.

If you have a question or feedback, do not hesitate to write. Thanks 🙂

<img src="https://carfromjapan.com/wp-content/uploads/2016/10/audi-tax-free-cars.jpg" title="source: https://carfromjapan.com/" />

# <span id="2"></span> **Importing Modules**
#### [Return Contents](#0)
<hr/>

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split,cross_val_score,GridSearchCV
from sklearn.metrics import r2_score,mean_squared_error

# <span id="3"></span> **Defining an Evaluation Table**
#### [Return Contents](#0)
<hr/>

I will train many models, so I created a dataframe to compare their evaluation metrics side by side. It records the **Root Mean Squared Error (RMSE)**, **R-squared**, and **Adjusted R-squared** on the training and test sets, plus the **mean R-squared** and the **mean RMSE obtained by 5-fold cross-validation**. These are the key metrics for comparing different models. In general, an R-squared closer to one and a smaller RMSE indicate a better fit.
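
To make the two main metrics concrete, here is a minimal sketch that computes RMSE and R-squared with scikit-learn. The prices and predictions are hypothetical numbers chosen only for illustration; they are not taken from the Audi dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true prices and model predictions (illustration only)
y_true = np.array([10000.0, 15000.0, 20000.0, 25000.0])
y_pred = np.array([11000.0, 14000.0, 21000.0, 24000.0])

# RMSE is the square root of the mean squared error, in the same unit as the target
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R-squared is the share of the target's variance explained by the predictions
r2 = r2_score(y_true, y_pred)

print(rmse, r2)
```

Every prediction here is off by exactly 1000, so the RMSE is 1000, while R-squared stays close to one because the errors are small compared with the spread of the prices.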

In [None]:
evaluation = pd.DataFrame({
    "Model":[],
    "R2 Score (train)":[],
    "Adjusted R2 Score (train)":[],
    "R2 Score (test)":[],
    "Adjusted R2 Score (test)":[],
    "Root Mean Squared Error(RMSE) (train)":[],
    "Root Mean Squared Error(RMSE) (test)":[],
    "R2 Score (5-Fold Cross Validation)":[],
    "Root Mean Squared Error(RMSE) (5-Fold Cross Validation)":[]
})

# <span id="4"></span>**Creating a Model Evaluation Function and Adjusted $R^{2}$ Function**
#### [Return Contents](#0)
<hr/>

$R^{2}$ never decreases when more features are added, even if the new features are uninformative. Therefore, we need a stricter metric to compare models with different numbers of features. Adjusted $R^{2}$ penalizes the number of features: it increases only if a new feature improves the fit more than would be expected by chance. This makes it a better metric than $R^{2}$ for comparing such models.

So I created a function to calculate the $R^{2}$ score, the adjusted $R^{2}$ score, and the Root Mean Squared Error on the model's training set and test set.
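
As a small worked example, the sketch below applies the usual adjusted $R^{2}$ formula, $1-\frac{(1-R^{2})(n-1)}{n-k-1}$, to the same $R^{2}$ with a small and a large number of features. The helper name `adjusted_r2` and the numbers are assumptions made for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n instances and k features (hypothetical helper)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2 = 0.90  # the same plain R2 in both cases

few_features = adjusted_r2(r2, n=100, k=5)
many_features = adjusted_r2(r2, n=100, k=50)

print(round(few_features, 3), round(many_features, 3))
```

With 100 instances, the same $R^{2}$ of 0.90 is adjusted to roughly 0.895 with 5 features but to roughly 0.798 with 50 features, which shows how the penalty grows with the feature count.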

In [None]:
# Function to calculate the adjusted R2 score,
# where n is the number of instances and k is the number of features.
# The first parameter is named r2 so that it does not shadow sklearn's r2_score.
def adjustedR2(r2,n,k):
    return 1-(((1-r2)*(n-1))/(n-k-1))

In [None]:
# Model Evaluation Function

def evaluateModel(model,X_train,X_test,y_train,y_test,model_name):
    
    # shape[1] is the number of features (k) for both DataFrames and NumPy arrays,
    # so the polynomial case needs no special handling
    n_train=X_train.shape[1]
    n_test=X_test.shape[1]
        
    y_predict_test = model.predict(X_test)
    y_predict_train = model.predict(X_train)
    
    r2_score_train = float(format(r2_score(y_train,y_predict_train),'.3f'))
    
    r2_score_test = float(format(r2_score(y_test,y_predict_test),'.3f'))
    
    rmse_train = float(format(np.sqrt(mean_squared_error(y_train,y_predict_train)),'.3f'))
    
    rmse_test = float(format(np.sqrt(mean_squared_error(y_test,y_predict_test)),'.3f'))
    
    ad_r2_score_train = float(format(adjustedR2(r2_score_train,X_train.shape[0],n_train),'.3f'))
    
    ad_r2_score_test = float(format(adjustedR2(r2_score_test,X_test.shape[0],n_test),'.3f'))
                              
    r2_score_mean = float(format(cross_val_score(model,X_train,y_train,cv=5).mean(),'.3f'))
                              
    rmse_mean = -float(format(cross_val_score(model,X_train,y_train,cv=5,scoring="neg_root_mean_squared_error").mean(),'.3f'))
    
    r = evaluation.shape[0]
    evaluation.loc[r]=[model_name,
                       r2_score_train,ad_r2_score_train,
                       r2_score_test,ad_r2_score_test,
                       rmse_train,rmse_test,
                      r2_score_mean,
                       rmse_mean]
    
    return evaluation.sort_values(by = 'Root Mean Squared Error(RMSE) (5-Fold Cross Validation)', ascending=True)

# <span id="5"></span> **Reading the Dataset**
#### [Return Contents](#0)
<hr/>

In [None]:
# read and load data
df = pd.read_csv("../input/used-car-dataset-ford-and-mercedes/audi.csv")

# <span id="6"></span> **Take a quick look at the data set**
#### [Return Contents](#0)
<hr/>

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.describe()

# <span id="7"></span>**Preprocessing and Visualization**
#### [Return Contents](#0)
<hr/>

I've done some visualization and preprocessing in this section.

I created a copy of df so that my operations do not affect the actual data set:

In [None]:
df_2=df.copy()

## <span id="8"></span>**Handling The Categorical Features**
#### [Return Contents](#0)
<hr/>

Most machine learning algorithms operate on numbers. Therefore, categorical features must be converted into numbers. For this purpose, I one-hot encode the categorical features (`model`, `transmission`, `fuelType`) with the `pd.get_dummies()` function.
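
Here is a minimal sketch of what `pd.get_dummies()` does, using a tiny hypothetical frame rather than the real data. Each category becomes its own indicator column, and every row has exactly one active indicator.

```python
import pandas as pd

# Tiny hypothetical sample of the fuelType column (illustration only)
toy = pd.DataFrame({"fuelType": ["Petrol", "Diesel", "Hybrid", "Diesel"]})

# One indicator column per category, in sorted order of the category names
dummies = pd.get_dummies(toy["fuelType"])

print(dummies)
```

Because the indicator columns of one feature always sum to one per row, they carry redundant information, which is why such columns tend to be strongly correlated with each other.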

In [None]:

df_2=pd.concat((df_2,pd.get_dummies(df_2["model"]),pd.get_dummies(df_2["transmission"]),pd.get_dummies(df_2["fuelType"]))
               ,axis=1)

df_2.drop(["model","transmission","fuelType"],axis=1,inplace=True)

## <span id="9"></span>**Visualizing**
#### [Return Contents](#0)
<hr/>

In [None]:
# correlation

plt.figure(figsize=(30,30))
sns.heatmap(df_2.corr(),annot=True)
plt.show()

As the correlation matrix shows, the Diesel and Petrol columns are very strongly (negatively) correlated, so let's remove one of them:

In [None]:
df_2.drop("Petrol",axis=1,inplace=True)

In [None]:
#  Count plot on fuel type

fig, ax1 = plt.subplots(figsize=(5,4))
graph = sns.countplot(ax=ax1,x='fuelType', data=df)
graph.set_xticklabels(graph.get_xticklabels())
for p in graph.patches:
    height = p.get_height()
    graph.text(p.get_x()+p.get_width()/2., height/2,height ,ha="center",fontsize=10)

As the count plot on fuel type shows, there are very few hybrid cars, so let's drop the Hybrid column:

In [None]:
df_2.drop("Hybrid",axis=1,inplace=True)

In [None]:
# Count plot on the transmission

fig, ax1 = plt.subplots(figsize=(5,4))
graph = sns.countplot(ax=ax1,x='transmission', data=df)
graph.set_xticklabels(graph.get_xticklabels())
for p in graph.patches:
    height = p.get_height()
    graph.text(p.get_x()+p.get_width()/2., height/2,height ,ha="center",fontsize=8)

In [None]:
# Pairplotting

sns.pairplot(df)
plt.show()

I deleted the values I considered outliers:

In [None]:
df_2.drop(index=df_2[df_2["mileage"]>160000].index,inplace=True)
df_2.drop(index=df_2[df_2["year"]<2000].index,inplace=True)

df_2.drop(index=df_2[df_2["engineSize"]==0].index,inplace=True)

## <span id="10"></span>**Scaling**
#### [Return Contents](#0)
<hr/>

If the features have very different scales, features with large values can dominate the others. This can adversely affect the performance of many models. It is therefore necessary to scale the features.
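
The effect of standardization can be sketched on a tiny hypothetical array with a year-like column and a mileage-like column. After `StandardScaler`, each column has mean 0 and standard deviation 1, so neither dominates because of its unit.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [year, mileage] on very different scales (illustration only)
X = np.array([[2015.0, 60000.0],
              [2017.0, 30000.0],
              [2019.0, 10000.0]])

# Subtract each column's mean and divide by its standard deviation
scaled = StandardScaler().fit_transform(X)

print(scaled.mean(axis=0), scaled.std(axis=0))
```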

In [None]:
stdn_scaler = StandardScaler().fit(df_2[["year","mileage","tax","mpg","engineSize"]])

df_2[["year","mileage","tax","mpg","engineSize"]] = stdn_scaler.transform(df_2[["year","mileage","tax","mpg","engineSize"]])

## <span id="11"></span> Splitting data into training set and test set
#### [Return Contents](#0)
<hr/>

In [None]:
train,test = train_test_split(df_2,test_size=0.25,random_state=42)

X_train = train.drop("price",axis=1)
y_train = train["price"]

X_test = test.drop("price",axis=1)
y_test = test["price"]

# <span id="12"></span>Linear Regression
#### [Return Contents](#0)
<hr/>

Linear regression models the linear relationship between two or more numerical variables.
<br>
The input variables are called independent variables (or features), and the result variable is called the dependent variable (or target).
<br>
Usually the inputs form a matrix and the outputs form a vector.
<br>
If the model uses a single independent variable, it is called simple linear regression. If it uses two or more independent variables, it is called multiple linear regression.

<b>Simple Linear Regression Equation</b>
$$Y=\beta_{0}+\beta_{1}x_{1}$$

<b>Multiple Linear Regression Equation</b>
$$Y=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+...+\beta_{n}x_{n}$$

A linear model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias term (also called the intercept term).

<b>Linear Regression model prediction</b>

$$\hat{y}=\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+...+\theta_{n}x_{n}$$

In this equation:
* $\hat{y}$ is the predicted value.
* $n$ is the number of features.
* $x_{i}$  is the $i^{th}$ feature value
* $\theta_{j}$ is the $j^{th}$ model parameter (including the bias term $\theta_{0}$ and the feature weights $\theta_{1}$, $\theta_{2}$, $\dots$, $\theta_{n}$).

That's the Linear Regression model, but how do we train it? Training a model means setting its parameters so that the model best fits the training set. For this purpose, we first need a measure of how well (or poorly) the model fits the training data. The most common performance measure of a regression model is the Root Mean Square Error (RMSE). Therefore, to train a Linear Regression model, we need to find the value of $\theta$ that minimizes the RMSE.

<b>Root Mean Square Error (RMSE)</b>
$$RMSE(X, h) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(\theta^{T}x^{(i)}-y^{(i)})^{2}}$$

Once we find the model parameters that minimize the RMSE, the model is trained. So how do we find them? Common approaches (these will not be covered in detail in this kernel) are:
* Normal Equation
* Singular Value Decomposition (SVD)
* Gradient Descent
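
Although these approaches are not covered here, a minimal NumPy sketch may help show that the first two lead to the same parameters. It uses synthetic data with known weights (an assumption for illustration), solves the Normal Equation $\hat{\theta}=(X^{T}X)^{-1}X^{T}y$ directly, and compares the result with `np.linalg.lstsq`, which relies on an SVD-based solver.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 4 + 3*x1 - 2*x2 + small noise (illustration only)
X = rng.normal(size=(100, 2))
y = 4.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Add the constant feature x0 = 1 so that theta_0 acts as the bias term
X_b = np.c_[np.ones((100, 1)), X]

# Normal Equation: closed-form solution of the least-squares problem
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y

# Least squares through NumPy's SVD-based solver
theta_svd, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(theta_normal, theta_svd)
```

Both results should be close to the true parameters (4, 3, -2) used to generate the data.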

Now let's create a linear regression model:

In [None]:
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression().fit(X_train,y_train)

We created the linear regression model. Now let's evaluate our model.

In [None]:
evaluateModel(lin_reg,X_train,X_test,y_train,y_test,"Linear Regression")

# <span id="13"></span>Regularized Linear Models
#### [Return Contents](#0)
<hr/>

A good way to reduce overfitting is to regularize the model. For a linear model, regularization is typically achieved by constraining the weights of the model.

## <span id="14"></span>Ridge Regression
#### [Return Contents](#0)
<hr/>

Ridge Regression (also called Tikhonov regularization) is a regularized version of Linear Regression: a regularization term equal to $\frac{\alpha}{2}\sum_{i=1}^{n}\theta_{i}^{2}$ is added to the cost function.
<br>
The purpose of this term is to keep the model weights as small as possible.
<br>
The hyperparameter $\alpha$ controls how much you want to regularize the model:
* If $\alpha=0$, then Ridge Regression is just Linear Regression.
* If $\alpha$ is very large, then all weights end up very close to zero.

<b>Ridge Regression cost function</b>
$$J(\theta)=MSE(\theta)+\alpha\frac{1}{2}\sum_{i=1}^{n}\theta_{i}^2$$

### Important Warning

It is important to scale the data before performing Ridge Regression, as it is sensitive to the scale of the input features. This is true of most regularized models.
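
The effect of $\alpha$ can be sketched on synthetic data (an assumption for illustration, not the Audi data). Note that scikit-learn's `Ridge` scales its penalty differently from the cost function above (it minimizes $\|y-Xw\|^{2}+\alpha\|w\|^{2}$), so the $\alpha$ values are not directly comparable, but the shrinking behaviour is the same.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data with known weights (5, -3, 2) plus noise
X = rng.normal(size=(100, 3))
y = X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=0.5, size=100)

alphas = [0.01, 100.0, 1_000_000.0]

# Length of the weight vector for increasingly strong regularization
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]

for a, norm in zip(alphas, norms):
    print(a, norm)
```

The weight norm shrinks as $\alpha$ grows, and with the largest value the weights are practically zero.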

-----------

Now let's create a ridge regression model:

In [None]:
from sklearn.linear_model import Ridge
ridge_reg=Ridge().fit(X_train,y_train)

We created the model. Now let's evaluate our model.

In [None]:
evaluateModel(ridge_reg,X_train,X_test,y_train,y_test,"Ridge Regression")

## <span id="15"></span>Lasso Regression
#### [Return Contents](#0)
<hr/>

Least Absolute Shrinkage and Selection Operator Regression (usually simply called Lasso Regression) is another regularized version of Linear Regression: just like Ridge Regression, it adds a regularization term to the cost function.

<b>Lasso Regression regularization term</b>
$$\alpha\sum_{i=1}^{n}|\theta_{i}|$$

<b>Lasso Regression cost function</b>
$$J(\theta)=MSE(\theta)+\alpha\sum_{i=1}^{n}|\theta_{i}|$$

An important characteristic of Lasso Regression is that it tends to eliminate the weights of the least important features (i.e., set them to zero). 
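
This behaviour can be sketched on synthetic data in which only the first feature influences the target and the second is pure noise (both the data and the value of `alpha` are assumptions for illustration).

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# The target depends only on the first feature; the second is irrelevant
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)

# First weight: shrunk but clearly non-zero; second weight: exactly zero
print(lasso.coef_)
```

The weight of the irrelevant feature is set exactly to zero, so Lasso effectively performs feature selection. The weight of the useful feature is also shrunk somewhat below its true value of 3.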

Now let's create a lasso regression model:

In [None]:
from sklearn.linear_model import Lasso
lasso_reg = Lasso().fit(X_train,y_train)

We created the model. Now let's evaluate our model.

In [None]:
evaluateModel(lasso_reg,X_train,X_test,y_train,y_test,"Lasso Regression")

## <span id="16"></span>Elastic Net
#### [Return Contents](#0)
<hr/>

Elastic Net is a middle ground between Ridge Regression and Lasso Regression. The regularization term is a simple mix of both Ridge and Lasso’s regularization terms, and you can control the mix ratio r. When r = 0, Elastic Net is equivalent to Ridge Regression, and when r = 1, it is equivalent to Lasso Regression.

<b>Elastic Net cost function</b>
$$J(\theta)=MSE(\theta)+r\alpha\sum_{i=1}^{n}|\theta_{i}|+\frac{1-r}{2}\alpha\sum_{i=1}^{n}\theta_{i}^2$$
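
In scikit-learn, the mix ratio $r$ corresponds to the `l1_ratio` parameter of `ElasticNet`. The sketch below, on synthetic data chosen for illustration, checks that `l1_ratio=1` reproduces the Lasso weights and that a mixed penalty gives different ones.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)

# Synthetic data with true weights (2, 0, -1) plus noise
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)

# r = 1: only the L1 (Lasso) part of the penalty remains
enet_as_lasso = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)

# r = 0.5: an equal mix of the L1 and L2 penalties
enet_mixed = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print(lasso.coef_, enet_as_lasso.coef_, enet_mixed.coef_)
```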

Now let's create an Elastic Net regression model:

In [None]:
from sklearn.linear_model import ElasticNet
elastic_reg = ElasticNet().fit(X_train,y_train)

We created the model. Now let's evaluate our model.

In [None]:
evaluateModel(elastic_reg,X_train,X_test,y_train,y_test,"Elastic Net Regression")

Let's try a few parameters for Elastic Net regression with GridSearch and choose the best model:

In [None]:
params={"alpha":[0.0001,0.001,0.01,0.1,1,10,100,1000,10000],"max_iter":[1000,2500,5000,7500,10000],
       "l1_ratio":[0.3,0.4,0.5,0.6,0.7]}

elastic_gs=GridSearchCV(ElasticNet(random_state=42),param_grid=params,cv=5,scoring="neg_root_mean_squared_error")
elastic_gs.fit(X_train,y_train)

elastic_reg_best=elastic_gs.best_estimator_.fit(X_train,y_train)

evaluateModel(elastic_reg_best,X_train,X_test,y_train,y_test,"Best Elastic Net Regression")

# <span id="17"></span>Polynomial Regression
#### [Return Contents](#0)
<hr/>

For linear models, the main idea is to fit a straight line to the data. What if your data is more complex than a straight line? Surprisingly, you can use a linear model to fit nonlinear data. A simple way to do this is to add powers of each feature as new features, then train a linear model on this extended set of features. This technique is called **Polynomial Regression**.
We should be very careful when choosing the degree of the polynomial transformation, because a high degree might cause overfitting.
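
As a minimal sketch of the transformation, the example below expands a single hypothetical row with two features, a = 2 and b = 3, to degree 2. Besides the powers of each feature, scikit-learn's `PolynomialFeatures` also adds the interaction term a·b.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One hypothetical row with two features: a = 2, b = 3
X = np.array([[2.0, 3.0]])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Column order for degree 2 without bias: a, b, a^2, a*b, b^2
print(X_poly)
```

Two original features become five, which hints at how quickly the number of columns grows with more features or a higher degree.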

We have 35 features. With degree = 3, the transformed data would have 8436 columns (8435 without the bias column), and training on it would take a long time, so I used only selected features instead of all of them.

### Note

PolynomialFeatures(degree=d) transforms an array containing n features into an array containing $\frac{(n+d)!}{d!n!}$ features (including the bias column), where n is the number of original features.
<br>
Implementation with scipy:

In [None]:
from scipy.special import factorial
factorial(38)/(factorial(35)*factorial(3))

In [None]:
from sklearn.preprocessing import PolynomialFeatures

poly_features = PolynomialFeatures(degree=2, include_bias=False)

X_train_poly = poly_features.fit_transform(X_train[["year","mileage","tax","mpg","engineSize"]])
# reuse the transformer fitted on the training data instead of refitting it on the test data
X_test_poly= poly_features.transform(X_test[["year","mileage","tax","mpg","engineSize"]])

poly_reg=LinearRegression()
poly_reg.fit(X_train_poly,y_train)

evaluateModel(poly_reg,X_train_poly,X_test_poly,y_train,y_test,"Polynomial Regression")

# <span id="18"></span>Support Vector Machine (SVM)
#### [Return Contents](#0)
<hr/>

A Support Vector Machine (SVM) is a powerful and versatile Machine Learning model, capable of performing linear or nonlinear classification, regression, and even outlier detection.
<br>
<br>
Linear SVM Regression tries to fit as many instances as possible on the street (represented by the parallel dashed lines) while limiting margin violations (i.e., instances off the street). The width of the street is controlled by a hyperparameter, ϵ (epsilon).
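
The role of ϵ can be sketched with the epsilon-insensitive loss, which is the standard loss behind SVM regression: predictions within ϵ of the target (on the street) cost nothing, and only the part of the error beyond ϵ is penalized. The helper name and the numbers below are hypothetical and chosen only for illustration.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Error beyond the epsilon-wide street; zero inside it (hypothetical helper)."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Hypothetical prices and predictions
y_true = np.array([20000.0, 20000.0, 20000.0])
y_pred = np.array([20500.0, 18000.0, 26000.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=1500.0)

print(losses)
```

The first prediction is off by 500, which is inside the street, so its loss is 0. The other two are off by 2000 and 6000, so they are penalized by 500 and 4500.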

![title](https://www.researchgate.net/profile/Hassen_Bouzgou/publication/316351306/figure/fig7/AS:485878301761550@1492853822259/Example-of-linear-SVM-regression-with-tube.png)
<center><b>Resource</b> :https://www.researchgate.net/figure/Example-of-linear-SVM-regression-with-tube_fig7_316351306</center>

### Important Warning

Before performing SVM Regression, the training data should be scaled and centered.
We already did this with StandardScaler in the Scaling section.

Now let's create a linear svm regression model:

In [None]:
from sklearn.svm import LinearSVR

svm_reg = LinearSVR()
svm_reg.fit(X_train, y_train)

We created the model. Now let's evaluate our model.

In [None]:
evaluateModel(svm_reg,X_train,X_test,y_train,y_test,"Linear SVM Regression")

I had previously tried a wider range of parameter values. The grid below is narrowed down to the region around the best values I found.

Let's try a few parameters for linear SVM regression with GridSearch and choose the best model:

In [None]:
params={"C":[100,1000,10000],
       "dual":[True,False],"epsilon":[1500,4500,7500],
       "fit_intercept":[True,False],"max_iter":[5000,7500,10000]}

svm_gs=GridSearchCV(LinearSVR(),param_grid=params,cv=5,n_jobs=-1,scoring="neg_root_mean_squared_error")
svm_gs.fit(X_train,y_train)

best_linear_svm_reg=svm_gs.best_estimator_
best_linear_svm_reg.fit(X_train,y_train)
evaluateModel(best_linear_svm_reg,X_train,X_test,y_train,y_test,"Best Linear SVM Regression")

Let's try a nonlinear SVM regressor:

In [None]:
from sklearn.svm import SVR
nonlinear_svm_reg=SVR()
nonlinear_svm_reg.fit(X_train,y_train)
evaluateModel(nonlinear_svm_reg,X_train,X_test,y_train,y_test,"Nonlinear SVM Regression")

Best Nonlinear SVM Regression:

In [None]:
best_nonlinear_svm_reg = SVR(C=100000,degree=2,epsilon=1000,gamma=0.1,kernel="rbf",max_iter=10000)
best_nonlinear_svm_reg.fit(X_train,y_train)
evaluateModel(best_nonlinear_svm_reg,X_train,X_test,y_train,y_test,"Best Nonlinear SVM Regression")

# <span id="19"></span>**Decision Tree Regressor**
#### [Return Contents](#0)
<hr/>

Decision Trees are versatile Machine Learning algorithms that can perform both classification and regression tasks, and even multioutput tasks. They are powerful algorithms, capable of fitting complex datasets.

In [None]:
from sklearn.tree import DecisionTreeRegressor
dec_reg=DecisionTreeRegressor()
dec_reg.fit(X_train,y_train)

In [None]:
evaluateModel(dec_reg,X_train,X_test,y_train,y_test,"Decision Tree Regressor")

# <span id="20"></span>**Random Forest Regressor**
#### [Return Contents](#0)
<hr/>

Random Forest is an ensemble of Decision Trees; that is, a Random Forest contains more than one Decision Tree.
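
For regression, scikit-learn's `RandomForestRegressor` predicts the average of its individual trees' predictions. The sketch below checks this on synthetic data (an assumption for illustration) by averaging the fitted trees manually.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic nonlinear data
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# forest.estimators_ holds the fitted decision trees
tree_preds = np.array([tree.predict(X) for tree in forest.estimators_])
manual_average = tree_preds.mean(axis=0)

print(np.allclose(manual_average, forest.predict(X)))
```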

In [None]:
from sklearn.ensemble import RandomForestRegressor
rand_reg=RandomForestRegressor().fit(X_train,y_train)
evaluateModel(rand_reg,X_train,X_test,y_train,y_test,"Random Forest Regressor")

# <span id="21"></span>**AdaBoost**
#### [Return Contents](#0)
<hr/>

To explain AdaBoost with an example: a first predictor is trained on the training set and then used to make predictions on that same set. The weights of the training instances that the predictor underfitted are increased, and a new predictor is trained on the training set with the updated weights. Repeating this makes each new predictor focus more and more on the hard cases. This is the technique used by AdaBoost.
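
The reweighting idea can be sketched in a deliberately simplified form. This is not the exact update rule used by scikit-learn's `AdaBoostRegressor`; the weight formula below is an assumption chosen only to show instance weights growing with the error of the previous predictor.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic one-dimensional data
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Start with equal weights for all training instances
weights = np.full(len(X), 1 / len(X))
first = DecisionTreeRegressor(max_depth=2, random_state=0)
first.fit(X, y, sample_weight=weights)

# Simplified update (assumption): boost the weights in proportion to the error
abs_err = np.abs(y - first.predict(X))
new_weights = weights * (1 + abs_err / abs_err.max())
new_weights /= new_weights.sum()

# The second predictor pays more attention to the poorly fitted instances
second = DecisionTreeRegressor(max_depth=2, random_state=0)
second.fit(X, y, sample_weight=new_weights)

hardest = np.argmax(abs_err)
print(weights[hardest], new_weights[hardest])
```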

Let's train an AdaBoost Regressor:

In [None]:
from sklearn.ensemble import AdaBoostRegressor
adaboost_reg=AdaBoostRegressor().fit(X_train,y_train)

Now let's evaluate our model.

In [None]:
evaluateModel(adaboost_reg,X_train,X_test,y_train,y_test,"Adaboost Regressor")

# <span id="22"></span>**Gradient Boosting**
#### [Return Contents](#0)
<hr/>

Gradient Boosting tries to fit each new predictor to the residual errors made by the previous predictor. For example, suppose you train a model (call it model1) using a training set X and targets y. To train the next model (call it model2), you subtract the predictions made by model1 from y, and these residuals (y - model1 predictions) become the new target values.

Model2 is trained using the training set X and the new target values (y - model1 predictions), and so on.

To make a prediction for a new instance, the predictions of all models are summed up.
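
The steps above can be sketched by hand with three small decision trees on synthetic data (an assumption for illustration). Unlike this sketch, scikit-learn's `GradientBoostingRegressor` also scales each tree's contribution by a learning rate.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic one-dimensional data
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# model1 is trained on the original targets
model1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)

# model2 is trained on the residuals of model1
residual1 = y - model1.predict(X)
model2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residual1)

# model3 is trained on what is still left unexplained
residual2 = residual1 - model2.predict(X)
model3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residual2)

# The ensemble prediction is the sum of the predictions of all models
ensemble_pred = sum(m.predict(X) for m in (model1, model2, model3))

mse_first = mean_squared_error(y, model1.predict(X))
mse_ensemble = mean_squared_error(y, ensemble_pred)
print(mse_first, mse_ensemble)
```

On the training data, the summed prediction has a lower error than the first tree alone, because each added tree corrects part of the remaining residual.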

This is the technique used by Gradient Boosting.


Let's train a Gradient Boosting Regressor:

In [None]:
from sklearn.ensemble import GradientBoostingRegressor
gb_reg=GradientBoostingRegressor().fit(X_train,y_train)

Now let's evaluate the model.

In [None]:
evaluateModel(gb_reg,X_train,X_test,y_train,y_test,"Gradient Boosting Regressor")

Gradient Boosting Regressor with the optimum n_estimators:

In [None]:
from sklearn.metrics import mean_squared_error
gbrt=GradientBoostingRegressor(max_depth=2,n_estimators=10000)
gbrt.fit(X_train,y_train)

errors=[mean_squared_error(y_test,y_pred) for y_pred in gbrt.staged_predict(X_test)]

best_n_est=np.argmin(errors)+1

gbrt_best=GradientBoostingRegressor(n_estimators=best_n_est,max_depth=2).fit(X_train,y_train)
evaluateModel(gbrt_best,X_train,X_test,y_train,y_test,"Gradient Boosting Regressor with optimum n_estimators")

# <span id="23"></span>XGBoost (Extreme Gradient Boosting)
#### [Return Contents](#0)
<hr/>

XGBoost is an optimized implementation of Gradient Boosting.

In [None]:
import xgboost

xgb_reg=xgboost.XGBRegressor()
xgb_reg.fit(X_train,y_train)
evaluateModel(xgb_reg,X_train,X_test,y_train,y_test,"XGB Regressor")

XGBoost also offers several nice features, such as automatically taking care of early stopping:

In [None]:
xgb_reg_early_stopping=xgboost.XGBRegressor()
xgb_reg_early_stopping.fit(X_train,y_train,
                           eval_set=[(X_test,y_test)],early_stopping_rounds=2)
evaluateModel(xgb_reg_early_stopping,X_train,X_test,y_train,y_test,"XGB Regressor with early stopping")

# <span id="24"></span>**Voting Regressor**
#### [Return Contents](#0)
<hr/>

We can combine multiple regression models into a single model. The VotingRegressor class can be used for this. The prediction of the VotingRegressor model is the arithmetic mean of the predictions of the individual regression models. For example, if the prediction of linear regression is 25000 and the prediction of Lasso regression is 15000, the estimate of VotingRegressor is (25000 + 15000) / 2 = 20000.
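
The averaging can be checked with a minimal sketch on synthetic data (an assumption for illustration). `VotingRegressor` refits clones of the given models and stores them in `estimators_`, so its prediction can be compared with the manual mean of their predictions.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

voting = VotingRegressor(
    estimators=[("lin", LinearRegression()),
                ("tree", DecisionTreeRegressor(max_depth=3, random_state=0))]
).fit(X, y)

# Predictions of each fitted sub-model, then their arithmetic mean
individual = np.array([est.predict(X) for est in voting.estimators_])
manual_mean = individual.mean(axis=0)

print(np.allclose(voting.predict(X), manual_mean))
```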

Let's create a VotingRegressor using all the models we have created so far (except Polynomial Regression, because it was trained on a different feature set than the other models).

In [None]:
from sklearn.ensemble import VotingRegressor

voting_reg=VotingRegressor(
estimators=[("v_lin_reg",lin_reg),
            ("v_rid_reg",ridge_reg),
            ("v_lasso_reg",lasso_reg),
            ("v_elastic_reg_best",elastic_reg_best),
            ("v_elastic_reg",elastic_reg),
            ("v_lin_svm_reg",svm_reg),
            ("v_best_lin_svm_reg",best_linear_svm_reg),
            ("v_nonlinear_svm_reg",nonlinear_svm_reg),
            ("v_best_nonlinear_svm_reg",best_nonlinear_svm_reg),
            ("v_dec_reg",dec_reg),
            ("v_rand_reg",rand_reg),
            ("v_adaboost_reg",adaboost_reg),
            ("v_gb_reg",gb_reg),
            ("v_gb_reg_best",gbrt_best),
            ("v_xgb_reg",xgb_reg),
            ("v_xgb_reg_best",xgb_reg_early_stopping)
            ]
)

voting_reg.fit(X_train,y_train)

In [None]:
evaluateModel(voting_reg,X_train,X_test,y_train,y_test,"Voting Regression")

# <span id="25"></span>Evaluation Table

In [None]:
evaluation.sort_values(by="Root Mean Squared Error(RMSE) (5-Fold Cross Validation)")

# <span id="26"></span>Conclusion

According to the table, the **"Gradient Boosting Regressor with optimum n_estimators"** model is the best model, but it appears to overfit slightly. The overfitting could be reduced by giving the model more data or by regularizing it.

**Decision Tree Regressor** and **Random Forest Regressor** severely overfit the training set, so their generalization performance will be poor. It can be improved by regularizing the models.

Also, the **Best Nonlinear SVM Regression** model seems to be a good model.

# <center>Thank you for reading my kernel. If you liked it, please do not forget to <font color="blue">UPVOTE </font>🙂 </center>

# <span id="27"></span>Resources

Resources I used while writing this kernel:
* https://www.kaggle.com/burhanykiyakoglu/predicting-house-prices
* Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition