### <span style = "color:red"> Ensemble Methods </span>

The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.

## <span style="color:blue"> Types of Ensemble </span>

_Two families of ensemble methods are usually distinguished:_

In ***averaging methods***, the driving principle is to build several estimators independently and then average their predictions. The combined estimator is usually better than any single base estimator because its variance is reduced.

Examples: Bagging methods, Forests of randomized trees, …
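The variance-reduction argument can be checked numerically. The following is a minimal sketch, not a real model: each "estimator" is assumed to be the true value plus independent unit-variance noise, and the names `n_trials` and `n_estimators` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 1.0
n_trials = 10_000     # number of repeated experiments
n_estimators = 25     # estimators averaged in each experiment

# Assumption: every estimator returns the true value plus independent noise
single_predictions = true_value + rng.normal(size=(n_trials, n_estimators))

# Variance of one estimator versus variance of the averaged prediction
var_single = single_predictions[:, 0].var()
var_averaged = single_predictions.mean(axis=1).var()

print("variance of one estimator:", var_single)
print("variance of the average:  ", var_averaged)
```

With fully independent estimators the variance of the average shrinks roughly by a factor of `n_estimators`. Trees in a real forest are trained on overlapping data and are therefore correlated, so the reduction seen in practice is smaller.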

By contrast, in ***boosting methods***, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.

Examples: AdaBoost, Gradient Tree Boosting, …
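The sequential nature of boosting can be illustrated with a small sketch. It assumes synthetic data (a noisy sine curve invented for this example) and uses scikit-learn's `GradientBoostingRegressor` with depth-1 trees as the weak models; `staged_predict` returns the ensemble prediction after each added tree.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

# Depth-1 trees are weak learners; they are added one after another
booster = GradientBoostingRegressor(n_estimators=50, max_depth=1, random_state=0)
booster.fit(X_demo, y_demo)

# Training error of the ensemble after each boosting stage
train_errors = [
    mean_squared_error(y_demo, stage_prediction)
    for stage_prediction in booster.staged_predict(X_demo)
]

print("training MSE after 1 tree:  ", train_errors[0])
print("training MSE after 50 trees:", train_errors[-1])
```

The training error falls as trees are added, which is the bias reduction described above. A falling training error alone does not guarantee better performance on unseen data.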


# <span style = "color:green"> Random Forest </span>

A random forest is a meta-estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. The sub-sample size is the same as the original input sample size, and the samples are drawn with replacement if `bootstrap=True` (the default).
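To make "drawn with replacement" concrete, here is a minimal NumPy sketch of a single bootstrap sample. It only illustrates the sampling idea and is not scikit-learn's internal code; `n_samples` is an arbitrary illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 1000
indices = np.arange(n_samples)   # row indices of the original dataset

# One bootstrap sample: n_samples draws with replacement
bootstrap_idx = rng.choice(indices, size=n_samples, replace=True)

# Some rows are drawn several times, others not at all
unique_fraction = np.unique(bootstrap_idx).size / n_samples

print("bootstrap sample size:", bootstrap_idx.size)
print("fraction of distinct original rows:", unique_fraction)
```

The sample has the same size as the original data but typically contains only about 63% of the distinct rows, so each tree sees a slightly different dataset.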

***Because a random forest is built from multiple decision trees, and decision trees can handle both classification and regression, a random forest can be applied to both kinds of problem.***
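For the classification case, a minimal sketch with `RandomForestClassifier` might look as follows. The data comes from `make_classification` and is purely synthetic; the variable names and parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data, for illustration only
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Same fit/predict workflow as the regressor; class votes replace averaging
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

accuracy = accuracy_score(y_te, clf.predict(X_te))
print("Accuracy:", accuracy)
```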

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

In [2]:
usedcars_df = pd.read_csv("Usedcarssales.csv")

In [3]:
y = usedcars_df["Price"]
X = usedcars_df.drop(["Price"],axis=1)

In [4]:
# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [5]:
reg = RandomForestRegressor()

In [6]:
reg.fit(X_train,y_train)

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

In [7]:
predicted = reg.predict(X_test)

In [8]:
rmse = np.sqrt(mean_squared_error(y_test,predicted))
print("Root Mean Squared Error:", rmse)

Root Mean Squared Error: 0.4526026591598191
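For reference, the RMSE is the square root of the mean squared difference between actual and predicted values. The small sketch below computes it by hand on made-up numbers (not taken from the used-cars data) and checks the result against `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values, for illustration only
y_true_demo = np.array([3.0, 5.0, 2.0])
y_pred_demo = np.array([2.5, 5.0, 4.0])

# Manual computation: square the errors, average them, take the root
squared_errors = (y_true_demo - y_pred_demo) ** 2
rmse_manual = np.sqrt(squared_errors.mean())

# Same quantity via scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))

print("manual RMSE: ", rmse_manual)
print("sklearn RMSE:", rmse_sklearn)
```

The RMSE is expressed in the same units as the target, so its size has to be judged against the scale of `Price` in the dataset.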
