# Introduction

The main idea of random forest is to apply <b>bagging</b> to decision trees: a number of trees are trained in parallel, each on a random subset of the initial training dataset (for example, a random 90% of it), and the predicate in each node of a tree is built from a random subset of the features, drawn anew for every node. Bagging in general, and random forest in particular, relies on deep trees: a deep tree has low bias and high variance, and averaging many such trees reduces the variance.
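As a rough illustration of the two sources of randomness described above, the following sketch draws the rows for one tree and the candidate features for one node. The sizes (506 objects, 13 features, 4 candidate features per node) and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): 506 objects with 13 features each.
n_objects, n_features = 506, 13

# Randomness 1: each tree sees its own random 90% of the training rows.
row_idx = rng.choice(n_objects, size=int(0.9 * n_objects), replace=False)

# Randomness 2: each node considers only a few randomly chosen features.
n_candidates = 4  # hypothetical value, see the recommendations in the text
feature_idx = rng.choice(n_features, size=n_candidates, replace=False)

print(len(row_idx), "rows for this tree")
print("candidate features for this node:", sorted(feature_idx.tolist()))
```

A new `feature_idx` would be drawn for every node, while `row_idx` stays fixed for the whole tree.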

There are recommendations for how many features to consider in each node when building a random forest:
* For classification: the square root of the number of initial features
* For regression: $\dfrac{1}{3}$ of the number of initial features

The outputs of the trees are then combined: by majority vote for classification problems (an object is assigned to the class that most trees voted for) and by averaging for regression problems.
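The two ways of combining tree outputs can be sketched with plain NumPy. The prediction arrays below are made-up values for illustration, and `majority_vote` is a hypothetical helper, not a library function.

```python
import numpy as np

# Made-up class predictions of 5 trees (rows) for 4 objects (columns).
class_votes = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [1, 1, 2, 0],
    [0, 2, 2, 1],
    [0, 1, 2, 1],
])

def majority_vote(votes, n_classes):
    """Return, for each object, the class that received the most votes."""
    # counts[c, j] = number of trees that predicted class c for object j
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return counts.argmax(axis=0)

# Made-up regression predictions of 3 trees (rows) for 2 objects (columns).
reg_preds = np.array([
    [20.0, 30.0],
    [22.0, 28.0],
    [24.0, 32.0],
])

voted_classes = majority_vote(class_votes, n_classes=3)
averaged_values = reg_preds.mean(axis=0)

print(voted_classes)    # [0 1 2 1]
print(averaged_values)  # [22. 30.]
```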

An advantage of this approach is that adding trees does not cause overfitting: as the number of trees increases, the test error levels off instead of growing.
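The effect of the number of trees can be checked empirically. The sketch below is only an illustration on a synthetic dataset (`make_friedman1` from scikit-learn, chosen here as an assumption so that the example is self-contained); the exact numbers depend on the data and the random seed.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption, not the dataset used in the text).
X_demo, y_demo = make_friedman1(n_samples=400, noise=1.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Test error of forests of increasing size.
test_mse_by_size = {}
for n_trees in (1, 10, 50, 200):
    forest = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    forest.fit(Xd_train, yd_train)
    test_mse_by_size[n_trees] = mean_squared_error(
        yd_test, forest.predict(Xd_test))

for n_trees, mse in test_mse_by_size.items():
    print(f"{n_trees:>3} trees: test MSE = {mse:.2f}")
```

One would expect the error to drop quickly for the first few dozen trees and then stay roughly flat.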

Disadvantages of the method: fitting and prediction take a long time (all the trees must be trained, and every tree must make a prediction for each object), even when the trees are trained in parallel; and the method sometimes makes systematic errors despite the large depth of the trees (this can sometimes be fixed by boosting, for example with XGBoost).

# On the way to random forest

## Binary decision tree

To start, let's again build a decision tree without a depth limit on the Boston dataset.

In [None]:
!pip install mlxtend

In [1]:
from mlxtend.data import boston_housing_data # right way to load boston dataset

In [2]:
from sklearn.model_selection import train_test_split

In [3]:
X, y = boston_housing_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=123,
                                                    shuffle=True)

In [4]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

In [5]:
dt = DecisionTreeRegressor()
dt.fit(X_train, y_train)
mean_squared_error(y_test, dt.predict(X_test))

19.084473684210526

In [6]:
mean_squared_error(y_train, dt.predict(X_train))

# The training error is zero while the test error is large, so the tree is overfitted

0.0

Let's look at the bias-variance decomposition.

In [7]:
from mlxtend.evaluate import bias_variance_decomp

In [8]:
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        dt, X_train, y_train, X_test, y_test, loss='mse')

In [9]:
avg_expected_loss

32.26238322368421

In [10]:
avg_bias

14.433280445723684

In [11]:
avg_var

17.829102777960525
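For the squared loss, the three quantities are tied together by a simple identity. For a fixed test object with target $y$ and a prediction $\hat{y}$ that varies with the training sample:

$$\mathbb{E}\big[(y - \hat{y})^2\big] = \big(y - \mathbb{E}[\hat{y}]\big)^2 + \mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]$$

The first term is the squared bias and the second is the variance. Assuming `avg_bias` and `avg_var` are these two terms averaged over the test objects, the numbers above agree: $14.43 + 17.83 \approx 32.26$.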

We can see that our tree is overfitted: its variance is large and even exceeds the bias term. We can fix this by using a composition of trees trained on random subsets of the initial training dataset, i.e. <b>bagging</b>.

We will see further that a composition of trees has roughly the same bias as a single tree but much lower variance.
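One way to see why averaging helps is the following standard calculation, under the simplifying assumption that the $N$ trees $T_1, \dots, T_N$ have the same variance $\sigma^2$ and the same pairwise correlation $\rho$ (these symbols are introduced here only for illustration):

$$\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} T_i(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2$$

The second term vanishes as $N$ grows, so the less correlated the trees are, the lower the variance of the composition. The expectation of the average equals the expectation of a single tree, which is why the bias stays about the same.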

## Bagging

In [12]:
from sklearn.ensemble import BaggingRegressor

In [13]:
base_tree = DecisionTreeRegressor(random_state=123)

# bagging with 20 trees
bagging_regressor = BaggingRegressor(base_tree, n_estimators=20)
bagging_regressor.fit(X_train, y_train)

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        bagging_regressor, X_train, y_train, X_test, y_test, loss='mse')

In [14]:
avg_expected_loss

19.601350712993423

In [15]:
avg_bias

15.408263925193266

In [16]:
avg_var

4.193086787800165

In [17]:
mean_squared_error(y_test, bagging_regressor.predict(X_test))

20.718313486842106

In [18]:
mean_squared_error(y_train, bagging_regressor.predict(X_train))

# The training error is non-zero, so the ensemble no longer fits the training set exactly

8.225731779661018

And finally, let's consider random forest.

## Random forest

Compared to the usual parameters of a single decision tree, here we also consider the following parameters:
* `max_features` - number of features considered when building the predicate in each node of the trees
* `n_estimators` - number of trees in the random forest
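To get a feeling for `max_features`, one can compare a few settings by cross-validation. This is a minimal sketch on synthetic data (`make_friedman1`, an assumption made to keep the example self-contained); the labels in `settings` are hypothetical names. In scikit-learn a float value of `max_features` is interpreted as a fraction of the features, and `"sqrt"` as the square root of their number.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data with 10 features (an assumption, not the Boston dataset).
X_demo, y_demo = make_friedman1(
    n_samples=300, n_features=10, noise=1.0, random_state=0)

# Hypothetical labels mapped to max_features values.
settings = {
    "all features": 1.0,  # every feature is a candidate: plain bagging of trees
    "one third": 1 / 3,   # a third of the features in each node
    "sqrt": "sqrt",       # square root of the number of features
}

cv_mse = {}
for label, max_features in settings.items():
    forest = RandomForestRegressor(
        n_estimators=50, max_features=max_features, random_state=0)
    scores = cross_val_score(
        forest, X_demo, y_demo, cv=3, scoring="neg_mean_squared_error")
    cv_mse[label] = -scores.mean()  # flip the sign back to a plain MSE

for label, mse in cv_mse.items():
    print(f"{label:>12}: CV MSE = {mse:.2f}")
```

Which setting wins depends on the dataset, so in practice `max_features` is usually tuned rather than fixed in advance.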

In [19]:
from sklearn.ensemble import RandomForestRegressor

In [20]:
random_forest = RandomForestRegressor(n_estimators=20)
random_forest.fit(X_train, y_train)

In [21]:
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
        random_forest, X_train, y_train, X_test, y_test, loss='mse')

In [22]:
avg_expected_loss

19.69901223930921

In [23]:
avg_bias

15.761492524527148

In [24]:
avg_var

3.9375197147820717

In [25]:
mean_squared_error(y_test, random_forest.predict(X_test))

21.00887730263157

In [26]:
mean_squared_error(y_train, random_forest.predict(X_train))

# The training error is non-zero, so the forest does not fit the training set exactly

6.321338559322034

As you can see, random forest has the lowest avg_var, while avg_bias is almost the same for all the methods.

### Out-of-bag error

When building a random forest, just as in bagging, each tree is built on a bootstrapped subset of the initial training dataset, obtained by random sampling with replacement. Some objects appear in this subset several times, and some do not appear in it at all. For each tree, we can take the objects that were not included in its bootstrapped subset and use them for validation.

The average validation error over all the trees is the <b>out-of-bag error</b>.
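The idea can be sketched by hand with bagged decision trees. This is an illustration under several assumptions: the data is synthetic (`make_friedman1`), there is no per-node feature sampling, and the out-of-bag predictions are first averaged per object and then scored, which is one common variant (averaging per-tree errors, as described above, is another).

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

X_demo, y_demo = make_friedman1(n_samples=300, noise=1.0, random_state=0)
rng = np.random.default_rng(0)
n_samples, n_trees = len(X_demo), 50

pred_sum = np.zeros(n_samples)    # sum of out-of-bag predictions per object
pred_count = np.zeros(n_samples)  # number of trees for which the object was out of bag

for _ in range(n_trees):
    # Bootstrap: sample row indices with replacement.
    boot_idx = rng.integers(0, n_samples, size=n_samples)
    # Objects that never appear in boot_idx are out of bag for this tree.
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot_idx] = False

    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X_demo[boot_idx], y_demo[boot_idx])

    pred_sum[oob_mask] += tree.predict(X_demo[oob_mask])
    pred_count[oob_mask] += 1

# Score only objects that were out of bag for at least one tree.
covered = pred_count > 0
oob_pred = pred_sum[covered] / pred_count[covered]
oob_mse = np.mean((y_demo[covered] - oob_pred) ** 2)
oob_r2 = r2_score(y_demo[covered], oob_pred)

print(f"OOB MSE = {oob_mse:.2f}, OOB R2 = {oob_r2:.2f}")
```

No separate validation split is needed here: every object is scored only by trees that never saw it during training.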

In [27]:
X, y = boston_housing_data()
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.3,
                                                    random_state=123,
                                                    shuffle=True)

rf = RandomForestRegressor(n_estimators=100, random_state=123, oob_score=True)
rf.fit(X_train, y_train)
# oob_score_ is the R2 score on objects not included in the training subsets of the trees of the random forest
rf.oob_score_

0.8760889947613861

Like a single decision tree, a random forest can also estimate the importance of the features:

In [31]:
rf.feature_importances_

array([0.0285272 , 0.00148259, 0.00466126, 0.00051459, 0.01864846,
       0.50848227, 0.01436128, 0.05178391, 0.00490432, 0.01712143,
       0.01432396, 0.00889071, 0.326298  ])
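The raw array is easier to read when paired with feature names. The sketch below reuses the importances printed above and assumes that the columns follow the usual Boston housing order (CRIM, ZN, ..., LSTAT); that ordering is an assumption and should be checked against the loader's documentation.

```python
import numpy as np

# Assumed column order of the Boston housing data.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

# Values copied from the rf.feature_importances_ output above.
importances = np.array([
    0.0285272, 0.00148259, 0.00466126, 0.00051459, 0.01864846,
    0.50848227, 0.01436128, 0.05178391, 0.00490432, 0.01712143,
    0.01432396, 0.00889071, 0.326298,
])

# Sort features from most to least important.
order = np.argsort(importances)[::-1]
ranked = [(feature_names[i], float(importances[i])) for i in order]

for name, value in ranked[:3]:
    print(f"{name:>7}: {value:.3f}")
# Prints RM (0.508), LSTAT (0.326) and DIS (0.052)
```

Under that assumption, two features (RM and LSTAT) account for more than 80% of the total importance.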