### Trees: Ensemble Methods - Extra Trees

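In Extra Trees (extremely randomized trees), a random subset of features is considered at each node, and the split threshold for each of those features is drawn at random. By default, all the data available in the training set is used to build each tree, rather than a bootstrap sample.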
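Whether each tree sees the whole training set or a resampled copy is controlled in scikit-learn by the `bootstrap` parameter. The short check below simply prints the default value of that parameter for both estimators; it is a sanity check on the defaults, not part of the original notebook.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Instantiate both estimators with their default settings.
et = ExtraTreesClassifier()
rf = RandomForestClassifier()

# Extra Trees fit every tree on the full training set by default,
# while Random Forests draw a bootstrap sample for each tree.
print("ExtraTreesClassifier bootstrap:", et.bootstrap)
print("RandomForestClassifier bootstrap:", rf.bootstrap)
```

Passing `bootstrap=True` to `ExtraTreesClassifier` switches it to resampling, if that behaviour is wanted.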

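*Because the split threshold for each candidate feature is drawn at random rather than searched for, training an Extra Trees Classifier is usually less computationally expensive than training a Random Forest. The extra randomization also tends to lower variance compared to Random Forests, typically at the cost of slightly higher bias.*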

![](images/extratrees.png)

The main difference between random forests and extra trees is how each split is chosen. A random forest computes the locally optimal threshold for each feature under consideration. Extra trees instead draw a random threshold for each of those features and keep the best of the random candidates.
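The two ways of picking a threshold can be sketched on a single feature. This is a minimal, standard-library illustration, not scikit-learn's implementation; the helper names `gini`, `weighted_gini`, `best_threshold` and `random_threshold` and the toy data are made up for this example.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def weighted_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the children of `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Random-forest style: scan midpoints between sorted unique values
    and keep the one with the lowest impurity.
    Assumes the feature has at least two distinct values."""
    uniq = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]
    return min(candidates, key=lambda t: weighted_gini(values, labels, t))


def random_threshold(values, rng):
    """Extra-trees style: draw one threshold uniformly between min and max."""
    return rng.uniform(min(values), max(values))


# Hypothetical toy feature with two well-separated classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)

t_best = best_threshold(values, labels)
t_rand = random_threshold(values, rng)
print("optimal threshold:", t_best, "impurity:", weighted_gini(values, labels, t_best))
print("random threshold:", t_rand, "impurity:", weighted_gini(values, labels, t_rand))
```

The exhaustive search evaluates every candidate midpoint, while the random draw evaluates only one threshold per feature, which is where the saving in computation comes from.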

In [6]:
import pandas as pd
import numpy as np

from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score

In [7]:
# Load the dataset
data = pd.read_csv('data/rf_data/data.csv', index_col=0)
y = data['shoe_size']
x = data.drop('shoe_size', axis=1)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)

# Extra Trees with the Gini criterion
etc = ExtraTreesClassifier(criterion='gini', max_depth=5, n_estimators=200)
etc.fit(X_train, y_train)

etc_predict = etc.predict(X_test)

# f1_score(y_test, etc_predict, average=None)

# Mean absolute error between the true and predicted shoe sizes
# (usable here because the class labels are numeric)
mean_absolute_error(y_test, etc_predict)

1.6341463414634145
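The cell above depends on a local CSV file. For readers without it, the same workflow can be tried on a bundled dataset. The sketch below assumes the iris data as a stand-in (it is not the shoe-size data used above), fixes `random_state` on the model so runs are repeatable, and reports per-class F1 instead of mean absolute error because the iris labels are categories.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in dataset (assumption): iris, not the shoe-size CSV.
X, y = load_iris(return_X_y=True)

# Stratify so that every class appears in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Same hyperparameters as above, plus a fixed seed for repeatability.
model = ExtraTreesClassifier(
    criterion='gini', max_depth=5, n_estimators=200, random_state=42
)
model.fit(X_train, y_train)

pred = model.predict(X_test)

# One F1 score per class.
per_class_f1 = f1_score(y_test, pred, average=None)
print(per_class_f1)
```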