# Implementation of a random forest for a classification task

### Abstract
This notebook implements a random forest for classification in Python. While each tree is built,
split tests are drawn from the full set of candidate tests by threshold selection combined with
roulette-wheel selection: the worst-scoring tests are rejected first, and each remaining test is
then chosen with probability proportional to its quality.
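
The selection scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the `classifiers` module, which this notebook does not show: the function name `select_test`, the `reject_fraction` parameter, and the use of a non-negative quality score (for example information gain) are all hypothetical.

```python
import numpy as np


def select_test(qualities, reject_fraction=0.5, rng=None):
    """Pick the index of one candidate test (hypothetical helper).

    Step 1 (threshold): drop the worst `reject_fraction` of the candidates.
    Step 2 (roulette): draw one survivor with probability proportional
    to its quality.
    """
    if not 0.0 <= reject_fraction <= 1.0:
        raise ValueError("reject_fraction must lie in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng
    qualities = np.asarray(qualities, dtype=float)
    if qualities.size == 0:
        raise ValueError("at least one candidate test is required")

    # Threshold step: keep the best candidates, always at least one.
    n_keep = max(1, int(np.ceil(qualities.size * (1.0 - reject_fraction))))
    survivors = np.argsort(qualities)[::-1][:n_keep]

    # Roulette step: weights are assumed non-negative; clip to be safe.
    weights = np.clip(qualities[survivors], 0.0, None)
    total = weights.sum()
    if total > 0:
        probs = weights / total
    else:
        # Every survivor has zero quality: fall back to a uniform draw.
        probs = np.full(n_keep, 1.0 / n_keep)

    return int(rng.choice(survivors, p=probs))


# Example call: with half of the candidates rejected, only the two
# best tests (indices 1 and 3) can ever be returned.
chosen = select_test([0.1, 0.4, 0.05, 0.3], reject_fraction=0.5)
```

Rejecting the weakest tests first keeps clearly poor splits out of the trees, while the proportional draw among the survivors keeps the trees different from one another, which is the source of diversity a forest relies on.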

### Import utility functions and classifiers

In [24]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from classifiers import DecisionTreeClassifier, RandomForestClassifier
import time

### Accuracy as the quality measure

In [2]:
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels; accepts lists or arrays.
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

## Tests on different datasets

### Breast cancer dataset

In [3]:
data = datasets.load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

In [25]:
clf = RandomForestClassifier(n_estimators=10, max_depth=10, min_sample_split=2)
clf.fit(X_train, y_train)


Training tree number 1
Training tree number 2
Training tree number 3
Training tree number 4
Training tree number 5
Training tree number 6
Training tree number 7
Training tree number 8
Training tree number 9
Training tree number 10


In [26]:
y_pred = clf.predict(X_test)
print(y_pred)
print("Accuracy: {:.2f}%".format(100 * accuracy(y_test, y_pred)))

[0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
Accuracy: 93.86%


