# RandomForest Classifier

This is a basic implementation of a RandomForest classifier.

In [1]:
import pandas as pd
import numpy as np

from RandomForest import RandomForest
from RandomForest import build_forest

## Dataset

The iris dataset is loaded and its rows are shuffled, so that the slices used later for training and testing contain a random mix of classes.

In [2]:
iris_df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
iris_df = iris_df.sample(frac=1).reset_index(drop=True)
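Because `sample(frac=1)` draws a fresh permutation on every run, the rows that land in the training and test slices change between runs. The following is a minimal sketch of a repeatable shuffle, assuming that reproducibility is wanted. `random_state` is a standard pandas parameter; the `shuffle_and_split` helper and the toy DataFrame are hypothetical and not part of the `RandomForest` module.

```python
import pandas as pd


def shuffle_and_split(df, n_test, seed=0):
    """Hypothetical helper: shuffle rows reproducibly, hold out the last n_test."""
    # random_state fixes the permutation, so repeated runs give the same split.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    train = shuffled[:-n_test]
    test = shuffled[-n_test:]
    return train, test


# Toy stand-in for the iris table (assumption, for illustration only).
toy_df = pd.DataFrame({
    "feature": range(10),
    "species": ["a", "b"] * 5,
})

train_df, test_df = shuffle_and_split(toy_df, n_test=3, seed=42)
print(len(train_df), len(test_df))
```

Calling the helper twice with the same `seed` returns the same rows in the same order, which makes later prediction outputs comparable across runs.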

## Building The Forest

The forest is built with the `build_forest` function. Its parameters work as follows:<br>
`attributes_sampling_rate` is the fraction of attributes that each tree of the forest will see during the training phase.<br>
`data_sampling_rate` is the fraction of the data that each tree will see during the training phase.<br>
`n_trees` is the number of trees that will be built.<br>
`data` is the training dataset.<br>
`label_column` is the name of the column that contains the labels.
<br><br>
The attributes and the data are sampled randomly for the training of each tree.

In [4]:
forest = build_forest(attributes_sampling_rate=.5, data_sampling_rate=.5, n_trees=6, data=iris_df[:-10], label_column=iris_df.columns[-1])
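The per-tree sampling controlled by `attributes_sampling_rate` and `data_sampling_rate` might look roughly like the sketch below. This is an illustration under stated assumptions, not the code inside the `RandomForest` module: the `sample_for_tree` helper is hypothetical, rows are drawn without replacement, and the number of attributes is rounded to the nearest integer with a minimum of one.

```python
import numpy as np
import pandas as pd


def sample_for_tree(data, label_column, attributes_sampling_rate,
                    data_sampling_rate, rng):
    """Hypothetical helper: pick the rows and attributes one tree would see."""
    attributes = [c for c in data.columns if c != label_column]
    # Keep at least one attribute so the tree has something to split on.
    n_attributes = max(1, round(attributes_sampling_rate * len(attributes)))
    picked = rng.choice(len(attributes), size=n_attributes, replace=False)
    chosen = [attributes[i] for i in picked]
    # Rows are drawn without replacement here (assumption); a forest may bootstrap instead.
    seed = int(rng.integers(0, 2**32 - 1))
    rows = data.sample(frac=data_sampling_rate, random_state=seed)
    return rows[chosen + [label_column]]


# Toy stand-in for the iris table: four attributes plus a label column.
toy_df = pd.DataFrame({
    "a": range(8), "b": range(8, 16), "c": range(16, 24), "d": range(24, 32),
    "species": ["x", "y"] * 4,
})
rng = np.random.default_rng(0)
subset = sample_for_tree(toy_df, "species", 0.5, 0.5, rng)
print(subset.shape)  # 4 of 8 rows, 2 of 4 attributes plus the label
```

With both rates set to `.5`, as in the call above, each tree in this sketch would train on half of the rows and half of the attribute columns.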

## Making Predictions

To make predictions with the trained forest, call the `predict` method of the `RandomForest` object and pass it the rows to label. For each row, the method returns the predicted class together with a numeric value: the fraction of trees that predicted that class.

In [10]:
row_idx = 5

In [11]:
forest.predict(iris_df[iris_df.columns[:-1]][row_idx:row_idx+1])

[['virginica', 0.8333333333333334]]
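The second value in each prediction can be read as a vote share. A minimal sketch of such a majority vote follows, assuming each tree contributes exactly one label; the `vote` helper and the list of per-tree votes are hypothetical, since the module's internals are not shown here.

```python
from collections import Counter


def vote(tree_predictions):
    """Hypothetical helper: return [winning_label, fraction_of_trees_for_it]."""
    counts = Counter(tree_predictions)
    # most_common(1) yields the label with the highest count; ties go to
    # the label that was encountered first.
    label, n_votes = counts.most_common(1)[0]
    return [label, n_votes / len(tree_predictions)]


# Six hypothetical per-tree votes, five of which agree.
votes = ["virginica"] * 5 + ["versicolor"]
print(vote(votes))  # ['virginica', 0.8333333333333334]
```

With `n_trees=6`, the possible fractions are multiples of 1/6, which is consistent with values such as 0.5, 0.666… and 0.833… in the outputs.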

### More predictions

In [5]:
iris_df[-20:]

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 130 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 131 | 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 132 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 133 | 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 134 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 135 | 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 136 | 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 137 | 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 138 | 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 139 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |


In [7]:
forest.predict(iris_df[iris_df.columns[:-1]][-20:])

[['virginica', 0.8333333333333334],
 ['versicolor', 1.0],
 ['versicolor', 0.5],
 ['virginica', 1.0],
 ['versicolor', 0.8333333333333334],
 ['virginica', 1.0],
 ['virginica', 1.0],
 ['virginica', 1.0],
 ['virginica', 0.6666666666666666],
 ['setosa', 1.0],
 ['setosa', 1.0],
 ['virginica', 0.6666666666666666],
 ['versicolor', 0.8333333333333334],
 ['versicolor', 0.8333333333333334],
 ['setosa', 1.0],
 ['setosa', 1.0],
 ['virginica', 0.8333333333333334],
 ['virginica', 0.5],
 ['versicolor', 1.0],
 ['versicolor', 0.6666666666666666]]
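To turn a list of predictions into a single score, the predicted labels can be compared with the true `species` values. The sketch below uses a hypothetical `accuracy` helper and assumes that the ten rows displayed above (130 to 139) correspond to the first ten predictions in the list. Because the forest was trained on `iris_df[:-10]`, those ten rows were part of the training slice, so this number is not a held-out estimate.

```python
def accuracy(predictions, true_labels):
    """Hypothetical helper: fraction of predictions matching the true label."""
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    # Each prediction has the form [label, vote_fraction]; only the label is compared.
    hits = sum(pred[0] == truth for pred, truth in zip(predictions, true_labels))
    return hits / len(true_labels)


# First ten predictions from the output above.
shown_predictions = [
    ["virginica", 0.8333333333333334], ["versicolor", 1.0],
    ["versicolor", 0.5], ["virginica", 1.0],
    ["versicolor", 0.8333333333333334], ["virginica", 1.0],
    ["virginica", 1.0], ["virginica", 1.0],
    ["virginica", 0.6666666666666666], ["setosa", 1.0],
]
# Species column of the ten rows displayed above (130 to 139).
shown_labels = [
    "virginica", "versicolor", "versicolor", "virginica", "versicolor",
    "virginica", "virginica", "virginica", "virginica", "setosa",
]
print(accuracy(shown_predictions, shown_labels))  # 1.0
```

Applying the same helper to `forest.predict(...)` on the last ten rows only, with `iris_df[-10:]` supplying the true labels, would give a score on rows the forest did not see during training.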