# Random Forest Classifiers
---

Using a random forest to predict a categorical variable (iris species) from numerical inputs (flower measurements).

`scikit-learn` comes with the iris dataset built in, so we'll use that.

In [1]:
from sklearn import datasets

In [2]:
iris = datasets.load_iris()

Check it out:

In [3]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [4]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
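
To get a quick sense of the data's size and class balance, a small sketch like the following (using only `load_iris` and NumPy) counts the samples, the features, and how many flowers belong to each species:

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Rows are flowers, columns are the four measurements listed above
n_samples, n_features = iris.data.shape
print(n_samples, n_features)  # 150 4

# Number of samples per class, indexed by the target codes 0, 1, 2
class_counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, class_counts):
    print(name, count)
```

The classes are perfectly balanced, which is one reason plain accuracy is a reasonable metric for this dataset.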

We'll need `pandas` of course, as well as a classifier from `sklearn` and the `train_test_split` function.

In [5]:
import pandas as pd

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

Putting the four features and the target into one `DataFrame`:

In [6]:
data = pd.DataFrame({
    'sepal_length' : iris.data[:, 0],
    'sepal_width' : iris.data[:, 1],
    'petal_length' : iris.data[:, 2],
    'petal_width' : iris.data[:, 3],
    'species' : iris.target,
})

In [7]:
data.head()

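|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |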
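
Recent versions of `scikit-learn` (0.23 and later, as far as we know) can also hand back the data as pandas objects via `as_frame=True`. A sketch that builds the same `DataFrame` that way, assuming the combined frame names its label column `target`, which we rename to `species`:

```python
from sklearn import datasets

# as_frame=True returns pandas objects; .frame holds the four feature
# columns plus a 'target' column in a single DataFrame
iris = datasets.load_iris(as_frame=True)

# Rename to the same short column names used in the hand-built DataFrame
data = iris.frame.rename(columns={
    'sepal length (cm)': 'sepal_length',
    'sepal width (cm)': 'sepal_width',
    'petal length (cm)': 'petal_length',
    'petal width (cm)': 'petal_width',
    'target': 'species',
})
print(data.head())
```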


Splitting the features from the target:

In [8]:
x = data.loc[:, :'petal_width']
y = data.loc[:, 'species']
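
Label-based slicing with `.loc` includes *both* endpoints, which is why `:'petal_width'` captures all four measurement columns. It also relies on `species` being the last column. A sketch comparing it with `drop(columns=...)`, which does not depend on column order:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
data = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width',
                                        'petal_length', 'petal_width'])
data['species'] = iris.target

# .loc label slices are inclusive: sepal_length through petal_width
x_slice = data.loc[:, :'petal_width']

# Equivalent selection that works whatever the column order is
x_drop = data.drop(columns='species')
y = data['species']

print(list(x_slice.columns) == list(x_drop.columns))  # True
```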

A 70/30 split into training and test data:

In [9]:
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3)
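
Because `train_test_split` shuffles randomly, the split above (and therefore the final score) changes from run to run. A sketch of a repeatable variant: `random_state=0` is an arbitrary seed, and `stratify=y` is an optional addition that keeps the three species equally represented in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

x, y = datasets.load_iris(return_X_y=True)

# random_state fixes the shuffle; stratify=y preserves the 1:1:1 class ratio
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y)

print(len(xtrain), len(xtest))  # 105 45
print(np.bincount(ytest))       # [15 15 15]
```

With 150 samples, `test_size=0.3` gives 45 test and 105 training samples.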

Creating the classifier and training it on the training data:

In [10]:
classifier = RandomForestClassifier(n_estimators=100)

In [11]:
classifier.fit(xtrain, ytrain)

RandomForestClassifier()
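
To make the idea behind `RandomForestClassifier` concrete, here is a simplified from-scratch sketch (our own illustration, not scikit-learn's implementation): each of 100 decision trees is trained on a bootstrap sample of the training rows, considers a random subset of features at each split, and the forest predicts by majority vote. scikit-learn differs in details; for example, it averages the trees' predicted probabilities instead of counting hard votes. The helpers `fit_forest` and `predict_forest` and the fixed seeds are hypothetical choices for this sketch.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = datasets.load_iris(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.3, random_state=0, stratify=y)

def fit_forest(x, y, n_trees=100, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw row indices with replacement, same size as the data
        idx = rng.integers(0, len(x), size=len(x))
        # max_features='sqrt': each split looks at a random subset of features
        tree = DecisionTreeClassifier(max_features='sqrt',
                                      random_state=int(rng.integers(2**31 - 1)))
        tree.fit(x[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, x):
    """Majority vote over the trees' predicted class labels."""
    votes = np.stack([tree.predict(x) for tree in trees])  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

trees = fit_forest(xtrain, ytrain)
pred = predict_forest(trees, xtest)
print((pred == ytest).mean())  # test accuracy of the hand-rolled forest
```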

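`n_estimators=100` is the default in recent versions of `scikit-learn`, which is why the printed repr doesn't show it. With the classifier trained, we can make predictions on the test set: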

In [12]:
predictions = classifier.predict(xtest)
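
`predict` returns the integer species codes (0, 1, 2) used in the `species` column. A minimal sketch of mapping them back to names, and of getting per-class probabilities with `predict_proba`; the `random_state` values are assumptions added only so the example is repeatable.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(xtrain, ytrain)

# predict() returns codes 0/1/2; indexing target_names turns them into labels
predictions = classifier.predict(xtest)
species_names = iris.target_names[predictions]
print(species_names[:5])

# predict_proba(): one column per class (in classifier.classes_ order);
# each row is the average of the trees' per-class probabilities
probabilities = classifier.predict_proba(xtest)
print(probabilities[:5].round(2))
```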

Or check our classifier's accuracy on the test set:

In [13]:
classifier.score(xtest, ytest)

0.9333333333333333
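
For a classifier, `score` is the fraction of test samples predicted correctly. With 150 flowers and `test_size=0.3` the test set holds 45 samples, so 0.9333 corresponds to 42 of 45 correct (3 mistakes). A sketch checking that `score` matches `accuracy_score` and a hand-computed mean, plus a `confusion_matrix` showing which species get mixed up; the seeds are arbitrary assumptions, so the exact numbers will differ from the 0.9333 above.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

x, y = datasets.load_iris(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.3, random_state=0)

classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(xtrain, ytrain)

# For classifiers, score() is mean accuracy on the given data
score = classifier.score(xtest, ytest)

predictions = classifier.predict(xtest)
manual = (predictions == ytest).mean()        # fraction of exact matches
library = accuracy_score(ytest, predictions)  # same metric via sklearn.metrics
print(score, manual, library)

# Rows: true species, columns: predicted species; off-diagonal = mistakes
cm = confusion_matrix(ytest, predictions)
print(cm)
```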