# Random Forests - Python

### <b>Welcome to Lab 2c of Machine Learning with Python.</b>
<p> <b>Machine Learning is a form of artificial intelligence (AI) in which a system can "learn" from data without being explicitly programmed.</b></p>

In this lab exercise, you will learn some popular machine learning algorithms. For <b>supervised learning</b>, we will discuss <b>decision trees</b> and <b>random forests</b>. In <b>unsupervised learning</b>, we will discuss <b>k-means clustering</b>, <b>agglomerative hierarchical clustering</b>, and <b>density-based clustering</b> or <b>DBSCAN</b>.


### Some Notebook Command Reminders:
<ul>
    <li>Run a cell: CTRL + ENTER</li>
    <li>Create a cell above a cell: a</li>
    <li>Create a cell below a cell: b</li>
    <li>Change a cell to Markdown: m</li>
    
    <li>Change a cell to code: y</li>
</ul>

<b>For more keyboard shortcuts, go to Help -> Keyboard Shortcuts.</b>

# <u>Random Forests with RandomForestClassifier</u>

Import the <b>RandomForestClassifier</b> class from <b>sklearn.ensemble</b>

In [2]:
from sklearn.ensemble import RandomForestClassifier

Create an instance of the <b>RandomForestClassifier()</b> called <b>skullsForest</b>, where the forest has <b>10 decision tree estimators</b> (<i>n_estimators=10</i>) and the <b>criterion is entropy</b> (<i>criterion="entropy"</i>)

In [3]:
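import pandas
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split

# Load the skulls dataset
my_data = pandas.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/HSAUR/skulls.csv", delimiter=",")
# Drop the first two columns (the row index and the epoch label) and keep the measurements as features
X = my_data.drop(my_data.columns[[0,1]], axis=1).values
y = my_data["epoch"]

# Hold out 30% of the rows for testing
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size = 0.3)

# Build a forest of 10 entropy-based decision trees and fit it on the training set
skullsForest = RandomForestClassifier(criterion = "entropy", n_estimators=10)
skullsForest.fit(X_trainset, y_trainset)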



RandomForestClassifier(bootstrap=True, class_weight=None, criterion='entropy',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

The cell above fits <b>skullsForest</b> on <b>X_trainset</b> and <b>y_trainset</b>, which are built the same way as the datasets used for the <b>Decision Trees</b> section.
<br> <br>

<b>Note</b>: Make sure you have run through the Decision Trees section.

Let's now create a variable called <b>predForest</b> by calling <b>predict</b> on <b>X_testset</b> with <b>skullsForest</b>.

In [4]:
predForest = skullsForest.predict(X_testset)

You can print out <b>predForest</b> and <b>y_testset</b> if you want to visually compare the prediction to the actual values.

In [5]:
print(predForest)
print(y_testset)

['c200BC' 'c200BC' 'cAD150' 'c200BC' 'c200BC' 'c4000BC' 'c200BC' 'c4000BC'
 'c4000BC' 'c3300BC' 'c1850BC' 'c1850BC' 'c4000BC' 'c4000BC' 'c200BC'
 'cAD150' 'c200BC' 'c200BC' 'c1850BC' 'c3300BC' 'c200BC' 'c1850BC'
 'c200BC' 'c200BC' 'c1850BC' 'c200BC' 'c1850BC' 'c3300BC' 'c1850BC'
 'c3300BC' 'c200BC' 'c4000BC' 'cAD150' 'cAD150' 'cAD150' 'c3300BC'
 'c3300BC' 'c1850BC' 'cAD150' 'c1850BC' 'c200BC' 'c3300BC' 'c1850BC'
 'c200BC' 'c1850BC']
5      c4000BC
70     c1850BC
112     c200BC
91      c200BC
41     c3300BC
69     c1850BC
36     c3300BC
85     c1850BC
42     c3300BC
35     c3300BC
73     c1850BC
45     c3300BC
57     c3300BC
147     cAD150
9      c4000BC
109     c200BC
126     cAD150
148     cAD150
7      c4000BC
29     c4000BC
133     cAD150
141     cAD150
149     cAD150
49     c3300BC
25     c4000BC
79     c1850BC
146     cAD150
95      c200BC
75     c1850BC
13     c4000BC
119     c200BC
12     c4000BC
94      c200BC
53     c3300BC
34     c3300BC
113     c200BC
33     c3300BC
4      c
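
Comparing two long printouts by eye is error-prone, so one option is to place the actual and predicted labels side by side. The sketch below is only an illustration: <b>y_demo</b> and <b>pred_demo</b> are hypothetical stand-ins for <b>y_testset</b> and <b>predForest</b>, and the four labels are made up for the example rather than taken from the model.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for y_testset (a Series with a shuffled index)
# and predForest (a NumPy array of predicted labels).
y_demo = pd.Series(["c4000BC", "c1850BC", "c200BC", "cAD150"],
                   index=[5, 70, 112, 147], name="epoch")
pred_demo = np.array(["c200BC", "c1850BC", "cAD150", "cAD150"])

# Use .values so the labels are paired by position, then keep the original row index.
comparison = pd.DataFrame({"actual": y_demo.values, "predicted": pred_demo},
                          index=y_demo.index)

# Flag the rows where the prediction matches the actual label.
comparison["correct"] = comparison["actual"] == comparison["predicted"]
print(comparison)
```

With the real variables, the same pattern would use <b>y_testset</b> in place of <b>y_demo</b> and <b>predForest</b> in place of <b>pred_demo</b>.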

Let's check the accuracy of our model. <br>

Note: Make sure you have metrics imported from sklearn

In [6]:
import sklearn.metrics as metrics

print("RandomForests's Accuracy: ", metrics.accuracy_score(y_testset, predForest))

RandomForests's Accuracy:  0.2222222222222222
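
To make the accuracy figure concrete: <b>accuracy_score</b> returns the fraction of predictions that match the actual labels, so a score of 0.2222 on 45 test rows corresponds to 10 correct predictions. The sketch below checks this definition by hand on the first nine actual and predicted labels from the printouts above. The names <b>actual</b> and <b>predicted</b> are introduced here for illustration only.

```python
import numpy as np
from sklearn import metrics

# First nine actual and predicted labels, copied from the printouts above.
actual = np.array(["c4000BC", "c1850BC", "c200BC", "c200BC", "c3300BC",
                   "c1850BC", "c3300BC", "c1850BC", "c3300BC"])
predicted = np.array(["c200BC", "c200BC", "cAD150", "c200BC", "c200BC",
                      "c4000BC", "c200BC", "c4000BC", "c4000BC"])

# Accuracy by hand: the mean of the element-wise matches.
manual_accuracy = np.mean(actual == predicted)

# The same quantity computed by scikit-learn.
sklearn_accuracy = metrics.accuracy_score(actual, predicted)

print("Manual accuracy: ", manual_accuracy)
print("sklearn accuracy:", sklearn_accuracy)
```

Only one of these nine predictions matches, so both values come out to 1/9. Because <b>train_test_split</b> is called without a fixed <b>random_state</b>, the score on the full test set will vary from run to run.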

We can also see which trees are in our <b>skullsForest</b> variable by using the <b>.estimators_</b> attribute. This attribute is a list, so we can index it to look at any individual tree we want.
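
As a minimal sketch of indexing into <b>.estimators_</b>, the example below fits a small forest on synthetic data so that it runs on its own. The names <b>X_demo</b>, <b>y_demo</b>, and <b>demo_forest</b> are assumptions made for this illustration, and the random data stands in for the skulls dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 60 rows, 4 features, 3 classes (not the skulls data).
rng = np.random.RandomState(0)
X_demo = rng.rand(60, 4)
y_demo = rng.randint(0, 3, size=60)

# Same settings as skullsForest, plus a fixed random_state for repeatability.
demo_forest = RandomForestClassifier(n_estimators=10, criterion="entropy",
                                     random_state=0)
demo_forest.fit(X_demo, y_demo)

# estimators_ is a list with one fitted DecisionTreeClassifier per tree.
first_tree = demo_forest.estimators_[0]
print(len(demo_forest.estimators_))
print(type(first_tree).__name__)

# Each tree exposes its fitted structure through the tree_ attribute.
print(first_tree.tree_.node_count)
```

The same indexing applies to <b>skullsForest</b>, for example <b>skullsForest.estimators_[0]</b> for the first tree.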

In [7]:
skullsForest.estimators_

[DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
             max_features='auto', max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, presort=False,
             random_state=1949699857, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
             max_features='auto', max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, presort=False,
             random_state=2113252142, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
             max_features='auto', max_leaf_nodes=None,
             min_impurity_decrease=0.0, min_impurity_split=None,
             min_samples_leaf=1, min_samples_split=2,
          