# Decision Trees and Random Forests

* In this activity, you will compare the performance of a decision tree classifier with that of a random forest classifier on the Pima Diabetes dataset.

## Instructions

* Use the Pima Diabetes dataset to train a decision tree classifier that predicts the diabetes label (positive or negative). Print the trained model's score on the test data.

* Repeat the exercise using scikit-learn's `RandomForestClassifier`. Consult the scikit-learn documentation to determine how to build and train this model.

* Experiment with different numbers of estimators in your random forest model. Try different values between 100 and 1000 and compare the scores.

### Import dependencies

In [None]:
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import os

### Load data

In [None]:
df = pd.read_csv(os.path.join("..", "Resources", "diabetes.csv"))
df.head()

### Pull our target column from the data and create a list of our outcome values

In [None]:
target = df["Outcome"]
target_names = ["negative", "positive"]

### Drop the target column from our data

In [None]:
data = df.drop("Outcome", axis=1)
feature_names = data.columns
data.head()

### Split the data into training and test sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(data, target, random_state=42)
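
When no `test_size` is given, `train_test_split` holds out 25% of the rows for testing. The sketch below illustrates this on a synthetic stand-in built with `make_classification`; the 768-row, 8-feature shape and the `*_demo` names are assumptions made for illustration, not values read from `diabetes.csv`. Passing `stratify` is optional and keeps the class ratio roughly equal in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (shape is an assumption)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)

# Default split: 75% train, 25% test; stratify preserves the class balance
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, random_state=42, stratify=y_demo
)

print("train shape:", X_tr.shape)
print("test shape:", X_te.shape)
print("positive rate (train, test):", y_tr.mean(), y_te.mean())
```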

### Create a Decision Tree Classifier, fit it to the training data, and score it on the test data

In [None]:
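# Fit an unconstrained decision tree; random_state makes the result reproducible
clf = tree.DecisionTreeClassifier(random_state=42)
clf = clf.fit(X_train, y_train)
# score() returns mean accuracy on the held-out test set
clf.score(X_test, y_test)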
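
A single tree grown without a depth limit can memorize its training rows, so it helps to look at training and test accuracy side by side. The following is a minimal sketch on synthetic data from `make_classification` (an assumption standing in for the diabetes data; `deep_tree` and the `*_demo` names are hypothetical). The gap between the two numbers is one reason to try an ensemble next.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (shape is an assumption)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

# No max_depth: the tree keeps splitting until every leaf is pure
deep_tree = tree.DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

train_acc = deep_tree.score(X_tr, y_tr)
test_acc = deep_tree.score(X_te, y_te)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```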

### Create a Random Forest Classifier, fit it to the training data, and score it on the test data

In [None]:
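# Build a forest of 200 trees; random_state makes the result reproducible
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf = rf.fit(X_train, y_train)
# score() returns mean accuracy on the held-out test set
rf.score(X_test, y_test)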
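
One way to compare different numbers of estimators is to loop over a few values of `n_estimators` and record the test score for each. The sketch below runs on synthetic data from `make_classification` so it is self-contained; the data, the chosen values, and the `scores` dictionary are assumptions for illustration. In the notebook, `X_train`, `X_test`, `y_train`, and `y_test` could be substituted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diabetes data (shape is an assumption)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=42)

# Train one forest per setting and keep its test accuracy
scores = {}
for n in (100, 250, 500, 1000):
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    model.fit(X_tr, y_tr)
    scores[n] = model.score(X_te, y_te)

for n, score in scores.items():
    print(f"n_estimators={n}: test accuracy = {score:.3f}")
```

Fixing `random_state` keeps run-to-run noise out of the comparison, so any difference in score comes from the number of trees.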

### BONUS: View the features, sorted by importance

In [None]:
sorted(zip(rf.feature_importances_, feature_names), reverse=True)
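
The same ranking can be held in a labeled pandas `Series`, which is easier to read and to plot than a list of tuples. This is a sketch on synthetic data; `demo_names` and the `feature_0` … `feature_7` labels are hypothetical placeholders for the real column names in `feature_names`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the diabetes data (shape and names are assumptions)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=42)
demo_names = [f"feature_{i}" for i in range(8)]

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_demo, y_demo)

# Label each importance with its feature name and sort from most to least important
importances = pd.Series(model.feature_importances_, index=demo_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

The importances of a fitted forest sum to 1, so each value can be read as that feature's share of the total.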