# Decision Trees and Random Forests in Python


**Learning Objectives**


1. Explore and analyze data using a Pairplot
2. Train a single Decision Tree
3. Predict and evaluate the Decision Tree
4. Compare the Decision Tree model to a Random Forest


## Introduction 

In this lab, you explore and analyze data using a pairplot, train a single decision tree, use it to predict and evaluate the results, and compare the decision tree model to a random forest. Recall that the [Decision Tree](https://en.wikipedia.org/wiki/Decision_tree_learning) algorithm belongs to the family of supervised learning algorithms. Unlike some other supervised learning algorithms, the decision tree algorithm can be used to solve both regression and classification problems. Put simply, the goal of using a decision tree is to build a model that predicts the class or value of the target variable by learning simple decision rules inferred from prior data (the training data).
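To make "learning simple decision rules" concrete, here is a minimal sketch of how one rule of the form `value <= threshold` could be chosen by minimizing Gini impurity. The helper names (`gini`, `best_threshold`) and the toy data are made up for illustration; this is a simplification and not scikit-learn's actual implementation, which is considerably more elaborate.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best rule `value <= threshold`."""
    best = (None, float("inf"))
    # Skip the largest value: that split would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        # Weight each branch's impurity by the share of rows it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one numeric feature and a binary label.
toy_values = [1, 2, 3, 10, 11, 12]
toy_labels = ["absent", "absent", "absent", "present", "present", "present"]

threshold, impurity = best_threshold(toy_values, toy_labels)
print(threshold, impurity)  # 3 0.0
```

A full tree repeats this search over every feature and then recurses into each branch until a stopping condition is met.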

Each learning objective corresponds to a _#TODO_ in this student lab notebook. Try to complete this notebook first, and then review the [solution notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive2/launching_into_ml/solutions/decision_trees_and_random_Forests_in_Python.ipynb).

## Load necessary libraries 
We will start by importing the necessary libraries for this lab.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

## Get the Data

In [None]:
df = pd.read_csv("../kyphosis.csv")

In [None]:
df.head()

## Exploratory Data Analysis

**Lab Task #1:** Check a pairplot for this small dataset.

In [None]:
# TODO 1
# TODO -- Your code here.

## Train Test Split

Let's split up the data into a training set and a test set!

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df.drop("Kyphosis", axis=1)
y = df["Kyphosis"]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
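The call above holds out 30% of the rows at random, so the exact split changes on every run unless a seed is fixed (for `train_test_split`, that is the `random_state` parameter). The following sketch shows the general idea of a seeded shuffle-and-hold-out split using only the standard library. The function `simple_split` is hypothetical, and its rounding and shuffling are not guaranteed to match scikit-learn's.

```python
import random


def simple_split(rows, test_size=0.30, seed=42):
    """Shuffle row indices reproducibly, then hold out `test_size` of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same shuffle
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(10))  # stand-in for 10 dataset rows
train_rows, test_rows = simple_split(rows)
print(len(train_rows), len(test_rows))  # 7 3
```

Fixing the seed makes later comparisons between models fairer, because both models are then evaluated on the same held-out rows.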

## Decision Trees

**Lab Task #2:** Train a single decision tree.

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
dtree = DecisionTreeClassifier()

In [None]:
# TODO 2
# TODO -- Your code here.

## Prediction and Evaluation 

**Lab Task #3:** Predict and evaluate the decision tree.

Let's evaluate our decision tree.

In [None]:
predictions = dtree.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# TODO 3a
# TODO -- Your code here.

In [None]:
# TODO 3b
print(confusion_matrix(y_test, predictions))
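To help read the printed matrix, here is a small standard-library sketch of the counts behind it, plus the precision and recall derived from them. The labels and predictions are made-up values, and `confusion_counts` and `precision_recall` are hypothetical helpers. For two classes with sorted labels, scikit-learn lays the matrix out as `[[tn, fp], [fn, tp]]`, with rows as true labels and columns as predicted labels.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one chosen positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels, not taken from the lab's dataset.
actual = ["absent", "absent", "present", "present", "absent", "present"]
predicted = ["absent", "present", "present", "absent", "absent", "present"]

tp, fp, fn, tn = confusion_counts(actual, predicted, positive="present")
print(tp, fp, fn, tn)  # 2 1 1 2
precision, recall = precision_recall(tp, fp, fn)
```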

## Tree Visualization

Scikit-learn has some built-in visualization capabilities for decision trees. You won't use them often, and they require you to install the pydot library (which in turn relies on a Graphviz installation), but here is the code to produce a visualization and an example of what it looks like:

In [None]:
from io import StringIO

import pydot
from IPython.display import Image
from sklearn.tree import export_graphviz

# Use the same feature columns the model was trained on (everything except the target).
features = list(X.columns)
features

In [None]:
dot_data = StringIO()
export_graphviz(
    dtree, out_file=dot_data, feature_names=features, filled=True, rounded=True
)

graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph[0].create_png())

## Random Forests

**Lab Task #4:** Compare the decision tree model to a random forest.
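As a reminder of what separates a random forest from a single tree, the sketch below shows the two core ideas in plain Python: each tree is trained on a bootstrap sample of the rows, and the trees' predictions are then combined. The helpers and the vote list are hypothetical. Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than by the hard majority vote shown here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as bagging does for each tree."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the label predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(8))  # stand-in for 8 training rows
sample = bootstrap_sample(rows, rng)  # some rows repeat, some are left out

# Hypothetical predictions from five trees for a single test row.
tree_votes = ["absent", "present", "absent", "absent", "present"]
print(majority_vote(tree_votes))  # absent
```

Because every tree sees a slightly different sample, individual trees make different errors, and combining them tends to give more stable predictions than any single tree.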

In [None]:
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators=100)
rfc.fit(X_train, y_train)

In [None]:
rfc_pred = rfc.predict(X_test)

In [None]:
# TODO 4a
# TODO -- Your code here.

In [None]:
# TODO 4b
# TODO -- Your code here.

Copyright 2021 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.