# Chapter 17. Decision Trees

```python
from __future__ import division
from collections import Counter, defaultdict
from functools import partial
import math, random
```

DataSciencester's VP of Talent has interviewed a number of job candidates from the site, with varying degrees of success.  
He's collected a data set consisting of several (qualitative) attributes of each candidate, as well as whether that candidate interviewed well or poorly.  
Could you, he asks, use this data to build a model identifying which candidates will interview well, so that he doesn't have to waste time conducting interviews?  
This task seems like a good fit for a [decision tree](https://en.wikipedia.org/wiki/Decision_tree), which is another predictive modeling tool in the data scientist's kit.

## What is a Decision Tree?

A decision tree uses a tree structure to represent a number of possible [decision paths](http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html) and an outcome for each path.  
If you have ever played the game [Twenty Questions](https://en.wikipedia.org/wiki/Twenty_Questions), then you are familiar with decision trees and how they work.
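To make the Twenty Questions analogy concrete, here is a minimal sketch of a tree as nested Python data. The animal-guessing questions, the `animal_tree` layout, and the helper names `guess` and `decision_paths` are all invented for illustration; they are not the representation or the data set used later in this chapter.

```python
# A hypothetical animal-guessing tree (illustrative only, not chapter data).
# Internal nodes are (question, {answer: subtree}) tuples; leaves are strings.
animal_tree = ('has_legs', {
    True: ('lives_in_water', {
        True: 'frog',
        False: 'dog',
    }),
    False: ('lives_in_water', {
        True: 'fish',
        False: 'snake',
    }),
})

def guess(tree, answers):
    """Follow one decision path through the tree and return its outcome."""
    while isinstance(tree, tuple):           # still at a question node
        question, branches = tree
        tree = branches[answers[question]]   # descend along the matching branch
    return tree                              # reached a leaf: the outcome

def decision_paths(tree, path=()):
    """Return every (path, outcome) pair; a path is a tuple of (question, answer)."""
    if not isinstance(tree, tuple):          # leaf: one complete path
        return [(path, tree)]
    question, branches = tree
    paths = []
    for answer, subtree in branches.items():
        paths.extend(decision_paths(subtree, path + ((question, answer),)))
    return paths

print(guess(animal_tree, {'has_legs': False, 'lives_in_water': True}))  # fish
```

Each of the four leaves corresponds to exactly one decision path, which is why `decision_paths(animal_tree)` returns four entries.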

Decision trees have a lot to recommend them.  
They are easy to understand and interpret, and the process by which they reach a prediction is transparent.  
Decision trees can easily handle a mix of numeric and categorical attributes, and can even classify data for which attributes are missing.  
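One way to picture both properties is a small hand-built tree that splits one numeric attribute at a threshold, branches on one categorical attribute, and keeps a `None` branch as a fallback for missing or unseen values. The attribute names, the threshold, the predictions, and the fallback policy below are assumptions made up for this sketch, not results learned from data.

```python
# Hypothetical hand-built tree; names, threshold, and outcomes are illustrative.
# Internal nodes are dicts, leaves are booleans ("interviews well?").
hiring_tree = {
    'attribute': 'years_experience',    # numeric attribute, split at a threshold
    'threshold': 5,
    'branches': {
        'below': {
            'attribute': 'level',       # categorical attribute, one branch per value
            'branches': {'Junior': False, 'Mid': True, None: False},
        },
        'at_or_above': True,
        None: False,                    # assumed fallback when the value is missing
    },
}

def classify(node, candidate):
    """Walk the tree for a candidate dict and return the leaf prediction."""
    while isinstance(node, dict):                 # still at a decision node
        value = candidate.get(node['attribute'])  # None if the attribute is missing
        if value is None:
            key = None
        elif 'threshold' in node:                 # numeric split
            key = 'at_or_above' if value >= node['threshold'] else 'below'
        else:                                     # categorical split
            key = value
        branches = node['branches']
        # Missing or unseen values fall through to the None branch.
        node = branches[key] if key in branches else branches[None]
    return node

print(classify(hiring_tree, {'years_experience': 2, 'level': 'Mid'}))  # True
print(classify(hiring_tree, {'level': 'Mid'}))                         # False
```

The second call has no `years_experience` value, so it takes the `None` branch at the root. A fixed fallback like this is only one possible policy for missing data.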

However, finding an "optimal" decision tree for a set of training data is computationally very hard (the general problem is NP-hard).  
We will work around this problem by trying to build a good-enough tree rather than an optimal one, although for large data sets this can still be a lot of work.  
More importantly, it is very easy (and very bad) to build decision trees that are [overfitted](https://en.wikipedia.org/wiki/Overfitting) to the training data, and that don't generalize well to unseen data.  
We'll look at ways to address this.

Decision trees are often divided into [classification trees](http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf), which produce categorical outputs, and [regression trees](http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf), which produce numeric outputs.  
In this chapter, we will focus on classification trees.  
We'll work through the [ID3 algorithm](https://en.wikipedia.org/wiki/ID3_algorithm) for learning a decision tree from a set of labeled data, which should help us understand how decision trees actually work.  
To keep things simple, we'll restrict ourselves to problems with binary outputs, such as "Should I hire this candidate?"

## Entropy