### Understanding a Decision Tree

A decision tree is the building block of a random forest and is an intuitive model. 

We can think of a decision tree as a series of yes/no questions asked about our data, eventually leading to a predicted class (or a continuous value in the case of regression). This is an interpretable model because it makes classifications much like we do: we ask a sequence of questions about the available data until we arrive at a decision (in an ideal world).

The technical details of a decision tree lie in how the questions about the data are formed.

**In the CART algorithm, a decision tree is built by determining the questions (called splits of nodes) that, when answered, lead to the greatest reduction in Gini Impurity. In other words, the decision tree tries to form nodes containing a high proportion of samples (data points) from a single class by finding values in the features that cleanly divide the data into classes.**

The Gini Impurity of a node is 1 minus the sum of the squared proportions of each class in that node: it is 0 for a node that contains a single class and grows as the classes become more mixed.
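The split search described above can be sketched in plain Python. This is a minimal, unoptimized sketch rather than a reference CART implementation: `gini_impurity` and `best_split` are hypothetical helper names, candidate thresholds are assumed to be midpoints between consecutive distinct feature values (a common convention), and the six-point dataset is made up for illustration, not taken from the figure below.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini Impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every (feature, threshold) pair and keep the one whose two child
    nodes have the lowest weighted Gini Impurity.

    X is a list of feature tuples and y the matching list of class labels.
    Returns (feature_index, threshold, weighted_gini), or None if every
    feature is constant and no split is possible.
    """
    n = len(y)
    best = None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Assumed convention: thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[feature] <= threshold]
            right = [label for row, label in zip(X, y) if row[feature] > threshold]
            # Each child's impurity is weighted by the number of samples it receives.
            weighted = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
            if best is None or weighted < best[2]:
                best = (feature, threshold, weighted)
    return best


# Hypothetical data: two class-0 points inside a square of class-1 points.
X = [(1, 1), (1, 4), (4, 1), (4, 4), (2, 2), (3, 3)]
y = [1, 1, 1, 1, 0, 0]

print(gini_impurity(y))   # about 0.444 (4/9) at the root
print(best_split(X, y))   # feature 0 (x1), threshold 1.5, weighted Gini about 0.333
```

Several splits tie at a weighted Gini of 1/3 on this data; the sketch keeps the first one it finds, which is an arbitrary tie-breaking choice.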

<p>
    <img src = assets/1.png height = 400px width = 400px>
</p>

Our data has only two features (predictor variables), x1 and x2, and six data points (samples) divided into two labels. Although this problem is simple, it’s not linearly separable, which means that we can’t draw a single straight line through the data to classify the points.


We can, however, draw a series of straight lines that divide the data points into boxes, which we’ll call nodes. In fact, this is what a decision tree does during training: each split is a line perpendicular to one feature axis, that is, a threshold on a single feature. Effectively, a decision tree is a non-linear model built by combining many linear boundaries.
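To make the idea of boxes concrete, the sketch below uses a hypothetical six-point dataset (not the one in the figure) in which two class-0 points sit inside a square of class-1 points, so no straight line separates the classes. The illustrative function `single_threshold_separates` confirms that no single cut on one feature does the job, while `box_rule`, a hand-written stand-in for a trained tree, uses two axis-aligned cuts on x1 to produce three pure boxes.

```python
# Hypothetical data (not the points in the figure): two class-0 points sit
# inside the square formed by four class-1 points, so no straight line
# can separate the two classes.
X = [(1, 1), (1, 4), (4, 1), (4, 4), (2, 2), (3, 3)]
y = [1, 1, 1, 1, 0, 0]


def single_threshold_separates(X, y):
    """True if some rule 'feature <= t' puts each class entirely on its own side."""
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = {label for row, label in zip(X, y) if row[feature] <= t}
            right = {label for row, label in zip(X, y) if row[feature] > t}
            if len(left) == 1 and len(right) == 1 and left != right:
                return True
    return False


def box_rule(point):
    """Two axis-aligned cuts on x1 split the plane into three boxes (nodes)."""
    x1 = point[0]  # x2 is not needed for this particular data
    if x1 <= 1.5:
        return 1   # left box: only class 1
    if x1 <= 3.5:
        return 0   # middle box: only class 0
    return 1       # right box: only class 1


print(single_threshold_separates(X, y))                     # False
print(all(box_rule(p) == label for p, label in zip(X, y)))  # True
```

Each piece of `box_rule` is a linear boundary, but the rule as a whole is a step function of x1, which is what makes the resulting model non-linear.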

#### Visualizing a Decision Tree

So what’s actually going on when we train a decision tree?

<p>
    <img src = assets/2.png height = 600px width = 600px>
</p>

All nodes except the leaf nodes (the colored terminal nodes) have five parts:

- A question asked about the data based on a value of a feature. Each question has a True or False answer that splits the node, and a data point moves down the tree according to its answer.
- `gini`: The Gini Impurity of the node. The weighted average Gini Impurity of the nodes decreases as we move down the tree.
- `samples`: The number of observations in the node.
- `value`: The number of samples in each class. For example, the top node has 2 samples in class 0 and 4 samples in class 1.
- `class`: The majority classification for points in the node. In the case of leaf nodes, this is the prediction for all samples in the node.
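As a worked example of these fields, the sketch below recomputes them for the top node from its `value` counts alone (2 samples in class 0 and 4 in class 1). `node_summary` is a hypothetical helper; it assumes a non-empty node, class labels that are the integers 0 to `n_classes - 1`, and breaks ties in favor of the lower class index.

```python
from collections import Counter


def node_summary(labels, n_classes=2):
    """Return the gini, samples, value and class fields shown in a node box."""
    counts = Counter(labels)
    samples = len(labels)
    value = [counts[c] for c in range(n_classes)]  # Counter returns 0 for missing classes
    gini = 1.0 - sum((v / samples) ** 2 for v in value)
    majority = max(range(n_classes), key=lambda c: value[c])  # first class wins ties
    return {"gini": round(gini, 3), "samples": samples, "value": value, "class": majority}


# The top node described above: 2 samples in class 0 and 4 in class 1.
root_labels = [0, 0, 1, 1, 1, 1]
print(node_summary(root_labels))
# {'gini': 0.444, 'samples': 6, 'value': [2, 4], 'class': 1}
```

The Gini value follows directly from the counts: 1 - (2/6)² - (4/6)² = 4/9, or about 0.444.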

**The leaf nodes do not have a question because these are where the final predictions are made. To classify a new point, simply move down the tree, using the features of the point to answer the questions until you arrive at a leaf node where the `class` is the prediction.**
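That traversal is a short loop. The `Node` class, the `predict` function, and the thresholds in the hand-built tree below are hypothetical, chosen for illustration rather than read off the figure; by the convention assumed here, a True answer sends the point to the left child.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: internal nodes ask 'x[feature] <= threshold?',
    leaf nodes only carry the predicted class."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None     # followed when the answer is True
    right: Optional["Node"] = None    # followed when the answer is False
    prediction: Optional[int] = None  # set only on leaf nodes


def predict(node, point):
    """Answer questions from the root down until a leaf is reached."""
    while node.prediction is None:
        node = node.left if point[node.feature] <= node.threshold else node.right
    return node.prediction


# Hand-built tree with two splits on x1 (thresholds chosen for illustration).
tree = Node(
    feature=0, threshold=1.5,
    left=Node(prediction=1),
    right=Node(feature=0, threshold=3.5,
               left=Node(prediction=0),
               right=Node(prediction=1)),
)

print(predict(tree, (2.5, 3.0)))  # 0
print(predict(tree, (5.0, 0.0)))  # 1
```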

To see the tree in a different way, we can draw the splits built by the decision tree on the original data.

<p>
    <img src = assets/3.png height = 400px width = 400px>
</p>

Each split is a single line that divides data points into nodes based on feature values. For this simple problem, with no limit on the maximum depth, the divisions place each point in a node containing only points of the same class. (As we’ll see later, this perfect division of the training data might not be what we want, because it can lead to overfitting.)
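To illustrate the role of depth, the sketch below grows a tree recursively, splitting until every node is pure unless an optional `max_depth` stops it earlier. It assumes the same midpoint-threshold convention and uses a hypothetical six-point dataset as a toy stand-in for the figure; all function names are illustrative. With no depth limit, every training point is classified correctly; capping the depth at 1 leaves an impure leaf and lowers training accuracy.

```python
from collections import Counter


def gini(labels):
    """Gini Impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Lowest weighted-Gini (score, feature, threshold) over midpoint thresholds, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(X, y, depth=0, max_depth=None):
    """Grow a tree: a leaf is a class label, an internal node is
    (feature, threshold, left_subtree, right_subtree)."""
    majority = Counter(y).most_common(1)[0][0]
    # Stop when the node is pure or the optional depth limit is reached.
    if gini(y) == 0.0 or (max_depth is not None and depth >= max_depth):
        return majority
    split = best_split(X, y)
    if split is None:  # every feature is constant; no split is possible
        return majority
    _, f, t = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth),
            build([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth))


def predict(tree, point):
    """Follow the splits until a leaf (a bare class label) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if point[f] <= t else right
    return tree


def accuracy(tree, X, y):
    return sum(predict(tree, p) == lab for p, lab in zip(X, y)) / len(y)


# Hypothetical data: two class-0 points inside a square of class-1 points.
X = [(1, 1), (1, 4), (4, 1), (4, 4), (2, 2), (3, 3)]
y = [1, 1, 1, 1, 0, 0]

print(accuracy(build(X, y), X, y))               # 1.0: every leaf is pure
print(accuracy(build(X, y, max_depth=1), X, y))  # 0.666...: 4 of 6 points correct
```

Perfect accuracy on the training points says nothing about how the tree handles new data, which is why a depth limit like this one is a natural knob for controlling overfitting.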

### References
- [Random Forest Implementation and Explanation](https://towardsdatascience.com/an-implementation-and-explanation-of-the-random-forest-in-python-77bf308a9b76)