## Decision Trees

### Introduction to decision trees

A **decision tree** is a sequence of questions, each testing whether a condition is met, where the answer to one question determines which question is asked next. Each answer narrows the possible outcomes until a decision is reached. A decision tree is typically depicted as an upside-down tree, with the root at the top. The parts of a decision tree are:

- **Decision node:** where a feature of the data is tested.
- **Root node:** the first decision node in the tree.
- **Branch:** a connection from a decision node to the next node in the tree, which is either another decision node or a leaf node.
- **Layer:** all of the decision nodes that are the same distance from the root node.
- **Leaf node:** a node that does not test a feature; it is where a decision is made.
- **Depth:** the number of layers in a decision tree.

A **binary decision tree** is a decision tree in which each decision node has exactly two branches.
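The parts listed above can be sketched as a small data structure. This is a minimal illustration, not a standard API: the class names `Leaf` and `DecisionNode`, the helper functions, and the example features (`age`, `income`) are all hypothetical, and sending values equal to the threshold down the left branch is an assumed convention.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Leaf node: holds a decision and tests no feature."""
    decision: str


@dataclass
class DecisionNode:
    """Decision node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["DecisionNode", Leaf]   # branch taken when value <= threshold
    right: Union["DecisionNode", Leaf]  # branch taken when value > threshold


def depth(node):
    """Count the layers of decision nodes from `node` downward."""
    if isinstance(node, Leaf):
        return 0
    return 1 + max(depth(node.left), depth(node.right))


def decide(node, instance):
    """Follow branches from `node` until a leaf node is reached."""
    while isinstance(node, DecisionNode):
        if instance[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.decision


# Hypothetical binary tree: the root node tests "age", one child tests "income".
root = DecisionNode(
    feature="age", threshold=30,
    left=Leaf("no"),
    right=DecisionNode(feature="income", threshold=50_000,
                       left=Leaf("no"), right=Leaf("yes")),
)

print(depth(root))                                  # 2
print(decide(root, {"age": 45, "income": 80_000}))  # yes
```

The tree has two layers of decision nodes (the root, then the `income` node), so its depth is 2 under the definition above.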

### Classification trees

A classification tree is a decision tree that assigns an object or event to one of a set of categories. Classification is a common use of decision trees. In a classification tree, each leaf represents the class predicted for a new instance. Ex: A bank classifies transactions as "fraudulent" or "non-fraudulent"; a mobile advertiser decides whether to send a phone user an advertisement.

| ![Simple Decision Tree](An-example-of-a-simple-decision-tree.png) |
|:--:|
| <b>Source: https://online.visual-paradigm.com/knowledge/decision-tree/what-is-decision-tree/</b>|

1. This decision tree determines whether or not to purchase a car. The first question, "Is the car red?", is the root node.
2. The two possible answers are yes or no. Each answer is a branch on the decision tree.
3. If the answer is no, we come to another decision point. "Is the color yellow?" has two possible answers. If the answer is no, we come to the "Don't Buy" node, which is a leaf node.
4. If the car is red, then model year is examined at a decision node.
5. If the model year is newer than 2010, then the car should be purchased. Otherwise, there is another decision point regarding odometer miles before a leaf node is reached and a decision made.

### Introduction to regression trees

A regression tree is a decision tree that predicts a numerical value. A regression tree is built by dividing the range of a feature into regions and using the mean of the observed values in each region as the prediction. A regression tree often performs better than a linear regression model when the relationship is not clearly linear. Ex: The price of a car with respect to its age, or the expected value of a basketball shot based on its distance. However, cross-validation should still be used to compare model performance for any regression task.
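The idea of predicting with region means can be illustrated with a short sketch. The data and the two thresholds below are hypothetical and chosen by hand; a real regression tree would learn the thresholds from the training data.

```python
from statistics import mean

# Hypothetical (age in years, price in dollars) observations.
cars = [(1, 24000), (2, 22000), (4, 16000), (5, 15000), (9, 7000), (11, 5000)]

# Hypothetical thresholds that divide age into three regions.
thresholds = [3, 8]


def region_index(age):
    """Return the index of the region that `age` falls into."""
    return sum(age > t for t in thresholds)


# Group the observed prices by region, then take the mean of each region.
regions = {}
for age, price in cars:
    regions.setdefault(region_index(age), []).append(price)
region_means = {index: mean(prices) for index, prices in regions.items()}


def predict(age):
    """Predict the mean observed price of the region containing `age`."""
    return region_means[region_index(age)]


print(predict(6))  # 15500
```

A six-year-old car falls in the middle region (ages above 3 and up to 8), so the prediction is the mean of the two prices observed there.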

| ![Decision Tree Regression](DecisionTreeRegression.png) |
|:--:|
| <b>Source: Data Science Foundations with Python with zyLabs</b>|

1. The regression tree first considers the house's number of bedrooms. If the house has one bedroom, the tree suggests \$120 per square foot for the house's price.
2. If the house has two or more bedrooms, the tree checks the number of bathrooms. If two or more bathrooms exist, the tree suggests a price of \$210 per square foot.
3. If there is only one bathroom, the number of parking spaces is used to estimate the price for a house.



**Comparison of linear regression and regression trees.**

| Linear regression | Regression tree |
| :---------------- | :-------------- |
| Fits linear data better | Has more flexibility with non-linear data |
| Missing values are discarded | Missing values can be converted to a separate class |
| Heavily affected by outliers | Less affected by outliers |
| Difficult to interpret graphically when the model has many features | Easy to interpret even when the model has many features |
| Statistical significance of each feature can be measured easily | Determining the importance of each feature is not straightforward |



### Classification and Regression Trees (CART) algorithm

The **Classification and Regression Trees (CART)** algorithm builds a tree by repeatedly splitting the data into two regions. For a numerical feature, the split uses a **threshold**: a numerical value that divides the feature into two parts, values below the threshold and values above it. A regression tree using CART is obtained by choosing, at each step, the split that minimizes the **residual sum of squares (RSS)** and, consequently, the **mean squared error (MSE)**.
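To see why minimizing RSS also minimizes MSE, the two quantities can be written out. In this sketch, $ n $ is notation introduced here for the number of training instances being split, $ y_i $ is an observed value, and $ \hat{y}_i $ is the corresponding predicted value:

$$
\text{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad \text{MSE} = \frac{\text{RSS}}{n}
$$

Because $ n $ is the same for every candidate split of the same data, the split with the lowest RSS also has the lowest MSE.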

**CART algorithm for regression.**

For each decision node:

Step 1: Select one of the features, $ x $.

Step 2: Divide the training data into two regions, $ R_1 $ and $ R_2 $, by setting a threshold for $ x $ if $ x $ is numerical, or by choosing a particular value of $ x $ if $ x $ is categorical.

Step 3: Calculate the RSS, where the predicted value $ \hat{y}_i $ for each instance is the average of the observed values $ y_i $ in that instance's region.

Step 4: Repeat steps 2 and 3 for each possible threshold or particular value.

Step 5: Repeat steps 1-4 for each possible feature.

Step 6: Select the feature and threshold (or particular value) with the lowest RSS to split the training set at the node.

Splitting stops according to a pre-set criterion, such as a minimum number of instances in a leaf or a maximum number of terminal nodes.
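The steps above can be sketched in Python for a single decision node. This is a simplified illustration rather than a complete CART implementation: it handles numerical features only, it does not build the full tree or apply a stopping criterion, and the function names, the data, and the choice of midpoints between consecutive values as candidate thresholds are all assumptions made for the example.

```python
def rss(values):
    """Residual sum of squares when every value is predicted by the mean."""
    if not values:
        return 0.0
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values)


def best_split(rows, targets):
    """Return (feature_index, threshold, rss) of the lowest-RSS split.

    rows: list of equal-length lists of numerical feature values.
    targets: list of observed values y_i, one per row.
    """
    best = None
    n_features = len(rows[0])
    for j in range(n_features):                      # steps 1 and 5
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values
        # (an assumption; other candidate sets are possible).
        for lo, hi in zip(values, values[1:]):       # steps 2 and 4
            threshold = (lo + hi) / 2
            r1 = [y for row, y in zip(rows, targets) if row[j] <= threshold]
            r2 = [y for row, y in zip(rows, targets) if row[j] > threshold]
            total = rss(r1) + rss(r2)                # step 3
            if best is None or total < best[2]:      # step 6
                best = (j, threshold, total)
    return best


# Hypothetical data: feature 0 is car age in years, feature 1 is number of doors.
rows = [[1, 4], [2, 2], [3, 4], [8, 2], [9, 4], [10, 2]]
targets = [20.0, 19.0, 21.0, 8.0, 7.0, 9.0]  # price in thousands of dollars

feature, threshold, total = best_split(rows, targets)
print(feature, threshold, total)  # 0 5.5 4.0
```

In this example, splitting on car age at 5.5 years separates the newer, higher-priced cars from the older ones, which gives a much lower RSS than any split on the number of doors.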