# Decision Tree
Decision trees are a supervised learning algorithm used for both classification and regression, though they are more commonly used for classification. They are called "decision trees" because the model is a tree-like structure of decisions and their possible consequences. In decision analysis, where the idea originates, those consequences can include chance event outcomes, resource costs, and utility.

```{image} https://cdn.mathpix.com/snip/images/Hhr4nUXPvrS7xn4jX0WeDJFYuaZyBZT_P1WBCr1W4_k.original.fullsize.png
:alt: Decision Trees
:align: center
:width: 90%
```

## How Decision Trees Work

The algorithm divides the data into two or more homogeneous sets based on the most significant attributes, making the groups as distinct from one another as possible. It does this with a method called "recursive partitioning" (or "splitting"): starting at the top of the tree (the "root"), it splits the data into subsets by making decisions based on feature values, then repeats the same process on each derived subset. The recursion ends when the algorithm cannot make any further splits or when it reaches a stopping condition set by the user, such as a maximum tree depth or a minimum number of samples per leaf.

```{image} https://cdn.mathpix.com/snip/images/TnyrrmBOFzHW1cLDSclfhls4EIah5V3T39FNii0hGl0.original.fullsize.png
:alt: Decision Trees
:align: center
:width: 90%
```

### Components of Decision Trees

- **Root Node**: Represents the entire dataset, which gets divided into two or more homogeneous sets.
- **Splitting**: Process of dividing a node into two or more sub-nodes based on certain conditions.
- **Decision Node**: After splitting, the sub-nodes become decision nodes, where further splits can occur.
- **Leaf/Terminal Node**: Nodes that do not split further, representing the outcome or decision.
- **Pruning**: Reducing the size of decision trees by removing parts of the tree that do not provide additional power to classify instances. This is done to make the tree simpler and to avoid overfitting.

```{image} https://cdn.mathpix.com/snip/images/Vq66kx7sefZXguwCIqy1TM60U-zcpGlsFYILno-HBQY.original.fullsize.png
:alt: Decision Trees
:align: center
:width: 90%
```

### Criteria for Splitting

Decision trees use various metrics to decide how to split the data at each step:
- For classification tasks, commonly used metrics are Gini impurity, Entropy, and Classification Error.
- For regression tasks, variance reduction is often used.
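
As a rough illustration of these metrics, the sketch below computes each one for the labels (or target values) that fall in a single node. The function names are chosen here for illustration and are not taken from any particular library.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Fraction of the node's samples that belongs to each class."""
    counts = Counter(labels)
    return [count / len(labels) for count in counts.values()]


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return sum(-p * log2(p) for p in class_proportions(labels))


def classification_error(labels):
    """Share of samples that are not in the majority class."""
    return 1.0 - max(class_proportions(labels))


def variance(values):
    """Variance of the targets in a node (used for regression splits)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


mixed = ["yes", "yes", "no", "no"]    # evenly mixed node
pure = ["yes", "yes", "yes", "yes"]   # perfectly homogeneous node

print(gini(mixed), entropy(mixed), classification_error(mixed))
print(gini(pure), entropy(pure), classification_error(pure))
```

All three classification metrics are zero for a pure node and largest for an evenly mixed one, so a good split is one whose child nodes score lower than the parent.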

### Example

Imagine you want to decide on the activity for your weekend. The decision could depend on multiple factors such as the weather and whether you have company. A decision tree for this scenario might look something like this:

- The root node starts with the question: "Is it raining?" 
    - If "Yes", the tree might direct you to a decision "Stay in and read".
    - If "No", it then asks, "Do you have company?" 
        - If "Yes", the decision might be "Go hiking".
        - If "No", the decision could be "Visit a cafe".
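
The same tree can be written down as a small nested data structure and walked from the root to a leaf. This is only a sketch: the dictionary layout and the names `weekend_tree` and `decide` are assumptions made for this illustration.

```python
# Internal nodes are dicts holding a question and one branch per answer;
# leaves are plain strings holding the final decision.
weekend_tree = {
    "question": "is_raining",
    "yes": "Stay in and read",
    "no": {
        "question": "has_company",
        "yes": "Go hiking",
        "no": "Visit a cafe",
    },
}


def decide(node, answers):
    """Follow the branches that match `answers` until a leaf is reached."""
    while isinstance(node, dict):
        branch = "yes" if answers[node["question"]] else "no"
        node = node[branch]
    return node


print(decide(weekend_tree, {"is_raining": False, "has_company": True}))
```

Note that when it is raining the second question is never asked: only the features along the path actually taken influence the outcome.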

This example simplifies the decision tree concept. In real-world data science tasks, decision trees consider many more variables and outcomes, and the decisions are based on quantitative data from the features of the dataset.

### Advantages and Disadvantages

**Advantages:**
- Easy to understand and interpret.
- Requires little data preparation.
- Can handle both numerical and categorical data.
- Can handle multi-output problems.

**Disadvantages:**
- Prone to overfitting, especially with complex trees.
- Can be unstable because small variations in the data might result in a completely different tree being generated.
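- Decision boundaries are axis-aligned, because each split tests a single feature against a threshold. The resulting step-like boundaries may not accurately represent the data's actual structure, for example when the true boundary is diagonal or smooth.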

To combat overfitting, techniques like pruning (reducing the size of the tree), setting a maximum depth for the tree, and ensemble methods like Random Forests are often used.

## Decision Tree Regressor
A Decision Tree Regressor is a type of machine learning model used for predicting continuous values, unlike its counterpart, the Decision Tree Classifier, which predicts categorical outcomes. It works by breaking down a dataset into smaller subsets while simultaneously developing an associated decision tree. The final result is a tree with decision nodes and leaf nodes.

The Decision Tree Regressor uses the Mean Squared Error (MSE) as a measure to decide on the best split at each decision node. MSE is a popular metric for evaluating a regression model: it is the average squared difference between the observed values and the values predicted by the model. The goal of the regressor is to minimize the MSE at each step of building the tree.
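
A minimal sketch of the MSE calculation follows. It assumes, as is usual for regression trees, that a node predicts the mean of the target values it contains; the function names are illustrative.

```python
def mse(actual, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def node_mse(targets):
    """MSE of a tree node that predicts the mean of its own targets."""
    mean = sum(targets) / len(targets)
    return mse(targets, [mean] * len(targets))


print(mse([3, 5], [2, 7]))   # errors of 1 and 2 give (1 + 4) / 2
print(node_mse([10, 20]))    # both targets are 5 away from the mean 15
print(node_mse([5, 5, 5]))   # identical targets give zero error
```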

### How it Works Using MSE

1. **Starting at the Root**: The entire dataset is considered as the root.
2. **Best Split Decision**: To decide on a split, the algorithm evaluates every candidate split of every feature, computes the MSE within each of the two resulting groups (weighted by group size), and chooses the split with the lowest combined MSE. This is the split that produces the most similar responses within each group.
3. **Recursion on Subsets**: This process of finding the best split is then recursively applied to each resulting subset. The recursion is completed when the algorithm reaches a predetermined stopping criterion, such as a maximum depth of the tree or a minimum number of samples required to split a node further.
4. **Prediction**: To make a prediction, the input features of a new data point are passed through the decision tree. The path the data point follows leads to a leaf node, and the average of the training target values in that leaf is used as the prediction.
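
The four steps above can be sketched in plain Python. This is a simplified illustration, not how any particular library implements it: candidate thresholds are taken as midpoints between consecutive distinct feature values, and the names `best_split`, `build_tree`, `predict`, `max_depth`, and `min_samples_split` are assumptions of this sketch.

```python
def node_mse(targets):
    """MSE of a node when it predicts the mean of its own targets."""
    mean = sum(targets) / len(targets)
    return sum((y - mean) ** 2 for y in targets) / len(targets)


def best_split(rows, targets):
    """Return the (feature, threshold) with the lowest weighted child MSE.

    Returns None when no split improves on the unsplit node.
    """
    best, best_score = None, node_mse(targets)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for r, y in zip(rows, targets) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, targets) if r[feature] > threshold]
            score = (len(left) * node_mse(left)
                     + len(right) * node_mse(right)) / len(targets)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, targets, depth=0, max_depth=3, min_samples_split=2):
    """Recursively split until a stopping criterion is met."""
    split = None
    if depth < max_depth and len(rows) >= min_samples_split:
        split = best_split(rows, targets)
    if split is None:
        # Leaf node: store the average target value as the prediction.
        return {"value": sum(targets) / len(targets)}
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, targets) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, targets) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_samples_split),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_samples_split),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its value."""
    while "value" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


# Tiny made-up dataset: one feature, two clearly separated groups of targets.
X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(X, y)
print(tree["feature"], tree["threshold"])
print(predict(tree, [2]), predict(tree, [11]))
```

On this toy data a single split separates the two groups perfectly, so both children have zero MSE and become leaves without needing the depth limit.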

```{image} https://cdn.mathpix.com/snip/images/SX0hVPA_he1R54UP1hajzXcda2lWuXlqpSFPk89WX0M.original.fullsize.png
:alt: Decision Tree
:align: center
:width: 80%
```

---

```{image} https://cdn.mathpix.com/snip/images/ZlP7-kSqXm9bkBqv5Wb51JKMgQC-LhitRem27EIHcYs.original.fullsize.png
:alt: Decision Tree
:align: center
:width: 80%
```

---

```{image} https://cdn.mathpix.com/snip/images/PF7mlEk6tZHSX8YYJ79JD_4T7owgaCL_3XHwni9npjg.original.fullsize.png
:alt: Decision Tree
:align: center
:width: 80%
```

### Example

Imagine we are using a dataset of houses, where our features include the number of bedrooms, the number of bathrooms, square footage, and the year built, and our target variable is the house price.

1. **Root**: Initially, the entire dataset is the root.
2. **Best Split Calculation**: The algorithm evaluates all features and their possible values to find the split that would result in subsets with the most similar house prices (lowest MSE). Suppose the best initial split divides the dataset into houses with fewer than 2 bathrooms and houses with 2 or more bathrooms.
3. **Recursive Splitting**: This process continues, with each subset being split on features and feature values that minimize the MSE within each resulting subset. For instance, within the subset of houses with fewer than 2 bathrooms, the next split might be on the number of bedrooms.
4. **Stopping Criterion Reached**: Eventually, when the stopping criteria are met (for example, a maximum depth of the tree), the splitting stops.
5. **Making Predictions**: To predict the price of a new house, we would input its features into the decision tree. The house would follow a path down the tree determined by its features until it reaches a leaf node. The prediction would be the average price of the houses in that leaf node.
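
To make the comparison of candidate splits concrete, the sketch below scores two of them on a handful of houses. The six rows are invented purely for illustration and are not real data; the helper names and the choice of candidate splits are likewise assumptions of this sketch.

```python
# Hypothetical toy data, invented for illustration only:
# (bedrooms, bathrooms, square_footage, year_built, price)
houses = [
    (2, 1, 850, 1975, 150_000),
    (3, 1, 1100, 1982, 170_000),
    (3, 2, 1500, 1995, 250_000),
    (4, 2, 1900, 2004, 270_000),
    (4, 3, 2400, 2012, 330_000),
    (5, 3, 2800, 2018, 350_000),
]
FEATURES = {"bedrooms": 0, "bathrooms": 1, "square_footage": 2, "year_built": 3}
PRICE = 4


def mean_price(group):
    """Average price of a group of houses (the prediction for that group)."""
    return sum(house[PRICE] for house in group) / len(group)


def group_mse(group):
    """MSE of the prices in a group around the group's mean price."""
    mean = mean_price(group)
    return sum((house[PRICE] - mean) ** 2 for house in group) / len(group)


def split(data, feature, threshold):
    """Partition houses into those below the threshold and the rest."""
    index = FEATURES[feature]
    below = [house for house in data if house[index] < threshold]
    at_or_above = [house for house in data if house[index] >= threshold]
    return below, at_or_above


def weighted_mse(data, feature, threshold):
    """MSE of the two groups, weighted by how many houses each contains."""
    below, at_or_above = split(data, feature, threshold)
    return (len(below) * group_mse(below)
            + len(at_or_above) * group_mse(at_or_above)) / len(data)


# Compare two candidate splits; the lower score is the better split.
print(weighted_mse(houses, "bathrooms", 2))
print(weighted_mse(houses, "bedrooms", 4))

# A new house with 1 bathroom lands in the "fewer than 2 bathrooms" group,
# so its predicted price is that group's average price.
below, at_or_above = split(houses, "bathrooms", 2)
print(mean_price(below))
```

On these made-up rows, splitting on "fewer than 2 bathrooms" gives a lower weighted MSE than splitting on "fewer than 4 bedrooms", so the algorithm would prefer the bathroom split between these two candidates.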

This example simplifies the complexity involved in building a decision tree regressor but gives an outline of how MSE is used to create a model that can predict continuous outcomes like house prices.

```{image} https://cdn.mathpix.com/snip/images/bE3EaEGRPMjmp_y36LHCt8eMgv31K5AY-hpV3U3TpyM.original.fullsize.png
:alt: Decision Tree
:align: center
:width: 80%
```

---

```{image} https://cdn.mathpix.com/snip/images/7f1Ex-fm3pieQJhP7RZ8L8vqyHHhs2AFLPlHl0LgElg.original.fullsize.png
:alt: Decision Tree
:align: center
:width: 80%
```
