# Decision Trees

```{image} https://cdn.mathpix.com/snip/images/Hhr4nUXPvrS7xn4jX0WeDJFYuaZyBZT_P1WBCr1W4_k.original.fullsize.png
:alt: Decision Trees
:align: center
:width: 90%
```

Decision trees are a supervised learning algorithm used for both classification and regression, though classification is the more common use. They are called "decision trees" because the model is a tree-like structure of decisions and their possible consequences: each internal node tests a feature, and each branch leads toward a predicted outcome. In decision analysis, the same structure is also used to model chance event outcomes, resource costs, and utility.

## How Decision Trees Work

The algorithm divides the data into two or more increasingly homogeneous subsets, choosing at each step the attribute (and split point) that makes the resulting groups as distinct as possible. This method is called "recursive partitioning" or "splitting": it starts at the top of the tree (the "root"), splits the data into subsets based on feature values, and then repeats the same process on each derived subset. The recursion stops when the algorithm cannot make any further useful split (for example, when a subset contains a single class) or when it reaches a condition set by the user, such as a maximum tree depth or a minimum number of samples per leaf.
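
The procedure above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes numeric features, binary splits of the form `feature <= threshold`, and Gini impurity as the split criterion. The names `best_split`, `build_tree`, and `predict`, and the toy data, are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, min_samples_leaf):
    """Return the (feature_index, threshold) with the lowest weighted Gini,
    or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            # Enforce the user-defined minimum number of samples per leaf.
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_samples_leaf=1):
    """Recursively partition the data; return a nested dict."""
    split = None
    # Stop when the depth limit is reached or the node is already pure.
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, min_samples_leaf)
    if split is None:
        # Leaf: predict the majority class of the samples that reached it.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left_idx = [i for i, row in enumerate(rows) if row[f] <= t]
    right_idx = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                           depth + 1, max_depth, min_samples_leaf),
        "right": build_tree([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                            depth + 1, max_depth, min_samples_leaf),
    }

def predict(tree, row):
    """Walk from the root to a leaf, following the split at each node."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

# Toy data: the first feature separates the classes, the second is noise.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [8.0, 5.0], [9.0, 6.0], [10.0, 4.0]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
```

Because the search is greedy, each split is chosen only for its immediate impurity reduction; libraries use more efficient variants of the same idea.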

## Components of Decision Trees

- **Root Node**: Represents the entire dataset, which gets divided into two or more homogeneous sets.
- **Splitting**: Process of dividing a node into two or more sub-nodes based on certain conditions.
- **Decision Node**: A sub-node that is itself split into further sub-nodes. After a split, each resulting sub-node is either a decision node or a leaf.
- **Leaf/Terminal Node**: Nodes that do not split further, representing the outcome or decision.
- **Pruning**: Reducing the size of a decision tree by removing branches that add little power to classify instances. This makes the tree simpler and helps avoid overfitting.
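
One way to picture these components is as a small data structure. The sketch below is illustrative only: the `Node` class, the helper functions, and the feature names (`humidity`, `wind`) are hypothetical. Pruning is shown as the bare operation of collapsing a decision node into a leaf; real pruning methods decide which branch to collapse from validation error or a cost-complexity penalty.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Decision nodes (including the root) set feature, threshold and both children.
    # Leaf nodes set only prediction.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[str] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

def count_leaves(node: Node) -> int:
    """Number of leaf (terminal) nodes below and including this node."""
    if node.is_leaf:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)

def prune_to_leaf(node: Node, prediction: str) -> None:
    """Collapse a decision node into a leaf that predicts a fixed outcome."""
    node.feature = node.threshold = None
    node.left = node.right = None
    node.prediction = prediction

# Root node -> one leaf and one decision node, which splits into two leaves.
root = Node("humidity", 70.0,
            left=Node(prediction="play"),
            right=Node("wind", 20.0,
                       left=Node(prediction="play"),
                       right=Node(prediction="stay home")))

leaves_before = count_leaves(root)       # three leaves
prune_to_leaf(root.right, "stay home")   # the "wind" decision node becomes a leaf
leaves_after = count_leaves(root)        # two leaves
```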

```{image} https://cdn.mathpix.com/snip/images/Vq66kx7sefZXguwCIqy1TM60U-zcpGlsFYILno-HBQY.original.fullsize.png
:alt: Decision tree diagram
:align: center
:width: 90%
```

## Criteria for Splitting

Decision trees use various metrics to decide how to split the data at each step:
- For classification tasks, commonly used metrics are Gini impurity, entropy (used to compute information gain), and classification error.
- For regression tasks, variance reduction is often used.
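
To make these metrics concrete, here is a small sketch using their standard definitions, where `p_k` is the fraction of samples in a node that belong to class `k`. The function names are illustrative and not taken from any particular library.

```python
import math
from collections import Counter

def class_probs(labels):
    """Fraction of samples belonging to each class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2
    return 1.0 - sum(p * p for p in class_probs(labels))

def entropy(labels):
    # Entropy in bits: -sum_k p_k * log2(p_k)
    return -sum(p * math.log2(p) for p in class_probs(labels))

def classification_error(labels):
    # Misclassification rate when predicting the majority class: 1 - max_k p_k
    return 1.0 - max(class_probs(labels))

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted

mixed = ["yes"] * 5 + ["no"] * 5   # evenly mixed node: maximally impure for two classes
pure = ["yes"] * 10                # pure node: every metric is zero
```

For the evenly mixed two-class node, Gini impurity and classification error are both 0.5 and entropy is 1 bit; all three are 0 for the pure node. A split is preferred when it lowers the size-weighted impurity of the children relative to the parent.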

## Example

Imagine you want to decide on the activity for your weekend. The decision could depend on multiple factors such as the weather and whether you have company. A decision tree for this scenario might look something like this:

- The root node starts with the question: "Is it raining?" 
    - If "Yes", the tree might direct you to a decision "Stay in and read".
    - If "No", it then asks, "Do you have company?" 
        - If "Yes", the decision might be "Go hiking".
        - If "No", the decision could be "Visit a cafe".

This example simplifies the decision tree concept. In real-world data science tasks, decision trees consider many more variables and outcomes, and the decisions are based on quantitative data from the features of the dataset.
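
The weekend example can be written down directly as a nested structure plus a short traversal. This is only a sketch of the hand-built tree above; the key names (`raining`, `has_company`) and the `decide` function are made up for illustration. A learned tree would instead test numeric or categorical feature values chosen by a splitting criterion.

```python
# Each dict is a decision node; each string is a leaf (the final decision).
weekend_tree = {
    "question": "raining",
    "yes": "Stay in and read",
    "no": {
        "question": "has_company",
        "yes": "Go hiking",
        "no": "Visit a cafe",
    },
}

def decide(tree, answers):
    """Follow yes/no answers from the root until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if answers[node["question"]] else node["no"]
    return node

print(decide(weekend_tree, {"raining": False, "has_company": True}))
```

Note that when it is raining, the second question is never asked: only the features along the path from the root to the leaf affect the result.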

## Advantages and Disadvantages

**Advantages:**
- Easy to understand and interpret.
- Requires little data preparation.
- Can handle both numerical and categorical data.
- Can handle multi-output problems.

**Disadvantages:**
- Prone to overfitting, especially with complex trees.
- Can be unstable because small variations in the data might result in a completely different tree being generated.
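- Decision boundaries are axis-aligned, because each split tests a single feature against a threshold. Diagonal or smooth boundaries can only be approximated with many small steps, which may not accurately represent the data's actual structure.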

To combat overfitting, techniques like pruning (reducing the size of the tree), setting a maximum depth for the tree, and ensemble methods like Random Forests are often used.
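
As a rough illustration of these techniques, the sketch below compares an unrestricted tree with a depth-limited tree, a cost-complexity-pruned tree, and a random forest. It assumes scikit-learn is installed; the synthetic dataset and the specific values (`max_depth=3`, `ccp_alpha=0.01`, `n_estimators=100`) are arbitrary choices for demonstration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y) so that overfitting is possible.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "unrestricted tree": DecisionTreeClassifier(random_state=0),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "cost-complexity pruned": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

A large gap between training and test accuracy is the usual sign of overfitting. In practice, settings such as `max_depth` and `ccp_alpha` are typically chosen by cross-validation rather than fixed in advance.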
