
# Day 12: Decision Trees for Regression and Classification

Today, we will dive into Decision Trees, one of the most interpretable and widely used models in machine learning. Decision trees are flexible tools that can handle both regression and classification tasks. Their structure resembles a flowchart, which makes their predictions easy to trace and explain.

Topics Covered:
- Introduction to Decision Trees
- How Decision Trees Work
- Key Concepts: Entropy, Gini Impurity, Information Gain
- Decision Trees for Classification
- Decision Trees for Regression
- Advantages and Disadvantages of Decision Trees
- Evaluation Metrics for Decision Trees

## 1. Introduction to Decision Trees

A Decision Tree is a model that uses a tree-like structure to make decisions based on the input data. At each node in the tree, the model asks a question, and depending on the answer, it moves to a subsequent node. This process continues until a final decision is reached (a leaf node), either for classification or regression.

### Key characteristics:

- Nodes: 
    - Where a decision is made.
- Edges: 
    - The result of a decision, leading to the next node.
- Leaves: 
    - The final decision or prediction at the end of a path.

Decision trees can be used for:

- Classification: Predicting a category or label (e.g., whether an email is spam or not).
- Regression: Predicting a continuous value (e.g., predicting house prices).


## 2. How Decision Trees Work

A decision tree is built by recursively splitting the data. At each node, the algorithm picks the feature (and, for numeric features, the threshold) whose split gives the highest information gain or the lowest Gini impurity in the resulting child nodes.
The goal is to create branches that divide the data into subsets where the target variable is as homogeneous as possible.
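The split selection described above can be sketched for a single numeric feature. This is a minimal illustration, not how any particular library implements it: the function names `gini` and `best_threshold`, the toy data, and the choice of midpoints between distinct values as candidate thresholds are all assumptions made for the example. It scores each candidate by the size-weighted Gini impurity of the two subsets it produces.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= threshold'.

    Hypothetical helper: candidates are midpoints between consecutive distinct values.
    """
    n = len(values)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each child's impurity by the share of samples it receives.
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up toy data: monthly charges and whether each customer churned.
monthly_charges = [20, 25, 30, 70, 80, 90]
churned = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(monthly_charges, churned)
print(threshold, impurity)
```

On this toy data the two classes are perfectly separable, so the best threshold leaves both subsets pure. A full tree would repeat the same search over every feature, then recurse into each subset.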

### Example: Predicting Customer Churn

Imagine you want to predict whether a customer will churn based on various factors like contract length, monthly charges, and tenure. The decision tree model will first identify the most important feature (e.g., monthly charges) to split the data. Based on the answer (e.g., whether monthly charges are above or below a threshold), the tree moves to the next node, asking another question (e.g., how long the contract is), and so on.
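To make the walk from root to leaf concrete, here is a small hand-built tree for the churn example. Everything specific in it is hypothetical: the feature names, the thresholds (70 for monthly charges, 12 months for contract length), and the leaf labels are invented for illustration rather than learned from data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # final decision at the end of a path


@dataclass
class Node:
    feature: str                  # which feature the node asks about
    threshold: float              # the question is "feature <= threshold?"
    left: Union["Node", Leaf]     # followed when the answer is yes
    right: Union["Node", Leaf]    # followed when the answer is no


def predict(tree, customer):
    """Follow the answers from the root until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        if customer[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical tree: thresholds and labels are made up, not fitted.
churn_tree = Node(
    feature="monthly_charges",
    threshold=70.0,
    left=Leaf("stays"),
    right=Node(
        feature="contract_months",
        threshold=12.0,
        left=Leaf("churns"),
        right=Leaf("stays"),
    ),
)

customer = {"monthly_charges": 85.0, "contract_months": 1, "tenure_months": 3}
print(predict(churn_tree, customer))
```

For this customer the monthly charges exceed 70, so the tree moves right and asks about contract length; the short contract then leads to the "churns" leaf.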


## 3. Key Concepts: Entropy, Gini Impurity, and Information Gain

### 3.1 Entropy

Entropy is a measure of disorder or impurity in the dataset. The higher the entropy, the more mixed the data is.

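Formula for Entropy:

$$ H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i) $$

Where:
- $ n $ is the number of classes.
- $ P(x_i) $ is the probability of class $ i $.
- Entropy is 0 if all the data belongs to one class (perfectly pure), and it increases as the data becomes more mixed. For two equally likely classes it reaches its maximum of 1 bit.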
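As a quick check of these properties, entropy can be computed directly from a list of class labels. This is a minimal sketch using base-2 logarithms; the function name `entropy` and the sample labels are chosen for illustration, and class probabilities are estimated as simple label frequencies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probabilities = [count / n for count in Counter(labels).values()]
    # p * log2(1 / p) equals -p * log2(p); this form avoids returning -0.0 for pure sets.
    return sum(p * math.log2(1 / p) for p in probabilities)


print(entropy(["spam", "spam", "spam", "spam"]))  # one class only
print(entropy(["spam", "spam", "ham", "ham"]))    # evenly mixed
print(entropy(["spam", "spam", "spam", "ham"]))   # somewhere in between
```

The pure list gives 0, the evenly mixed list gives 1, and the 3-to-1 list falls in between (about 0.81).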

### 3.2 Gini Impurity

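Gini Impurity is another impurity measure. It is the probability that a randomly chosen element from the set would be incorrectly classified if it were labeled at random according to the class distribution of the set. Lower values indicate purer nodes, so splits that produce lower Gini Impurity are preferred.

Formula for Gini Impurity:

$$ Gini = 1 - \sum_{i=1}^{n} P(x_i)^2 $$

Where:
- $ n $ is the number of classes.
- $ P(x_i) $ is the probability of class $ i $.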
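Gini impurity is commonly computed as 1 minus the sum of the squared class probabilities. The sketch below does that with label frequencies standing in for the probabilities; the function name `gini_impurity` and the sample labels are illustrative choices.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["spam", "spam", "spam", "spam"]))  # pure node
print(gini_impurity(["spam", "spam", "ham", "ham"]))    # evenly mixed, two classes
print(gini_impurity(["spam", "spam", "spam", "ham"]))   # 3-to-1 mix
```

A pure node scores 0, and an even two-class mix scores 0.5, which is the maximum for two classes.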

### 3.3 Information Gain

Information Gain tells us how much entropy is reduced after splitting the dataset on a particular feature. It helps the decision tree decide which feature to split on at each node.

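Formula for Information Gain:

$$ IG = H(\text{parent}) - \sum_{i} \frac{N_i}{N} H(\text{child}_i) $$

Where:
- $ H(\text{parent}) $ is the entropy of the parent node.
- $ H(\text{child}_i) $ is the entropy of the $ i $-th child node after the split.
- $ N_i $ is the number of samples in child $ i $, and $ N $ is the number of samples in the parent, so each child's entropy is weighted by its share of the data.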
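The calculation can be sketched as the parent's entropy minus the size-weighted average entropy of the children. The function names, the toy labels, and the two example splits below are assumptions made for illustration; weighting each child by its share of the parent's samples is the usual convention.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted_child_entropy = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted_child_entropy


# Made-up labels: 4 churners and 4 non-churners at the parent node.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes completely.
perfect_split = [["yes"] * 4, ["no"] * 4]
# A split that leaves each child as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The perfect split recovers all of the parent's entropy (a gain of 1 bit), while the useless split gains nothing, so a tree comparing the two would choose the first.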