## Decision Tree Algorithms

Implementations: https://medium.com/@ldeassis/decision-tree-algorithms-a-comprehensive-guide-8a15a5ddc318#9d83

Algorithms: https://www.geeksforgeeks.org/machine-learning/decision-tree-algorithms/

#### Decision Tree Types
- **Classification Trees**: Used for predicting categorical outcomes like spam or not spam. These trees split the data based on features to classify data into predefined categories.
- **Regression Trees**: Used for predicting continuous outcomes such as house prices. Instead of assigning categories, they produce numerical predictions based on the input features.
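
The difference between the two tree types shows up most clearly at the leaves. The sketch below assumes the common convention that a classification leaf predicts the majority class of its training samples and a regression leaf predicts their mean; the function names and sample values are hypothetical and only for illustration.

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Predict the most frequent class among the training labels in a leaf."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(targets):
    """Predict the mean of the training targets in a leaf."""
    return mean(targets)

# Hypothetical samples that ended up in one leaf of each kind of tree.
spam_leaf = ["spam", "spam", "not spam"]
price_leaf = [200_000, 220_000, 240_000]

print(classification_leaf(spam_leaf))  # spam
print(regression_leaf(price_leaf))     # 220000
```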

#### Splitting Criteria
In a Decision Tree, each node needs a splitting criterion to decide how to divide the data. The criterion scores candidate splits so the tree can pick the best feature to split on. Common splitting criteria include Gini Impurity and Entropy.
- **Gini Impurity**: This criterion measures how "impure" a node is, that is, how mixed its class labels are. The lower the Gini Impurity after a split, the better the feature separates the data into distinct categories.
- **Entropy**: This measures the amount of uncertainty or disorder in the data. The tree tries to reduce the entropy by splitting the data on features that provide the most information about the target variable.
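
For a node with class proportions $p_k$, the standard definitions are Gini $= 1 - \sum_k p_k^2$ and Entropy $= -\sum_k p_k \log_2 p_k$. The following is a minimal sketch of both, plus the size-weighted impurity of a two-way split that a tree would try to minimize. The helper names and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_impurity(left, right, impurity):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Toy node with a 50/50 class mix, and one candidate split of it.
parent = ["spam"] * 4 + ["ham"] * 4
left = ["spam"] * 3 + ["ham"]
right = ["spam"] + ["ham"] * 3

print(gini(parent), entropy(parent))         # 0.5 1.0
print(weighted_impurity(left, right, gini))  # 0.375
```

The split lowers the weighted Gini impurity from 0.5 to 0.375, so it is an improvement over leaving the node unsplit.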

**Gini vs. Entropy**: Why use Gini instead of Entropy?
- **Gini**: Easier to calculate (no logarithms involved). It is the default in libraries like scikit-learn.
- **Entropy**: Uses $\log_2$. It is slightly more computationally expensive but can be more sensitive to small changes in probabilities.
- **In Practice**: They usually yield very similar trees; as a rough rule of thumb, the results match about 95% of the time. Choosing one over the other rarely changes the model's performance significantly.
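
To illustrate why the two criteria tend to agree, the sketch below scores three hypothetical candidate splits of the same 100-sample binary node with both measures. The feature names and the class counts are made up for the example; with these numbers both criteria rank the splits in the same order, which is the typical situation but not a guarantee.

```python
from math import log2

def gini(p):
    """Gini impurity of a binary node whose positive-class proportion is p."""
    return 2 * p * (1 - p)

def entropy(p):
    """Entropy (bits) of a binary node whose positive-class proportion is p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def split_score(children, impurity):
    """Size-weighted impurity; children are (n_positive, n_negative) pairs."""
    total = sum(pos + neg for pos, neg in children)
    return sum((pos + neg) / total * impurity(pos / (pos + neg))
               for pos, neg in children)

# Hypothetical splits of a node holding 50 positive and 50 negative samples.
candidates = {
    "feature_a": [(45, 5), (5, 45)],
    "feature_b": [(24, 6), (26, 44)],
    "feature_c": [(30, 30), (20, 20)],
}

rank_gini = sorted(candidates, key=lambda k: split_score(candidates[k], gini))
rank_entropy = sorted(candidates, key=lambda k: split_score(candidates[k], entropy))

print(rank_gini)     # ['feature_a', 'feature_b', 'feature_c']
print(rank_entropy)  # ['feature_a', 'feature_b', 'feature_c']
```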

#### Advantages of Decision Trees
- **Easy to Understand**: Decision Trees are visual which makes it easy to follow the decision-making process.
- **Versatility**: Can be used for both classification and regression problems.
- **No Need for Feature Scaling**: Unlike many machine learning models, Decision Trees don't require us to scale or normalize our data.
- **Handles Non-linear Relationships**: They capture complex, non-linear relationships between features and outcomes effectively.
- **Interpretability**: The tree structure is easy to interpret, which lets users understand the reasoning behind each decision.
- **Handles Missing Data**: It can handle missing values by using strategies like assigning the most common value or ignoring missing data during splits.
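
One of the missing-data strategies mentioned above, assigning the most common value, can be sketched as a preprocessing step. This is a minimal illustration, assuming missing entries are marked with `None`; the function name and the sample column are hypothetical, and whether a given library handles missing values natively depends on the implementation.

```python
from collections import Counter

def impute_most_common(values, missing=None):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not missing]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is missing else v for v in values]

# Hypothetical categorical column with two missing entries.
employment = ["employed", None, "employed", "self-employed", None]

print(impute_most_common(employment))
# ['employed', 'employed', 'employed', 'self-employed', 'employed']
```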

#### Disadvantages of Decision Trees
- **Overfitting**: They can overfit the training data if they are too deep which means they memorize the data instead of learning general patterns. This leads to poor performance on unseen data.
- **Instability**: It can be unstable which means that small changes in the data may lead to significant differences in the tree structure and predictions.
- **Bias towards Features with Many Categories**: They can become biased toward features with many distinct values, focusing too much on them and potentially missing other important features, which can reduce prediction accuracy.
- **Difficulty in Capturing Complex Interactions**: Decision Trees may struggle to capture some complex interactions between features, which makes them less effective for certain types of data.
- **Computationally Expensive for Large Datasets**: For large datasets, building and pruning a Decision Tree can be computationally intensive, especially as the tree depth increases.
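
The bias toward features with many distinct values is easiest to see with a multiway split scored by information gain (entropy reduction). In the sketch below, a hypothetical `customer_id` column that is unique per row gets the maximum possible gain even though it cannot generalize, while a genuinely informative feature scores lower. All names and values are invented for the illustration, and the effect is strongest for multiway categorical splits of this kind.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction from a multiway split on a categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no", "yes", "no"]
customer_id = [101, 102, 103, 104, 105, 106]  # unique per row, no predictive value
has_income = ["y", "y", "n", "n", "y", "y"]   # actually related to the label

print(information_gain(customer_id, labels))  # 1.0
print(information_gain(has_income, labels))   # lower, despite being more useful
```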

#### Applications
- **Loan Approval in Banking**: Banks use Decision Trees to assess whether a loan application should be approved. The decision is based on factors like credit score, income, employment status and loan history. Predicting approval or rejection this way enables quick and reliable decisions.
- **Medical Diagnosis**: In healthcare they assist in diagnosing diseases. For example, they can predict whether a patient has diabetes based on clinical data like glucose levels, BMI and blood pressure. This helps classify patients into diabetic or non-diabetic categories, supporting early diagnosis and treatment.
- **Predicting Exam Results in Education**: Educational institutions use them to predict whether a student will pass or fail based on factors like attendance, study time and past grades. This helps teachers identify at-risk students and offer targeted support.

### Pruning
Pruning is the technique of removing branches that have little importance, which helps to prevent overfitting. It can be done as pre-pruning (stopping the tree growth early) or post-pruning (removing branches from a fully grown tree).
- Pre-pruning: Stops the tree growth early based on criteria like maximum depth or minimum number of samples required to split a node.
- Post-pruning: Allows the tree to grow fully and then removes nodes that do not provide substantial predictive power.
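
Pre-pruning can be sketched with a small tree builder on a single numeric feature. The parameter names `max_depth` and `min_samples_split` are borrowed from scikit-learn, which exposes stopping criteria under those names (and cost-complexity post-pruning via `ccp_alpha`); the builder itself is a simplified illustration with made-up data, not any library's implementation, and it does not cover post-pruning.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best binary split, or None."""
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the max would empty one side
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def build(xs, ys, depth=0, max_depth=None, min_samples_split=2):
    """Grow a tree, returning a leaf early when a pre-pruning limit is hit."""
    leaf = {"predict": Counter(ys).most_common(1)[0][0]}
    if gini(ys) == 0.0:
        return leaf  # node is already pure
    if max_depth is not None and depth >= max_depth:
        return leaf  # pre-pruning: depth limit reached
    if len(ys) < min_samples_split:
        return leaf  # pre-pruning: too few samples to split
    split = best_split(xs, ys)
    if split is None:
        return leaf  # all feature values identical, nothing to split on
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build(*zip(*left), depth + 1, max_depth, min_samples_split),
        "right": build(*zip(*right), depth + 1, max_depth, min_samples_split),
    }

def count_leaves(node):
    """Number of leaves in a tree produced by build()."""
    if "predict" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Hypothetical noisy data: no single threshold separates the classes.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "b", "a", "b", "b", "a", "b"]

full = build(xs, ys)
shallow = build(xs, ys, max_depth=1)
print(count_leaves(full), count_leaves(shallow))
```

The unrestricted tree keeps splitting until every leaf is pure, effectively memorizing the noisy points, while the depth-limited tree stops after a single split.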