# Decision Tree Classifier and Model Evaluation

This notebook provides detailed explanations and examples for decision tree classification and classification model evaluation metrics.

## Q1: Describe the Decision Tree Classifier Algorithm
A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It is a tree-like model where each internal node tests a feature, each branch represents an outcome of that test, and each leaf node represents a class label (or, in regression, a numeric value).

**How it works:**
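1. At the root, the algorithm evaluates candidate splits on the feature values and picks the one that scores best under a criterion such as Gini Impurity or Information Gain.
2. The chosen split divides the dataset into subsets, and the same procedure is applied recursively to each subset.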
3. The process stops when a stopping condition is met (e.g., maximum depth, minimum samples per leaf).
4. To make predictions, the input data is traversed down the tree until a leaf node is reached.
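
The steps above can be sketched with scikit-learn's `DecisionTreeClassifier`. This is a minimal illustration, not part of the original notebook: the Iris dataset, the 70/30 split, and the values `max_depth=3` and `min_samples_leaf=5` are all assumptions chosen only to show where the splitting criterion and the stopping conditions are set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (an assumption; any labeled tabular data would do)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# criterion picks the split-quality measure (steps 1-2);
# max_depth and min_samples_leaf are the stopping conditions (step 3)
clf = DecisionTreeClassifier(
    criterion="gini", max_depth=3, min_samples_leaf=5, random_state=0
)
clf.fit(X_train, y_train)

# Prediction routes each sample from the root down to a leaf (step 4)
y_hat = clf.predict(X_test)

depth = clf.get_depth()
n_leaves = clf.get_n_leaves()
test_accuracy = clf.score(X_test, y_test)
print(depth, n_leaves, test_accuracy)
```

Loosening the stopping conditions (for example, `max_depth=None`) lets the tree grow until its leaves are pure, which usually fits the training data more closely but can overfit.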

## Q2: Mathematical Intuition Behind Decision Tree Classification
A Decision Tree splits data based on measures like Gini Impurity or Entropy. 

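- **Entropy**: Measures disorder in a set of samples \(S\), where \(p_i\) is the proportion of samples in \(S\) that belong to class \(i\).
  \[ H(S) = - \sum_i p_i \log_2 p_i \]
  Information Gain is the reduction in entropy achieved by a split, with each child weighted by its share of the parent's samples:
  \[ IG = H(S_{parent}) - \sum_{child} \frac{|S_{child}|}{|S_{parent}|} H(S_{child}) \]

- **Gini Impurity**: Measures how often a randomly chosen sample would be mislabeled if it were labeled at random according to the class proportions in \(S\).
  \[ Gini(S) = 1 - \sum_i p_i^2 \]

At each node, a decision tree selects the split (feature and threshold) that maximizes Information Gain or, when Gini is the criterion, minimizes the weighted Gini Impurity of the resulting child nodes.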
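
To make the formulas concrete, here is a small NumPy sketch that computes entropy, Gini impurity, and information gain for one candidate split. The helper names (`class_probabilities`, `entropy`, `gini`, `information_gain`) and the label arrays are hypothetical, introduced only for this illustration.

```python
import numpy as np

def class_probabilities(labels):
    """Proportion p_i of each class present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    return counts / counts.sum()

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i); only classes that occur are included."""
    p = class_probabilities(labels)
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2."""
    p = class_probabilities(labels)
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Made-up labels: a perfectly mixed parent and one imperfect split
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])
left = np.array([1, 1, 1, 0])
right = np.array([1, 0, 0, 0])

print(entropy(parent))                          # 1.0 (maximally mixed, two classes)
print(gini(parent))                             # 0.5
print(information_gain(parent, [left, right]))  # about 0.189
```

A split that separated the two classes completely would give each child an entropy of 0, so the information gain would equal the parent's entropy of 1.0.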

## Q3: Using Decision Tree for Binary Classification
For a binary classification problem, a Decision Tree repeatedly splits the data into two groups until each subset is pure or meets a stopping criterion.

### Example: Classifying Emails as Spam or Not Spam
- Features: 'Contains word FREE?', 'Has attachment?', etc.
- The tree splits based on these features to classify an email as spam or not spam.
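
A toy version of this spam example might look like the sketch below. The eight emails, their feature values, and the column names `contains_free` and `has_attachment` are invented for illustration; they are not real data and not part of the original notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical binary features: [contains_free, has_attachment]
feature_names = ["contains_free", "has_attachment"]
X = np.array([
    [1, 0],
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 0],
    [0, 0],
    [0, 1],
    [1, 1],
])
# Made-up labels: 1 = spam, 0 = not spam
y = np.array([1, 1, 1, 0, 0, 0, 0, 1])

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Print the learned rules as text
print(export_text(clf, feature_names=feature_names))

# Classify a new email that contains "FREE" and has no attachment
new_email = np.array([[1, 0]])
prediction = clf.predict(new_email)[0]
print(prediction)
```

In this made-up data the `contains_free` feature alone separates the classes, so the tree needs only one split and both child nodes are pure.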

## Q4: Geometric Intuition of Decision Trees
A Decision Tree divides the feature space into rectangular regions.
Each split tests a single feature against a threshold, so it creates a decision boundary perpendicular to that feature's axis.

For example, a 2D dataset with two features (X, Y) will have axis-aligned decision boundaries, partitioning the space into different regions corresponding to class labels.
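
One way to see the axis-aligned boundaries is to list the splits a fitted tree has learned. The sketch below uses synthetic 2D data (an assumption: points drawn uniformly from the unit square, labeled positive inside the rectangle X > 0.5 and Y > 0.3) and reads the split feature and threshold of every internal node from scikit-learn's `tree_` attribute.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: class 1 occupies the rectangle X > 0.5 and Y > 0.3
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.3)).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Collect (axis, threshold) for every internal node.
# Leaves are the nodes whose left and right child ids are equal.
tree = clf.tree_
splits = []
for node in range(tree.node_count):
    if tree.children_left[node] != tree.children_right[node]:
        axis = "X" if tree.feature[node] == 0 else "Y"
        splits.append((axis, float(tree.threshold[node])))

for axis, threshold in splits:
    print(f"{axis} <= {threshold:.3f}")
```

Every printed rule involves exactly one feature, which is why the regions are rectangles; a diagonal class boundary would have to be approximated by a staircase of such splits.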

## Q5: Confusion Matrix and Model Evaluation
A **confusion matrix** is a table that compares a classification model's predictions with the actual labels. For binary classification it has four cells:

| Actual \ Predicted | Positive | Negative |
|--------------------|----------|----------|
| **Positive**      | TP       | FN       |
| **Negative**      | FP       | TN       |

- **True Positive (TP)**: Positive samples correctly predicted as positive.
- **False Negative (FN)**: Positive samples incorrectly predicted as negative.
- **False Positive (FP)**: Negative samples incorrectly predicted as positive.
- **True Negative (TN)**: Negative samples correctly predicted as negative.

In [None]:
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Example: True labels and predicted labels
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

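# Compute the confusion matrix; for labels [0, 1] scikit-learn returns
# [[TN, FP], [FN, TP]] (rows = actual, columns = predicted), so the
# positive class sits in the second row/column, unlike the table above
cm = confusion_matrix(y_true, y_pred)

# Precision, recall, and F1 for the positive class (label 1)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Display the results (last expression of the notebook cell)
cm, precision, recall, f1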
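
As a cross-check, the same metrics can be derived by hand from the four counts, using the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below reuses the labels from the cell above and unpacks the counts with `ravel()`, which for binary labels yields them in the order TN, FP, FN, TP.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Same labels as in the cell above
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Flatten the 2x2 matrix [[TN, FP], [FN, TP]] into its four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)                          # 4 / 5 = 0.8
recall = tp / (tp + fn)                             # 4 / 5 = 0.8
f1 = 2 * precision * recall / (precision + recall)  # 0.8
accuracy = (tp + tn) / (tp + tn + fp + fn)          # 8 / 10 = 0.8

print(tn, fp, fn, tp)
print(precision, recall, f1, accuracy)
```

All four metrics happen to coincide here because the example has one false positive, one false negative, and equally many true positives and true negatives.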


## Q7: Importance of Choosing the Right Evaluation Metric
Choosing the correct metric depends on the problem:
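- **Accuracy**: Good for balanced datasets where both kinds of error cost about the same.
- **Precision**: Important when False Positives must be minimized (e.g., spam detection).
- **Recall**: Important when False Negatives must be minimized (e.g., cancer detection).
- **F1-score**: The harmonic mean of precision and recall; useful when the dataset is imbalanced and both False Positives and False Negatives matter.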
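
A short sketch shows why accuracy alone can mislead on imbalanced data. The labels are made up for illustration: 5 positives among 100 samples, scored against a trivial "model" that always predicts the negative class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up imbalanced labels: 5 positives, 95 negatives
y_true = np.array([1] * 5 + [0] * 95)

# A trivial baseline that predicts "negative" for every sample
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)

# zero_division=0 makes undefined ratios (no predicted positives) evaluate to 0
precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(accuracy)               # 0.95
print(precision, recall, f1)  # 0.0 0.0 0.0
```

The baseline reaches 95% accuracy without finding a single positive case, which recall and F1 expose immediately.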

## Q8: Example Where Precision is Important
**Example: Email Spam Detection**

- If an email is mistakenly classified as spam (False Positive), an important email may be lost.
- High precision means that few of the emails flagged as spam are actually legitimate, so fewer important emails are lost.

## Q9: Example Where Recall is Important
**Example: Medical Diagnosis for Cancer**

- Missing a positive case (False Negative) means a person with cancer is not diagnosed.
- High recall ensures that fewer actual positive cases are missed, even at the cost of some false positives.
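
The trade-off between the precision-focused case in Q8 and this recall-focused case can be illustrated by moving the decision threshold of a classifier that outputs scores. The scores and labels below are hypothetical, as is the helper `metrics_at`; in practice the scores might come from something like `predict_proba`.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and model scores (higher = more likely positive)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95])

def metrics_at(threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_pred = (scores >= threshold).astype(int)
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred)
    return precision, recall

for threshold in (0.3, 0.5, 0.8):
    precision, recall = metrics_at(threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
# threshold=0.3: precision=0.71, recall=1.00
# threshold=0.5: precision=0.80, recall=0.80
# threshold=0.8: precision=1.00, recall=0.40
```

A low threshold suits a screening task such as cancer detection, where missing a positive is costly, while a high threshold suits a task such as spam filtering, where a false alarm is costly.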