# Module67 Decision Tree Assignment1

Q1. Describe the decision tree classifier algorithm and how it works to make predictions.

A1. A decision tree classifier is a supervised machine learning algorithm for classification tasks (the same tree-building idea also handles regression, where leaves hold numeric values instead of class labels). It splits the data into subsets based on feature values, forming a tree-like structure where:

Internal nodes represent decision points on features.

Branches represent outcomes of decisions.

Leaf nodes represent class labels.

### How It Works:

**1.) Split Selection:** At each node, the algorithm selects the feature and threshold that best split the data into subsets, minimizing impurity.

**2.) Impurity Metrics:**

a.) Gini Index

$$G = 1 - \sum_{i=1}^{C} p_i^2$$

b.) Entropy

$$H = -\sum_{i=1}^{C} p_i \log_2 p_i$$

Here $p_i$ is the proportion of samples at the node that belong to class $i$, and $C$ is the number of classes.

**3.) Recursive Splitting:** Splits continue until stopping criteria are met (e.g., max depth or min samples per leaf).

**4.) Prediction:** A new data point traverses the tree based on feature values, and the label of the leaf node is returned.
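The two impurity metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper names (`class_proportions`, `gini`, `entropy`) and the toy label lists are assumptions made for the example.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the fraction of samples belonging to each class present."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: sum of -p * log2(p) over the classes present."""
    return sum(-p * log2(p) for p in class_proportions(labels))


# Hypothetical toy nodes: one pure, one evenly mixed.
pure = ["spam", "spam", "spam", "spam"]
mixed = ["spam", "spam", "ham", "ham"]

print(gini(pure), entropy(pure))    # 0.0 0.0
print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

A pure node scores zero on both metrics, while an evenly mixed two-class node reaches the maximum for two classes (0.5 for Gini, 1 bit for entropy).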

Q2. Provide a step-by-step explanation of the mathematical intuition behind decision tree classification.

A2. The steps are as follows:

**1.) Start at the Root:**

Consider the entire dataset as the root node.

**2.) Split the Data:**

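For each feature and candidate threshold, calculate the impurity (Gini or Entropy) before and after splitting, and measure the reduction:

$$IG = H(\text{parent}) - \sum_{k=1}^{n} \frac{N_k}{N} H_k$$

$H(\text{parent})$: Impurity of the parent node.

$H_k$: Impurity of child $k$.

$N_k / N$: Fraction of the parent's $N$ samples that fall into child $k$, used as the weight of that child.

$n$: Number of child nodes produced by the split.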

**3.) Choose the Best Split:**

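Select the feature and threshold that maximize Information Gain, which is the same as minimizing the weighted impurity of the child nodes (whether impurity is measured by Entropy or by the Gini Index).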


**4.) Repeat for Subsets:**

Recursively apply the process to each child node until a stopping criterion is met.


**5.) Assign Labels:**

At leaf nodes, assign the majority class of the training samples that reach the leaf (a regression tree would assign their average value instead).
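Steps 2 and 3 above can be sketched for a single numeric feature as follows. This is an illustrative sketch under assumptions: entropy is used as the impurity, candidate thresholds are taken midway between adjacent distinct values, and the function names and toy data are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """IG = H(parent) - sum over children of (N_k / N) * H_k."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try a threshold midway between each pair of adjacent distinct values."""
    best_gain, best_t = 0.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain


# Hypothetical toy data: one numeric feature and a binary label.
values = [1, 2, 3, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
print(best_threshold(values, labels))  # (6.5, 1.0)
```

The threshold 6.5 separates the two classes perfectly, so both children have zero entropy and the gain equals the parent's full 1 bit of entropy. A full tree builder would repeat this search over every feature and then recurse into each child.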

Q3. Explain how a decision tree classifier can be used to solve a binary classification problem.

A3. Example Problem: Classify emails as "Spam" or "Not Spam" based on features like word frequency or sender reputation.

**Steps:**

1.) Start with the root node containing all emails.

2.) Evaluate splits for each feature (e.g., word_frequency > 50).

3.) Select the feature and threshold that best split the data.

4.) Recursively split the subsets until stopping criteria are met.

5.) Classify emails by traversing the tree to leaf nodes.
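The final step, classifying an email by walking from the root to a leaf, might look like the sketch below. The tree here is hand-written rather than learned, and the feature names, thresholds, and the `classify` helper are all hypothetical choices for illustration.

```python
# A hand-written tree; a real one would be learned from labeled emails.
# Feature names and thresholds are hypothetical.
TREE = {
    "feature": "word_frequency",
    "threshold": 50,
    "left": {"label": "Not Spam"},       # word_frequency <= 50
    "right": {                           # word_frequency > 50
        "feature": "sender_reputation",
        "threshold": 0.5,
        "left": {"label": "Spam"},       # sender_reputation <= 0.5
        "right": {"label": "Not Spam"},  # sender_reputation > 0.5
    },
}


def classify(email, node=TREE):
    """Walk from the root to a leaf, going right when feature > threshold."""
    while "label" not in node:
        branch = "right" if email[node["feature"]] > node["threshold"] else "left"
        node = node[branch]
    return node["label"]


print(classify({"word_frequency": 80, "sender_reputation": 0.1}))  # Spam
print(classify({"word_frequency": 80, "sender_reputation": 0.9}))  # Not Spam
print(classify({"word_frequency": 10, "sender_reputation": 0.1}))  # Not Spam
```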

Q4. Discuss the geometric intuition behind decision tree classification and how it can be used to make
predictions.

A4. Geometric Intuition:

1.) Decision trees partition the feature space into axis-aligned regions.

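2.) Each split divides the space with a boundary perpendicular to one feature axis (a vertical or horizontal line in 2D, a hyperplane in higher dimensions), corresponding to a feature threshold.

3.) The process continues until each region (leaf node) contains a single class or a stopping criterion is met; each region is then assigned one class.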

**Example:**

In a 2D feature space with Feature_1 and Feature_2, splits like Feature_1 > 5 and Feature_2 <= 3 create rectangles. Each rectangle is associated with a class label.
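To make the rectangle picture concrete, the sketch below stores each leaf as an explicit box and predicts by finding the box that contains a point. It assumes the tree first splits on Feature_1 > 5 and then splits the right-hand side on Feature_2 <= 3; that ordering and the class labels are assumptions, since the example above does not fix them.

```python
INF = float("inf")

# Each region: (Feature_1 interval, Feature_2 interval, class label).
# Intervals are half-open: low < value <= high.
# The class labels are hypothetical.
REGIONS = [
    ((-INF, 5), (-INF, INF), "Class A"),  # Feature_1 <= 5
    ((5, INF), (-INF, 3), "Class B"),     # Feature_1 > 5 and Feature_2 <= 3
    ((5, INF), (3, INF), "Class A"),      # Feature_1 > 5 and Feature_2 > 3
]


def predict(f1, f2):
    """Return the label of the rectangle that contains the point (f1, f2)."""
    for (lo1, hi1), (lo2, hi2), label in REGIONS:
        if lo1 < f1 <= hi1 and lo2 < f2 <= hi2:
            return label
    raise ValueError("point lies outside every region")


print(predict(2, 7))  # Class A
print(predict(8, 1))  # Class B
print(predict(8, 4))  # Class A
```

Because the regions do not overlap and together cover the plane, looking up the containing rectangle gives the same answer as traversing the tree.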

Q5. Define the confusion matrix and describe how it can be used to evaluate the performance of a
classification model.

A5. **Definition:**

A confusion matrix summarizes the performance of a classification model by comparing predicted vs. actual values.

**Structure:**

matrix[Actual Positive][Predicted Positive] = TP

matrix[Actual Positive][Predicted Negative] = FN

matrix[Actual Negative][Predicted Positive] = FP

matrix[Actual Negative][Predicted Negative] = TN

where,

TP = True Positive

FN = False Negative

FP = False Positive

TN = True Negative

**Usage:**

Compute metrics such as accuracy, precision, recall, and F1-score from the four counts.

Analyze error patterns (e.g., a high FN or FP count).
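As a rough sketch of how the four cells are filled, the function below counts them from paired actual and predicted labels, using the same row and column order as the structure above. The function name, the `positive` parameter, and the sample labels are assumptions for illustration.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return [[TP, FN], [FP, TN]]: rows are actual, columns are predicted."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return [[tp, fn], [fp, tn]]


# Hypothetical labels: 1 is the positive class, 0 the negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_matrix(actual, predicted))  # [[3, 1], [1, 3]]
```

Libraries may order the cells differently. For example, scikit-learn's `confusion_matrix` sorts the labels, so for 0/1 labels it returns `[[TN, FP], [FN, TP]]`; it is worth checking the layout before reading off counts.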




Q6. Provide an example of a confusion matrix and explain how precision, recall, and F1 score can be
calculated from it.

A6. Example Confusion Matrix:

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | 50 (TP)            | 10 (FN)            |
| Actual Negative | 5 (FP)             | 35 (TN)            |

Calculations:

1.) Precision = TP / (TP + FP) = 50 / (50 + 5) ≈ 0.91

2.) Recall = TP / (TP + FN) = 50 / (50 + 10) ≈ 0.83

3.) F1-score = 2 * Precision * Recall / (Precision + Recall) = 2 * 0.91 * 0.83 / (0.91 + 0.83) ≈ 0.87
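The same arithmetic can be checked with a short script. This is a minimal sketch: the function name is an assumption, and it does not guard against division by zero when there are no predicted or actual positives.

```python
def precision_recall_f1(tp, fn, fp):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the example matrix: TP = 50, FN = 10, FP = 5.
precision, recall, f1 = precision_recall_f1(tp=50, fn=10, fp=5)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # 0.91 0.83 0.87
```

TN does not appear in any of the three formulas, which is one reason these metrics are preferred over accuracy when negatives dominate the data.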



Q7. Discuss the importance of choosing an appropriate evaluation metric for a classification problem and
explain how this can be done.

A7. **Importance:**

Different problems prioritize different types of errors.

Using an inappropriate metric (e.g., accuracy on imbalanced data) can give a misleading picture of model quality.

**How to Choose:**

1.) Domain Knowledge: Understand the cost of errors (FP vs. FN).

2.) Imbalanced Data: Use metrics like F1-score, Precision-Recall AUC.

3.) Objective: Focus on metrics aligned with business goals.
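A small example shows why accuracy can mislead on imbalanced data. The dataset and the always-negative "model" below are hypothetical, chosen only to make the effect visible.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
actual = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
predicted = [0] * 100

tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
correct = sum(a == p for a, p in zip(actual, predicted))

accuracy = correct / len(actual)
recall = tp / (tp + fn)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The model scores 95% accuracy while finding none of the positive cases, which recall exposes immediately.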


Q8. Provide an example of a classification problem where precision is the most important metric, and
explain why.

A8. **Example:** Spam Email Detection.

**Reason:** Minimizing false positives (important emails marked as spam) is crucial. High precision means that most emails flagged as spam really are spam, so few important emails are incorrectly flagged.

Q9. Provide an example of a classification problem where recall is the most important metric, and explain
why.

A9. **Example:** Disease Diagnosis.

**Reason:** Minimizing false negatives (undiagnosed patients) is critical. High recall ensures most patients with the disease are correctly identified, even if some false positives occur.