# Q1: Describe the decision tree classifier algorithm and how it works to make predictions

# The Decision Tree classifier is a supervised learning algorithm that recursively splits the data
# into subsets based on feature values, creating a tree-like structure of decisions.
# At each node, it chooses the feature and threshold that best separate the classes,
# scoring candidate splits with a metric such as Gini Impurity or Entropy. Splitting is repeated
# until a stopping criterion is met (e.g., max depth, minimum samples per leaf).
# To predict, a new sample is routed from the root to a leaf by answering each node's test,
# and the leaf's majority class is returned.

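from sklearn.tree import DecisionTreeClassifier

# X_train, y_train, and X_test are assumed to come from an earlier train/test split.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # learn the splits from the training data
predictions = model.predict(X_test)  # route each test sample to a leaf and read off its class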
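
# A minimal sketch of how a fitted tree turns a sample into a prediction, on a tiny hand-made
# dataset. demo_X, demo_y, demo_tree, and walk_tree are illustrative names, not part of the
# original notebook. The sketch assumes scikit-learn's fitted-tree arrays (tree_.feature,
# tree_.threshold, tree_.children_left, tree_.children_right), where -1 marks a leaf.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

demo_X = np.array([[1.0, 5.0], [2.0, 4.0], [3.0, 7.0],
                   [6.0, 2.0], [7.0, 3.0], [8.0, 6.0]])
demo_y = np.array([0, 0, 0, 1, 1, 1])
demo_tree = DecisionTreeClassifier(random_state=0).fit(demo_X, demo_y)

def walk_tree(clf, sample):
    """Route one sample from the root to a leaf and return that leaf's majority class."""
    fitted = clf.tree_
    node = 0  # the root is always node 0
    while fitted.children_left[node] != -1:  # -1 means "no child", i.e. a leaf
        if sample[fitted.feature[node]] <= fitted.threshold[node]:
            node = fitted.children_left[node]   # test passed: go left
        else:
            node = fitted.children_right[node]  # test failed: go right
    return clf.classes_[np.argmax(fitted.value[node][0])]

manual_predictions = np.array([walk_tree(demo_tree, row) for row in demo_X])
print(manual_predictions)         # manual traversal
print(demo_tree.predict(demo_X))  # should agree with the manual traversal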

# Q2: Step-by-step explanation of the mathematical intuition behind decision tree classification

# 1. **Split the dataset**: At each node, the dataset is split based on a feature value.
# 2. **Measure the quality of a split**: The algorithm uses a metric like Gini Impurity or Entropy.
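#    - Gini Impurity: The probability that a randomly chosen element would be mislabeled if it were
#      labeled at random according to the node's class distribution: Gini = 1 - sum_k(p_k^2).
#    - Entropy: A measure of the disorder or impurity in the node: Entropy = -sum_k(p_k * log2(p_k)).
#      Here p_k is the fraction of the node's samples that belong to class k.
# 3. **Choose the best split**: The feature and threshold that minimize the weighted impurity of the child nodes (equivalently, maximize information gain) are chosen.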
# 4. **Repeat recursively**: The process is repeated until a stopping condition is met, such as maximum depth or pure nodes.
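
# The steps above can be sketched in a few lines of NumPy. This is a simplified illustration for a
# single numeric feature, not scikit-learn's actual implementation; gini_impurity, entropy,
# weighted_impurity, and best_threshold are hypothetical helper names introduced here.
import numpy as np

def gini_impurity(labels):
    """Gini = 1 - sum_k(p_k^2), where p_k is the fraction of samples in class k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy = -sum_k(p_k * log2(p_k)); only classes that are present contribute."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def weighted_impurity(left, right, impurity_fn):
    """Impurity of a split: child impurities weighted by the share of samples in each child."""
    n = len(left) + len(right)
    return len(left) / n * impurity_fn(left) + len(right) / n * impurity_fn(right)

def best_threshold(feature_values, labels, impurity_fn=gini_impurity):
    """Try midpoints between consecutive distinct values and keep the lowest-impurity split."""
    distinct = np.unique(feature_values)  # sorted distinct values
    candidates = (distinct[:-1] + distinct[1:]) / 2.0
    best = None
    for threshold in candidates:
        left = labels[feature_values <= threshold]
        right = labels[feature_values > threshold]
        score = weighted_impurity(left, right, impurity_fn)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best  # None if the feature has a single distinct value

split_feature = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
split_labels = np.array([0, 0, 0, 1, 1, 1])
split_threshold, split_impurity = best_threshold(split_feature, split_labels)
print(f"Parent Gini: {gini_impurity(split_labels):.3f}, parent entropy: {entropy(split_labels):.3f}")
print(f"Best threshold: {split_threshold}, weighted Gini after split: {split_impurity:.3f}")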

# Q3: Explain how a decision tree classifier can be used to solve a binary classification problem

# In binary classification, the target has two classes, and the decision tree recursively splits the data based on feature values.
# At each node, the algorithm will choose the feature that best separates the classes (e.g., by minimizing Gini Impurity).
# The tree continues splitting until a stopping criterion is met; each resulting leaf predicts the majority class of its training samples.
# Example: If we are classifying whether an email is spam or not, the tree will split the data based on features like
# the presence of specific words, and each leaf will contain the label "spam" or "not spam."
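
# A hedged sketch of the spam example on made-up data. The two binary features
# ("contains_free", "contains_meeting") and all toy_* names are hypothetical and chosen only
# for illustration; a real spam filter would use many more features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

toy_feature_names = ["contains_free", "contains_meeting"]
toy_emails = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]])
toy_labels = np.array([1, 1, 0, 0, 0, 0])  # 1 = spam, 0 = not spam

toy_spam_tree = DecisionTreeClassifier(criterion="gini", random_state=0)
toy_spam_tree.fit(toy_emails, toy_labels)

# Print the learned if/else rules; each leaf ends in a class label.
print(export_text(toy_spam_tree, feature_names=toy_feature_names))

# Classify two unseen emails: one mentioning "free" only, one mentioning "meeting" only.
toy_new_emails = np.array([[1, 0], [0, 1]])
toy_new_predictions = toy_spam_tree.predict(toy_new_emails)
print(toy_new_predictions)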

# Q4: Geometric intuition behind decision tree classification

# Geometrically, decision trees partition the feature space into axis-aligned rectangular regions.
# Each split tests a single feature against a threshold, so it corresponds to a hyperplane
# perpendicular to that feature's axis that divides the space into two parts.
# As the tree grows deeper, the feature space becomes more finely divided, creating regions in which
# (ideally) most points belong to one class. This way, decision trees create step-like decision boundaries.

# Example:
# For a two-feature dataset, the decision boundaries can be visualized as vertical and horizontal lines
# that separate the data points based on the class.
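
# A small sketch of the geometric picture, assuming synthetic two-feature data where class 1
# occupies the upper-right quadrant of the unit square. The region_* names are illustrative.
# Each internal node of the fitted tree tests one feature against one threshold, so every
# boundary is a vertical line (feature 0, plotted on the x-axis) or a horizontal line (feature 1).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

region_rng = np.random.default_rng(0)
region_X = region_rng.uniform(0.0, 1.0, size=(200, 2))
region_y = ((region_X[:, 0] > 0.5) & (region_X[:, 1] > 0.5)).astype(int)

region_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(region_X, region_y)
region_nodes = region_tree.tree_

# Collect (feature index, threshold) for every internal node; leaves have children_left == -1.
region_splits = [
    (int(region_nodes.feature[i]), float(region_nodes.threshold[i]))
    for i in range(region_nodes.node_count)
    if region_nodes.children_left[i] != -1
]
for feature_index, threshold in region_splits:
    orientation = "vertical" if feature_index == 0 else "horizontal"
    print(f"x{feature_index} <= {threshold:.3f}  ({orientation} boundary)")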

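import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(10, 7))
# feature_names is passed as a plain list (X_train is assumed to be a pandas DataFrame);
# class_names must follow the sorted order of model.classes_ (assumed here: 0 = not spam, 1 = spam).
plot_tree(model, filled=True, feature_names=list(X_train.columns), class_names=["Not Spam", "Spam"])
plt.show()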

# Q5: Define the confusion matrix and describe how it can be used to evaluate the performance of a classification model

# A confusion matrix is a table that is used to evaluate the performance of a classification model.
# It shows the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts.
# These values help calculate various performance metrics like accuracy, precision, recall, and F1 score.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, predictions)
print(cm)
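
# A minimal sketch of reading the four counts out of a binary confusion matrix, using
# hand-written labels (cm_true, cm_pred, and the *_count names are illustrative).
# scikit-learn puts actual classes on the rows and predicted classes on the columns,
# so with labels=[0, 1] the flattened order is TN, FP, FN, TP.
from sklearn.metrics import confusion_matrix

cm_true = [1, 1, 1, 0, 0, 0, 1, 0]
cm_pred = [1, 0, 1, 0, 1, 0, 1, 0]
cm_demo = confusion_matrix(cm_true, cm_pred, labels=[0, 1])
tn_count, fp_count, fn_count, tp_count = cm_demo.ravel()

cm_accuracy = (tp_count + tn_count) / (tp_count + tn_count + fp_count + fn_count)
print(cm_demo)
print(f"TN={tn_count}, FP={fp_count}, FN={fn_count}, TP={tp_count}, accuracy={cm_accuracy:.2f}")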

# Q6: Provide an example of a confusion matrix and explain how precision, recall, and F1 score can be calculated from it

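# Example confusion matrix for a binary classification problem, in the layout returned by
# sklearn's confusion_matrix (rows = actual class, columns = predicted class, labels sorted as [0, 1]):
# [[TN, FP],
#  [FN, TP]]
# In this case:
# TP = 50, FP = 10, FN = 5, TN = 35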

# Precision: The proportion of predicted positives that are actually positive.
# Precision = TP / (TP + FP)

# Recall: The proportion of actual positives that are correctly predicted.
# Recall = TP / (TP + FN)

# F1 Score: The harmonic mean of precision and recall.
# F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
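
# Worked example: the three formulas applied to the example counts (TP = 50, FP = 10, FN = 5,
# TN = 35), then cross-checked against scikit-learn by rebuilding label arrays that produce
# exactly those counts. The ex_* names are illustrative.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

ex_tp, ex_fp, ex_fn, ex_tn = 50, 10, 5, 35
ex_precision = ex_tp / (ex_tp + ex_fp)  # 50 / 60
ex_recall = ex_tp / (ex_tp + ex_fn)     # 50 / 55
ex_f1 = 2 * ex_precision * ex_recall / (ex_precision + ex_recall)

# One entry per sample, grouped as TP, FP, FN, TN.
ex_true = np.array([1] * ex_tp + [0] * ex_fp + [1] * ex_fn + [0] * ex_tn)
ex_pred = np.array([1] * ex_tp + [1] * ex_fp + [0] * ex_fn + [0] * ex_tn)

assert np.isclose(ex_precision, precision_score(ex_true, ex_pred))
assert np.isclose(ex_recall, recall_score(ex_true, ex_pred))
assert np.isclose(ex_f1, f1_score(ex_true, ex_pred))

print(f"Precision: {ex_precision:.3f}")  # 0.833
print(f"Recall: {ex_recall:.3f}")        # 0.909
print(f"F1 Score: {ex_f1:.3f}")          # 0.870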

from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)

print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1 Score: {f1}")

# Q7: Importance of choosing an appropriate evaluation metric for a classification problem

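# The choice of evaluation metric determines what "good performance" means, and it matters most on
# imbalanced datasets, where accuracy can look high even if the model rarely detects the minority class.
# Depending on the relative cost of false positives and false negatives, precision, recall, or F1 score
# may be more relevant than accuracy.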
# For example:
# - In medical diagnosis, recall (sensitivity) is often more important than precision because false negatives (missed diagnoses) are critical.
# - In email spam detection, precision might be more important, as users prefer to avoid false positives (non-spam emails marked as spam).
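
# A small illustration of why accuracy alone can mislead on imbalanced data, using made-up labels
# (95 negatives, 5 positives) and a trivial "model" that always predicts the negative class.
# The imb_* names are illustrative.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

imb_true = np.array([0] * 95 + [1] * 5)
imb_pred = np.zeros_like(imb_true)  # always predict class 0

imb_accuracy = accuracy_score(imb_true, imb_pred)  # 0.95, despite finding no positives
imb_recall = recall_score(imb_true, imb_pred, zero_division=0)  # 0.0: every positive is missed
# Precision is undefined here (no positive predictions); zero_division=0 reports it as 0.
imb_precision = precision_score(imb_true, imb_pred, zero_division=0)

print(f"Accuracy: {imb_accuracy:.2f}, Precision: {imb_precision:.2f}, Recall: {imb_recall:.2f}")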

# Q8: Example of a classification problem where precision is the most important metric

# In fraud detection, where a bank wants to flag potentially fraudulent transactions, precision is the key metric
# when wrongly blocking legitimate customers is costly.
# A higher precision ensures that flagged transactions are more likely to be genuinely fraudulent,
# reducing the number of legitimate transactions incorrectly marked as fraudulent.

# Q9: Example of a classification problem where recall is the most important metric

# In medical diagnosis (e.g., cancer detection), recall is more important than precision.
# A high recall ensures that most of the positive cases (e.g., patients with cancer) are identified,
# minimizing false negatives (i.e., patients who have cancer but are missed by the model).
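
# A hedged sketch of how the same classifier can be tuned toward precision or toward recall by
# moving the decision threshold on its predicted scores. Thresholding is not covered in the text
# above; it is added here only to make the trade-off concrete. The scores and labels are made up,
# and thr_true, thr_scores, and precision_recall_at are illustrative names.
import numpy as np
from sklearn.metrics import precision_score, recall_score

thr_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
thr_scores = np.array([0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95])

def precision_recall_at(threshold):
    """Flag every sample whose score reaches the threshold, then score the flags."""
    flagged = (thr_scores >= threshold).astype(int)
    return (
        precision_score(thr_true, flagged, zero_division=0),
        recall_score(thr_true, flagged, zero_division=0),
    )

# On this toy data a higher threshold flags fewer samples (fewer false positives, more misses),
# while a lower threshold does the opposite.
for threshold in (0.35, 0.50, 0.75):
    thr_precision, thr_recall = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={thr_precision:.3f}  recall={thr_recall:.3f}")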

