

### 1) **What is a Decision Tree, and how does it work?**

A **Decision Tree** is a supervised machine learning algorithm used for both classification and regression tasks. It recursively splits the dataset into subsets based on feature values, forming a tree-like structure. Each internal node represents a test on a feature (or attribute), each branch represents an outcome of that test, and each leaf node represents an output label or value.

**How it works:**

* Start at the root node.
* At each step, choose the feature (and, for numeric features, the threshold) whose split most reduces an impurity measure (like Gini or Entropy).
* Repeat the process recursively for each child node until stopping conditions are met (e.g., pure leaves, max depth).
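
To make the node, branch, and leaf vocabulary concrete, the sketch below fits a deliberately shallow tree and prints its learned rules as text. It assumes scikit-learn is installed; the Iris dataset and `max_depth=2` are illustrative choices, not part of the explanation above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Indented lines with "<=" or ">" are decision rules; "class:" lines are leaves
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Reading the output from top to bottom follows the same path a sample takes from the root to a leaf.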

---

### 2) **What are impurity measures in Decision Trees?**

Impurity measures quantify the **degree of disorder or impurity** in a set of examples. The goal of a decision tree is to reduce this impurity at each step.

Common impurity measures:

* **Gini Impurity**
* **Entropy** (the basis of Information Gain)
* **Variance (for regression tasks)**

---

### 3) **What is the mathematical formula for Gini Impurity?**

The **Gini Impurity** of a node is:

$$
Gini = 1 - \sum_{i=1}^{C} p_i^2
$$

Where:

* $C$ is the number of classes.
* $p_i$ is the probability (frequency) of class $i$ at that node.
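
The formula can be checked with a few lines of plain Python. This is a minimal sketch; the function name `gini_impurity` and the toy label lists are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])    # only one class present
mixed = gini_impurity(["a", "a", "b", "b"])   # two classes, evenly split
print(pure, mixed)
```

A pure node scores 0, and an even two-class mix scores 0.5, the maximum for two classes.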

---

### 4) **What is the mathematical formula for Entropy?**

The **Entropy** of a node is:

$$
Entropy = -\sum_{i=1}^{C} p_i \log_2(p_i)
$$

Where:

* $p_i$ is the proportion of class $i$ in the node (a class with $p_i = 0$ contributes 0 to the sum by convention).
* $C$ is the total number of classes.
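
As a quick check of the formula, here is a small plain-Python sketch. The function name `entropy` and the example labels are illustrative; only classes that actually occur are counted, so the $0 \log 0$ case never arises.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Only classes that occur are in `counts`, so every proportion is > 0
    return sum(-(count / n) * math.log2(count / n) for count in counts.values())


pure = entropy(["a", "a", "a", "a"])    # a single class: no uncertainty
mixed = entropy(["a", "a", "b", "b"])   # two classes, evenly split: 1 bit
print(pure, mixed)
```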

---

### 5) **What is Information Gain, and how is it used in Decision Trees?**

**Information Gain** measures the **reduction in entropy or impurity** after splitting a node.

$$
IG = Entropy(parent) - \sum_{j=1}^{k} \frac{N_j}{N} \cdot Entropy(child_j)
$$

Where:

* $N$ is the total number of samples in the parent node.
* $N_j$ is the number of samples in child node $j$.
* $k$ is the number of child nodes produced by the split.

The tree chooses the split that gives the highest Information Gain.
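
The sketch below works the formula through on two toy splits of the same parent node. The helper names and label lists are hypothetical, chosen so that one split separates the classes perfectly and the other does not separate them at all.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
perfect_split = information_gain(parent, [[0, 0], [1, 1]])  # children are pure
useless_split = information_gain(parent, [[0, 1], [0, 1]])  # children as mixed as parent
print(perfect_split, useless_split)
```

The perfect split recovers the full 1 bit of parent entropy, while the useless split gains nothing, so the tree would pick the first.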

---

### 6) **What is the difference between Gini Impurity and Entropy?**

| Feature  | Gini Impurity                                   | Entropy                                        |
| -------- | ----------------------------------------------- | ---------------------------------------------- |
| Formula  | $1 - \sum p_i^2$                                | $-\sum p_i \log_2(p_i)$                        |
| Speed    | Slightly faster to compute (no logarithm)       | Slightly slower (requires logarithms)          |
| Behavior | Ranges from 0 to 0.5 for binary classification  | Ranges from 0 to 1 for binary classification   |
| Usage    | Default in **CART**                             | Used in the **ID3** and **C4.5** algorithms    |

In practice, the two measures usually select very similar splits.

---

### 7) **What is the mathematical explanation behind Decision Trees?**

Mathematically, decision trees perform:

* **Recursive binary partitioning**: At each node, select a feature and a threshold to split the data to minimize impurity.
* **Objective Function**:

  $$
  \text{Choose feature and threshold that minimizes: } \sum \frac{N_j}{N} \cdot \text{Impurity}(child_j)
  $$
* This is a greedy algorithm: each split is optimized locally, so the finished tree is not guaranteed to be globally optimal.
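
The objective can be sketched for a single numeric feature as an exhaustive search over candidate thresholds. This is a simplified illustration, not how any particular library implements it; the function names, the use of Gini as the impurity, and the midpoint thresholds are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) minimizing weighted Gini for one feature."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # identical values cannot be separated by a threshold
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
print(threshold, impurity)
```

A full tree builder would repeat this search over every feature at every node and recurse into the two resulting subsets.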

---

### 8) **What is Pre-Pruning in Decision Trees?**

**Pre-Pruning** stops the tree growth **before** it becomes too complex. Conditions include:

* Maximum depth reached
* Node has fewer samples than the minimum required to split
* Impurity reduction from the best split is below a threshold
* Node becomes "pure"

**Goal**: Prevent overfitting early by stopping unnecessary splits.
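
In scikit-learn, these stopping conditions correspond to constructor parameters of `DecisionTreeClassifier`. The sketch below is one possible configuration; the specific values (`max_depth=3`, `min_samples_leaf=5`, `min_impurity_decrease=0.01`) are arbitrary illustrations rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until every leaf is pure
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: growth stops as soon as any limit is hit
pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth reached
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=0,
).fit(X, y)

print("Full tree:   depth", full.get_depth(), "leaves", full.get_n_leaves())
print("Pre-pruned:  depth", pruned.get_depth(), "leaves", pruned.get_n_leaves())
```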

---

### 9) **What is Post-Pruning in Decision Trees?**

**Post-Pruning** allows the tree to grow fully and **then removes** branches that have little impact on prediction accuracy.

Techniques:

* Cost Complexity Pruning (CCP)
* Reduced error pruning using validation set

**Goal**: Simplify the model while retaining high accuracy.

---

### 10) **What is the difference between Pre-Pruning and Post-Pruning?**

| Feature      | Pre-Pruning                       | Post-Pruning                       |
| ------------ | --------------------------------- | ---------------------------------- |
| When Applied | During tree building              | After full tree is built           |
| Method       | Prevents splitting at a point     | Removes branches from full tree    |
| Risk         | May stop too early (underfitting) | Costs more computation, since the full tree is grown first |

---

### 11) **What is a Decision Tree Regressor?**

A **Decision Tree Regressor** is a type of decision tree used for **predicting continuous values** rather than class labels.

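**Splitting Criterion**: Chooses the split that minimizes the **Mean Squared Error (MSE)** or **Mean Absolute Error (MAE)** of the resulting child nodes. Each leaf then predicts the mean (for MSE) or the median (for MAE) of the training targets it contains.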
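
A one-split regressor (a "stump") on toy data shows the idea in miniature. The data below is invented for illustration: two clearly separated groups, so the prediction on each side is simply the mean target of that group.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (made up): small x values have targets near 1, large ones near 5
X_toy = np.array([[1], [2], [3], [10], [11], [12]], dtype=float)
y_toy = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])

# max_depth=1 allows exactly one split, chosen to minimize squared error
stump = DecisionTreeRegressor(max_depth=1, random_state=0)
stump.fit(X_toy, y_toy)

predictions = stump.predict(np.array([[2.0], [11.0]]))
print("Predictions:", predictions)
print("Group means:", y_toy[:3].mean(), y_toy[3:].mean())
```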

---

### 12) **What are the advantages and disadvantages of Decision Trees?**

**Advantages:**

* Easy to interpret and visualize
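* Handles both numerical and categorical data (some implementations, such as scikit-learn, require categorical features to be encoded as numbers first)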
* No need for feature scaling
* Can capture non-linear relationships

**Disadvantages:**

* Prone to overfitting
* Sensitive to small data changes
* Can create biased trees if one class dominates

---

### 13) **How does a Decision Tree handle missing values?**

Decision Trees can handle missing values using:

* **Surrogate splits**: Use an alternative feature that gives a similar split.
* **Imputation**: Replace missing values with mean/median/mode before training.
* **Skipping**: Skip samples with missing values during training (less preferred).
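
Of these options, imputation is the simplest to show with scikit-learn. The sketch below artificially removes about 10% of the Iris values (an assumption made purely to have something to impute) and fills them with per-feature medians inside a pipeline, so the medians are learned from the training split only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Simulate missing data by blanking out roughly 10% of the entries
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.3, random_state=42
)

# The imputer learns medians on the training data, then the tree is fit
model = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(random_state=42),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Accuracy with median imputation:", accuracy)
```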

---

### 14) **How does a Decision Tree handle categorical features?**

* Splits can be done on **specific categories**: e.g., “Color = Red?”
* Internally, the algorithm treats each category as a discrete value.
* For features with many categories, grouping might be done to optimize splits.
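
scikit-learn's tree estimators expect numeric input, so a common workaround there is to one-hot encode categories before fitting. The sketch below uses a made-up single-column colour feature; after encoding, a split on the "red" indicator column plays the role of the question "Color = Red?".

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one categorical feature, label is 1 only for "red"
colors = np.array(
    [["red"], ["red"], ["green"], ["green"], ["blue"], ["blue"]], dtype=object
)
labels = np.array([1, 1, 0, 0, 0, 0])

# Each category becomes its own 0/1 column that the tree can threshold on
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(colors, labels)

predicted = model.predict(np.array([["red"], ["blue"]], dtype=object))
print(predicted)
```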

---

### 15) **What are some real-world applications of Decision Trees?**

* **Medical Diagnosis**: Predict disease based on symptoms
* **Loan Approval**: Decide if a loan should be granted
* **Customer Churn Prediction**: Identify customers likely to leave
* **Fraud Detection**: Detect abnormal transaction patterns
* **Credit Scoring**: Assess creditworthiness of a customer
* **Recommendation Systems**: Suggest products based on preferences




# 1) Train a Decision Tree Classifier on the Iris dataset and print model accuracy
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, mean_squared_error
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor, export_graphviz
from sklearn.preprocessing import StandardScaler
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import seaborn as sns
import graphviz
import numpy as np

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(random_state=42)  # fixed seed so results are reproducible
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("\n#1) Accuracy:", accuracy_score(y_test, y_pred))

# 2) Train using Gini Impurity and print feature importances
clf_gini = DecisionTreeClassifier(criterion='gini', random_state=42)
clf_gini.fit(X_train, y_train)
print("\n#2) Feature importances (Gini):", clf_gini.feature_importances_)

# 3) Train using Entropy and print model accuracy
clf_entropy = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf_entropy.fit(X_train, y_train)
y_pred_entropy = clf_entropy.predict(X_test)
print("\n#3) Accuracy (Entropy):", accuracy_score(y_test, y_pred_entropy))

# 4) Decision Tree Regressor on housing dataset with MSE
housing = fetch_california_housing()
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)
reg = DecisionTreeRegressor(random_state=42)  # fixed seed so results are reproducible
reg.fit(Xh_train, yh_train)
y_pred_housing = reg.predict(Xh_test)
print("\n#4) MSE (Housing):", mean_squared_error(yh_test, y_pred_housing))

# 5) Visualize the Decision Tree using Graphviz
dot_data = export_graphviz(clf, out_file=None, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
graph = graphviz.Source(dot_data)
graph.render("iris_tree", format='png', cleanup=True)
print("\n#5) Tree Visualization saved as 'iris_tree.png'")

# 6) Max depth of 3 vs full depth tree
clf_full = DecisionTreeClassifier(random_state=42)
clf_limited = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_full.fit(X_train, y_train)
clf_limited.fit(X_train, y_train)
print("\n#6) Full Depth Accuracy:", accuracy_score(y_test, clf_full.predict(X_test)))
print("#6) Max Depth=3 Accuracy:", accuracy_score(y_test, clf_limited.predict(X_test)))

# 7) min_samples_split=5 vs default
clf_default = DecisionTreeClassifier(random_state=42)
clf_split5 = DecisionTreeClassifier(min_samples_split=5, random_state=42)
clf_default.fit(X_train, y_train)
clf_split5.fit(X_train, y_train)
print("\n#7) Default Tree Accuracy:", accuracy_score(y_test, clf_default.predict(X_test)))
print("#7) min_samples_split=5 Accuracy:", accuracy_score(y_test, clf_split5.predict(X_test)))

# 8) Apply feature scaling before training and compare
# Fit the scaler on the training split only so no test information leaks into it
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_train)
X_te_scaled = scaler.transform(X_test)
clf_scaled = DecisionTreeClassifier(random_state=42)  # same seed as clf_default for a fair comparison
clf_scaled.fit(X_tr_scaled, y_train)
print("\n#8) Accuracy with Unscaled:", accuracy_score(y_test, clf_default.predict(X_test)))
print("#8) Accuracy with Scaled:", accuracy_score(y_test, clf_scaled.predict(X_te_scaled)))

# 9) Use One-vs-Rest for multiclass classification
ovr = OneVsRestClassifier(DecisionTreeClassifier(random_state=42))
ovr.fit(X_train, y_train)
print("\n#9) Accuracy with OvR:", accuracy_score(y_test, ovr.predict(X_test)))

# 10) Display feature importance scores
print("\n#10) Feature Importances (Default Tree):")
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.4f}")

# 11) Regressor with max_depth=5 vs unrestricted
reg_default = DecisionTreeRegressor(random_state=42)
reg_depth5 = DecisionTreeRegressor(max_depth=5, random_state=42)
reg_default.fit(Xh_train, yh_train)
reg_depth5.fit(Xh_train, yh_train)
print("\n#11) MSE (Default):", mean_squared_error(yh_test, reg_default.predict(Xh_test)))
print("#11) MSE (Max Depth=5):", mean_squared_error(yh_test, reg_depth5.predict(Xh_test)))

# 12) Cost Complexity Pruning and visualize effect
path = clf.cost_complexity_pruning_path(X_train, y_train)
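ccp_alphas = path.ccp_alphas[:-1]  # drop the largest alpha, which prunes the tree down to a single node
pruned_trees = [DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train) for alpha in ccp_alphas]
# Use a distinct loop variable so the earlier `clf` model is not shadowed
acc = [accuracy_score(y_test, tree.predict(X_test)) for tree in pruned_trees]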
plt.figure(figsize=(8, 5))
plt.plot(ccp_alphas, acc, marker='o')
plt.title("#12) Accuracy vs CCP Alpha")
plt.xlabel("Alpha")
plt.ylabel("Accuracy")
plt.grid(True)
plt.show()

# 13) Evaluate using Precision, Recall, and F1-Score
y_pred_pr = clf.predict(X_test)
print("\n#13) Precision:", precision_score(y_test, y_pred_pr, average='macro'))
print("#13) Recall:", recall_score(y_test, y_pred_pr, average='macro'))
print("#13) F1-Score:", f1_score(y_test, y_pred_pr, average='macro'))

# 14) Visualize confusion matrix
cm = confusion_matrix(y_test, y_pred_pr)
sns.heatmap(cm, annot=True, cmap='Blues', fmt='d', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title("#14) Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# 15) GridSearchCV for max_depth and min_samples_split
params = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10]
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid=params, cv=5)
grid.fit(X_train, y_train)
print("\n#15) Best Params:", grid.best_params_)
print("#15) Best CV Accuracy:", grid.best_score_)  # mean cross-validated accuracy, not test accuracy
