

                          **Decision Tree**

Question 1: What is a Decision Tree, and how does it work in the context of classification?

Answer- A decision tree is a supervised machine learning algorithm, shaped like a flowchart, used for both classification and regression tasks. In classification, it works by recursively partitioning data based on feature values, creating a tree-like structure that leads to a class prediction at the leaf nodes.

How it works:
1.	Root Node: The tree starts with a root node representing the entire dataset.
2.	Splitting: The algorithm selects the best feature to split the data based on criteria like information gain or Gini impurity. This creates branches representing different outcomes for that feature.
3.	Internal Nodes: Each internal node represents a decision based on a feature and its value.
4.	Branches: The branches represent the possible outcomes or values of the chosen feature.
5.	Leaf Nodes: The process continues until a leaf node is reached, which represents the final class prediction for that branch.

Example:
Imagine classifying emails as spam or not spam. The decision tree might start with the question: "Does the email contain the word 'free'?" If yes, it might split into a branch leading to "spam" if other indicators are present. If no, it could lead to another decision node, like "Does it contain a suspicious link?". This process continues until the final classification (spam or not spam) is made.
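The spam example can be sketched with scikit-learn. The two binary features and the labels below are made up for illustration (an email is labelled spam if it contains "free" or a suspicious link); they are assumptions, not real email data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [contains_free, has_suspicious_link]
feature_names = ["contains_free", "has_suspicious_link"]
X = [[1, 1], [1, 0], [0, 1], [0, 0]] * 3
# Made-up labels: 1 = spam, 0 = not spam
y = [1, 1, 1, 0] * 3

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Print the learned tree as nested if/else rules (the "flowchart")
rules = export_text(clf, feature_names=feature_names)
print(rules)

# An email without "free" but with a suspicious link is classified as spam (1)
print(clf.predict([[0, 1]]))
```

Each indented line in the printed rules corresponds to an internal node, and each `class:` line corresponds to a leaf.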

Question 2: Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?

Answer- Gini Impurity and Entropy are key metrics used in decision trees to determine the best way to split data at each node. They quantify the impurity or disorder within a node, guiding the algorithm to make splits that result in more homogeneous (less impure) child nodes. The goal is to minimize impurity at each split, leading to more accurate classifications.    
Gini Impurity:

•	Measures the probability of misclassifying a randomly chosen element in a node if it were randomly labeled according to the class distribution in that node.

•	Ranges from 0 to 0.5 in binary classification. A Gini impurity of 0 indicates perfect purity (all elements belong to the same class), while 0.5 indicates maximum impurity (an equal distribution of classes).

•	Calculated as 1 - Σ(pᵢ²), where pᵢ is the probability of an element belonging to class i.

Entropy:
•	Measures the uncertainty or randomness in a dataset. A node with high entropy has a more varied distribution of classes, while a node with low entropy has a more concentrated distribution of classes.

•	Ranges from 0 to 1 in binary classification (more generally, from 0 to log₂(k) for k classes). An entropy of 0 indicates a pure node, and the maximum value indicates an equal distribution of classes.

•	Calculated using the formula: - Σ(pᵢ * log₂(pᵢ)), where pᵢ is the probability of an element belonging to class i.
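Both formulas can be checked by hand with a few lines of Python. This is a minimal sketch; the function names `gini` and `entropy` are chosen here for illustration, and each takes a list of class probabilities that sum to 1.

```python
import math

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits: -sum(p_i * log2(p_i)), skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(gini([1.0, 0.0]))     # pure node -> 0.0
print(gini([0.5, 0.5]))     # balanced binary node -> 0.5
print(entropy([1.0, 0.0]))  # pure node -> 0.0
print(entropy([0.5, 0.5]))  # balanced binary node -> 1.0
print(entropy([1/3, 1/3, 1/3]))  # three balanced classes -> log2(3), about 1.585
```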

Impact on Splits:

•	Decision trees use these measures to evaluate different possible splits at each node.

•	For each potential split, the Gini Impurity or Entropy of the resulting child nodes is calculated.

•	The split with the lowest weighted average Gini Impurity or Entropy across its child nodes (equivalently, the highest Information Gain, which is the reduction in impurity relative to the parent node) is chosen as the best split.

•	This process is repeated recursively for each child node until a stopping criterion is met (e.g., maximum depth, minimum number of samples per node).

•	By minimizing impurity at each split, decision trees aim to create branches that lead to increasingly pure leaf nodes, which represent predictions for the data.

•	This process helps the decision tree to classify data more accurately.
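The split search described above can be sketched for a single numeric feature. This is a simplified illustration, not how any particular library implements it: it tries the midpoint between each pair of adjacent distinct values and keeps the threshold with the lowest weighted Gini impurity. The helper names and the tiny petal-length sample are assumptions made for the example.

```python
from collections import Counter

def gini_of(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) with the lowest weighted child impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold fits between identical values
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        weighted = (len(left) * gini_of(left) + len(right) * gini_of(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical sample: petal lengths for two species
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

threshold, impurity = best_threshold(petal_length, species)
print(threshold, impurity)  # 3.0 0.0 (both child nodes are pure)
```

A full tree builder would repeat this search over every feature and then recurse into each child node until a stopping criterion is met.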

Question 3: What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.

Answer –
Pre-Pruning (Early Stopping)

What it is:

Pre-pruning involves setting limits on tree growth during the construction phase. For example, you might specify a maximum depth for the tree, a minimum number of samples required to split a node, or a minimum information gain needed for a split to be considered.

Practical Advantage: A key advantage of pre-pruning is its efficiency. By stopping the tree's growth early, it reduces computational cost and training time, especially beneficial for large datasets.
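In scikit-learn, these growth limits map onto constructor parameters of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris data; the specific limit values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No limits: the tree grows until every leaf is pure
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned: growth stops as soon as any limit is hit
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_impurity_decrease=0.01,  # a split must reduce impurity by at least this much
    random_state=0,
).fit(X, y)

print("Unrestricted:", unrestricted.get_depth(), "levels,", unrestricted.get_n_leaves(), "leaves")
print("Pre-pruned:  ", pre_pruned.get_depth(), "levels,", pre_pruned.get_n_leaves(), "leaves")
```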

Post-Pruning (e.g., Reduced Error Pruning)

What it is:

Post-pruning starts with a fully grown decision tree and then removes branches that don't significantly contribute to the model's performance on a validation set or through cross-validation.

Practical Advantage: Post-pruning offers a more thorough evaluation of the tree's structure, potentially leading to better-informed pruning decisions. It allows the algorithm to consider the impact of removing a branch on the overall performance of the tree.
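scikit-learn does not implement reduced error pruning itself; its built-in post-pruning method is minimal cost-complexity pruning, controlled by `ccp_alpha`. The sketch below uses that technique instead: it grows a full tree, asks for the candidate alpha values, and keeps the alpha that scores best on a held-out validation split. The 70/30 split and the tie-breaking rule are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Grow the full tree, then compute the sequence of effective alphas
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative values from rounding
    candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    candidate.fit(X_train, y_train)
    score = candidate.score(X_val, y_val)
    if score >= best_score:  # on ties, prefer the larger alpha (simpler tree)
        best_alpha, best_score = alpha, score

pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
pruned_tree.fit(X_train, y_train)

print("Leaves before pruning:", full_tree.get_n_leaves())
print("Leaves after pruning: ", pruned_tree.get_n_leaves())
print(f"Validation accuracy:   {best_score:.4f}")
```

With more data, cross-validation over `ccp_alpha` would give a less noisy choice than a single validation split.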

| Feature | Pre-Pruning | Post-Pruning |
|---|---|---|
| When applied | During tree building | After full tree is built |
| Control method | Limit depth/splits/criteria | Remove branches using validation |
| Goal | Avoid overfitting early | Fix overfitting after full growth |
| Example benefit | Faster training | Better accuracy on test data |

Question 4: What is Information Gain in Decision Trees, and why is it important for choosing the best split?

Answer - Information gain in decision trees quantifies how well a feature separates data based on its target classification, essentially measuring how much the uncertainty (or impurity) of the data is reduced by splitting on that feature. It's crucial for selecting the best split because the feature with the highest information gain will create the most homogeneous child nodes, leading to a more accurate and efficient tree.

1. Entropy and Impurity:

Entropy measures the impurity or randomness in a dataset. A dataset with evenly distributed classes has high entropy, while a dataset with classes concentrated in one or a few categories has low entropy (or high purity). Information Gain is derived from entropy and indicates how much the entropy of the data decreases after a split based on a particular feature.

2. How Information Gain Works:

Calculation: Information Gain (IG) is calculated by subtracting the weighted average entropy of the child nodes from the entropy of the parent node.
Maximizing IG: When building a decision tree, the algorithm evaluates the IG for each feature and chooses the feature that results in the highest IG for splitting the current node.

Optimal Splits: Features with high information gain are preferred because they create more homogeneous child nodes, which leads to a more effective tree.
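The calculation can be made concrete with a short sketch. The helper names and the spam/ham labels are hypothetical; the point is that a split producing pure children recovers all of the parent's entropy, while a split that leaves the class mix unchanged gains nothing.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted_child_entropy = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted_child_entropy

parent = ["spam"] * 5 + ["ham"] * 5

# Split A separates the classes perfectly
perfect_split = [["spam"] * 5, ["ham"] * 5]
# Split B leaves both children as mixed as the parent
useless_split = [["spam"] * 2 + ["ham"] * 2, ["spam"] * 3 + ["ham"] * 3]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```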

3. Why is Information Gain Important?

Feature Selection:

It helps in selecting the best features for splitting the data at each node.

Efficiency:

By choosing features with high IG, the algorithm can create smaller, more accurate decision trees with fewer splits.

Accuracy:

A tree that is built by maximizing information gain tends to classify data more accurately than a tree built on features with low information gain.

Question 5: What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?

Answer- Decision trees are a versatile machine learning method with applications in various fields. They are often used for classification and regression tasks, and they are known for their interpretability and ease of understanding. However, they can be prone to overfitting, and small changes in the data can lead to different tree structures.

Common Real-World Applications:

•	Medical Diagnosis:

Decision trees can analyze patient symptoms and history to help diagnose diseases like diabetes, based on factors like glucose levels and blood pressure.

•	Customer Segmentation:

Businesses use decision trees to group customers based on demographics, purchase history, and behavior to tailor marketing campaigns.

•	Credit Scoring:

Loan applications can be evaluated by decision trees based on factors like income, credit score, and employment status to assess creditworthiness.

•	Fraud Detection:

Financial institutions use decision trees to identify fraudulent transactions by analyzing spending patterns and deviations from normal behavior.

•	Recommendation Systems:

E-commerce platforms and other businesses use decision trees to recommend products to users based on their browsing history and past purchases.

•	Churn Prediction:

Companies use decision trees to predict which customers are likely to leave (churn) by analyzing their behavior patterns and interactions.

Advantages of Decision Trees:

•	Interpretability and Explainability:

Decision trees are easy to understand and visualize, making them suitable for non-expert users and for explaining the reasoning behind predictions.

•	Handles Both Numerical and Categorical Data:

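Decision trees can handle different types of data without needing extensive preprocessing such as feature scaling, although some implementations (including scikit-learn's) still require categorical features to be encoded as numbers first.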

•	Automatic Feature Selection:

Decision trees can identify the most important features for making decisions.

•	Handles Non-Linear Relationships:

They can capture complex non-linear relationships between variables.


•	Low Data Preparation Requirements:

Compared to some other algorithms, decision trees require less data preparation and cleaning.

Limitations of Decision Trees:

•	Overfitting:

Decision trees can easily overfit the training data, especially if they are deep and complex.

•	Instability:

Small changes in the data can lead to drastically different tree structures, making them less robust.

•	Bias Towards Features with Many Levels:

Features with many categories can sometimes be favored, potentially leading to bias.

•	Limited Precision for Certain Problems:

Some complex relationships might not be well captured by the tree structure.

•	Greedy Algorithm:

Decision trees are built using a greedy approach, which may not always find the optimal tree structure.


Question 6: Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier using the Gini criterion

● Print the model’s accuracy and feature importances

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [None]:
# 1. Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names


In [None]:
# 2. Split into train/test sets (e.g. 80% train / 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

In [None]:
# 3. Create and train the Decision Tree classifier (using Gini criterion)
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

In [None]:
# 4. Make predictions and evaluate accuracy
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy*100:.2f}%")

Accuracy: 93.33%


In [None]:
# 5. Print feature importances
importances = clf.feature_importances_
print("Feature importances (Gini importance):")
for name, imp in zip(feature_names, importances):
    print(f"  {name}: {imp:.4f}")

Feature importances (Gini importance):
  sepal length (cm): 0.0062
  sepal width (cm): 0.0292
  petal length (cm): 0.5586
  petal width (cm): 0.4060


Question 7: Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree.

In [None]:
# 1. Load Iris dataset
data = load_iris()
X, y = data.data, data.target
feature_names = data.feature_names

In [None]:
# 2. Split training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

In [None]:
# 3. Train shallow tree (max_depth = 3)
clf_shallow = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
clf_shallow.fit(X_train, y_train)


In [None]:
# 4. Train full tree (no depth limit)
clf_full = DecisionTreeClassifier(criterion='gini', random_state=42)
clf_full.fit(X_train, y_train)

In [None]:
# 5. Predict & evaluate both models
y_pred_shallow = clf_shallow.predict(X_test)
y_pred_full = clf_full.predict(X_test)
acc_shallow = accuracy_score(y_test, y_pred_shallow)
acc_full = accuracy_score(y_test, y_pred_full)

In [None]:
print(f"Accuracy (max_depth=3): {acc_shallow * 100:.2f}%")
print(f"Accuracy (no max_depth): {acc_full * 100:.2f}%\n")

Accuracy (max_depth=3): 96.67%
Accuracy (no max_depth): 93.33%



In [None]:
# 6. Feature importances
print("Feature importances (max_depth=3):")
for name, imp in zip(feature_names, clf_shallow.feature_importances_):
    print(f"  {name}: {imp:.4f}")
print("\nFeature importances (full tree):")
for name, imp in zip(feature_names, clf_full.feature_importances_):
    print(f"  {name}: {imp:.4f}")

Feature importances (max_depth=3):
  sepal length (cm): 0.0000
  sepal width (cm): 0.0000
  petal length (cm): 0.5791
  petal width (cm): 0.4209

Feature importances (full tree):
  sepal length (cm): 0.0062
  sepal width (cm): 0.0292
  petal length (cm): 0.5586
  petal width (cm): 0.4060


Question 8: Write a Python program to:

● Load the Boston Housing Dataset

● Train a Decision Tree Regressor

● Print the Mean Squared Error (MSE) and feature importances

In [18]:
!pip install scikit-learn pandas numpy




In [19]:
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [20]:
# --- Load the Boston dataset from OpenML ---
boston = fetch_openml(name="Boston", version=1, as_frame=True)
X = boston.data
y = boston.target.astype(float)

In [21]:
# --- Split into train/test sets ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [22]:
# --- Train Decision Tree Regressor ---
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)


In [23]:
# --- Predict and evaluate ---
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.3f}")

Mean Squared Error: 10.416


In [24]:
# --- Feature importances ---
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print("\nFeature importances:")
print(importances)


Feature importances:
RM         0.600326
LSTAT      0.193328
DIS        0.070688
CRIM       0.051296
NOX        0.027148
AGE        0.013617
TAX        0.012464
PTRATIO    0.011012
B          0.009009
INDUS      0.005816
ZN         0.003353
RAD        0.001941
CHAS       0.000002
dtype: float64


Question 9: Write a Python program to:

● Load the Iris Dataset

● Tune the Decision Tree’s max_depth and min_samples_split using GridSearchCV

● Print the best parameters and the resulting model accuracy

In [17]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [4]:
# 1. Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

In [5]:
# 2. Train‑test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


In [6]:
# 3. Set up the Decision Tree classifier
clf = DecisionTreeClassifier(random_state=42)


In [7]:
# 4. Define hyper‑parameter grid
param_grid = {
    'max_depth': [1, 2, 3, 4, 5],
    'min_samples_split': [2, 3, 4, 5]
}


In [8]:
# 5. GridSearchCV setup with 5‑fold CV
grid = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    refit=True,       # fits best model on full training set
    verbose=1
)


In [9]:
# 6. Fit grid search
grid.fit(X_train, y_train)


Fitting 5 folds for each of 20 candidates, totalling 100 fits


In [None]:
# 7. Retrieve best params and CV score
print("Best hyperparameters (CV):", grid.best_params_)
print(f"Best cross‑validation accuracy: {grid.best_score_:.4f}")

Best hyperparameters (CV): {'max_depth': 4, 'min_samples_split': 2}
Best cross‑validation accuracy: 0.9417


In [None]:
# 8. Evaluate on held‑out test set
best_clf = grid.best_estimator_
test_acc = best_clf.score(X_test, y_test)
print(f"Test set accuracy: {test_acc:.4f}")

Test set accuracy: 1.0000


Question 10: Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values. Explain the step-by-step process you would follow to:

● Handle the missing values

● Encode the categorical features

● Train a Decision Tree model

● Tune its hyperparameters

● Evaluate its performance

And describe what business value this model could provide in the real-world setting.


Answer- To build a disease prediction model with a mixed-type, missing-value dataset, a data scientist would follow a structured process: 1) handle missing values using imputation or deletion, 2) encode categorical features with techniques like one-hot or label encoding, 3) train a decision tree model, 4) optimize hyperparameters using techniques like grid search or randomized search, and 5) evaluate the model's performance using appropriate metrics. This model can provide significant business value by enabling early disease detection, personalized treatment plans, and optimized resource allocation for the healthcare provider.

Here's a more detailed breakdown:

1. Handling Missing Values:

Identify Missing Data:
Determine the extent and patterns of missing data in the dataset.

Choose a Strategy:

Consider imputation (replacing missing values with reasonable estimates) or deletion (removing rows or columns with missing data).

Imputation Methods:

Mean/Median/Mode Imputation: Replace missing numerical values with the mean, median, or most frequent value, respectively.

K-Nearest Neighbors (KNN) Imputation: Use the values of the nearest neighbors to estimate the missing value, according to DataCamp.
Model-Based Imputation: Employ algorithms like decision trees or regression to predict missing values based on other features.

Deletion:

Carefully evaluate the impact of removing rows or columns on the model's performance and statistical significance.
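As a rough sketch of the imputation options, the example below applies median and KNN imputation to a tiny, made-up table of patient measurements. The column names and values are hypothetical. In a real project the imputer should be fitted on the training split only, so that no information leaks from the test set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical patient records with gaps
patients = pd.DataFrame({
    "glucose": [90.0, np.nan, 150.0, 110.0],
    "blood_pressure": [70.0, 80.0, np.nan, 75.0],
})

# Step 1: see how much is missing per column
print(patients.isna().sum())

# Step 2a: median imputation (robust to outliers)
median_imputer = SimpleImputer(strategy="median")
median_filled = pd.DataFrame(
    median_imputer.fit_transform(patients), columns=patients.columns
)

# Step 2b: KNN imputation (average of the 2 most similar rows)
knn_imputer = KNNImputer(n_neighbors=2)
knn_filled = pd.DataFrame(
    knn_imputer.fit_transform(patients), columns=patients.columns
)

print(median_filled)
print(knn_filled)
```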

2. Encoding Categorical Features:

Identify Categorical Features:

Determine which features have non-numerical values (e.g., "smoker: yes/no", "blood type: A/B/AB/O").

One-Hot Encoding:

Create a new binary column for each category within a feature, representing whether the original value is present or not.

Label Encoding:

Assign a unique numerical label to each category within a feature, notes upGrad.
Ordinal Encoding:

Similar to label encoding, but used when there's an inherent order or ranking among the categories.
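The encoders can be sketched with scikit-learn's `OneHotEncoder` and `OrdinalEncoder`. The `blood_type` and `pain_level` columns are hypothetical, and the order low < medium < high is an assumption stated explicitly to the ordinal encoder.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical categorical columns
records = pd.DataFrame({
    "blood_type": ["A", "B", "AB", "O", "A"],
    "pain_level": ["low", "high", "medium", "low", "high"],
})

# One-hot: one binary column per blood type (no order implied)
one_hot = OneHotEncoder(handle_unknown="ignore")
blood_type_encoded = one_hot.fit_transform(records[["blood_type"]]).toarray()
print(one_hot.categories_[0])
print(blood_type_encoded)

# Ordinal: integers that follow the stated order low < medium < high
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
pain_encoded = ordinal.fit_transform(records[["pain_level"]])
print(pain_encoded.ravel())  # [0. 2. 1. 0. 2.]
```

Setting `handle_unknown="ignore"` lets the one-hot encoder map categories it did not see during fitting to an all-zero row instead of raising an error.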

3. Training a Decision Tree Model:

Select the Model:

Choose a decision tree algorithm suitable for classification (e.g., ID3, C4.5, CART).

Split the Data:

Divide the dataset into training and testing sets (e.g., 80/20 split) to evaluate the model's generalization ability.

Fit the Model:

Train the decision tree model using the training data, allowing it to learn the relationships between features and the target variable.
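One way to combine the preprocessing and training steps is a scikit-learn `Pipeline`, sketched below on synthetic data. Everything about the dataset (the column names, the rule generating the `disease` label, the injected missing values) is invented for illustration and does not reflect any real patient data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic, hypothetical patient data
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "glucose": rng.normal(110, 25, n),
    "age": rng.integers(20, 80, n).astype(float),
    "smoker": rng.choice(["yes", "no"], n),
})
# Made-up label loosely tied to glucose
disease = (data["glucose"] + rng.normal(0, 10, n) > 120).astype(int)
# Inject missing glucose readings into 20 random rows
data.loc[rng.choice(n, 20, replace=False), "glucose"] = np.nan

# Impute numeric columns, one-hot encode the categorical column
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["glucose", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
])

# 80/20 split, stratified so both sets keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    data, disease, test_size=0.2, random_state=42, stratify=disease
)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```

Keeping the imputer and encoder inside the pipeline means they are fitted on the training rows only, which avoids leaking test-set statistics into the model.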

4. Tuning Hyperparameters:

Identify Hyperparameters:

Determine which parameters of the decision tree model can be adjusted (e.g., max_depth, min_samples_split, min_samples_leaf).

Grid Search:

Systematically evaluate all possible combinations of hyperparameter values within a predefined range to find the optimal settings.

Randomized Search:

Randomly sample hyperparameter values from a specified distribution, often more efficient than grid search.

Cross-Validation:

Use techniques like k-fold cross-validation to estimate the model's performance on unseen data during hyperparameter tuning, according to DataCamp.
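Randomized search with cross-validation can be sketched with `RandomizedSearchCV`. The example uses scikit-learn's built-in breast cancer dataset as a stand-in for the healthcare data, and the parameter ranges, `n_iter=20`, and the `roc_auc` scoring choice are all assumptions made for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (not the company's data)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Distributions to sample from (upper bounds are exclusive)
param_distributions = {
    "max_depth": randint(2, 11),          # integers 2..10
    "min_samples_split": randint(2, 21),  # integers 2..20
    "min_samples_leaf": randint(1, 11),   # integers 1..10
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,          # number of sampled combinations
    cv=5,               # 5-fold cross-validation on the training set
    scoring="roc_auc",
    random_state=42,
)
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print(f"Best cross-validated ROC AUC: {search.best_score_:.4f}")
```

Randomized search evaluates a fixed number of sampled combinations, so its cost stays constant as more hyperparameters are added, whereas a grid grows multiplicatively.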

5. Evaluating Performance:

Confusion Matrix:

Visualize the model's predictions against the actual values, showing true positives, true negatives, false positives, and false negatives.

Classification Metrics:

Calculate metrics like accuracy, precision, recall, and F1-score to assess the model's performance across different aspects.

ROC Curve and AUC:

Analyze the Receiver Operating Characteristic (ROC) curve and its associated Area Under the Curve (AUC) to evaluate the model's ability to distinguish between classes.
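The evaluation steps can be sketched as follows, again using scikit-learn's breast cancer dataset as a stand-in for the healthcare data. The `max_depth=4` setting is an arbitrary choice for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: class 0 = malignant, class 1 = benign
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_score = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# Confusion matrix: rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall, and F1-score per class, plus overall accuracy
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))

# Area under the ROC curve, computed from scores rather than hard predictions
auc = roc_auc_score(y_test, y_score)
print(f"ROC AUC: {auc:.4f}")
```

In a disease-screening setting, recall on the disease class usually deserves the most attention, because a false negative (a missed case) tends to be costlier than a false positive.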