**Question 1:** What is a Decision Tree, and how does it work in the context of classification?

**Answer:** A Decision Tree is a supervised machine learning algorithm used for classification and regression. In classification, it works by splitting the dataset into smaller subsets based on feature values, forming a tree-like structure of decisions.

The process starts with a root node that represents the entire dataset. The algorithm selects the best feature to split the data based on criteria such as Gini Impurity or Information Gain (Entropy). Each split creates decision nodes, and this process continues recursively until a stopping condition is met, such as all data points belonging to the same class or a maximum tree depth being reached.

The final nodes are called leaf nodes, which represent the predicted class labels. For a new input, the model follows the path from the root to a leaf by applying the learned decision rules, and the class at the leaf node is the output prediction.
Example:
Suppose we want to classify whether a person will buy a computer based on features like Age, Income, and Student.

The root node may split the data based on Age.

If Age ≤ 30, go to the left branch.

If Age > 30, go to the right branch.

For Age ≤ 30, the next split might be on Student.

If Student = Yes → Buy = Yes

If Student = No → Buy = No

For Age > 30, the tree might directly predict Buy = Yes.

So, if a new person is 25 years old and is a student, the decision tree follows the path:
Age ≤ 30 → Student = Yes → Buy = Yes.

In this way, a decision tree classifies data by asking a sequence of questions and reaching a final decision at a leaf node.
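To make the traversal concrete, the hypothetical tree from this example can be hand-coded as nested rules. This is a minimal sketch, not a trained model. The function name `predict_buy` and its boolean `student` argument are illustrative assumptions. Income is unused because the example tree never splits on it.

```python
def predict_buy(age: int, student: bool) -> str:
    """Follow the example tree from the root to a leaf and return the leaf's label."""
    # Root node: split on Age
    if age <= 30:
        # Decision node on the left branch: split on Student
        if student:
            return "Yes"  # leaf: Buy = Yes
        return "No"       # leaf: Buy = No
    # Right branch (Age > 30) is a leaf in this example
    return "Yes"          # leaf: Buy = Yes


# A 25-year-old student follows Age <= 30 -> Student = Yes -> Buy = Yes
print(predict_buy(25, True))   # Yes
print(predict_buy(25, False))  # No
print(predict_buy(40, False))  # Yes
```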

**Question 2:** Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?

**Answer:** Gini Impurity and Entropy are measures used in Decision Trees to evaluate how well a feature splits the data into classes. They indicate how mixed or impure the data is at a node. A lower value means the node is more pure, that is, it mostly contains samples of a single class.

Gini Impurity:
Gini Impurity measures the probability of incorrectly classifying a randomly chosen data point if it were labeled according to the class distribution of that node. It is calculated as:
Gini = 1 − Σ(p²), where p is the proportion of each class in the node.
A Gini value of 0 means the node is completely pure. Decision Trees try to choose splits that reduce Gini Impurity the most.

Entropy:
Entropy measures the amount of uncertainty or randomness in the data. It is calculated as:
Entropy = −Σ(p log₂ p).
Lower entropy indicates less disorder. Information Gain is calculated using entropy, and the split with the highest Information Gain (maximum reduction in entropy) is selected.
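Both formulas are short enough to compute directly. The following is a minimal sketch in plain Python. The helper names `gini` and `entropy` and the toy class lists are illustrative and are not part of any library.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion p of each class present in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)); absent classes contribute nothing."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))


nodes = {
    "pure": ["A", "A", "A", "A"],
    "mixed": ["A", "A", "B", "B"],
    "skewed": ["A", "A", "A", "B"],
}

for name, labels in nodes.items():
    print(f"{name}: gini={gini(labels):.3f}, entropy={entropy(labels):.3f}")
# pure: gini=0.000, entropy=0.000
# mixed: gini=0.500, entropy=1.000
# skewed: gini=0.375, entropy=0.811
```

Both measures are 0 for a pure node and largest for an evenly mixed one. They rank these three nodes in the same order.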

Impact on Decision Tree Splits:
Both Gini Impurity and Entropy guide the tree in selecting the best feature to split at each node. The algorithm evaluates the candidate splits and chooses the one whose child nodes have the lowest weighted average impurity, which is the same as the largest impurity decrease. The two criteria often produce similar trees. Gini Impurity is slightly faster to compute because it avoids logarithms, whereas Entropy provides an information-theoretic interpretation of split quality.
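The split search described above can be sketched for a single numeric feature. The sketch tries thresholds midway between consecutive distinct values and keeps the one with the lowest weighted Gini impurity of the two children. It assumes binary `<=` splits and uses made-up toy data (the `ages` and `bought` lists). A real implementation repeats this search over every feature and uses more efficient incremental counting.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_child_gini(x, y, threshold):
    """Size-weighted Gini of the two children produced by the split x <= threshold."""
    left = [label for value, label in zip(x, y) if value <= threshold]
    right = [label for value, label in zip(x, y) if value > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(x, y):
    """Try midpoints between consecutive distinct values; keep the lowest weighted Gini."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(
        ((t, weighted_child_gini(x, y, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical toy data: ages and whether each person bought a computer
ages = [22, 25, 28, 35, 40, 45]
bought = ["No", "No", "Yes", "Yes", "Yes", "Yes"]

threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)  # 26.5 0.0
```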

**Question 3:** What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.

**Answer:** Pre-pruning and post-pruning are strategies used to control the growth of Decision Trees in order to reduce overfitting and improve generalization on unseen data. Both aim to balance model complexity and prediction accuracy, but they differ in when and how the tree is simplified.

Pre-Pruning (Early Stopping)

Pre-pruning stops the tree from growing during the training phase itself. The algorithm decides not to split a node further if certain predefined conditions are met.

Common pre-pruning criteria include:

Maximum depth of the tree

Minimum number of samples required to split a node

Minimum number of samples required in a leaf node

Minimum impurity decrease (Gini or Entropy) needed to make a split

How it works:
While building the tree, at each node the algorithm checks whether the split satisfies the pre-pruning rules. If not, the node becomes a leaf, even if further splitting could reduce impurity on the training data.
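In scikit-learn, the pre-pruning criteria listed above map to constructor arguments of `DecisionTreeClassifier`. The values below are illustrative assumptions, not tuned recommendations. In practice they would be chosen by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pre-pruning: each argument is a stopping rule checked while the tree grows
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_samples_leaf=5,          # every leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # weighted impurity decrease a split must reach
    random_state=42,
)
pre_pruned.fit(X_train, y_train)

# Unrestricted tree for comparison
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("Full tree:       depth", full.get_depth(), "leaves", full.get_n_leaves())
print("Pre-pruned tree: depth", pre_pruned.get_depth(), "leaves", pre_pruned.get_n_leaves())
print("Pre-pruned test accuracy:", pre_pruned.score(X_test, y_test))
```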

Practical Advantage:
Pre-pruning significantly reduces computational cost and training time, especially for large datasets. It also produces smaller and more interpretable trees, which is useful in real-world applications where simplicity and speed are important.

Limitation:
If the stopping conditions are too strict, the model may stop learning too early and underfit the data.

Post-Pruning (Late Stopping)

Post-pruning first lets the decision tree grow to its full depth, fitting the training data as closely as possible, including its noise. After the full tree is built, branches that do not improve performance on held-out data are removed.

Common post-pruning techniques include:

Reduced error pruning

Cost-complexity pruning (used in CART)

How it works:
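The fully grown tree is evaluated on a validation set or with cross-validation. A subtree is replaced with a leaf node if doing so does not reduce, or even improves, accuracy on the held-out data. Cost-complexity pruning instead penalizes the number of leaves with a complexity parameter α. A larger α gives a smaller tree, and α is chosen by validation.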
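Cost-complexity pruning is available in scikit-learn through `cost_complexity_pruning_path` and the `ccp_alpha` parameter. The sketch below uses the built-in breast cancer dataset with a simple hold-out validation set. The dataset and split sizes are illustrative choices. The same validation set is used to pick α, so its accuracy is optimistic. An unbiased estimate would need a separate test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Hold out a validation set that is used only to decide how much to prune
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: grow the full tree
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Step 2: effective alphas at which subtrees collapse into leaves
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Step 3: refit one tree per alpha and keep the best one on validation data.
# ">=" prefers the larger alpha (the smaller tree) when accuracies tie.
best_alpha, best_acc = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    acc = (
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        .fit(X_train, y_train)
        .score(X_val, y_val)
    )
    if acc >= best_acc:
        best_alpha, best_acc = alpha, acc

pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)

print("Full tree:   leaves =", full_tree.get_n_leaves(), " val acc =", round(full_tree.score(X_val, y_val), 3))
print("Pruned tree: leaves =", pruned_tree.get_n_leaves(), " val acc =", round(best_acc, 3))
print("Chosen ccp_alpha:", best_alpha)
```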

Practical Advantage:
Post-pruning usually results in better generalization because the model first learns all possible relationships and then removes only those splits that cause overfitting.

Limitation:
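Post-pruning is more computationally expensive because the full tree must be grown first. It also needs held-out validation data or cross-validation to decide which branches to remove.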

Key Difference Summary:

Pre-pruning controls tree growth during training, while post-pruning simplifies the tree after training.

Pre-pruning favors efficiency and simplicity, whereas post-pruning focuses on achieving higher predictive accuracy and robustness.

Both methods are important, and the choice depends on dataset size, computational resources, and the desired balance between accuracy and interpretability.

**Question 4:** What is Information Gain in Decision Trees, and why is it important for choosing the best split?

**Answer:** Information Gain is a metric used in Decision Trees to measure how much uncertainty in the target variable is reduced after splitting the data on a particular feature. It is based on the concept of entropy from information theory.

Definition:
Information Gain is calculated as the difference between the entropy of the parent node and the weighted average entropy of the child nodes created by a split.

Information Gain = Entropy(parent) − Σₖ (nₖ / n) × Entropy(childₖ), where n is the number of samples in the parent node and nₖ is the number of samples in child k.

How it works:
At each node of the Decision Tree, the algorithm calculates the Information Gain for all possible features and their potential splits. The feature that results in the highest Information Gain is selected because it best separates the data into distinct classes.
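The formula can be checked on a small example. This sketch assumes a hypothetical node with 4 "Yes" and 4 "No" samples and two made-up candidate splits. The helper names are illustrative. The split labelled "Age" separates the classes perfectly, so it has the larger gain and would be selected.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node: 4 buyers ("Yes") and 4 non-buyers ("No")
parent = ["Yes"] * 4 + ["No"] * 4

# Two hypothetical candidate splits of that node into child nodes
candidates = {
    "Student": [["Yes", "Yes", "Yes", "No"], ["Yes", "No", "No", "No"]],
    "Age": [["Yes", "Yes", "Yes", "Yes"], ["No", "No", "No", "No"]],
}

gains = {name: information_gain(parent, kids) for name, kids in candidates.items()}
best_feature = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
# Student: 0.189
# Age: 1.000
print("Selected split:", best_feature)  # Selected split: Age
```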

Why it is important:
Information Gain ensures that each split makes the data more organized and less random. By maximizing Information Gain, the Decision Tree creates nodes that are purer, leading to more accurate and meaningful decision rules. It helps the model focus on the most informative features, improves classification performance, and reduces unnecessary tree complexity.

**Question 5:** What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?

**Answer:** Decision Trees are widely used in many real-world applications because they are simple to understand, easy to interpret, and effective for both classification and regression tasks.

Common Real-World Applications of Decision Trees

Healthcare
Used for disease diagnosis, treatment recommendation, and patient risk classification based on symptoms and medical history.

Finance and Banking
Applied in credit scoring, loan approval, fraud detection, and risk assessment.

Marketing and Sales
Used for customer segmentation, predicting customer churn, and targeted advertising.

E-commerce and Recommendation Systems
Used to predict user preferences and recommend products based on browsing behavior.

Manufacturing and Quality Control
Used for fault detection, defect classification, and predictive maintenance.

Human Resources
Applied in employee performance evaluation, attrition prediction, and hiring decisions.

Main Advantages of Decision Trees

Easy to interpret and visualize, even for non-technical users

Can handle both numerical and categorical data

Require little data preprocessing (no need for normalization or scaling)

Capture non-linear relationships effectively

Fast prediction once the tree is built
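You can see this interpretability directly by printing a fitted tree's rules with scikit-learn's `export_text`. This sketch uses the Iris data with `max_depth=2`, an illustrative choice that keeps the output short. The raw measurements are used without any scaling.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# No normalization or scaling: raw measurements in centimetres are used as-is
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(iris.data, iris.target)

# Render the learned decision rules as indented if/else text
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```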

Main Limitations of Decision Trees

Prone to overfitting, especially with deep trees

Small changes in data can result in a completely different tree (high variance)

Often less accurate than ensemble methods like Random Forests or Gradient Boosting

Can be biased toward features with many levels when using Information Gain

**Question 6:** Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier using the Gini criterion
● Print the model’s accuracy and feature importances
(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a Decision Tree Classifier using Gini criterion
model = DecisionTreeClassifier(criterion="gini", random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy and feature importances
print("Model Accuracy:", accuracy)
print("Feature Importances:")
for feature, importance in zip(iris.feature_names, model.feature_importances_):
    print(feature, ":", importance)
```

Output:
```
Model Accuracy: 1.0
Feature Importances:
sepal length (cm) : 0.0
sepal width (cm) : 0.0
petal length (cm) : 0.4166666666666667
petal width (cm) : 0.5833333333333334
```

The Iris dataset is loaded using load_iris(). A Decision Tree Classifier is trained using the Gini impurity criterion. Accuracy is calculated on the test set, and feature importances show that petal length and petal width are the most influential features for classification.

**Question 7:** Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to a fully-grown tree.
(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a fully-grown Decision Tree
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X_train, y_train)
full_tree_pred = full_tree.predict(X_test)
full_tree_accuracy = accuracy_score(y_test, full_tree_pred)

# Train a Decision Tree with max_depth = 3
depth_limited_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
depth_limited_tree.fit(X_train, y_train)
depth_limited_pred = depth_limited_tree.predict(X_test)
depth_limited_accuracy = accuracy_score(y_test, depth_limited_pred)

# Print accuracies
print("Accuracy of Fully-Grown Decision Tree:", full_tree_accuracy)
print("Accuracy of Decision Tree with max_depth=3:", depth_limited_accuracy)
```

Output:
```
Accuracy of Fully-Grown Decision Tree: 1.0
Accuracy of Decision Tree with max_depth=3: 0.9666666666666667
```

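On this split, the fully-grown decision tree classifies every test sample correctly. The tree with max_depth=3 misclassifies one of the 30 test samples (29/30 ≈ 0.967).
A one-sample difference on such a small test set is weak evidence either way. A fully-grown tree can overfit noisy data. Limiting max_depth trades a little fit for a simpler, more interpretable model that is often more robust on unseen data.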
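A single 30-sample test split is noisy, so a cross-validated comparison gives a steadier picture. This is a sketch that runs 5-fold `cross_val_score` on the same Iris data. The fold count is an illustrative choice, and no particular result is claimed here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "fully-grown": DecisionTreeClassifier(random_state=42),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=42),
}

# 5-fold cross-validation scores each model on five different held-out folds,
# so the comparison does not hinge on one 30-sample test split
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```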


**Question 8:** Write a Python program to:
● Load the Boston Housing Dataset
● Train a Decision Tree Regressor
● Print the Mean Squared Error (MSE) and feature importances
(Include your Python code and output in the code box below.)

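**Answer:**

```python
# Import required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset.
# load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the
# data is read from its original source, following the replacement snippet in
# scikit-learn's deprecation message. Each record spans two lines of the file.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]  # MEDV: median home value in $1000s
feature_names = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a Decision Tree Regressor
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)

# Print MSE and feature importances, rounded as in the output shown
print(f"Mean Squared Error: {mse:.2f}")
print("Feature Importances:")
for feature, importance in zip(feature_names, model.feature_importances_):
    print(f"{feature} : {importance:.3f}")
```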

Output:
```
Mean Squared Error: 12.35
Feature Importances:
CRIM : 0.041
ZN : 0.000
INDUS : 0.006
CHAS : 0.002
NOX : 0.028
RM : 0.554
AGE : 0.014
DIS : 0.065
RAD : 0.006
TAX : 0.014
PTRATIO : 0.074
B : 0.000
LSTAT : 0.196
```


Explanation:
The Boston Housing dataset is loaded and split into training and testing sets.
A Decision Tree Regressor is trained to predict house prices (the median home value, MEDV).
Mean Squared Error (MSE) measures the average squared prediction error. The feature importances show that RM (average number of rooms per dwelling) and LSTAT (percentage of lower-status population) are the most influential features.


**Question 9:** Write a Python program to:
● Load the Iris Dataset
● Tune the Decision Tree’s max_depth and min_samples_split using GridSearchCV
● Print the best parameters and the resulting model accuracy
(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the parameter grid
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 4, 6, 8]
}

# Create Decision Tree model
dt = DecisionTreeClassifier(random_state=42)

# Apply GridSearchCV
grid_search = GridSearchCV(
    estimator=dt,
    param_grid=param_grid,
    cv=5,
    scoring="accuracy"
)

# Train the model
grid_search.fit(X_train, y_train)

# Get the best model
best_model = grid_search.best_estimator_

# Make predictions
y_pred = best_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print best parameters and accuracy
print("Best Parameters:", grid_search.best_params_)
print("Model Accuracy:", accuracy)
```

Output:
```
Best Parameters: {'max_depth': 3, 'min_samples_split': 2}
Model Accuracy: 0.9666666666666667
```

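GridSearchCV tries every combination of hyperparameters in the grid and selects the one with the best 5-fold cross-validation accuracy on the training set. The selected model is then evaluated on the held-out test set, which was not used during tuning. The reported accuracy is therefore an estimate of performance on unseen data, though a noisy one given only 30 test samples.
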
**Question 10:** Imagine you’re working as a data scientist for a healthcare company that wants to predict whether a patient has a certain disease. You have a large dataset with mixed data types and some missing values.
Explain the step-by-step process you would follow to:
● Handle the missing values
● Encode the categorical features
● Train a Decision Tree model
● Tune its hyperparameters
● Evaluate its performance
And describe what business value this model could provide in the real-world setting.

**Answer:**
1. Handling Missing Values

First, identify columns with missing data.

For numerical features (age, blood pressure, cholesterol):

Replace missing values using mean or median (median is preferred if data has outliers).

For categorical features (gender, smoking status):

Replace missing values using the most frequent value (mode).

If a feature has too many missing values and is not important, it can be removed.
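The imputation step can be sketched with scikit-learn's `SimpleImputer`. The patient columns and values below are hypothetical toy data. For brevity the imputers are fit on all rows here. In a real project they should be fit on the training split only (for example inside a `Pipeline`), so that test data does not leak into the statistics.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical patient records; np.nan marks a missing value
df = pd.DataFrame({
    "age": [45, np.nan, 62, 38, 71],
    "cholesterol": [210, 185, np.nan, 240, 1200],  # 1200 is an outlier
    "smoking_status": ["never", "current", np.nan, "never", "former"],
})

# Identify columns with missing data
print(df.isna().sum())

numeric_cols = ["age", "cholesterol"]
categorical_cols = ["smoking_status"]

# Median for numeric columns (less affected by the outlier than the mean),
# most frequent value (mode) for categorical columns
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

df[numeric_cols] = num_imputer.fit_transform(df[numeric_cols])
df[categorical_cols] = cat_imputer.fit_transform(df[categorical_cols])
print(df)
```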

2. Encoding Categorical Features

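Most machine learning libraries, including scikit-learn's Decision Tree implementation, require numerical inputs.

Convert categorical variables into numbers:

Use ordinal (label) encoding for binary categories, e.g. No = 0 and Yes = 1. In scikit-learn this is done with OrdinalEncoder, because LabelEncoder is intended for the target variable rather than input features.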

Use One-Hot Encoding for features with multiple categories (e.g., blood group).

This ensures the Decision Tree can properly split on these features.

3. Training a Decision Tree Model

Split the dataset into training and testing sets, stratifying by the target so both sets keep the same proportion of disease cases.

Train a Decision Tree Classifier on the training data.

Decision Trees are suitable in healthcare because:

They handle mixed data types well.

They are easy to interpret and explain to doctors.

4. Hyperparameter Tuning

Tune important parameters such as:

max_depth to limit tree depth and prevent overfitting.

min_samples_split to control how splits are made.

Use GridSearchCV with cross-validation to find the best parameter combination.
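Steps 1 to 4 can be combined into one scikit-learn `Pipeline`. This way, imputation and encoding are refit inside every cross-validation fold. This is a sketch on synthetic data. The column names, the rule that generates the label, the grid values, and the choice of recall as the tuning metric are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a patient table (column names are hypothetical)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 80, n).astype(float),
    "blood_pressure": rng.normal(130, 15, n),
    "smoking_status": rng.choice(["never", "former", "current"], n),
})
df["smoking_status"] = df["smoking_status"].astype(object)

# Hypothetical label rule so the tree has something to learn
y = ((df["age"] > 55) & (df["blood_pressure"] > 135)).astype(int)

# Knock out about 10% of values to simulate missing data
df.loc[rng.random(n) < 0.1, "age"] = np.nan
df.loc[rng.random(n) < 0.1, "smoking_status"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=42, stratify=y
)

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "blood_pressure"]),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["smoking_status"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=42)),
])

# Preprocessing is refit on each training fold, so validation folds do not leak in
param_grid = {
    "tree__max_depth": [3, 5, 8, None],
    "tree__min_samples_split": [2, 10, 20],
}
search = GridSearchCV(model, param_grid, cv=5, scoring="recall")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV recall:", round(search.best_score_, 3))
```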

5. Evaluating Model Performance

Evaluate the model using:

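Accuracy to measure overall correctness. On its own it can be misleading when the disease is rare, because always predicting "healthy" already scores highly.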

Precision and Recall (recall is crucial to avoid missing disease cases).

Confusion Matrix to analyze false positives and false negatives.

Ensure the model generalizes well on unseen patient data.
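The evaluation metrics above can be computed with `sklearn.metrics`. This sketch uses a synthetic, imbalanced dataset from `make_classification`, about 10% positive, standing in for disease cases. The `max_depth=5` setting is an arbitrary choice. On data like this, a model that always predicts "healthy" would already reach roughly 90% accuracy. That is why recall and the confusion matrix matter.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for patient data: label 1 = has the disease (~10%)
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy :", round(accuracy_score(y_test, y_pred), 3))
print("Precision:", round(precision_score(y_test, y_pred), 3))
print("Recall   :", round(recall_score(y_test, y_pred), 3))

# For binary labels, confusion_matrix(...).ravel() gives TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"Missed disease cases (false negatives): {fn}, false alarms (false positives): {fp}")
print(classification_report(y_test, y_pred, target_names=["healthy", "disease"]))
```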

Business Value in Real-World Healthcare

Enables early disease detection, improving patient outcomes.

Supports doctors with data-driven decisions.

Reduces diagnosis time and healthcare costs.

Helps hospitals prioritize high-risk patients.

Provides explainable predictions, building trust in AI-based medical systems.