Ques 1 :-  What is a Decision Tree, and how does it work in the context of
classification?

Answer:- A Decision Tree is a supervised learning algorithm used for classification that works by recursively splitting data into subsets based on the most informative features, forming a tree-like structure of decisions. At the top, the root node represents the entire dataset, and each internal node applies a test on a feature (like "Is age > 30?"), creating branches for possible outcomes. This process continues until reaching leaf nodes, which assign a final class label. The algorithm chooses splits using measures such as Information Gain or the Gini Index to maximize separation between classes. Decision Trees are intuitive, easy to visualize, and handle both categorical and numerical data, but they can overfit if grown too deep, which is why ensemble methods like Random Forests are often used to improve performance.
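The root, internal-node, and leaf structure described above can be seen by printing a small fitted tree. This is a minimal sketch, assuming scikit-learn is installed; the depth limit of 2 and `random_state=0` are arbitrary choices made only to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A deliberately shallow tree so the printed rules stay readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each "feature <= threshold" line is a test at an internal node;
# each "class:" line is a leaf that assigns the final label.
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
```

Reading the printout top to bottom follows one path from the root to a leaf, which is exactly how a single sample is classified.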

Ques 2 :-  Explain the concepts of Gini Impurity and Entropy as impurity measures.
How do they impact the splits in a Decision Tree?

Answer:- Gini Impurity and Entropy are two measures used in decision trees to evaluate how “mixed” the classes are within a node, guiding the algorithm toward the best split. Gini Impurity is the probability of misclassifying a randomly chosen element if it were labeled at random according to the node's class distribution; it is 0 for a pure node and grows as the classes become more mixed. Entropy, derived from information theory, measures the uncertainty in the node’s class distribution; it is also 0 for a pure node and reaches its maximum when all classes are equally likely. When building a decision tree, the algorithm compares candidate splits using one of these metrics and chooses the split that most reduces the weighted impurity of the child nodes. With entropy, this reduction is called Information Gain. In practice, both measures usually lead to similar splits; Gini is slightly cheaper to compute because it avoids logarithms, so the choice between them rarely changes the result much. Their role is crucial: each split is chosen to move the tree toward purer, more homogeneous groups, which is what lets the leaves assign class labels reliably.
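As a small worked illustration of the two measures, the sketch below computes them from the class counts in a node, using the standard definitions Gini = 1 - Σ p_k² and Entropy = -Σ p_k log2(p_k). The helper names `gini` and `entropy` are made up for this example and are not scikit-learn functions.

```python
import numpy as np

def gini(counts):
    """Gini impurity of a node, given the number of samples per class."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(counts):
    """Entropy (in bits) of a node, given the number of samples per class."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()  # drop empty classes so log2(0) is never evaluated
    return float(-np.sum(p * np.log2(p)))

# A pure node, a mostly pure node, and a perfectly mixed two-class node
for counts in ([10, 0], [9, 1], [5, 5]):
    print(counts, "gini:", round(gini(counts), 4), "entropy:", round(entropy(counts), 4))
```

Both measures are 0 for the pure node and largest for the evenly mixed one (0.5 for Gini and 1.0 for entropy in the two-class case), which is why they tend to rank candidate splits similarly.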

Ques 3 :-  What is the difference between Pre-Pruning and Post-Pruning in Decision
Trees? Give one practical advantage of using each.

Answer:- **Pre-Pruning** and **Post-Pruning** are techniques used to prevent overfitting in decision trees, but they differ in timing. **Pre-Pruning** (also called early stopping) halts tree growth during construction when a stopping rule is met, using criteria such as maximum depth, minimum samples per node, or a threshold on impurity reduction. A practical advantage of pre-pruning is that it reduces computation time and memory usage, since the tree never grows unnecessarily large. **Post-Pruning**, on the other hand, allows the tree to grow fully and then trims back branches that do not improve predictive performance, often using validation data to decide which branches to remove. A practical advantage of post-pruning is that it often yields more accurate models, because it sees the entire tree before deciding which parts are redundant and so does not discard a weak split that would have led to useful splits further down.
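The two approaches can be compared in scikit-learn roughly as follows. This is a sketch under assumptions: pre-pruning is expressed through `max_depth` and `min_samples_leaf`, and post-pruning through minimal cost-complexity pruning (`ccp_alpha`), which is the form of post-pruning scikit-learn provides and only one of several possible post-pruning methods. The specific limits and the 70/30 split are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pre-pruning: growth stops early because of limits fixed before training.
pre_pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# Post-pruning: grow the full tree, then evaluate every pruning strength
# on the cost-complexity path against held-out validation data.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
candidates = [
    # Clip at 0 in case rounding produces a tiny negative alpha
    DecisionTreeClassifier(random_state=0, ccp_alpha=max(float(a), 0.0)).fit(
        X_train, y_train
    )
    for a in path.ccp_alphas
]
# Prefer the highest validation accuracy; break ties with the smaller tree.
post_pruned = max(
    candidates, key=lambda t: (t.score(X_val, y_val), -t.get_n_leaves())
)

for name, model in [("pre", pre_pruned), ("full", full_tree), ("post", post_pruned)]:
    print(name, "leaves:", model.get_n_leaves(), "val acc:", model.score(X_val, y_val))
```

Because the validation set is used here to choose the pruning strength, a separate test set would be needed for an unbiased final accuracy estimate.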

Ques 4 :-  What is Information Gain in Decision Trees, and why is it important for
choosing the best split?

Answer:- Information Gain in decision trees is a metric that measures how much uncertainty (or impurity) is reduced when the dataset is split on a particular feature. It is calculated as the impurity of the parent node (typically its entropy) minus the weighted sum of the impurities of the child nodes after the split, where each child is weighted by the fraction of the parent's samples it receives. In other words, it tells us how much “information” a feature provides about the class labels. The higher the Information Gain, the better the feature is at separating the data into pure subsets. This is important because a decision tree is built greedily: at each node it picks the split with the highest Information Gain, so the metric directly determines which feature and threshold are used and steers the tree toward purer child nodes with as few splits as possible.
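The calculation described above can be written out in a few lines. This is an illustrative sketch; `entropy` and `information_gain` are hypothetical helper names, and the class counts are made-up numbers chosen to show the two extremes and one case in between.

```python
import numpy as np

def entropy(counts):
    """Entropy (in bits) of a node, given the number of samples per class."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()  # ignore empty classes to avoid log2(0)
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = float(np.sum(parent_counts))
    weighted = sum(np.sum(c) / n * entropy(c) for c in children_counts)
    return entropy(parent_counts) - weighted

parent = [5, 5]  # 5 samples of each class
print(information_gain(parent, [[5, 0], [0, 5]]))  # perfect split: 1.0
print(information_gain(parent, [[4, 1], [1, 4]]))  # partial split: about 0.278
print(information_gain(parent, [[2, 2], [3, 3]]))  # uninformative split: 0.0
```

A tree builder would evaluate this quantity for every candidate split at a node and keep the one with the largest value.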

Ques 5 :- What are some common real-world applications of Decision Trees, and
what are their main advantages and limitations?

Answer:- Decision Trees are widely applied in real-world scenarios such as finance (credit risk assessment, fraud detection), healthcare (diagnosis support, treatment decisions), marketing (customer segmentation, churn prediction), and operations (supply chain optimization, quality control). Their main advantages include being easy to interpret and visualize, handling both categorical and numerical data, and requiring minimal preprocessing (for example, no feature scaling). However, they also have limitations: they are prone to overfitting if not pruned properly, they can be unstable since small changes in the data may lead to very different trees, and a single tree often predicts less accurately on complex relationships than ensemble methods like Random Forests or Gradient Boosted Trees. In practice, decision trees are often used as building blocks in these ensemble methods, which trade some interpretability for predictive power.

In [1]:
"""Ques 6 :- Dataset Info:

● Iris Dataset for classification tasks (sklearn.datasets.load_iris() or provided CSV).

● Boston Housing Dataset for regression tasks (sklearn.datasets.load_boston() or provided CSV).

Question 6: Write a Python program to:

● Load the Iris Dataset

● Train a Decision Tree Classifier using the Gini criterion

● Print the model’s accuracy and feature importances"""

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Decision Tree Classifier using Gini criterion
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Print accuracy
print("Model Accuracy:", accuracy_score(y_test, y_pred))

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Model Accuracy: 1.0
Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876


In [2]:
""" Ques 7 :- Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to
a fully-grown tree."""

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Decision Tree with max_depth=3
clf_limited = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf_limited.fit(X_train, y_train)
y_pred_limited = clf_limited.predict(X_test)
accuracy_limited = accuracy_score(y_test, y_pred_limited)

# Train fully-grown Decision Tree
clf_full = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_full.fit(X_train, y_train)
y_pred_full = clf_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

# Print results
print("Accuracy with max_depth=3:", accuracy_limited)
print("Accuracy with fully-grown tree:", accuracy_full)


Accuracy with max_depth=3: 1.0
Accuracy with fully-grown tree: 1.0


In [5]:
"""Ques 8 :-  Write a Python program to:
● Load the Boston Housing Dataset
● Train a Decision Tree Regressor
● Print the Mean Squared Error (MSE) and feature importances"""

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# load_boston() was removed in scikit-learn 1.2, so the California Housing
# dataset is used here as a substitute regression dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Decision Tree Regressor
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)

# Predict on test set
y_pred = regressor.predict(X_test)

# Print Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Print feature importances
print("Feature Importances:")
for feature, importance in zip(housing.feature_names, regressor.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Mean Squared Error: 0.5280096503174904
Feature Importances:
MedInc: 0.5235
HouseAge: 0.0521
AveRooms: 0.0494
AveBedrms: 0.0250
Population: 0.0322
AveOccup: 0.1390
Latitude: 0.0900
Longitude: 0.0888


In [4]:
"""Ques 9 :-  Write a Python program to:
● Load the Iris Dataset
● Tune the Decision Tree’s max_depth and min_samples_split using
GridSearchCV
● Print the best parameters and the resulting model accuracy"""

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Define the parameter grid
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 3, 4, 5, 10]
}

# Initialize Decision Tree Classifier
clf = DecisionTreeClassifier(criterion="gini", random_state=42)

# Perform GridSearchCV
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Get best parameters
best_params = grid_search.best_params_
print("Best Parameters:", best_params)

# Evaluate accuracy with best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy with Best Parameters:", accuracy)


Best Parameters: {'max_depth': 4, 'min_samples_split': 10}
Model Accuracy with Best Parameters: 1.0


Ques 10 :- Imagine you’re working as a data scientist for a healthcare company that
wants to predict whether a patient has a certain disease. You have a large dataset with
mixed data types and some missing values.
Explain the step-by-step process you would follow to:

● Handle the missing values

● Encode the categorical features

● Train a Decision Tree model

● Tune its hyperparameters

● Evaluate its performance

And describe what business value this model could provide in the real-world
setting.

Answer:- To build a disease prediction model with decision trees, I would first split the data into training and test sets, stratified on the disease label, so that every later step is fitted on the training data only and no information leaks from the test set. I would then handle missing values by imputing numerical features with the median or mean and categorical features with the most frequent category or an “Unknown” label, so that no patient records are lost. Next, I would encode categorical features using ordinal encoding for ordered variables and one-hot encoding for nominal ones so the model can process them numerically. Then, I would train a Decision Tree Classifier on the prepared training set. To improve generalization, I would tune hyperparameters such as max_depth, min_samples_split, and min_samples_leaf using GridSearchCV or RandomizedSearchCV with cross-validation. Finally, I would evaluate the model on the held-out test set using metrics like accuracy, precision, recall, F1-score, and a confusion matrix, with particular emphasis on recall to minimize missed diagnoses. In a real-world healthcare setting, such a model provides business value by helping doctors identify high-risk patients earlier, improving treatment outcomes, reducing unnecessary diagnostic costs, and offering interpretable decision rules that support compliance and trust in clinical decision-making.
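The steps in this answer can be sketched end to end with a scikit-learn pipeline. Everything about the data below is an assumption made for illustration: the column names, the synthetic values, and the rule that generates the label are invented stand-ins for real patient records, and the hyperparameter grid is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy data (hypothetical columns, NOT real patient records)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 90, n).astype(float),
    "blood_pressure": rng.normal(120, 15, n),
    "smoker": rng.choice(["yes", "no"], n),
    "region": rng.choice(["north", "south", "east"], n),
})
y = ((df["age"] > 60) & (df["smoker"] == "yes")).astype(int)  # made-up label rule

# Inject missing values after the label is computed
df.loc[rng.choice(n, 40, replace=False), "blood_pressure"] = np.nan
df.loc[rng.choice(n, 40, replace=False), "smoker"] = np.nan

numeric_cols = ["age", "blood_pressure"]
categorical_cols = ["smoker", "region"]

# Split first so imputers and encoders are fitted on training data only
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.3, stratify=y, random_state=0
)

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Tune tree hyperparameters with cross-validation, optimizing recall
grid = GridSearchCV(
    model,
    param_grid={
        "tree__max_depth": [3, 5, None],
        "tree__min_samples_leaf": [1, 5, 10],
    },
    cv=5,
    scoring="recall",
)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
test_recall = recall_score(y_test, y_pred)
print("Best parameters:", grid.best_params_)
print(classification_report(y_test, y_pred))
```

Wrapping the imputers and encoder in the pipeline means each cross-validation fold refits them on its own training portion, which keeps the tuning scores free of leakage.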