**Question 1: What is a Decision Tree, and how does it work in the context of classification?**

Ans 1. A Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. In the context of classification, it works like a flowchart: each internal node tests a feature, each branch represents an outcome of that test, and each leaf node assigns a class label.

Working:

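The algorithm begins at the root node, which holds the whole training set, and splits it on the feature (and, for numerical features, the threshold) that best separates the classes according to an impurity measure such as Gini or Entropy.

Each resulting subset is split again in the same way, recursively, until a stopping condition is met (for example, a maximum depth is reached or a node is pure).

To classify a new sample, it is passed down from the root, following the branch that matches its feature values, until it reaches a leaf; the leaf's majority class is the prediction. Each root-to-leaf path therefore represents a classification rule.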
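The root-to-leaf logic can be sketched in plain Python. The tree below is hand-built, and its feature names and thresholds are illustrative assumptions rather than values learned from data:

```python
# A hand-built tree: internal nodes test one feature against a threshold,
# leaves hold a class label. Features and thresholds are illustrative only.
tree = {
    "feature": "petal_length", "threshold": 2.45,
    "left": {"label": "setosa"},
    "right": {
        "feature": "petal_width", "threshold": 1.75,
        "left": {"label": "versicolor"},
        "right": {"label": "virginica"},
    },
}


def predict(node, sample, path=None):
    """Walk from the root to a leaf, recording the rule that was followed."""
    path = [] if path is None else path
    if "label" in node:  # leaf reached: return the class and the rule
        return node["label"], " AND ".join(path)
    f, t = node["feature"], node["threshold"]
    if sample[f] <= t:
        return predict(node["left"], sample, path + [f"{f} <= {t}"])
    return predict(node["right"], sample, path + [f"{f} > {t}"])


label, rule = predict(tree, {"petal_length": 4.7, "petal_width": 1.4})
print(label, "|", rule)
# prints: versicolor | petal_length > 2.45 AND petal_width <= 1.75
```

Each recursive call descends one level, so the list of tests collected on the way down is exactly the classification rule for that leaf.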

**Question 2: Explain the concepts of Gini Impurity and Entropy as impurity measures. How do they impact the splits in a Decision Tree?**

Ans 2. Gini Impurity measures the probability of misclassifying a randomly chosen element if it were labeled randomly according to the class distribution.

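$$\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$$

where $p_i$ is the proportion of samples in the node that belong to class $i$, and $n$ is the number of classes.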

Entropy measures the amount of disorder or randomness in the data.

$$\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2 p_i$$

Impact on Splits:

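Both measure how mixed the classes in a node are: 0 means the node is pure, and larger values mean more mixing.

At each node, the tree evaluates candidate splits and chooses the one whose child nodes have the lowest weighted impurity (equivalently, the largest decrease in impurity from the parent).

Gini is slightly faster to compute because it avoids logarithms, while Entropy connects directly to Information Gain. In practice, the two criteria usually produce very similar trees.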
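A plain-Python sketch of how either measure drives the choice of split. The candidate splits and their class counts are hypothetical, made up for illustration:

```python
import math


def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits: -sum(p_i * log2(p_i)); empty classes contribute 0."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def weighted_impurity(children, impurity):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


# Hypothetical candidate splits of a parent node with class counts [10, 10];
# each candidate gives the class counts in its (left, right) children.
candidates = {
    "feature_A <= 3.0": ([9, 1], [1, 9]),
    "feature_B <= 0.5": ([6, 4], [4, 6]),
    "feature_C <= 7.2": ([10, 3], [0, 7]),
}

for impurity in (gini, entropy):
    scores = {name: weighted_impurity(ch, impurity) for name, ch in candidates.items()}
    best = min(scores, key=scores.get)  # lowest weighted impurity wins
    print(impurity.__name__, {k: round(v, 3) for k, v in scores.items()}, "->", best)
```

With these counts, both criteria pick `feature_A <= 3.0` (weighted Gini 0.18, weighted Entropy about 0.469), which illustrates why the two usually lead to similar trees.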



**Question 3: What is the difference between Pre-Pruning and Post-Pruning in Decision Trees? Give one practical advantage of using each.**

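Ans 3. Pre-Pruning (early stopping): Stops the tree from growing during training, for example once it reaches a maximum depth, when a node has too few samples to split, or when a split does not improve the impurity measure enough.

Advantage: Reduces overfitting and saves training time, because the full tree is never built.

Post-Pruning: Builds the full tree first and then removes branches that contribute little to predictive performance (for example, using cost-complexity pruning).

Advantage: Lets the model first discover complex patterns that early stopping might cut off, and then simplifies the tree, which often generalizes better.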
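Both strategies map onto scikit-learn's `DecisionTreeClassifier`: pre-pruning through constructor limits such as `max_depth` and `min_samples_leaf`, and post-pruning through minimal cost-complexity pruning (`cost_complexity_pruning_path` and `ccp_alpha`). A sketch on Iris; the specific limits and the choice of the middle alpha are illustrative assumptions, and in practice alpha would be chosen by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Pre-pruning: growth is capped while the tree is being built
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre.fit(X_train, y_train)

# Post-pruning: grow a full tree, then apply minimal cost-complexity pruning
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
# Illustrative choice: take the middle alpha on the pruning path
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)

for name, model in [("pre-pruned", pre), ("full", full), ("post-pruned", post)]:
    print(f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"test acc={model.score(X_test, y_test):.3f}")
```

Larger `ccp_alpha` values prune more aggressively, so the pruned tree never has more leaves than the full one.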

**Question 4: What is Information Gain in Decision Trees, and why is it important for choosing the best split?**

Ans 4. Information Gain is the reduction in entropy (or impurity) after a dataset is split on an attribute. It is calculated as:

$$\text{Information Gain} = \text{Entropy}(\text{parent}) - \sum_{\text{child}} \frac{|\text{child}|}{|\text{parent}|} \times \text{Entropy}(\text{child})$$

where $|\cdot|$ is the number of samples in a node, and the sum runs over the child nodes produced by the split.

Importance:

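Higher information gain indicates a better split, because the child nodes are purer than the parent.

At each node, the feature (and threshold) with the highest information gain is selected, so the tree asks the most informative question first.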
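A worked example of the formula as a small plain-Python sketch (the class counts are made up for illustration): a parent node with 5 samples of each class has an entropy of 1 bit, so a perfect split gains the full bit, while a partial split gains less.

```python
import math


def entropy(counts):
    """Entropy in bits of a node with the given class counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [5, 5]  # 5 samples of each class
print(information_gain(parent, [[4, 1], [1, 4]]))  # partial separation, ≈ 0.278
print(information_gain(parent, [[5, 0], [0, 5]]))  # perfect separation, 1.0
```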



**Question 5: What are some common real-world applications of Decision Trees, and what are their main advantages and limitations?**

Ans 5. Applications:

Medical diagnosis

Customer churn prediction

Credit risk assessment

Fraud detection

Marketing segmentation

Advantages:

Easy to interpret and visualize

Handles both categorical and numerical data

Requires little data preparation (for example, no feature scaling or normalization is needed)

Limitations:

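Prone to overfitting, especially when the tree is grown to full depth

Unstable: small changes in the training data can produce a very different tree

Can be biased toward the majority class when the classes are imbalanced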
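The "easy to interpret" advantage is visible in practice: scikit-learn's `export_text` prints a fitted tree as nested if/else rules (`plot_tree` draws the same tree graphically). A short sketch on Iris, with `max_depth=2` chosen only to keep the output short:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(iris.data, iris.target)

# export_text renders the fitted tree as indented if/else rules,
# one line per test and one "class:" line per leaf
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```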

**Question 6: Python Program (Iris Dataset, Gini Criterion)**

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load data
data = load_iris()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
clf = DecisionTreeClassifier(criterion='gini')
clf.fit(X_train, y_train)

# Evaluate
accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy:", accuracy)
print("Feature Importances:", clf.feature_importances_)


Accuracy: 1.0
Feature Importances: [0.01911002 0.         0.89326355 0.08762643]
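The importance array above follows the dataset's feature order (sepal length, sepal width, petal length, petal width), so the dominant third value belongs to petal length. A small sketch that labels and sorts the importances; `random_state=42` on the classifier is an addition not in the original cell, so that reruns build the same tree:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# random_state fixes tie-breaking between equally good splits, so reruns match
clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(X_train, y_train)

# Pair each importance with its feature name and sort, most important first
ranked = sorted(zip(data.feature_names, clf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name:20s} {importance:.3f}")
```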


**Question 7: Compare Decision Trees with max_depth=3 and a fully grown tree**

# Fully grown tree
clf_full = DecisionTreeClassifier()
clf_full.fit(X_train, y_train)
acc_full = accuracy_score(y_test, clf_full.predict(X_test))

# Limited depth tree
clf_limited = DecisionTreeClassifier(max_depth=3)
clf_limited.fit(X_train, y_train)
acc_limited = accuracy_score(y_test, clf_limited.predict(X_test))

print("Full Tree Accuracy:", acc_full)
print("Depth=3 Tree Accuracy:", acc_limited)


Full Tree Accuracy: 1.0
Depth=3 Tree Accuracy: 1.0
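Both trees score 1.0 on the same 45-sample test split, so that single split cannot tell them apart. A sketch comparing them with 5-fold cross-validation on the full Iris data instead (cross-validation and `random_state=42` are choices made here, not part of the original cell), also reporting each tree's depth and leaf count:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "full": DecisionTreeClassifier(random_state=42),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=42),
}
for name, model in models.items():
    # 5-fold CV averages over several splits instead of one small test set
    scores = cross_val_score(model, X, y, cv=5)
    model.fit(X, y)  # refit on all data only to inspect the tree's size
    print(f"{name}: CV accuracy = {scores.mean():.3f} ± {scores.std():.3f}, "
          f"depth = {model.get_depth()}, leaves = {model.get_n_leaves()}")
```

When the accuracies are close, the smaller tree is usually preferable because it is easier to read and less likely to overfit.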


**Question 8: Decision Tree Regressor on Boston Housing**

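import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load data: load_boston was removed in scikit-learn 1.2, so read the same data
# from its original source (loading code adapted from scikit-learn's deprecation notice)
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X_boston = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y_boston = raw_df.values[1::2, 2]

# Split (separate variable names keep the Iris X_train/y_train intact for Question 9)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_boston, y_boston, test_size=0.3, random_state=42)

# Train model
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(Xb_train, yb_train)

# Evaluate
predictions = regressor.predict(Xb_test)
mse = mean_squared_error(yb_test, predictions)
print("MSE:", mse)
print("Feature Importances:", regressor.feature_importances_)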
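A fully grown regressor like this one can memorize its training set. A minimal sketch to check that, assuming synthetic data from scikit-learn's `make_regression` (used only so the example runs without downloading the housing data) and comparing against a depth-limited tree (`max_depth=5` is an arbitrary illustrative value):

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data stands in for the housing set so this runs offline
X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for depth in (None, 5):  # None means the tree is grown until its leaves are pure
    reg = DecisionTreeRegressor(max_depth=depth, random_state=42).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, reg.predict(X_train))
    test_mse = mean_squared_error(y_test, reg.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE = {train_mse:.1f}, test MSE = {test_mse:.1f}")
```

A near-zero training MSE paired with a much larger test MSE is the overfitting signature listed among the limitations in Question 5.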


**Question 9: GridSearchCV Tuning on Iris Dataset**

from sklearn.model_selection import GridSearchCV

params = {
    'max_depth': [2, 3, 4, 5],
    'min_samples_split': [2, 3, 4]
}

grid = GridSearchCV(DecisionTreeClassifier(), params, cv=3)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Accuracy:", grid.best_score_)


Best Parameters: {'max_depth': 2, 'min_samples_split': 4}
Best Accuracy: 0.9238095238095237
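Note that `best_score_` is the mean cross-validated accuracy over the training folds, not a test-set score. A sketch that also reports accuracy on the held-out test set for the refitted best model. It differs from the cell above in a few assumed choices (a stratified split, shuffled stratified folds, and a fixed `random_state`), so its numbers need not match the output shown:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

params = {"max_depth": [2, 3, 4, 5], "min_samples_split": [2, 3, 4]}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=cv)
grid.fit(X_train, y_train)  # refits the best model on all of X_train by default

print("Best parameters:", grid.best_params_)
print("Mean CV accuracy:", round(grid.best_score_, 3))
print("Test accuracy:", round(grid.score(X_test, y_test), 3))
```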


**Question 10: Step-by-step for Disease Prediction Model**

Ans 10. Handle Missing Values:

Use imputation: SimpleImputer(strategy='mean') for numerical features and SimpleImputer(strategy='most_frequent') for categorical features

Drop rows/columns with too many missing values

Encode Categorical Features:

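Use OneHotEncoder for nominal features or OrdinalEncoder for ordered categories (LabelEncoder is intended for encoding the target labels, not input features)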

Train Decision Tree:

Split into training and test sets, stratified by the disease label so both sets keep the same class balance

Use DecisionTreeClassifier().fit(X_train, y_train)

Tune Hyperparameters:

Use GridSearchCV with parameters like max_depth, min_samples_split

Evaluate Performance:

Accuracy, Precision, Recall, F1-score (recall is especially important here, since a false negative means a missed diagnosis)

Confusion Matrix and ROC Curve

Business Value:

Early detection leads to timely treatment

Reduces operational costs by automating routine screening, so clinicians can focus on flagged cases

Enhances patient care and prioritization
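The steps above can be sketched end to end as a single scikit-learn `Pipeline`. The patient table is entirely hypothetical (invented columns, random values, and a made-up labelling rule) so that the example is self-contained, and `scoring="f1"` and the parameter grid are illustrative choices:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical patient table: columns, values, and the label rule are invented
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(20, 80, n).astype(float),
    "bmi": rng.normal(27, 5, n),
    "smoker": rng.choice(["yes", "no"], n),
    "blood_type": rng.choice(["A", "B", "AB", "O"], n),
})
df["has_disease"] = ((df["age"] > 55) & (df["smoker"] == "yes")).astype(int)
# Blank out roughly 10% of some values to simulate missing data
for col in ["age", "bmi", "smoker"]:
    df.loc[rng.random(n) < 0.1, col] = np.nan

numeric = ["age", "bmi"]
categorical = ["smoker", "blood_type"]
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])
model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=42)),
])

X = df.drop(columns="has_disease")
y = df["has_disease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Tune tree hyperparameters through the pipeline ("step__param" naming)
grid = GridSearchCV(
    model,
    {"tree__max_depth": [2, 3, 4, None], "tree__min_samples_split": [2, 5, 10]},
    cv=5, scoring="f1",
)
grid.fit(X_train, y_train)

y_pred = grid.predict(X_test)
print("Best parameters:", grid.best_params_)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

Keeping imputation and encoding inside the pipeline means they are re-fitted on each cross-validation fold, so no information from the validation folds leaks into preprocessing.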

