1. What is a Decision Tree, and how does it work in the context of
classification?
- A decision tree is a supervised learning algorithm used for both classification and regression. In classification, it works by recursively partitioning the data into subsets based on feature values, creating a tree-like structure that predicts the class of a new data point based on its features.
How Decision Trees Work:
1. Start with the Root Node: The tree begins at the root node with a question (a test on one feature) chosen because it separates the classes in the training data best.

2. Ask Yes/No Questions: From the root, the tree asks a series of yes/no questions to split the data into subsets based on specific attributes.

3. Branching Based on Answers: Each question leads to different branches:

If the answer is yes, the tree follows one path.
If the answer is no, the tree follows another path.
4. Continue Splitting: This branching continues through further decisions, narrowing the data down step by step.

5. Reach the Leaf Node: The process ends when there are no more useful questions to ask (or a stopping criterion is met), leading to a leaf node where the final decision or prediction is made.
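
The traversal described in these steps can be sketched in a few lines of Python. The tree below is written by hand purely for illustration: the feature names (`petal_length`, `petal_width`), the thresholds, and the `predict` helper are assumptions, not values learned from data or taken from any library.

```python
# A hand-built toy tree; feature names and thresholds are illustrative assumptions.
# Internal nodes ask one yes/no question; leaves hold the predicted class.
toy_tree = {
    "feature": "petal_length", "threshold": 2.5,
    "yes": {"label": "setosa"},              # petal_length <= 2.5
    "no": {
        "feature": "petal_width", "threshold": 1.7,
        "yes": {"label": "versicolor"},      # petal_width <= 1.7
        "no": {"label": "virginica"},
    },
}


def predict(node, sample):
    """Follow the yes/no answers from the root until a leaf is reached."""
    while "label" not in node:
        answer = "yes" if sample[node["feature"]] <= node["threshold"] else "no"
        node = node[answer]
    return node["label"]


print(predict(toy_tree, {"petal_length": 1.4, "petal_width": 0.2}))  # setosa
print(predict(toy_tree, {"petal_length": 5.1, "petal_width": 2.0}))  # virginica
```

A trained tree works the same way at prediction time; the difference is that the questions and thresholds are chosen automatically from the training data.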

2. Explain the concepts of Gini Impurity and Entropy as impurity measures.
How do they impact the splits in a Decision Tree?
- Gini Impurity and Entropy are both measures of impurity used in decision tree algorithms to determine the best way to split data at each node. They quantify how "mixed" or "pure" a node is in terms of the classes present. The goal of a decision tree is to minimize impurity at each split, creating nodes that are increasingly pure as you move down the tree.

Gini Impurity:
Measures the probability of misclassifying a randomly chosen element in a dataset if it were randomly labeled according to the class distribution within that node.
Ranges from 0 to 0.5 in binary classification. A Gini impurity of 0 indicates perfect purity (all data points belong to the same class), while 0.5 represents maximum impurity (equal distribution of classes).
Slightly faster to compute than entropy because it doesn't involve logarithmic calculations.

Entropy:
Measures the amount of uncertainty or disorder in a dataset.
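Ranges from 0 to 1 in binary classification (using log base 2); with k classes the maximum is log2(k). 0 indicates a pure node, and the maximum indicates an even mix of classes.
Slightly more expensive to compute than Gini impurity because of the logarithm; in practice the two measures often lead to similar splits.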
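
As a quick check of the ranges above, here is a minimal sketch that computes both measures from class counts. The helper names `gini` and `entropy` are my own, not library functions.

```python
import math


def gini(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy in bits; log2(1/p) equals -log2(p). Empty classes are skipped."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


# Class counts for a pure node, a mostly pure node, and an evenly mixed node.
for counts in ([10, 0], [8, 2], [5, 5]):
    print(counts, "gini:", round(gini(counts), 3), "entropy:", round(entropy(counts), 3))
```

Both measures are 0 for the pure node and reach their binary maximum (0.5 for Gini, 1 for entropy) for the 5/5 node.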

Impact on Splits in a Decision Tree:
- Finding the best split:
Decision trees use these impurity measures to evaluate different possible splits at each node.
- Minimizing impurity:
The algorithm selects the split whose child nodes have the lowest weighted average Gini impurity or entropy, where each child is weighted by its share of the samples.
- Information Gain:
When using entropy, the concept of information gain is crucial. Information gain is the reduction in entropy achieved by splitting on a particular feature. The feature with the highest information gain is preferred.
- Greedy Approach:
Decision trees use a greedy approach, meaning they make the best split at each node locally, without considering the global structure of the tree.
- Iterative Process:
The process of calculating impurity and selecting splits is repeated recursively for each child node until a stopping criterion is met (e.g., maximum depth reached, minimum number of samples in a node).
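
The split search described above can be sketched for a single numeric feature. This is a simplified illustration, not how any particular library implements it: the `best_threshold` helper and the toy data are assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Try midpoints between sorted unique values; return (threshold, weighted Gini)."""
    best = (None, float("inf"))
    uniq = sorted(set(values))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Each child's impurity is weighted by the fraction of samples it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Toy data (made up for illustration): one numeric feature, two classes.
x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(x, y))  # (4.5, 0.0)
```

A full tree builder would repeat this search over every feature, apply the best split, and then recurse into each child until a stopping criterion is met.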

3. What is the difference between Pre-Pruning and Post-Pruning in Decision
Trees? Give one practical advantage of using each.
-  
Pre-Pruning (Early Stopping)

Definition: In pre-pruning, the decision tree stops growing early during its construction if a certain condition is met (like maximum depth reached, minimum number of samples at a node, or minimum information gain).

Goal: Prevents the tree from becoming too large and overfitting during training.

Practical Advantage:

Efficiency: Pre-pruning reduces training time and memory usage because the tree does not grow unnecessarily deep. This is especially useful when working with large datasets.

Post-Pruning (Pruning after Full Growth)

Definition: In post-pruning, the decision tree is first grown fully (allowing it to overfit), and then it is simplified by removing branches or subtrees that do not provide significant predictive power (using validation data or statistical tests).

Goal: Improves generalization by reducing overfitting after seeing the full structure.

Practical Advantage:

Better Accuracy: Post-pruning often results in a more accurate model on unseen data, since it considers the entire tree and prunes only those parts that are harmful to generalization.
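
Both approaches can be tried in scikit-learn. The sketch below uses `max_depth` and `min_samples_leaf` for pre-pruning and cost-complexity pruning (`ccp_alpha`) as one form of post-pruning. The specific values, and picking a mid-range alpha, are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Pre-pruning: growth stops early because of limits set before training.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut it back with cost-complexity pruning.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
# A mid-range alpha is picked here only for illustration; in practice it would
# be chosen by cross-validation or on a held-out validation set.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, "nodes:", model.tree_.node_count,
          "test accuracy:", round(model.score(X_test, y_test), 3))
```

Comparing the node counts shows how much each strategy shrinks the tree relative to the fully grown one.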

4. What is Information Gain in Decision Trees, and why is it important for
choosing the best split?
- Information Gain (IG) in Decision Trees:
Information Gain is a metric used in decision trees to decide which feature to split on at each step. It measures the reduction in uncertainty (or impurity) about the target variable when a dataset is split on a given attribute.
Importance:
Guides Feature Selection:
It helps the decision tree algorithm pick the attribute that gives the purest child nodes (less mixed, more class-homogeneous).
Prevents Random Splits:
Without a criterion such as IG, splits could be chosen arbitrarily. IG ensures that each split is the one that reduces uncertainty the most at that node.
Leads to Better Accuracy:
By always picking the attribute that provides the most information, the decision tree tends to reduce classification error. Because the choice is greedy, this does not guarantee the globally best tree.
Balances Tree Growth:
It avoids unnecessary complexity by favoring splits that contribute meaningful separation of classes.
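
A small worked example may make the definition concrete. Information gain is the parent's entropy minus the size-weighted entropy of the children. The labels and the two candidate splits below are made up for illustration, and the helper names are my own.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]          # fairly pure children
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]  # children still mixed

print("IG of split A:", round(information_gain(parent, split_a), 3))
print("IG of split B:", round(information_gain(parent, split_b), 3))
```

Split A produces purer children, so it has the higher information gain and would be the one chosen.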

5. What are some common real-world applications of Decision Trees, and
what are their main advantages and limitations?
- Decision trees are widely used in both classification and regression tasks because they are interpretable and easy to implement.
Healthcare & Medicine:
- Diagnosing diseases (e.g., predicting whether a patient has diabetes based on test results).
- Treatment recommendation systems.
- Risk prediction (e.g., likelihood of heart disease).
Finance & Banking:
- Credit risk assessment (approve/deny loan).
- Fraud detection in transactions.
- Customer segmentation for targeted offers.
Retail & Marketing:
- Predicting customer churn (who is likely to leave a service).
- Product recommendation.
- Sales forecasting.
Human Resources:
- Employee attrition prediction.
- Hiring decision support.
Manufacturing & Operations:
- Predictive maintenance (when a machine might fail).
- Quality control (defect classification).

Advantages of Decision Trees

- Easy to Understand & Interpret:
A tree reads like a flowchart, so even non-technical people can follow the reasoning.

- Handles Both Numerical & Categorical Data:
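Works on a mix of feature types without much preprocessing, although some implementations (including sklearn's) need categorical features encoded as numbers first.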

- No Need for Feature Scaling:
Unlike SVM or Logistic Regression, decision trees don’t require normalization/standardization, because each split depends only on the ordering of a feature's values.

Limitations of Decision Trees

- Overfitting (High Variance):
Trees can become too complex, memorizing training data instead of generalizing.
Solution: pruning, limiting max depth, or ensemble methods like Random Forest.

- Unstable to Small Changes in Data:
A small change in training data can lead to a very different tree structure.

- Biased Towards Features with Many Levels:
Features with more unique values (e.g., ID numbers) can dominate splits.
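
The overfitting limitation is easy to observe on noisy data. The sketch below uses a synthetic dataset from `make_classification`; the sample size, noise level, and `max_depth=4` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns the class of about 20% of samples at random.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

for name, model in [("fully grown", full), ("max_depth=4", shallow)]:
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```

The fully grown tree fits the training set perfectly, noise included, so the gap between its train and test accuracy is the sign of overfitting to look for.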



7. Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to
a fully-grown tree.


In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train Decision Tree with max_depth=3
tree_depth3 = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_depth3.fit(X_train, y_train)

# Predictions and accuracy for max_depth=3 tree
y_pred_depth3 = tree_depth3.predict(X_test)
accuracy_depth3 = accuracy_score(y_test, y_pred_depth3)

# Train a fully-grown Decision Tree (no max_depth specified)
tree_full = DecisionTreeClassifier(random_state=42)
tree_full.fit(X_train, y_train)

# Predictions and accuracy for fully-grown tree
y_pred_full = tree_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

print("Accuracy with max_depth=3:", accuracy_depth3)
print("Accuracy with fully-grown tree:", accuracy_full)


Accuracy with max_depth=3: 0.9666666666666667
Accuracy with fully-grown tree: 0.9333333333333333


8. Write a Python program to:
● Load the California Housing dataset from sklearn
● Train a Decision Tree Regressor
● Print the Mean Squared Error (MSE) and feature importances


In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Decision Tree Regressor
regressor = DecisionTreeRegressor(random_state=42)
regressor.fit(X_train, y_train)

# Predictions
y_pred = regressor.predict(X_test)

# Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Feature Importances
print("\nFeature Importances:")
for name, importance in zip(housing.feature_names, regressor.feature_importances_):
    print(f"{name}: {importance:.4f}")


Mean Squared Error: 0.495235205629094

Feature Importances:
MedInc: 0.5285
HouseAge: 0.0519
AveRooms: 0.0530
AveBedrms: 0.0287
Population: 0.0305
AveOccup: 0.1308
Latitude: 0.0937
Longitude: 0.0829


9. Write a Python program to:
● Load the Iris Dataset
● Tune the Decision Tree’s max_depth and min_samples_split using
GridSearchCV
● Print the best parameters and the resulting model accuracy

In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Define the Decision Tree Classifier
dt = DecisionTreeClassifier(random_state=42)

# Define parameter grid for GridSearch
param_grid = {
    'max_depth': [2, 3, 4, 5, None],   # Try different tree depths
    'min_samples_split': [2, 3, 4, 5, 10]  # Min samples required to split
}

# Perform GridSearchCV
grid_search = GridSearchCV(
    estimator=dt,
    param_grid=param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

# Best parameters
print("Best Parameters:", grid_search.best_params_)

# Best model
best_model = grid_search.best_estimator_

# Predictions on test data
y_pred = best_model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy on Test Data:", accuracy)


Best Parameters: {'max_depth': 3, 'min_samples_split': 2}
Model Accuracy on Test Data: 0.9777777777777777


10. Imagine you’re working as a data scientist for a healthcare company that
wants to predict whether a patient has a certain disease. You have a large dataset with
mixed data types and some missing values.
Explain the step-by-step process you would follow to:
● Handle the missing values
● Encode the categorical features
● Train a Decision Tree model
● Tune its hyperparameters
● Evaluate its performance
And describe what business value this model could provide in the real-world
setting.

- 1. Handle the Missing Values

Identify missing data: Use methods like df.isnull().sum() to see which features have missing values and their percentage.
Strategies:
- Numerical features: Impute with the mean (if the distribution is roughly symmetric) or the median (if it is skewed), or use KNN imputation when other features can inform the estimate.
- Categorical features: Impute with mode (most frequent value), or use a special category like "Unknown".
- High missing rate (e.g., above roughly 40%): Consider dropping such columns if they don’t add much value.
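
A minimal sketch of these imputation strategies, using a tiny made-up table. The column names are hypothetical. In a real project the imputer should be fitted on the training split only, to avoid leaking information from the test set.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up patient records; column names are hypothetical.
df = pd.DataFrame({
    "age": [54, np.nan, 61, 47],
    "glucose": [140.0, 98.0, np.nan, 110.0],
    "smoker": ["yes", np.nan, "no", "no"],
})
print(df.isnull().sum())  # missing count per column

# Numerical columns: fill with the median of the observed values.
num_cols = ["age", "glucose"]
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

# Categorical column: fill with the mode (most frequent value).
df["smoker"] = df["smoker"].fillna(df["smoker"].mode()[0])

print(df)
```

Replacing the mode with a constant such as "Unknown" is a one-line change (`fillna("Unknown")`) and keeps the fact that a value was missing visible to the model.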
2. Encode the Categorical Features
In principle a decision tree can split directly on categorical values, but sklearn’s implementation requires numerical input, so categorical features must be encoded first.
Options:
- One-Hot Encoding: For nominal categories (e.g., blood type: A, B, AB, O).
- Ordinal Encoding: For ordered categories (e.g., disease stage: mild < moderate < severe).
Use ColumnTransformer in sklearn to apply different encodings to different feature types in one pipeline.
3. Train a Decision Tree Model
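- Split the dataset into train/test sets (e.g., 70/30 split), stratifying on the target if classes are imbalanced, or use stratified k-fold cross-validation.
- Fit a DecisionTreeClassifier on the training data only, keeping the imputation and encoding steps in the same pipeline so that nothing is learned from the test set.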
4. Tune Hyperparameters
Decision Trees can overfit if not pruned. Use GridSearchCV or RandomizedSearchCV.
Key hyperparameters:
- max_depth: Controls tree depth.
- min_samples_split: Minimum samples to split a node.
- min_samples_leaf: Minimum samples at a leaf node.
- criterion: "gini" or "entropy" (impurity measure).
5. Evaluate Model Performance
Use multiple metrics (accuracy alone can be misleading in healthcare):
- Confusion Matrix (TP, FP, TN, FN).
- Precision (how many predicted positives are correct).
- Recall / Sensitivity (how many actual positives are detected).
- F1-score (balance between precision and recall).
- ROC-AUC (probability that the model ranks a random positive higher than a random negative).
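
These metrics can be computed with `sklearn.metrics`. The labels and scores below are made up for eight patients, and the 0.5 decision threshold is an assumption. In a screening setting a lower threshold may be preferable so that fewer positives are missed.

```python
from sklearn.metrics import (
    confusion_matrix, f1_score, precision_score, recall_score, roc_auc_score
)

# Made-up ground truth and predicted probabilities (1 = has the disease).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5 (an assumption)

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
# ROC-AUC uses the scores, not the thresholded predictions.
print("ROC-AUC:", roc_auc_score(y_true, y_score))
```

Here one sick patient is missed (FN = 1) and one healthy patient is flagged (FP = 1), which is why precision and recall both come out at 0.75.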
6. Business Value in Healthcare
- Early Diagnosis: Helps flag high-risk patients for further medical tests.
- Resource Allocation: Hospitals can prioritize limited resources (ICU beds, diagnostic tests) for high-risk patients.
- Personalized Treatment: Assists doctors in tailoring treatments based on predicted risk.
- Cost Savings: Reduces unnecessary tests for low-risk patients, saving both time and money.
- Patient Outcomes: Catching diseases earlier can ultimately improve patient outcomes, including survival rates.

