# Decision Tree


Q 1. What is a Decision Tree, and how does it work in the context of
classification?

Ans--A Decision Tree is a supervised learning algorithm that recursively splits a dataset into smaller subsets based on feature values, much like a sequence of if/else questions. To classify a sample, the tree starts at the root, applies the test at each node, follows the matching branch, and assigns the class label stored in the leaf it reaches. The main components of a Decision Tree are:

-   **Root Node:** Represents the entire dataset and the first decision point.

-   **Decision Nodes:** Internal nodes that represent tests on attributes (features) of the data.

-   **Leaf Nodes:** Terminal nodes that provide the final output or class label (e.g., "spam" or "not spam").

-   **Branches:** Lines connecting nodes, representing the outcome of a test and leading to the next node or leaf.
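
To make these components concrete, the following minimal sketch fits a shallow tree on the Iris data and prints it as text rules. It assumes scikit-learn is installed; `max_depth=2` and `random_state=0` are illustrative choices, not values taken from the answer above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed structure small enough to read.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# In the printout, the outermost test is the root node, indented tests are
# decision nodes, and lines containing "class:" are leaf nodes.
rules = export_text(clf, feature_names=iris.feature_names)
print(rules)

# Classifying a sample means following one root-to-leaf path.
sample = iris.data[:1]
print("Predicted class:", iris.target_names[clf.predict(sample)[0]])
```

Each indentation level in the printed rules corresponds to one branch taken from a node to its child.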

Q 2. Explain the concepts of Gini Impurity and Entropy as impurity measures.
How do they impact the splits in a Decision Tree?

Ans--**Gini Impurity**

-  **Definition:** Gini Impurity is the probability that a randomly chosen element of a node would be mislabeled if it were assigned a label at random according to the class distribution in that node.

-  **Formula:** The Gini Impurity G for a node is calculated as:

\[ G = 1 - \sum_{i=1}^{c} p_i^2 \]

where p<sub>i</sub> is the proportion of data points in the node that belong to class i, and c is the number of classes. G ranges from 0 (pure node) to 1 − 1/c (all classes equally likely), which is 0.5 for binary classification.

-  **Impact on Splits:** When building a decision tree, the algorithm evaluates candidate splits and selects the one whose child nodes have the lowest weighted Gini Impurity, where each child is weighted by its share of the samples. This creates more homogeneous groups, which improves the model's predictive power.
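
As a small illustration of how Gini Impurity ranks candidate splits, the sketch below computes it in plain Python. The helper names `gini` and `weighted_gini` and the toy labels are hypothetical and only serve the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum(p_i^2) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split: children weighted by their sample share."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

parent = ["spam"] * 5 + ["not spam"] * 5
print(gini(parent))  # 0.5, the maximum for two classes

# Candidate A separates the classes perfectly.
perfect = weighted_gini(["spam"] * 5, ["not spam"] * 5)
# Candidate B leaves one sample of the other class in each child.
mixed = weighted_gini(["spam"] * 4 + ["not spam"], ["spam"] + ["not spam"] * 4)

print(perfect)          # 0.0
print(round(mixed, 2))  # 0.32
```

A tree builder using this criterion would prefer candidate A, because its weighted impurity is lower.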

**Entropy**

-  **Definition:** Entropy is a measure of the disorder or uncertainty in a dataset. It quantifies the impurity of a node by measuring the unpredictability of the class labels in that node.

-  **Formula:** The Entropy H for a node is calculated as:

\[ H = -\sum_{i=1}^{c} p_i \log_2(p_i) \]

where p<sub>i</sub> is the proportion of data points in the node that belong to class i, and c is the number of classes. Entropy values range from 0 (pure node) to log<sub>2</sub>(c) (maximum impurity).

-  **Impact on Splits:** Similar to Gini Impurity, the decision tree algorithm uses Entropy to determine the best feature for splitting. The feature that results in the highest information gain (the reduction in entropy) is chosen for the split, leading to more informative and effective decision-making.
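
The entropy range can be checked with a few lines of plain Python. This is only a sketch; the function name `entropy` and the toy label lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of the class labels in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())

pure = entropy(["yes"] * 6)
balanced = entropy(["yes"] * 3 + ["no"] * 3)
three_way = entropy(["a", "b", "c"])

print(pure)       # 0.0 for a pure node
print(balanced)   # 1.0, which is log2(2)
print(three_way)  # close to log2(3) for three equally likely classes
```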

**Comparison and Practical Implications**

-  **Computational Efficiency:** Gini Impurity is generally faster to compute than Entropy because it does not involve logarithmic calculations. This can make Gini a preferred choice in many practical implementations of decision trees.

-  **Performance:** The two measures usually select the same or very similar splits, so the resulting trees rarely differ much in accuracy. Gini is the default criterion in scikit-learn's `DecisionTreeClassifier`; when in doubt, the criterion can be treated as one more hyperparameter and compared with cross-validation.

In summary, both Gini Impurity and Entropy are essential for guiding the splits in decision trees, helping to create models that are both accurate and efficient in classifying data. Understanding these concepts is crucial for anyone working with decision trees in machine learning.

Q 3. What is the difference between Pre-Pruning and Post-Pruning in Decision
Trees? Give one practical advantage of using each.

Ans--**1. Pre-Pruning (Early Stopping)**

-   **Definition:** Pre-pruning stops tree growth during training, before the tree becomes fully complex. A node is not split further if the split fails a predefined criterion. Common criteria include minimum information gain, minimum samples per node, maximum depth, or statistical significance tests (such as chi-square).

-   **Key Features:**
    -   Halts the creation of nodes while the tree is being built.
    -   Prevents the creation of branches that do not significantly improve performance on the training set.

-   **Practical Advantage:** Reduces overfitting early and saves computational resources because fewer nodes are created and evaluated. This is especially useful for very large datasets where a fully grown tree would be expensive to build.
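
In scikit-learn, pre-pruning is expressed through constructor arguments of the tree. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris data; the specific limits (`max_depth=3`, `min_samples_leaf=5`, `min_impurity_decrease=0.01`) are arbitrary illustrative values, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# No stopping criteria: the tree grows until its leaves are pure.
unrestricted = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: growth stops as soon as any of these limits is hit.
pre_pruned = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth
    min_samples_leaf=5,          # minimum samples per leaf
    min_impurity_decrease=0.01,  # minimum gain required to split
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("unrestricted", unrestricted), ("pre-pruned", pre_pruned)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```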

**2. Post-Pruning (Cost-Complexity Pruning / Reduced-Error Pruning)**

-   **Definition:** Post-pruning lets the tree grow fully first and then removes nodes that do not contribute significantly to predictive accuracy, judged on a validation set or by a complexity measure. Techniques include cost-complexity pruning (used in CART) and reduced-error pruning on a separate validation set.

-   **Key Features:**
    -   Initially builds a potentially overfitted tree that captures all patterns in the training data.
    -   Iteratively removes branches/nodes that add minimal predictive value, simplifying the tree.

-   **Practical Advantage:** Often achieves higher predictive accuracy on unseen data than pre-pruning, because the full tree captures complex relationships before pruning, allowing better-informed decisions about which branches are truly redundant.
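
scikit-learn implements cost-complexity pruning through the `ccp_alpha` parameter and the `cost_complexity_pruning_path` method. The following sketch grows a full tree, computes the candidate alphas, and keeps the one that scores best on a held-out split. Using that split as the validation set and breaking ties in favour of the simpler tree are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 1: grow the full tree.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Step 2: compute the alphas at which successive subtrees get pruned away.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Step 3: refit once per alpha and keep the best validation score.
best_alpha, best_score, best_tree = 0.0, -1.0, None
for alpha in path.ccp_alphas:
    alpha = float(max(alpha, 0.0))  # guard against tiny negative rounding errors
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)
    if score >= best_score:  # alphas ascend, so ties go to the simpler tree
        best_alpha, best_score, best_tree = alpha, score, tree

print(f"full tree leaves:   {full_tree.get_n_leaves()}")
print(f"pruned tree leaves: {best_tree.get_n_leaves()} (ccp_alpha={best_alpha:.4f})")
print(f"validation accuracy: {best_score:.3f}")
```

On a real project the alpha would more commonly be chosen by cross-validation rather than a single validation split.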

Q 4. What is Information Gain in Decision Trees, and why is it important for
choosing the best split?

Ans--Information Gain (IG) is a metric used in Decision Trees to measure how much uncertainty (impurity) in the target variable is reduced after splitting the dataset on a particular feature.

It is based on the concept of Entropy from information theory.

Information Gain is calculated as:

\[ IG = H(\text{parent}) - \sum_{k} \frac{N_k}{N_{\text{parent}}} H(\text{child}_k) \]

Where:

-  H(parent) = entropy of the node before the split

-  H(child<sub>k</sub>) = entropy of the k-th child node after the split

-  N<sub>k</sub> / N<sub>parent</sub> = fraction of the parent's samples that fall into child k, so the weighted sum accounts for the proportion of samples in each child node
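
A short worked example may help here. The sketch below computes Information Gain for two candidate splits of the same parent node, weighting each child by its share of the parent's samples. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n)
               for count in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n_parent = len(parent)
    weighted = sum(len(child) / n_parent * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 5 + ["no"] * 5  # entropy 1.0

# Split A: each child is pure, so all uncertainty is removed.
gain_a = information_gain(parent, [["yes"] * 5, ["no"] * 5])

# Split B: each child still contains one sample of the other class.
gain_b = information_gain(
    parent, [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
)

print(gain_a)            # 1.0
print(round(gain_b, 3))  # 0.278
```

Split A would be chosen because it has the larger gain.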

**Why Information Gain is Important**

-  Information Gain is used to choose the best feature for splitting at each node.

**The Decision Tree algorithm:**

-   Computes Information Gain for all candidate features (and, for numeric features, candidate thresholds).

-   Selects the split with the highest Information Gain.

-   Performs the split.

-   Repeats recursively on each child node until a stopping condition is met.

**Reason:**

-  The feature with the highest IG:

   -   Produces the purest child nodes

   -   Reduces uncertainty the most

   -   Tends to improve classification accuracy, although the choice is greedy and is only optimal for the current node
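
The greedy procedure described above can be sketched in plain Python. This is a simplified illustration for numeric features with `<=` threshold tests, not how any particular library implements it; `best_split`, `build_tree`, and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the highest-gain split."""
    base, n = entropy(labels), len(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):                      # every feature
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue                               # not a real split
            gain = base - (len(left) / n * entropy(left)
                           + len(right) / n * entropy(right))
            if gain > best[0]:
                best = (gain, f, threshold)
    return best

def build_tree(rows, labels):
    """Split on the best feature, then recurse on each child."""
    gain, f, threshold = best_split(rows, labels)
    if f is None:  # pure node or no split with positive gain: make a leaf
        return Counter(labels).most_common(1)[0][0]
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return {"feature": f, "threshold": threshold,
            "left": build_tree(*zip(*left)),
            "right": build_tree(*zip(*right))}

# Feature 0 is uninformative; feature 1 separates the classes.
rows = [[1, 10], [2, 20], [3, 10], [4, 20]]
labels = ["a", "b", "a", "b"]

print(best_split(rows, labels))  # (1.0, 1, 10)
tree = build_tree(rows, labels)
print(tree)
```

Because a leaf is created whenever no split has positive gain, this sketch can stop too early on data where only a combination of features is informative.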

Q 5. What are some common real-world applications of Decision Trees, and
what are their main advantages and limitations?

Ans--Decision Trees are widely used in real-world applications such as finance, healthcare, and marketing due to their interpretability, simplicity, and versatility.

**Common Applications**

-   **Finance and Banking:** Decision Trees are used for credit risk analysis, loan approval, and fraud detection. They help in classifying applicants as low or high risk based on various financial and demographic features.

-   **Healthcare and Medicine:** In medical diagnostics, Decision Trees assist in disease prediction and patient outcome analysis. For example, physicians can predict the likelihood of illnesses like diabetes or heart disease based on patient data.

-   **Marketing and Customer Analytics:** Businesses use Decision Trees to segment customers, predict customer churn, or recommend products. They classify customers into groups based on purchasing behavior, demographics, and engagement history.

**Main Advantages**

-   **Interpretability and Transparency:** Each decision is represented as a simple rule in the tree, making it easy for humans to understand the reasoning behind predictions.
-   **Handling of Both Numerical and Categorical Data:** Decision Trees can work with both kinds of features and need no feature scaling. Some implementations, including scikit-learn, still require categorical features to be encoded as numbers.
-   **Nonparametric Nature:** They do not assume any underlying distribution of data, allowing them to capture complex patterns and relationships.
-   **Automatic Feature Selection:** By evaluating splits based on information gain or Gini impurity, Decision Trees inherently highlight the most important variables.
-   **Ease of Use and Visualization:** Their structure is intuitive and can be visualized clearly, making them suitable for decision-making and communication with non-technical stakeholders.
-   **Versatility:** They can be used for classification, regression, and even multi-output predictions, making them broadly applicable across domains.

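**Main Limitations**

-   **Overfitting:** A deep, unpruned tree can memorize noise in the training data and generalize poorly.
-   **Instability:** Small changes in the training data can produce a very different tree.
-   **Greedy Splitting:** Each split is chosen locally, so the final tree is not guaranteed to be the globally best one.
-   **Axis-Aligned Boundaries:** Each test uses a single feature, so smooth or diagonal decision boundaries need many splits to approximate.
-   **Bias:** Splits can favor features with many distinct values, and trees can be biased toward the majority class on imbalanced data.

Overall, Decision Trees offer a practical combination of clarity and flexibility across diverse industries, provided their tendency to overfit is controlled through pruning or ensembling.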

Q 6. Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier using the Gini criterion
● Print the model’s accuracy and feature importances

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into 70% training and 30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))

# Feature importance (same order as iris.feature_names)
print("Feature Importances:", clf.feature_importances_)
```

Output:

```
Accuracy: 1.0
Feature Importances: [0.         0.01911002 0.89326355 0.08762643]
```


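Q 7. Write a Python program to:
● Load the Iris Dataset
● Train a Decision Tree Classifier with max_depth=3 and compare its accuracy to
a fully-grown tree.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load dataset and split it (70% train, 30% test)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

# Fully grown tree
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X_train, y_train)
full_acc = accuracy_score(y_test, full_tree.predict(X_test))

# Limited depth tree
limited_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
limited_tree.fit(X_train, y_train)
limited_acc = accuracy_score(y_test, limited_tree.predict(X_test))

print("Fully Grown Tree Accuracy:", full_acc)
print("Max Depth=3 Accuracy:", limited_acc)
```

Output:

```
Fully Grown Tree Accuracy: 1.0
Max Depth=3 Accuracy: 1.0
```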
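
Both trees reach the same test accuracy on this split, so accuracy alone does not show how they differ. As a hedged follow-up, the sketch below also reports depth, leaf count, and training accuracy for each tree; the reporting loop is an addition for illustration, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
limited_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Compare model size as well as accuracy on both splits.
for name, tree in [("fully grown", full_tree), ("max_depth=3", limited_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"train acc={tree.score(X_train, y_train):.3f}, "
          f"test acc={tree.score(X_test, y_test):.3f}")
```

If the shallower tree matches the deeper one on the test set, it is usually the better choice because it is simpler to interpret.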


Q 8. Write a Python program to:
● Load the California Housing dataset from sklearn
● Train a Decision Tree Regressor
● Print the Mean Squared Error (MSE) and feature importances

```python
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset (downloaded on first use)
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split into 70% training and 30% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)

# Predict
y_pred = reg.predict(X_test)

# MSE and feature importances (same order as housing.feature_names)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("Feature Importances:", reg.feature_importances_)
```

Output:

```
Mean Squared Error: 0.5280096503174904
Feature Importances: [0.52345628 0.05213495 0.04941775 0.02497426 0.03220553 0.13901245
 0.08999238 0.08880639]
```


Q 9.Write a Python program to:

● Load the Iris Dataset

● Tune the Decision Tree’s max_depth and min_samples_split using
GridSearchCV

● Print the best parameters and the resulting model accuracy

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data (same split as in the previous examples)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate values for the two hyperparameters
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10]
}

# 5-fold cross-validated grid search on the training data
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid,
                    cv=5)

grid.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy, not the test accuracy
print("Best Parameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)
```

Output:

```
Best Parameters: {'max_depth': 4, 'min_samples_split': 10}
Best Cross-Validation Accuracy: 0.9428571428571428
```
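
The `best_score_` reported above is the mean cross-validated accuracy on the training folds. To measure accuracy on the held-out test set as well, a sketch along the following lines could be used. It relies on `GridSearchCV` refitting the best model on the full training set, which is its default behaviour (`refit=True`).

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_split": [2, 5, 10],
}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid.fit(X_train, y_train)

# best_estimator_ has already been refit on all of X_train.
test_acc = accuracy_score(y_test, grid.best_estimator_.predict(X_test))

print("Best Parameters:", grid.best_params_)
print("Cross-Validation Accuracy:", grid.best_score_)
print("Test Accuracy:", test_acc)
```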


Q 10.Imagine you’re working as a data scientist for a healthcare company that
wants to predict whether a patient has a certain disease. You have a large dataset with
mixed data types and some missing values.
Explain the step-by-step process you would follow to:

● Handle the missing values

● Encode the categorical features

● Train a Decision Tree model

● Tune its hyperparameters

● Evaluate its performance

And describe what business value this model could provide in the real-world setting.



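Ans--
Step 1: **Handle Missing Values**

-  Numerical → Mean/Median imputation (median is more robust to outliers)

-  Categorical → Mode imputation

-  Use SimpleImputer, fitted on the training data only so that no information leaks from the test set

Step 2: **Encode Categorical Features**

-  Nominal → One-Hot Encoding (OneHotEncoder)

-  Ordinal → Ordinal Encoding with an explicit category order (OrdinalEncoder); LabelEncoder is intended for the target column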
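
Steps 1 and 2 can be combined in a single preprocessing object. The sketch below is one possible arrangement using scikit-learn's `ColumnTransformer`. The patient table, its column names, and the category order are made up for illustration, and `OrdinalEncoder` is used for the ordinal column on the assumption that its order should be stated explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical patient records with missing values in every column.
df = pd.DataFrame({
    "age": [54, np.nan, 61, 47, 39],                        # numerical
    "blood_type": ["A", "B", np.nan, "A", "O"],             # nominal
    "pain_level": ["low", "high", "low", np.nan, "medium"],  # ordinal
})

nominal = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
ordinal = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder(categories=[["low", "medium", "high"]])),
])

preprocess = ColumnTransformer(
    [
        ("num", SimpleImputer(strategy="median"), ["age"]),
        ("nom", nominal, ["blood_type"]),
        ("ord", ordinal, ["pain_level"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Columns: age, blood_type A/B/O (one-hot), pain_level (0=low, 1=medium, 2=high)
X = preprocess.fit_transform(df)
print(X)
```

In a real project `fit_transform` would be called on the training split only, and `transform` on the test split.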

Step 3: **Train Decision Tree Model**

-  Split data (train/test), stratified on the disease label

-  Use DecisionTreeClassifier

-  Select impurity criterion (Gini or Entropy)

Step 4: **Hyperparameter Tuning**

-  Tune:

   -    max_depth

   -    min_samples_split

   -    min_samples_leaf

-  Use GridSearchCV

-  Apply cross-validation on the training set

Step 5: **Evaluate Performance**

-  Accuracy (can be misleading if the disease is rare)

-  Precision

-  Recall (missed cases are usually the costliest error in diagnosis)

-  F1-score

-  Confusion Matrix

-  ROC-AUC
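
The five steps can be chained into one scikit-learn pipeline so that imputation and encoding are learned inside each cross-validation fold. The sketch below uses synthetic data as a stand-in for patient records. The column names, the rule that generates the label, `class_weight="balanced"`, and tuning for recall are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient data (all values are made up).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(20, 90, n).astype(float),
    "glucose": rng.normal(100, 20, n),
    "smoker": rng.choice(["yes", "no"], n),
})
y = ((df["glucose"] > 110)
     | ((df["smoker"] == "yes") & (df["age"] > 60))).astype(int)
df.loc[rng.random(n) < 0.1, "glucose"] = np.nan  # simulate missing values

# Steps 1 and 2: impute and encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age", "glucose"]),
    ("cat", categorical, ["smoker"]),
])

# Step 3: model and stratified split.
model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=42)),
])
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.3, stratify=y, random_state=42
)

# Step 4: tune with cross-validation on the training set only.
param_grid = {
    "tree__max_depth": [3, 5, None],
    "tree__min_samples_split": [2, 10],
    "tree__min_samples_leaf": [1, 5],
}
grid = GridSearchCV(model, param_grid, cv=5, scoring="recall")
grid.fit(X_train, y_train)

# Step 5: evaluate on the untouched test set.
y_pred = grid.predict(X_test)
y_prob = grid.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_prob)
print("Best parameters:", grid.best_params_)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("ROC-AUC:", auc)
```

Keeping the imputer and encoder inside the pipeline means they are refit on each training fold, which avoids leaking information from the validation folds.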

**Business Value**

-  Early disease detection

-  Reduced healthcare costs

-  Faster diagnosis

-  Better treatment prioritization

-  Improved patient survival rates

A well-tuned Decision Tree model can support doctors through decision support systems and risk stratification, and its explicit rules make each prediction easy to review.