##Q 1. What is a Decision Tree, and how does it work?
**Ans** - A Decision Tree is a supervised machine learning algorithm used for classification and regression tasks. It works by splitting data into branches based on feature conditions, forming a tree-like structure where each internal node represents a decision rule, each branch represents an outcome, and each leaf node represents a final decision or prediction.

**Working of Decision Tree**
1. Start with the Root Node
  * The entire dataset is considered at the root.
  * The best feature is selected to split the data based on a splitting criterion such as Gini impurity, information gain, or variance reduction.

2. Splitting the Data
  * The dataset is divided into subsets based on feature values.
  * Each subset is assigned to a new node.

3. Recursive Splitting
  * The process is repeated for each child node until one of the stopping conditions is met:
    * All data in a node belongs to the same class.
    * A predefined depth limit is reached.
    * Further splitting does not improve the model significantly.

4. Leaf Nodes
  * Once splitting stops, leaf nodes are created, representing the final decision or class label.
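
The split-selection step described above can be sketched for a single numeric feature. This is a minimal illustration that assumes Gini impurity as the criterion and binary splits at midpoints between sorted values; the helper names `gini` and `best_threshold` and the toy data are hypothetical, not part of any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature.

    Returns (None, parent_gini) if no split lowers the impurity.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: do not split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: small values are class "A", large values are class "B".
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
threshold, score = best_threshold(values, labels)
print(threshold, score)
```

A full tree builder would repeat this search over every feature, split on the winner, and recurse into each child until a stopping condition is met.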

**Types of Decision Trees**
* Classification Tree: Used when the output is categorical.
* Regression Tree: Used when the output is continuous.

**Splitting Criteria**
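* Gini Impurity: Measures how often a randomly chosen element would be incorrectly classified if it were labeled at random according to the class distribution in the node.
* Entropy & Information Gain: Entropy measures the disorder in a node; Information Gain is the reduction in entropy achieved by a split.
* Mean Squared Error: Used for regression trees; the chosen split is the one that most reduces the variance of the target values.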

**Advantages**
* Easy to understand and interpret.
* Handles both numerical and categorical data.
* Requires little data preprocessing.

**Disadvantages**
* Can overfit if too deep.
* Sensitive to small variations in data.
* Not always the most accurate model compared to other algorithms like Random Forest.

##Q 2. What are impurity measures in Decision Trees?
**Ans** - **Impurity Measures in Decision Trees**

Impurity measures help determine how well a dataset is split at each node of a Decision Tree. A node is considered "pure" if all the data points in it belong to the same class. The goal of the decision tree is to reduce impurity at each step.

**1. Gini Impurity**
* Measures how often a randomly chosen element would be incorrectly classified if randomly labeled according to the distribution of labels in the node.

* Formula:

      Gini = 1-∑ᶜᵢ₌₁ pᵢ²
where pᵢ is the probability of a class i in the node.

* Example:
If a node has 70% class A and 30% class B,

      Gini = 1-(0.7²+0.3²) = 1−(0.49+0.09) = 0.42
A lower Gini value means a purer node.

* Used In: CART (Classification and Regression Trees).

**2. Entropy (Information Gain)**
* Measures the disorder in a dataset.
* Formula:

      Entropy = -∑ᶜᵢ₌₁pᵢlog₂pᵢ
* Example:

If a node has 50% class A and 50% class B,

    Entropy = -[0.5log₂(0.5)+0.5log₂(0.5)]
            = -[0.5(−1)+0.5(−1)] = 1
Higher entropy means more impurity.
* Used In: ID3, C4.5, and C5.0 decision tree algorithms.

**3. Variance Reduction (for Regression Trees)**
* Used in regression trees where the output is continuous.
* Measures the decrease in variance after splitting.
* Formula:

      Variance = 1/n∑(yᵢ- ȳ)²
  where ȳ is the mean of the target variable.

* The split that results in the largest variance reduction is chosen.
* Used In: Regression trees.

**Comparison of Impurity Measures**

|Measure |Type |Best Value |Used In|
|---|---|---|---|
|Gini Impurity |Classification |0 (pure split) |CART Algorithm|
|Entropy |Classification |0 (pure split) |ID3, C4.5|
|Variance Reduction |Regression |Low Variance |Regression Trees|

##Q 3. What is the mathematical formula for Gini Impurity?
**Ans** - The mathematical formula for Gini Impurity is:

    Gini = 1-∑ᶜᵢ₌₁pᵢ²
where:
  * C = total number of classes
  * pᵢ = proportion of class i in the node

**Example Calculation**

Suppose a node has two classes:
* Class A: 70% (p_A = 0.7)
* Class B: 30% (p_B = 0.3)

      Gini = 1-(0.7²+0.3²)
      Gini = 1-(0.49+0.09) = 1-0.58 = 0.42
A lower Gini value means the node is purer, meaning the samples in the node mostly belong to one class.
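
The calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `gini_impurity` is a hypothetical helper that takes the class proportions directly.

```python
def gini_impurity(proportions):
    """Gini impurity from a list of class proportions that sum to 1."""
    return 1.0 - sum(p * p for p in proportions)


g = gini_impurity([0.7, 0.3])
print(round(g, 2))                 # the 70% / 30% node from the example
print(gini_impurity([1.0]))        # a pure node
print(gini_impurity([0.5, 0.5]))   # two equally likely classes
```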

##Q 4. What is the mathematical formula for Entropy?
**Ans** - The mathematical formula for Entropy in a Decision Tree is:

    Entropy=−∑ᶜᵢ₌₁pᵢlog₂pᵢ

where:
  * C = total number of classes
  * pᵢ = proportion of class i in the node

**Example Calculation**

Suppose a node has two classes:
* Class A: 70% (p_A = 0.7)
* Class B: 30% (p_B = 0.3)

      Entropy=-(0.7log₂0.7+0.3log₂0.3)
Using approximate log values:

      log₂0.7≈-0.514
      log₂0.3≈-1.737
      Entropy=-[(0.7×−0.514)+(0.3×−1.737)]
            =-[−0.3598−0.5211]
            =0.881

**Key Points**
* Entropy = 0 → Node is pure (all samples belong to one class).
* Entropy is highest when classes are equally distributed (e.g., 50%-50% gives Entropy = 1).
* Lower entropy means better splits.
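
The same numbers can be reproduced in Python. This is a minimal sketch; `entropy` is a hypothetical helper that takes class proportions and skips zero-probability classes, since p·log₂p is taken to be 0 when p = 0.

```python
import math


def entropy(proportions):
    """Shannon entropy (base 2) from a list of class proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


h = entropy([0.7, 0.3])
print(round(h, 3))           # the 70% / 30% node from the example
print(entropy([0.5, 0.5]))   # maximum for two classes
```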

##Q 5. What is Information Gain, and how is it used in Decision Trees?
**Ans** - Information Gain is a measure used in Decision Trees to determine which feature provides the most useful information for classification. It calculates the reduction in entropy after splitting the dataset based on a feature.

**Mathematical Formula**

    IG = Entropy(Parent)-∑ᵏᵢ₌₁|Sᵢ|/|S|Entropy(Sᵢ)
where:
* Entropy(Parent) = Entropy before the split
*  Sᵢ= Subsets created by splitting the dataset based on a feature
* |Sᵢ|/|S| = Weight of each subset (proportion of samples in that subset)
* Entropy(Sᵢ) = Entropy of the subset

**How Information Gain is Used in Decision Trees**
1. Calculate Entropy of the Parent Node.
2. Split the Data based on a feature.
3. Calculate Weighted Entropy of the Child Nodes.
4. Compute Information Gain by subtracting the new entropy from the original entropy.
5. Choose the Feature with the Highest Information Gain for splitting.

**Example Calculation**

Suppose we have a dataset with two classes (Yes/No):
* Before Splitting (Parent Entropy):
  * 5 Yes, 5 No → Entropy = 1
* After Splitting on Feature X:
  * Left Node: 4 Yes, 1 No → Entropy = 0.72
  * Right Node: 1 Yes, 4 No → Entropy = 0.72

If each node contains half of the samples:

    Entropy(Children) = 5/10(0.72)+5/10(0.72) = 0.72
    InformationGain = 1−0.72 = 0.28
Since higher Information Gain is better, the feature with the highest IG is chosen for the next split.

**Key Points**
* Higher Information Gain → More informative feature.
* Used in ID3, C4.5, and C5.0 Decision Trees.
* A minimum-gain threshold can be used to skip splits that barely reduce entropy, which keeps the tree smaller.
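
The worked example above (5 Yes / 5 No split into 4/1 and 1/4) can be reproduced with a short sketch. The helper names `entropy_of` and `information_gain` are hypothetical, and the labels are the toy data from the example.

```python
import math
from collections import Counter


def entropy_of(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy_of(child) for child in children)
    return entropy_of(parent) - weighted


parent = ["Yes"] * 5 + ["No"] * 5
left = ["Yes"] * 4 + ["No"] * 1
right = ["Yes"] * 1 + ["No"] * 4

ig = information_gain(parent, [left, right])
print(round(ig, 3))
```

In a full tree builder this value would be computed for every candidate feature, and the feature with the largest gain would be used for the split.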

##Q 6. What is the difference between Gini Impurity and Entropy?
**Ans** - **Difference Between Gini Impurity and Entropy in Decision Trees**

|Criterion |Gini Impurity |Entropy (Information Gain)|
|---|---|---|
|Definition |Measures the probability of incorrectly classifying a randomly chosen element. |Measures the disorder (uncertainty) in a dataset.|
|Formula |Gini=1-∑pᵢ² |Entropy=-∑pᵢlog₂pᵢ|
|Range |0 (pure node) to 0.5 (two equal classes) |0 (pure node) to 1 (max disorder for two classes)|
|Computation |Faster, as it avoids logarithms. |Slightly slower due to log calculations.|
|Interpretation |Lower Gini means purer splits. |Lower Entropy means less disorder.|
|Decision Tree Usage |Used in CART (Classification and Regression Trees). |Used in ID3, C4.5, C5.0 algorithms.|
|Preference |Works well in most cases, often default in libraries like Scikit-Learn. |Penalizes mixed nodes slightly more heavily; in practice it usually produces trees similar to Gini.|

**Example Comparison**

Suppose a node has two classes:
* Class A: 70% (p_A = 0.7)
* Class B: 30% (p_B = 0.3)

1. Gini Impurity

        Gini=1-(0.7²+0.3²) = 1-(0.49+0.09) = 0.42
2. Entropy

        Entropy=-[0.7log₂(0.7)+0.3log₂(0.3)]
Approximating logs:

        =-[0.7(−0.514)+0.3(−1.737)]
        =-(-0.3598-0.5211) = 0.881

##Q 7. What is the mathematical explanation behind Decision Trees?
**Ans** - **Mathematical Explanation of Decision Trees**
A Decision Tree is a recursive, tree-like structure used for classification and regression. The mathematical foundation relies on selecting the best feature to split the dataset, minimizing impurity and maximizing information gain.

**1. Splitting Criteria**

At each node, the algorithm evaluates all features and chooses the one that best separates the data. The selection is based on minimizing impurity.

**For Classification:**

(a) Gini Impurity

    Gini = 1−∑ᶜᵢ₌₁pᵢ²
where:
* C is the number of classes.
* pᵢ  is the proportion of class i in the node.

(b) Entropy (Information Gain)

    Entropy = −∑ᶜᵢ₌₁pᵢlog₂pᵢ
    Information Gain = Entropy(Parent)-∑ᵏᵢ₌₁|Sᵢ|/|S|Entropy(Sᵢ)
where:
* S is the total dataset.
* Sᵢ are the subsets created by splitting.
* |Sᵢ|/|S| is the weight of each subset.

**For Regression:**

(c) Variance Reduction

Used when predicting continuous values.

    Variance = 1/n∑(yᵢ-ȳ)²
    Variance Reduction = Variance(Parent)-∑ᵏᵢ₌₁ |Sᵢ|/|S|Variance(Sᵢ)

**2. Recursive Splitting**

Once the best feature is chosen, the dataset is split into child nodes. The process repeats until a stopping condition is met:
* All data in a node belong to the same class (pure node).
* Max depth is reached.
* Further splitting does not improve accuracy.

**3. Pruning (Avoiding Overfitting)**

Pruning helps prevent overfitting by removing unnecessary branches.
* Pre-pruning: Stop splitting if Information Gain is below a threshold.
* Post-pruning: Build the full tree, then remove weak branches using cross-validation.

**Final Prediction**
* Classification: The majority class in a leaf node is the predicted label.
* Regression: The average value of the target variable in a leaf node is the prediction.

**Key Takeaways**
* Mathematical foundation: Based on entropy, Gini impurity, or variance reduction.
* Recursive splitting: Selects the best feature at each step.
* Pruning: Prevents overfitting.
* Prediction: Uses majority voting (classification) or averaging (regression).

##Q 8. What is Pre-Pruning in Decision Trees?
**Ans** - **Pre-Pruning in Decision Trees**

Pre-pruning is a technique used to prevent a Decision Tree from growing too deep, reducing overfitting by stopping the tree from splitting further based on certain conditions.

**How Pre-Pruning Works**

Instead of allowing the tree to grow fully and then cutting branches, pre-pruning stops the tree early if:
1. Max Depth is Reached: Stop splitting when the tree reaches a certain depth.
2. Minimum Samples per Leaf: Stop if a node has fewer than a threshold number of samples.
3. Minimum Information Gain: Stop if splitting does not significantly reduce impurity (Gini/Entropy).
4. Max Number of Nodes: Limit the total number of nodes in the tree.
5. Chi-Square Testing (for categorical data): Stop splitting if the new branches do not significantly improve classification.

**Mathematical Explanation**

A Decision Tree recursively splits nodes by selecting the best feature based on impurity reduction:

    Information Gain = Entropy(Parent)-∑ᵏᵢ₌₁|Sᵢ|/|S|Entropy(Sᵢ)
Pre-pruning sets a threshold θ, where further splitting occurs only if:

    Information Gain>θ
Similarly, in Gini Impurity:

    Gini Reduction = Gini(Parent)-∑ᵏᵢ₌₁|Sᵢ|/|S|Gini(Sᵢ)
Pre-pruning restricts splitting when Gini Reduction is too small.
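
In Scikit-Learn, a threshold of this kind is exposed as the `min_impurity_decrease` parameter. The sketch below compares an unrestricted tree with a thresholded one on the Iris data; the value 0.02 is an arbitrary choice for illustration. Note that Scikit-Learn weights the impurity decrease by the fraction of samples reaching the node, so it is not identical to the unweighted formula above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: splits until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: a split is made only if it reduces the (weighted)
# impurity by at least 0.02. The threshold is an illustrative assumption.
thresholded_tree = DecisionTreeClassifier(
    min_impurity_decrease=0.02, random_state=0
).fit(X, y)

print("Leaves without threshold:", full_tree.get_n_leaves())
print("Leaves with threshold:   ", thresholded_tree.get_n_leaves())
```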

**Advantages of Pre-Pruning**
  * Prevents Overfitting: Stops the tree before it becomes too complex.
  * Improves Generalization: Produces a simpler model that works better on new data.
  * Reduces Training Time: Stops unnecessary computations.

**Disadvantages of Pre-Pruning**
  * Risk of Underfitting: The tree might be too simple and miss important patterns.
  * Choosing the Right Threshold is Hard: Improper stopping criteria can hurt accuracy.

**Example in Python (Pre-Pruning using Scikit-Learn)**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(max_depth=3, min_samples_split=5, min_samples_leaf=3)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("Accuracy:", accuracy)

**Key Takeaways**
* Pre-Pruning stops tree growth early based on predefined conditions.
* Helps reduce overfitting but may cause underfitting if too aggressive.
* Common parameters: max_depth, min_samples_split, min_samples_leaf.

##Q 9. What is Post-Pruning in Decision Trees?
**Ans** - **Post-Pruning in Decision Trees**

Post-pruning is a technique used to reduce overfitting in Decision Trees by removing unnecessary branches after the tree has been fully grown. Cost-complexity pruning is one common form of it.

**How Post-Pruning Works**
1. Grow the Full Tree: The Decision Tree is allowed to split until it perfectly classifies all training data.
2. Evaluate Performance: A validation dataset is used to measure the accuracy of different subtrees.
3. Remove Unnecessary Branches: The least important branches are pruned.
4. Stop When Accuracy Stops Improving: The tree is pruned until removing further nodes reduces accuracy.

**Mathematical Explanation**

A tree's complexity can be controlled by defining a pruning cost function:

    C(T)=E(T)+α⋅|T|
where:
* C(T) = Cost of the tree.
* E(T) = Error of the tree (misclassification rate).
* |T| = Number of terminal nodes (leaf nodes).
* α = Complexity penalty (higher α leads to more pruning).

The goal is to find the subtree T' that minimizes C(T').

**Types of Post-Pruning**

**1. Reduced Error Pruning**
* Remove nodes if accuracy on validation data does not decrease.
* Stop when further pruning hurts accuracy.

**2. Cost-Complexity Pruning**
* Uses an α parameter to control pruning.
* A series of subtrees is created, and the one with the best validation accuracy is chosen.
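
One way to pick α without touching the test set is cross-validation on the training data. The sketch below assumes the Iris dataset and 5-fold cross-validation; both are illustrative choices, and the candidate α values are clipped at zero to guard against tiny negative values from floating-point error.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate alphas come from the pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))

# Score each alpha by cross-validation on the training data only.
cv_means = [
    cross_val_score(
        DecisionTreeClassifier(random_state=42, ccp_alpha=alpha),
        X_train, y_train, cv=5,
    ).mean()
    for alpha in alphas
]
best_alpha = float(alphas[int(np.argmax(cv_means))])

# Refit on the full training set and evaluate once on the held-out test set.
final_tree = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha)
final_tree.fit(X_train, y_train)
test_accuracy = final_tree.score(X_test, y_test)
print(f"Chosen alpha: {best_alpha:.4f}, test accuracy: {test_accuracy:.2f}")
```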

**Advantages of Post-Pruning**
* Better Generalization: Removes unnecessary complexity while keeping important structure.
* Higher Accuracy on Test Data: Avoids overfitting to training data.
* Data-Driven Pruning: Uses a validation set, making it more reliable.

**Disadvantages of Post-Pruning**
* Computationally Expensive: Requires training a fully grown tree first.
* Choosing the Right Pruning Parameter: Finding the best α in cost-complexity pruning can be difficult.

**Example in Python (Using Cost-Complexity Pruning)**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas

pruned_trees = [DecisionTreeClassifier(random_state=42, ccp_alpha=alpha).fit(X_train, y_train) for alpha in ccp_alphas]

train_scores = [tree.score(X_train, y_train) for tree in pruned_trees]
test_scores = [tree.score(X_test, y_test) for tree in pruned_trees]

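# Note: choosing alpha on the test set makes the reported score optimistic;
# a separate validation set or cross-validation is safer.
best_alpha = ccp_alphas[test_scores.index(max(test_scores))]
print(f"Best pruning parameter (alpha): {best_alpha}")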

**Key Takeaways**
* Post-Pruning removes overfitting after training.
* Uses cost-complexity pruning to balance accuracy and complexity.
* Often generalizes better than pre-pruning, because it sees the full tree before cutting, but it is more computationally expensive.

##Q 10. What is the difference between Pre-Pruning and Post-Pruning?
**Ans** - **Difference Between Pre-Pruning and Post-Pruning in Decision Trees**

|Criterion |Pre-Pruning (Early Stopping) |Post-Pruning (Pruning After Training)|
|---|---|---|
|Definition |Stops tree growth before it overfits. |Removes unnecessary branches after the tree is fully grown.|
|How It Works |Prevents splitting if a stopping condition (e.g., max depth, min samples) is met. |Trims back branches that don't improve validation accuracy.|
|Stopping Criteria |Max tree depth reached; min samples per leaf/node; minimum information gain; chi-square test (for categorical data). |Reduced error pruning (removes branches that don't improve validation accuracy); cost-complexity pruning (uses α to balance complexity and error).|
|Goal |Prevent unnecessary complexity from the beginning. |Remove overfitting caused by excessive depth.|
|Computation Cost |Faster, as it stops growing early. |Slower, since the full tree is grown before pruning.|
|Risk |May cause underfitting if stopped too early. |More effective in preventing overfitting but computationally expensive.|
|Commonly Used In |Simple decision trees, small datasets. |Large datasets, complex trees.|
|Scikit-Learn Parameters |max_depth, min_samples_split, min_samples_leaf |ccp_alpha (cost-complexity pruning)|

**Example Comparison**

Imagine a Decision Tree built for classifying emails as spam or not spam:
* Pre-Pruning: The tree stops splitting when max_depth = 3, even if some deeper patterns exist.
* Post-Pruning: The tree grows fully, then removes branches that don't improve validation accuracy.

##Q 11. What is a Decision Tree Regressor?
**Ans** - **Decision Tree Regressor**

A Decision Tree Regressor is a machine learning model that uses a tree structure to predict continuous numerical values rather than categorical labels (as in classification). It splits data recursively based on feature values to minimize prediction error (e.g., variance reduction or mean squared error).

**Working**
1. Start with the entire dataset as the root node.
2. Choose the best split using a criterion like Mean Squared Error, Mean Absolute Error, or Reduction in Variance.
3. Split the dataset into child nodes based on the chosen feature and threshold.
4. Repeat the process recursively until a stopping condition is met (e.g., max depth, min samples per leaf).
5. Predict the output at leaf nodes using the mean of the target values in that node.

**Mathematical Formulation**

**1. Splitting Criterion**

(a) Mean Squared Error

To split a node on feature Xⱼ, select the threshold t that minimizes the size-weighted MSE of the resulting child nodes, where the MSE of a node is:

    MSE = 1/N∑ᴺᵢ₌₁(yᵢ-ŷ)²
 where:
* yᵢ is the actual target value.
* ŷ is the mean prediction in the node.

The goal is to find the split that reduces MSE the most.

(b) Variance Reduction

Alternatively, the variance before and after splitting is computed:

    Variance = 1/N∑ᴺᵢ₌₁(yᵢ-ȳ)²

A split is chosen if it maximizes variance reduction:

    Variance Reduction = Variance(Parent)-∑ᵏᵢ₌₁|Sᵢ|/|S|Variance(Sᵢ)
where
* Sᵢ are the subsets created by the split.
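
The variance-reduction formula can be checked on a small example. This is a minimal sketch; `variance_reduction` is a hypothetical helper, the target values are made up, and `np.var` is used with its default population variance so that it matches the 1/N definition above.

```python
import numpy as np


def variance_reduction(y_parent, y_left, y_right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(y_parent)
    weighted = (len(y_left) * np.var(y_left) + len(y_right) * np.var(y_right)) / n
    return np.var(y_parent) - weighted


# Hypothetical targets: three values near 1 and three values near 5.
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
left, right = y[:3], y[3:]

vr = variance_reduction(y, left, right)
print(round(float(vr), 3))
```

Separating the two clusters removes almost all of the parent variance, so this split would score highly; a regression tree evaluates every candidate threshold this way and keeps the best one.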

**Advantages of Decision Tree Regressors**
* Easy to Interpret: Can be visualized like a flowchart.
* Handles Non-Linear Data: Can model complex relationships.
* No Need for Feature Scaling: Works well with raw data.
* Can Handle Missing Values: Some implementations deal with missing values natively; others require imputation first.

**Disadvantages**
* Prone to Overfitting: Without pruning, it can create very deep trees.
* Not Smooth Predictions: Predictions are constant within regions, leading to step-like outputs.
* Sensitive to Small Changes: Small changes in data can result in different trees.

**Example: Decision Tree Regression in Python**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

regressor = DecisionTreeRegressor(max_depth=4)
regressor.fit(X, y)

X_test = np.linspace(0, 10, 1000).reshape(-1, 1)
y_pred = regressor.predict(X_test)

plt.scatter(X, y, label="Actual Data", color="blue", alpha=0.5)
plt.plot(X_test, y_pred, label="Decision Tree Prediction", color="red")
plt.legend()
plt.show()

##Q 12. What are the advantages and disadvantages of Decision Trees?
**Ans** - **Advantages and Disadvantages of Decision Trees**

Decision Trees are widely used in machine learning due to their interpretability and flexibility, but they also have some limitations. Let's go through both aspects.

**Advantages of Decision Trees**
1. Easy to Understand and Interpret
  * Decision Trees are intuitive and can be easily visualized, making them useful for explaining models to non-technical stakeholders.
  * They mimic human decision-making processes.

2. Handles Both Classification and Regression
  * Can be used for both classification (DecisionTreeClassifier) and regression (DecisionTreeRegressor) problems.

3. No Need for Feature Scaling
  * Unlike SVMs or Neural Networks, Decision Trees don’t require standardization or normalization (e.g., MinMax scaling).

4. Handles Non-Linear Relationships
  * Can model complex, non-linear decision boundaries without requiring transformations of the data.

5. Handles Categorical and Numerical Data
  * Works with both categorical and numerical input variables.

6. Works with Missing Values
  * Some implementations handle missing values natively (for example, through surrogate splits); others require imputation before training.

7. Feature Selection is Built-In
  * Automatically selects the most important features when making splits, reducing the need for manual feature selection.

8. Computationally Efficient for Small Datasets
  * Faster training compared to deep learning models, especially on smaller datasets.

**Disadvantages of Decision Trees**
1. Prone to Overfitting
  * Deep trees memorize training data, making them perform poorly on new, unseen data.
  * Solution: Use pruning (pre-pruning or post-pruning) to prevent overfitting.

2. Unstable (Sensitive to Small Changes in Data)
  * A small change in the dataset can result in a completely different tree structure.
  * Solution: Use an ensemble method like Random Forest to improve stability.

3. Greedy Algorithm May Not Find the Optimal Tree
  * The tree-building algorithm chooses the best split locally, but not necessarily the best global structure.
  * Solution: Use bagging or boosting (e.g., Gradient Boosting, XGBoost) to improve performance.

4. Can Be Computationally Expensive for Large Datasets
  * As the dataset grows, trees become deep and complex, increasing memory and computation time.
  * Solution: Limit tree depth using max_depth or min_samples_split.

5. Biased Towards Dominant Classes
  * If some classes are more frequent, the tree may favor those classes during splitting.
  * Solution: Use balanced class weights or resample the dataset.

6. Not Smooth for Regression (Step-Wise Predictions)
  * Decision Tree Regression produces piecewise constant predictions, unlike linear or polynomial regression models.
  * Solution: Use Random Forest Regressor or Gradient Boosting Regressor for smoother predictions.
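
The class-imbalance remedy mentioned in item 5 can be sketched with Scikit-Learn's `class_weight="balanced"` option. The data below is a hypothetical 90/10 imbalanced toy set; `compute_class_weight` is used only to show which weights the "balanced" setting implies.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100, dtype=float).reshape(-1, 1)

# "balanced" weights are n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))

# The same weighting applied inside the tree's impurity calculations.
clf = DecisionTreeClassifier(class_weight="balanced", max_depth=2, random_state=0)
clf.fit(X, y)
print(clf.predict([[95.0]]))
```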

**Summary Table**

|Feature |Decision Trees|
|---|---|
|Interpretability |Highly interpretable|
|Feature Scaling Needed? |No|
|Handles Missing Data? |Depends on the implementation (otherwise impute first)|
|Handles Non-Linear Data? |Yes|
|Overfitting Risk? |High (needs pruning)|
|Computationally Expensive for Large Datasets? |Yes|
|Stable Against Small Changes? |No (high variance)|
|Handles Both Classification & Regression? |Yes|

##Q 13. How does a Decision Tree handle missing values?
**Ans** - Decision Trees can handle missing values in two main ways:

1. During Training (Handling Missing Values While Building the Tree)
2. During Prediction (Handling Missing Values When Making Predictions)

**1. Handling Missing Values During Training**

When training a Decision Tree, missing values in features can be dealt with using different strategies:

**(A) Ignore Missing Values When Choosing Splits**
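* Some implementations evaluate candidate splits using only the samples where the feature is present. Scikit-Learn's trees rejected NaN inputs in older releases; recent releases (1.3 and later) accept them and send missing values to whichever child gives the better split.
* Evaluating splits on fewer samples may lead to a suboptimal split.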

(B) Assign Missing Values to the Most Common Value
* If a feature is numerical, missing values can be replaced with the mean or median of the feature.
* If a feature is categorical, missing values can be replaced with the most frequent category (mode).
* This is a simple approach but may introduce bias.

(C) Use Surrogate Splits
* Instead of removing missing values, some algorithms use alternative (surrogate) features that correlate with the missing feature to determine splits.
* If the primary split feature has missing values, the decision tree follows a secondary feature that provides similar information.
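* Used in CART implementations in some libraries. C4.5 takes a different approach and distributes a sample with a missing value fractionally across the branches.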

**2. Handling Missing Values During Prediction**

When making predictions, if a sample has a missing value for a split feature, Decision Trees can:

(A) Follow the Most Frequent Path
* The tree follows the path taken by the majority of samples that have a value at that split.
* This is a simple and fast method.

(B) Weighted Splitting
* Assigns probabilities to missing values based on the distribution of existing values.
* The sample is sent down multiple branches proportionally and the prediction is averaged.
* More accurate but computationally expensive.

**Example: Handling Missing Values in Scikit-Learn**

In [None]:
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    'Feature1': [1, 2, np.nan, 4, 5, 6, np.nan, 8, 9, 10],
    'Feature2': [5, np.nan, 2, 3, 4, np.nan, 6, 7, 8, 9],
    'Target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
})

imputer = SimpleImputer(strategy='mean')
data[['Feature1', 'Feature2']] = imputer.fit_transform(data[['Feature1', 'Feature2']])

X_train, X_test, y_train, y_test = train_test_split(data[['Feature1', 'Feature2']], data['Target'], test_size=0.2, random_state=42)
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Predictions:", y_pred)

* Here, we used mean imputation before training the Decision Tree.
* Alternatively, some implementations handle missing values internally: CART can use surrogate splits, and C4.5 distributes such samples fractionally across branches.
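
The example above imputes before splitting, so the imputed means are computed partly from test rows. A variation that avoids this is to put the imputer and the tree in one pipeline, so the means are learned from the training split only. This is a sketch on the same toy values; the pipeline layout is one reasonable option, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Same toy values as above, as a NumPy array (columns: Feature1, Feature2).
X = np.array([
    [1, 5], [2, np.nan], [np.nan, 2], [4, 3], [5, 4],
    [6, np.nan], [np.nan, 6], [8, 7], [9, 8], [10, 9],
], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The imputer is fitted on X_train only; X_test is filled with training means.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    DecisionTreeClassifier(random_state=42),
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Predictions:", y_pred)
```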

##Q 14. How does a Decision Tree handle categorical features?
**Ans** - **How Decision Trees Handle Categorical Features**
Decision Trees can naturally handle categorical features in different ways, depending on whether they use binary splits (CART) or multiway splits (C4.5, ID3).

**1. Methods for Handling Categorical Features**

Decision Trees handle categorical data in the following ways:

(A) One-Hot Encoding
* Convert categorical features into multiple binary (0/1) features.
* Each category becomes a separate feature column.
* Commonly used with Scikit-Learn's DecisionTreeClassifier (CART algorithm), which expects numeric input.

**Advantage:** Works well with all Decision Tree implementations.
**Disadvantage:** Increases feature dimensionality if categories are many (high cardinality).

* Example:

|Color |One-Hot Encoding (columns: Blue, Green, Red)|
|---|---|
|Red |(0,0,1)|
|Blue |(1,0,0)|
|Green |(0,1,0)|

In [None]:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})
encoder = OneHotEncoder()
encoded_data = encoder.fit_transform(data[['Color']]).toarray()
print(encoded_data)

(B) Label Encoding
* Each category is assigned a unique numerical value.
* Can be used directly in Decision Trees.
* Works well when categories have an inherent order (e.g., "Low", "Medium", "High").

**Advantage:** Efficient, requires fewer features.
**Disadvantage:** May introduce false relationships if the order is meaningless.
* Example:

|Color |Label Encoding (alphabetical order)|
|---|---|
|Blue |0|
|Green |1|
|Red |2|

In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red']})
encoder = LabelEncoder()
encoded_data = encoder.fit_transform(data['Color'])
print(encoded_data)
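
When the categories do have a meaningful order, the order can be stated explicitly instead of relying on alphabetical sorting. The sketch below uses Scikit-Learn's `OrdinalEncoder` with a hypothetical `Size` column; the column name and values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordered feature: Low < Medium < High.
data = pd.DataFrame({"Size": ["Low", "High", "Medium", "Low", "High"]})

# Passing the category order explicitly keeps Low=0, Medium=1, High=2.
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
encoded = encoder.fit_transform(data[["Size"]])
print(encoded.ravel())
```

With this encoding, a tree split such as `Size <= 0.5` separates "Low" from the rest, which matches the natural order of the categories.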

(C) Decision Trees with Native Categorical Splitting (C4.5, ID3)
* Algorithms like C4.5 and ID3 can handle categorical data directly.
* They select the feature to split on using Information Gain (ID3) or Gain Ratio (C4.5) and create one branch per category.

**Advantage:** No need for encoding; works directly with categorical data.
**Disadvantage:** Not supported by Scikit-Learn's DecisionTreeClassifier (CART).

* Example: If "Color" is the best feature to split on, the tree creates branches:

In [None]:
       Color?
      /   |   \
   Red   Blue  Green

**2. Choosing the Best Approach**

|Method |Best When |Supported in Scikit-Learn?|
|---|---|---|
|One-Hot Encoding |Few categories, no order |Yes (works well)|
|Label Encoding |Categories have natural order |Yes (but may mislead)|
|Native Categorical Splitting (C4.5) |Large categorical features |No (Scikit-Learn uses CART)|

##Q 15. What are some real-world applications of Decision Trees?
**Ans** - **Real-World Applications of Decision Trees**

Decision Trees are widely used in various domains due to their simplicity, interpretability, and ability to handle mixed data types (numerical & categorical). Here are some key applications:

**1. Healthcare & Medical Diagnosis**
* Disease Diagnosis – Decision Trees help in diagnosing diseases based on symptoms (e.g., COVID-19, diabetes, cancer detection).
* Treatment Recommendation – Used to suggest treatments based on patient history and test results.

* Example:

A Decision Tree can classify patients as low-risk or high-risk based on age, blood pressure, and cholesterol levels.

**2. Finance & Banking**
* Credit Scoring & Loan Approval – Used by banks to assess whether a customer is eligible for a loan.
* Fraud Detection – Helps identify fraudulent transactions based on unusual spending behavior.

* Example:
A bank uses a Decision Tree to determine whether to approve a loan based on income, credit history, and debt-to-income ratio.

**3. Retail & E-Commerce**
* Customer Segmentation – Used to group customers based on purchasing behavior.
* Product Recommendation – Helps suggest relevant products to customers.
* Churn Prediction – Identifies customers likely to stop using a service.

* Example:
An e-commerce site uses a Decision Tree to predict if a customer will buy a product based on browsing history, cart abandonment, and past purchases.

**4. Manufacturing & Quality Control**
* Defect Detection – Helps in identifying defective products based on sensor readings and quality checks.
* Predictive Maintenance – Predicts when a machine is likely to fail to prevent breakdowns.

* Example:
A factory uses a Decision Tree to predict machine failures based on temperature, vibration, and pressure levels.

**5. Human Resources & Employee Performance**
* Employee Attrition Prediction – Identifies employees who may leave based on work satisfaction, salary, and workload.
* Hiring Decisions – Helps in recruitment by analyzing candidate skills, experience, and cultural fit.

* Example:
A company uses a Decision Tree to determine whether to promote an employee based on performance scores, years of experience, and training completion.

**6. Marketing & Advertising**
* Targeted Advertising – Determines the best audience for a marketing campaign.
* Lead Scoring – Identifies potential customers who are most likely to convert.

* Example:
An advertising company uses a Decision Tree to decide whether to show an ad based on user age, location, and browsing behavior.

**7. Criminal Justice & Law Enforcement**
* Crime Prediction – Predicts high-crime areas based on historical data.
* Legal Decision Support – Helps lawyers analyze case outcomes based on past rulings.

* Example:
A law firm uses a Decision Tree to predict the outcome of a case based on evidence, past verdicts, and witness testimonies.

**8. Energy & Utilities**
* Energy Consumption Forecasting – Predicts electricity usage to optimize supply.
* Smart Grid Optimization – Helps manage power distribution efficiently.

* Example:
An energy company uses a Decision Tree to predict peak electricity demand based on weather conditions and historical usage patterns.

**9. Education & Student Performance**
* Student Performance Prediction – Helps in identifying students who need additional support.
* Personalized Learning Paths – Suggests customized learning plans based on student progress.

* Example:
A school uses a Decision Tree to predict whether a student will pass based on attendance, homework completion, and past grades.

#Practical

##Q 16. Write a Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy.
**Ans** - Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Model Accuracy: {accuracy:.2f}")
plt.figure(figsize=(10,6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()

**Explanation:**
* Loads the Iris dataset (preloaded in sklearn).
* Splits the dataset into training (80%) and testing (20%) sets.
* Uses Gini Impurity as the criterion and max_depth=3 to limit overfitting.
* Trains a Decision Tree Classifier and makes predictions on the test set.
* Computes and prints the accuracy of the model.
* Plots the fitted tree with plot_tree.

##Q 17. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances.
**Ans** - Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

feature_importances = clf.feature_importances_
for feature, importance in zip(iris.feature_names, feature_importances):
    print(f"{feature}: {importance:.4f}")

# Create the figure once, outside the loop, then draw the fitted tree.
plt.figure(figsize=(10,6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()

**Explanation:**
* Loads the Iris dataset.
* Splits the data into 80% training and 20% testing.
* Trains a Decision Tree Classifier using Gini Impurity.
* Prints feature importances, showing which features contribute the most to classification.

**Example Output (Feature Importances May Vary):**

In [None]:
sepal length (cm): 0.0123
sepal width (cm): 0.0276
petal length (cm): 0.9213
petal width (cm): 0.0388

* Importances are normalized to sum to 1. A higher value means the feature accounted for a larger share of the total impurity reduction in the tree.
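
To make the ranking explicit, the scores can be sorted and their sum checked. This is a minimal sketch that reuses the same dataset, split, and model settings as the program above; the names `ranked` and `total_importance` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

importances = clf.feature_importances_

# Indices of the features ordered from most to least important.
order = np.argsort(importances)[::-1]
ranked = [(iris.feature_names[i], float(importances[i])) for i in order]

for name, score in ranked:
    print(f"{name}: {score:.4f}")

# The scores are normalized, so they add up to 1 (up to rounding).
total_importance = float(importances.sum())
print(f"Sum of importances: {total_importance:.4f}")
```

Sorting first makes it easier to see which features the tree barely used.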

##Q 18. Write a Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the model accuracy.
**Ans** - Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the model accuracy.

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Model Accuracy (Entropy): {accuracy:.2f}")

plt.figure(figsize=(10,6))
plot_tree(clf, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()

**Explanation:**
* Loads the Iris dataset (built-in in sklearn).
* Splits the dataset into training (80%) and testing (20%) sets.
* Uses Entropy as the criterion to split nodes.
* Trains a Decision Tree Classifier and makes predictions.
* Computes and prints the accuracy of the model.
* Plots the fitted tree with plot_tree.

##Q 19. Write a Python program to train a Decision Tree Regressor on a housing dataset and evaluate using Mean Squared Error (MSE).
**Ans** - Python program to train a Decision Tree Regressor on a housing dataset and evaluate its performance using Mean Squared Error (MSE).

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

housing = fetch_california_housing()
X = housing.data
y = housing.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

regressor = DecisionTreeRegressor(criterion="squared_error", max_depth=5, random_state=42)
regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f"Decision Tree Regressor MSE: {mse:.4f}")

plt.figure(figsize=(8,5))
plt.barh(housing.feature_names, regressor.feature_importances_)
plt.xlabel("Feature Importance")
plt.ylabel("Features")
plt.title("Decision Tree Regressor - Feature Importance")
plt.show()

**Explanation:**
* Loads the California housing dataset with fetch_california_housing (sklearn downloads it on first use and caches it locally).
* Splits the dataset into training (80%) and testing (20%) sets.
* Uses "squared_error" as the criterion (default in Scikit-Learn for regression).
* Trains a Decision Tree Regressor with max_depth=5 to prevent overfitting.
* Computes and prints the Mean Squared Error (MSE) to evaluate the model.
* Plots the feature importances of the fitted regressor as a horizontal bar chart.

##Q 20. Write a Python program to train a Decision Tree Classifier and visualize the tree using graphviz.
**Ans** - Python program to train a Decision Tree Classifier and visualize the tree using Graphviz.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import graphviz
import pydotplus
from IPython.display import Image

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Model Accuracy: {accuracy:.2f}")

dot_data = export_graphviz(clf, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           filled=True, rounded=True, special_characters=True)

graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())

**Explanation:**
* Loads the Iris dataset.
* Splits the data into 80% training and 20% testing.
* Trains a Decision Tree Classifier using Gini Impurity.
* Computes and prints model accuracy.
* Generates a visual representation of the tree using Graphviz & Pydotplus.
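
Rendering the image requires the Graphviz binaries to be installed. When they are not available, the same tree can be inspected as plain text. The sketch below assumes only scikit-learn and uses `export_text` from `sklearn.tree`; it is a text fallback, not a replacement for the Graphviz rendering above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Text rendering of the decision rules, one line per node.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is a threshold test on one feature, and each `class:` line is a leaf.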

##Q 21. Write a Python program to train a Decision Tree Classifier with a maximum depth of 3 and compare its accuracy with a fully grown tree.
**Ans** - Python program that trains a Decision Tree Classifier with a maximum depth of 3 and compares its accuracy with a fully grown tree (no depth restriction).

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_limited = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf_limited.fit(X_train, y_train)

clf_full = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_full.fit(X_train, y_train)

y_pred_limited = clf_limited.predict(X_test)
y_pred_full = clf_full.predict(X_test)

accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_full = accuracy_score(y_test, y_pred_full)

print(f"Decision Tree Accuracy (Max Depth = 3): {accuracy_limited:.2f}")
print(f"Decision Tree Accuracy (Fully Grown): {accuracy_full:.2f}")

plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plot_tree(clf_limited, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title("Decision Tree (Max Depth = 3)")

plt.subplot(1,2,2)
plot_tree(clf_full, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title("Fully Grown Decision Tree")

plt.show()

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Trains two Decision Tree models:
  * One with max_depth=3 (to prevent overfitting).
  * One fully grown (no depth restriction).
* Computes and prints accuracy for both models.
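
Accuracy alone does not show how much smaller the restricted tree is. As a rough sketch, the fitted estimators expose `get_depth()` and `get_n_leaves()`, which can be printed side by side; the variable names below are chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf_limited = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf_limited.fit(X_train, y_train)

clf_full = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_full.fit(X_train, y_train)

# Size of each fitted tree: depth and number of leaf nodes.
depth_limited = clf_limited.get_depth()
depth_full = clf_full.get_depth()
leaves_limited = clf_limited.get_n_leaves()
leaves_full = clf_full.get_n_leaves()

print(f"Max Depth = 3 -> depth: {depth_limited}, leaves: {leaves_limited}")
print(f"Fully Grown   -> depth: {depth_full}, leaves: {leaves_full}")
```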

##Q 22. Write a Python program to train a Decision Tree Classifier using min_samples_split=5 and compare its accuracy with a default tree.
**Ans** - Python program that trains a Decision Tree Classifier using min_samples_split=5 and compares its accuracy with a default decision tree.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_limited = DecisionTreeClassifier(criterion="gini", min_samples_split=5, random_state=42)
clf_limited.fit(X_train, y_train)

clf_default = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_default.fit(X_train, y_train)

y_pred_limited = clf_limited.predict(X_test)
y_pred_default = clf_default.predict(X_test)

accuracy_limited = accuracy_score(y_test, y_pred_limited)
accuracy_default = accuracy_score(y_test, y_pred_default)

print(f"Decision Tree Accuracy (min_samples_split = 5): {accuracy_limited:.2f}")
print(f"Decision Tree Accuracy (Default Settings): {accuracy_default:.2f}")

plt.figure(figsize=(12,6))
plt.subplot(1,2,1)
plot_tree(clf_limited, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title("Decision Tree (min_samples_split = 5)")

plt.subplot(1,2,2)
plot_tree(clf_default, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.title("Default Decision Tree")

plt.show()

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Trains two Decision Tree models:
  * One with min_samples_split=5 (requires at least 5 samples to split a node).
  * One with default parameters (min_samples_split=2 by default).
* Computes and prints accuracy for both models.

##Q 23. Write a Python program to apply feature scaling before training a Decision Tree Classifier and compare its accuracy with unscaled data.
**Ans** - Python program that applies feature scaling before training a Decision Tree Classifier and compares its accuracy with unscaled data.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_unscaled = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_unscaled.fit(X_train, y_train)
y_pred_unscaled = clf_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

clf_scaled = DecisionTreeClassifier(criterion="gini", random_state=42)
clf_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = clf_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

print(f"Decision Tree Accuracy (Unscaled Data): {accuracy_unscaled:.2f}")
print(f"Decision Tree Accuracy (Scaled Data): {accuracy_scaled:.2f}")

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Trains two Decision Tree models:
  * One on raw (unscaled) data.
  * One after Standard Scaling (mean=0, variance=1).
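* Computes and prints accuracy for both models.
* Because a tree splits on one feature threshold at a time, standardizing the features does not change which samples fall on each side of a split, so the two accuracies are expected to match.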
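
Tree splits compare a single feature against a threshold, so standardizing the features is not expected to change the predictions. The sketch below checks this directly by comparing the two models prediction by prediction; `agreement` is a name introduced here, and small differences from floating-point rounding at split boundaries are possible in principle.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Tree trained on the raw features.
tree_raw = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Tree trained on standardized features (scaler fitted on training data only).
scaler = StandardScaler().fit(X_train)
tree_scaled = DecisionTreeClassifier(random_state=42).fit(
    scaler.transform(X_train), y_train
)

pred_raw = tree_raw.predict(X_test)
pred_scaled = tree_scaled.predict(scaler.transform(X_test))

# Fraction of test samples on which the two trees give the same label.
agreement = float(np.mean(pred_raw == pred_scaled))
print(f"Prediction agreement: {agreement:.2f}")
```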

##Q 24. Write a Python program to train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass classification.
**Ans** - Python program to train a Decision Tree Classifier using the One-vs-Rest (OvR) strategy for multiclass classification.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

ovr_clf = OneVsRestClassifier(DecisionTreeClassifier(criterion="gini", random_state=42))
ovr_clf.fit(X_train, y_train)

y_pred = ovr_clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Classifier Accuracy (OvR): {accuracy:.2f}")

**Explanation:**
* Loads the Iris dataset (which has 3 classes).
* Splits data into 80% training and 20% testing.
* Uses One-vs-Rest (OvR) strategy, where:
  * Each class in turn is treated as the positive class, with all other classes grouped as negative (a binary problem).
  * A separate Decision Tree model is trained for each class, so Iris yields three binary trees.
* Computes and prints model accuracy.
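
A Decision Tree Classifier also handles multiclass targets directly, so it can be useful to see what the OvR wrapper adds. The sketch below counts the binary trees stored in `estimators_` and compares the wrapper with a single multiclass tree; the comparison is illustrative and the variable names are assumptions of this example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# OvR wrapper: one binary tree per class.
ovr_clf = OneVsRestClassifier(DecisionTreeClassifier(criterion="gini", random_state=42))
ovr_clf.fit(X_train, y_train)
n_binary_trees = len(ovr_clf.estimators_)

# Single tree trained on all three classes at once.
single_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
single_tree.fit(X_train, y_train)

acc_ovr = accuracy_score(y_test, ovr_clf.predict(X_test))
acc_single = accuracy_score(y_test, single_tree.predict(X_test))

print(f"Binary trees inside OvR: {n_binary_trees}")
print(f"Accuracy (OvR): {acc_ovr:.2f}")
print(f"Accuracy (single multiclass tree): {acc_single:.2f}")
```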

##Q 25. Write a Python program to train a Decision Tree Classifier and display the feature importance scores.
**Ans** - Python program to train a Decision Tree Classifier and display the feature importance scores.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Model Accuracy: {accuracy:.2f}")

feature_importances = clf.feature_importances_

for feature, importance in zip(iris.feature_names, feature_importances):
    print(f"{feature}: {importance:.4f}")

plt.figure(figsize=(8, 5))
plt.barh(iris.feature_names, feature_importances, color='skyblue')
plt.xlabel("Feature Importance Score")
plt.ylabel("Feature")
plt.title("Feature Importance in Decision Tree")
plt.show()

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Trains a Decision Tree Classifier using the Gini criterion.
* Computes and prints model accuracy.
* Extracts and prints feature importance scores.
* Visualizes feature importance using a bar chart.

**Example Output (Illustrative; the Program Prints Importances to Four Decimal Places and Values May Vary)**

In [None]:
Decision Tree Model Accuracy: 1.00
sepal length (cm): 0.02
sepal width (cm): 0.00
petal length (cm): 0.57
petal width (cm): 0.41

##Q 26. Write a Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance with an unrestricted tree.
**Ans** - Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance with an unrestricted tree using Mean Squared Error (MSE).

**Python Code**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

housing = fetch_california_housing()
X = housing.data
y = housing.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg_limited = DecisionTreeRegressor(max_depth=5, random_state=42)
reg_limited.fit(X_train, y_train)

reg_full = DecisionTreeRegressor(random_state=42)
reg_full.fit(X_train, y_train)

y_pred_limited = reg_limited.predict(X_test)
y_pred_full = reg_full.predict(X_test)

mse_limited = mean_squared_error(y_test, y_pred_limited)
mse_full = mean_squared_error(y_test, y_pred_full)

print(f"Decision Tree Regressor MSE (Max Depth = 5): {mse_limited:.4f}")
print(f"Decision Tree Regressor MSE (Fully Grown): {mse_full:.4f}")

plt.figure(figsize=(10, 5))

plt.scatter(y_test, y_pred_limited, color='blue', alpha=0.5, label='Max Depth = 5')
plt.scatter(y_test, y_pred_full, color='red', alpha=0.3, label='Fully Grown Tree')
plt.plot([0, 5], [0, 5], color='black', linestyle='--', linewidth=2)

plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Decision Tree Regression: Predicted vs Actual")
plt.legend()
plt.show()

**Explanation:**
* Loads the California Housing dataset (predicts median house value).
* Splits data into 80% training and 20% testing.
* Trains two Decision Tree Regressors:
  * One with max_depth=5 (to reduce overfitting).
  * One fully grown (no depth restriction).
* Computes and prints Mean Squared Error (MSE) for both models.
* Plots actual vs predicted values for better comparison.

##Q 27. Write a Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and visualize its effect on accuracy.
**Ans** - Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and visualize its effect on accuracy.

**Python Code**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)

ccp_path = clf_full.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = ccp_path.ccp_alphas[:-1]

accuracy_train = []
accuracy_test = []

for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    clf.fit(X_train, y_train)

    acc_train = accuracy_score(y_train, clf.predict(X_train))
    acc_test = accuracy_score(y_test, clf.predict(X_test))

    accuracy_train.append(acc_train)
    accuracy_test.append(acc_test)

plt.figure(figsize=(8, 5))
plt.plot(ccp_alphas, accuracy_train, marker="o", label="Training Accuracy", color="blue")
plt.plot(ccp_alphas, accuracy_test, marker="o", label="Testing Accuracy", color="red")
plt.xlabel("CCP Alpha")
plt.ylabel("Accuracy")
plt.title("Effect of Cost Complexity Pruning on Accuracy")
plt.legend()
plt.grid()
plt.show()

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Trains a fully grown Decision Tree to extract CCP alphas.
* Iterates over different pruning strengths (ccp_alpha):
  * Trains multiple trees.
  * Records training and testing accuracy.
* Plots accuracy vs. CCP alpha to visualize the impact of pruning.

**Expected Output (Graph Analysis):**
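* Low ccp_alpha (left side) → Little or no pruning: training accuracy is at or near 1.00, and testing accuracy may be lower if the tree overfits.
* Intermediate ccp_alpha (middle) → A simpler tree: testing accuracy usually holds steady or improves while training accuracy dips slightly.
* High ccp_alpha (right side) → Heavy pruning leads to underfitting, and both accuracies drop.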
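
Reading the best alpha off the test curve uses the test set for tuning. One alternative, sketched below under the same dataset and split, is to score each candidate alpha with 5-fold cross-validation on the training data and evaluate the chosen tree on the test set once. The clipping step and the names `cv_means` and `best_alpha` are choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree down to a single node) and guard
# against tiny negative values caused by floating-point rounding.
ccp_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Mean cross-validated accuracy for each candidate alpha (training data only).
cv_means = []
for alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    cv_means.append(scores.mean())

best_alpha = float(ccp_alphas[int(np.argmax(cv_means))])

# Refit with the selected alpha and evaluate once on the held-out test set.
final_clf = DecisionTreeClassifier(random_state=42, ccp_alpha=best_alpha)
final_clf.fit(X_train, y_train)
test_accuracy = final_clf.score(X_test, y_test)

print(f"Selected ccp_alpha: {best_alpha:.4f}")
print(f"Test accuracy: {test_accuracy:.2f}")
```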

##Q 28. Write a Python program to train a Decision Tree Classifier and evaluate its performance using Precision, Recall, and F1-Score.
**Ans** - Python program to train a Decision Tree Classifier and evaluate its performance using Precision, Recall, and F1-Score.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

**Explanation:**
* Loads the Iris dataset (multiclass classification).
* Splits data into 80% training and 20% testing.
* Trains a Decision Tree Classifier using the Gini impurity criterion.
* Computes key performance metrics:
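  * Precision → Of the samples predicted as a class, the fraction that actually belong to it.
  * Recall → Of the samples that actually belong to a class, the fraction the model predicted correctly.
  * F1-Score → Harmonic mean of precision and recall.
* Uses average='weighted', which averages the per-class scores weighted by each class's number of true samples (its support). An averaging mode is required because the Iris dataset is multiclass.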
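
To see what average='weighted' does, the weighted precision can be rebuilt by hand from the per-class values. This is a small sketch using the same model and split; `manual_weighted` and `library_weighted` are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# One precision value per class (classes 0, 1, 2).
per_class_precision = precision_score(y_test, y_pred, average=None)

# Support: number of true samples of each class in the test set.
support = np.bincount(y_test, minlength=3)

# Weighted average of the per-class values, using support as weights.
manual_weighted = float(np.average(per_class_precision, weights=support))
library_weighted = float(precision_score(y_test, y_pred, average="weighted"))

print(f"Per-class precision: {per_class_precision}")
print(f"Support per class: {support}")
print(f"Manual weighted precision: {manual_weighted:.4f}")
print(f"precision_score(average='weighted'): {library_weighted:.4f}")
```

The same construction applies to recall and F1-score.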

**Example Output (Values May Vary):**

In [None]:
Accuracy: 1.00
Precision: 1.00
Recall: 1.00
F1-Score: 1.00

##Q 29. Write a Python program to train a Decision Tree Classifier and visualize the confusion matrix using seaborn.
**Ans** - Python program to train a Decision Tree Classifier and visualize the Confusion Matrix using seaborn.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

conf_matrix = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Classifier Accuracy: {accuracy:.2f}")

plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.title("Confusion Matrix - Decision Tree Classifier")
plt.show()

**Explanation:**
* Loads the Iris dataset (3-class classification).
* Splits data into 80% training and 20% testing.
* Trains a Decision Tree Classifier using Gini impurity.
* Computes Confusion Matrix to analyze misclassifications.
* Visualizes the Confusion Matrix using seaborn.heatmap().

**Example Output (Illustrative; Values May Vary):**

In [None]:
Decision Tree Classifier Accuracy: 0.97

* Confusion Matrix Example (one versicolor sample predicted as virginica, so 29 of 30 test samples are correct):

In [None]:
[[10  0  0]
 [ 0  9  1]
 [ 0  0 10]]
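
Accuracy and the per-class metrics can be read directly from a confusion matrix. The sketch below uses the example matrix shown above (rows are true labels, columns are predicted labels) and plain NumPy arithmetic.

```python
import numpy as np

# Example matrix from above: rows = true labels, columns = predicted labels.
conf_matrix = np.array([[10, 0, 0],
                        [0, 9, 1],
                        [0, 0, 10]])

# Correct predictions sit on the diagonal.
accuracy = np.trace(conf_matrix) / conf_matrix.sum()

# Recall: correct predictions divided by the row total (true samples per class).
recall_per_class = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

# Precision: correct predictions divided by the column total (predictions per class).
precision_per_class = np.diag(conf_matrix) / conf_matrix.sum(axis=0)

print(f"Accuracy: {accuracy:.4f}")  # 29 / 30 -> 0.9667
print(f"Recall per class: {recall_per_class}")
print(f"Precision per class: {precision_per_class}")
```

A single off-diagonal entry lowers the recall of its row's class and the precision of its column's class.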

##Q 30. Write a Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values for max_depth and min_samples_split.
**Ans** - Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values for max_depth and min_samples_split.

**Python Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = DecisionTreeClassifier(random_state=42)

param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_
best_params = grid_search.best_params_

y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Best Parameters: {best_params}")
print(f"Best Decision Tree Classifier Accuracy: {accuracy:.2f}")

**Explanation:**
* Loads the Iris dataset.
* Splits data into 80% training and 20% testing.
* Defines a Decision Tree Classifier.
* Uses GridSearchCV to optimize:
  * max_depth (limits tree depth).
  * min_samples_split (controls node splitting).
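* Uses 5-fold cross-validation (cv=5) on the training set to score each parameter combination.
* GridSearchCV refits the best combination on the full training set (best_estimator_), which is then evaluated on the test set.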
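
The search object keeps more than the best parameters. As a brief sketch with the same grid, `best_score_` gives the mean cross-validated accuracy of the winning combination and `cv_results_` lists every candidate; the variable names are chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}

grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,
)
grid_search.fit(X_train, y_train)

# 4 depths x 3 split sizes = 12 candidate combinations.
n_candidates = len(grid_search.cv_results_["params"])

# Mean cross-validated accuracy of the best combination (training data only).
best_cv_score = grid_search.best_score_

# Accuracy of the refitted best model on the held-out test set.
test_accuracy = grid_search.score(X_test, y_test)

print(f"Candidates evaluated: {n_candidates}")
print(f"Best CV accuracy: {best_cv_score:.2f}")
print(f"Test accuracy: {test_accuracy:.2f}")
```

Comparing the cross-validated score with the test score gives a rough check that the tuning did not overfit the training folds.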

**Example Output (Values May Vary)**

In [None]:
Best Parameters: {'max_depth': 3, 'min_samples_split': 2}
Best Decision Tree Classifier Accuracy: 1.00