# **Theory Questions**

1. What is a Decision Tree, and how does it work?

>A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It works by recursively splitting the data into subsets based on feature values, forming a tree structure where each internal node represents a decision rule, and each leaf node represents a class label (classification) or a numerical value (regression).


2. What are impurity measures in Decision Trees?

>Impurity measures quantify the heterogeneity of a dataset. Common impurity measures include:

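>Gini Impurity: The probability that a randomly chosen sample from a node would be misclassified if it were labeled at random according to the node's class distribution.

>Entropy: Measures the amount of disorder (uncertainty) in the class distribution of a node.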



3. What is the mathematical formula for Gini Impurity?

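>$Gini = 1 - \sum_{i=1}^{C} p_i^2$, where $C$ is the number of classes and $p_i$ is the proportion of samples in the node that belong to class $i$.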
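
>As a quick check of the formula, here is a minimal sketch in plain Python. The helper name `gini_impurity` is illustrative and not part of any library.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["a", "a", "b", "b"]))  # two balanced classes -> 0.5
print(gini_impurity(["a", "a", "a", "a"]))  # pure node -> 0.0
```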



4. What is the mathematical formula for Entropy?

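>$Entropy = - \sum_{i=1}^{C} p_i \log_2(p_i)$, where $p_i$ is the proportion of samples belonging to class $i$ and terms with $p_i = 0$ are taken to be $0$.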
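
>The same formula can be sketched in a few lines of Python. The function name `entropy` is an illustrative choice, and the code uses the equivalent form $p_i \log_2(1/p_i)$.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    # p * log2(1/p) equals -p * log2(p); only classes that occur are counted,
    # so log2(0) never arises
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

print(entropy([0, 0, 1, 1]))  # two balanced classes -> 1.0
print(entropy([0, 0, 0, 0]))  # pure node -> 0.0
```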


5. What is Information Gain, and how is it used in Decision Trees?

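>Information Gain (IG) measures the reduction in entropy after a dataset is split on a feature. At each node, the tree picks the split with the highest IG.

>$IG = Entropy(S) - \sum_{i=1}^{k} \frac{|S_i|}{|S|} Entropy(S_i)$, where $S$ is the set of samples in the parent node and $S_1, \dots, S_k$ are the subsets produced by the split.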
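
>A small worked example may help here. The sketch below computes IG for a perfect split and for an uninformative one; `entropy` and `information_gain` are hypothetical helper names, and the labels are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # perfect split -> 1.0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # uninformative split -> 0.0
```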



6. What is the difference between Gini Impurity and Entropy?

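>Gini is slightly cheaper to compute since it doesn't require logarithms.

>Entropy comes from information theory and is measured in bits when $\log_2$ is used. For two classes, Gini ranges from 0 to 0.5 and entropy from 0 to 1. In practice the two criteria usually produce very similar trees.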



7. What is the mathematical explanation behind Decision Trees?

>Decision Trees recursively split the dataset using criteria like Gini or Entropy. The optimal split is determined by maximizing Information Gain or minimizing impurity. Recursive splitting stops based on stopping criteria like maximum depth, minimum samples per leaf, or pure nodes.
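
>The split search at a single node can be sketched as follows. This is a simplified illustration for one numeric feature using Gini impurity, not the implementation of any particular library; `gini` and `best_threshold` are hypothetical names.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split x <= threshold."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best

print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 0.0)
```

>A full tree builder would repeat this search over every feature, split on the best one, and recurse into the two child nodes until a stopping criterion is met.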


8. What is Pre-Pruning in Decision Trees?

>Pre-Pruning stops tree growth early by setting constraints like max_depth, min_samples_split, or min_samples_leaf.
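
>A minimal scikit-learn sketch of these constraints, assuming the Iris dataset; the specific parameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pre_pruned = DecisionTreeClassifier(
    max_depth=2,           # never grow deeper than 2 levels
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf keeps at least 5 samples
    random_state=0,
).fit(X, y)

print("Full tree depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pre-pruned depth:", pre_pruned.get_depth(), "leaves:", pre_pruned.get_n_leaves())
```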


9. What is Post-Pruning in Decision Trees?

>Post-Pruning grows a full tree first and then removes branches that do not improve accuracy using techniques like Cost Complexity Pruning (CCP).


10. What is the difference between Pre-Pruning and Post-Pruning?

>Pre-Pruning prevents overfitting early but might underfit.

>Post-Pruning builds a full tree first and then simplifies it.



11. What is a Decision Tree Regressor?

>A Decision Tree Regressor predicts continuous values by averaging the target variable in leaf nodes.
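
>To see the averaging directly, the sketch below fits a depth-1 regressor on made-up data and compares each leaf's prediction with the mean of the training targets that fall into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data, invented for illustration
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])

# A depth-1 tree makes a single split, giving two leaves
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y)

leaf_ids = stump.apply(X)  # index of the leaf each training sample falls into
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print("leaf", leaf, "mean target:", y[in_leaf].mean(),
          "prediction:", stump.predict(X[in_leaf])[0])
```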


12. What are the advantages and disadvantages of Decision Trees?

>Advantages:

>* Easy to interpret

>* Handles both numerical and categorical data

>* Requires little data preprocessing

>Disadvantages:

>* Prone to overfitting

>* Sensitive to small data changes

>* Can be biased if dataset is imbalanced



13. How does a Decision Tree handle missing values?

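>Surrogate splits: some implementations (classic CART, for example) fall back on an alternative feature whose split closely mimics the primary one when the primary feature is missing.

>Imputation: missing values are filled in before training, typically with the mean or median for numerical features and the mode for categorical ones.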
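
>The imputation route can be sketched with scikit-learn's `SimpleImputer` in a pipeline. The data below is made up, and mean imputation is just one possible strategy.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data with missing entries (np.nan); values are invented for illustration
X = np.array([[1.0, 10.0], [2.0, np.nan], [np.nan, 30.0], [4.0, 40.0]])
y = np.array([0, 0, 1, 1])

# Replace each NaN with its column mean, then fit the tree on the completed data
imputing_tree = make_pipeline(
    SimpleImputer(strategy="mean"),
    DecisionTreeClassifier(random_state=0),
)
imputing_tree.fit(X, y)

# The same imputation is applied automatically at prediction time
print(imputing_tree.predict([[np.nan, 35.0]]))
```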



14. How does a Decision Tree handle categorical features?

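>Uses one-hot encoding or label (ordinal) encoding to convert categories to numbers, which implementations such as scikit-learn require.

>Implementations with native categorical support split directly on the unique category values.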
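
>A minimal one-hot encoding sketch with scikit-learn follows. The `colors` feature and its labels are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy categorical feature; the values are invented for illustration
colors = np.array([["red"], ["green"], ["blue"], ["green"], ["red"], ["blue"]])
y = np.array([1, 0, 0, 0, 1, 0])

encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(colors).toarray()  # one 0/1 column per category

print(encoder.categories_[0])  # categories in sorted order
print(X_encoded[0])            # encoding of "red"

clf = DecisionTreeClassifier(random_state=0).fit(X_encoded, y)
print(clf.predict(encoder.transform([["red"]]).toarray()))
```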



15. What are some real-world applications of Decision Trees?

>Credit risk assessment

>Medical diagnosis

>Fraud detection

>Customer segmentation

>Spam filtering



# **Practical Questions**

16.  Write a Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy.

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0


17. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the
feature importances.

In [2]:
model = DecisionTreeClassifier(criterion="gini")
model.fit(X_train, y_train)

print("Feature Importances:", model.feature_importances_)


Feature Importances: [0.03334028 0.         0.88947325 0.07718647]


18. Write a Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the
model accuracy.

In [23]:
model = DecisionTreeClassifier(criterion="entropy")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 1.0


19. Write a Python program to train a Decision Tree Regressor on a housing dataset and evaluate using Mean
Squared Error.


In [3]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
# Housing-specific names keep the Iris split (X_train, y_train, ...) intact for later cells
X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)

regressor = DecisionTreeRegressor()
regressor.fit(X_train_h, y_train_h)
y_pred_h = regressor.predict(X_test_h)

print("MSE:", mean_squared_error(y_test_h, y_pred_h))


MSE: 0.5010536440253875


20. Write a Python program to train a Decision Tree Classifier and visualize the tree using graphviz.

In [22]:
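from sklearn.tree import export_graphviz
import graphviz

# Fit a classifier on the Iris data so the feature and class names below match the model
tree_clf = DecisionTreeClassifier(random_state=42).fit(iris.data, iris.target)
dot_data = export_graphviz(tree_clf, out_file=None, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
graph = graphviz.Source(dot_data)
graph.view()  # renders the tree to a PDF and opens it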


'Source.gv.pdf'

21. Write a Python program to train a Decision Tree Classifier with a maximum depth of 3 and compare its
accuracy with a fully grown tree.

In [21]:
model_restricted = DecisionTreeClassifier(max_depth=3)
model_restricted.fit(X_train, y_train)
print("Restricted Accuracy:", accuracy_score(y_test, model_restricted.predict(X_test)))


Restricted Accuracy: 1.0
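
The question also asks for a comparison with a fully grown tree. A self-contained sketch of that comparison, assuming the same Iris split as above (the names `X_tr`, `X_te`, and `depth_results` are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

depth_results = {}
for name, depth in [("Restricted (max_depth=3)", 3), ("Fully grown", None)]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    acc = accuracy_score(y_te, tree.predict(X_te))
    depth_results[name] = (tree.get_depth(), acc)
    print(f"{name}: depth={tree.get_depth()}, test accuracy={acc:.4f}")
```

On a dataset as small and clean as Iris the two accuracies may well coincide, so a gap is more likely to show up on noisier data.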


22. Write a Python program to train a Decision Tree Classifier using min_samples_split=5 and compare its
accuracy with a default tree.

In [20]:
model = DecisionTreeClassifier(min_samples_split=5)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))


Accuracy: 1.0
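
To complete the comparison with a default tree, the sketch below trains both models on the same Iris split and reports tree size next to accuracy. The variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

default_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
split5_tree = DecisionTreeClassifier(min_samples_split=5, random_state=42).fit(X_tr, y_tr)

acc_default = accuracy_score(y_te, default_tree.predict(X_te))
acc_split5 = accuracy_score(y_te, split5_tree.predict(X_te))

# node_count shows how much the constraint shrinks the tree
print(f"Default tree:        nodes={default_tree.tree_.node_count}, accuracy={acc_default:.4f}")
print(f"min_samples_split=5: nodes={split5_tree.tree_.node_count}, accuracy={acc_split5:.4f}")
```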


23. Write a Python program to apply feature scaling before training a Decision Tree Classifier and compare its
accuracy with unscaled data.

In [19]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = DecisionTreeClassifier()
model.fit(X_train_scaled, y_train)
print("Accuracy with Scaling:", accuracy_score(y_test, model.predict(X_test_scaled)))


Accuracy with Scaling: 1.0
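
For the comparison with unscaled data, a side-by-side sketch on the same split is shown below. Because a tree splits on per-feature thresholds, standardizing the features is not expected to change the result much. The variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only to avoid leaking test statistics
iris_scaler = StandardScaler().fit(X_tr)

raw_tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
scaled_tree = DecisionTreeClassifier(random_state=42).fit(iris_scaler.transform(X_tr), y_tr)

acc_raw = accuracy_score(y_te, raw_tree.predict(X_te))
acc_scaled = accuracy_score(y_te, scaled_tree.predict(iris_scaler.transform(X_te)))
print(f"Unscaled accuracy: {acc_raw:.4f}")
print(f"Scaled accuracy:   {acc_scaled:.4f}")
```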


24. Write a Python program to train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass
classification.

In [18]:
from sklearn.multiclass import OneVsRestClassifier

model = OneVsRestClassifier(DecisionTreeClassifier())
model.fit(X_train, y_train)
print("OvR Accuracy:", accuracy_score(y_test, model.predict(X_test)))


OvR Accuracy: 1.0


25. Write a Python program to train a Decision Tree Classifier and display the feature importance scores.

In [17]:
model = DecisionTreeClassifier().fit(X_train, y_train)  # plain tree; the OvR wrapper has no feature_importances_
print("Feature Importances:", model.feature_importances_)


26. Write a Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance
with an unrestricted tree.

In [15]:
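# Re-split the housing data under its own names so the Iris split is left untouched
Xh_train, Xh_test, yh_train, yh_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
for depth in (5, None):  # None lets the tree grow without a depth limit
    reg = DecisionTreeRegressor(max_depth=depth, random_state=42).fit(Xh_train, yh_train)
    print(f"max_depth={depth} MSE:", mean_squared_error(yh_test, reg.predict(Xh_test)))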


27. Write a Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and
visualize its effect on accuracy.

In [16]:
# Compute the pruning path with a classifier on the Iris training split
clf = DecisionTreeClassifier(random_state=42)
path = clf.cost_complexity_pruning_path(X_train, y_train)
print("CCP Alphas:", path.ccp_alphas)
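
To visualize the effect on accuracy, one option is to refit a tree for each candidate alpha and plot train and test accuracy against it. The following is a self-contained sketch on the Iris data; the variable names and plot styling are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate alphas come from the pruning path of an unpruned tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_tr, y_tr)
alphas = path.ccp_alphas

train_acc, test_acc = [], []
for alpha in alphas:
    # max(...) guards against tiny negative values caused by floating-point round-off
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=max(alpha, 0.0)).fit(X_tr, y_tr)
    train_acc.append(accuracy_score(y_tr, pruned.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, pruned.predict(X_te)))

plt.plot(alphas, train_acc, marker="o", drawstyle="steps-post", label="train")
plt.plot(alphas, test_acc, marker="o", drawstyle="steps-post", label="test")
plt.xlabel("ccp_alpha")
plt.ylabel("Accuracy")
plt.title("Accuracy vs. ccp_alpha")
plt.legend()
plt.show()
```

Larger alphas prune more aggressively, and the largest alpha on the path collapses the tree to a single node, so training accuracy falls as alpha grows.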


28. Write a Python program to train a Decision Tree Classifier and evaluate its performance using Precision,
Recall, and F1-Score.


In [14]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train Decision Tree Classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate using Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


Precision: 1.0000
Recall: 1.0000
F1-Score: 1.0000


29. Write a Python program to train a Decision Tree Classifier and visualize the confusion matrix using seaborn.

In [13]:
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

30. Write a Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values
for max_depth and min_samples_split.

In [12]:
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'max_depth': [3, 5, 10, None],
    'min_samples_split': [2, 5, 10]
}

# Perform Grid Search
grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Print best parameters and accuracy
print("Best Parameters:", grid_search.best_params_)
print("Best Accuracy:", grid_search.best_score_)