# Theoretical


1. What is a Decision Tree, and how does it work?
- A Decision Tree is a flowchart-like structure used for classification and regression tasks. It recursively splits the dataset into subsets based on the values of input features. Each internal node tests a feature, each branch represents an outcome of that test (a decision rule), and each leaf holds a prediction.
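
To make the flowchart picture concrete, here is a minimal sketch that fits a shallow tree and prints its decision rules as text. It assumes scikit-learn is installed; the depth limit of 2 and the name `demo_tree` are illustrative choices, not part of the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a deliberately shallow tree so the printed rules stay readable
iris = load_iris()
demo_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_tree.fit(iris.data, iris.target)

# Each indented line is a node test; lines starting with "class:" are leaves
rules = export_text(demo_tree, feature_names=iris.feature_names)
print(rules)
```

Reading the output from top to bottom follows one path through the flowchart: a feature test at each node, then a class label at the leaf.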

2. What are impurity measures in Decision Trees?
- Impurity measures quantify how mixed the class labels are within a node. Common measures are Gini Impurity and Entropy; at each node, the tree picks the split that reduces impurity the most.

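3. What is the mathematical formula for Gini Impurity?
- $\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2$, where $p_i$ is the proportion of samples in the node that belong to class $i$ and $n$ is the number of classes.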


4. What is the mathematical formula for Entropy?
- $\text{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i)$, where $p_i$ is the proportion of samples in the node that belong to class $i$.
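
Both impurity formulas can be computed directly from class counts. The following is a minimal sketch using only the standard library; the function names `class_proportions`, `gini_impurity`, and `entropy` are illustrative, not taken from any library.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the proportion p_i of each class present in `labels`."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2)."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))


def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)), measured in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


balanced = ["yes", "yes", "no", "no"]  # maximally mixed for two classes
skewed = ["yes", "yes", "yes", "no"]   # mostly one class

print("Gini:", gini_impurity(balanced), gini_impurity(skewed))
print("Entropy:", entropy(balanced), entropy(skewed))
```

Both measures are largest for the balanced node and shrink as one class starts to dominate.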

5. What is Information Gain, and how is it used in Decision Trees?
- Information Gain is the reduction in entropy after a dataset is split on an attribute. It's used to select the feature that best splits the data:

- $\text{Information Gain} = \text{Entropy}(\text{parent}) - \sum_{i} \frac{|\text{child}_i|}{|\text{parent}|} \times \text{Entropy}(\text{child}_i)$
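
As a worked illustration of the Information Gain formula, the sketch below compares a split that separates the classes perfectly with one that does not help at all. The helper names and the toy labels are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # children are as mixed as the parent

print("Perfect split gain:", information_gain(parent, perfect_split))
print("Useless split gain:", information_gain(parent, useless_split))
```

The perfect split recovers the full 1 bit of parent entropy, while the useless split gains nothing, which is why a tree would prefer the first.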

6. What is the difference between Gini Impurity and Entropy?
- Both measure impurity. Gini is slightly cheaper to compute because it avoids the logarithm. Entropy is sometimes said to produce slightly more balanced trees, but in practice the two criteria usually yield very similar splits.

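7. What is the mathematical explanation behind Decision Trees?
- Decision Trees are built by a greedy, recursive algorithm (such as ID3 or CART). At each node, the algorithm evaluates candidate splits, picks the one that most reduces an impurity criterion such as Gini or Entropy, and then repeats the process on each child, building the tree from the top down until a stopping condition is met.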
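
One step of this greedy search can be sketched as follows: for a single numeric feature, try every midpoint between adjacent distinct values and keep the threshold with the lowest weighted Gini impurity. This is a simplified illustration in the spirit of CART, not the implementation used by any particular library; `best_threshold` is a hypothetical helper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split of one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the impurity of the unsplit node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A full tree builder would run this search over every feature, split on the winner, and recurse into the two resulting subsets.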

8. What is Pre-Pruning in Decision Trees?
- Pre-pruning stops tree growth early by setting limits (e.g., max_depth, min_samples_split) to prevent overfitting.

9. What is Post-Pruning in Decision Trees?
- Post-pruning lets the tree grow fully, then removes branches that add little predictive value (for example, judged by validation set performance or a cost-complexity penalty) to improve generalization.

10. What is the difference between Pre-Pruning and Post-Pruning?
- Pre-pruning limits growth during training by stopping splits early.
- Post-pruning removes branches after the full tree has been grown.

11. What is a Decision Tree Regressor?
- A Decision Tree Regressor is a variant used for regression tasks, where the output is a continuous value. Each leaf typically predicts the mean of the training targets that fall into it.

12. What are the advantages and disadvantages of Decision Trees?
- Advantages: Easy to interpret, handles non-linear relationships, requires little data preprocessing.
- Disadvantages: Prone to overfitting, and unstable: small changes in the data can produce a very different tree.

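13. How does a Decision Tree handle missing values?
- Depending on the implementation, a Decision Tree may skip samples with missing values when evaluating a split, use surrogate splits (backup splits on other features), or rely on the values being imputed before training.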
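
The imputation route can be sketched with a scikit-learn pipeline. This assumes scikit-learn and NumPy are installed; the toy data and the choice of median imputation are illustrative. Some newer scikit-learn releases can also accept NaN values in trees directly, so imputation is one option rather than a requirement.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data with missing entries marked as NaN (made up for this example)
X = np.array([[1.0, 5.0], [2.0, np.nan], [np.nan, 1.0],
              [8.0, 9.0], [9.0, np.nan], [10.0, 8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Replace each NaN with its column median, then fit the tree on the filled data
model = make_pipeline(SimpleImputer(strategy="median"),
                      DecisionTreeClassifier(random_state=0))
model.fit(X, y)

# The same imputation is applied automatically at prediction time
predictions = model.predict(np.array([[np.nan, 7.0], [1.5, np.nan]]))
print(predictions)
```

Putting the imputer inside the pipeline keeps the fill values learned from the training data only, which avoids leaking test information.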

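14. How does a Decision Tree handle categorical features?
- Some implementations split categorical features directly with equality-based rules (e.g., feature == value). Others, such as scikit-learn, require numeric input, so categories are first converted with label encoding or one-hot encoding.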
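
A minimal sketch of the one-hot encoding route, assuming scikit-learn and NumPy are installed. The colour data and the name `color_clf` are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# One categorical feature; the label is 1 only for "red"
colors = np.array([["red"], ["green"], ["blue"], ["red"], ["green"], ["blue"]])
y = np.array([1, 0, 0, 1, 0, 0])

# Turn each category into its own 0/1 column
encoder = OneHotEncoder(handle_unknown="ignore")
X_encoded = encoder.fit_transform(colors).toarray()

color_clf = DecisionTreeClassifier(random_state=0).fit(X_encoded, y)

# New data must go through the same fitted encoder before prediction
new_colors = encoder.transform([["red"], ["blue"]]).toarray()
predicted = color_clf.predict(new_colors)
print(encoder.categories_)
print(predicted)
```

After encoding, a split on the "red" column plays the role of the equality rule feature == "red".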

15. What are some real-world applications of Decision Trees?
- Medical diagnosis, customer segmentation, credit scoring, fraud detection, etc.

# Practical

16. Write a Python program to train a Decision Tree Classifier on the Iris dataset and print the model accuracy

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Load the data and hold out 30% of it for testing
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

    # Train the classifier (random_state makes tie-breaking between splits reproducible)
    clf = DecisionTreeClassifier(random_state=42)
    clf.fit(X_train, y_train)

    # Evaluate on the held-out test set
    y_pred = clf.predict(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))


17. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances

    # Reuses the Iris train/test split created in question 16
    clf = DecisionTreeClassifier(criterion='gini')
    clf.fit(X_train, y_train)
    print("Feature Importances:", clf.feature_importances_)




18. Write a Python program to train a Decision Tree Classifier using Entropy as the splitting criterion and print the model accuracy

    # Reuses the Iris train/test split created in question 16
    clf = DecisionTreeClassifier(criterion='entropy')
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("Accuracy (Entropy):", accuracy_score(y_test, y_pred))


19. Write a Python program to train a Decision Tree Regressor on a housing dataset and evaluate using Mean Squared Error (MSE)

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import mean_squared_error

    # Separate variable names (suffix _h) keep the Iris split used by the classifier examples from being overwritten
    data = fetch_california_housing()
    X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(data.data, data.target, test_size=0.3, random_state=42)

    reg = DecisionTreeRegressor(random_state=42)
    reg.fit(X_train_h, y_train_h)
    y_pred_h = reg.predict(X_test_h)
    print("MSE:", mean_squared_error(y_test_h, y_pred_h))


20. Write a Python program to train a Decision Tree Classifier and visualize the tree using graphviz

    from sklearn.tree import export_graphviz
    import graphviz  # the Python package also needs the Graphviz system binaries installed

    # Export the Iris classifier trained above to DOT format and render it
    dot_data = export_graphviz(clf, out_file=None, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
    graph = graphviz.Source(dot_data)
    graph.render("decision_tree")  # Saves as decision_tree.pdf


21. Write a Python program to train a Decision Tree Classifier with a maximum depth of 3 and compare its accuracy with a fully grown tree

    # Same random_state for both trees so that only max_depth differs
    clf_full = DecisionTreeClassifier(random_state=42)
    clf_limited = DecisionTreeClassifier(max_depth=3, random_state=42)

    clf_full.fit(X_train, y_train)
    clf_limited.fit(X_train, y_train)

    print("Full Tree Accuracy:", accuracy_score(y_test, clf_full.predict(X_test)))
    print("Limited Tree Accuracy:", accuracy_score(y_test, clf_limited.predict(X_test)))



22. Write a Python program to train a Decision Tree Classifier using min_samples_split=5 and compare its accuracy with a default tree

    # Same random_state for both trees so that only min_samples_split differs
    clf_default = DecisionTreeClassifier(random_state=42)
    clf_modified = DecisionTreeClassifier(min_samples_split=5, random_state=42)

    clf_default.fit(X_train, y_train)
    clf_modified.fit(X_train, y_train)

    print("Default Tree Accuracy:", accuracy_score(y_test, clf_default.predict(X_test)))
    print("min_samples_split=5 Accuracy:", accuracy_score(y_test, clf_modified.predict(X_test)))


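23. Write a Python program to apply feature scaling before training a Decision Tree Classifier and compare its accuracy with unscaled data

    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training data only, then apply it to both splits
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train identically configured trees on scaled and unscaled data for a fair comparison
    clf_scaled = DecisionTreeClassifier(random_state=42).fit(X_train_scaled, y_train)
    clf_unscaled = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

    # Tree splits depend only on the ordering of feature values, so scaling should leave accuracy essentially unchanged
    print("Scaled Data Accuracy:", accuracy_score(y_test, clf_scaled.predict(X_test_scaled)))
    print("Unscaled Data Accuracy:", accuracy_score(y_test, clf_unscaled.predict(X_test)))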


24. Write a Python program to train a Decision Tree Classifier using One-vs-Rest (OvR) strategy for multiclass classification

    from sklearn.multiclass import OneVsRestClassifier

    # Wrap the tree so that one binary classifier is trained per class
    ovr_clf = OneVsRestClassifier(DecisionTreeClassifier())
    ovr_clf.fit(X_train, y_train)
    print("OvR Accuracy:", ovr_clf.score(X_test, y_test))


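25. Write a Python program to train a Decision Tree Classifier and display the feature importance scores

    # Pair each importance score with its feature name (uses the Iris classifier trained above)
    for name, importance in zip(iris.feature_names, clf.feature_importances_):
        print(f"{name}: {importance:.3f}")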


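26. Write a Python program to train a Decision Tree Regressor with max_depth=5 and compare its performance with an unrestricted tree

    # Rebuild the California housing split under its own names so the Iris variables stay untouched
    housing = fetch_california_housing()
    X_train_h, X_test_h, y_train_h, y_test_h = train_test_split(housing.data, housing.target, test_size=0.3, random_state=42)

    reg_full = DecisionTreeRegressor(random_state=42)
    reg_limited = DecisionTreeRegressor(max_depth=5, random_state=42)

    reg_full.fit(X_train_h, y_train_h)
    reg_limited.fit(X_train_h, y_train_h)

    print("Full Tree MSE:", mean_squared_error(y_test_h, reg_full.predict(X_test_h)))
    print("max_depth=5 MSE:", mean_squared_error(y_test_h, reg_limited.predict(X_test_h)))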


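27. Write a Python program to train a Decision Tree Classifier, apply Cost Complexity Pruning (CCP), and visualize its effect on accuracy

    import matplotlib.pyplot as plt

    # Candidate alpha values along the pruning path of a tree fit to the training data
    path = clf.cost_complexity_pruning_path(X_train, y_train)
    ccp_alphas = path.ccp_alphas

    # Train one pruned tree per alpha and record its test accuracy
    accuracies = []
    for ccp_alpha in ccp_alphas:
        clf_pruned = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=42).fit(X_train, y_train)
        accuracies.append(accuracy_score(y_test, clf_pruned.predict(X_test)))
        print(f"Alpha: {ccp_alpha:.4f}, Accuracy: {accuracies[-1]:.4f}")

    # Larger alpha means heavier pruning
    plt.plot(ccp_alphas, accuracies, marker="o", drawstyle="steps-post")
    plt.xlabel("ccp_alpha")
    plt.ylabel("Test accuracy")
    plt.show()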


28. Write a Python program to train a Decision Tree Classifier and evaluate its performance using Precision, Recall, and F1-Score

    from sklearn.metrics import classification_report

    # Per-class precision, recall, and F1-score on the test set
    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred, target_names=iris.target_names))


29. Write a Python program to train a Decision Tree Classifier and visualize the confusion matrix using seaborn

    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.metrics import confusion_matrix

    # Rows are actual classes, columns are predicted classes
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()


30. Write a Python program to train a Decision Tree Classifier and use GridSearchCV to find the optimal values for max_depth and min_samples_split.

    from sklearn.model_selection import GridSearchCV

    # Try every combination of the listed values with 5-fold cross-validation
    param_grid = {'max_depth': [2, 3, 4, 5, None], 'min_samples_split': [2, 5, 10]}
    grid = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
    grid.fit(X_train, y_train)

    # best_score_ is the mean cross-validated accuracy of the best combination
    print("Best Parameters:", grid.best_params_)
    print("Best Score:", grid.best_score_)
