# Decision Trees: Explanation

## 📚 Theory

A **decision tree** is a supervised learning algorithm used for both classification and regression tasks. It works by splitting data into subsets based on feature values, creating a tree-like structure.

---

## Key Concepts:

### How Decision Trees Work:
1. The **root node** represents the entire dataset.
2. **Internal nodes** represent decisions based on feature values.
3. **Leaf nodes** represent the final output or prediction.
4. At each split, the algorithm picks the feature and threshold that make the resulting subsets as **homogeneous** (pure) as possible.
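
The structure described above can be sketched in plain Python as a nested dictionary, with a small function that walks from the root to a leaf. The feature names (`humidity`, `wind_speed`), thresholds, and labels are hypothetical and chosen only for illustration; this is not how Scikit-learn stores trees internally.

```python
# A hand-written toy tree: internal nodes test one feature against a threshold,
# leaf nodes carry the prediction. All names and values are made up.
toy_tree = {
    "feature": "humidity",                # root node: sees every sample
    "threshold": 70,
    "left": {"label": "play"},            # leaf: humidity <= 70
    "right": {                            # internal node: humidity > 70
        "feature": "wind_speed",
        "threshold": 20,
        "left": {"label": "play"},        # leaf: wind_speed <= 20
        "right": {"label": "stay home"},  # leaf: wind_speed > 20
    },
}


def predict(node, sample):
    """Walk from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]


print(predict(toy_tree, {"humidity": 65, "wind_speed": 30}))
print(predict(toy_tree, {"humidity": 85, "wind_speed": 30}))
```

A trained tree works the same way at prediction time; training is the process of choosing the features and thresholds automatically.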

---

### Gini Index:
The Gini Index measures the **impurity** (class diversity) of the samples at a node. It is calculated as:

**Gini = 1 − Σᵢ (pᵢ)²**

Where:
- **pᵢ** is the proportion of samples at the node that belong to class *i*.
- A Gini Index of **0** indicates perfect purity (all samples belong to one class).
- The value grows as classes become more evenly mixed; for two classes the maximum is **0.5**.
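
As a quick check of the formula, here is a minimal sketch that computes the Gini Index from a list of class labels. The helper name `gini` and the example labels are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0  # convention chosen here for an empty node
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


print(gini(["yes", "yes", "yes", "yes"]))  # pure node
print(gini(["yes", "yes", "no", "no"]))    # evenly mixed two-class node
```

The pure node gives 0 and the evenly mixed node gives 0.5, matching the two extremes for a two-class problem.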

---

### Information Gain (IG):
**Information Gain** measures how much information is gained by making a split. It is calculated as:

**IG = Entropy(parent) − Σᵢ (|childᵢ| / |parent| × Entropy(childᵢ))**

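Where:
- **Entropy** measures randomness or uncertainty in the data.
- **|childᵢ| / |parent|** is the fraction of the parent's samples that fall into child *i*, so larger children weigh more.
- A higher IG means the split produces purer subsets.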
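
The formula can be tried out with a minimal sketch. The helper names `entropy` and `information_gain` and the toy labels are assumptions for illustration, not part of any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]  # children as mixed as the parent

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The perfect split recovers the full parent entropy of 1 bit, while the useless split gains nothing.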

---

### Entropy:
Entropy quantifies the randomness in the data and is calculated as:

**Entropy = − Σᵢ pᵢ log₂(pᵢ)**

Where:
- **pᵢ** is the proportion of samples at the node that belong to class *i*.
- Lower entropy indicates greater purity of the subset; an entropy of **0** means all samples belong to one class.
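
To see how entropy responds to the class mix, the following sketch evaluates the formula for a few hand-picked probability vectors. The function name `entropy_from_probs` is an assumption, and terms with pᵢ = 0 are skipped, following the usual convention that 0 · log₂(0) is treated as 0.

```python
import math


def entropy_from_probs(probs):
    """Entropy in bits; zero-probability classes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, round(entropy_from_probs(probs), 3))
```

Entropy rises from 0 for the pure case to 1 bit for the evenly split two-class case.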

---

## 🛠️ Practical

### Building and Visualizing a Decision Tree in Python:
Use the **Scikit-learn** library to build and visualize a decision tree for classification tasks. Example steps include:
1. Preprocess the dataset.
2. Train a decision tree classifier.
3. Visualize the tree using `sklearn.tree.plot_tree`.

---

## Evaluation Metrics:

### 1. Accuracy:
The proportion of correct predictions out of all predictions made:

**Accuracy = (TP + TN) / (TP + TN + FP + FN)**

The four counts are defined under the confusion matrix below.

### 2. Confusion Matrix:
A matrix summarizing prediction results:
- **True Positives (TP):** Positive samples correctly predicted as positive.
- **True Negatives (TN):** Negative samples correctly predicted as negative.
- **False Positives (FP):** Negative samples incorrectly predicted as positive.
- **False Negatives (FN):** Positive samples incorrectly predicted as negative.
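
The four counts and the accuracy formula can be tied together with a small sketch for a binary problem. The function name `confusion_counts`, the choice of `1` as the positive class, and the label lists are assumptions made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy}")
```

Here six of the eight predictions are correct, so the accuracy is 0.75.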

Decision trees are interpretable and powerful for structured data, though they can overfit without proper pruning or regularization techniques.
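
As a rough illustration of regularization, the sketch below compares an unrestricted tree with a depth-limited tree and a cost-complexity-pruned tree. It uses Scikit-learn's bundled Iris dataset as a stand-in, and the values `max_depth=2` and `ccp_alpha=0.02` are arbitrary assumptions rather than recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the Iris dataset that ships with Scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# No limits: the tree grows until every leaf is pure.
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruning: stop growing once the tree reaches a fixed depth.
shallow = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_train, y_train)

# Post-pruning: grow fully, then remove branches whose benefit is below ccp_alpha.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

for name, tree in [("full", full), ("shallow", shallow), ("pruned", pruned)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Smaller trees are not guaranteed to score better on a given test set; in practice the limits are usually tuned with cross-validation.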



---

## Practical

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt

# Load the dataset; it must contain a "target" column with the class labels.
data = pd.read_csv("sample_dataset.csv")
X = data.drop("target", axis=1)  # feature columns
y = data["target"]               # class labels

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a shallow tree using the Gini Index as the splitting criterion.
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Plot the fitted tree; class_names must be a list of strings, one per class.
plt.figure(figsize=(12, 8))
plot_tree(
    model,
    feature_names=list(X.columns),
    class_names=[str(c) for c in model.classes_],
    filled=True,
)
plt.show()

# Evaluate on the held-out test set.
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
```

## Explanation of the Code:

### Data Preprocessing:

- ***drop("target", axis=1):*** Removes the `target` column so that `X` holds only the features; the target labels are kept separately.
- ***train_test_split:*** Splits the dataset into training and testing sets (here 80% / 20%).

### Model Training:

- ***DecisionTreeClassifier:*** Builds the tree using the Gini Index as the splitting criterion.
- ***max_depth:*** Limits the tree's depth to reduce overfitting.

### Visualization:

- ***plot_tree:*** Generates a graphical representation of the decision tree, showing how splits are made.

### Model Evaluation:

- ***accuracy_score:*** Computes the proportion of correctly classified samples.
- ***confusion_matrix:*** Returns a matrix of counts in which rows correspond to true classes and columns to predicted classes.

---

## Output Example

- ***Decision Tree Plot:*** Shows feature splits and leaf nodes with class predictions.
- ***Confusion Matrix:*** A matrix summarizing the prediction results as counts of true/false positives and negatives.
- ***Accuracy:*** A single value indicating the prediction success rate, **Accuracy = (TP + TN) / (TP + TN + FP + FN)**.

This process demonstrates the complete workflow of building, visualizing, and evaluating a decision tree model in Python.