# Decision Tree Algorithm

### Why Decision Tree?

A Decision Tree is a supervised learning algorithm used for classification and regression problems. It mimics human decision-making by splitting data based on conditions.

- Handles both numerical and categorical data (scikit-learn's implementation expects categorical features to be numerically encoded first)
- Easy to interpret and visualize
- Requires minimal data preparation

### How does a Decision Tree work?

- It splits the dataset into smaller subsets based on feature conditions.
- The splits happen recursively until a stopping condition is met (such as a maximum depth or a minimum number of samples per leaf).
- At each node, it picks the split that scores best on a metric such as the Gini Index (for classification) or Mean Squared Error, MSE (for regression).
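To make the split-selection step concrete, here is a minimal sketch in plain Python of how a Gini-based split could be scored on a single numeric feature. The helper names (`gini_impurity`, `weighted_gini`, `best_threshold`) and the toy data are hypothetical and chosen only for illustration; real libraries use faster and more general routines.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two groups created by `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

def best_threshold(values, labels):
    """Try midpoints between consecutive unique values; return (threshold, impurity)."""
    unique = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    scored = [(t, weighted_gini(values, labels, t)) for t in candidates]
    return min(scored, key=lambda pair: pair[1])

# Toy data, made up for this sketch (not taken from a real dataset)
ages = [22, 25, 30, 35, 45, 50]
approved = [0, 0, 0, 1, 1, 1]

print(gini_impurity(approved))         # impurity before any split
print(best_threshold(ages, approved))  # threshold with the lowest weighted impurity
```

A lower weighted impurity means the two groups are purer, so the tree prefers that threshold. A full tree repeats this search over every feature at every node.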

### Use Cases of Decision Tree

- Loan approval (banking): Should a customer be given a loan or not?
- Medical diagnosis: Is the tumor benign or malignant?
- Customer churn prediction

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

In [2]:
# Sample dataset (Loan Approval)
data = {
    'Age': [25, 45, 35, 50, 23],
    'Salary': [50000, 60000, 80000, 150000, 20000],
    'Approved': [0, 1, 1, 1, 0]  # 0 = No, 1 = Yes
}

In [3]:
# Convert to DataFrame
df = pd.DataFrame(data)

In [4]:
# Split into features and target
X = df[['Age', 'Salary']]
y = df['Approved']

In [5]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [6]:
# Train Decision Tree Model
dt_model = DecisionTreeClassifier(criterion='gini', max_depth=3)
dt_model.fit(X_train, y_train)

In [7]:
# Predictions
y_pred = dt_model.predict(X_test)

In [8]:
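# Evaluate Model (with 5 rows and test_size=0.2 the test set is a single sample,
# so accuracy can only be 0.00 or 1.00; treat the score as illustrative)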
accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Accuracy: {accuracy:.2f}")

Decision Tree Accuracy: 1.00
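Because a Decision Tree is meant to be easy to interpret, it can help to print the learned rules. The sketch below rebuilds the same toy loan dataset and uses scikit-learn's `export_text` to show the splits. As an assumption made only for this illustration, the tree is fitted on all five rows rather than on the training split, and `random_state=42` is added for reproducibility.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy loan-approval data as above
df = pd.DataFrame({
    'Age': [25, 45, 35, 50, 23],
    'Salary': [50000, 60000, 80000, 150000, 20000],
    'Approved': [0, 1, 1, 1, 0],  # 0 = No, 1 = Yes
})
X = df[['Age', 'Salary']]
y = df['Approved']

# Assumption for this sketch: fit on every row so the printed rules cover all the data
tree = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=42)
tree.fit(X, y)

# export_text renders the fitted tree as indented if/else style rules
rules = export_text(tree, feature_names=list(X.columns))
print(rules)
```

Each indented line is one condition on a feature, and each `class:` line is the prediction at a leaf.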


# Random Forest (Ensemble Technique: Bagging)

### Why Random Forest?

A single Decision Tree is prone to overfitting: it can memorize the training data yet perform poorly on unseen data. Random Forest mitigates this by training an ensemble of Decision Trees, each on a different bootstrap sample of the data (Bagging, short for Bootstrap Aggregating), and typically also by restricting each split to a random subset of the features.

- Reduces overfitting
- Works well with large datasets
- Can cope with missing values, although support varies by implementation and some libraries require imputation first
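The overfitting claim can be checked with a small experiment. The sketch below compares an unconstrained Decision Tree with a Random Forest on synthetic data from `make_classification`. The dataset parameters (sample count, label noise via `flip_y`) are arbitrary assumptions chosen for illustration. Exact scores depend on the data and the library version, so none are quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% of labels flipped at random (illustrative settings)
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unconstrained tree keeps splitting until it fits the training set
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

for name, model in [("Decision Tree", tree), ("Random Forest", forest)]:
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    print(f"{name}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between training and test accuracy is the usual sign of overfitting. On noisy data like this the forest often generalizes better than the single tree, though that is not guaranteed for every dataset.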

### How does Random Forest Work?

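- Trains multiple Decision Trees, each on a bootstrap sample (rows drawn with replacement) of the training data
- Uses majority voting (classification) or averaging (regression) for the final prediction
- Combining many trees reduces variance, so predictions are more stable than those of a single tree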
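The steps above can be sketched by hand: draw bootstrap samples, fit one tree per sample, then take a majority vote. This is a simplified illustration under stated assumptions (synthetic binary data, 15 trees, `max_features="sqrt"`), not how `RandomForestClassifier` is implemented internally. scikit-learn's version, for instance, averages the trees' predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
rng = np.random.default_rng(0)

n_trees = 15  # odd number, so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Collect every tree's prediction, then take the majority class per sample
votes = np.stack([tree.predict(X) for tree in trees])  # shape: (n_trees, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)

print("Ensemble accuracy on the training data:", (majority == y).mean())
```

For regression, the voting step would be replaced by averaging the trees' numeric predictions.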

### Use Cases of Random Forest

- Fraud detection
- Stock price prediction
- Customer segmentation (when labeled segments are available, since Random Forest is a supervised method)

In [11]:
from sklearn.ensemble import RandomForestClassifier

# Train Random Forest Model
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test)

# Evaluate Model
accuracy_rf = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Accuracy: {accuracy_rf:.2f}")


Random Forest Accuracy: 1.00


### Key Takeaways

- **Decision Tree** is a simple and interpretable model but prone to overfitting.
- **Random Forest** (Bagging technique) combines multiple Decision Trees to improve accuracy and reduce overfitting.
- **Use Cases:** Loan approval, medical diagnosis, fraud detection, stock price prediction.
- **In Python, use** DecisionTreeClassifier (from sklearn.tree) and RandomForestClassifier (from sklearn.ensemble).