# ID3 Algorithm
The **ID3 (Iterative Dichotomiser 3)** algorithm is a decision tree-based machine learning algorithm used for **classification tasks**. It was developed by Ross Quinlan in 1986 and is one of the foundational algorithms for constructing decision trees. The decision tree built by ID3 is used to classify data into distinct categories based on attribute values.

### Key Concepts of ID3
- ID3 constructs a decision tree by employing a **top-down, greedy search** through the given data.
- It uses a mathematical concept called **Information Gain** (based on Entropy) to determine the best attribute to split the data at each step.
- ID3 is primarily used for **categorical data** and doesn't directly handle numerical or continuous features.

---

## Why is ID3 Used?

- **Simplicity**: ID3 is simple to understand and implement, making it suitable for beginners.
- **Transparency**: The resulting decision tree is interpretable, allowing users to see the reasoning behind predictions.
- **Effectiveness for Small Data**: It works well for small to medium-sized datasets.

---

## How Does the ID3 Algorithm Work?

1. **Start with the Root Node**:
   - The entire dataset is used to calculate the **Entropy** (a measure of disorder) and **Information Gain** for all attributes.
   - The attribute with the highest Information Gain is chosen as the **splitting criterion** for the root node.

2. **Recursive Splitting**:
   - The dataset is split into subsets based on the selected attribute's values.
   - The process repeats for each subset, calculating the Entropy and Information Gain for remaining attributes.

3. **Stopping Condition**:
   - The recursion stops when one of the following is true:
     - All examples in a subset belong to a single class.
     - No attributes are left to split further.
     - The dataset is empty.

4. **Assign Class Labels**:
   - A leaf is labeled with the most frequent class in its subset; when the subset is empty, it takes the most frequent class among the parent node's examples.
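The stopping conditions above can be sketched as a small helper. This is only an illustration: the name `leaf_label` and its parameters are hypothetical, and the empty-subset rule (fall back to the parent's majority class) follows the usual textbook description of ID3.

```python
from collections import Counter

def leaf_label(labels, n_attributes_left, parent_majority):
    """Return a class label if recursion should stop here, else None.

    All three parameter names are hypothetical, chosen for this sketch.
    """
    # Empty subset: nothing to count, so borrow the parent's majority class.
    if not labels:
        return parent_majority
    # Pure subset: every example already has the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: settle for the most frequent class.
    if n_attributes_left == 0:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise the node must be split further.
    return None

print(leaf_label(["Yes", "Yes"], 2, "No"))        # pure subset
print(leaf_label(["Yes", "No", "No"], 0, "Yes"))  # attributes exhausted
print(leaf_label([], 3, "Yes"))                   # empty subset
print(leaf_label(["Yes", "No"], 1, "Yes"))        # keep splitting
```

Returning `None` for "keep splitting" keeps the stopping test separate from the choice of the next attribute.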

---

## Mathematics Behind ID3
### Entropy:
Entropy is a measure of uncertainty or randomness in the data. For a binary classification problem, the formula for entropy is:

$$
\text{Entropy}(S) = -p_+ \cdot \log_2(p_+) - p_- \cdot \log_2(p_-)
$$

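where \( p_+ \) and \( p_- \) are the proportions of positive and negative examples in the dataset \( S \), and a term whose proportion is 0 is taken to be 0. With more than two classes, the sum runs over every class \( i \) with proportion \( p_i \): \( \text{Entropy}(S) = -\sum_i p_i \cdot \log_2(p_i) \).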
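To get a feel for the formula, the binary case can be evaluated directly. The sketch below is a minimal illustration; the function name `binary_entropy` is an assumption made for this example, not part of the notebook's code.

```python
import math

def binary_entropy(p_pos):
    """Entropy of a two-class set with a fraction p_pos of positive examples."""
    total = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:  # convention: a term with proportion 0 contributes 0
            total -= p * math.log2(p)
    return total

print(binary_entropy(0.5))                # evenly mixed set: maximum uncertainty
print(binary_entropy(1.0))                # pure set: no uncertainty
print(round(binary_entropy(9 / 14), 3))   # 9 positive and 5 negative examples
```

Entropy is highest when the two classes are equally likely and falls to zero when every example belongs to the same class.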

### Information Gain:
Information Gain measures the reduction in entropy after splitting on an attribute. It is calculated as:

$$
\text{Gain}(A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v)
$$

Here, \( S_v \) is the subset of \( S \) for which attribute \( A \) has value \( v \).
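As a worked example, take the class counts of the weather dataset used in this notebook's code: 14 examples, 9 labeled Yes and 5 labeled No. For the attribute Wind, the value Weak covers 8 examples (6 Yes, 2 No) and Strong covers 6 examples (3 Yes, 3 No). The rounded figures below are approximate.

$$
\begin{aligned}
\text{Entropy}(S) &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940 \\
\text{Entropy}(S_{\text{Weak}}) &= -\tfrac{6}{8}\log_2\tfrac{6}{8} - \tfrac{2}{8}\log_2\tfrac{2}{8} \approx 0.811 \\
\text{Entropy}(S_{\text{Strong}}) &= -\tfrac{3}{6}\log_2\tfrac{3}{6} - \tfrac{3}{6}\log_2\tfrac{3}{6} = 1.000 \\
\text{Gain}(\text{Wind}) &\approx 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1.000) \approx 0.048
\end{aligned}
$$

Splitting on Wind removes only a small part of the uncertainty, so it is a weak candidate for the root.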

### Choosing the Attribute:
The attribute with the highest Information Gain is selected for splitting.
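The selection step can be illustrated by computing the gain of every attribute and taking the maximum. The sketch below works from per-value class counts rather than raw records; the helper names are hypothetical, and the counts were tallied by hand from the 14-record weather data used in this notebook's code.

```python
import math

def entropy_from_counts(counts):
    """Entropy of a set given its per-class example counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_from_counts(partition):
    """Information gain of a split, given (yes, no) counts per attribute value."""
    all_yes = sum(yes for yes, _ in partition.values())
    all_no = sum(no for _, no in partition.values())
    total = all_yes + all_no
    # Weighted entropy of the subsets produced by the split
    remainder = sum(
        ((yes + no) / total) * entropy_from_counts([yes, no])
        for yes, no in partition.values()
    )
    return entropy_from_counts([all_yes, all_no]) - remainder

# (yes, no) counts per attribute value
splits = {
    "Outlook": {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
    "Humidity": {"High": (3, 4), "Normal": (6, 1)},
    "Wind": {"Weak": (6, 2), "Strong": (3, 3)},
}

gains = {name: gain_from_counts(parts) for name, parts in splits.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Split on:", best)
```

On these counts Outlook has the largest gain, so it becomes the root of the tree.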

---

## Advantages of ID3
- Easy to implement and interpret.
- Handles categorical data well.
- Builds a simple and interpretable model.

---

## Disadvantages of ID3
- Prone to **overfitting**: the tree keeps growing until its leaves are pure or the attributes run out, and ID3 has no pruning step.
- Doesn't handle continuous or numerical attributes directly (requires preprocessing like discretization).
- Has no built-in way to handle missing attribute values; they have to be filled in or the affected records dropped beforehand.
- Sensitive to noise in the data.
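The discretization mentioned above can be as simple as mapping each number to a named bin. The following is a minimal sketch, assuming fixed cut points chosen by hand; the function `discretize`, the temperatures, and the thresholds are all made up for illustration.

```python
from bisect import bisect_right

def discretize(value, thresholds, names):
    """Map a number to a category name using sorted cut points.

    `thresholds` and `names` are hypothetical inputs for this sketch;
    there must be exactly one more name than thresholds.
    """
    if len(names) != len(thresholds) + 1:
        raise ValueError("need exactly one more name than thresholds")
    # bisect_right counts how many cut points are <= value,
    # which is the index of the bin the value falls into.
    return names[bisect_right(thresholds, value)]

# Made-up temperatures in degrees Celsius and made-up cut points.
temperatures = [4.0, 15.0, 18.5, 25.0, 31.2]
cuts = [15.0, 25.0]
bins = ["Cool", "Mild", "Hot"]
print([discretize(t, cuts, bins) for t in temperatures])
```

After this step the column holds only categorical values, which is the form ID3 expects.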

---

## Real-World Applications
1. **Email Spam Detection**:
   - Classifying emails as spam or not based on keywords, sender, etc.
2. **Medical Diagnosis**:
   - Predicting diseases based on patient symptoms.
3. **Customer Segmentation**:
   - Categorizing customers based on buying behavior and demographics.
4. **Loan Approval**:
   - Classifying loan applications into approved or rejected based on applicant attributes.

---

The ID3 algorithm is widely used for creating decision trees for classification tasks. It’s simple to implement, interpretable, and useful for problems with categorical data. However, it has limitations, such as overfitting and difficulty handling continuous data, which newer algorithms like C4.5 and CART address.



In [24]:
# importing required libraries

import math
from collections import Counter


In [25]:

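# Step 1: Calculate Entropy

def entropy(data):
    # data is a list of (feature_values, label) pairs; only the labels matter here
    total = len(data)
    counts = Counter(label for _, label in data)
    # Every count is at least 1, so log2 never receives 0
    return -sum((count / total) * math.log2(count / total) for count in counts.values())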

In [26]:
# Step 2: Calculate Information Gain
def information_gain(data, feature_index):
    total_entropy = entropy(data)
    total = len(data)
    
    # Group the data by the feature values
    values = set([record[0][feature_index] for record in data])
    
    # Calculate weighted entropy
    weighted_entropy = 0
    for value in values:
        subset = [record for record in data if record[0][feature_index] == value]
        weighted_entropy += (len(subset) / total) * entropy(subset)
    
    return total_entropy - weighted_entropy


In [27]:
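# Step 3: Choose Best Feature to Split
def best_feature(data, features):
    # Returns the index of the feature with the highest information gain.
    # Assumes features[i] names column i of the records in data.
    gains = [information_gain(data, index) for index in range(len(features))]
    return gains.index(max(gains))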


In [28]:
# Step 4: Build the Decision Tree
def build_tree(data, features):
    # features must stay aligned with the record columns:
    # features[i] names column i of every record in data.
    labels = [label for _, label in data]

    # Base case 1: If all labels are the same
    if len(set(labels)) == 1:
        return labels[0]

    # Base case 2: If no more features to split on, use the majority label
    if not features:
        return Counter(labels).most_common(1)[0][0]

    # Choose the best feature (a column index) to split on
    best = best_feature(data, features)
    tree = {features[best]: {}}

    # Drop the chosen feature's name and its column together, so the
    # indices used by best_feature still match the records in the recursion
    remaining = features[:best] + features[best + 1:]
    values = set(record[0][best] for record in data)
    for value in values:
        subset = [(row[:best] + row[best + 1:], label)
                  for row, label in data if row[best] == value]
        tree[features[best]][value] = build_tree(subset, remaining)

    return tree


In [29]:
# Step 5: Predict with the Decision Tree
def predict(tree, record, features, default=None):
    # features is the full list of feature names, in record column order.
    # default is returned when the record holds a value the tree never saw.
    if not isinstance(tree, dict):
        return tree  # reached a leaf

    feature = next(iter(tree))
    value = record[features.index(feature)]

    # Unseen value: no branch to follow, so fall back to the default label
    if value not in tree[feature]:
        return default

    return predict(tree[feature][value], record, features, default)


In [30]:
# Example Usage
# Define the data (each record is a tuple of (features, label))
data = [
    (['Sunny', 'Hot', 'High', 'Weak'], 'No'),
    (['Sunny', 'Hot', 'High', 'Strong'], 'No'),
    (['Overcast', 'Hot', 'High', 'Weak'], 'Yes'),
    (['Rain', 'Mild', 'High', 'Weak'], 'Yes'),
    (['Rain', 'Cool', 'Normal', 'Weak'], 'Yes'),
    (['Rain', 'Cool', 'Normal', 'Strong'], 'No'),
    (['Overcast', 'Cool', 'Normal', 'Strong'], 'Yes'),
    (['Sunny', 'Mild', 'High', 'Weak'], 'No'),
    (['Sunny', 'Cool', 'Normal', 'Weak'], 'Yes'),
    (['Rain', 'Mild', 'Normal', 'Weak'], 'Yes'),
    (['Sunny', 'Mild', 'Normal', 'Strong'], 'Yes'),
    (['Overcast', 'Mild', 'High', 'Strong'], 'Yes'),
    (['Overcast', 'Hot', 'Normal', 'Weak'], 'Yes'),
    (['Rain', 'Mild', 'High', 'Strong'], 'No')
]

features = ['Outlook', 'Temperature', 'Humidity', 'Wind']


In [31]:
# Step 6: Train the Decision Tree
tree = build_tree(data, features)

# Fallback for unseen feature values: the most common label in the training data
majority_label = Counter(label for _, label in data).most_common(1)[0][0]

# Step 7: Make Predictions
test_record = ['Sunny', 'Cool', 'High', 'Strong']
prediction = predict(tree, test_record, features, default=majority_label)

# Output the result
print("Prediction for", test_record, ":", prediction)


Prediction for ['Sunny', 'Cool', 'High', 'Strong'] : No
