**Aim:** To construct a decision tree from a training data set using supervised learning.

**Program:** Write a program to demonstrate the working of the decision-tree-based ID3 algorithm. Use an appropriate data set to build the decision tree, and apply the resulting tree to classify a new sample.

import csv
from math import log2

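def get_data(file):
    # newline="" is what the csv module recommends when opening files for csv.reader.
    with open(file, newline="") as csv_file:
        csv_reader = csv.reader(csv_file)
        data = list(csv_reader)
        for line in data:
            print(line)
    # The first row is the header (attribute names); the rest are training rows.
    return data[1:], data[0]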

def entropy(data):
    outcomes = [row[-1] for row in data]
    probs = [outcomes.count(value) / len(outcomes) for value in set(outcomes)]
    return -sum(p * log2(p) for p in probs)

def split_data(data, attr):
    values = set(row[attr] for row in data)
    return [[row for row in data if row[attr] == value] for value in values]

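def best_attribute(data):
    base_entropy = entropy(data)
    gains = []
    for attr in range(len(data[0]) - 1):  # the last column is the class label
        # Size-weighted average entropy of the subsets produced by this split.
        remainder = sum((len(subset) / len(data)) * entropy(subset)
                        for subset in split_data(data, attr))
        gains.append((base_entropy - remainder, attr))
    # Highest gain wins; ties go to the larger column index.
    return max(gains)[1]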

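def decision_tree(data, labels):
    outcomes = [row[-1] for row in data]
    # Pure node: every row has the same class, so return it as a leaf.
    if outcomes.count(outcomes[0]) == len(outcomes):
        return outcomes[0]
    # No attributes left (only the class column remains): return the most
    # common class instead of failing on an empty list in best_attribute.
    if len(data[0]) == 1:
        return max(set(outcomes), key=outcomes.count)

    attr = best_attribute(data)
    tree = {labels[attr]: {}}
    # Drop the chosen attribute so the labels stay aligned with the row columns.
    sub_labels = labels[:attr] + labels[attr+1:]

    for value in set(row[attr] for row in data):
        sub_data = [row[:attr] + row[attr+1:] for row in data if row[attr] == value]
        tree[labels[attr]][value] = decision_tree(sub_data, sub_labels)

    return tree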

data, labels = get_data("id3.csv")
tree = decision_tree(data, labels)
print("\nDecision Tree:", tree)

['Outlook', 'Temperature', 'Humidity', 'Wind', 'PlayTennis']
['sunny', 'hot', 'high', 'weak', 'no']
['sunny', 'hot', 'high', 'strong', 'no']
['overcast', 'hot', 'high', 'weak', 'yes']
['rain', 'mild', 'high', 'weak', 'yes']
['rain', 'cool', 'normal', 'weak', 'yes']
['rain', 'cool', 'normal', 'strong', 'no']
['overcast', 'cool', 'normal', 'strong', 'yes']
['sunny', 'mild', 'high', 'weak', 'no']
['sunny', 'cool', 'normal', 'weak', 'yes']
['rain', 'mild', 'normal', 'weak', 'yes']
['sunny', 'mild', 'normal', 'strong', 'yes']
['overcast', 'mild', 'high', 'strong', 'yes']
['overcast', 'hot', 'normal', 'weak', 'yes']
['rain', 'mild', 'high', 'strong', 'no']

Decision Tree: {'Outlook': {'rain': {'Wind': {'weak': 'yes', 'strong': 'no'}}, 'overcast': 'yes', 'sunny': {'Humidity': {'normal': 'yes', 'high': 'no'}}}}
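
The program statement also asks for a new sample to be classified, which the cell above does not do. The following is a minimal sketch of one way to do it, assuming the nested-dict tree format printed above. The `classify` helper, its `default` parameter, and the sample itself are illustrative and not part of the original program.

```python
def classify(tree, sample, default=None):
    """Follow the branches of a nested-dict tree until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))       # attribute tested at this node
        branches = tree[attr]
        value = sample.get(attr)
        if value not in branches:     # value never seen during training
            return default
        tree = branches[value]
    return tree

# Tree copied from the output above.
tree = {'Outlook': {'rain': {'Wind': {'weak': 'yes', 'strong': 'no'}},
                    'overcast': 'yes',
                    'sunny': {'Humidity': {'normal': 'yes', 'high': 'no'}}}}

# Hypothetical new sample, keyed by attribute name.
sample = {'Outlook': 'sunny', 'Temperature': 'cool',
          'Humidity': 'high', 'Wind': 'strong'}

prediction = classify(tree, sample)
print("PlayTennis:", prediction)
```

For this sample the walk goes `Outlook = sunny`, then `Humidity = high`, and ends at the leaf `no`. Returning `default` for an unseen attribute value is one possible design choice; falling back to the majority class would be another.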


### 1. What is the ID3 algorithm?

ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm that builds the tree top-down. At each node it greedily selects the attribute with the highest information gain, that is, the largest reduction in entropy, and splits the data on it.

### 2. How is entropy calculated in this code?

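Entropy is computed from the class column (the last value in each row). The code finds the relative frequency `p` of each distinct outcome and returns `-sum(p * log2(p))`, a measure of how mixed the outcomes are: 0 when every row has the same outcome, and 1 bit for an even split between two outcomes.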
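
As a worked example, the data set printed above has 9 `yes` rows and 5 `no` rows. The sketch below computes the entropy directly from those counts; `entropy_of_counts` is a hypothetical helper introduced only for this illustration.

```python
from math import log2

def entropy_of_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with zero count contribute nothing and would break log2(0).
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

h = entropy_of_counts([9, 5])   # 9 'yes', 5 'no'
print(round(h, 3))              # about 0.94 bits
```

The same helper gives 1 bit for an even split such as `[7, 7]` and 0 for a pure set such as `[14, 0]`.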

### 3. What does information gain represent in decision tree algorithms?

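Information gain is the reduction in entropy obtained by splitting a dataset on an attribute: the entropy of the parent set minus the size-weighted average entropy of the resulting subsets. ID3 splits on the attribute with the highest gain, since it separates the classes best.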
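
To make this concrete, the sketch below computes the gain of every attribute at the root of the tree, using the rows printed in the output above. The `information_gain` function is an illustrative rewrite of the calculation inside `best_attribute`, not a function from the original program.

```python
from math import log2

# PlayTennis rows copied from the output above; the last column is the class.
LABELS = ['Outlook', 'Temperature', 'Humidity', 'Wind', 'PlayTennis']
ROWS = [
    ['sunny', 'hot', 'high', 'weak', 'no'],
    ['sunny', 'hot', 'high', 'strong', 'no'],
    ['overcast', 'hot', 'high', 'weak', 'yes'],
    ['rain', 'mild', 'high', 'weak', 'yes'],
    ['rain', 'cool', 'normal', 'weak', 'yes'],
    ['rain', 'cool', 'normal', 'strong', 'no'],
    ['overcast', 'cool', 'normal', 'strong', 'yes'],
    ['sunny', 'mild', 'high', 'weak', 'no'],
    ['sunny', 'cool', 'normal', 'weak', 'yes'],
    ['rain', 'mild', 'normal', 'weak', 'yes'],
    ['sunny', 'mild', 'normal', 'strong', 'yes'],
    ['overcast', 'mild', 'high', 'strong', 'yes'],
    ['overcast', 'hot', 'normal', 'weak', 'yes'],
    ['rain', 'mild', 'high', 'strong', 'no'],
]

def entropy(rows):
    outcomes = [row[-1] for row in rows]
    probs = [outcomes.count(v) / len(outcomes) for v in set(outcomes)]
    return -sum(p * log2(p) for p in probs)

def information_gain(rows, attr):
    """Parent entropy minus the size-weighted entropy of each subset."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

gains = {LABELS[i]: information_gain(ROWS, i) for i in range(len(LABELS) - 1)}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {gain:.3f}")
```

`Outlook` has the highest gain (about 0.247), ahead of `Humidity` (about 0.152), `Wind` (about 0.048) and `Temperature` (about 0.029), which is why it appears at the root of the printed tree.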

### 4. Why does the `decision_tree` function check if all outcomes are the same before splitting?

If all outcomes are the same, it means that the data is pure, and there's no need for further splitting, allowing the function to return a leaf node with that outcome.

### 5. What happens when the dataset is split on an attribute?

When the dataset is split on an attribute, it is divided into subsets, each containing rows where the attribute has the same value. The decision tree then branches accordingly.
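
The split can be pictured as grouping rows by attribute value. The sketch below is a dictionary-based variant of `split_data` that keeps track of which value each subset belongs to; the `partition` helper is hypothetical, and only the first five rows of the data set are used to keep the example short.

```python
from collections import defaultdict

def partition(rows, attr):
    """Group rows by the value found in column `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row)
    return dict(groups)

# First five rows of the PlayTennis data shown above.
rows = [
    ['sunny', 'hot', 'high', 'weak', 'no'],
    ['sunny', 'hot', 'high', 'strong', 'no'],
    ['overcast', 'hot', 'high', 'weak', 'yes'],
    ['rain', 'mild', 'high', 'weak', 'yes'],
    ['rain', 'cool', 'normal', 'weak', 'yes'],
]

groups = partition(rows, 0)   # column 0 is Outlook
for value, subset in groups.items():
    print(value, len(subset))
```

Each key of `groups` corresponds to one branch below the node, and each list of rows is the data passed to the recursive call for that branch.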

### 6. Why do we remove an attribute from the label list after splitting the data on it?

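Within a branch, every row has the same value for the attribute that was just split on, so splitting on it again would give zero information gain. Removing it from the label list, and the matching column from each row, keeps the labels aligned with the column indices and guarantees that the recursion makes progress.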

### 7. What are the limitations of the ID3 algorithm?

The ID3 algorithm tends to favor attributes with more levels (values), which may lead to overfitting. It also assumes that the data is categorical and does not handle continuous attributes without modification.
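
The bias toward many-valued attributes can be seen in a small experiment. The sketch below takes the first six PlayTennis rows, keeps only `Wind` and the class, and prepends a made-up `RowID` column that is unique per row. The `RowID` column is an assumption added purely for illustration and does not exist in the original data.

```python
from math import log2

def entropy(rows):
    outcomes = [row[-1] for row in rows]
    probs = [outcomes.count(v) / len(outcomes) for v in set(outcomes)]
    return -sum(p * log2(p) for p in probs)

def information_gain(rows, attr):
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

# Columns: RowID (hypothetical), Wind, PlayTennis.
rows = [
    ['1', 'weak', 'no'],
    ['2', 'strong', 'no'],
    ['3', 'weak', 'yes'],
    ['4', 'weak', 'yes'],
    ['5', 'weak', 'yes'],
    ['6', 'strong', 'no'],
]

base = entropy(rows)
gain_id = information_gain(rows, 0)
gain_wind = information_gain(rows, 1)
print(f"entropy={base:.3f}  gain(RowID)={gain_id:.3f}  gain(Wind)={gain_wind:.3f}")
```

Because every `RowID` value isolates a single row, each subset is pure and the gain equals the full entropy of the set. ID3 would therefore pick `RowID` over `Wind`, even though a row identifier says nothing about unseen samples.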