<div align="center">

---
# Decision Trees - ID3 [Artificial Intelligence Project]
---
</div>

<div align="center">

***
## Problem Presentation
***
</div>
    
> ADD PROBLEM PRESENTATION

<div align="center">

***
## ID3 Algorithm 
***
</div>

The Iterative Dichotomiser 3 (ID3) algorithm is a well-known decision tree approach in Machine Learning. It builds the tree recursively. At each node it chooses the attribute that best partitions the data according to information gain, which is the attribute that yields the greatest reduction in entropy (uncertainty). The goal is to make the resulting subsets as homogeneous as possible. The procedure continues until a halting condition is met. Classic ID3 stops when a node is pure or when no attributes remain, and implementations often add limits such as a minimum subset size or a maximum tree depth.

The ID3 Algorithm is designed specifically for building decision trees from a given dataset. Its primary objective is to construct a tree that best explains the relationship between the attributes in the data and their corresponding class labels.

**1. Selecting the Best Attribute:**
- ID3 uses entropy and information gain to determine the attribute that best separates the data. Entropy measures the impurity or randomness of a set of examples.
- For each candidate attribute, the algorithm computes the information gain, which is the reduction in entropy obtained by splitting on that attribute. It then selects the attribute with the highest gain.

**2. Creating Tree Nodes:**
- The chosen attribute is used to split the dataset into subsets based on its distinct values.
- For each subset, ID3 recurses to find the next best attribute to further partition the data, forming branches and new nodes accordingly.

**3. Stopping Criteria:**
- The recursion continues until one of the stopping criteria is met, such as when all instances in a branch belong to the same class or when all attributes have been used for splitting.

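**4. Handling Missing Values:**
- Classic ID3 has no built-in mechanism for missing values. They are usually handled beforehand, for example by replacing a missing value with the most common value of that attribute. Successors such as C4.5 handle missing values directly.

**5. Tree Pruning:**
- Pruning helps prevent overfitting. It is not part of ID3 itself, but post-processing techniques and variations like C4.5 incorporate pruning to improve the tree's generalization.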
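The steps above can be condensed into a short recursive sketch of classic ID3 on categorical attributes, with one branch per attribute value. This is a minimal illustration that assumes each row is a dict mapping attribute names to category strings. The function names and toy data are hypothetical, and the sketch omits missing-value handling and pruning. It also differs from the threshold-based `DecisionTree` class implemented later in this notebook, which always splits a node in two.

```python
import math
from collections import Counter


def entropy(labels):
    # H(S) = -sum p_i log2 p_i over the classes present in `labels`
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    # IG(A, S) = H(S) - sum_v |S_v|/|S| * H(S_v), one subset per value v of `attr`
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    # Step 3 (stopping criteria): pure node -> leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left -> leaf with the majority class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))

    # Step 2: one branch per observed value of `best`, recursing on each subset
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in sorted({row[best] for row in rows}):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)
    return tree


# Hypothetical toy data
rows = [
    {"Pat": "Some", "Hun": "Yes"},
    {"Pat": "Full", "Hun": "Yes"},
    {"Pat": "None", "Hun": "No"},
    {"Pat": "Full", "Hun": "No"},
]
labels = ["Yes", "Yes", "No", "No"]
print(id3(rows, labels, ["Pat", "Hun"]))  # {'Hun': {'No': 'No', 'Yes': 'Yes'}}
```

In this example `Hun` separates the classes perfectly (gain 1.0), while `Pat` only reaches a gain of 0.5. ID3 therefore places `Hun` at the root, and both branches end immediately in pure leaves.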

<div align="center">

***
## Mathematical Concepts of ID3 Algorithm
***
</div>

### Entropy

**Entropy** is a measure of disorder or uncertainty in a set of data. ID3 uses it to measure the impurity of a dataset. The objective is to minimize entropy by dividing the data into subsets that are as homogeneous as possible.

For a set $S$ with classes $\{c_1,\space c_2,\space ...\space,\space c_n \}$, the entropy is calculated as:

$$H(S) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

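where $p_i$ is the proportion of instances of class $c_i$ in the set (by convention, $0 \log_2 0 = 0$). The entropy is $0$ for a pure set. It reaches its maximum, $\log_2 n$, when all $n$ classes are equally represented.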
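The formula maps directly onto a few lines of NumPy. The sketch below is a minimal illustration that assumes the labels arrive as a list or array. It uses base 2, so the result is measured in bits. The term $-p_i \log_2 p_i$ is written in the equivalent form $p_i \log_2(1/p_i)$ so that a pure set returns `0.0` rather than a signed `-0.0`.

```python
import numpy as np


def entropy(labels):
    # Proportion p_i of each distinct class (classes with p_i = 0 never appear here)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # H(S) = sum p_i * log2(1 / p_i), equivalent to -sum p_i * log2(p_i)
    return float(np.sum(p * np.log2(1 / p)))


print(entropy(["Yes"] * 6 + ["No"] * 6))            # 1.0 (evenly mixed, two classes)
print(entropy(["Yes"] * 4))                         # 0.0 (pure set)
print(round(entropy(["+"] * 9 + ["-"] * 5), 3))     # 0.94
```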

### Information Gain

Information Gain measures how much an attribute reduces uncertainty. At each stage, ID3 splits the data on the attribute that maximizes Information Gain. This is computed as the difference between the entropy before the split and the weighted entropy after it.

Formally, the Information Gain of an attribute $A$ on a set $S$ is:

$$IG(A,S) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|} \cdot H(S_v)$$

where $S_v$ is the subset of $S$ for which attribute $A$ has value $v$, and $|S_v|$ is its size.
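As a worked example, the formula can be evaluated on a small, hypothetical "Patrons"-style attribute with three values. The sketch below is a minimal illustration that assumes the attribute and class columns are equal-length lists. It groups the labels by attribute value to form each $S_v$, then subtracts the weighted child entropies from $H(S)$.

```python
import numpy as np


def entropy(labels):
    # H(S) = sum p_i * log2(1 / p_i), i.e. -sum p_i * log2(p_i)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))


def information_gain(attribute, labels):
    # IG(A, S) = H(S) - sum over values v of |S_v|/|S| * H(S_v)
    attribute, labels = np.asarray(attribute), np.asarray(labels)
    weighted = 0.0
    for v in np.unique(attribute):
        subset = labels[attribute == v]  # S_v: the labels where A == v
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted


# Hypothetical toy data: a "Pat"-like attribute and a binary class
pat = ["None", "None", "Some", "Some", "Full", "Full", "Full", "Full"]
cls = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No"]
print(information_gain(pat, cls))  # 0.5
```

The calculation works out as $H(S) = 1$. The subsets `None` and `Some` are pure, and `Full` is evenly mixed with $H = 1$. So $IG = 1 - \frac{4}{8} \cdot 1 = 0.5$.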

### Gain Ratio (Used more in the C4.5 Algorithm)

Gain Ratio is an improvement on Information Gain that takes into account how many values an attribute can take. It corrects the bias of Information Gain towards attributes with many distinct values by dividing the gain by the split information of the attribute:

$$GR(A,S) = \frac{IG(A,S)}{SplitInfo(A,S)}, \qquad SplitInfo(A,S) = -\sum_{v \in values(A)} \frac{|S_v|}{|S|} \log_2\left(\frac{|S_v|}{|S|}\right)$$
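The bias that Gain Ratio corrects can be reproduced on hypothetical data. The example uses an ID-like attribute with a unique value per row, similar in spirit to the `ID` column of the restaurant dataset, alongside a binary attribute that is right 7 times out of 8. The denominator of the ratio is simply the entropy of the attribute's own value distribution. The sketch below is a minimal illustration, and all names in it are assumptions.

```python
import numpy as np


def entropy(values):
    # Entropy of any discrete distribution (class labels or attribute values)
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1 / p)))


def gain_and_ratio(attribute, labels):
    attribute, labels = np.asarray(attribute), np.asarray(labels)
    # Information gain: parent entropy minus the weighted entropy of each subset S_v
    gain = entropy(labels)
    for v in np.unique(attribute):
        subset = labels[attribute == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    # Split information = entropy of the attribute's own value distribution
    split_info = entropy(attribute)
    return gain, (gain / split_info if split_info > 0 else 0.0)


labels = ["Yes"] * 8 + ["No"] * 8
ids = [f"X{i}" for i in range(1, 17)]         # unique value per row
b = ["a"] * 7 + ["b"] + ["b"] * 7 + ["a"]     # matches the class 14 times out of 16

print(gain_and_ratio(ids, labels))                          # (1.0, 0.25)
print([round(x, 3) for x in gain_and_ratio(b, labels)])     # [0.456, 0.456]
```

Information Gain prefers the useless ID-like attribute (1.0 vs 0.456), because every single-row subset is trivially pure. Gain Ratio divides that gain by $\log_2 16 = 4$ and ranks the binary attribute first instead.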

<div align="center">

***
## Problem's Resolution Approach
***
</div>

```python
# Importing Dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

# Fixing the random seed so that shuffling and feature sampling are reproducible
np.random.seed(1234)
```

<div align="center">

***
## Decision Tree - ID3 [Class]
***
</div>

```python
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, *, value=None):
        # Feature index and threshold used to split this node
        self.feature = feature
        self.threshold = threshold

        # Left (feature <= threshold) and right (feature > threshold) children
        self.left = left
        self.right = right

        # Predicted label -> only leaf nodes store a value
        self.value = value

    def is_leaf(self):
        # A node is a leaf if and only if it stores a value
        return self.value is not None
```

```python
class DecisionTree:
    def __init__(self, min_samples_split=2, max_depth=100, n_features=None):
        # Minimum number of samples needed to perform a split
        self.min_samples_split = min_samples_split

        # Maximum depth of the decision tree
        self.max_depth = max_depth

        # Number of features considered at each split (a random subset adds some randomness to the tree)
        self.n_features = n_features

        # Root of the tree, used later to traverse it
        self.root = None

    def _most_common_label(self, y):
        # Returns the most frequent label in y
        counter = Counter(y)
        return counter.most_common(1)[0][0]

    def _entropy(self, y):
        # np.bincount returns an array whose index i holds the number of occurrences of label i in y
        occurrences = np.bincount(y)

        # Proportion p_i of each class
        ps = occurrences / len(y)

        # Entropy H(S) = -sum(p_i * log2(p_i)), skipping empty classes (0 * log 0 is taken as 0)
        return -sum(p * np.log2(p) for p in ps if p > 0)

    def _split(self, X_Column, split_threshold):
        # np.argwhere(...).flatten() returns the indices of the elements that satisfy the condition
        left_indices = np.argwhere(X_Column <= split_threshold).flatten()
        right_indices = np.argwhere(X_Column > split_threshold).flatten()
        return left_indices, right_indices

    def _information_gain(self, y, X_Column, threshold):
        # Entropy of the parent node
        parent_entropy = self._entropy(y)

        # Splitting into the two children
        left_indices, right_indices = self._split(X_Column, threshold)

        # A split that leaves one child empty provides no information
        if left_indices.size == 0 or right_indices.size == 0:
            return 0

        # Weighted average entropy of the children
        n = len(y)
        n_left, n_right = left_indices.size, right_indices.size
        entropy_left, entropy_right = self._entropy(y[left_indices]), self._entropy(y[right_indices])
        child_entropy = (n_left / n) * entropy_left + (n_right / n) * entropy_right

        # Information Gain = parent entropy - weighted child entropy
        return parent_entropy - child_entropy

    def _best_split(self, X, y, feature_indices):
        # Finds the feature and threshold with the highest Information Gain
        best_gain = -1
        split_idx, split_threshold = None, None

        # Trying every unique value of every candidate feature as a threshold
        for feat_idx in feature_indices:
            X_Column = X[:, feat_idx]
            thresholds = np.unique(X_Column)

            for threshold in thresholds:
                gain = self._information_gain(y, X_Column, threshold)

                # Keeping the best split found so far
                if gain > best_gain:
                    best_gain = gain
                    split_idx = feat_idx
                    split_threshold = threshold

        return split_idx, split_threshold

    def _grow_tree(self, X, y, depth=0):
        # Number of samples, features and distinct labels in the current data
        n_samples, n_features = X.shape
        n_labels = np.unique(y).size

        # Stopping criteria -> the current node becomes a leaf labelled with its most common label:
        #   (depth >= self.max_depth)            => reached the maximum depth defined
        #   (n_labels == 1)                      => the node only has one type of label (it is pure)
        #   (n_samples < self.min_samples_split) => not enough samples to perform a split
        if depth >= self.max_depth or n_labels == 1 or n_samples < self.min_samples_split:
            return Node(value=self._most_common_label(y))

        # Randomly selecting the indices of the features to consider
        features_indices = np.random.choice(n_features, self.n_features, replace=False)

        # Finding the best split
        best_feature, best_threshold = self._best_split(X, y, features_indices)
        left_indices, right_indices = self._split(X[:, best_feature], best_threshold)

        # If even the best split leaves a child empty (e.g. every candidate feature is constant),
        # stop here: recursing on an empty subset would make _most_common_label fail
        if left_indices.size == 0 or right_indices.size == 0:
            return Node(value=self._most_common_label(y))

        # Creating the child nodes with recursive calls to keep growing the tree
        left = self._grow_tree(X[left_indices, :], y[left_indices], depth + 1)
        right = self._grow_tree(X[right_indices, :], y[right_indices], depth + 1)
        return Node(best_feature, best_threshold, left, right)

    def fit(self, X, y):
        # Number of features sampled per split (never more than the ones available)
        if not self.n_features:
            self.n_features = X.shape[1]
        else:
            self.n_features = min(X.shape[1], self.n_features)

        # Growing the tree recursively from the root
        self.root = self._grow_tree(X, y)

    def _traverse_tree(self, X, node: Node):
        # Traverses the tree until a leaf node is reached -> its value is the predicted label
        if node.is_leaf():
            return node.value

        if X[node.feature] <= node.threshold:
            return self._traverse_tree(X, node.left)
        return self._traverse_tree(X, node.right)

    def predict(self, X):
        # Predicts the label of every sample in X
        return np.array([self._traverse_tree(x, self.root) for x in X])

    def print_tree(self, node=None, indent=" "):
        # Prints the tree recursively, starting at the root, one extra space of indentation per level
        if node is None:
            node = self.root

        if node.is_leaf():
            print(node.value)
        else:
            print(f"X_{node.feature} <= {node.threshold} ?")
            print(f"{indent}left:", end="")
            self.print_tree(node.left, indent + " ")
            print(f"{indent}right:", end="")
            self.print_tree(node.right, indent + " ")

    def accuracy(self, y_test, y_predicted):
        # Fraction of correctly predicted labels
        return sum(y_test == y_predicted) / len(y_test)
```

<div align="center">

***
## Dataset [Class]
***
</div>

```python
class Dataset:
    def __init__(self, file_path):
        self.df = pd.read_csv(file_path)
        self.data, self.target, self.label_decoder = self._get_data_target()

    def _label_encoder(self, array):
        # Finding the unique labels
        unique_labels = np.unique(array)

        # Mapping each label to an integer, and each integer back to its label
        label_encoder = {label: idx for idx, label in enumerate(unique_labels)}
        label_decoder = {idx: label for label, idx in label_encoder.items()}

        # Mapping the original array to the integer labels
        encoded_labels = np.array([label_encoder[label] for label in array])

        return encoded_labels, label_decoder

    def _get_data_target(self):
        # Every column except the last one is a feature; the last one is the target
        cols = self.df.columns
        X_Cols = cols[0:-1]
        Y_Col = cols[-1]

        # Splitting the DataFrame into features and (integer-encoded) labels
        X = self.df[X_Cols].to_numpy()
        y, y_decoder = self._label_encoder(self.df[Y_Col].squeeze().to_numpy())

        return X, y, y_decoder

    def _shuffle_data(self):
        # Creating a random permutation of the sample indices
        rand = np.arange(len(self.data))
        np.random.shuffle(rand)

        # Reordering the data / target arrays (fancy indexing through __getitem__)
        self.data = self.data[rand]
        self.target = self.target[rand]

    def train_test_split(self, test_size=0.3):
        if not 0 <= test_size <= 1:
            raise ValueError("Invalid test size proportion (must be between 0 and 1)")

        # Shuffling the data
        self._shuffle_data()

        # Size of the training set
        train_size = int((1 - test_size) * self.target.size)

        # Splitting the data into training and testing sets
        X_Train, X_Test = self.data[:train_size, :], self.data[train_size:, :]
        y_Train, y_Test = self.target[:train_size], self.target[train_size:]

        return X_Train, X_Test, y_Train, y_Test

    def K_Fold_CV(self, total_folds=3, model=DecisionTree, *args, **kwargs):
        # Performs a K-Fold Cross Validation and returns the average accuracy
        k = total_folds

        # Shuffling the sample indices and splitting them into k (nearly) equal folds;
        # np.array_split also keeps the last n % k samples when n is not divisible by k
        indices = np.arange(self.target.size)
        np.random.shuffle(indices)
        folds = np.array_split(indices, k)

        accuracies = []
        for i in range(k):
            # Fold i is the test set; the remaining folds form the training set
            test_indices = folds[i]
            train_indices = np.concatenate([folds[j] for j in range(k) if j != i])

            X_Train, y_Train = self.data[train_indices], self.target[train_indices]
            X_Test, y_Test = self.data[test_indices], self.target[test_indices]

            # Training and evaluating a fresh model on each fold
            new_model = model(*args, **kwargs)
            new_model.fit(X_Train, y_Train)
            predictions = new_model.predict(X_Test)
            accuracies.append(new_model.accuracy(y_Test, predictions))

        return np.mean(accuracies)
```
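K-fold cross-validation depends on one property: each sample lands in exactly one test fold, and it is never in the training set of that same fold. The standalone sketch below checks this property. It uses a hypothetical helper `k_fold_indices` built on NumPy's `default_rng` and `array_split`. The latter produces folds that differ by at most one sample when $n$ is not divisible by $k$.

```python
import numpy as np


def k_fold_indices(n, k, seed=0):
    # Shuffle the sample indices once, then cut them into k (nearly) equal folds
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        # Fold i is the test set; all other folds form the training set
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


splits = list(k_fold_indices(14, 3))
print([len(test) for _, test in splits])  # [5, 5, 4]
```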

<div align="center">

***
## Model Evaluation with the Datasets
***
</div>

### Restaurant Dataset

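```python
restaurant = Dataset(file_path='./Datasets/restaurant.csv')
print(restaurant.df.shape)
restaurant.df.head(5)
```

```
(12, 12)
```

|   | ID | Alt | Bar | Fri | Hun | Pat | Price | Rain | Res | Type | Est | Class |
|---|----|-----|-----|-----|-----|-----|-------|------|-----|------|-----|-------|
| 0 | X1 | Yes | No | No | Yes | Some | \$\$\$ | No | Yes | French | 0-10 | Yes |
| 1 | X2 | Yes | No | No | Yes | Full | \$ | No | No | Thai | 30-60 | No |
| 2 | X3 | No | Yes | No | No | Some | \$ | No | No | Burger | 0-10 | Yes |
| 3 | X4 | Yes | No | Yes | Yes | Full | \$ | No | No | Thai | 10-30 | Yes |
| 4 | X5 | Yes | No | Yes | No | Full | \$\$\$ | No | Yes | French | >60 | No |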


```python
# restaurant.data
# restaurant.target
```

```python
(X_Train, X_Test, Y_Train, Y_Test) = restaurant.train_test_split(test_size=0.3)
```

```python
restaurant.K_Fold_CV(3, DecisionTree)
```

```
0.5833333333333334
```

```python
dt = DecisionTree()
dt.fit(X_Train, Y_Train)
predictions = dt.predict(X_Test)

acc = dt.accuracy(Y_Test, predictions)
print(f"Accuracy = {acc}")
```

```
Accuracy = 0.5
```


```python
dt.print_tree()
```

```
X_4 <= No ?
 left:X_5 <= None ?
  left:0
  right:1
 right:1
```
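The printed tree is easier to read once the feature indices are mapped to columns. `Dataset` uses every column except `Class` as a feature, so `X_0` is `ID`, `X_4` is `Hun` and `X_5` is `Pat`. For these categorical features the thresholds are category strings taken from the column itself, and `<=` compares strings lexicographically. As a result, `X_5 <= None` sends every `Pat` value that sorts at or before the string `"None"` to the left child. The sketch below reproduces that comparison on a few hypothetical `Pat` values stored in an object array, similar to what `DataFrame.to_numpy()` returns for text columns.

```python
import numpy as np

# Hypothetical values of the "Pat" column, stored as Python strings in an object array
pat = np.array(["Some", "Full", "None", "Full", "Some"], dtype=object)

# "Full" < "None" < "Some" alphabetically, so only "Full" and "None" go to the left child
left = pat <= "None"
print(left.tolist())     # [False, True, True, True, False]
print(sorted(set(pat)))  # ['Full', 'None', 'Some']
```

By the same rule, `X_4 <= No` simply tests `Hun == "No"`, because `"No"` sorts before `"Yes"`.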


---

<div align="center">

***
## Just for Guidance [REMOVE LATER]
***
</div>

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_iris()
X, Y = data.data, data.target

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.2, random_state=1234)

dt = DecisionTree()
dt.fit(X_Train, Y_Train)
predictions = dt.predict(X_Test)

def accuracy(y_test, y_pred):
    return sum(y_test == y_pred) / len(y_test)

acc = accuracy(Y_Test, predictions)
print(f"Accuracy = {acc}")
```

```
Accuracy = 1.0
```



<div align="center">

***
## Advantages and Disadvantages of ID3
***
</div>

### **Advantages**

- **Interpretability**: Decision Trees generated by ID3 are **easily interpretable**, making them useful for explaining decisions to non-technical stakeholders
- **Handles Categorical Data**: ID3 can effectively **handle categorical attributes** without explicit data preprocessing steps
- **Not Computationally Expensive**: The Algorithm is relatively straightforward and **computationally less expensive** compared to some complex models

### **Disadvantages**

- **Overfitting**: ID3 tends to create complex trees that may **overfit over the training data**, impacting its performance upon new unseen information
- **Sensitive to Noise**: Noise or outliers in the data can lead to the **creation of non-optimal or incorrect splits**
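- **Binary Splits in This Implementation**: The `DecisionTree` class above splits every node in two with a `<=` threshold, so it only builds **binary trees**. On categorical attributes this imposes an arbitrary (alphabetical) order on the categories, which **limits** its ability to **express more complex relationships** within the data. Classic ID3 instead creates one branch per attribute value, but it has no native support for continuous attributes (C4.5 added threshold splits for them)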
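With binary `<=` splits, one possible workaround for the arbitrary category order is one-hot encoding. This is an assumption and is not used in this notebook. Each category becomes its own 0/1 column, so a split such as `Pat_Full <= 0` means "Pat is not Full". The sketch below applies `pandas.get_dummies` to a hypothetical slice of the restaurant features.

```python
import pandas as pd

# Hypothetical slice of the restaurant features
df = pd.DataFrame({"Pat": ["Some", "Full", "None", "Full"], "Hun": ["Yes", "Yes", "No", "No"]})

# One 0/1 column per category, with categories sorted within each original column
encoded = pd.get_dummies(df, columns=["Pat", "Hun"], dtype=int)
print(list(encoded.columns))  # ['Pat_Full', 'Pat_None', 'Pat_Some', 'Hun_No', 'Hun_Yes']
```

The trade-off is more features and deeper trees. In exchange, each binary split tests membership in a single category instead of a range in alphabetical order, which is closer to the per-value branching of classic ID3.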


<div align="center">

***
## Conclusion
***
</div>

The **ID3 Algorithm** laid the groundwork for **decision tree learning**, providing a robust framework for understanding **attribute selection** and **recursive partitioning**. Despite its limitations, ID3's simplicity and interpretability have paved the way for more sophisticated algorithms that address its drawbacks while retaining its essence.

As **Machine Learning** continues to evolve, the ID3 Algorithm remains a **crucial piece** in the mosaic of tree-based methods, serving as a stepping stone for developing **more advanced and accurate models** in the quest for **efficient data analysis and pattern recognition**.


<div align="center">

***
## Bibliographic References
***
</div>

1. Geeks For Geeks (2023). *Decision Tree Algorithms*. Available [here](https://www.geeksforgeeks.org/decision-tree-algorithms/#id3-iterative-dichotomiser-3)
2. Geeks For Geeks (2024). *Iterative Dichotomiser 3 (ID3) Algorithm From Scratch*. Available [here](https://www.geeksforgeeks.org/iterative-dichotomiser-3-id3-algorithm-from-scratch/)

___
## Final Considerations

$\quad$ If you have any difficulty downloading or running this project, please contact us via:

- **Email**:
    - [Gonçalo Esteves](https://github.com/EstevesX10) &#8594; `up202203947@up.pt`
    - [Maximino Canhola](https://github.com/MaximinoCanhola) &#8594; `up201909805@up.pt`
    - [Nuno Gomes](https://github.com/NightF0x26) &#8594; `up202206195@up.pt`