# Given

A sample of categorical data:

| Ear Shape | Face Shape | Whiskers | Cat (1 = yes, 0 = no) |
|:---------:|:-----------:|:---------:|:------:|
|   Pointy   |   Round     |  Present  |    1   |
|   Floppy   |  Not Round  |  Present  |    1   |
|   Floppy   |  Round      |  Absent   |    0   |
|   Pointy   |  Not Round  |  Present  |    0   |
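The table's categorical values can be mapped to the 0/1 columns used in `x` in the solution. This is a minimal sketch that assumes the encoding Pointy = 1, Round = 1, Present = 1; the notebook does not state which category maps to 1. Because each feature has exactly two categories, one 0/1 column per feature carries the same information as a full pair of one-hot columns. The names `rows`, `encoding`, `x_sample` and `y_sample` are illustrative.

```python
import numpy as np

# The four sample rows from the table: (ear shape, face shape, whiskers, cat)
rows = [
    ("Pointy", "Round", "Present", 1),
    ("Floppy", "Not Round", "Present", 1),
    ("Floppy", "Round", "Absent", 0),
    ("Pointy", "Not Round", "Present", 0),
]

# Assumed encoding: two categories per feature, so a single 0/1 column each
encoding = [
    {"Pointy": 1, "Floppy": 0},    # ear shape
    {"Round": 1, "Not Round": 0},  # face shape
    {"Present": 1, "Absent": 0},   # whiskers
]

# Look up each feature value in its own mapping
x_sample = np.array([[enc[value] for enc, value in zip(encoding, row[:3])] for row in rows])
y_sample = np.array([row[3] for row in rows])

print(x_sample)
print(y_sample)
```

For a feature with more than two categories, each category would need its own 0/1 column.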


# Find 

One-hot encode the features and build the decision tree:
- find the feature to split on
    - define the impurity measure (entropy)
    - calculate the information gain of each candidate split
- split recursively
- stop splitting when
    - the node is 100% pure
    - the depth of the tree reaches a predefined maximum
    - the improvement in information gain is too small
    - the number of examples in the node falls below a predefined threshold
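The four stopping criteria can be collected into a single check. This is a sketch only: the helper name `should_stop` and the default thresholds (`max_depth=3`, `min_gain=1e-3`, `min_samples=2`) are hypothetical choices, not values from the notebook.

```python
import numpy as np

def should_stop(y, depth, best_gain, max_depth=3, min_gain=1e-3, min_samples=2):
    # y         - labels that reach the node
    # depth     - depth of the node (root = 0)
    # best_gain - largest information gain among the candidate splits at this node
    # (thresholds are illustrative defaults, not taken from the original notebook)
    is_pure = len(np.unique(y)) <= 1   # 100% purity (or an empty node)
    too_deep = depth >= max_depth      # maximum depth reached
    small_gain = best_gain < min_gain  # improvement in information gain too small
    too_few = len(y) < min_samples     # too few examples in the node
    return bool(is_pure or too_deep or small_gain or too_few)

print(should_stop(np.array([1, 1, 1]), depth=0, best_gain=0.5))     # pure node
print(should_stop(np.array([0, 1, 0, 1]), depth=0, best_gain=0.5))  # keep splitting
```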

# Solution

```python
import numpy as np
```

```python
# 10 examples x 3 binary features; y: 1 = cat, 0 = not cat
x = np.array([[1,1,1],[1,0,1],[1,0,0],[1,0,0],[1,1,1],[0,1,1],[0,0,0],[1,0,1],[0,1,0],[1,0,0]])
y = np.array([1,1,0,0,1,0,0,1,1,0])

x.shape, y.shape
```

```
((10, 3), (10,))
```

## Entropy

* In general, for multi-class problems:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_{2}(p(x_i))$$

* For binary classification, this simplifies to:
$$H(p_1) = -p_1 \log_2(p_1) - (1- p_1) \log_2(1- p_1)$$
where $p_1$ is the fraction of examples labeled 1 (cats) in the dataset and $p_0 = 1-p_1$ is the fraction labeled 0 (not cats). By convention $0 \log_2 0 = 0$, so a pure node ($p_1 = 0$ or $p_1 = 1$) has $H = 0$.
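The general formula can be computed directly from label counts, which also covers more than two classes. A minimal sketch; `entropy_multiclass` is a hypothetical helper and assumes a non-empty label array.

```python
import numpy as np

def entropy_multiclass(labels):
    # H = -sum_i p_i * log2(p_i) over the classes that actually occur;
    # classes with p_i = 0 never appear in counts, matching 0 * log2(0) = 0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy_multiclass([1, 1, 0, 0, 1, 0, 0, 1, 1, 0]))  # 5 ones, 5 zeros -> 1.0
print(entropy_multiclass([0, 1, 2]))                         # three equally likely classes, about log2(3)
```

For two classes this reduces to the binary formula, with $p_1$ and $1 - p_1$ as the two probabilities.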

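```python
def calculate_entropy(branch):
    # In  - labels of one node or branch, e.g. [0, 1, 1, 0, 1]
    # Out - entropy: impurity of the labels (0 = pure node, 1 = 50/50 split)
    #       p: fraction of 1s in the branch (p_1), not the branch weight
    if len(branch) == 0:
        return 0.0, 0.0  # empty branch: treat as pure to avoid dividing by zero

    p = np.sum(branch) / len(branch)

    if p == 1 or p == 0:
        entropy = 0  # 0 * log2(0) is taken as 0
    else:
        entropy = - p * np.log2(p) - (1-p) * np.log2(1-p)

    return entropy, p
```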

## Information gain

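* In general, for multi-class problems:

    $$\text{Information Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v)$$

    where
    * $S$ - dataset,
    * $A$ - attribute,
    * $|S|$ - total number of instances in $S$,
    * $|S_v|$ - number of instances in the subset $S_v \subseteq S$ where attribute $A$ has value $v$,
    * $\text{Values}(A)$ - possible values of attribute $A$.

* For a binary feature (one left and one right branch), this becomes:

$$\text{Information Gain} = H\left(p_1^\text{root}\right) - \left(w_{\text{left}}\cdot H\left(p_1^\text{left}\right) + w_{\text{right}}\cdot H\left(p_1^\text{right}\right)\right),$$

where $w_{\text{left}} = |S_{\text{left}}|/|S|$ and $w_{\text{right}} = |S_{\text{right}}|/|S|$ are the fractions of the node's examples sent to each branch, and $p_1^\text{left}$, $p_1^\text{right}$ are the fractions of 1s within each branch.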
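As a worked example with the ten labels in `y`: the root holds 5 ones out of 10, so $H = 1$. Splitting on feature 2 sends 5 examples left (1 of them labeled 1) and 5 right (4 labeled 1):

$$\text{IG}_2 = 1 - \left(\tfrac{5}{10} H(0.2) + \tfrac{5}{10} H(0.8)\right) \approx 1 - (0.5 \cdot 0.722 + 0.5 \cdot 0.722) \approx 0.278$$

Feature 0 splits the root 3/7, with 1 of 3 and 4 of 7 labeled 1:

$$\text{IG}_0 = 1 - \left(\tfrac{3}{10} H\left(\tfrac{1}{3}\right) + \tfrac{7}{10} H\left(\tfrac{4}{7}\right)\right) \approx 1 - (0.3 \cdot 0.918 + 0.7 \cdot 0.985) \approx 0.035$$

The weights are branch sizes divided by the parent size, not the fraction of 1s in each branch.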

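```python
def calculate_ig(root, left_branch, right_branch):
    # In  - labels at the parent node (root) and in its two children
    # Out - IG = H(root) - (w_left * H(left) + w_right * H(right))
    H_root, _ = calculate_entropy(root)

    # branch weights are the fraction of the parent's examples in each branch, |S_v| / |S|
    # (not the fraction of 1s that calculate_entropy returns as its second value)
    W_left = len(left_branch) / len(root)
    W_right = len(right_branch) / len(root)

    # an empty branch has weight 0, so its entropy is never needed
    H_left = calculate_entropy(left_branch)[0] if len(left_branch) > 0 else 0.0
    H_right = calculate_entropy(right_branch)[0] if len(right_branch) > 0 else 0.0

    IG = H_root - (W_left * H_left + W_right * H_right)

    return IG
```

```python
def split_feature(x, y, feature_to_split_on):
    # In  - x, y: features and labels of the current node
    #       feature_to_split_on: column index, e.g. 2
    # Out - x_left, x_right, y_left, y_right (left: feature == 0, right: feature == 1)
    left_mask = x[:, feature_to_split_on] == 0
    right_mask = x[:, feature_to_split_on] == 1

    # y is a parameter, so the labels always come from the current node
    x_left, y_left = x[left_mask], y[left_mask]
    x_right, y_right = x[right_mask], y[right_mask]

    return x_left, x_right, y_left, y_right
```

```python
def build_tree(x, y, used=None, depth=0, max_depth=3, min_gain=1e-9, min_samples=2):
    # In  - x, y: features and labels at this node
    #       used: features already split on along the path from the root
    #       (None instead of a [] default, which Python would share between calls)
    # Out - a class label (leaf) or a dict {"feature", "left", "right"}
    if used is None:
        used = []
    indent = "  " * depth
    majority = int(np.bincount(y).argmax())  # most common label at this node

    # stopping criteria: 100% purity, maximum depth, too few examples
    if len(np.unique(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        print(f"{indent}leaf -> {majority}")
        return majority

    # information gain for every feature not yet used on this path
    gains = {}
    for feature in range(x.shape[1]):
        if feature not in used:
            left_branch = y[x[:, feature] == 0]
            right_branch = y[x[:, feature] == 1]
            gains[feature] = float(calculate_ig(y, left_branch, right_branch))

    # stopping criteria: no features left, or the improvement is too small
    if not gains or max(gains.values()) <= min_gain:
        print(f"{indent}leaf -> {majority}")
        return majority

    # pick the feature with max IG
    best = max(gains, key=gains.get)
    rounded = {f: round(g, 3) for f, g in gains.items()}
    print(f"{indent}IG {rounded} -> split on feature {best}")

    # split and recurse; each branch gets its own copy of the used-feature list
    x_left, x_right, y_left, y_right = split_feature(x, y, best)
    return {
        "feature": best,
        "left": build_tree(x_left, y_left, used + [best], depth + 1, max_depth, min_gain, min_samples),
        "right": build_tree(x_right, y_right, used + [best], depth + 1, max_depth, min_gain, min_samples),
    }
```

```python
tree = build_tree(x, y)
print(tree)
```

```
IG {0: 0.035, 1: 0.125, 2: 0.278} -> split on feature 2
  IG {0: 0.322, 1: 0.722} -> split on feature 1
    leaf -> 0
    leaf -> 1
  IG {0: 0.722, 1: 0.171} -> split on feature 0
    leaf -> 0
    leaf -> 1
{'feature': 2, 'left': {'feature': 1, 'left': 0, 'right': 1}, 'right': {'feature': 0, 'left': 0, 'right': 1}}
```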
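Once a tree is built, classifying a new example means walking from the root to a leaf. This is a minimal sketch that assumes the tree is stored as nested dicts with the keys `feature`, `left` and `right` and plain class labels at the leaves; `predict`, `example_tree` and this layout are illustrative, not part of the original notebook. For the ten examples in `x`, `y`, splitting on feature 2 and then on feature 1 (left branch) or feature 0 (right branch) classifies every example correctly, so the sketch hard-codes that tree.

```python
def predict(tree, sample):
    # tree   - a class label (leaf) or a dict {"feature": f, "left": ..., "right": ...}
    # sample - a sequence of 0/1 feature values
    # feature value 0 goes left, 1 goes right, until a leaf label is reached
    while isinstance(tree, dict):
        tree = tree["left"] if sample[tree["feature"]] == 0 else tree["right"]
    return tree

# hypothetical tree: split on feature 2, then on feature 1 (left) or feature 0 (right)
example_tree = {"feature": 2,
                "left": {"feature": 1, "left": 0, "right": 1},
                "right": {"feature": 0, "left": 0, "right": 1}}

x = [[1,1,1],[1,0,1],[1,0,0],[1,0,0],[1,1,1],[0,1,1],[0,0,0],[1,0,1],[0,1,0],[1,0,0]]
y = [1,1,0,0,1,0,0,1,1,0]
predictions = [predict(example_tree, row) for row in x]
print(predictions == y)  # True
```

Using a loop instead of recursion keeps prediction cheap even for deep trees.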

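```python
# entropy of an almost-pure node (p_1 = 0.0001) is close to 0, consistent with H(0) = 0
p = 0.0001
round(float(-p * np.log2(p) - (1 - p) * np.log2(1 - p)), 6)
```

```
0.001473
```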
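To see why a pure node is assigned zero entropy, the binary formula can be evaluated at a few values of $p_1$: it peaks at 1 for $p_1 = 0.5$, is symmetric around 0.5, and shrinks toward 0 as $p_1$ approaches 0 or 1. `binary_entropy` is a hypothetical helper used only for this sketch.

```python
import numpy as np

def binary_entropy(p):
    # H(p) = -p log2(p) - (1 - p) log2(1 - p), with H(0) = H(1) = 0 by convention
    if p == 0 or p == 1:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

# symmetric pairs and values approaching a pure node
for p in [0.5, 0.2, 0.8, 0.01, 0.0001, 0.0]:
    print(p, round(binary_entropy(p), 4))
```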