Osnabrück University - Machine Learning (Summer Term 2016) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 02: Decision Trees

Please upload your solution as an ipynb file to your group folder on StudIP before your `Testat`.

## Assignment 1: Decision Trees [2p]
Draw/build the decision trees for the following boolean functions. Either use pen & paper or employ your inner ASCII artist within the Jupyter notebook.

Note: $\oplus := xor$

### a) $\neg A \wedge B$

### b) $A \oplus B$ 

### c) $A \vee (B \wedge C) \vee (\neg C \wedge D)$

### d) $(A \rightarrow (B \wedge \neg C)) \vee (A \wedge B)$
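
One way to check a hand-drawn tree for any of the functions above is to enumerate the truth table and compare each row with the leaf the tree reaches. The following is a minimal sketch for function a); the helper name `truth_table` is illustrative and not part of the exercise.

```python
from itertools import product

def truth_table(f, n_vars):
    """Return (inputs, output) pairs for a boolean function f of n_vars arguments."""
    return [(bits, f(*bits)) for bits in product([False, True], repeat=n_vars)]

# Function a): (not A) and B
table_a = truth_table(lambda a, b: (not a) and b, 2)
for inputs, output in table_a:
    print(inputs, "->", output)
```

The same helper works for b) to d) by swapping the lambda and adjusting `n_vars`.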

## Assignment 2: Entropy and Information Gain [5p]

Attributes and their possible values:

  * $raining = \{yes, no\}$
  * $tired = \{yes, no\}$
  * $late = \{yes, no\}$
  * $distance = \{short, medium, long\}$

Training examples:

| #  | raining | tired | late | distance | attend_party |
|----|---------|-------|------|----------|--------------|
| 1  | yes | no  | no  | short  | **yes** |
| 2  | yes | no  | yes | medium | **no**  |
| 3  | no  | yes | no  | long   | **no**  |
| 4  | yes | yes | yes | short  | **no**  |
| 5  | yes | no  | no  | short  | **yes** |
| 6  | no  | no  | no  | medium | **yes** |
| 7  | no  | yes | no  | long   | **no**  |
| 8  | yes | no  | yes | short  | **no**  |
| 9  | yes | yes | no  | short  | **yes** |
| 10 | no  | yes | no  | medium | **no**  |
| 11 | no  | yes | no  | long   | **no**  |
| 12 | no  | yes | yes | short  | **no**  |

### a) [3p]
Build the root node of a decision tree from the training examples given in the table above by calculating the information gain for all four attributes (raining, tired, late, distance).

$$Gain(S,A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}Entropy(S_v)$$
$$Entropy(S) = -p_{\oplus} log_{2} p_{\oplus} - p_{\ominus} log_{2} p_{\ominus}$$



$S_v$ is the subset of $S$ for which attribute $A$ has value $v$. Example for attribute *tired*:
$S_{yes} \leftarrow [1+, 6−], |S_{yes}| = 7 \quad \quad \quad S_{no} \leftarrow [3+, 2−], |S_{no}| = 5$
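
As a sanity check for hand calculations, the example split on *tired* can be evaluated numerically. This is only a sketch: the helper `binary_entropy` is an assumed name, and the counts $S = [4+, 8-]$ are read off the `attend_party` column of the table.

```python
from math import log2

def binary_entropy(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:  # the term 0 * log2(0) is treated as 0
            p = count / total
            result -= p * log2(p)
    return result

# Counts from the table: S = [4+, 8-], tired=yes -> [1+, 6-], tired=no -> [3+, 2-]
entropy_S = binary_entropy(4, 8)
entropy_yes = binary_entropy(1, 6)
entropy_no = binary_entropy(3, 2)

# Gain(S, tired) = Entropy(S) - 7/12 * Entropy(S_yes) - 5/12 * Entropy(S_no)
gain_tired = entropy_S - (7 / 12) * entropy_yes - (5 / 12) * entropy_no
print(round(entropy_S, 3), round(entropy_yes, 3), round(entropy_no, 3), round(gain_tired, 3))
```

The printed values should be approximately 0.918, 0.592, 0.971 and 0.169.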

### b) [2p]
Perform the same calculation as in **a)** but use the gain ratio instead of the information gain. Does the result for the root node change?

$$GainRatio(S,A) = \frac{Gain(S,A)}{SplitInformation(S,A)}$$
$$SplitInformation(S,A) = - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} log_{2} \frac{|S_v|}{|S|}$$
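
To illustrate the two formulas, the sketch below computes the gain ratio for the attribute *tired* only, using the counts from the example in a). The helper `entropy_from_counts` is an assumed name. Note that split information is the entropy of $S$ with respect to the attribute values instead of the target classes, so one helper can serve both purposes.

```python
from math import log2

def entropy_from_counts(counts):
    """Entropy of a distribution given as absolute counts; zero counts are skipped."""
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n > 0)

# tired: S = [4+, 8-] splits into S_yes = [1+, 6-] and S_no = [3+, 2-]
gain_tired = (entropy_from_counts([4, 8])
              - (7 / 12) * entropy_from_counts([1, 6])
              - (5 / 12) * entropy_from_counts([3, 2]))

# SplitInformation only looks at the subset sizes |S_v|, not at the class labels
split_info_tired = entropy_from_counts([7, 5])
gain_ratio_tired = gain_tired / split_info_tired
print(round(split_info_tired, 3), round(gain_ratio_tired, 3))
```

Because the split information grows with the number and evenness of the subsets, attributes with many values are penalised relative to plain information gain.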

### c)
Assume the root node tests the value of the attribute *distance*. Calculate the next level of the decision tree using the information gain.
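
For the next level, the information gain is computed separately on each subset $S_v$ produced by the root split, and a branch whose subset contains only one class becomes a leaf. The sketch below only performs the partitioning step; the tuple layout and the helper name `partition` are assumptions made for illustration, and the rows are copied from the table above.

```python
from collections import defaultdict

# (raining, tired, late, distance, attend_party) for examples 1..12
examples = [
    ("yes", "no",  "no",  "short",  "yes"),
    ("yes", "no",  "yes", "medium", "no"),
    ("no",  "yes", "no",  "long",   "no"),
    ("yes", "yes", "yes", "short",  "no"),
    ("yes", "no",  "no",  "short",  "yes"),
    ("no",  "no",  "no",  "medium", "yes"),
    ("no",  "yes", "no",  "long",   "no"),
    ("yes", "no",  "yes", "short",  "no"),
    ("yes", "yes", "no",  "short",  "yes"),
    ("no",  "yes", "no",  "medium", "no"),
    ("no",  "yes", "no",  "long",   "no"),
    ("no",  "yes", "yes", "short",  "no"),
]

def partition(rows, column):
    """Group rows by the value found at the given column index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

by_distance = partition(examples, 3)  # column 3 holds distance
for value, rows in by_distance.items():
    labels = [row[4] for row in rows]  # column 4 holds attend_party
    print(value, "size:", len(rows), "yes:", labels.count("yes"), "no:", labels.count("no"))
```

Each resulting subset can then be treated like a new training set over the remaining attributes raining, tired and late.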

## Assignment 3
Implement the following two functions in Python. Take a look at the `assert`s to see how each function should behave. An assert is a condition that your function is required to satisfy. Most of the conditions here are taken from the lecture slides (ML-03, Slides 12 & 13). Don't worry if you cannot get all asserts to pass; just comment out the failing ones.

### a) 
$$Entropy(S) = - \sum_{i=1...c} p_i log_2 p_i$$

In [None]:
from math import log
def entropy(S):
    """
    Calculate the entropy for a given target value set. 
    S: List of target classes for specific observations.
    """
    # Relative frequency p_i of each class c that occurs in S
    p_i = [len([value for value in S if value == c]) / len(S) for c in set(S)]
    return -sum(p * log(p, 2) for p in p_i)


# See ML-03, Slide 12 & 13
assert entropy([1,1,1,0,0,0]) == 1.0
assert round(entropy([1,1,1,1,0,0,0]), 3) == 0.985
assert round(entropy([1,1,1,1,1,1,0]), 3) == 0.592
assert round(entropy([1,1,1,1,1,1,0,0]), 3) == 0.811
assert round(entropy([2,2,1,1,0,0]), 3) == 1.585
assert round(entropy([2,2,2,1,0]), 3) == 1.371
assert round(entropy([2,2,2,0,0]), 3) == 0.971

### b) 
$$Gain(S,A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)$$

In [None]:
def gain(S, A):
    """
    Calculate the expected reduction in entropy due to sorting on A.
    S: Target classes, one per observation.
    A: Attribute values, one per observation, in the same order as S.
    """
    sigma = 0  # weighted sum of the subset entropies
    for v in set(A):  # a set contains each attribute value exactly once
        # S_v: target classes of the observations whose attribute value is v
        S_v = [s for (s, a) in zip(S, A) if a == v]
        sigma += (len(S_v) / len(S)) * entropy(S_v)
    return entropy(S) - sigma


# See ML-03, Slide 12 & 13
assert_S_ = [0,0,1,1,1,0,1,0,1,1,1,1,1,0]
assert round(gain(assert_S_, [1,1,1,1,0,0,0,1,0,0,0,1,0,1]), 3) == .152
assert round(gain(assert_S_, [0,1,0,0,0,1,1,0,0,0,1,1,0,1]), 3) == .048
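
Once `gain` works, choosing the attribute for a node amounts to taking the maximum gain over all candidate attributes. The following self-contained sketch shows that selection step on the two value lists used in the asserts above. The dictionary keys `attribute_1` and `attribute_2` are hypothetical names, and the `Counter`-based `entropy` is one possible variant, not the required solution.

```python
from collections import Counter
from math import log2

def entropy(S):
    """Entropy of a list of target classes, counting each class once via Counter."""
    n = len(S)
    return -sum((c / n) * log2(c / n) for c in Counter(S).values())

def gain(S, A):
    """Information gain of splitting the targets S on the attribute values A."""
    n = len(S)
    remainder = 0.0
    for v in set(A):
        S_v = [s for s, a in zip(S, A) if a == v]
        remainder += len(S_v) / n * entropy(S_v)
    return entropy(S) - remainder

targets = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
# Hypothetical attribute names; the value lists are the ones from the asserts
attributes = {
    "attribute_1": [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],
    "attribute_2": [0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1],
}

gains = {name: gain(targets, values) for name, values in attributes.items()}
best = max(gains, key=gains.get)  # attribute with the highest information gain
print(best, round(gains[best], 3))
```

Using `Counter` avoids rescanning the list once per class, which matters only for large target lists.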

In [None]:
# Just for future reference.
from sklearn.datasets import load_iris

iris = load_iris()
assert round(gain(iris.target, [r[2] for r in iris.data]), 3) == 1.446
