# Decision Tree

![](https://drive.google.com/uc?export=view&id=1Q83l1mdJ2xYbGUYicfh8226L2HoO6pBo)


## Summary

- A Decision Tree is used to build classification or regression models in the form of a tree structure.
- It breaks a dataset down into smaller and smaller subsets while the associated decision tree is developed incrementally.
- The final result is a tree with decision nodes and leaf nodes.
- The decision nodes are attributes
- Branches refer to discrete values (one or more) or intervals for these attributes
- Leaves are labeled with classes.
- For each leaf, a support and a confidence may be computed:
  - Support is the proportion of examples matching the path from root to that leaf
  - Confidence is the classification accuracy for examples matching that path.
- Each path from the root to a leaf can be used as a class association rule
- When passing from decision trees to rules, each rule has the same support and confidence as the leaf from where it comes.
- Every example matches exactly one path of the tree (so a single leaf, and therefore a single class).
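
As a small illustration of support and confidence, the sketch below evaluates one rule read off a tree path, `outlook = Sunny AND humidity = High -> No`. The rows are the classic 14-example Play Tennis data restricted to two attributes, which is an assumption here (the CSV used later in this notebook may contain different rows), and `rule_support_confidence` is a hypothetical helper name.

```python
# (outlook, humidity, play) rows of the classic 14-example Play Tennis data (assumed)
examples = [
    ("Sunny", "High", "No"), ("Sunny", "High", "No"),
    ("Overcast", "High", "Yes"), ("Rain", "High", "Yes"),
    ("Rain", "Normal", "Yes"), ("Rain", "Normal", "No"),
    ("Overcast", "Normal", "Yes"), ("Sunny", "High", "No"),
    ("Sunny", "Normal", "Yes"), ("Rain", "Normal", "Yes"),
    ("Sunny", "Normal", "Yes"), ("Overcast", "High", "Yes"),
    ("Overcast", "Normal", "Yes"), ("Rain", "High", "No"),
]

def rule_support_confidence(examples, conditions, predicted_class):
    """conditions maps a column position to the value required on the path."""
    matching = [e for e in examples
                if all(e[pos] == value for pos, value in conditions.items())]
    # support: share of all examples that follow this root-to-leaf path
    support = len(matching) / len(examples)
    # confidence: share of the matching examples that carry the leaf's class
    correct = sum(1 for e in matching if e[-1] == predicted_class)
    confidence = correct / len(matching) if matching else 0.0
    return support, confidence

support, confidence = rule_support_confidence(
    examples, {0: "Sunny", 1: "High"}, "No")
print(f"support = {support:.3f}, confidence = {confidence:.3f}")
```

Under these assumptions, three of the fourteen rows match the path and all three are labeled `No`, so the rule has support 3/14 and confidence 1.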

## Pseudocode

```
ID3 (Examples, Target_Attribute, Attributes)
    Create a root node for the tree
    If all members of Examples are in the same class C
        Then Return the single-node tree Root with label = C
    Else If Attributes is empty
        Then Return the single-node tree with label = most common value of Target_Attribute in Examples
    Else
        A ← the attribute that best classifies Examples  # the criterion is chosen here: information gain (ID3) or Gini gain (CART)
        Decision Tree attribute for Root = A.
        For each possible value v of A,
            Add a new tree branch below Root, corresponding to the test A = v.
            Let Examples(v) be the subset of examples that have the value v for A
            If Examples(v) is empty
                Then below this new branch add a leaf node with label = most common target value in the examples
            Else below this new branch add the subtree ID3 (Examples(v), Target_Attribute, Attributes – {A})
    EndIf
    Return Root
```

## Metrics for Splitting Attributes

Input:

*   set of k-classes: $C=\{c_1, c_2, \dots, c_k\}$
*   labeled dataset: $D=\{(x_1,c_{1i}), (x_2,c_{2i}), \dots, (x_n,c_{ni}) \}$
*   each observation contains m features/attributes: $\{A_1, A_2, \dots, A_m\}$


**ID3**

Entropy of a dataset:
*   $Entropy(D) = -\sum\limits_{i=1}^{k} p(c_i) \cdot \log_{2} p(c_i)$, where $p(c_i)$ is the proportion of observations in $D$ that belong to class $c_i$

Entropy of a split:
- suppose we split by attribute $A_t$, which has **r** distinct values; the split partitions D into **r** disjoint subsets: $D = \cup_{i=1}^r D_i$
- $Entropy(D|A_t) = \sum\limits_{i=1}^{r}\frac{|D_i|}{|D|}*Entropy(D_i)$

Information Gain:
- $InformationGain(D,A_t) = Entropy(D) - Entropy(D|A_t)$

Obs:
- Minimum value of Entropy will be **0** when all observations belong to one class.
- Maximum value of Entropy will be $\log_2 k$ when all k classes are equally frequent; for two classes (as in Play Tennis) this is **1**.
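
The formulas above can be checked with a short worked example. The sketch below assumes the class counts of the classic 14-example Play Tennis data (9 Yes, 5 No overall; Sunny 2/3, Overcast 4/0, Rain 3/2 after splitting on outlook), which may differ from the CSV used later in this notebook. The function names are illustrative only.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # empty classes contribute nothing, so skip them to avoid log2(0)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subsets_counts):
    """Entropy(D) minus the size-weighted entropy of the subsets D_i."""
    total = sum(parent_counts)
    conditional = sum(sum(s) / total * entropy(s) for s in subsets_counts)
    return entropy(parent_counts) - conditional

parent = [9, 5]                        # [Yes, No] in the whole dataset
by_outlook = [[2, 3], [4, 0], [3, 2]]  # [Yes, No] for Sunny, Overcast, Rain

h = entropy(parent)
gain = information_gain(parent, by_outlook)
print(f"Entropy(D) = {h:.3f}")
print(f"InformationGain(D, outlook) = {gain:.3f}")
```

With these counts the entropy of the dataset is about 0.940 bits and the information gain of outlook is about 0.247 bits.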

**CART**

Gini Impurity of a dataset:
- $Gini(D) = 1 - \sum\limits_{i=1}^{k} p(c_i)^2$

Gini Impurity of a split:
- $Gini(D|A_t) = \sum\limits_{i=1}^{r}\frac{|D_i|}{|D|}*Gini(D_i)$

Gini Gain:
- $GiniGain(D,A_t) = Gini(D) - Gini(D|A_t)$

Obs:
- Minimum value of Gini Impurity will be **0** when all observations belong to one class.
- Maximum value of Gini Impurity will be $1 - \frac{1}{k}$ when all k classes are equally frequent; for two classes this is **0.5**.
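
A quick numeric check of the Gini formulas, written as a minimal sketch. It assumes the class counts of the classic 14-example Play Tennis data for the outlook split (the CSV used later in this notebook may differ), and the function names are illustrative only.

```python
def gini(counts):
    """Gini impurity of a class distribution given as a list of counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent_counts, subsets_counts):
    """Gini(D) minus the size-weighted Gini impurity of the subsets D_i."""
    total = sum(parent_counts)
    weighted = sum(sum(s) / total * gini(s) for s in subsets_counts)
    return gini(parent_counts) - weighted

pure = gini([7, 0])              # all observations in one class
balanced_2 = gini([5, 5])        # two equally frequent classes
balanced_4 = gini([3, 3, 3, 3])  # four equally frequent classes
print(pure, balanced_2, balanced_4)

parent = [9, 5]                        # [Yes, No] in the whole dataset
by_outlook = [[2, 3], [4, 0], [3, 2]]  # [Yes, No] for Sunny, Overcast, Rain
g_gain = gini_gain(parent, by_outlook)
print(f"GiniGain(D, outlook) = {g_gain:.3f}")
```

The impurity of a balanced node grows with the number of classes (0.5 for two classes, 0.75 for four) but never reaches 1.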

# Dataset

## Download Dataset

Download the **Play Tennis** dataset. This toy dataset has four attributes, and the goal is to predict from them whether one should play tennis.

**Attributes:**

- outlook $\in$ [Overcast, Rain, Sunny]
- temp $\in$ [Cool, Hot, Mild]
- humidity $\in$ [Normal, High]
- wind $\in$ [Weak, Strong]

**Classes:**
- Yes
- No


In [None]:
!wget -O play_tennis.csv "https://drive.google.com/uc?export=download&id=1NT1iJNj3HrPNtiLCb-myrY0XHaJ8_jtf"


## Process Dataset

In [None]:
from IPython.display import display, HTML
import pandas as pd

def train_test_split(df, train_percent=.85):
  #the first train_percent of the rows are used for training, the rest for testing
  no_samples, _ = df.shape
  no_train_samples = int(train_percent * no_samples)
  train_df = df.iloc[:no_train_samples, :]
  test_df = df.iloc[no_train_samples:, :]

  #the last column is the target, all the other columns are attributes
  X_train = train_df.iloc[:, :-1]
  y_train = train_df.iloc[:, -1]
  X_test = test_df.iloc[:, :-1]
  y_test = test_df.iloc[:, -1]
  return X_train, X_test, y_train, y_test



tennis_df = pd.read_csv("play_tennis.csv", header=0)  

#this column is dropped because it brings no meaningful information (it's only a day id)
del tennis_df["day"]

print("Dataset before split")
display(HTML(tennis_df.to_html()))
print("\n\n\n")

X_train, X_test, y_train, y_test = train_test_split(tennis_df)

print("Dataset used for training")
display(HTML(X_train.to_html()))
print("\n\n\n")

print("Dataset used for testing")
display(HTML(X_test.to_html()))

# Exercises

## E1. ID3 and CART Algorithm

In this exercise, ID3 and CART follow the same algorithm presented in the pseudocode. The only difference is the criterion used to choose the split: ID3 uses information gain and CART uses Gini impurity.
ID3 is already implemented.

*   **Task 1**: implement predict function
The output should be:
```
['Sunny' 'Hot' 'High' 'Strong'] -> No
['Rain' 'Mild' 'High' 'Weak'] -> Yes
['Overcast' 'Hot' 'High' 'Strong'] -> Yes
```
*   **Task 2**: implement gini impurity

Find all TODOs

In [None]:
#tree structure (already done)
class Tree:
    def __init__(self, name, data = None):
        self.name = name
        self.data = data
        self.children = []

    def setName(self, name):
        self.name = name

    def getName(self):
        return self.name
    
    def setData(self, data):
        self.data = data

    def getData(self):
        return self.data

    def addChild(self, child, data):
        self.children.append({"split_node": child, "split_data": data})

    def printTree(self, level=0):
        level += 1
        if self.children != []:
            for child in self.children:
                p = '|'+ ('-'*level*2) + " " + self.name + " " + str(self.data) + " == " + str(child["split_data"])
                print(p)              
                child["split_node"].printTree(level)
        else:
            p = '|'+ ('-'*level*2) + "----> " + self.name
            print(p)

In [None]:
import numpy as np
from statistics import mode 

class DecisionTreeClassifier:
  def __init__(self, criterion='entropy'):
    self.criterion = criterion
  
  #converts from Pandas Data Frame -> np.array
  def processTrainData(self, X_train_df, y_train_df):
    self.attributes = X_train_df.columns.tolist()
    self.X_train = X_train_df.values
    self.y_train = y_train_df.values
    
  #converts from Pandas Data Frame -> np.array
  def processTestData(self, X_test_df):
    self.X_test = X_test_df.values
  
  #process data and calls a function to build the Decision Tree
  def fit(self, X_train_df, y_train_df):
    self.processTrainData(X_train_df, y_train_df)
    self.dt = self.runAlgorithm(self.X_train, self.y_train, self.attributes)
    
  def predict(self, ds):
    self.processTestData(ds)
    #[TODO] based on tree find the label corresponding to each sample
    #print sample -> target value (e.g ['Sunny' 'Hot' 'High' 'Strong'] -> No)
    for sample in self.X_test:
      node = self.dt #always restart from the root; dt itself is never modified
      while node.children != []:
        #node.getData() is the column index of the attribute tested at this node
        value = sample[node.getData()]
        for child in node.children:
          if(child['split_data'] == value):
            node = child['split_node']
            break
        else:
          #value never seen at this node during training: no branch to follow
          node = None
          break
      label = node.getName() if node is not None else None
      print(str(list(sample)) + ' ---> ' + str(label))
    
  def entropy(self, y):
    #entropy: -sum of p * log2(p) over the class probabilities
    no_samples = len(y)
    val, counts = np.unique(y, return_counts=True)
    probabilities = counts/no_samples
    h = sum(map(lambda p: -p*np.log2(p), probabilities))
    return h
  
  def gini(self, y):
    #[TODO] implement gini
    #gini impurity: 1 - sum of squared class probabilities
    no_samples = len(y)
    val, counts = np.unique(y, return_counts=True)
    probabilities = counts/no_samples
    return 1 - sum(map(lambda p: p ** 2, probabilities))
  
  # func is a function passed as parameter: it can be entropy or gini
  def gain(self, x, y, func):
    res = func(y)
    if res is None:
      return None
    no_values = len(x)
    val, counts = np.unique(x, return_counts=True)
    probabilities = counts/no_values
    for p, v in zip(probabilities, val):
      res  -= p * func(y[x == v])
    return res
    
  def runAlgorithm(self, x, y, attributes):
    classes, freqs = np.unique(y, return_counts=True)
    
    #if all examples belong to the same class => add a leaf node
    if len(classes)==1:
      RootNode = Tree(classes[0])
    #if no more attributes to split => add a leaf node 
    elif len(attributes)==0:
      most_freq_class = mode(y)
      RootNode = Tree(most_freq_class)
    else:
      attr_index = -1
      no_attributes = len(attributes)
      
      #choose function for splitting
      if self.criterion == "entropy":
        func = self.entropy
      elif self.criterion == "gini":
        func = self.gini
      else:
        raise Exception('Criterion {} not implemented'.format(self.criterion))

      #compute attribute A that maximizes the gain (information gain or gini gain)
      #@attr_index is the index of attribute attr in the original list of attributes
      #we always want to keep the position of the attribute in the original list
      #we will use it at predict
      #start from -inf so that an attribute is selected even when every gain is 0
      maxGain = -np.inf
      for attr in attributes:
        attr_index_orginal_list = self.attributes.index(attr)
        gain = self.gain(x[:, attr_index_orginal_list], y, func)
        assert gain is not None, "Please return something in gain function"
        if(gain > maxGain):
          maxGain = gain
          attr_index = attr_index_orginal_list
        
   
      #attr becomes decision node
      RootNode = Tree(self.attributes[attr_index], attr_index)
      
      #find all possible values vi of A
      unique_values, _ = np.unique(x[:, attr_index], return_counts=True)

      #do not edit attributes, make a copy
      #this value is shared by all nodes on the same level
      new_attributes = attributes[:]
      new_attributes.remove(self.attributes[attr_index])

      for val in unique_values:
        #keep all samples from dataset that have A_t equal with val
        new_x = x[x[:, attr_index] == val]
        new_y = y[x[:, attr_index] == val]

        if(len(new_y) == 0):
          #prepare a leaf node with label most common target
          most_freq_class = mode(y)
          child = Tree(most_freq_class)
        else:
          #create the subtree from the remaining examples and
          #remove the attribute which already created this current branch
          child = self.runAlgorithm(new_x, new_y, new_attributes)
        RootNode.addChild(child, val)
    
    return RootNode
  
  def printDT(self):
    self.dt.printTree(0)
        

#criterion "gini" selects the CART splitting rule; pass "entropy" for ID3
tree_cart = DecisionTreeClassifier("gini")
tree_cart.fit(X_train, y_train)
tree_cart.printDT()
tree_cart.predict(X_test)

|-- outlook 0 == Overcast
|--------> Yes
|-- outlook 0 == Rain
|---- wind 3 == Strong
|----------> No
|---- wind 3 == Weak
|----------> Yes
|-- outlook 0 == Sunny
|---- humidity 2 == High
|----------> No
|---- humidity 2 == Normal
|----------> Yes
['Sunny', 'Hot', 'High', 'Strong'] ---> No
['Rain', 'Mild', 'High', 'Weak'] ---> Yes
['Overcast', 'Hot', 'High', 'Strong'] ---> Yes
