# Decision Tree Implementation - ID3

### Requirements:

#### Diturunkan Sendiri:
- Program bisa membuat sebuah objek pohon yang bisa menyimpan attributes dari tree (v)
- Objek pohon dapat membuat decision tree dari data yang diberikan, dan menyimpan atribut-atribut dari pohon tersebut (v)
- Objek pohon dapat menyimpan node-node yang merupakan splitting points untuk membuat keputusan (v)
- Objek pohon dapat mengakses seluruh node yang ada pada pohon
- Objek pohon dapat memilih splitting point untuk tiap keadaan; apakah menggunakan metrik information gain atau gain ratio (v)
- Objek pohon dapat mempertimbangkan atribut yang value-nya continuous dan diskrit (v)
- Objek pohon dapat mempertimbangkan atribut yang mempunyai missing value (v)
- Objek pohon dapat melakukan post-pruning dengan menggunakan 20% data untuk validasi. Detil pruning kurang lebih: https://www.quora.com/How-can-I-find-a-real-step-by-step-example-of-a-decision-tree-pruning-to-overcome-overfitting
- Objek pohon dapat menampilkan pohon yang dibuat
- Objek node dapat melakukan splitting pada dataset (menentukan keputusan harus ke node mana setelah suatu kondisi)
    - Objek node tahu harus melakukan splitting pada atribut apa
    - Objek node menyimpan splitting points pada atribut yang bersangkutan

#### Dari Spek:
- Overfitting training data dengan post pruning. Gunakanlah 20% training data untuk data validasi.
- Continuous-valued attribute: information gain dari kandidate. (v)
- Alternative measures for selecting attributes: gain ratio. (v)
- Handling missing attribute value: most common target value. (v)
- full-training the data 
- menampilkan modelnya.

In [188]:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
import math
import collections
import operator

In [3]:
data = pd.read_csv("play_tennis.csv")
data.head()
proportion = data['play'].value_counts()/len(data)
print(proportion[0])
entropy = 0
for p in proportion.tolist():
    print(p)
    entropy -= p*math.log(p,2)
    
print(data[data['outlook'] == 'sunny'])

0.6428571428571429
0.6428571428571429
0.35714285714285715
Empty DataFrame
Columns: [day, outlook, temp, humidity, wind, play]
Index: []


In [4]:
#read iris data
load, target = load_iris(return_X_y=True)
iris_data = pd.DataFrame(load, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_data['label'] = pd.Series(target)

In [5]:
#definisi kelas Node
#Node merupakan split point pada tree. 
#Kelas ini menyimpan data yang ada pada suatu split point, atribut apa yang digunakan untuk splitting, dan tipe atribut tsb. Atau jika node merupakan daun maka disimpan value-nya
#Kelas ini dapat menentukan splitting point kebawah dari suatu node, baik atribut splittingnya kontinu maupun diskrit
#Atribut-atribut Node: 
# - data: subset data
# - split_attr: nama atribut yang akan di split
# - split_values: value cabang dari node (merupakan satu integer jika continuous, dan multiple values jika categorical)
# - target_attr: atribut label/atribut target prediksi
# - attr_cont_split: splitting point dari atribut tsb (jika atribut tsb kontinu)
# - is_leaf: apakah node merupakan daun atau tidak
# - leaf_value: nilai hasil prediksi jika node merupakan daun
# - childs: anak dari node yang berupa node
class Node:
    #konstruktor
    def __init__(self, data, split_attr, target_attr, is_continuous=False, split_value_continuous=None, is_leaf=False, leaf_value=None, parent_value=None):
        self.data = data
        self.split_attr = split_attr
        self.target_attr = target_attr
        self.childs = []
        self.is_leaf = is_leaf
        self.split_values = [split_value_continuous]
        self.leaf_value = leaf_value
        self.parent_value = parent_value

    #check apakah split attribute == numerik
    def is_attr_categorical(self):
        return self.data[self.split_attr].dtype == 'O'
    
    #get splits node jika node bukan daun
    def get_splits(self):
        if( not self.check_if_leaf()):
            #jika atribut split categorical
            if(self.is_attr_categorical()):
                #tentukan split values
                self.split_values = self.data[self.split_attr].unique()
            #jika atribut numerik / continuous, split value sudah didefinisikan sejak konstruksi objek
            return self.split_values
                        
    #add a child to a node
    def add_child(self, node):
        self.childs.append(node)

In [195]:
#definisi kelas Tree
#Kelas ini mengkonstruksi decision tree dengan menghubungkan sekumpulan node, juga memilih untuk tiap node 
#atribut apa yang akan digunakan untuk splitting. Kelas ini dapat mempertimbangkan atribut yang mengandung nilai null.
#Metrik yang  digunakan bisa dipilih antara information gain atau gain ratio.
#Kelas ini dapat melakukan pruning pada tree yang dibuat, dan juga dapat mencetak model tree yang telah dibuat
#NOTE: Asumsi missing value, bernilai "None" atau "none"
#Atribut-atribut Tree:
# - data: merupakan data yang digunakan untuk training
# - target_attr: atribut yang menjadi target prediksi (label)
# - root: node yang merupakan root
# - use_info_gain: True/False. Jika true maka metrik pemilihan atribut menggunakan information gain. Jika False, metrik menggunakan gain ratio
class Tree:
    #konstruktor
    def __init__(self, data, target_attr, use_info_gain=True,root_value=None):
        self.data = data
        self.target_attr = target_attr
        self.root = None
        self.root_value = root_value
        self.use_info_gain = use_info_gain
        self.ruleset = []
    
    #cari entropi total pada data
    def total_entropy(self, data):
        proportion = data[self.target_attr].value_counts()/len(data)
        entropy = 0
        for p in proportion.tolist():
            entropy -= p*math.log(p,2)
        return entropy
    
    #hitung information gain dari suatu kolom
    def info_gain(self, kolom):
        data = self.data
        data_entropy = self.total_entropy(data)
        proportion_kolom = data[kolom].value_counts()/len(data)
        sum_entropy_kolom = 0
        for value_kolom, value_proportion in zip(proportion_kolom.index.tolist(), proportion_kolom.tolist()):
            #print("here checking")
            #print(data[data[kolom] == value_kolom])
            entropy_value_kolom = self.total_entropy(data[data[kolom] == value_kolom])
            sum_entropy_kolom -= value_proportion*entropy_value_kolom
            
        return data_entropy + sum_entropy_kolom
    
    #hitung information split pada data di suatu atribut
    def split_info(self, attr):
        proportion = self.data[attr].value_counts()/len(data)
        split_info = 0
        for p in proportion.tolist():
            split_info -= p*math.log(p,2)
        return split_info
    
    #hitung gain ratio untuk suatu atribut
    def gain_ratio(self, attr):
        return info_gain(attr)/split_info(attr)
    
    #cari split-split yang memungkinkan pada atribut continuous
    def find_possible_splits_continuous(self, sorted_data, split_attr):
        sorted_target = sorted_data[self.target_attr].values.tolist()
        sorted_attr = sorted_data[split_attr].values.tolist()
        prev_target_value = sorted_target[0]
        possible_splits = []
        #iterasi target value, cari titik-titik dimana 
        try:
            for i in range(1, len(sorted_target)):
                el = sorted_target[i]
                if (prev_target_value != el):
                    possible_splits.append(0.5*(sorted_attr[i] + sorted_attr[i-1]))
                prev_target_value = el
        except Exception as e:
            print(e)
        finally:
            return possible_splits
    
    #cari gain dari tiap split dan cari split optimum
    def find_optimum_split_continuous(self, pos_splits, sorted_data, split_attr):
        optimum_split = 0
        max_info_gain = -1
        #iterate split
        for i, el in enumerate(pos_splits):
            #hitung information gain
            current_gain = self.calculate_info_gain_continuous(el, sorted_data, split_attr)
            #jika information gain lebih dari sebelumnya, ganti optimum split
            if(current_gain > max_info_gain):
                max_info_gain = current_gain
                optimum_split = el
        return optimum_split
    
    #cari information gain pada suatu split continuous
    def calculate_info_gain_continuous(self, split_value, sorted_data, split_attr):
        data_entropy = self.total_entropy(sorted_data)
        #pisah data mjd "<=" dan ">" split_value
        data_less_than_equal = sorted_data[sorted_data[split_attr] <= split_value]
        data_more_than = sorted_data[sorted_data[split_attr] > split_value]
        #hitung entropi kolom
        entropy_less_than_equal = (float(len(data_less_than_equal))/len(sorted_data)) * self.total_entropy(data_less_than_equal)
        entropy_more_than = (float(len(data_more_than))/len(sorted_data)) * self.total_entropy(data_more_than)
        return data_entropy - entropy_less_than_equal - entropy_more_than
    
    #check apakah attribute == numerik
    def is_attr_categorical(self, attr):
        return self.data[attr].dtype == 'O'
    
    #handling missing value
    def handle_missing_value(self, split_attr):
        if(self.is_attr_categorical(split_attr)):
            mode = self.data[split_attr].mode().values[0]
            self.data[split_attr] = self.data[split_attr].replace({None:mode})        
    
    #buat tree
    def make_tree(self):
        #cari info_gain dari masing-masing kolom 
        data_X = self.data.drop(self.target_attr, axis=1)
        
        #basis-1: jika data terbagi dg sempurna
        if(self.data[self.target_attr].nunique() == 1):
            self.root = Node("none", "none", "none", is_leaf=True, leaf_value=self.data[self.target_attr].unique()[0], parent_value=self.root_value)
            return self.root
        
        #basis-2: jika tidak ada atribut
        if(len(data_X.columns) == 0):
            self.root = Node("none", "none", "none", is_leaf=True, leaf_value=self.data[self.target_attr].mode().values[0], parent_value=self.root_value)
            return self.root
        
        #rekurens, jika data tidak bisa mjd leaf
        else:
            max_metric = -1
            split_attr = ""
            is_split_attr_categorical = True
            for attr in data_X.columns:
                #Jika kolom kategorikal
                if(self.is_attr_categorical(attr)):
                    if(self.use_info_gain):
                        current_metric = self.info_gain(attr)
                    else:
                        current_metric = self.gain_ratio(attr)
                #jika kolom numerik
                else:
                    #sort data
                    sorted_data = self.data.sort_values(by=attr)
                    #cari split-split yang memungkinkan 
                    pos_splits = self.find_possible_splits_continuous(sorted_data, attr)
                    #hitung gain dari tiap continuous split dan cari nilai optimum
                    split_value_continuous = self.find_optimum_split_continuous(pos_splits, sorted_data, attr)
                    #hitung gain ketika sudah diketahui nilai optimum
                    current_metric = self.calculate_info_gain_continuous(split_value_continuous, sorted_data, attr)

                #jika ditemukan maximum info gain di kolom tertentu
                if(current_metric > max_metric):
                    max_metric = current_metric
                    split_attr = attr
                    is_split_attr_categorical = self.is_attr_categorical(attr)
                    if (not is_split_attr_categorical):
                        split_value_attr = split_value_continuous
            
            #setelah atribut dipilih, cek apakah ada missing value
            #impute missing value dengan modus pada atribut tsb. (asumsi: atribut yg di handle hanyalah kategorikal)
            self.handle_missing_value(split_attr)
            
            #buat node
            #jika atribut terpilih == kategorikal
            if(is_split_attr_categorical):
                self.root = Node(self.data, split_attr, self.target_attr, parent_value=self.root_value)
                split_values = self.data[split_attr].unique()
                #iterate all split values
                for split_value in split_values:
                    filtered_data = self.data[self.data[split_attr] == split_value].drop(split_attr, axis=1)
                    self.root.add_child(Tree(filtered_data, self.target_attr, root_value=split_value).make_tree())

            #jika atribut terpilih == numerik & kontinu
            else:
                self.root = Node(self.data, split_attr, self.target_attr, is_continuous=True, split_value_continuous=split_value_attr, parent_value=self.root_value)
                #filter <=
                filtered_data = self.data[self.data[split_attr] <= split_value_attr].drop(split_attr, axis=1)
                self.root.add_child(Tree(filtered_data, self.target_attr, root_value="<="+str(split_value_attr)).make_tree())

                #filter >
                filtered_data = self.data[self.data[split_attr] > split_value_attr].drop(split_attr, axis=1)
                self.root.add_child(Tree(filtered_data, self.target_attr, root_value=">"+str(split_value_attr)).make_tree())

            return self.root

    def print_tree(self, node, depth, space):
        if (depth == 0):
            print('-------tree-------')
            dash = ''
        else:
            dash = '|' + '-'*space + '>'
            
        if(node.is_leaf):
            output = ('|' + (' '*space))*(depth-1) + dash + '{' + str(node.leaf_value) + '}'
        else:
            output = ('|' + (' '*space))*(depth-1) + dash + node.split_attr 
        
        if (node.parent_value):
            output = output + '    (' + node.parent_value + ')'
        
        print(output)
        
        depth += 1
        for child in node.childs:
            self.print_tree(child, depth, space)
            
    #bagian rekursif untuk prediksi
    def get_prediction_result(self, prediction_instance, node):
        #basis - jika node merupakan leaf, kembalikan value
        if(node.is_leaf):
            return node.leaf_value
        
        #rekurens - jika node bukan leaf, cari anaknya yang tepat, telusuri anak
        else:
            #jika node categorical
            if(node.is_attr_categorical()):
                for child in node.childs:
                    if (child.parent_value == prediction_instance[node.split_attr]):
                        return self.get_prediction_result(prediction_instance, child)
                        break
            #jika node numerik/kontinu
            else:
                if(prediction_instance[node.split_attr] <= node.split_values[0]):
                    return self.get_prediction_result(prediction_instance, node.childs[0])
                elif(prediction_instance[node.split_attr] > node.split_values[0]):
                    return self.get_prediction_result(prediction_instance, node.childs[1])
                
    #prediksi suatu dataset test
    def predict(self, test_data):
        print('-------predict-------')
        pred_result = []
        #iterasi seluruh instance pada test_data
        for i in range(len(test_data)):
            #instance untuk di prediksi
            prediction_instance = test_data.iloc[i]
            #get prediction untuk instance yang dicek, lalu append ke hasil
            pred_result.append(self.get_prediction_result(prediction_instance, self.root))
        return pred_result
    
    #transformasi tree menjadi kumpulan rule
    def recursively_write_rule(self, node, rule):
        #basis - mencapai leaf. Append rule ke ruleset
        if(node.is_leaf):
            new_rule = rule + [[self.target_attr, node.leaf_value]]
            self.ruleset.append(new_rule)
        
        #rekurens - mencatat current precondition dan telusuri anak-anaknya
        else:
            for child in node.childs:
                new_rule = rule + [[node.split_attr, child.parent_value]]
                self.recursively_write_rule(child, new_rule)
    
    #parsing rule menjadi query
    def parse_rule(self, rule):
        str_rule = ''
        for statement in rule[:-1]:
            #categorical variable
            if (statement[1][0] != "<") and (statement[1][0] != ">"):
                str_rule += statement[0] + ' == "' + statement[1] + '" and '
            else:
                str_rule += statement[0] + statement[1] + ' and '
        return str_rule[:-4]
    
    #kalkulasi akurasi suatu rule
    def calculate_rule_accuracy(self, rule):
        query = self.parse_rule(rule)
        filtered_data = self.data_test.query(query)
        target_value = rule[-1][1]
        num_correct_answers = len(filtered_data[filtered_data[self.target_attr] == target_value])
        if(len(filtered_data) == 0):
            return 0
        else:
            return float(num_correct_answers)/float(len(filtered_data))
    
    #pruning untuk suatu rule
    def prune_rule(self, rule, prev_accuracy):
        if(len(rule) > 2):
            optimal_rule = []
            max_accuracy = -1
            for statement in rule[:-1]:
                temp_rule = rule.copy()
                temp_rule.remove(statement)
                accuracy = self.calculate_rule_accuracy(temp_rule)
                if accuracy > max_accuracy:
                    max_accuracy = accuracy
                    optimal_rule = temp_rule

            #basis - akurasi tidak improve
            if((max_accuracy <= prev_accuracy) or (len(rule) == 0)):
                return (optimal_rule, rule[-1][-1], max_accuracy)
            #rekurens - akurasi masih bisa dinaikkan dengan pruning
            else:
                return self.prune_rule(optimal_rule, max_accuracy)
        else:
            return (rule, rule[-1][-1], self.calculate_rule_accuracy(rule))
    
    #post-pruning
    def rule_post_pruning(self, data_test):
        print('-------rules-------')
        #definisikan rules yang ada
        #lakukan DFS pada tree sampai leaf. Catat semua rule yang ada
        self.recursively_write_rule(self.root, [])
        
        #set data test
        self.data_test = data_test
        
        #prune rule
        sorted_rule = {}
        for i, rule in enumerate(self.ruleset):
            pruned_rule, label, accuracy = (self.prune_rule(rule, -1))
            sorted_rule[self.parse_rule(pruned_rule)+'; label: '+str(label)] = accuracy

        sorted_rule = sorted(sorted_rule.items(), key=lambda kv: kv[1])
        return sorted_rule

In [None]:
randomized_iris_data = iris_data.sample(frac=1).reset_index().drop('index', axis=1)
iris_train_data = randomized_iris_data.iloc[0:120]
iris_test_data = randomized_iris_data.iloc[120:]

In [196]:
prune_tree = Tree(iris_train_data, 'label')
root_prune_tree = prune_tree.make_tree()
prune_tree.print_tree(root_prune_tree, 0, 2)
pruned_rules = prune_tree.rule_post_pruning(iris_test_data)
for pruned_rule in pruned_rules:
    print(pruned_rule[0])

-------tree-------
petal_length
|-->{0}    (<=2.45)
|-->petal_width    (>2.45)
|  |-->sepal_length    (<=1.7)
|  |  |-->{2}    (<=4.95)
|  |  |-->sepal_width    (>4.95)
|  |  |  |-->{1}    (<=2.2)
|  |  |  |-->{1}    (>2.2)
|  |-->sepal_length    (>1.7)
|  |  |-->sepal_width    (<=5.9)
|  |  |  |-->{2}    (<=3.1)
|  |  |  |-->{1}    (>3.1)
|  |  |-->{2}    (>5.9)
-------rules-------
sepal_length<=5.9 and sepal_width>3.1 ; label: 1
petal_length>2.45 ; label: 2
petal_length>2.45 ; label: 1
petal_length<=2.45 ; label: 0
sepal_length>4.95 and sepal_width<=2.2 ; label: 1
petal_width>1.7 and sepal_width<=3.1 ; label: 2
petal_width>1.7 ; label: 2


In [95]:
tree_iris = Tree(iris_data, 'label')
root_iris = tree_iris.make_tree()
tree_iris.print_tree(root_iris, 0, 2)

test_data = iris_data.sort_values(by='sepal_width').tail(10)
tree_iris.predict(test_data)

-------tree-------
petal_length
|-->{0}    (<=2.45)
|-->petal_width    (>2.45)
|  |-->sepal_length    (<=1.7)
|  |  |-->sepal_width    (<=7.1)
|  |  |  |-->{1}    (<=2.8)
|  |  |  |-->{1}    (>2.8)
|  |  |-->{2}    (>7.1)
|  |-->sepal_length    (>1.7)
|  |  |-->sepal_width    (<=5.9)
|  |  |  |-->{2}    (<=3.1)
|  |  |  |-->{1}    (>3.1)
|  |  |-->{2}    (>5.9)
-------predict-------


[0, 0, 2, 2, 0, 0, 0, 0, 0, 0]

In [108]:
prune_tree = Tree(iris_train_data, 'label')
root_prune_tree = prune_tree.make_tree()
prune_tree.print_tree(root_prune_tree, 0, 2)
prune_tree.rule_post_pruning(iris_test_data)

-------tree-------
petal_length
|-->{0}    (<=2.45)
|-->petal_width    (>2.45)
|  |-->sepal_length    (<=1.7)
|  |  |-->{2}    (<=4.95)
|  |  |-->sepal_width    (>4.95)
|  |  |  |-->{1}    (<=2.2)
|  |  |  |-->{1}    (>2.2)
|  |-->sepal_length    (>1.7)
|  |  |-->sepal_width    (<=5.9)
|  |  |  |-->{2}    (<=3.1)
|  |  |  |-->{1}    (>3.1)
|  |  |-->{2}    (>5.9)
-------rules-------
petal_length
none
petal_width
sepal_length
none
sepal_width
none
none
sepal_length
sepal_width
none
none
none
[['petal_length', '<=2.45'], ['none', 0]]
[['petal_length', '>2.45'], ['petal_width', '<=1.7'], ['sepal_length', '<=4.95'], ['none', 2]]
[['petal_length', '>2.45'], ['petal_width', '<=1.7'], ['sepal_length', '>4.95'], ['sepal_width', '<=2.2'], ['none', 1]]
[['petal_length', '>2.45'], ['petal_width', '<=1.7'], ['sepal_length', '>4.95'], ['sepal_width', '>2.2'], ['none', 1]]
[['petal_length', '>2.45'], ['petal_width', '>1.7'], ['sepal_length', '<=5.9'], ['sepal_width', '<=3.1'], ['none', 2]]
[['petal_

ValueError: expr cannot be an empty string

In [85]:
data_X = data.drop('day', axis=1)
tree = Tree(data_X, 'play', use_info_gain=True)
root = tree.make_tree()

tree.print_tree(root, 0, 2)
print('-------predict-------')
print(tree.predict(data_X.tail(4)))

-------tree-------
outlook
|-->humidity    (Sunny)
|  |-->{No}    (High)
|  |-->{Yes}    (Normal)
|-->{Yes}    (Overcast)
|-->wind    (Rain)
|  |-->{Yes}    (Weak)
|  |-->{No}    (Strong)
-------predict-------
-------predict-------
['Yes', 'Yes', 'Yes', 'No']


In [8]:
#hitung entropi total dataset
def total_entropy(data):
    proportion = data['play'].value_counts()/len(data)
    entropy = 0
    for p in proportion.tolist():
        entropy -= p*math.log(p,2)
    return entropy

#hitung information gain dari suatu kolom
def gain(data, kolom):
    data_entropy = total_entropy(data)
    print('KOLOM:', kolom.upper())
    print('total entropy of current data', '=',data_entropy)
    proportion_kolom = data[kolom].value_counts()/len(data)
    sum_entropy_kolom = 0
    for value_kolom, value_proportion in zip(proportion_kolom.index.tolist(), proportion_kolom.tolist()):
        entropy_value_kolom = total_entropy(data[data[kolom] == value_kolom])
        sum_entropy_kolom -= value_proportion*entropy_value_kolom
        print('value entropy kolom for', kolom, ':', value_kolom, ':', value_proportion, '=', entropy_value_kolom )
    print('sum entropy kolom for', kolom, '=', sum_entropy_kolom)
    return data_entropy + sum_entropy_kolom

#get current_data
def get_node_data(data, kolom, value):
    new_data = data[data[kolom] == value]
    return new_data.drop(kolom, axis=1)

#get current_columns
def get_current_columns(data):
    return data.drop('play', axis=1).columns

### Iterasi 1

In [9]:
current_data = data
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['play'].value_counts().index.tolist())

KOLOM: DAY
total entropy of current data = 0.9402859586706309
value entropy kolom for day : D7 : 0.07142857142857142 = 0.0
value entropy kolom for day : D6 : 0.07142857142857142 = 0.0
value entropy kolom for day : D12 : 0.07142857142857142 = 0.0
value entropy kolom for day : D13 : 0.07142857142857142 = 0.0
value entropy kolom for day : D10 : 0.07142857142857142 = 0.0
value entropy kolom for day : D8 : 0.07142857142857142 = 0.0
value entropy kolom for day : D14 : 0.07142857142857142 = 0.0
value entropy kolom for day : D5 : 0.07142857142857142 = 0.0
value entropy kolom for day : D3 : 0.07142857142857142 = 0.0
value entropy kolom for day : D4 : 0.07142857142857142 = 0.0
value entropy kolom for day : D1 : 0.07142857142857142 = 0.0
value entropy kolom for day : D9 : 0.07142857142857142 = 0.0
value entropy kolom for day : D2 : 0.07142857142857142 = 0.0
value entropy kolom for day : D11 : 0.07142857142857142 = 0.0
sum entropy kolom for day = 0.0
gain day : 0.9402859586706309
KOLOM: OUTLOOK
to

In [10]:
current_data

Unnamed: 0,day,outlook,temp,humidity,wind,play
0,D1,Sunny,Hot,High,Weak,No
1,D2,Sunny,Hot,High,Strong,No
2,D3,Overcast,Hot,High,Weak,Yes
3,D4,Rain,Mild,High,Weak,Yes
4,D5,Rain,Cool,Normal,Weak,Yes
5,D6,Rain,Cool,Normal,Strong,No
6,D7,Overcast,Cool,Normal,Strong,Yes
7,D8,Sunny,Mild,High,Weak,No
8,D9,Sunny,Cool,Normal,Weak,Yes
9,D10,Rain,Mild,Normal,Weak,Yes


#### Root = Ada Hangout

### Iterasi 2

#### ada hangout = ya

In [11]:
current_data = get_node_data(data, 'ada hangout', 'ya')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

KeyError: 'ada hangout'

In [None]:
current_data

#### ada hangout = tidak

In [None]:
current_data = get_node_data(data, 'ada hangout', 'tidak')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data

### Iterasi 3

#### ada hangout = tidak ^ deadline = urgent

In [None]:
current_data = get_node_data(get_node_data(data, 'ada hangout', 'tidak'), 'deadline', 'urgent')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data

#### ada hangout = tidak ^ deadline = dekat

In [None]:
current_data = get_node_data(get_node_data(data, 'ada hangout', 'tidak'), 'deadline', 'dekat')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data

#### ada hangout = tidak ^ deadline = tidak ada

In [None]:
current_data = get_node_data(get_node_data(data, 'ada hangout', 'tidak'), 'deadline', 'tidak ada')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data

### Iterasi 4

#### ada hangout = tidak ^ deadline = dekat ^ malas = tidak

In [None]:
current_data = get_node_data(get_node_data(get_node_data(data, 'ada hangout', 'tidak'), 'deadline', 'dekat'), 'malas', 'tidak')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data

#### ada hangout = tidak ^ deadline = dekat ^ malas = ya

In [None]:
current_data = get_node_data(get_node_data(get_node_data(data, 'ada hangout', 'tidak'), 'deadline', 'dekat'), 'malas', 'ya')
current_gains = []
for kolom in get_current_columns(current_data):
    gain_kolom = gain(current_data, kolom)
    print("gain", kolom, ":", gain_kolom)
    current_gains.append([kolom, gain_kolom])
current_gains = pd.Series([x[1] for x in current_gains], index=[x[0] for x in current_gains])
print('GAINS\n', current_gains)
print('AKTIVITAS TERKAIT', current_data['aktivitas'].value_counts().index.tolist())

In [None]:
current_data