# ID3 (Iterative Dichotomiser 3) Algorithm Based Decision Tree

Subject Code: 18AIL66

Program No.: 4

Demonstrates the working of the decision tree based ID3 algorithm by building a tree from a small data set and using it to classify a new sample.

Other decision tree algorithms include C4.5, C5.0 and CART (Classification and Regression Trees). ID3 is used only for classification, and only with nominal (categorical) features.

In [1]:
# Imports required packages

import pandas as pd
import numpy as np
from ID3DecisionTree import Node, ID3DecisionTreeClassifier

## Exploratory Data Analysis

In [2]:
# Loads data

data = pd.read_csv("../../Data/playtennis.csv")

In [4]:
# Having a quick look into data

display(data)

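|  | Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|---|
| 0 | D1 | Sunny | Hot | High | Weak | No |
| 1 | D2 | Sunny | Hot | High | Strong | No |
| 2 | D3 | Overcast | Hot | High | Weak | Yes |
| 3 | D4 | Rain | Mild | High | Weak | Yes |
| 4 | D5 | Rain | Cool | Normal | Weak | Yes |
| 5 | D6 | Rain | Cool | Normal | Strong | No |
| 6 | D7 | Overcast | Cool | Normal | Strong | Yes |
| 7 | D8 | Sunny | Mild | High | Weak | No |
| 8 | D9 | Sunny | Cool | Normal | Weak | Yes |
| 9 | D10 | Rain | Mild | Normal | Weak | Yes |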


## Data Preparation

In [8]:
# Drops the 'Day' column: it only identifies the row and carries no predictive information

data = data[["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]]

display(data)

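|  | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| 0 | Sunny | Hot | High | Weak | No |
| 1 | Sunny | Hot | High | Strong | No |
| 2 | Overcast | Hot | High | Weak | Yes |
| 3 | Rain | Mild | High | Weak | Yes |
| 4 | Rain | Cool | Normal | Weak | Yes |
| 5 | Rain | Cool | Normal | Strong | No |
| 6 | Overcast | Cool | Normal | Strong | Yes |
| 7 | Sunny | Mild | High | Weak | No |
| 8 | Sunny | Cool | Normal | Weak | Yes |
| 9 | Rain | Mild | Normal | Weak | Yes |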


## Applying ID3 Algorithm

_**Pseudocode for the ID3 algorithm**_

ID3(examples, target_attribute, attributes)
    
_'examples' are the training examples._

_'target_attribute' is the attribute whose value is to be predicted by the tree._

_'attributes' is a list of other attributes that may be tested by the learned decision tree._

_Returns a decision tree that correctly classifies the given 'examples'._
    
- Create a _root_ node for the tree
- If all _examples_ are positive, return the single-node tree _root_, with label = +
- If all _examples_ are negative, return the single-node tree _root_, with label = -
- If _attributes_ is empty, return the single-node tree _root_, with label = most common value of _target_attribute_ in _examples_
- Otherwise Begin
    - _A_ <-- the attribute from _attributes_ that best classifies _examples_ (the best attribute is the one with the highest information gain)
    - The decision attribute for _root_ <-- _A_
    - For each possible value, _v<sub>i</sub>_, of _A_,
        - Add a new tree branch below _root_, corresponding to the test _A_ = _v<sub>i</sub>_
        - Let _examples<sub>v<sub>i</sub></sub>_ be the subset of _examples_ that have value _v<sub>i</sub>_ for _A_
        - If _examples<sub>v<sub>i</sub></sub>_ is empty
            - Then below this new branch add a leaf node with label = most common value of _target_attribute_ in _examples_
            - Else below this new branch add the subtree ID3(_examples<sub>v<sub>i</sub></sub>_, _target_attribute_, _attributes_ - {_A_})
- End
- Return _root_
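
The pseudocode above can be sketched in plain Python. This is a minimal illustration only, not the code of the imported `ID3DecisionTree` module, whose source is not shown here. The function names (`entropy`, `information_gain`, `id3`), the `domains` argument (the possible values of each attribute) and the five toy rows are all assumptions made for this example. A leaf is represented by its class label and an internal node by a nested dict.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Entropy reduction obtained by splitting `examples` on `attribute`."""
    gain = entropy([e[target] for e in examples])
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain


def id3(examples, target, attributes, domains):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [e[target] for e in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # All examples share one label, or no attribute is left to test: make a leaf.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:
        subset = [e for e in examples if e[best] == value]
        # An empty subset becomes a leaf labelled with the parent's majority class.
        branches[value] = id3(subset, target, remaining, domains) if subset else majority
    return {best: branches}


# Hypothetical toy rows (NOT the playtennis.csv data), just to exercise the sketch.
rows = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "No"},
]
domains = {"Outlook": ["Sunny", "Rain"], "Wind": ["Weak", "Strong"]}

tree = id3(rows, "Play", ["Outlook", "Wind"], domains)
print(tree)
```

On these toy rows, `Outlook` has the higher information gain and becomes the root. The `Sunny` branch is already pure, while the `Rain` branch is split once more on `Wind`.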
    
_**Formulae:**_
$$Entropy(S)=\sum_{i=1}^{n}-p_i \log_2 p_i$$

$$Information\_Gain(S, A)=Entropy(S)-\sum_{v \in Values(A)}\frac{|S_v|}{|S|} Entropy(S_v)$$

where,

_n_ is the number of distinct classes in _S_

_p<sub>i</sub>_ is the proportion of _S_ belonging to class _i_

_Values(A)_ is the set of all possible values of attribute _A_

_S<sub>v</sub>_ is the subset of _S_ for which attribute _A_ has value _v_ (i.e. _S<sub>v</sub>={s∈S|A(s)=v}_)
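
As a worked illustration of the formulae, take the sample counts reported in the printed tree for this data set: 14 samples in total (9 Yes, 5 No), of which Sunny has 5 (2 Yes, 3 No), Overcast has 4 (4 Yes) and Rain has 5 (3 Yes, 2 No). The values are rounded to three decimals.

$$Entropy(S)=-\frac{9}{14}\log_2\frac{9}{14}-\frac{5}{14}\log_2\frac{5}{14}\approx 0.940$$

$$Entropy(S_{Sunny})=Entropy(S_{Rain})=-\frac{3}{5}\log_2\frac{3}{5}-\frac{2}{5}\log_2\frac{2}{5}\approx 0.971,\qquad Entropy(S_{Overcast})=0$$

$$Information\_Gain(S, Outlook)\approx 0.940-\frac{5}{14}(0.971)-\frac{4}{14}(0)-\frac{5}{14}(0.971)\approx 0.247$$

The two entropies agree with the `e: 0.94` and `e: 0.97` values shown at the corresponding nodes of the printed tree.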

In [9]:
# Instantiating the classifier
id3DecisionTreeClassifier = ID3DecisionTreeClassifier()

In [10]:
# Fits the classifier
id3DecisionTreeClassifier.fit(data, "PlayTennis")

In [11]:
# Prints the decision tree
id3DecisionTreeClassifier.print_tree()

(Outlook) [e: 0.94, s: 14]
	Sunny
		(Humidity) [e: 0.97, s: 5]
			High
				('LEAF') [e: 0.00, s: 3, p: ['No']]
			Normal
				('LEAF') [e: 0.00, s: 2, p: ['Yes']]
	Overcast
		('LEAF') [e: 0.00, s: 4, p: ['Yes']]
	Rain
		(Wind) [e: 0.97, s: 5]
			Weak
				('LEAF') [e: 0.00, s: 3, p: ['Yes']]
			Strong
				('LEAF') [e: 0.00, s: 2, p: ['No']]

LEGENDS:
e: entropy, s: sample count, p: prediction


In [12]:
# Classifies a single test sample

test_data = {"Outlook":"Sunny", "Temperature":"Hot", "Humidity":"Normal", "Wind":"Strong"}

prediction = id3DecisionTreeClassifier.predict(test_data)

print(prediction)

['Yes']
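
To see why this sample is classified as Yes, the printed tree can be transcribed by hand into nested dicts and walked from the root. This is a sketch under assumptions: the `classify` helper and the dict layout are hypothetical and say nothing about how `ID3DecisionTreeClassifier.predict` is implemented. That method returns a list, whereas the helper here returns a bare label.

```python
# The printed decision tree, transcribed by hand: internal nodes are
# {attribute: {value: subtree}} and leaves are class labels.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}


def classify(node, sample):
    """Follow the branch matching the sample's value until a leaf is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[sample[attribute]]
    return node


test_data = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "Normal", "Wind": "Strong"}

# Path taken: Outlook = Sunny -> Humidity = Normal -> leaf 'Yes'
print(classify(tree, test_data))  # Yes
```

Only `Outlook` and `Humidity` are consulted for this sample. `Temperature` does not appear in the tree at all, and `Wind` is tested only on the `Rain` branch.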
