## Entropy and Information Gain
Consider an example in which we build a decision tree to predict whether a loan will result in a write-off. The population consists of 30 instances: 16 belong to the write-off class and the other 14 belong to the non-write-off class. We have two features: “Balance”, which takes two values (“<50K” or “>=50K”), and “Residence”, which takes three values (“OWN”, “RENT” or “OTHER”).

Your task is to show how a decision tree algorithm decides which attribute to split on first, that is, which of the two features provides more information about the target variable (reduces more of its uncertainty), using the concepts of entropy and information gain.

In [1]:
# Import relevant libraries
import pandas as pd
import numpy as np

In [2]:
# Load loan.csv into the variable loan; note that the file's delimiter is ';'
loan=pd.read_csv('loan.csv', sep=';')
loan.head()

Unnamed: 0,Class,Balance,Residence
0,Write-Off,<50K,OWN
1,Write-Off,<50K,OWN
2,Write-Off,<50K,OWN
3,Write-Off,<50K,OWN
4,Write-Off,<50K,OWN


## Entropy: Write-Offs vs. Non-Write-Offs
<img src="https://miro.medium.com/max/1300/1*zMu0UClotNXljrjqmyRIHA.png" alt="Example" width="600"/>

Using the split shown above, calculate the entropy of the root node, the entropy of each child node, and the information gain. The feature we are splitting on is **'Balance'**.

In [3]:
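# Count each class by label rather than by position in value_counts(),
# so the result does not depend on the sort order of the counts
count_total = len(loan)
count_write = (loan.Class == 'Write-Off').sum()
count_no_write = count_total - count_write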
print('Count of writes =', count_write)
print('Count of no-writes =', count_no_write)
print('Count of df =', count_total)

Count of writes = 16
Count of no-writes = 14
Count of df = 30


In [4]:
# Calculate Entropy of the Root node
entropy_root = -(16/30)*np.log2(16/30)-(14/30)*np.log2(14/30)
print('The root entropy =',entropy_root)

The root entropy = 0.9967916319816366
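The root entropy above is typed out by hand from the class counts. As a minimal sketch, the same calculation can be wrapped in a small helper that accepts any list of class counts. The function name `entropy` is an assumption made here for illustration and is not part of the original notebook.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a node, given its per-class counts."""
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # drop empty classes: 0 * log2(0) is treated as 0
    return float(np.sum(probs * np.log2(1.0 / probs)))

root = entropy([16, 14])   # the root node from the example (roughly 0.997)
pure = entropy([30, 0])    # a perfectly pure node has zero entropy
even = entropy([15, 15])   # a 50/50 node has the maximum entropy of 1 bit
print(root, pure, even)
```

The two extra calls show the range entropy can take for two classes: 0 for a pure node and 1 for an even split. The root node here sits very close to the maximum because 16 versus 14 is nearly balanced.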


In [5]:
# Calculate Entropy of the Child Nodes
entropy_1 = -(12/13)*np.log2(12/13)-(1/13)*np.log2(1/13)
entropy_2 = -(4/17)*np.log2(4/17)-(13/17)*np.log2(13/17)
print("The entropy of 'Balance <50K' =", entropy_1,
      "\nThe entropy of 'Balance >=50K' =", entropy_2)

The entropy of 'Balance <50K' = 0.39124356362925566
The entropy of 'Balance >=50K' = 0.7871265862012691


In [6]:
# Which child node is the purest?
# The 'Balance <50K' node: its entropy (about 0.39) is the lowest

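Information Gain:

$$IG = H(\text{parent}) - \bar{H}(\text{children})$$

where $H$ is the entropy of a node and $\bar{H}(\text{children})$ is the weighted average entropy of the child nodes:

$$\bar{H}(\text{children}) = \frac{N_{\text{left}}}{N_{\text{parent}}}\, H(\text{left}) + \frac{N_{\text{right}}}{N_{\text{parent}}}\, H(\text{right})$$

Here $N_{\text{left}}$ and $N_{\text{right}}$ are the numbers of examples in the left and right child nodes, and $N_{\text{parent}}$ is the total number of examples in the parent node.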
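The information gain formula above can be sketched as a short function that takes the class counts of the parent node and of each child node. The names `entropy` and `information_gain` are assumptions introduced for this illustration, and the counts are the ones read from the fractions used in the Balance cells (12 and 1 in one child, 4 and 13 in the other).

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in bits) of a node, given its per-class counts."""
    counts = np.asarray(counts, dtype=float)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # 0 * log2(0) is treated as 0
    return float(np.sum(probs * np.log2(1.0 / probs)))

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    parent_total = sum(parent_counts)
    weighted_children = sum(
        (sum(child) / parent_total) * entropy(child)
        for child in children_counts
    )
    return entropy(parent_counts) - weighted_children

# Parent: 16 vs. 14 instances; children: the two Balance nodes
ig_balance = information_gain([16, 14], [[12, 1], [4, 13]])
print(ig_balance)
```

Because the children are passed as a list, the same function also covers splits with more than two child nodes, where the weighted average simply gains one term per child.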

In [7]:
# Weighted average entropy of the child nodes for Balance
weighted_average_entropy_balance = (13/30)*entropy_1 + (17/30)*entropy_2
print('The weighted average entropy of the Balance nodes =', weighted_average_entropy_balance)

The weighted average entropy of the Balance nodes = 0.6155772764200632


<img src="https://miro.medium.com/max/1410/1*JaCz5L8AGreiPza3BLqF3Q.png" alt="Example" width="600"/>


Using the split shown above, calculate the entropy of each child node and the information gain; the entropy of the root node is the same as before. The feature we are splitting on is **'Residence'**.

In [8]:
# Calculate Entropy of the Child Nodes
entropy_1 = -(7/8)*np.log2(7/8)-(1/8)*np.log2(1/8)
entropy_2 = -(4/10)*np.log2(4/10)-(6/10)*np.log2(6/10)
entropy_3 = -(5/12)*np.log2(5/12)-(7/12)*np.log2(7/12)
print("Entropy of 'Residence = OWN'", entropy_1,
      "\nEntropy of 'Residence = RENT'", entropy_2,
      "\nEntropy of 'Residence = OTHER'", entropy_3)

Entropy of 'Residence = OWN' 0.5435644431995964 
Entropy of 'Residence = RENT' 0.9709505944546686 
Entropy of 'Residence = OTHER' 0.9798687566511528


In [9]:
# Weighted average entropy of the child nodes for Residence
weighted_average_entropy_residence = (8/30)*entropy_1 + (10/30)*entropy_2 + (12/30)*entropy_3
print('The weighted average entropy of the Residence nodes =', weighted_average_entropy_residence)

The weighted average entropy of the Residence nodes = 0.8605482189985763


In [10]:
# Which child node is the purest?
# The 'Residence = OWN' node: its entropy (about 0.54) is the lowest

## Information Gain: Write-Offs vs. Non-Write-Offs
Now that you've calculated the weighted average entropy for each feature, what is the information gain when splitting on Balance and on Residence?

In [11]:
# Calculate the IG associated with Balance
information_gain_balance = entropy_root-weighted_average_entropy_balance
print(information_gain_balance)

0.38121435556157335


In [12]:
# Calculate the IG associated with Residence
information_gain_residence = entropy_root-weighted_average_entropy_residence
print(information_gain_residence)

0.1362434129830603


In [13]:
# On which feature will the decision tree split, and why?
# It splits on Balance, which has the higher information gain
if information_gain_residence < information_gain_balance:
    print('We split on Balance as it provides us with the highest IG')
else:
    print('We split on Residence as it provides us with the highest IG')

We split on Balance as it provides us with the highest IG
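The cells above hard-code the class counts of every node. As a rough sketch, the same information gain can be computed directly from a DataFrame with `value_counts` and `groupby`. The helper names `column_entropy` and `information_gain` are assumptions, and the `toy` frame is rebuilt from the Balance counts used above rather than read from `loan.csv`; the labels `'Non-Write-Off'` and `'>=50K'` are hypothetical, since the notebook output only shows `'Write-Off'` and `'<50K'`.

```python
import numpy as np
import pandas as pd

def column_entropy(labels):
    """Shannon entropy (in bits) of a Series of class labels."""
    probs = labels.value_counts(normalize=True)
    probs = probs[probs > 0]
    return float((probs * np.log2(1.0 / probs)).sum())

def information_gain(df, feature, target):
    """Entropy of `target` minus its weighted average entropy within each `feature` value."""
    parent_entropy = column_entropy(df[target])
    weights = df[feature].value_counts(normalize=True)            # share of rows per child
    child_entropy = df.groupby(feature)[target].apply(column_entropy)
    return parent_entropy - float((weights * child_entropy).sum())  # aligned on feature value

# Toy frame reconstructed from the Balance counts (hypothetical labels, see above)
toy = pd.DataFrame({
    "Balance": ["<50K"] * 13 + [">=50K"] * 17,
    "Class": (["Write-Off"] * 12 + ["Non-Write-Off"] * 1
              + ["Write-Off"] * 4 + ["Non-Write-Off"] * 13),
})
ig = information_gain(toy, "Balance", "Class")
print(ig)
```

With the real data, the call would presumably look like `information_gain(loan, 'Balance', 'Class')` and `information_gain(loan, 'Residence', 'Class')`, and the feature with the larger value is the one to split on.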
