#  <font color='red'> Review of entropy using Python </font> 
 
We will use pandas to read a CSV file and to store the data.

pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html

In [21]:
import pandas as pd
import numpy as np


## Download student-mat.csv from ICON

The CSV file was downloaded from Kaggle:
https://www.kaggle.com/uciml/student-alcohol-consumption/data?select=student-mat.csv


In [22]:

df = pd.read_csv('student-mat.csv')
df.head(3)


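|   | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |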


## Create a smaller data frame with only two columns

Grade A: G3 ≥ 80% (the code below scales G3 by 5 to convert it to a percentage).

Absences: high absences, if a student missed 10 or more classes.
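
As a small illustration of the two thresholds above, the following sketch applies the same `np.where` pattern to a handful of made-up values. The arrays `g3_scores` and `absence_counts` are hypothetical and not taken from student-mat.csv, and the 0 to 20 grading scale is an assumption implied by the factor of 5.

```python
import numpy as np

# Hypothetical toy values, not taken from student-mat.csv
g3_scores = np.array([20, 16, 15, 10])      # final grades, assumed to be on a 0-20 scale
absence_counts = np.array([0, 10, 12, 3])   # number of missed classes

# G3 * 5 converts a 0-20 score into a percentage; 1 marks "Grade A"
grade_a = np.where(g3_scores * 5 >= 80, 1, 0)
# 1 marks "high absences" (10 or more missed classes)
high_absences = np.where(absence_counts >= 10, 1, 0)

print(grade_a)        # [1 1 0 0]
print(high_absences)  # [0 1 1 0]
```

Note that a score of exactly 16 (80%) counts as Grade A because the comparison is inclusive.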


In [23]:
newlist = pd.DataFrame()
newlist['Grade'] = np.where(df['G3']*5 >= 80, 1, 0)
newlist['Absences'] = np.where(df['absences'] >= 10, 1, 0)
newlist['count'] = 1
newlist.head(10)

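|   | Grade | Absences | count |
|---|---|---|---|
| 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 |
| 5 | 0 | 1 | 1 |
| 6 | 0 | 0 | 1 |
| 7 | 0 | 0 | 1 |
| 8 | 1 | 0 | 1 |
| 9 | 0 | 0 | 1 |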


## Compute joint, marginal, and conditional probabilities


In [24]:
Joint_table = pd.pivot_table(
    newlist, 
    values='count', 
    index=['Grade'], 
    columns=['Absences'], 
    aggfunc=np.size, 
    fill_value=0
)

Joint_table = Joint_table.to_numpy()

# Joint probabilities
PAG = Joint_table/len(newlist)
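
One possible alternative to the pivot table is `pd.crosstab` with `normalize='all'`, which counts the co-occurrences and divides by the total in one step. The sketch below uses a hypothetical four-row frame named `toy`, not the course data, so the numbers are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same two binary columns as newlist
toy = pd.DataFrame({'Grade':    [0, 0, 0, 1],
                    'Absences': [0, 0, 1, 0]})

# Rows index Grade, columns index Absences; normalize='all' divides by the grand total
joint_toy = pd.crosstab(toy['Grade'], toy['Absences'], normalize='all').to_numpy()

print(joint_toy)
print(joint_toy.sum())  # a joint distribution sums to 1
```

A combination that never occurs (here Grade 1 with Absences 1) appears as a zero cell, which matters later because `np.log2(0)` is not finite.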

## Joint Entropy, Individual Entropies, Mutual Information and Conditional entropies

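Entropy of X (and likewise of Y): the average information carried by the random variable, in bits. The second form sums over the joint distribution, which is how the code below computes it.
$$H(X) = -\sum_i p(x_i) \log_2 p(x_i) = -\sum_{i,j}p(x_i,y_j)\log_2(p(x_i))$$

Joint entropy of X and Y: the information in the pair of variables
$$H(X,Y) = -\sum_{i,j}p(x_i,y_j)\log_2 p(x_i,y_j)$$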
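
To make the entropy formula concrete, here is a minimal sketch. The helper name `entropy_bits` and the toy distributions are assumptions made for illustration; the helper skips zero-probability entries, following the usual convention that they contribute nothing.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

fair_coin = entropy_bits([0.5, 0.5])   # 1 bit
uniform4 = entropy_bits([0.25] * 4)    # 2 bits

# Hypothetical 2x2 joint distribution: rows are x, columns are y
joint = np.array([[0.4, 0.1],
                  [0.2, 0.3]])
p_x = joint.sum(axis=1)

# Both forms of H(X): from the marginal, and as a sum over the joint
h_x_marginal = entropy_bits(p_x)
h_x_joint_form = float(-np.sum(joint * np.log2(p_x[:, None])))

print(fair_coin, uniform4)
print(np.isclose(h_x_marginal, h_x_joint_form))  # True
```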

If the variables are independent, we have $p(x,y)=p(x)p(y)$. We can visually compare $p(x,y)$ and $p(x)p(y)$ to see the similarity.
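
Besides a visual comparison, the gap between $p(x,y)$ and $p(x)p(y)$ can be measured numerically. The sketch below uses two hypothetical joint tables, one independent by construction and one fully dependent; the helper `product_of_marginals` is an illustrative name.

```python
import numpy as np

def product_of_marginals(joint):
    """Outer product of the row and column marginals of a joint table."""
    p_rows = joint.sum(axis=1)
    p_cols = joint.sum(axis=0)
    return np.outer(p_rows, p_cols)

# Independent by construction: the joint is itself an outer product
independent = np.outer([0.9, 0.1], [0.8, 0.2])
# Fully dependent: knowing one variable determines the other
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

gap_independent = np.max(np.abs(independent - product_of_marginals(independent)))
gap_dependent = np.max(np.abs(dependent - product_of_marginals(dependent)))

print(gap_independent)  # zero up to floating-point rounding
print(gap_dependent)    # 0.25
```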

Mutual information between the variables is the KL divergence between $p(x,y)$ and $p(x)p(y)$
$$MI(X,Y) = KL(p(x,y)||p(x)p(y)) = \sum_{i,j} p(x_i,y_j) \log_2 \left(\frac{p(x_i,y_j)}{p(x_i)p(y_j)}\right)$$
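
A minimal sketch of this formula on toy tables follows. The function name `mutual_information_bits` is an assumption, and cells with zero joint probability are masked out so that they contribute nothing to the sum.

```python
import numpy as np

def mutual_information_bits(joint):
    """KL divergence between the joint table and the product of its marginals, in bits."""
    joint = np.asarray(joint, dtype=float)
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    mask = joint > 0  # skip zero cells to avoid log2(0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Hypothetical tables: independent variables versus perfectly dependent ones
mi_independent = mutual_information_bits(np.outer([0.9, 0.1], [0.8, 0.2]))
mi_dependent = mutual_information_bits([[0.5, 0.0],
                                        [0.0, 0.5]])

print(mi_independent)  # zero up to floating-point rounding
print(mi_dependent)    # 1.0
```

The two extremes bracket the value found for the student data: independent variables share no information, while two perfectly coupled binary variables share one full bit.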


Conditional entropy can be obtained as $H(X|Y) = H(X)-MI(X,Y)$. Specifically, the information in X that is not obtained from Y is the difference between the information in X and the mutual information shared between the two.
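
As a short check of this identity, substituting the definitions of $H(X)$ and $MI(X,Y)$ and using $p(x_i|y_j) = p(x_i,y_j)/p(y_j)$ gives

$$\begin{aligned}
H(X) - MI(X,Y) &= -\sum_{i,j} p(x_i,y_j)\log_2 p(x_i) - \sum_{i,j} p(x_i,y_j)\log_2\left(\frac{p(x_i,y_j)}{p(x_i)p(y_j)}\right)\\
&= -\sum_{i,j} p(x_i,y_j)\log_2\left(\frac{p(x_i,y_j)}{p(y_j)}\right)\\
&= -\sum_{i,j} p(x_i,y_j)\log_2 p(x_i|y_j) = H(X|Y)
\end{aligned}$$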

## <font color=red> YOUR TASK: COMPLETE THE CODE FOR MUTUAL INFORMATION </font>


In [25]:
HAG = -np.sum(PAG * np.log2(PAG))

# Marginal of absences: sum over the grades axis (rows)
PA = np.sum(PAG,axis=0)
# Marginal of grades: sum over the absences axis (columns)
PG = np.sum(PAG,axis=1)

# Entropy of Absences
HA = -np.sum(PAG * np.log2(PA[None,:]))
# Entropy of Grades
HG = -np.sum(PAG * np.log2(PG[:,None]))

# P(A)P(G) will be equal to the joint probability, if the features are independent
PAPG = PA[None,:]*PG[:,None]

# Mutual information is the KL divergence KL(PAG || PAPG) between the joint probability and the product of marginals
# Low mutual information implies that the two variables are almost independent

# YOUR CODE HERE
#--------------------------------------------------
#PAG is p(x,y), PAPG is p(x)p(y)
MI = np.sum(PAG * np.log2(PAG/PAPG))

#--------------------------------------------------

# Conditional entropy: Information in A that is not explained by G
HAgivenG = HA - MI
# Conditional entropy: Information in G that is not explained by A
HGgivenA = HG - MI


PGgivenA = PAG/PA[None,:]
PAgivenG = PAG/PG[:,None]

print('Joint probabilities')
print(PAG)
print('--------------------')

print('Product of marginals')
print(PAPG)
print('--------------------')

print('Mutual Information')
print(MI)
print('--------------------')

print('Entropy of Absences',HA)
print('Entropy of Grades',HG)
print('Entropy of A given G',HAgivenG)
print('Entropy of G given A',HGgivenA)



Joint probabilities
[[0.70126582 0.19746835]
 [0.08860759 0.01265823]]
--------------------
Product of marginals
[[0.70988624 0.18884794]
 [0.07998718 0.02127864]]
--------------------
Mutual Information
0.0039541826667557275
--------------------
Entropy of Absences 0.7417246276548037
Entropy of Grades 0.4729953622931483
Entropy of A given G 0.737770444988048
Entropy of G given A 0.4690411796263926


## <font color=red>To do: Compute conditional entropies differently </font>

Conditional entropy H(X|Y), the information in X that is not obtained from Y, is the negative expectation of the log conditional probabilities under the joint distribution:
$$H(X|Y) = -\sum_{i,j} p(x_i,y_j) \log_2 p(x_i|y_j)$$

You can use the conditional probabilities PGgivenA and PAgivenG above, together with the joint probabilities PAG, to evaluate them directly. Verify that the numbers match the ones computed using mutual information.


In [26]:
# Modify the following code

HAgivenGComputedDifferently = -np.sum(PAG * np.log2(PAgivenG))
HGgivenAComputedDifferently = -np.sum(PAG * np.log2(PGgivenA))

#--------------------------------------
print('Entropy of A given G computed directly',HAgivenGComputedDifferently)
print('Entropy of G given A computed directly',HGgivenAComputedDifferently)



Entropy of A given G computed directly 0.7377704449880482
Entropy of G given A computed directly 0.46904117962639263
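
The two routes differ only in the last printed digits, which is floating-point rounding. One way to confirm the agreement programmatically is sketched below. It rebuilds the joint table from the rounded values printed earlier, so the lowercase names are illustrative stand-ins and the results match the notebook only to about eight decimal places.

```python
import numpy as np

# Joint probabilities copied from the printed output (rounded to 8 decimals)
p_ag = np.array([[0.70126582, 0.19746835],
                 [0.08860759, 0.01265823]])
p_a = p_ag.sum(axis=0)  # marginal of absences (sum over grades)
p_g = p_ag.sum(axis=1)  # marginal of grades (sum over absences)

# Route 1: H(A|G) = H(A) - MI
h_a = -np.sum(p_ag * np.log2(p_a[None, :]))
mi = np.sum(p_ag * np.log2(p_ag / (p_a[None, :] * p_g[:, None])))
h_a_given_g_via_mi = h_a - mi

# Route 2: directly from the conditional probabilities p(a|g)
h_a_given_g_direct = -np.sum(p_ag * np.log2(p_ag / p_g[:, None]))

print(np.isclose(h_a_given_g_via_mi, h_a_given_g_direct))  # True
```

Using `np.isclose` rather than `==` avoids a false mismatch caused by rounding in the last digit.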
