# Project 1 - Information measures

The goal of this first project is to get accustomed to information and uncertainty measures. We ask you to write a brief report (PDF format) collecting your answers to the different questions. All code must be written in Python inside this Jupyter Notebook. No other code file will be accepted. Note that you cannot change the content of locked cells or import any Python library other than the ones already imported (numpy and pandas).

## Implementation

In this project, you will need to use information measures to answer several questions. Therefore, in this first part, you are asked to write several functions that implement some of the main measures seen in the first theoretical lectures. Remember that you need to fill in this Jupyter Notebook to answer these questions. Pay particular attention to the required output format of each function.

In [1]:
# [Locked Cell] You can not import any extra Python library in this Notebook.
import numpy as np
import pandas as pd

### Question 1

Write a function entropy that computes the entropy $\mathcal{H(X)}$ of a random variable $\mathcal{X}$ from its probability distribution $P_\mathcal{X} = (p_1, p_2, \ldots, p_n)$. Give the mathematical formula that you are using and explain the key parts of your implementation. Intuitively, what is measured by the entropy?

In [2]:
def entropy(Px):
    """
    Computes the entropy from the marginal probability distribution. 
    Arguments:
    ----------
    - Px :  Marginal probability distribution of the random 
            variable X in a numpy array where Px[i]=P(X=i)
    Return:
    -------
    - The entropy of X (H(X)) as a number (integer, float or double).
    """
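    Px = np.asarray(Px, dtype=float)  # also accepts lists and pandas Series
    # Without `out`, the entries skipped by `where` (Px == 0) are left uninitialised;
    # initialising them to 0 enforces the convention 0 * log2(0) = 0.
    log_Px = np.log2(Px, out=np.zeros_like(Px), where=(Px != 0))
    return -(Px @ log_Px)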
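As a quick sanity check of the formula $H(X) = -\sum_i p_i \log_2 p_i$ (with the convention $0 \log_2 0 = 0$), the sketch below evaluates it on three distributions whose entropy is known in closed form. The helper name `entropy_bits` is hypothetical and only used for this illustration; it is not part of the assignment.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # drop zero-probability outcomes
    return float(np.sum(nz * np.log2(1.0 / nz)))

fair_coin = entropy_bits([0.5, 0.5])       # maximal uncertainty over 2 outcomes
certain = entropy_bits([1.0, 0.0])         # a deterministic variable
uniform_4 = entropy_bits(np.ones(4) / 4)   # uniform over 4 outcomes, log2(4) bits
print(fair_coin, certain, uniform_4)
```

The deterministic case exercises the zero-probability branch, which is where a naive `p * np.log2(p)` would produce `nan`.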

### Question 2

Write a function joint_entropy that computes the joint entropy $\mathcal{H(X,Y)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Compare the entropy and joint_entropy functions (and their corresponding formulas), what do you notice?

In [3]:
def joint_entropy(Pxy):
    """
    Computes the joint entropy from the joint probability distribution.  
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The joint entropy H(X,Y) as a number (integer, float or double).
    """
    joint_entropy = 0
    for i in range(Pxy.shape[0]):
        for j in range(Pxy.shape[1]):
            if Pxy[i,j] != 0:  # convention: 0 * log2(0) = 0
                joint_entropy += Pxy[i,j]*np.log2(Pxy[i,j])
    return -joint_entropy
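The double loop above sums $-p \log_2 p$ over every cell of the table, which is the entropy formula applied to the flattened joint distribution. A minimal sketch of that observation, with hypothetical helper names (`entropy_bits`, `joint_entropy_bits`) and two toy distributions that are not taken from the project data:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def joint_entropy_bits(pxy):
    """H(X,Y): the entropy formula applied to every cell of the joint table."""
    return entropy_bits(np.asarray(pxy, dtype=float).ravel())

# Two independent fair coins: four equally likely outcomes.
independent_coins = np.full((2, 2), 0.25)
# Two perfectly correlated coins: only two outcomes are possible.
correlated_coins = np.array([[0.5, 0.0],
                             [0.0, 0.5]])

h_indep = joint_entropy_bits(independent_coins)
h_corr = joint_entropy_bits(correlated_coins)
print(h_indep, h_corr)
```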

### Question 3

Write a function conditional_entropy that computes the conditional entropy $\mathcal{H(X|Y)}$ of a discrete random variable $\mathcal{X}$ given another discrete random variable $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Describe an equivalent way of computing that quantity.

In [4]:
def conditional_entropy(Pxy):
    """
    Computes the conditional entropy from the joint probability distribution.
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The conditional entropy H(X|Y) as a number (integer, float or double)
    """
    conditional_entropy = 0
    for i in range(Pxy.shape[0]):
        for j in range(Pxy.shape[1]):
            marg_Py = np.sum(Pxy[:,j])  # marginal probability P(Y=j)
            if Pxy[i,j] != 0 and marg_Py != 0:
                conditional_entropy += Pxy[i,j] * np.log2(Pxy[i,j]/marg_Py)
    return -conditional_entropy
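One equivalent way of obtaining the same quantity is the chain rule, $H(X|Y) = H(X,Y) - H(Y)$. The sketch below compares it with the direct definition on a small toy table. All function names and the table itself are assumptions made for this illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def conditional_entropy_direct(pxy):
    """H(X|Y) = -sum_{i,j} P(x_i, y_j) * log2(P(x_i, y_j) / P(y_j))."""
    pxy = np.asarray(pxy, dtype=float)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y, shape (1, n_y)
    mask = pxy > 0                           # cells that contribute to the sum
    ratio = np.divide(pxy, py, out=np.ones_like(pxy), where=mask)
    return float(-np.sum(pxy * np.log2(ratio)))

def conditional_entropy_chain(pxy):
    """H(X|Y) = H(X,Y) - H(Y) (chain rule)."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy_bits(pxy.ravel()) - entropy_bits(pxy.sum(axis=0))

# Toy joint table: rows index X, columns index Y.
pxy = np.array([[0.25, 0.25],
                [0.50, 0.00]])
direct = conditional_entropy_direct(pxy)
chain = conditional_entropy_chain(pxy)
print(direct, chain)
```

The chain-rule version avoids the explicit division, so it needs no special handling of empty columns beyond what the entropy helper already does.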
                    

### Question 4

Write a function mutual_information that computes the mutual information $\mathcal{I(X;Y)}$ between two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$ . Give the mathematical formula that you are using and explain the key parts of your implementation. What can you deduce from the mutual information $\mathcal{I(X;Y)}$ on the relationship between $\mathcal{X}$ and $\mathcal{Y}$? Discuss.

In [5]:
def mutual_information(Pxy):
    """
    Computes the mutual information I(X;Y) from joint probability distribution
    
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y 
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The mutual information I(X;Y) as a number (integer, float or double)
    """
    mutual_information = 0
    for i in range(Pxy.shape[0]):
        for j in range(Pxy.shape[1]):
            marg_Px = sum(Pxy[i,:])  # marginal probability P(X=i)
            marg_Py = sum(Pxy[:,j])  # marginal probability P(Y=j)
            if marg_Px!=0 and marg_Py!=0 and Pxy[i,j]!=0:
                mutual_information += Pxy[i,j] * np.log2(Pxy[i,j]/(marg_Px*marg_Py))
    return mutual_information
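The two extreme cases of mutual information can be checked with the identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$: it is zero when the variables are independent and equals $H(X)$ when $Y$ is an exact copy of $X$. A minimal sketch with hypothetical helper names and toy distributions:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def mutual_information_bits(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    pxy = np.asarray(pxy, dtype=float)
    h_x = entropy_bits(pxy.sum(axis=1))   # marginal of X (sum over columns)
    h_y = entropy_bits(pxy.sum(axis=0))   # marginal of Y (sum over rows)
    return h_x + h_y - entropy_bits(pxy.ravel())

independent = np.outer([0.3, 0.7], [0.5, 0.5])   # P(x, y) = P(x) * P(y)
copied = np.diag([0.3, 0.7])                     # Y is an exact copy of X

mi_independent = mutual_information_bits(independent)
mi_copied = mutual_information_bits(copied)
h_x = entropy_bits([0.3, 0.7])
print(mi_independent, mi_copied, h_x)
```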

### Question 5

Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be three discrete random variables. Write the functions cond_joint_entropy and cond_mutual_information that respectively compute $\mathcal{H(X,Y|Z)}$ and $\mathcal{I(X;Y|Z)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ given another discrete random variable $\mathcal{Z}$ from their joint probability distribution $P_\mathcal{X,Y,Z}$. Give the mathematical formulas that you are using and explain the key parts of your implementation.
Suggestion: Observe the mathematical definitions of these quantities and think how you could derive them from the joint entropy and the mutual information.

In [6]:
def cond_joint_entropy(Pxyz):
    """
    Computes the conditional joint entropy of X, Y knowing Z 
    from the joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The conditional joint entropy H(X,Y|Z) as a number (integer, float or double)
    
    """
    cond_joint_entropy = 0
    for i in range(Pxyz.shape[0]):
        for j in range(Pxyz.shape[1]):
            for k in range(Pxyz.shape[2]):
                marg_Pz = np.sum(Pxyz[:,:,k])  # marginal probability P(Z=k)
                if marg_Pz!=0 and Pxyz[i,j,k]!=0:
                    cond_joint_entropy += Pxyz[i,j,k] * np.log2(Pxyz[i,j,k]/marg_Pz)
    return -cond_joint_entropy
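The same quantity can also be derived from joint entropies, since $H(X,Y|Z) = H(X,Y,Z) - H(Z)$. A minimal sketch of that route, assuming hypothetical helper names and a toy distribution of three independent fair bits:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def cond_joint_entropy_bits(pxyz):
    """H(X,Y|Z) = H(X,Y,Z) - H(Z), with pxyz[i, j, k] = P(X=i, Y=j, Z=k)."""
    pxyz = np.asarray(pxyz, dtype=float)
    pz = pxyz.sum(axis=(0, 1))            # marginal of Z
    return entropy_bits(pxyz.ravel()) - entropy_bits(pz)

# Three independent fair bits: H(X,Y,Z) = 3 bits and H(Z) = 1 bit.
pxyz = np.full((2, 2, 2), 1 / 8)
h_xy_given_z = cond_joint_entropy_bits(pxyz)
print(h_xy_given_z)
```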

In [7]:
def cond_mutual_information(Pxyz):
    """
    Computes the conditional mutual information of X, Y knowing Z 
    from joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - I(X;Y|Z): The conditional mutual information as a number (integer, float or double)
    
    """
    Pxz = np.sum(Pxyz, axis=1)
    Pyz = np.sum(Pxyz, axis=0)
    return -cond_joint_entropy(Pxyz) + conditional_entropy(Pxz) + conditional_entropy(Pyz)
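The return statement above uses $I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)$, which expands to $H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$. The sketch below applies the expanded form to a classic toy case: $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$, so $X$ and $Y$ share no information on their own but one full bit once $Z$ is known. The helper names are hypothetical.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def cond_mutual_information_bits(pxyz):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    pxyz = np.asarray(pxyz, dtype=float)
    pxz = pxyz.sum(axis=1)                # sum over Y -> P(X, Z)
    pyz = pxyz.sum(axis=0)                # sum over X -> P(Y, Z)
    pz = pxyz.sum(axis=(0, 1))            # sum over X and Y -> P(Z)
    return (entropy_bits(pxz.ravel()) + entropy_bits(pyz.ravel())
            - entropy_bits(pxyz.ravel()) - entropy_bits(pz))

# X, Y independent fair bits, Z = X xor Y.
pxyz = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        pxyz[x, y, x ^ y] = 0.25

cmi = cond_mutual_information_bits(pxyz)

# Unconditional mutual information of X and Y for comparison.
pxy = pxyz.sum(axis=2)
mi = (entropy_bits(pxy.sum(axis=1)) + entropy_bits(pxy.sum(axis=0))
      - entropy_bits(pxy.ravel()))
print(mi, cmi)
```

This example also shows that conditioning can increase mutual information, so $I(X;Y|Z)$ is not bounded by $I(X;Y)$ in general.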

In [8]:
# [Locked Cell] Evaluation of your functions by the examiner. 
# You don't have access to the evaluation, this will be done by the examiner.
# Therefore, this cell will return nothing for the students.
import os
if os.path.isfile("private_evaluation.py"):
    from private_evaluation import unit_tests
    unit_tests(entropy, joint_entropy, conditional_entropy, mutual_information, cond_joint_entropy, cond_mutual_information)

## Weather forecasting

You may create cells below to answer the different questions related to weather forecasting. Unlike in the first part (Implementation), you are free to define as many cells as you need below to answer the different questions. Try to be structured and clear in your code (comment it if necessary). Note that you have to answer the questions in the pdf report, including the numbers you get!

### Question 6

In [9]:
df = pd.read_csv("weather_data.csv", sep=",")
for col in df:
    entrop = entropy(df[col].value_counts(normalize=True))
    card = len(df[col].value_counts(normalize=True))
    #print(col, " : entropy = ", entrop, "; cardinality = ", card)
    print(f"{ col :<20}| " + "entropy = " + f"{entrop:<20}| " + "cardinality = " + f"{card:<20}")

print("Probability distribution") 
print(df["lightning"].value_counts(normalize=True))
print(df["air_quality"].value_counts(normalize=True))


temperature         | entropy = 1.5113935187221061  | cardinality = 3                   
air_pressure        | entropy = 0.9999971146079947  | cardinality = 2                   
same_day_rain       | entropy = 1.475468797174184   | cardinality = 3                   
next_day_rain       | entropy = 1.5686562064046452  | cardinality = 3                   
relative_humidity   | entropy = 0.9997963972977278  | cardinality = 2                   
wind_direction      | entropy = 1.9995507337173037  | cardinality = 4                   
wind_speed          | entropy = 1.5848180054843541  | cardinality = 3                   
cloud_height        | entropy = 1.5846220675718725  | cardinality = 3                   
cloud_density       | entropy = 1.5844638106709676  | cardinality = 3                   
month               | entropy = 3.5834131970628738  | cardinality = 12                  
day                 | entropy = 2.806398967708293   | cardinality = 7                   
daylight            |

In [10]:
# Maximum entropy (obtained with uniform distribution)
for i in [2,3,4,7,12]:
    print("Maximum entropy with uniform distrib and cardinality = ", f"{i:<2}", ":", entropy(np.ones(i)/i))

Maximum entropy with uniform distrib and cardinality =  2  : 1.0
Maximum entropy with uniform distrib and cardinality =  3  : 1.584962500721156
Maximum entropy with uniform distrib and cardinality =  4  : 2.0
Maximum entropy with uniform distrib and cardinality =  7  : 2.8073549220576037
Maximum entropy with uniform distrib and cardinality =  12 : 3.584962500721156
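Comparing each entropy with $\log_2$ of the cardinality shows how close a variable is to uniform. The sketch below does this on a hypothetical toy DataFrame (the columns `balanced` and `skewed` are made up and unrelated to `weather_data.csv`), using `value_counts(normalize=True)` to estimate the distribution as in the cells above.

```python
import numpy as np
import pandas as pd

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

# Hypothetical toy data with two 3-valued columns.
toy = pd.DataFrame({
    "balanced": ["a", "b", "c", "a", "b", "c"],
    "skewed":   ["a", "a", "a", "a", "b", "c"],
})

summary = {}
for col in toy:
    probs = toy[col].value_counts(normalize=True).to_numpy()
    h = entropy_bits(probs)
    h_max = float(np.log2(len(probs)))   # reached only by the uniform distribution
    summary[col] = (h, h_max)
    print(f"{col:<10} H = {h:.4f} bits, maximum = {h_max:.4f} bits")
```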


### Question 7

In [11]:
for col in df:
    if col != "next_day_rain":
        Pxy = pd.crosstab(df["next_day_rain"], df[col], normalize=True).to_numpy()
        cond_ent = conditional_entropy(Pxy)
        print("H(next_day_rain | " + f"{col +')' :<20} = " + str(cond_ent))


H(next_day_rain | temperature)         = 1.5681010089559217
H(next_day_rain | air_pressure)        = 0.9399751579488524
H(next_day_rain | same_day_rain)       = 1.3894855510944033
H(next_day_rain | relative_humidity)   = 1.3010552470998942
H(next_day_rain | wind_direction)      = 1.567815335514243
H(next_day_rain | wind_speed)          = 1.5677670877577965
H(next_day_rain | cloud_height)        = 1.5667630289976597
H(next_day_rain | cloud_density)       = 1.5665898847425932
H(next_day_rain | month)               = 1.564879749222577
H(next_day_rain | day)                 = 1.567156809902154
H(next_day_rain | daylight)            = 1.5682591876897034
H(next_day_rain | lightning)           = 1.5682325748732024
H(next_day_rain | air_quality)         = 1.5678811341555414
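The orientation of the table matters here: `pd.crosstab(a, b, normalize=True)` puts `a` on the rows and `b` on the columns, which matches the `Pxy[i][j] = P(X=i, Y=j)` convention with `a` as $X$ and `b` as the conditioning variable $Y$. The sketch below illustrates this on hypothetical toy data (not the weather dataset), where one column fully determines the target and another is independent of it.

```python
import numpy as np
import pandas as pd

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

def conditional_entropy_bits(pxy):
    """H(X|Y) = H(X,Y) - H(Y), rows index X and columns index Y."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy_bits(pxy.ravel()) - entropy_bits(pxy.sum(axis=0))

# Hypothetical toy data: `pressure` determines `rain`, `day` is independent of it.
toy = pd.DataFrame({
    "rain":     ["dry", "dry", "wet", "wet", "dry", "wet", "dry", "wet"],
    "pressure": ["high", "high", "low", "low", "high", "low", "high", "low"],
    "day":      ["mon", "tue", "mon", "tue", "tue", "mon", "mon", "tue"],
})

results = {}
for col in ["pressure", "day"]:
    pxy = pd.crosstab(toy["rain"], toy[col], normalize=True).to_numpy()
    results[col] = conditional_entropy_bits(pxy)
    print(f"H(rain | {col}) = {results[col]:.4f} bits")
```

A conditional entropy close to the unconditional entropy of the target, as for most variables in the output above, indicates a variable that says little about the target on its own.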


### Question 8

In [12]:
Pxy = pd.crosstab(df["relative_humidity"], df["wind_speed"], normalize=True).to_numpy()
print(f"{'Mutual information between relative_humidity and wind_speed ' :<60} = " + str(mutual_information(Pxy)))

Pxy = pd.crosstab(df["month"], df["temperature"], normalize=True).to_numpy()
print(f"{'Mutual information between month and temperature ' :<60} = " + str(mutual_information(Pxy)))


Mutual information between relative_humidity and wind_speed  = 0.00012439598067303954
Mutual information between month and temperature             = 0.5753467937246419


### Question 9

In [13]:
for col in df:
    if col != "next_day_rain":
        Pxy = pd.crosstab(df["next_day_rain"], df[col], normalize=True).to_numpy()
        mut_info = mutual_information(Pxy)
        print("I(next_day_rain; " + f"{col +')' :<20} = " + str(mut_info))

I(next_day_rain; temperature)         = 0.0005551974487232802
I(next_day_rain; air_pressure)        = 0.6286810484557925
I(next_day_rain; same_day_rain)       = 0.17917065531024198
I(next_day_rain; relative_humidity)   = 0.2676009593047508
I(next_day_rain; wind_direction)      = 0.0008408708904022481
I(next_day_rain; wind_speed)          = 0.000889118646848536
I(next_day_rain; cloud_height)        = 0.0018931774069855228
I(next_day_rain; cloud_density)       = 0.0020663216620520324
I(next_day_rain; month)               = 0.0037764571820686353
I(next_day_rain; day)                 = 0.001499396502491182
I(next_day_rain; daylight)            = 0.00039701871494185036
I(next_day_rain; lightning)           = 0.0004236315314423574
I(next_day_rain; air_quality)         = 0.00077507224910362
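These values can be cross-checked against the earlier outputs through the identity $I(X;Y) = H(X) - H(X|Y)$. The sketch below performs that check for four variables, using only numbers copied from the outputs printed in this notebook. The dictionary layout and variable names are chosen for this illustration.

```python
import math

# H(next_day_rain), as printed for Question 6.
h_next_day_rain = 1.5686562064046452

# (H(next_day_rain | Y), I(next_day_rain; Y)), as printed for Questions 7 and 9.
printed = {
    "temperature":       (1.5681010089559217, 0.0005551974487232802),
    "air_pressure":      (0.9399751579488524, 0.6286810484557925),
    "same_day_rain":     (1.3894855510944033, 0.17917065531024198),
    "relative_humidity": (1.3010552470998942, 0.2676009593047508),
}

# The identity should hold up to floating-point rounding.
consistent = {
    name: math.isclose(h_next_day_rain - h_cond, mi, abs_tol=1e-9)
    for name, (h_cond, mi) in printed.items()
}
print(consistent)
```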


### Question 10

In [14]:
df2 = df[df["next_day_rain"] != "dry"]

for col in df2:
    if col != "next_day_rain":
        # Both series come from df2 so that only the non-dry days are counted
        Pxy = pd.crosstab(df2["next_day_rain"], df2[col], normalize=True).to_numpy()
        cond_ent = conditional_entropy(Pxy)
        mut_info = mutual_information(Pxy)
        print("H(next_day_rain | " + f"{col +')' :<20} = " + f"{str(cond_ent):<20} | " + 
                " I(next_day_rain; " + f"{col +')' :<20} = " + str(mut_info))

H(next_day_rain | temperature)         = 0.9984897673205075   |  I(next_day_rain; temperature)         = 0.0008216756551031133
H(next_day_rain | air_pressure)        = 0.9912367657089518   |  I(next_day_rain; air_pressure)        = 0.00807467726665878
H(next_day_rain | same_day_rain)       = 0.8412577092300247   |  I(next_day_rain; same_day_rain)       = 0.15805373374558573
H(next_day_rain | relative_humidity)   = 0.5601193454280589   |  I(next_day_rain; relative_humidity)   = 0.43919209754755173
H(next_day_rain | wind_direction)      = 0.9985380515646349   |  I(next_day_rain; wind_direction)      = 0.0007733914109755396
H(next_day_rain | wind_speed)          = 0.998610419155289    |  I(next_day_rain; wind_speed)          = 0.00070102382032141
H(next_day_rain | cloud_height)        = 0.9988374633287047   |  I(next_day_rain; cloud_height)        = 0.0004739796469060229
H(next_day_rain | cloud_density)       = 0.9990516764718314   |  I(next_day_rain; cloud_density)       = 0.000259766503

### Question 11

In [15]:
for col in df:
    if col != "next_day_rain" and col != "temperature":
        Pxyz = pd.crosstab(df[col], [df["next_day_rain"], df["temperature"]], normalize=True).to_numpy().reshape(-1, 3, 3)
        cond_ent = cond_joint_entropy(Pxyz)
        mut_info = cond_mutual_information(Pxyz)
        print("H(next_day_rain, " + f"{col +' | T°)' :<25} = " + f"{str(cond_ent):<20} | " + 
                " I(next_day_rain; " + f"{col +' | T°)' :<25} = " + str(mut_info))

H(next_day_rain, air_pressure | T°)        = 1.93860245187694     |  I(next_day_rain; air_pressure | T°)        = 0.6294687899897877
H(next_day_rain, same_day_rain | T°)       = 2.8628867981094843   |  I(next_day_rain; same_day_rain | T°)       = 0.17990196682207582
H(next_day_rain, relative_humidity | T°)   = 2.2995737105015377   |  I(next_day_rain; relative_humidity | T°)   = 0.267545482715412
H(next_day_rain, wind_direction | T°)      = 3.5650901256852205   |  I(next_day_rain; wind_direction | T°)      = 0.0016950431837090552
H(next_day_rain, wind_speed | T°)          = 3.151429557692598    |  I(next_day_rain; wind_speed | T°)          = 0.001222871219935806
H(next_day_rain, cloud_height | T°)        = 3.1500677909195143   |  I(next_day_rain; cloud_height | T°)        = 0.0024410845558919814
H(next_day_rain, cloud_density | T°)       = 3.148829820911224    |  I(next_day_rain; cloud_density | T°)       = 0.0032002372741724017
H(next_day_rain, month | T°)               =

## Playing with information theory-based strategy

### Question 12

In [16]:
prob_distrib = np.ones(26)/26
entropy_5_fields = entropy(prob_distrib) # The entropy is identical for each of the 5 fields
print("Entropy of each of the 5 fields : ", entropy_5_fields)

distrib_board = np.ones(26**5)/(26**5)
entropy_board = entropy(distrib_board) 
print("Entropy of the whole game : ", entropy_board)

Entropy of each of the 5 fields :  4.700439718141093
Entropy of the whole game :  23.502198590564696
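Because the five fields are modelled as independent and uniform, the entropy of the board is the sum of the field entropies, $5 \log_2 26$. This can be computed without allocating an array of $26^5$ probabilities. A minimal sketch, with variable names chosen for this illustration:

```python
import numpy as np

n_letters, n_fields = 26, 5

entropy_field = float(np.log2(n_letters))        # uniform over 26 letters
entropy_board = n_fields * entropy_field         # independent fields add up

# Same quantity from the number of equally likely boards, log2(26**5).
entropy_board_direct = float(np.log2(float(n_letters) ** n_fields))
print(entropy_field, entropy_board, entropy_board_direct)
```

Summing millions of tiny terms accumulates rounding error, so the closed form may differ from the array-based value in the last few digits.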


### Question 13

In [17]:
# We know that the A is already placed (but may reappear), and we eliminate the T, B, L and E
prob_distrib = np.ones(22)/22
entropy_wrong_fields = entropy(prob_distrib)
print("Entropy gray fields : ", entropy_wrong_fields)

entropy_good_field = entropy(np.ones(1))
print("Entropy green field : ", entropy_good_field)

distrib_board = np.ones(22**4) / (22**4)
entropy_board = entropy(distrib_board) 
print("Entropy of the board : ", entropy_board)

Entropy gray fields :  4.459431618637297
Entropy green field :  0.0
Entropy of the board :  17.837726474551225
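Under the same independence and uniformity assumptions, each field contributes $\log_2$ of its number of remaining candidates: a green field has a single candidate and contributes nothing. The sketch below generalises this with a hypothetical helper `board_entropy_bits` and compares the board before and after the guess.

```python
import numpy as np

def board_entropy_bits(candidates_per_field):
    """Entropy of a board whose fields are independent and uniform over their candidates."""
    return float(np.sum(np.log2(candidates_per_field)))

# One green field (1 candidate) and four gray fields (22 candidates each);
# the position of the green field does not change the total.
after_guess = board_entropy_bits([1, 22, 22, 22, 22])
before_guess = board_entropy_bits([26] * 5)

# Reduction in uncertainty brought by the guess, in bits.
information_gained = before_guess - after_guess
print(before_guess, after_guess, information_gained)
```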


### Question 14

In [33]:
# The G is known to be on the board but not in the 2nd or 4th cell, so it is given a 1/3 probability
# of being in each of the remaining cells (1st, 3rd and 5th).
# The 4th cell can contain neither A nor G.
# R, O, U and H are also eliminated.
prob_distrib_1_3_5 = np.ones(18) * ((2/3)/17) # 2/3 is shared by the 17 other letters, since the G already takes 1/3 (for the first, third and fifth cells)
prob_distrib_1_3_5[0] = 1/3
entropy_1_3_5 = entropy(prob_distrib_1_3_5/np.sum(prob_distrib_1_3_5))
print("Entropy of fields 1, 3 and 5 : ", entropy_1_3_5)

# for the 2nd cell the entropy is once again 0
# for the 4th cell we know that we can't have a G
prob_distrib_4 = np.ones(17)/17 

entropy_4 = entropy(prob_distrib_4)
print("Entropy of field 4 : ", entropy_4)

# Counting the admissible boards:
# - the first factor counts the placements of the G's in cells 1, 3 and 5 (3 ways to place one G, 3 ways to place two, 1 way to place three);
# - the second factor counts the fillings of the cells among 1, 3 and 5 that do not hold a G (17**2 if there are two such cells, 17 if one, 1 if none);
# - the third factor is the 4th cell, which can hold any of the 17 remaining letters.
combinations_1G = 3 * 17**2 * 17
combinations_2G = 3 * 17 * 17
combinations_3G = 1 * 1 * 17
prob_distrib_game = np.ones(combinations_1G+combinations_2G+combinations_3G)/(combinations_1G+combinations_2G+combinations_3G)
entropy_game = entropy(prob_distrib_game)
print("Entropy of the whole board : ", entropy_game)
print("Sum of entropies of fields : ", 3*entropy_1_3_5 + entropy_4, " vs entropy of game : ", entropy_game)


Entropy of fields 1, 3 and 5 :  3.6432710615547155
Entropy of field 4 :  4.087462841250339
Entropy of the whole board :  13.931383892539227
Sum of entropies of fields :  15.017276025914486  vs entropy of game :  13.931383892539227
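The count of admissible boards can be verified by brute-force enumeration, and the same enumeration gives the marginal distribution of each field under a uniform distribution over boards. The sketch below mirrors the counting used above (18 candidates for fields 1, 3 and 5, 17 for field 4, at least one G). The stand-in symbols and variable names are assumptions, and the marginals obtained this way need not match the 1/3 model used for the per-field entropies.

```python
import itertools
from collections import Counter

import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

others = list(range(17))      # stand-ins for the 17 letters other than G
with_g = ["G"] + others       # candidates for fields 1, 3 and 5

# Each board is (field 1, field 3, field 4, field 5); field 2 is already known.
boards = [
    (c1, c3, c4, c5)
    for c1, c3, c5 in itertools.product(with_g, repeat=3)
    for c4 in others
    if "G" in (c1, c3, c5)    # the G must appear in field 1, 3 or 5
]
n_boards = len(boards)
entropy_joint = float(np.log2(n_boards))   # uniform over admissible boards

# Marginal entropy of each unknown field under that uniform model.
entropy_marginals = []
for pos in range(4):
    counts = Counter(board[pos] for board in boards)
    probs = np.array(list(counts.values())) / n_boards
    entropy_marginals.append(entropy_bits(probs))
sum_marginals = sum(entropy_marginals)

print(n_boards, entropy_joint, sum_marginals)
```

The sum of the marginal entropies exceeds the joint entropy because the fields are not independent: knowing that one field holds the G changes what the others can hold.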
