## Implementation

In this project, you will need to use information measures to answer several questions. Therefore, in this first part, you are asked to write several functions that implement some of the main measures seen in the first theoretical lectures. Remember that you need to fill in this Jupyter Notebook to answer these questions. Pay particular attention to the required output format of each function.

In [1]:
# [Locked Cell] You can not import any extra Python library in this Notebook.
import numpy as np
import pandas as pd

# Project 1 - Information measures

The goal of this first project is to get accustomed to information and uncertainty measures. We ask you to write a brief report (PDF format) collecting your answers to the different questions. All code must be written in Python inside this Jupyter Notebook. No other code file will be accepted. Note that you cannot change the content of locked cells or import any Python library other than the ones already imported (numpy and pandas).

### Question 1

Write a function entropy that computes the entropy $\mathcal{H(X)}$ of a random variable $\mathcal{X}$ from its probability distribution $P_\mathcal{X} = (p_1, p_2, \ldots, p_n)$. Give the mathematical formula that you are using and explain the key parts of your implementation. Intuitively, what is measured by the entropy?

In [2]:
def entropy(Px):
    """
    Computes the entropy from the marginal probability distribution.
    Arguments:
    ----------
    - Px :  Marginal probability distribution of the random
            variable X in a numpy array where Px[i]=P(X=i)
    Return:
    -------
    - The entropy of X (H(X)) as a number (integer, float or double).
    """
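    Px = np.asarray(Px, dtype=float)
    p = Px[Px > 0] # keep only positive probabilities (convention: 0 * log2(0) = 0), so log2(0) is never evaluated
    return -np.sum(p * np.log2(p)) # H(X) = -sum(P(Xi) * log2(P(Xi)))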
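
A quick sanity check, as a minimal sketch: the variant below (the name `entropy_masked` is illustrative and not part of the assignment) selects the strictly positive probabilities before taking the logarithm, which applies the convention $0 \log_2 0 = 0$ without evaluating `log2(0)`. A uniform distribution over $n$ outcomes should give $\log_2 n$ bits, and a deterministic variable should give 0 bits.

```python
import numpy as np

def entropy_masked(Px):
    """Illustrative entropy in bits; zero-probability outcomes are skipped."""
    Px = np.asarray(Px, dtype=float)
    p = Px[Px > 0]                      # convention: 0 * log2(0) = 0
    return float(-np.sum(p * np.log2(p)))

uniform = np.full(4, 0.25)              # maximal uncertainty over 4 outcomes
with_zero = np.array([0.5, 0.5, 0.0])   # an impossible outcome adds nothing
certain = np.array([1.0, 0.0])          # no uncertainty at all

print(entropy_masked(uniform))          # log2(4) = 2 bits
print(entropy_masked(with_zero))        # same as a fair coin: 1 bit
print(abs(entropy_masked(certain)))     # 0 bits
```

The three cases cover the two extremes of the measure (uniform and deterministic) and the zero-probability corner case.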

### Question 2

Write a function joint_entropy that computes the joint entropy $\mathcal{H(X,Y)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Compare the entropy and joint_entropy functions (and their corresponding formulas). What do you notice?

In [3]:
def joint_entropy(Pxy):
    """
    Computes the joint entropy from the joint probability distribution.
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The joint entropy H(X,Y) as a number (integer, float or double).
    """
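    Pxy = np.asarray(Pxy, dtype=float)
    p = Pxy[Pxy > 0] # keep only positive probabilities (convention: 0 * log2(0) = 0), so log2(0) is never evaluated
    return -np.sum(p * np.log2(p)) # H(X,Y) = -sum(P(Xi,Yj) * log2(P(Xi,Yj)))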
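
The entropy and joint entropy formulas have the same form: the joint entropy is the entropy of the pair $(\mathcal{X}, \mathcal{Y})$ seen as a single variable. The sketch below illustrates this with a hypothetical helper `H` (not part of the assignment) that accepts an array of any shape, and checks that $\mathcal{H(X,Y)} = \mathcal{H(X)} + \mathcal{H(Y)}$ when the two variables are independent.

```python
import numpy as np

def H(P):
    """Illustrative Shannon entropy in bits of a probability array of any shape."""
    P = np.asarray(P, dtype=float)
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

Px = np.array([0.5, 0.5])
Py = np.array([0.25, 0.75])
Pxy_indep = np.outer(Px, Py)       # independent X and Y: P(x, y) = P(x) * P(y)

h_joint = H(Pxy_indep)             # entropy of the 2-D table
h_flat = H(Pxy_indep.ravel())      # same table flattened to 1-D
h_sum = H(Px) + H(Py)              # additivity under independence

print(h_joint, h_flat, h_sum)
```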

### Question 3

Write a function conditional_entropy that computes the conditional entropy $\mathcal{H(X|Y)}$ of a discrete random variable $\mathcal{X}$ given another discrete random variable $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$. Give the mathematical formula that you are using and explain the key parts of your implementation. Describe an equivalent way of computing that quantity.

In [4]:
def conditional_entropy(Pxy):
    """
    Computes the conditional entropy from the joint probability distribution.
    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The conditional entropy H(X|Y) as a number (integer, float or double)
    """
    # Py = np.sum(Pxy, axis=0)
    # Hxy = joint_entropy(Pxy)
    # Hy = entropy(Py)
    # Hx_y = Hxy - Hy # H(X|Y) = H(X,Y) - H(Y)
    # return Hx_y

    return joint_entropy(Pxy) - entropy(np.sum(Pxy, axis=0)) # H(X|Y) = H(X,Y) - H(Y)
    # Alternative: return -np.sum(Pxy * np.log2(Pxy / np.sum(Pxy, axis=0), where=(Pxy > 0)), where=(Pxy > 0)) # H(X|Y) = -sum(P(Xi,Yj) * log2(P(Xi,Yj)/P(Yj)))
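
As a hedged illustration of the equivalent computation, the sketch below compares the chain rule $\mathcal{H(X|Y)} = \mathcal{H(X,Y)} - \mathcal{H(Y)}$ with the weighted average $\sum_j P(y_j)\, \mathcal{H}(\mathcal{X} \mid \mathcal{Y} = y_j)$ on a small made-up joint table. The helper `H` and the table values are assumptions chosen for the example.

```python
import numpy as np

def H(P):
    """Illustrative Shannon entropy in bits of a probability array of any shape."""
    P = np.asarray(P, dtype=float)
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

# Made-up joint distribution, Pxy[i][j] = P(X=i, Y=j)
Pxy = np.array([[0.25, 0.25],
                [0.50, 0.00]])
Py = Pxy.sum(axis=0)                       # marginal of Y (sum over X)

chain_rule = H(Pxy) - H(Py)                # H(X|Y) = H(X,Y) - H(Y)

# Average of the entropies of the conditional distributions P(X | Y=j)
weighted = sum(Py[j] * H(Pxy[:, j] / Py[j])
               for j in range(Pxy.shape[1]) if Py[j] > 0)

print(chain_rule, weighted)
```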

### Question 4

Write a function mutual_information that computes the mutual information $\mathcal{I(X;Y)}$ between two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ from the joint probability distribution $P_\mathcal{X,Y}$ . Give the mathematical formula that you are using and explain the key parts of your implementation. What can you deduce from the mutual information $\mathcal{I(X;Y)}$ on the relationship between $\mathcal{X}$ and $\mathcal{Y}$? Discuss.

In [5]:
def mutual_information(Pxy):
    """
    Computes the mutual information I(X;Y) from joint probability distribution

    Arguments:
    ----------
    - Pxy:  joint probability distribution of X and Y
            in a 2-D numpy array where Pxy[i][j]=P(X=i,Y=j)
    Return:
    -------
    - The mutual information I(X;Y) as a number (integer, float or double)
    """
    # Px = np.sum(Pxy, axis=1)
    # Hx = entropy(Px)
    # Hx_y = conditional_entropy(Pxy)
    # Ixy = Hx - Hx_y # I(X;Y) = H(X) - H(X|Y)
    # return Ixy

    return entropy(np.sum(Pxy, axis=1)) - conditional_entropy(Pxy) # I(X;Y) = H(X) - H(X|Y)
    # Alternative: return entropy(np.sum(Pxy, axis=1)) + entropy(np.sum(Pxy, axis=0)) - joint_entropy(Pxy) # I(X;Y) = H(X) + H(Y) - H(X,Y)
    # Alternative: return np.sum(Pxy * np.log2(Pxy / np.outer(np.sum(Pxy, axis=1), np.sum(Pxy, axis=0))), where=(Pxy > 0)) # I(X;Y) = +sum(P(Xi,Yj) * log2(P(Xi,Yj)/(P(Xi)*P(Yj))))
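
The three formulas listed in the comments should agree. A minimal sketch on a made-up joint table (the helper `H` and the numbers are assumptions for illustration) checks this, and also that the mutual information vanishes for independent variables.

```python
import numpy as np

def H(P):
    """Illustrative Shannon entropy in bits of a probability array of any shape."""
    P = np.asarray(P, dtype=float)
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

Pxy = np.array([[0.25, 0.25],
                [0.50, 0.00]])
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

via_conditional = H(Px) - (H(Pxy) - H(Py))     # I = H(X) - H(X|Y)
via_entropies = H(Px) + H(Py) - H(Pxy)         # I = H(X) + H(Y) - H(X,Y)

mask = Pxy > 0                                 # skip cells with P(x, y) = 0
prod = np.outer(Px, Py)                        # P(x) * P(y)
via_ratio = float(np.sum(Pxy[mask] * np.log2(Pxy[mask] / prod[mask])))

independent = np.outer(Px, Py)                 # joint table of independent X, Y
mi_independent = H(Px) + H(Py) - H(independent)

print(via_conditional, via_entropies, via_ratio, mi_independent)
```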

### Question 5

Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be three discrete random variables. Write the functions cond_joint_entropy and cond_mutual_information that respectively compute $\mathcal{H(X,Y|Z)}$ and $\mathcal{I(X;Y|Z)}$ of two discrete random variables $\mathcal{X}$ and $\mathcal{Y}$ given another discrete random variable $\mathcal{Z}$ from their joint probability distribution $P_\mathcal{X,Y,Z}$. Give the mathematical formulas that you are using and explain the key parts of your implementation.
Suggestion: Observe the mathematical definitions of these quantities and think how you could derive them from the joint entropy and the mutual information.

In [6]:
def cond_joint_entropy(Pxyz):
    """
    Computes the conditional joint entropy of X, Y knowing Z
    from the joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The conditional joint entropy H(X,Y|Z) as a number (integer, float or double)

    """
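    Pxyz = np.asarray(Pxyz, dtype=float)
    Pz = np.sum(Pxyz, axis=(0, 1), keepdims=True) # P(Zk), kept 3-D so that it broadcasts against Pxyz
    mask = Pxyz > 0 # skip zero-probability cells (convention: 0 * log2(0) = 0)
    return -np.sum(Pxyz[mask] * np.log2((Pxyz / np.where(Pz > 0, Pz, 1))[mask])) # H(X,Y|Z) = -sum(P(Xi,Yj,Zk) * log2(P(Xi,Yj,Zk)/P(Zk)))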
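
The direct formula can be cross-checked against the chain rule $\mathcal{H(X,Y|Z)} = \mathcal{H(X,Y,Z)} - \mathcal{H(Z)}$. The sketch below does so on a randomly generated joint table. The helper `H`, the table shape and the seed are assumptions made for the example.

```python
import numpy as np

def H(P):
    """Illustrative Shannon entropy in bits of a probability array of any shape."""
    P = np.asarray(P, dtype=float)
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)            # fixed seed for reproducibility
Pxyz = rng.random((2, 3, 4))              # axes: X, Y, Z
Pxyz /= Pxyz.sum()                        # normalise into a joint distribution

# keepdims=True keeps P(Z) with shape (1, 1, 4) so it divides every (x, y) slice
Pz = Pxyz.sum(axis=(0, 1), keepdims=True)

direct = float(-np.sum(Pxyz * np.log2(Pxyz / Pz)))   # -sum P(x,y,z) log2 P(x,y|z)
via_chain_rule = H(Pxyz) - H(Pz)                     # H(X,Y,Z) - H(Z)

print(direct, via_chain_rule)
```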

In [7]:
def cond_mutual_information(Pxyz):
    """
    Computes the conditional mutual information of X, Y knowing Z
    from joint probability distribution Pxyz
    Arguments:
    ----------
    - Pxyz: joint probability distribution of X, Y and Z
            in a 3-D array where Pxyz[i][j][k]=P(X=i,Y=j,Z=k)
    Return:
    -------
    - The conditional mutual information I(X;Y|Z) as a number (integer, float or double)

    """
    # Pxz = np.sum(Pxyz, axis=1)
    # Pyz = np.sum(Pxyz, axis=0)
    # Hx_z = conditional_entropy(Pxz)
    # Hy_z = conditional_entropy(Pyz)
    # Hxy_z = cond_joint_entropy(Pxyz)
    # Ixy_z = Hx_z + Hy_z - Hxy_z # I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
    # return Ixy_z

    return conditional_entropy(np.sum(Pxyz, axis=1)) + conditional_entropy(np.sum(Pxyz, axis=0)) - cond_joint_entropy(Pxyz) # I(X;Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
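
A classic test case for $\mathcal{I(X;Y|Z)} = \mathcal{H(X|Z)} + \mathcal{H(Y|Z)} - \mathcal{H(X,Y|Z)}$ is the XOR construction: $\mathcal{X}$ and $\mathcal{Y}$ are independent fair bits and $\mathcal{Z} = \mathcal{X} \oplus \mathcal{Y}$. Then $\mathcal{I(X;Y)} = 0$ but $\mathcal{I(X;Y|Z)} = 1$ bit, so conditioning can increase mutual information. The sketch below uses a hypothetical helper `H` and writes each conditional entropy as a difference of joint entropies.

```python
import numpy as np

def H(P):
    """Illustrative Shannon entropy in bits of a probability array of any shape."""
    P = np.asarray(P, dtype=float)
    p = P[P > 0]
    return float(-np.sum(p * np.log2(p)))

# Pxyz[x][y][z] = P(X=x, Y=y, Z=z) with Z = X xor Y
Pxyz = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        Pxyz[x, y, x ^ y] = 0.25

Pz = Pxyz.sum(axis=(0, 1))       # marginal of Z
Pxz = Pxyz.sum(axis=1)           # sum over Y -> table indexed by (x, z)
Pyz = Pxyz.sum(axis=0)           # sum over X -> table indexed by (y, z)
Pxy = Pxyz.sum(axis=2)           # sum over Z -> table indexed by (x, y)

mi = H(Pxy.sum(axis=1)) + H(Pxy.sum(axis=0)) - H(Pxy)            # I(X;Y)
cmi = (H(Pxz) - H(Pz)) + (H(Pyz) - H(Pz)) - (H(Pxyz) - H(Pz))    # I(X;Y|Z)

print(mi, cmi)
```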

In [8]:
# [Locked Cell] Evaluation of your functions by the examiner.
# You don't have access to the evaluation, this will be done by the examiner.
# Therefore, this cell will return nothing for the students.
import os
if os.path.isfile("private_evaluation.py"):
    from private_evaluation import unit_tests
    unit_tests(entropy, joint_entropy, conditional_entropy, mutual_information, cond_joint_entropy, cond_mutual_information)

### Predicting the result of the Information and Coding Theory exam

You may create cells below to answer the questions about predicting the result of the Information and Coding Theory exam. Unlike in the first part (Implementation), you are free to define as many cells as you need. Try to keep your code structured and clear (comment it if necessary). Note that you have to answer the questions in the PDF report, including the numbers you get!

### Loading data

In [9]:
# Load the dataset
file_path = "data.csv"
df = pd.read_csv(file_path)

# Display basic information about the dataset
df.info()
# df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 11 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   Exam result                       5000 non-null   object
 1   Grade for the probability class   5000 non-null   object
 2   Project grade                     5000 non-null   object
 3   Time spent on project             5000 non-null   object
 4   Time spent studying               5000 non-null   object
 5   Interest in the course            5000 non-null   object
 6   Weather the week before the exam  5000 non-null   object
 7   Date                              5000 non-null   object
 8   Location                          5000 non-null   object
 9   Master                            5000 non-null   object
 10  Evalens score of the course       5000 non-null   object
dtypes: object(11)
memory usage: 429.8+ KB


###  Joint probability function & Verification

In [10]:
def probability(data, var_lst=None):
    """
    Calculate the marginal or joint probability distribution of the given variables in the dataframe.

    Parameters:
    data (pd.DataFrame): The dataframe containing the data.
    var_lst (list of str): The columns for which to calculate the (joint) probability.

    Returns:
    np.ndarray: An array with one axis per variable in var_lst containing the
                (joint) probabilities, or None if var_lst is empty.
    """
    # var_lst = list(reversed(var_lst))

    if not var_lst:
        return None

    # Compute frequency table
    joint_df = data.groupby(var_lst).size().reset_index(name='count')

    # Get unique values for each variable
    unique_values = [data[var].unique() for var in var_lst]

    # Create full probability table with all combinations
    if len(var_lst) == 1:
        full_joint_df = joint_df
        full_joint_df['proba'] = full_joint_df['count'] / full_joint_df['count'].sum()
        probability_array = full_joint_df['proba'].values
    else:
        full_index = pd.MultiIndex.from_product(unique_values, names=var_lst)
        full_joint_df = joint_df.set_index(var_lst).reindex(full_index, fill_value=0).reset_index()
        full_joint_df['proba'] = full_joint_df['count'] / full_joint_df['count'].sum()
        probability_array = full_joint_df['proba'].values.reshape([len(vals) for vals in unique_values])

    return probability_array
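
For two variables, the result can be cross-checked with `pd.crosstab`, which is a different technique from the `groupby`/`reindex` approach used above. The sketch below is only an illustration: the toy dataframe and its column names are made up, and `normalize=True` divides every count by the total number of rows.

```python
import numpy as np
import pandas as pd

# Made-up toy data (not the assignment dataset)
toy = pd.DataFrame({
    "Weather": ["sun", "rain", "sun", "sun"],
    "Result":  ["pass", "fail", "pass", "fail"],
})

# Rows and columns are sorted by label; unseen combinations are filled with 0
joint = pd.crosstab(toy["Weather"], toy["Result"], normalize=True)
Pxy = joint.to_numpy()           # Pxy[i][j] = P(Weather=i, Result=j)

print(joint)
print(Pxy.sum())                 # a joint distribution sums to 1
```

Note that `pd.crosstab` orders categories by sorted label, so axis order may differ from a table built from `unique()`. The information measures used here do not depend on that ordering.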


### Question 6

Compute and report the entropy of each variable, and compare each value with its corresponding variable cardinality. What do you notice? Justify theoretically.  

In [11]:
# Compute the entropy of each variable
entropy_results = {}

# Denote X as each variable in the dataset

for col in df.columns:
    X_proba = probability(df, [col])
    entropy_value = entropy(X_proba)
    cardinality = len(X_proba)
    entropy_results[col] = {"Entropy, H(X)": entropy_value, "Cardinality, |X|": cardinality, "Probability distribution, P(X)": X_proba}

# Convert results into a DataFrame
entropy_df = pd.DataFrame.from_dict(entropy_results, orient="index")
entropy_df.index.name = "Variable, X"

# Display the results
display(entropy_df.sort_values(["Cardinality, |X|", "Entropy, H(X)"], ascending=[True, True]))


| Variable, X | Entropy, H(X) | Cardinality, \|X\| | Probability distribution, P(X) |
|---|---|---|---|
| Weather the week before the exam | 0.980511 | 2 | [0.418, 0.582] |
| Exam result | 0.991254 | 2 | [0.555, 0.445] |
| Master | 1.481246 | 3 | [0.5028, 0.3008, 0.1964] |
| Grade for the probability class | 1.489325 | 3 | [0.2002, 0.4948, 0.305] |
| Interest in the course | 1.5139 | 3 | [0.4106, 0.397, 0.1924] |
| Evalens score of the course | 1.515841 | 3 | [0.2128, 0.3234, 0.4638] |
| Date | 1.584271 | 3 | [0.332, 0.3466, 0.3214] |
| Location | 1.584961 | 3 | [0.3338, 0.3326, 0.3336] |
| Project grade | 1.493609 | 4 | [0.2242, 0.5178, 0.2564, 0.0016] |
| Time spent studying | 1.621822 | 5 | [0.5886, 0.0654, 0.2514, 0.0378, 0.0568] |
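
The bound $0 \le \mathcal{H(X)} \le \log_2 |\mathcal{X}|$, with equality on the right for a uniform distribution, can be checked against the reported values. A minimal sketch, with the numbers copied by hand from the table above:

```python
import numpy as np

# variable: (entropy in bits, cardinality), copied from the table above
reported = {
    "Weather the week before the exam": (0.980511, 2),
    "Exam result": (0.991254, 2),
    "Master": (1.481246, 3),
    "Grade for the probability class": (1.489325, 3),
    "Interest in the course": (1.5139, 3),
    "Evalens score of the course": (1.515841, 3),
    "Date": (1.584271, 3),
    "Location": (1.584961, 3),
    "Project grade": (1.493609, 4),
    "Time spent studying": (1.621822, 5),
}

for name, (h, k) in reported.items():
    h_max = np.log2(k)           # entropy of a uniform distribution over k values
    print(f"{name:34s} H = {h:.6f}   log2|X| = {h_max:.6f}   ratio = {h / h_max:.3f}")
```

A ratio close to 1 indicates a nearly uniform variable, while a low ratio indicates a distribution concentrated on a few values.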


### Question 7

Compute and report the conditional entropy of Exam result given each of the other variables. Considering the variable descriptions, what do you notice when the conditioning variable is (a) Interest in the course and (b) Master?

In [12]:
# Compute the conditional entropy of "Exam result" given each other variable
cond_entropy_results = {}

# Denote X as the variable "Exam result", Y as the other variables
X_var = "Exam result"

# Compute the conditional entropy H(X|Y) for each variable Y
for col in df.columns:
    if col != X_var:
        joint_proba = probability(df, [X_var, col])
        cond_entropy_value = conditional_entropy(joint_proba)
        cond_entropy_results[col] = {"Conditional entropy, H(Exam result | Y)": cond_entropy_value}

# Convert results into a DataFrame
cond_entropy_df = pd.DataFrame.from_dict(cond_entropy_results, orient="index")
cond_entropy_df.index.name = "Variable, Y"

# Display results
display(cond_entropy_df.sort_values("Conditional entropy, H(Exam result | Y)", ascending=True))


| Variable, Y | Conditional entropy, H(Exam result \| Y) |
|---|---|
| Time spent studying | 0.865424 |
| Interest in the course | 0.910074 |
| Time spent on project | 0.914489 |
| Project grade | 0.917995 |
| Evalens score of the course | 0.920023 |
| Date | 0.970676 |
| Grade for the probability class | 0.980955 |
| Weather the week before the exam | 0.989805 |
| Master | 0.991109 |
| Location | 0.991156 |


### Question 8

Compute the mutual information between the variables Location and Evalens score of the course. What can you deduce about the relationship between these two variables? What about the variables Time spent on project and Project grade?

In [13]:
# Compute mutual information of "Location" and "Evalens score of the course"
joint_proba_location_evalens = probability(df, ['Location', 'Evalens score of the course'])
mutual_info_location_evalens = mutual_information(joint_proba_location_evalens)

# Compute mutual information of "Time spent on project" and "Project grade"
joint_proba_time_project_grade = probability(df, ['Time spent on project', 'Project grade'])
mutual_info_time_project_grade = mutual_information(joint_proba_time_project_grade)

mutual_info_results = {
    "(Location; Evalens)": mutual_info_location_evalens,
    "(Time spent on project; Project grade)": mutual_info_time_project_grade
}

# Convert results into a DataFrame
mutual_info_df = pd.DataFrame.from_dict(mutual_info_results, orient="index", columns=["Mutual information, I(X; Y)"])
mutual_info_df.index.name = "(Variable X; Variable Y)"

# Display results
display(mutual_info_df)



| (Variable X; Variable Y) | Mutual information, I(X; Y) |
|---|---|
| (Location; Evalens) | 0.000199 |
| (Time spent on project; Project grade) | 0.685335 |


### Question 9

A student in Computer Science from the University of Liège bets his friends that he can predict the result of the upcoming exam by accessing the dataset. However, his hacking skills are still weak, so he can only access a single variable of the dataset to make his prediction. Using only the mutual information, which variable should he choose? Would using conditional entropy lead to another choice?

In [14]:
# Compute the mutual information of "Exam result" and each of the other variables
mutual_info_results = {}

# Denote X as the variable "Exam result", Y as the other variables
X_var = "Exam result"

# Compute the mutual information I(X;Y) for each variable Y
for col in df.columns:
    if col != X_var:
        joint_proba = probability(df, [X_var, col])
        mutual_info = mutual_information(joint_proba)
        mutual_info_results[col] = mutual_info

# Convert results into a DataFrame
mutual_info_df = pd.DataFrame.from_dict(mutual_info_results, orient="index", columns=["Mutual information, I(Exam result; Y)"])
mutual_info_df.index.name = "Variable, Y"

# Display results
display(mutual_info_df.sort_values("Mutual information, I(Exam result; Y)", ascending=False))

| Variable, Y | Mutual information, I(Exam result; Y) |
|---|---|
| Time spent studying | 0.12583 |
| Interest in the course | 0.081181 |
| Time spent on project | 0.076765 |
| Project grade | 0.073259 |
| Evalens score of the course | 0.071231 |
| Date | 0.020578 |
| Grade for the probability class | 0.010299 |
| Weather the week before the exam | 0.001449 |
| Master | 0.000145 |
| Location | 9.8e-05 |
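
Since $\mathcal{I(X;Y)} = \mathcal{H(X)} - \mathcal{H(X|Y)}$ and $\mathcal{H(X)}$ is the same for every candidate $\mathcal{Y}$, ranking by decreasing mutual information and ranking by increasing conditional entropy should coincide. The sketch below checks this on the reported values, copied by hand from the tables of Questions 6, 7 and 9. Small gaps are expected because the displayed values are rounded to six decimals.

```python
h_exam = 0.991254  # H(Exam result), from the Question 6 table

# variable: (H(Exam result | Y), I(Exam result; Y)), from Questions 7 and 9
reported = {
    "Time spent studying": (0.865424, 0.12583),
    "Interest in the course": (0.910074, 0.081181),
    "Time spent on project": (0.914489, 0.076765),
    "Project grade": (0.917995, 0.073259),
    "Evalens score of the course": (0.920023, 0.071231),
    "Date": (0.970676, 0.020578),
    "Grade for the probability class": (0.980955, 0.010299),
    "Weather the week before the exam": (0.989805, 0.001449),
    "Master": (0.991109, 0.000145),
    "Location": (0.991156, 9.8e-05),
}

# Gap between H(X) - I(X;Y) and the reported H(X|Y), per variable
gaps = {name: abs((h_exam - mi) - h_cond) for name, (h_cond, mi) in reported.items()}

by_mi = sorted(reported, key=lambda v: reported[v][1], reverse=True)
by_cond_entropy = sorted(reported, key=lambda v: reported[v][0])

print(max(gaps.values()))            # largest rounding gap
print(by_mi == by_cond_entropy)      # do the two rankings agree?
```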


### Question 10

With the interest in the course considered as known, would you change your answer from the previous question? What can you say about the amount of information provided by this variable? Compare this value with previous results.

In [15]:
# Compute the conditional mutual information of "Exam result" and each of the other variables when knowing "Interest in the course"
q10_results = {}

# Denote X as the variable "Exam result", Y as the other variables, Z as the variable "Interest in the course"
X_var = "Exam result"
Z_var = "Interest in the course"

# Compute the conditional mutual information I(X;Y|Z) for each variable Y
for col in df.columns:
    if col != X_var and col != Z_var:
        joint_proba = probability(df, [X_var, col, Z_var])
        cond_mutual_info = cond_mutual_information(joint_proba)
        q10_results[col] = {"I(Exam result; Y | Interest in the course)": cond_mutual_info}

# Convert results into a DataFrame
cond_mutual_info_df = pd.DataFrame.from_dict(q10_results, orient="index")
cond_mutual_info_df.index.name = "Variable, Y"

# Display results
display(cond_mutual_info_df.sort_values("I(Exam result; Y | Interest in the course)", ascending=False))


| Variable, Y | I(Exam result; Y \| Interest in the course) |
|---|---|
| Time spent studying | 0.045481 |
| Evalens score of the course | 0.035664 |
| Date | 0.029738 |
| Grade for the probability class | 0.010938 |
| Project grade | 0.009282 |
| Weather the week before the exam | 0.002484 |
| Location | 0.002438 |
| Time spent on project | 0.001196 |
| Master | 0.000784 |
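
To compare these values with the unconditional mutual information of Question 9, the two columns can be placed side by side. The sketch below uses the numbers copied by hand from the two tables. It only prints the change for each variable and leaves the interpretation to the report.

```python
# I(Exam result; Y), from the Question 9 table
mi = {
    "Time spent studying": 0.12583,
    "Evalens score of the course": 0.071231,
    "Date": 0.020578,
    "Grade for the probability class": 0.010299,
    "Project grade": 0.073259,
    "Weather the week before the exam": 0.001449,
    "Location": 9.8e-05,
    "Time spent on project": 0.076765,
    "Master": 0.000145,
}

# I(Exam result; Y | Interest in the course), from the table above
cmi = {
    "Time spent studying": 0.045481,
    "Evalens score of the course": 0.035664,
    "Date": 0.029738,
    "Grade for the probability class": 0.010938,
    "Project grade": 0.009282,
    "Weather the week before the exam": 0.002484,
    "Location": 0.002438,
    "Time spent on project": 0.001196,
    "Master": 0.000784,
}

# Sort by conditional mutual information, largest first
ranking = sorted(cmi, key=cmi.get, reverse=True)
for name in ranking:
    change = cmi[name] - mi[name]
    print(f"{name:34s} I(X;Y) = {mi[name]:.6f}   I(X;Y|Z) = {cmi[name]:.6f}   change = {change:+.6f}")
```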
