### Q26 - Entropies and Expectations


The expectation of a function $f$ of a discrete random variable $x$ with distribution $p(x)$ is defined as

$$\langle f(x)\rangle \equiv \sum_{x \in X} f(x)p(x) $$

Similarly, for a pair of random variables, we have the expectation

$$ \langle f(x, y) \rangle \equiv \sum_{x \in X} \sum_{y \in Y} f(x,y)p(x,y)$$
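As a minimal sketch of these two definitions, the snippet below evaluates both sums with NumPy broadcasting. The table values are the ones used for the calculations later in this section; the helper names `expect_x` and `expect_xy` are illustrative assumptions and are not part of the notebook's own functions.

```python
import numpy as np

# Joint table p(x, y): rows index x, columns index y
# (same values as the table used later in this section).
p_xy = np.array([[0.3, 0.3, 0.0],
                 [0.1, 0.2, 0.1]])
x_vals = np.array([1.0, 2.0])
y_vals = np.array([-1.0, 0.0, 5.0])

def expect_x(f, p_xy, x_vals):
    """<f(x)> = sum_x f(x) p(x); hypothetical helper for illustration."""
    p_x = p_xy.sum(axis=1)  # summing out y leaves the marginal p(x)
    return float(np.sum(f(x_vals) * p_x))

def expect_xy(f, p_xy, x_vals, y_vals):
    """<f(x, y)> = sum_x sum_y f(x, y) p(x, y); hypothetical helper."""
    # Broadcasting builds the grid of f(x, y) values with the same shape as p_xy.
    grid = f(x_vals[:, None], y_vals[None, :])
    return float(np.sum(grid * p_xy))

mean_x = expect_x(lambda x: x, p_xy, x_vals)
mean_xy = expect_xy(lambda x, y: x * y, p_xy, x_vals, y_vals)
print('{:.2f} {:.2f}'.format(mean_x, mean_xy))
```

Passing `f` as a callable keeps the sketch close to the notation $\langle f(x) \rangle$: any elementwise function of the values can be averaged without changing the helpers.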

The variance is defined as 

$$ \mathrm{Var}[f(x)] = \langle {(f(x) - \langle f(x) \rangle)}^{2} \rangle $$

It measures the spread of $f(x)$ around its mean. For a pair of random variables, the covariance is 

$$ \mathrm{Cov}[f(x), g(y)] = \langle (f(x) - \langle f(x) \rangle ) (g(y) - \langle g(y) \rangle ) \rangle$$

The covariance measures the linear dependence between $f(x)$ and $g(y)$. It is zero when $x$ and $y$ are independent, although a zero covariance does not by itself imply independence.
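The following sketch, which assumes the joint table used later in this section, evaluates the variance from its definition and the covariance in two ways: directly from the definition and through the equivalent shortcut $\langle xy \rangle - \langle x \rangle \langle y \rangle$. All variable names are illustrative.

```python
import numpy as np

# Joint table p(x, y): rows index x, columns index y.
p_xy = np.array([[0.3, 0.3, 0.0],
                 [0.1, 0.2, 0.1]])
x_vals = np.array([1.0, 2.0])
y_vals = np.array([-1.0, 0.0, 5.0])

# Marginals and means.
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)
mean_x = np.dot(x_vals, p_x)
mean_y = np.dot(y_vals, p_y)

# Variance of x: average squared deviation from the mean under p(x).
var_x = np.dot((x_vals - mean_x) ** 2, p_x)

# Covariance from the definition: average product of deviations under p(x, y).
deviations = np.outer(x_vals - mean_x, y_vals - mean_y)
cov_definition = np.sum(deviations * p_xy)

# Covariance from the shortcut <xy> - <x><y>.
cov_shortcut = np.sum(np.outer(x_vals, y_vals) * p_xy) - mean_x * mean_y

print('{:.2f} {:.2f} {:.2f}'.format(var_x, cov_definition, cov_shortcut))
```

Both covariance expressions should agree up to floating-point error, which makes the shortcut a convenient cross-check for a loop-based implementation.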

Given a probability table $p(x,y)$ specified as a matrix, together with the domains of the two discrete variables $x \in X$ and $y \in Y$, the expectations $\langle x \rangle$, $\langle y \rangle$, $\langle y | x \rangle$, $\langle x | y \rangle$ and the covariance $\mathrm{Cov}[x,y]$ are computed below. Rows of the matrix index $x$ and columns index $y$.

In [1]:
def marginal_probability(joint_probability, axis):
    '''
    Gets the marginal probability given the joint probabilities and the axis to sum over
    The table has one row per value of X and one column per value of Y
    axis should be 1 for variable X: summing along each row removes y and leaves p(x)
    axis should be 0 for variable Y: summing along each column removes x and leaves p(y)
    '''
    return np.sum(joint_probability, axis=axis).flatten()

In [2]:
def expectation_of_variable(joint_probability, variable, axis):
    '''
    Calculates the expectation of a variable given the joint probabilities, the values of the variable
    and the axis used to compute its marginal probability
    axis should be 1 for variable X (sums out y and leaves p(x))
    axis should be 0 for variable Y (sums out x and leaves p(y))
    '''
    marginal = marginal_probability(joint_probability, axis)
    # np.asscalar is no longer available in recent NumPy releases; .item() extracts the single value
    return np.dot(marginal, np.transpose(variable)).item()

In [3]:
def conditional_probability(joint_probability, axis):
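    '''
    Calculates the conditional probability given the joint probabilities and the axis used for the marginal
    axis = 1 divides each row by p(x) and returns p(y|x)
    axis = 0 divides each column by p(y) and returns p(x|y)
    '''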
    if axis == 1:
        return np.transpose(np.transpose(joint_probability) / marginal_probability(joint_probability, axis))        
    else:
        return joint_probability / marginal_probability(joint_probability, axis)

In [4]:
def expectation_of_pair(probability_table, X, Y):
    '''
    Computes the expectation of the product x * y, i.e. the sum of x * y * p(x, y) over all pairs,
    given the values of the variables and their probability table
    '''
    expectation = 0
    for ind_x, x in enumerate(X):
        for ind_y, y in enumerate(Y):
            expectation += x * y * probability_table[ind_x,ind_y]
    return expectation        

In [5]:
def covariance(joint_probability, X, Y):
    '''
    Calculates the covariance of given variables for given probability distribution
    '''
    # The means do not depend on the loop indices, so compute them once
    mean_x = expectation_of_variable(joint_probability, X, 1)
    mean_y = expectation_of_variable(joint_probability, Y, 0)
    cov = 0
    for ind_x, x in enumerate(X):
        for ind_y, y in enumerate(Y):
            cov += (x - mean_x) * (y - mean_y) * joint_probability[ind_x,ind_y]
    return cov

In [6]:
def log(x):
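    '''
    Returns the elementwise natural log of the given variable, mapping entries equal to 0 to 0
    instead of -inf, so that terms p * log(p) vanish when p = 0 (the convention 0 log 0 = 0)
    '''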
    temp_x = np.copy(x)
    temp_x[x == 0] = 1
    return np.log(temp_x)

Joint Entropy
$$ H[x,y] = - {\langle \log p(x,y) \rangle}_{p(x,y)}$$

In [7]:
def entropy(probability_table):
    '''
    Finds entropy for given probability table
    '''
    log_of_probability_table = log(probability_table)
    # H = -sum p log p; a single minus sign keeps the entropy non-negative
    return -np.sum(np.multiply(probability_table, log_of_probability_table))

Marginal Entropies

$$ H[x] = - {\langle \log p(x) \rangle}_{p(x)}$$
$$ H[y] = - {\langle \log p(y) \rangle}_{p(y)}$$

In [8]:
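def marginal_entropy(joint_probability, axis):
    '''
    Finds the entropy of a marginal distribution: axis = 1 gives H[x], axis = 0 gives H[y]
    '''
    marginal_prob = marginal_probability(joint_probability, axis)
    log_of_marginal_prob = log(marginal_prob)
    # H = -sum p log p; a single minus sign keeps the entropy non-negative
    return -np.sum(np.multiply(marginal_prob, log_of_marginal_prob))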

Conditional Entropies


$$ H[y|x] = - {\langle \log p(y|x) \rangle}_{p(x,y)}$$
$$ H[x|y] = - {\langle \log p(x|y) \rangle}_{p(x,y)}$$
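
Because $p(y|x) = p(x,y)/p(x)$, the logarithm splits into a joint term and a marginal term, which gives the chain rule used in the checks at the end of this section:

$$ H[y|x] = - {\langle \log p(x,y) \rangle}_{p(x,y)} + {\langle \log p(x) \rangle}_{p(x,y)} = H[x,y] - H[x] $$

The second average depends only on $x$, so averaging over $p(x,y)$ is the same as averaging over $p(x)$. The same argument with the roles of $x$ and $y$ swapped gives $H[x|y] = H[x,y] - H[y]$.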

In [9]:
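def conditional_entropy(joint_probability, given_variable):
    '''
    Finds the conditional entropy; given_variable is the axis passed to conditional_probability
    given_variable = 1 conditions on x and gives H[y|x], given_variable = 0 conditions on y and gives H[x|y]
    '''
    cond_prob = conditional_probability(joint_probability, given_variable)
    log_cond_prob = log(cond_prob)
    # The average is taken over the joint distribution, with a single minus sign
    return -np.sum(np.multiply(joint_probability, log_cond_prob))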

Mutual Information

$$ I(x,y) = H[x] - H[x|y] = \mathrm{KL}(p(x,y) \,\|\, p(x)p(y)) $$
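
The two expressions for the mutual information can be compared numerically. The sketch below assumes the joint table used later in this section, computes the KL divergence between $p(x,y)$ and the product of marginals, and compares it with the entropy form written as $H[x] + H[y] - H[x,y]$, which equals $H[x] - H[x|y]$. The helper `entropy_of` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Joint table p(x, y): rows index x, columns index y.
p_xy = np.array([[0.3, 0.3, 0.0],
                 [0.1, 0.2, 0.1]])

# Marginals kept as 2-D arrays so that their product broadcasts to the table shape.
p_x = p_xy.sum(axis=1, keepdims=True)  # shape (2, 1)
p_y = p_xy.sum(axis=0, keepdims=True)  # shape (1, 3)
product_of_marginals = p_x * p_y       # p(x) p(y), shape (2, 3)

# KL(p(x,y) || p(x)p(y)); cells with p(x, y) = 0 contribute nothing.
mask = p_xy > 0
kl = float(np.sum(p_xy[mask] * np.log(p_xy[mask] / product_of_marginals[mask])))

def entropy_of(p):
    """Entropy in nats, skipping zero-probability entries (hypothetical helper)."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# H[x] - H[x|y] rewritten as H[x] + H[y] - H[x, y].
mi_from_entropies = entropy_of(p_x) + entropy_of(p_y) - entropy_of(p_xy)

print(kl, mi_from_entropies)
```

The two printed values should agree up to floating-point error. The KL form also shows directly that the mutual information is non-negative and vanishes only when $p(x,y) = p(x)p(y)$.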


In [10]:
def mutual_information(joint_probability):
    # computes H[Y] - H[Y|X], which is equal to H[X] - H[X|Y]
    return marginal_entropy(joint_probability, 0) - conditional_entropy(joint_probability, 1)   

Calculations for given joint probability table:

| p(x,y) 	| y = -1 	| y = 0 	| y = 5 	|
|--------	|--------	|-------	|-------	|
| x = 1  	| 0.3    	| 0.3   	| 0     	|
| x = 2  	| 0.1    	| 0.2   	| 0.1   	|

In [11]:
import numpy as np
X = [1, 2]
Y = [-1, 0, 5]
joint_probability = np.matrix([[0.3, 0.3, 0], [0.1, 0.2 , 0.1]])

In [12]:
print('EXPECTATIONS')
print('========================')
print('Expectation of x:')
print('{:.2f}'.format(expectation_of_variable(joint_probability, X, 1)))
print('Expectation of y:')
print('{:.2f}'.format(expectation_of_variable(joint_probability, Y, 0)))

# <x|y> = sum_x x p(x|y) is a function of y: one value per column of p(x|y)
print('Expectation of x given y:')
x_given_y = np.asarray(np.dot(X, conditional_probability(joint_probability, 0))).ravel()
for y, value in zip(Y, x_given_y):
    print('y = {}: {:.2f}'.format(y, value))
# <y|x> = sum_y y p(y|x) is a function of x: one value per row of p(y|x)
print('Expectation of y given x:')
y_given_x = np.asarray(np.dot(conditional_probability(joint_probability, 1), Y)).ravel()
for x, value in zip(X, y_given_x):
    print('x = {}: {:.2f}'.format(x, value))

print('Covariance of x, y:')
print("{:.2f}".format(covariance(joint_probability, X, Y)))

EXPECTATIONS
========================
Expectation of x:
1.40
Expectation of y:
0.10
Expectation of x given y:
y = -1: 1.25
y = 0: 1.40
y = 5: 2.00
Expectation of y given x:
x = 1: -0.50
x = 2: 1.00
Covariance of x, y:
0.36


In [13]:
print('ENTROPIES')
print('=====================')
print('Joint entropy H(X,Y):')
joint_entropy = entropy(joint_probability)
print('{:.2f}'.format(joint_entropy))
print('Marginal Entropies')
h_x = marginal_entropy(joint_probability, 1)  
h_y = marginal_entropy(joint_probability, 0)      
print('H[x]: ' + '{:.2f}'.format(h_x))
print('H[y]: ' + '{:.2f}'.format(h_y))

print('Conditional Entropies')
h_x_given_y = conditional_entropy(joint_probability, 0)  
h_y_given_x = conditional_entropy(joint_probability, 1)      
print('H[x|y]: ' + '{:.2f}'.format(h_x_given_y))
print('H[y|x]: ' + '{:.2f}'.format(h_y_given_x))

print('Mutual Information I(X;Y):')
mutual_info = mutual_information(joint_probability)      
print('{:.2f}'.format(mutual_info)) 


ENTROPIES
=====================
Joint entropy H(X,Y):
1.50
Marginal Entropies
H[x]: 0.67
H[y]: 0.94
Conditional Entropies
H[x|y]: 0.56
H[y|x]: 0.83
Mutual Information I(X;Y):
0.11


We can verify the following identities, which are illustrated in the figure below.

$$ H(X,Y) = H(X) + H(Y|X) = H(X|Y) + H(Y) $$
$$ I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) $$ 
$$ H(X,Y) = I(X;Y) + H(X|Y) + H(Y|X) $$


![Entropy](Entropy.png "Entropy")


In [14]:
# Compare with a floating-point tolerance instead of rounded strings
assert np.isclose(joint_entropy, h_x + h_y_given_x)
assert np.isclose(joint_entropy, h_y + h_x_given_y)
assert np.isclose(mutual_info, h_x - h_x_given_y)
assert np.isclose(mutual_info, h_y - h_y_given_x)
assert np.isclose(joint_entropy, mutual_info + h_x_given_y + h_y_given_x)