###  Created by Luis Alejandro (alejand@umich.edu)

## Mutual Information
In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons, commonly called bits) obtained about one random variable through observing the other random variable.

The mutual information of two jointly discrete random variables $X$ and $Y$ is calculated as a double sum:

\begin{eqnarray} 
  I(X;Y) = \sum_{y \in Y}\sum_{x \in X} p_{(X,Y)}(x,y) \log \left( \frac{p_{(X,Y)}(x,y)}{p_X(x)p_Y(y)} \right)
\tag{1}\end{eqnarray}

where $ p_{(X,Y)} $ is the joint probability mass function of $X$ and $Y$, and $p_{X}$ and $p_Y$ are the marginal probability mass functions of $X$ and $Y$ respectively.
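As a minimal NumPy sketch of Eq. (1), assume the joint PMF is already available as a 2-D array `p_xy`, with rows indexing the values of $X$ and columns indexing the values of $Y$. The function name `mutual_information` and its `base` parameter are illustrative and are not part of the notebook's `utils` package. With `base=2`, the result is in bits (shannons).

```python
import numpy as np

def mutual_information(p_xy, base=2.0):
    """Evaluate Eq. (1) for a joint PMF given as p_xy[i, j] = P(X = x_i, Y = y_j)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p_X, shape (n_x, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p_Y, shape (1, n_y)
    nz = p_xy > 0                          # terms with p(x, y) = 0 contribute 0 (0 log 0 = 0)
    ratio = p_xy[nz] / (p_x * p_y)[nz]
    return float(np.sum(p_xy[nz] * np.log(ratio)) / np.log(base))

# X and Y independent: no information is shared
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Y is an exact copy of a fair coin X: one full bit is shared
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```

Only nonzero joint probabilities are summed. This applies the usual convention $0 \log 0 = 0$ without producing NaNs.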

In [1]:
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from scipy.sparse import csc_matrix  # public import path; scipy.sparse.csc is deprecated
import scipy.io as sio
import time

import sys
sys.path.append('../')
from utils.feature_selection.mutual import MutualInfo
from utils.feature_selection.reports import report_feature_ranking

In [2]:
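# Loads and preprocesses the dataset
dataset = sio.loadmat('../../datasets/classification/emails.mat')
X = dataset['X']  # sparse matrix: one row per email, one column per vocabulary word
X[X > 0] = 1      # binarize: 1 if the word occurs in the email, 0 otherwise
y = dataset['Y'].flatten()  # class labels
vocab = [dataset['vocab'][0][i][0] for i in range(X.shape[1])]  # feature (word) names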
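The line `X[X > 0] = 1` turns word counts into presence/absence indicators. Word counts are non-negative, so the same result can be obtained by overwriting the stored nonzero values of the sparse matrix directly. This avoids building a boolean mask. The sketch below runs on a made-up 3 x 4 count matrix, and the names `counts` and `binary` are illustrative.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Toy term-count matrix: 3 documents (rows) x 4 words (columns)
counts = csc_matrix(np.array([[2, 0, 1, 0],
                              [0, 3, 0, 0],
                              [1, 0, 0, 5]]))

binary = counts.copy()   # keep the original counts untouched
binary.eliminate_zeros() # drop any explicitly stored zeros so they do not become 1
binary.data[:] = 1       # every stored (nonzero) count becomes 1
print(binary.toarray())
```

This shortcut relies on the counts being non-negative. With negative entries, `X > 0` and "nonzero" would no longer coincide.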

In [3]:
# Computes MI with custom implementation
start = time.perf_counter()
mi = MutualInfo(X, y, n_jobs=4)
mi.compute()
end = time.perf_counter()
print('Elapsed time:', end - start)  # seconds

Using parallel version
Elapsed time: 2.8455677
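The source of the `MutualInfo` class in the author's `utils` package is not shown here. For binary features and discrete labels, however, Eq. (1) reduces to counting how often each feature value co-occurs with each class. Those counts come from a single (sparse) matrix product. The function below is a hypothetical sketch of that idea (`binary_feature_mi` is not the author's implementation). It uses base-2 logarithms, so results are in bits.

```python
import numpy as np
from scipy.sparse import csc_matrix

def binary_feature_mi(X, y, base=2.0):
    """MI between each 0/1 column of X (dense or scipy.sparse) and discrete labels y."""
    classes, y_idx = np.unique(y, return_inverse=True)
    n = y_idx.shape[0]
    Y = np.eye(len(classes))[y_idx]             # one-hot labels, shape (n, n_classes)
    n11 = np.asarray(X.T @ Y)                   # count(feature = 1, class = c), shape (n_features, n_classes)
    n_c = Y.sum(axis=0)                         # count(class = c)
    # Joint P(feature = v, class = c), shape (n_features, 2, n_classes); v = 0 then v = 1
    joint = np.stack([n_c - n11, n11], axis=1) / n
    p_f = joint.sum(axis=2, keepdims=True)      # marginal P(feature = v)
    p_c = joint.sum(axis=1, keepdims=True)      # marginal P(class = c)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(joint / (p_f * p_c))
    terms = np.where(joint > 0, terms, 0.0)     # 0 log 0 = 0
    return terms.sum(axis=(1, 2)) / np.log(base)

# Column 0 is the label flipped (fully dependent); column 1 is independent of y
X_toy = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y_toy = np.array([0, 0, 1, 1])
print(binary_feature_mi(csc_matrix(X_toy), y_toy))  # [1. 0.]
```

The cost is dominated by one sparse-times-dense product, so this approach scales to vocabularies like the 10,000 words ranked below without a Python loop over features.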


In [4]:
# Reports the feature ranking
report_feature_ranking(mi.info, vocab, 10)

Feature ranked 1 is (our) with value 0.189992
Feature ranked 2 is (click) with value 0.181527
Feature ranked 3 is (wrote) with value 0.160754
Feature ranked 4 is (your) with value 0.124104
Feature ranked 5 is (please) with value 0.123546
.
.
.

Feature ranked 9996 is (damaged) with value 0.000000
Feature ranked 9997 is (missiles) with value 0.000000
Feature ranked 9998 is (annoy) with value 0.000000
Feature ranked 9999 is (queen) with value 0.000000
Feature ranked 10000 is (violation) with value 0.000000
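The source of `report_feature_ranking` is not shown. Judging from the output above, a call with `10` prints the 5 highest- and 5 lowest-ranked features. The function below is a hypothetical sketch of that kind of report, built on `np.argsort`. The name `top_and_bottom` and the exact split are assumptions, not the author's helper.

```python
import numpy as np

def top_and_bottom(scores, names, k=5):
    """Return the k highest- and k lowest-ranked (rank, name, score) tuples."""
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # feature indices by descending score
    ranked = [(r + 1, names[i], float(scores[i])) for r, i in enumerate(order)]
    return ranked[:k], ranked[-k:]

top, bottom = top_and_bottom([0.1, 0.4, 0.0, 0.2], ["a", "b", "c", "d"], k=2)
for rank, name, value in top + bottom:
    print(f"Feature ranked {rank} is ({name}) with value {value:.6f}")
```

A stable sort keeps tied features (such as the many zero-MI words) in their original column order. This makes the bottom of the ranking reproducible.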


## scikit-learn implementation
Many other implementations of mutual information exist, and estimating it has been the subject of many research papers. For comparison, scikit-learn provides `mutual_info_classif`:

https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html
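According to the scikit-learn documentation, `discrete_features='auto'` treats every feature as discrete when `X` is sparse. The call below on the sparse email matrix therefore uses the exact, count-based estimator rather than the nearest-neighbor estimator used for continuous features. The documentation also states that the result is in nats (natural logarithm). The toy data below is made up for illustration and passes `discrete_features=True` explicitly, since the array is dense.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Column 0 is the label flipped (fully dependent); column 1 is independent of y
X_toy = np.array([[1, 0],
                  [1, 1],
                  [0, 0],
                  [0, 1]])
y_toy = np.array([0, 0, 1, 1])

# discrete_features=True: MI is computed from value counts, no k-NN estimation
mi_toy = mutual_info_classif(X_toy, y_toy, discrete_features=True)
print(mi_toy)  # approximately [ln 2, 0] = [0.6931, 0.0], in nats
```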

In [5]:
# Computes MI with the sklearn implementation
start = time.perf_counter()
mi = mutual_info_classif(X, y)
end = time.perf_counter()
print('Elapsed time:', end - start)  # seconds

Elapsed time: 8.333583299999999


In [6]:
# Reports the feature ranking
report_feature_ranking(mi, vocab, 10)

Feature ranked 1 is (our) with value 0.131692
Feature ranked 2 is (click) with value 0.125825
Feature ranked 3 is (wrote) with value 0.111426
Feature ranked 4 is (your) with value 0.086022
Feature ranked 5 is (please) with value 0.085636
.
.
.

Feature ranked 9996 is (sexuality) with value -0.000000
Feature ranked 9997 is (los) with value -0.000000
Feature ranked 9998 is (archbishop) with value -0.000000
Feature ranked 9999 is (mim) with value -0.000000
Feature ranked 10000 is (helped) with value -0.000000
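Both implementations rank the top words in the same order, but the sklearn values are smaller by a roughly constant factor of about 1.4427, which is $1/\ln 2$. scikit-learn reports mutual information in nats. The ratio suggests that the custom `MutualInfo` uses base-2 logarithms (bits). This is an inference from the printed numbers, not from the `MutualInfo` source. The check below uses only the top-5 values copied from the two outputs above.

```python
import numpy as np

# Top-5 values copied from the two rankings above
custom = np.array([0.189992, 0.181527, 0.160754, 0.124104, 0.123546])   # MutualInfo
sk_nats = np.array([0.131692, 0.125825, 0.111426, 0.086022, 0.085636])  # mutual_info_classif

sk_bits = sk_nats / np.log(2)  # nats -> bits: divide by ln 2
print(np.round(sk_bits, 6))
print(np.abs(sk_bits - custom).max())  # on the order of 1e-7
```

After conversion, the two sets of values agree to within the six-decimal rounding of the printed output. The remaining difference between the implementations is therefore mainly speed.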
