# Blahut-Arimoto algorithm to compute the channel capacity for a given input-output response.

(c) 2018 Manuel Razo. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT). 

---

In [10]:
# Our numerical workhorses
import numpy as np
import pandas as pd

import itertools
# Import libraries to parallelize processes
from joblib import Parallel, delayed

# Import matplotlib stuff for plotting
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib

# Seaborn, useful for graphics
import seaborn as sns

# Pickle is useful for saving outputs that are computationally expensive
# to obtain every time
import pickle

import os
import glob

# Import the utils for this project
import chann_cap_utils as chann_cap

chann_cap.set_plotting_style()

# Magic function to make matplotlib inline; other style specs must come AFTER
%matplotlib inline

# This enables SVG graphics inline (only use with static plots (non-Bokeh))
%config InlineBackend.figure_format = 'svg'

figdir = '../../fig/blahut_algorithm_channel_capacity/'
tmpdir = '../../tmp/'

## Input probability that maximizes the information transmitted through a channel $P(m \mid C)$

Given the symmetry of the mutual information, i.e. $I(m;C)=I(C;m)$, the problem of transmitting a message can be studied from two perspectives:
1. Given a **fixed input** $P(C)$, what is the input-output response $P(m \mid C)$ that minimizes the mutual information subject to some average distortion $D$?
2. Given a **fixed channel** $P(m \mid C)$, what is the input distribution $P(C)$ that maximizes the mutual information?

The first view is treated by rate-distortion theory. The second point is the problem of computing the so-called information capacity of a channel. This information capacity is defined as
\begin{equation}
    C \equiv \max_{P(C)} I(C;m),
\end{equation}
where the maximum is taken over the space of input probability distributions $P(C)$. This means that a probability distribution $P^*(C)$ achieves capacity if it maximizes the information that can be transmitted through a fixed channel.
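
To make the definition concrete, the sketch below evaluates $I(C;m)$ on a grid of input distributions for a toy channel and picks the best one. The binary symmetric channel, the flip probability of 0.1, and the helper name `mutual_information` are illustrative assumptions, not part of the repression circuit studied later.

```python
import numpy as np


def mutual_information(pC, QmC):
    """Mutual information I(C; m) in bits.

    pC  : input distribution, shape (n_inputs,)
    QmC : channel, shape (n_inputs, n_outputs); row i holds P(m | C_i)
    """
    pC = np.asarray(pC, dtype=float)
    QmC = np.asarray(QmC, dtype=float)
    # Output marginal P(m) = sum_C P(C) P(m | C)
    pm = pC @ QmC
    # Joint distribution P(C, m) and the product of marginals P(C) P(m)
    joint = pC[:, None] * QmC
    prod = pC[:, None] * pm[None, :]
    # Only keep terms with non-zero joint probability (0 log 0 = 0)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / prod[mask]))


# Hypothetical toy channel: binary symmetric channel, flip probability 0.1
flip = 0.1
Q_bsc = np.array([[1 - flip, flip],
                  [flip, 1 - flip]])

# Scan input distributions of the form P(C) = (a, 1 - a)
a_grid = np.linspace(0.01, 0.99, 99)
info_grid = np.array([mutual_information([a, 1 - a], Q_bsc) for a in a_grid])

a_best = a_grid[np.argmax(info_grid)]
print('best P(C=0): {:.2f}'.format(a_best))
print('max I(C;m): {:.4f} bits'.format(info_grid.max()))
```

A brute-force scan like this only works for a handful of inputs; with many input concentrations the space of $P(C)$ is too large to grid, which is why an iterative algorithm is needed.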

This second view brings interesting points for our theory. On the one hand we are trying to evolve/design a circuit which could respond given a distribution of inputs. For this case rate-distortion theory tells us the minimum amount of information the channel should transmit from the input to maintain an average growth rate. On the other hand we have the full theoretical input-output function response for a given set of parameters. In this case it would be interesting to find which distribution of inputs would maximize the mutual information, and what would that maximum mutual information value be.

Luckily for us, in his elegant paper Blahut not only developed the rate-distortion algorithm we implemented previously, but also presented a simple iterative algorithm that approximates the distribution of inputs that achieves capacity.

In this script we will implement this algorithm and compute the maximum mutual information one can transmit through a simple repression circuit!
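
For reference, the iteration implemented in the code below can be summarized as follows. This is a sketch in the notation of the code comments, with logarithms in nats. Starting from a uniform $P(C)$, each cycle first computes for every input
\begin{equation}
    c_C = \exp \left[ \sum_m P(m \mid C) \log \frac{P(m \mid C)}{\sum_{C'} P(C') P(m \mid C')} \right],
\end{equation}
then the lower and upper bounds on the capacity
\begin{equation}
    I_L = \log \sum_C P(C) \, c_C, \qquad I_U = \log \max_C c_C,
\end{equation}
and finally updates the input distribution as
\begin{equation}
    P(C) \leftarrow \frac{P(C) \, c_C}{\sum_{C'} P(C') \, c_{C'}}.
\end{equation}
The capacity lies between $I_L$ and $I_U$, so the iteration stops once $I_U - I_L < \epsilon$.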

## Testing the implementation of the algorithm.

Before going all in with the implementation of the algorithm, it is worth testing it with a simple example.

We will test the regular Blahut-Arimoto (BA) algorithm as well as the accelerated BA algorithm from Matz and Duhamel 2005.

Let's implement the BA first.

In [7]:
def channel_capacity(QmC, epsilon=1E-3, info=1E4):
    '''
    Performs the Blahut-Arimoto algorithm to compute the channel capacity
    given a channel QmC.

    Parameters
    ----------
    QmC : 2D array. Shape = (number of inputs C, number of outputs m).
        definition of the channel. Row i contains P(m | C_i), so each row
        must add up to one.
    epsilon : float.
        error tolerance (in nats) on the gap Iu - Il between the upper and
        lower bounds on the capacity. The smaller epsilon is the more precise
        the channel capacity estimate is, but also the larger the number of
        iterations the algorithm must perform.
    info : int or float.
        Number indicating every how many cycles to print the cycle number as
        a visual output of the algorithm.
    Returns
    -------
    Il : float.
        channel capacity in bits, i.e. the maximum information that can be
        transmitted given the input-output function.
    pC : array-like.
        array containing the discrete probability distribution for the input
        that achieves the channel capacity.
    loop_count : int.
        number of iterations performed before reaching the tolerance.
    '''
    # initialize the probability for the input.
    pC = np.repeat(1 / QmC.shape[0], QmC.shape[0])
        
    # Initialize variable that will serve as termination criteria
    Iu_Il = 1
    
    loop_count = 0
    # Perform a while loop until the stopping criteria is reached
    while Iu_Il > epsilon:
        if (loop_count % info == 0) & (loop_count != 0):
            print('loop : {0:d}, Iu - Il : {1:f}'.format(loop_count, Iu_Il))
        loop_count += 1
        # compute the relevant quantities. check the notes on the algorithm
        # for the interpretation of these quantities
        # cC = exp(∑_m Qm|C log(Qm|C / ∑_c pC Qm|C))
        sum_C_pC_QmC = np.sum((pC * QmC.T).T, axis=0)
        QmC_log_QmC_sum_C_pC_QmC = QmC * np.log(QmC / sum_C_pC_QmC)
        # check for values that go to -inf because of 0xlog0
        QmC_log_QmC_sum_C_pC_QmC[np.isnan(QmC_log_QmC_sum_C_pC_QmC)] = 0
        QmC_log_QmC_sum_C_pC_QmC[np.isneginf(QmC_log_QmC_sum_C_pC_QmC)] = 0
        cC = np.exp(np.sum(QmC_log_QmC_sum_C_pC_QmC, axis=1))
       
        # I_L = log(∑_C pC cC)
        Il = np.log(np.sum(pC * cC))
        
        # I_U = log(max_C cC)
        Iu = np.log(cC.max())
        
        # pC = pC * cC / ∑_C pC * cC
        pC = pC * cC / np.sum(pC * cC)
        
        Iu_Il = Iu - Il
        
    # convert from nats to bits
    Il = Il / np.log(2)
    return Il, pC, loop_count

Let's test the algorithm with the exercise that Matz proposes in his paper.

In [8]:
QmC = np.array([[0.7, 0.1], [0.2, 0.2], [0.1, 0.7]])

Il, pC, loop_count = channel_capacity(QmC.T, 1E-10)

print('Regular BA algorithm')
print('Mutual info:', Il)
print('Input distribution:', pC)
print('Iterations:', loop_count)

Regular BA algorithm
Mutual info: 0.36514844544
Input distribution: [ 0.5  0.5]
Iterations: 1
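
The single iteration is expected for this channel. The two inputs have mirrored output distributions, so under the uniform starting distribution both inputs are equally far from the output marginal, the bounds $I_U$ and $I_L$ coincide, and the algorithm stops right away. The short check below illustrates this by computing the Kullback-Leibler divergence of each row from the output marginal; the variable names are chosen for this sketch only.

```python
import numpy as np

# Channel from the test above, arranged as rows = inputs C, columns = outputs m
QmC = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.2, 0.7]])

# Uniform input distribution, the starting point of the algorithm
pC = np.array([0.5, 0.5])

# Output marginal P(m) = sum_C P(C) P(m | C)
pm = pC @ QmC

# Kullback-Leibler divergence D(P(m | C) || P(m)) for each input, in bits
D = np.sum(QmC * np.log2(QmC / pm), axis=1)

print('output marginal:', pm)
print('divergence per input (bits):', D)
```

Both divergences are equal, and their common value is the mutual information printed above.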


Now let's test the channel that Arimoto proposes in his paper.

In [9]:
QmC = np.array([[0.6, 0.7, 0.5], [0.3, 0.1, 0.05], [0.1, 0.2, 0.45]])

Il, pC, loop_count = channel_capacity(QmC.T, 1E-10)

print('Regular BA algorithm')
print('Mutual info:', Il)
print('Input distribution:', pC)
print('Iterations:', loop_count)

Regular BA algorithm
Mutual info: 0.161631860824
Input distribution: [  5.01735450e-01   1.41574430e-09   4.98264548e-01]
Iterations: 305


The algorithm converges to the value reported in the paper, so we know that this implementation is working.
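
As an additional check that does not rely on published numbers, one can compare the iteration against a channel with a closed-form capacity. The sketch below is a compact, self-contained re-implementation written only for this purpose (the name `blahut_arimoto` and the `max_iter` safeguard are assumptions, not part of the function above). It uses a Z-channel, in which input 0 is always read correctly and input 1 is flipped with probability `flip`, whose capacity is $\log_2\left(1 + (1 - f) f^{f/(1-f)}\right)$ for flip probability $f$. This channel has a zero entry and a non-uniform optimal input, so it exercises both the $0 \log 0$ handling and the update step.

```python
import numpy as np


def blahut_arimoto(Q, tol=1e-10, max_iter=10000):
    """Compact Blahut-Arimoto sketch.

    Q : channel, shape (n_inputs, n_outputs); row i holds P(m | C_i).
    Returns the lower bound on the capacity (bits) and the input distribution.
    """
    Q = np.asarray(Q, dtype=float)
    p = np.full(Q.shape[0], 1 / Q.shape[0])
    mask = Q > 0
    for _ in range(max_iter):
        # Output marginal for the current input distribution
        q = p @ Q
        # log(Q / q) only where Q > 0; zero entries contribute 0 (0 log 0 = 0)
        log_ratio = np.zeros_like(Q)
        log_ratio[mask] = np.log(Q[mask] / np.broadcast_to(q, Q.shape)[mask])
        c = np.exp(np.sum(Q * log_ratio, axis=1))
        # Lower and upper bounds on the capacity (nats)
        I_low = np.log(np.sum(p * c))
        I_up = np.log(np.max(c))
        # Update the input distribution
        p = p * c / np.sum(p * c)
        if I_up - I_low < tol:
            break
    return I_low / np.log(2), p


# Z-channel with flip probability 0.5 (illustrative value)
flip = 0.5
Q_z = np.array([[1.0, 0.0],
                [flip, 1 - flip]])

cap, p_opt = blahut_arimoto(Q_z)
cap_exact = np.log2(1 + (1 - flip) * flip ** (flip / (1 - flip)))

print('Blahut-Arimoto: {:.6f} bits'.format(cap))
print('closed form:    {:.6f} bits'.format(cap_exact))
print('input distribution:', p_opt)
```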


## Computing theoretical channel capacity for a simple repression regulatory circuit.

Given that the steady-state mRNA and protein distributions $P(m, p \mid C)$ were computed using the maximum entropy approximation, we can take the Lagrange multipliers associated with each inducer concentration $C$ and build the input-output matrix $\mathbf{Q}_{g|c}$.

Let's read the data frame containing all these Lagrange multipliers.

In [13]:
df_maxEnt = pd.read_csv('../../data/csv_maxEnt_dist/' +
                        'MaxEnt_ss_Lagrange_multipliers.csv', index_col=0)
df_maxEnt.head()

Unnamed: 0,operator,binding_energy,repressor,inducer_uM,lambda_m1p0,lambda_m2p0,lambda_m3p0,lambda_m0p1,lambda_m0p2,lambda_m0p3,lambda_m1p1,lambda_m2p1,lambda_m1p2
0,O1,-15.3,1.0,0.0,-0.299015,-0.006073,0.000154,0.003058,-2.600041e-07,5.081917e-12,5.1e-05,-7.334357e-07,-4.339613e-10
1,O1,-15.3,1.0,0.1,-0.298284,-0.006106,0.000154,0.003059,-2.596131e-07,5.063921e-12,5.1e-05,-7.320921e-07,-4.315312e-10
2,O1,-15.3,1.0,1.0,-0.286882,-0.006613,0.00016,0.003074,-2.536017e-07,4.788598e-12,5e-05,-7.118746e-07,-3.943049e-10
3,O1,-15.3,1.0,5.0,-0.198932,-0.010932,0.000214,0.0033,-2.196924e-07,3.165593e-12,4.4e-05,-6.07658e-07,-1.596795e-10
4,O1,-15.3,1.0,7.5,-0.155355,-0.013584,0.000252,0.003574,-2.17261e-07,2.801419e-12,4.3e-05,-5.806576e-07,-9.309249e-11


Let's now define a function that takes a data frame for a particular repressor copy number and operator with all inducer concentrations, and computes each corresponding distribution $P(m, p \mid C)$ to then marginalize to obtain either $P(m \mid C) = \sum_p P(m, p \mid C)$, or $P(p \mid C) = \sum_m P(m, p \mid C)$.
Then it stacks all these marginal distributions to build the input-output matrix $\mathbf{Q}_{g|c}$.
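
The steps just described can be sketched on a toy example. Everything here is hypothetical: `toy_maxent_joint` is a stand-in for `chann_cap.maxEnt_from_lagrange`, the multipliers are made-up numbers, and the form $P(m, p) \propto \exp\left(-\sum_i \lambda_i m^{x_i} p^{y_i}\right)$ (in particular its sign convention and the axis order of the returned array) is an assumption that may differ from the project utilities.

```python
import numpy as np

# Column names following the pattern used in the data frame above
lagrange_cols = ['lambda_m1p0', 'lambda_m2p0', 'lambda_m0p1', 'lambda_m0p2']

# Parse the exponents (x, y) of each moment <m^x p^y> from the column name
exponents = [[int(ch) for ch in col if ch.isdigit()] for col in lagrange_cols]


def toy_maxent_joint(m_space, p_space, lagrange, exponents):
    """Hypothetical stand-in for the project's MaxEnt helper.

    Assumes P(m, p) ∝ exp(-sum_i lambda_i * m**x_i * p**y_i). Returns an
    array of shape (len(m_space), len(p_space)).
    """
    mm, pp = np.meshgrid(m_space, p_space, indexing='ij')
    log_P = np.zeros(mm.shape)
    for lam, (x, y) in zip(lagrange, exponents):
        log_P -= lam * mm**x * pp**y
    # Subtract the maximum before exponentiating to avoid overflow
    log_P -= log_P.max()
    P = np.exp(log_P)
    return P / P.sum()


m_space = np.arange(0, 20)
p_space = np.arange(0, 200)

# Made-up multipliers for three "concentrations" (illustrative values only)
lagrange_per_c = [[-0.5, 0.05, -0.10, 0.001],
                  [-0.8, 0.05, -0.15, 0.001],
                  [-1.0, 0.05, -0.20, 0.001]]

# One column per concentration, one row per copy number
Q_m = np.zeros([len(m_space), len(lagrange_per_c)])
Q_p = np.zeros([len(p_space), len(lagrange_per_c)])
for i, lam in enumerate(lagrange_per_c):
    Pmp = toy_maxent_joint(m_space, p_space, lam, exponents)
    Q_m[:, i] = Pmp.sum(axis=1)  # P(m | C): sum over p (axis 1 in this layout)
    Q_p[:, i] = Pmp.sum(axis=0)  # P(p | C): sum over m (axis 0 in this layout)

print('exponents:', exponents)
print('column sums of Q_m:', Q_m.sum(axis=0))
```

Which axis to sum over depends on the layout of the joint array, so it is worth checking that every column of the resulting matrix adds up to one.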

In [27]:
def trans_matrix(df_lagrange, mRNA_space, protein_space, m_dist=True):
    '''
    Function that builds the transition matrix Qg|c for a series of
    concentrations c. It builds the matrix by using the tidy data-frames
    containing the list of Lagrange multipliers.
    
    Parameters
    ----------
    df_lagrange : pandas DataFrame.
        Data frame containing the Lagrange multipliers for a single strain,
        i.e. single operator and repressor copy number value.
    mRNA_space, protein_space : array-like.
        Arrays containing the sample space for the mRNA and the protein,
        respectively.
    m_dist : bool. Default = True.
        Boolean indicating if the mRNA input-output matrix should be returned.
        If False the protein matrix is returned.
    
    Returns
    -------
    Qg|c : 2D array.
        Input-output matrix in which each column represents a concentration
        and each row represents an mRNA or protein copy number. Transpose it
        before passing it to `channel_capacity`, which expects one row per
        input.
    '''
    # Extract unique concentrations
    c_array = df_lagrange['inducer_uM'].unique()
    
    # Extract the list of Lagrange multipliers
    lagrange_mult = [col for col in df_lagrange.columns if 'lambda' in col]
    # Extract the exponents corresponding to each Lagrange multiplier
    exponents = []
    for s in lagrange_mult:
        exponents.append([int(n) for n in list(s) if n.isdigit()])
    
    # Initialize input-output matrix
    if m_dist:
        Qgc = np.zeros([len(mRNA_space), len(c_array)])
    else:
        Qgc = np.zeros([len(protein_space), len(c_array)])
    
    # Group data frame by inducer concentration
    df_group = df_lagrange.groupby('inducer_uM')
    
    # Loop through each of the concentrations computing the distribution
    for i, (group, data) in enumerate(df_group):
        # Extract the Lagrange multiplier columns
        lagrange = data.loc[:, lagrange_mult].values[0]
        
        # Compute the distribution
        Pmp = chann_cap.maxEnt_from_lagrange(mRNA_space, protein_space, 
                                             lagrange,
                                             exponents=exponents)
        
        # Marginalize and add marginal distribution to Qg|c
        if m_dist:
            Qgc[:, i] = Pmp.sum(axis=0)
        else:
            Qgc[:, i] = Pmp.sum(axis=1)
            
    return Qgc

Having defined the function let's test it.

In [40]:
df_maxEnt.repressor.unique()

array([  1.00000000e+00,   2.00000000e+00,   3.00000000e+00,
         4.00000000e+00,   5.00000000e+00,   6.00000000e+00,
         8.00000000e+00,   9.00000000e+00,   1.00000000e+01,
         1.20000000e+01,   1.40000000e+01,   1.60000000e+01,
         1.90000000e+01,   2.20000000e+01,   2.60000000e+01,
         3.00000000e+01,   3.50000000e+01,   4.10000000e+01,
         4.80000000e+01,   5.60000000e+01,   6.60000000e+01,
         7.70000000e+01,   9.00000000e+01,   1.05000000e+02,
         1.23000000e+02,   1.43000000e+02,   1.67000000e+02,
         1.95000000e+02,   2.28000000e+02,   2.66000000e+02,
         3.11000000e+02,   3.63000000e+02,   4.24000000e+02,
         4.95000000e+02,   5.78000000e+02,   6.75000000e+02,
         7.89000000e+02,   9.21000000e+02,   1.07500000e+03,
         1.25600000e+03,   1.46700000e+03,   1.71300000e+03,
         2.00000000e+03])

In [42]:
# Extract sample data frame
df_lagrange = df_maxEnt[(df_maxEnt.operator == 'O1') &
                        (df_maxEnt.repressor == 4)]

# Define sample space
mRNA_space = np.arange(0, 40)
protein_space = np.arange(0, 2.3E4)

# Build input-output matrix Qg|c
QgC = trans_matrix(df_lagrange, mRNA_space, protein_space, False)

Il, pC, loop_count = channel_capacity(QgC.T, 1E-3)

print('channel capacity: {:.2f} bits'.format(Il))

channel capacity: 1.13 bits


Having defined this function, let's compute the channel capacity for all the unique repressor-operator combinations in `df_maxEnt`.

In [None]:
# Group df_maxEnt by operator and repressor copy number
df_group = df_maxEnt.groupby(['operator', 'repressor'])

# Define column names for data frame
names = ['operator', 'binding_energy', 'repressor', 
         'channcap_mRNA', 'channcap_protein']

# Initialize data frame to save channel capacity computations
df_channcap = pd.DataFrame(columns=names)

compute_channcap = True
if compute_channcap:
    
    # Define function to compute in parallel the channel capacity
    def cc_parallel(df_lagrange):
        # Build mRNA transition matrix
        Qmc = trans_matrix(df_lagrange, mRNA_space, protein_space, True)

        # Build protein transition matrix
        Qpc = trans_matrix(df_lagrange, mRNA_space, protein_space, False)

        # Compute the channel capacity with the Blahut-Arimoto algorithm
        cc_m, _, _ = channel_capacity(Qmc.T, epsilon=1E-3)
        cc_p, _, _ = channel_capacity(Qpc.T, epsilon=1E-3)
        
        # Extract operator and repressor copy number
        op = df_lagrange.operator.unique()[0]
        eRA = df_lagrange.binding_energy.unique()[0]
        rep = df_lagrange.repressor.unique()[0]
        
        return [op, eRA, rep, cc_m, cc_p]
    # Run the function in parallel
    ccaps = Parallel(n_jobs=6)(delayed(cc_parallel)(df_lagrange)
                                for group, df_lagrange in df_group)
    
    # Convert to tidy data frame
    ccaps = pd.DataFrame(ccaps, columns=names)

    # Concatenate to data frame
    df_channcap = pd.concat([df_channcap, ccaps], axis=0)
    
    # Save results
    df_channcap.to_csv('../../data/csv_maxEnt_dist/MaxEnt_chann_cap.csv',
                       index=False)

Having computed the channel capacity let's look at the plot!

In [None]:
df_group = df_channcap.groupby('operator')

fig, ax = plt.subplots(1, 1)
for group, data in df_group:
    # Plot the protein channel capacity; use data.channcap_mRNA for the mRNA
    ax.plot(data.repressor, data.channcap_protein, label=group)

ax.set_xlabel('repressor copy number')
ax.set_ylabel('channel capacity (bits)')
ax.set_xscale('log')
ax.legend(loc=0, title='operator')