# Demonstration of agglomerative info-clustering and the Chow-Liu (CL) tree approximation

This notebook can be run with the xeus-cling C++11 Jupyter kernel.

## Setup

Specify the include path.

In [1]:
#pragma cling add_include_path("../include")

Load the header files for the info-clustering (IC) algorithms.

In [2]:
#include <IC/AIC>     // agglomerative info-clustering
#include <IC/ChowLiu> // Chow-Liu tree approximation

## Gaussian source

$\def\M#1{\boldsymbol{#1}}$Specify the covariance matrix $\M{S}=\M{A}\M{A}^{\intercal}$ for the random variables to be clustered. 

In [3]:
using namespace Eigen;

size_t k = 5;
double sigma = 1;
size_t n = 15;

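// Row i of A has a one in column i % k (shared component)
// and sigma in column k + i (independent noise component).
MatrixXd A = MatrixXd::Zero(n, n + k);
for (size_t i = 0; i < n; i++) {
    A(i, i % k) = 1;
    A(i, k + i) = sigma;
}
MatrixXd S = A * A.transpose();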
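
Because row $i$ of $\M{A}$ has a one in column $i \bmod k$ and $\sigma$ in column $k+i$, the entries of $\M{S}$ have the closed form $S_{ij} = [\,i \equiv j \pmod{k}\,] + \sigma^2\,[\,i = j\,]$. The sketch below rebuilds the matrix from that formula using only the standard library. The names `covarianceEntry` and `covarianceMatrix` are hypothetical helpers introduced for illustration; they are not part of Eigen or the IC library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Eigen or the IC library):
// closed-form entry (i, j) of S = A * A^T for the matrix A built above.
double covarianceEntry(std::size_t i, std::size_t j, std::size_t k, double sigma) {
    double shared = (i % k == j % k) ? 1.0 : 0.0;   // both rows use the same shared column
    double noise = (i == j) ? sigma * sigma : 0.0;  // private noise contributes only on the diagonal
    return shared + noise;
}

// Build the full n-by-n covariance matrix from the closed form.
std::vector<std::vector<double> > covarianceMatrix(std::size_t n, std::size_t k, double sigma) {
    std::vector<std::vector<double> > S(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            S[i][j] = covarianceEntry(i, j, k, sigma);
        }
    }
    return S;
}
```

With `n = 15`, `k = 5` and `sigma = 1`, `covarianceMatrix(15, 5, 1.0)` has 2 on the diagonal and 1 wherever two indices agree modulo 5, so the variables fall into five groups of three correlated variables.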

Print the matrices $\M{A}$ and $\M{S}$.

In [4]:
using namespace std;
cout << "A= \n" << A << endl;
cout << "S= \n" << S << endl;

A= 
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
S= 
2 0 0 0 0 1 0 0 0 0 1 0 0 0 0
0 2 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 2 0 0 0 0 1 0 0 0 0 1 0 0
0 0 0 2 0 0 0 0 1 0 0 0 0 1 0
0 0 0 0 2 0 0 0 0 1 0 0 0 0 1
1 0 0 0 0 2 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 2 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 2 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0 2 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 2 0 0 0 0 1
1 0 0 0 0 1 0 0 0 0 2 0 0 0 0
0 1 0 0 0 0 1 0 0 0 0 2 0 0 0
0 0 1 0 0 0 0 1 0 0 0 0 2 0 0
0 0 0 1 0 0 0 0 1 0 0 0 0 2 0
0 0 0 0 1 0 0 0 0 1 0 0 0 0 2

Construct the entropy function of the Gaussian vector with covariance matrix $\M{S}$.

In [5]:
#include <IC/gaussian>
using namespace IC;

GaussianEntropy gsf(S);
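
For reference, the differential entropy of a Gaussian subvector has a closed form. Assuming `GaussianEntropy` evaluates this quantity in nats for the index set passed to `gsf` (an assumption about the library, though it is consistent with the critical values printed in this notebook), and writing $Z_B$ for the variables indexed by a subset $B$ (notation introduced here), the value is

$$h(Z_B) = \frac{1}{2}\log\left((2\pi e)^{|B|}\,\det \M{S}_B\right),$$

where $\M{S}_B$ is the submatrix of $\M{S}$ with rows and columns in $B$. For a single variable of this source, $\M{S}_B = 2$ and $h = \frac{1}{2}\log(4\pi e)$.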

### Agglomerative info-clustering

In [6]:
{
    // Generate the exact info-clustering solution via AIC.
    cout << "Agglomerative info-clustering:" << endl;
    AIC psp(gsf);
    // Print the current partition before any merge.
    cout << psp.getPartition(-1) << endl;
    // Keep merging until agglomerate() returns false.
    while (psp.agglomerate(1E-8, 1E-10)) {
        cout << "agglomerates to " << psp.getPartition(-1)
             << " at critical value " << psp.getCriticalValues().back() << endl;
    }
    vector<double> psp_gamma = psp.getCriticalValues();
    cout << "critical values : " << psp_gamma << endl;
    for (double gamma : psp_gamma) {
        cout << "partition at threshold " << gamma << ":" << psp.getPartition(gamma) << endl;
    }
}

Agglomerative info-clustering:
[ [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] ]
agglomerates to [ [ 0 10 5 ] [ 1 11 6 ] [ 2 12 7 ] [ 3 13 8 ] [ 4 14 9 ] ] at critical value 0.173287
agglomerates to [ [ 0 10 5 1 2 3 4 11 6 12 7 13 8 14 9 ] ] at critical value 5.32907e-15
critical values : [ 0.173287 5.32907e-15 ]
partition at threshold 0.173287:[ [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] ]
partition at threshold 5.32907e-15:[ [ 0 10 5 ] [ 1 11 6 ] [ 2 12 7 ] [ 3 13 8 ] [ 4 14 9 ] ]
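
The first critical value can be checked by hand. The sketch below assumes that the critical value of a three-variable cluster equals its total correlation divided by the cluster size minus one (the multivariate mutual information evaluated on the partition into singletons), with entropies in nats. It uses only the standard library, and `determinant`, `gaussianEntropy` and `normalizedTotalCorrelation` are hypothetical helper names, not the IC library's API.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Determinant by Gaussian elimination with partial pivoting (works on a copy).
double determinant(Matrix m) {
    const std::size_t n = m.size();
    double det = 1.0;
    for (std::size_t c = 0; c < n; ++c) {
        std::size_t p = c;
        for (std::size_t r = c + 1; r < n; ++r) {
            if (std::fabs(m[r][c]) > std::fabs(m[p][c])) p = r;
        }
        if (m[p][c] == 0.0) return 0.0;  // singular matrix
        if (p != c) {
            std::swap(m[p], m[c]);
            det = -det;  // a row swap flips the sign
        }
        det *= m[c][c];
        for (std::size_t r = c + 1; r < n; ++r) {
            const double f = m[r][c] / m[c][c];
            for (std::size_t j = c; j < n; ++j) m[r][j] -= f * m[c][j];
        }
    }
    return det;
}

// Differential entropy (in nats) of a Gaussian vector with covariance cov.
double gaussianEntropy(const Matrix& cov) {
    const double twoPiE = 2.0 * std::acos(-1.0) * std::exp(1.0);
    return 0.5 * (static_cast<double>(cov.size()) * std::log(twoPiE)
                  + std::log(determinant(cov)));
}

// (sum_i h(Z_i) - h(Z_1, ..., Z_m)) / (m - 1); requires m >= 2.
double normalizedTotalCorrelation(const Matrix& cov) {
    const std::size_t m = cov.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        sum += gaussianEntropy(Matrix(1, std::vector<double>(1, cov[i][i])));
    }
    return (sum - gaussianEntropy(cov)) / static_cast<double>(m - 1);
}
```

For one cluster of this source, the covariance has 2 on the diagonal and 1 elsewhere, so its determinant is 4 and the result is $\frac{1}{2}\left(\frac{3}{2}\log 2 - \frac{1}{2}\log 4\right) = \frac{1}{4}\log 2 \approx 0.173287$, the first critical value printed above. The second critical value, 5.32907e-15, is zero up to rounding error: variables in different clusters are uncorrelated and therefore independent.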


### Chow-Liu tree approximation

In [7]:
{
    // Generate an approximate solution via the Chow-Liu (CL) tree.
    cout << "Info-clustering by CL tree approximation:" << endl;
    vector<size_t> first_node, second_node;
    vector<double> gamma;
    // Weight each pair (i, j) by the mutual information
    // I(i; j) = h(i) + h(j) - h(i, j).
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            first_node.push_back(i);
            second_node.push_back(j);
            double I = gsf(vector<size_t> {i}) + gsf(vector<size_t> {j}) - gsf(vector<size_t> {i, j});
            gamma.push_back(I);
        }
    }
    CL cl(n, first_node, second_node, gamma);
    vector<double> cl_gamma = cl.getCriticalValues();
    cout << "critical values : " << cl_gamma << endl;
    // Use a distinct name so the loop variable does not shadow the weight vector gamma.
    for (double threshold : cl_gamma) {
        cout << "partition at threshold " << threshold << ":" << cl.getPartition(threshold) << endl;
    }
}

Info-clustering by CL tree approximation:
critical values : [ 0.143841 0 ]
partition at threshold 0.143841:[ [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ] [ 7 ] [ 8 ] [ 9 ] [ 10 ] [ 11 ] [ 12 ] [ 13 ] [ 14 ] ]
partition at threshold 0:[ [ 0 10 5 ] [ 1 11 6 ] [ 2 7 12 ] [ 3 8 13 ] [ 4 9 14 ] ]
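
The CL result can also be reproduced from pairwise quantities alone. The sketch below assumes that the partition at threshold $\gamma$ groups variables that are connected through pairs whose mutual information is strictly above $\gamma$; this gives the same connected components as thresholding a maximum spanning tree. It is an illustrative stand-in written with the standard library, not the implementation of the `CL` class, and `gaussianPairMI` and `clusterLabels` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Mutual information (in nats) between two jointly Gaussian scalars
// with variances vi, vj and covariance cij: -1/2 * log(1 - rho^2).
double gaussianPairMI(double vi, double vj, double cij) {
    const double rho2 = (cij * cij) / (vi * vj);  // squared correlation coefficient
    return -0.5 * std::log(1.0 - rho2);
}

// Label each variable with the root of its cluster at threshold gamma,
// merging every pair whose mutual information exceeds gamma (union-find).
std::vector<std::size_t> clusterLabels(const Matrix& S, double gamma) {
    const std::size_t n = S.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t(0));
    auto find = [&parent](std::size_t x) -> std::size_t {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            if (gaussianPairMI(S[i][i], S[j][j], S[i][j]) > gamma) {
                const std::size_t ri = find(i);
                const std::size_t rj = find(j);
                parent[ri] = rj;
            }
        }
    }
    std::vector<std::size_t> label(n);
    for (std::size_t i = 0; i < n; ++i) label[i] = find(i);
    return label;
}
```

For two variables in the same cluster of this source, the variances are 2 and the covariance is 1, so $\rho^2 = 1/4$ and the mutual information is $-\frac{1}{2}\log\frac{3}{4} \approx 0.143841$, the first CL critical value. This differs from the exact AIC critical value 0.173287, although the two methods return the same partitions for this source.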
