# COMP8755 Project Artefacts
## Ragib Zaman
## u6341578

This notebook (and several helper Python files) supplements the report submitted for COMP8755. References cited here are listed in the report.

In this notebook we implement 8 methods of performing Aitchison Simplex Regression. Aitchison Simplex Regression is the problem of finding a function which takes as input $m$ independent compositional variables and estimates a target compositional variable, where all of the compositional variables lie within the same Aitchison simplex $S^d.$

We generally maintain the following notation:

$n$ = number of samples, indexed by $i$

$d$ = number of parts in the compositions, indexed by $j$

$m$ = number of independent compositional variables, indexed by $k$

# Models for Aitchison Simplex Regression

We are interested in finding functions $f$ that take $m$ $d$-part compositions $u_1,\ldots, u_m \in S^d$ and return an estimate of a corresponding target $d$-part composition $v\in S^d.$

Wang et al. (2013, https://www.sciencedirect.com/science/article/pii/S0925231213005808) proposed the following model for this problem:

$$ f( u_1, \ldots, u_m) = \bigoplus_{k=1}^m \beta_k \otimes u_k$$

They provide a closed form solution for the parameters $\beta_k$ which minimize the sum of the squared Aitchison distances between the model predictions $f(\ldots)$ and the true labels $v,$ which is equivalent to minimizing the sum of the squared L2 distances between the ilr-transformed coordinates. Their model does not include a bias term, but instead they first centralize all of their data and apply a corresponding uncentering to produce their final predictions. 
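
The operations in this model can be made concrete with a minimal sketch, written with the standard library only. The helper names (`closure`, `perturb`, `power`, `clr`, `aitchison_distance`, `wang_model`) and the example values are illustrative assumptions, not the notebook's implementation, which relies on scikit-bio. The distance is computed in clr coordinates; the ilr transform is an isometry, so the Euclidean distance between ilr coordinates takes the same value.

```python
import math

def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [xi / total for xi in x]

def perturb(x, y):
    """Perturbation of x by y: componentwise product, then closure."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(beta, x):
    """Powering of x by the real number beta: componentwise power, then closure."""
    return closure([xi ** beta for xi in x])

def clr(x):
    """Centred log-ratio coordinates: logs minus their mean."""
    logs = [math.log(xi) for xi in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def wang_model(betas, us):
    """Evaluate the perturbation of all beta_k-powered u_k for a single sample."""
    out = [1.0] * len(us[0])  # neutral element of perturbation (before closure)
    for beta, u in zip(betas, us):
        out = perturb(out, power(beta, u))
    return out

# Illustrative values only
u1 = [0.2, 0.3, 0.5]
u2 = [0.6, 0.3, 0.1]
prediction = wang_model([0.7, -0.4], [u1, u2])
```

In clr coordinates the prediction is simply `0.7 * clr(u1) - 0.4 * clr(u2)`, which is why the fitting problem reduces to linear least squares after the transform.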

We considered 7 new algorithms for this problem. First, we consider models which include a bias term $\beta_0 \in S^d$ in addition to the real parameters $\beta_1,\ldots, \beta_m$:

$$f( u_1, \ldots, u_m) = \beta_0 \ \oplus \ \bigoplus_{k=1}^m \beta_k \otimes u_k$$
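
As a rough illustration of the biased model, the sketch below evaluates it row by row for $n \times d$ data matrices, working in clr coordinates where perturbation becomes addition and powering becomes scalar multiplication. The function names, the random data and the parameter values are assumptions made for this example; the notebook itself uses scikit-bio's transforms.

```python
import numpy as np

def closure(mat):
    """Normalise each row of a positive matrix to sum to 1."""
    mat = np.asarray(mat, dtype=float)
    return mat / mat.sum(axis=1, keepdims=True)

def clr(mat):
    """Row-wise centred log-ratio transform."""
    logs = np.log(mat)
    return logs - logs.mean(axis=1, keepdims=True)

def clr_inv(mat):
    """Inverse clr: exponentiate, then close."""
    return closure(np.exp(mat))

def biased_model(beta0, betas, U_vec):
    """Evaluate the biased model for every row, using clr coordinates."""
    clr_pred = clr(beta0[None, :]) + sum(b * clr(U) for b, U in zip(betas, U_vec))
    return clr_inv(clr_pred)

rng = np.random.default_rng(0)
U_vec = [closure(rng.random((5, 3)) + 0.1) for _ in range(2)]  # m = 2, n = 5, d = 3
beta0 = np.array([0.5, 0.3, 0.2])   # bias composition (illustrative)
betas = [0.8, -0.3]                 # real coefficients (illustrative)

V_hat = biased_model(beta0, betas, U_vec)
# The same prediction computed directly with products and powers on the simplex
V_direct = closure(beta0 * U_vec[0] ** betas[0] * U_vec[1] ** betas[1])
```

Both routes give the same compositions, which is the identity the evaluation functions later in the notebook rely on when they assemble predictions in clr space.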

Second, in addition to the L2 loss we also consider a KL-loss function motivated by arguments in a similar vein to Avalos-Fernandez et al. (2018). For example, in the unbiased case the loss function takes the form

$$ \ell = D_{\exp} \left( \left( \sum_k \beta_k X^{(k)} \right) W, \operatorname{clr}(V) \right) $$

$$ = D_{KL} \left( \tilde{V}, \exp\left( \left( \sum_k \beta_k X^{(k)} \right) W \right) \right) $$

$$ = ( \mathbf{1}_{n\times 1} )^T \exp\left( \left( \sum_k \beta_k X^{(k)} \right) W \right) \mathbf{1}_{d\times 1} - \operatorname{trace}\left( \tilde{V}^T \left( \sum_k \beta_k X^{(k)} \right) W \right) + \text{const}$$

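which is a convex function in the parameters $\beta_k.$ Here, matching the code in this notebook, $X^{(k)}$ is the $n \times (d-1)$ matrix of ilr coordinates of the $k$-th independent variable, $W$ is the $(d-1) \times d$ basis matrix for which $X^{(k)} W$ is the corresponding clr matrix, and $\tilde{V} = \exp(\operatorname{clr}(V)).$ Our report contains a computation of the gradient for this loss function, so we can find the desired parameters efficiently with off-the-shelf first-order optimisation procedures. In our implementation we have used SciPy's BFGS optimiser.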
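
One way to gain confidence in a hand-derived gradient of this loss is to compare it with central finite differences. The sketch below does this on random data, assuming the unbiased loss with the constant dropped; the data, sizes and helper names are illustrative assumptions rather than anything taken from the report.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 6, 3, 2

def clr(mat):
    """Row-wise centred log-ratio transform."""
    logs = np.log(mat)
    return logs - logs.mean(axis=1, keepdims=True)

def random_compositions(n, d):
    """Random positive rows closed to sum to 1 (illustrative data only)."""
    raw = rng.random((n, d)) + 0.1
    return raw / raw.sum(axis=1, keepdims=True)

C_vec = [clr(random_compositions(n, d)) for _ in range(m)]  # clr matrices of the inputs
V_tilde = np.exp(clr(random_compositions(n, d)))            # exp(clr(V))

def kl_loss(beta):
    """Sum of exp(estimator) minus trace(V_tilde^T estimator), constant dropped."""
    est = sum(b * C for b, C in zip(beta, C_vec))
    return np.sum(np.exp(est)) - np.sum(V_tilde * est)

def kl_loss_grad(beta):
    """Analytic gradient: d loss / d beta_k = sum(C_k * (exp(estimator) - V_tilde))."""
    est = sum(b * C for b, C in zip(beta, C_vec))
    return np.array([np.sum(C * (np.exp(est) - V_tilde)) for C in C_vec])

def numerical_grad(f, beta, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at beta."""
    grad = np.zeros_like(beta)
    for k in range(len(beta)):
        step = np.zeros_like(beta)
        step[k] = eps
        grad[k] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return grad

beta = np.array([0.4, -0.2])
analytic = kl_loss_grad(beta)
numeric = numerical_grad(kl_loss, beta)
```

Agreement between `analytic` and `numeric` to several digits suggests the gradient passed to the optimiser is consistent with the loss.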

Finally, we consider the option of applying or not applying the centering-uncentering procedure.
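
The centering-uncentering procedure can be sketched as follows: each sample is perturbed by the inverse of the closed geometric mean of the data, and predictions are later perturbed back by a stored centre. The helper names and the random data are assumptions for illustration; the notebook uses scikit-bio's `centralize` together with `scipy.stats.gmean`.

```python
import numpy as np

def closure(mat):
    """Normalise along the last axis so the parts sum to 1."""
    mat = np.asarray(mat, dtype=float)
    return mat / mat.sum(axis=-1, keepdims=True)

def geometric_mean(mat):
    """Closed column-wise geometric mean: the centre of a compositional sample."""
    return closure(np.exp(np.log(mat).mean(axis=0)))

def center(mat):
    """Perturb every row by the inverse of the sample centre."""
    return closure(mat / geometric_mean(mat))

def uncenter(mat, centre):
    """Perturb every row by a stored centre, undoing the centering."""
    return closure(mat * centre)

rng = np.random.default_rng(2)
V = closure(rng.random((8, 4)) + 0.1)   # illustrative n = 8, d = 4 data

V_centre = geometric_mean(V)
V_centered = center(V)
V_restored = uncenter(V_centered, V_centre)
```

After centering, the centre of the data is the uniform composition, so its clr coordinates have zero column means; this plays the role that mean subtraction plays in ordinary regression.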

The options of (no bias / bias), (L2 loss / KL loss) and (center-uncenter / no centering) form a total of 8 possibilities. Wang et al. (2013) studied the case where the first choice is taken for each of these 3 options.

To evaluate these models, we applied them to several datasets and measured their accuracy through 4 metrics. The Fisher-Rao and Symmetric KL metrics are considered to be more 'geometric' from the viewpoint of information geometry (Amari, 2016) and are defined between two points on the Aitchison simplex $S^d.$ Forgetting the compositional structure and simply regarding the compositions as points in the ambient space $\mathbb{R}^d,$ we can also compute the L2 and L1 metrics, which are less 'geometric'. A discussion of the results printed below is contained in the report.
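
For orientation, the sketch below computes the four quantities between a single pair of compositions using their textbook definitions. These exact forms are an assumption: the helper file `error_metrics` is not reproduced here, and it may normalise differently or aggregate over samples in its own way.

```python
import math

def fisher_rao(p, q):
    """Fisher-Rao distance on the simplex: 2 * arccos of the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding above 1

def symmetric_kl(p, q):
    """Symmetrised KL divergence: KL(p || q) + KL(q || p)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

def l2_distance(p, q):
    """Euclidean distance in the ambient space."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def l1_distance(p, q):
    """Sum of absolute differences in the ambient space."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Illustrative pair of 3-part compositions
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]
distances = {
    "Fisher-Rao": fisher_rao(p, q),
    "Symmetric KL": symmetric_kl(p, q),
    "L2": l2_distance(p, q),
    "L1": l1_distance(p, q),
}
```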


Please refer to the report for more background on the models, transformations, loss functions, gradient computations etc. used and/or implemented below.

In [1]:
# To run this notebook create a new conda environment with:
# conda create -n coda_env python=3.6 scipy numpy scikit-learn scikit-bio

import numpy as np
import scipy.stats
from sklearn.linear_model import LinearRegression
from skbio.stats.composition import centralize, closure, clr, clr_inv, ilr, ilr_inv
from skbio.stats.composition import _gram_schmidt_basis
from scipy.optimize import minimize

import compositional_datasets as cd
from error_metrics import *

Sets of $n$ samples of targets in $S^d$ are expressed as n x d matrices V. U_vec is a list of length m, where the k-th entry is the n x d data matrix of the k-th independent variable.
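
A small sketch of this layout may help. The helper `check_layout` and the toy values are hypothetical and are not part of the notebook or its helper files; the helper only verifies shapes and positivity.

```python
import numpy as np

def check_layout(V, U_vec):
    """Hypothetical helper: check that V and each U_vec[k] are positive n x d arrays."""
    V = np.asarray(V, dtype=float)
    if V.ndim != 2:
        raise ValueError("V must be a 2-D array of shape (n, d)")
    n, d = V.shape
    if np.any(V <= 0):
        raise ValueError("V contains non-positive parts")
    for k, U in enumerate(U_vec):
        U = np.asarray(U, dtype=float)
        if U.shape != (n, d):
            raise ValueError(f"U_vec[{k}] has shape {U.shape}, expected {(n, d)}")
        if np.any(U <= 0):
            raise ValueError(f"U_vec[{k}] contains non-positive parts")
    return n, d, len(U_vec)

# Toy data: n = 4 samples, d = 3 parts, m = 2 independent variables
V = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])
U_vec = [np.roll(V, 1, axis=1), np.roll(V, 2, axis=1)]

n, d, m = check_layout(V, U_vec)
```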

# Model coefficients
The following function computes Wang et al.'s (2013) closed form solution for the parameters $\beta_k$ which minimize the L2 error of their model.

In [2]:
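def L2_coeff(V, U_vec):
    """Closed-form L2 solution for beta (no bias term), following Wang et al. (2013)."""
    m = len(U_vec)
    Y = ilr(V)                        # ilr coordinates of the targets, n x (d-1)
    X_vec = [ilr(U) for U in U_vec]   # ilr coordinates of each independent variable
    # Normal equations A beta = b built from Frobenius inner products:
    # A[j, i] = <X_i, X_j> and b[k] = <X_k, Y>
    A = np.array([[np.trace(X_vec[i].T @ X_vec[j]) for i in range(m)] for j in range(m)])
    b = np.array([np.trace(X_vec[k].T @ Y) for k in range(m)])
    return np.linalg.solve(A, b)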
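
The trace expressions above are Frobenius inner products, so the same coefficients can be obtained by flattening each matrix into a vector and solving an ordinary least squares problem. The sketch below checks this numerically on random stand-ins for the ilr matrices; the sizes and data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 10, 4, 3
X_vec = [rng.normal(size=(n, d - 1)) for _ in range(m)]   # stand-ins for ilr(U_k)
Y = rng.normal(size=(n, d - 1))                           # stand-in for ilr(V)

# Normal equations built from trace inner products
A = np.array([[np.trace(X_vec[i].T @ X_vec[j]) for i in range(m)] for j in range(m)])
b = np.array([np.trace(X_vec[k].T @ Y) for k in range(m)])
beta_normal = np.linalg.solve(A, b)

# The same problem as ordinary least squares on flattened matrices
design = np.column_stack([X.ravel() for X in X_vec])      # shape (n * (d - 1), m)
beta_lstsq, *_ = np.linalg.lstsq(design, Y.ravel(), rcond=None)
```

This equivalence is what allows a generic linear regression routine on flattened ilr matrices to reproduce the closed form.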

The following two functions compute the parameters which minimize the KL-loss for an unbiased and biased model respectively.

In [3]:
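def KL_coeff(V, U_vec):
    """Minimise the KL loss over beta in R^m (no bias term) using BFGS."""
    m = len(U_vec)
    V_tilde = np.exp(clr(V))          # targets rescaled so each row has geometric mean 1
    C_vec = [clr(U) for U in U_vec]   # clr coordinates of each independent variable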
    
    def kl_loss(beta):
        # beta in R^m
        clr_estimator = sum(beta[k] * C_vec[k] for k in range(m))
        g = np.sum(np.exp(clr_estimator))
        h = sum(beta[k] * np.trace(V_tilde.T @ C_vec[k]) for k in range(m))
        return g - h
    
    def kl_loss_grad(beta):
        clr_estimator = sum(beta[k] * C_vec[k] for k in range(m))
        g_grad = np.array([np.sum(C_vec[k] * np.exp(clr_estimator)) for k in range(m)])
        h_grad = np.array([np.trace(V_tilde.T@C_vec[k]) for k in range(m)])
        return g_grad - h_grad

    return minimize(kl_loss, np.array([0.0]*m), method='BFGS', jac=kl_loss_grad, tol=1e-16,
                    options={'gtol': 1e-012, 'disp':False}).x

In [4]:
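def biasKL_coeff(V, U_vec):
    """Minimise the KL loss over (alpha, beta) using BFGS; alpha holds the ilr coordinates of the bias."""
    m = len(U_vec)
    V_tilde = np.exp(clr(V))          # targets rescaled so each row has geometric mean 1
    C_vec = [clr(U) for U in U_vec]   # clr coordinates of each independent variable
    n, d = V.shape
    W = _gram_schmidt_basis(d)        # (d-1) x d basis, so alpha @ W is the bias in clr coordinates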
    
    def bias_kl_loss(params):
        #params =(alpha, beta), length d-1+m
        # alpha in R^(d-1)
        # beta in R^m
        params = params.ravel() # Scipy's minimize method can return params as [[a,b,c]], so make it [a,b,c]
        alpha = params[:d-1].reshape((1,d-1))
        beta = params[d-1:]
        bias_term = np.ones((n,1)) @ alpha @ W
        clr_estimator = bias_term + sum(beta[k] * C_vec[k] for k in range(m))
        g = np.sum(np.exp(clr_estimator))
        # .item() turns the 1 x 1 matrix product into a scalar so the loss itself is a scalar
        h = (alpha @ W @ V_tilde.T @ np.ones((n,1))).item() + sum(beta[k] * np.trace(V_tilde.T @ C_vec[k]) for k in range(m))
        return g - h
    
    def bias_kl_loss_grad(params):
        params = params.ravel() # Scipy's minimize method can return params as [[a,b,c]], so make it [a,b,c] 
        alpha = params[:d-1].reshape((1,d-1))
        beta = params[d-1:]
        bias_term = np.ones((n,1)) @ alpha @ W
        clr_estimator = bias_term + sum(beta[k] * C_vec[k] for k in range(m))
        
        g_beta_grad = np.array([np.sum(C_vec[k] * np.exp(clr_estimator)) for k in range(m)])
        h_beta_grad = np.array([np.trace(V_tilde.T@C_vec[k]) for k in range(m)])
        
        g_alpha_grad = np.array([np.sum((np.ones((n,1)) @ W[p].reshape((1,d))) * np.exp(clr_estimator)) for p in range(d-1)])
        h_alpha_grad = (W @ V_tilde.T @ np.ones((n,1))).T.reshape((d-1,))
        
        g_grad = np.concatenate([g_alpha_grad, g_beta_grad])
        h_grad = np.concatenate([h_alpha_grad, h_beta_grad])
        
        return g_grad - h_grad
    
    return minimize(bias_kl_loss, np.array([0.0]*(d-1+m)), method='BFGS', jac=bias_kl_loss_grad, tol=1e-16,
                    options={'gtol': 1e-012, 'disp':False}).x

# Model evaluation
The following function fits an unbiased model on the training data and then evaluates it on the test data using our 4 chosen metrics. With `coeff_func = L2_coeff` and `center_data = True` this is Wang et al.'s (2013) model.

In [5]:
def eval_model(coeff_func, V_train, V_test, U_train_vec, U_test_vec, center_data = True):
    m = len(U_train_vec)
    if center_data:
        V_train_centered = centralize(V_train)
        V_train_mean = closure(scipy.stats.gmean(V_train, axis=0))
        U_train_centered = [centralize(U) for U in U_train_vec]
        U_test_centered = [centralize(U) for U in U_test_vec]
        
        beta = coeff_func(V_train_centered, U_train_centered)
        V_prediction = clr_inv( sum(beta[k]* clr(U_test_centered[k]) for k in range(m)) 
                               + clr(np.array([V_train_mean]*V_test.shape[0])) )
        report_evaluation(V_test, V_prediction)
    else: # Do not center the data
        beta = coeff_func(V_train, U_train_vec)
        V_prediction = clr_inv( sum(beta[k]* clr(U_test_vec[k]) for k in range(m)))
        report_evaluation(V_test, V_prediction)

The following function trains on training data and evaluates the biased model with the KL-loss on our 4 metrics.

In [6]:
def eval_biased_model(coeff_func, V_train, V_test, U_train_vec, U_test_vec, center_data = True):
    m = len(U_train_vec)
    n, d = V_test.shape
    W = _gram_schmidt_basis(d)
    
    if center_data:
        V_train_centered = centralize(V_train)
        V_train_mean = closure(scipy.stats.gmean(V_train, axis=0))
        U_train_centered = [centralize(U) for U in U_train_vec]
        U_test_centered = [centralize(U) for U in U_test_vec]
        
        params = coeff_func(V_train_centered, U_train_centered)
        alpha = params[:d-1].reshape((1,d-1))
        
        print('\nbias : ', alpha[0], '\n')
        
        beta = params[d-1:]
        
        bias_term = np.ones((n,1)) @ alpha @ W
        V_prediction = clr_inv( bias_term + sum(beta[k] * clr(U_test_centered[k]) for k in range(m))
                                 + clr(np.array([V_train_mean]*V_test.shape[0]))  )
        
        report_evaluation(V_test, V_prediction)
    else: # Do not center the data
        params = coeff_func(V_train, U_train_vec)
        alpha = params[:d-1].reshape((1,d-1))
        
        print('\nbias : ', alpha[0], '\n')
        
        beta = params[d-1:]
        bias_term = np.ones((n,1)) @ alpha @ W
        
        V_prediction = clr_inv(bias_term + sum(beta[k]* clr(U_test_vec[k]) for k in range(m)))
        report_evaluation(V_test, V_prediction)

The following function trains on training data and evaluates a model with the L2 loss on our 4 metrics. Note that if we do not include a bias term in this model (i.e. set fit_bias = False), then this model is identical to Wang et al.'s (2013) model. We have used this as a check of correctness.

In [7]:
def eval_L2_model(V_train, V_test, U_train_vec, U_test_vec, center_data = True, fit_bias = True):
    m = len(U_train_vec)
    n, d = V_test.shape
    
    if center_data:
        V_train_centered = centralize(V_train)
        V_train_mean = closure(scipy.stats.gmean(V_train, axis=0))
        U_train_centered = [centralize(U) for U in U_train_vec]
        U_test_centered = [centralize(U) for U in U_test_vec]
        
        Y_train = ilr(V_train_centered)
        X_train_vec = [ilr(U) for U in U_train_centered]
        X_test_vec = [ilr(U) for U in U_test_centered]

        if fit_bias:
            #The following prepadding of the X_vec allows us to learn the bias term
            n_train = V_train.shape[0]
            train_bias_pad = [np.zeros((n_train,d-1)) for p in range(d-1)]
            for p, M in enumerate(train_bias_pad):
                M[:,p]=1

            X_train_vec = train_bias_pad + X_train_vec

            test_bias_pad = [np.zeros((n,d-1)) for p in range(d-1)]
            for p, M in enumerate(test_bias_pad):
                M[:,p]=1
            X_test_vec = test_bias_pad + X_test_vec
        
        
        
        
        y_train = Y_train.ravel()
        X_train = np.array([X.ravel() for X in X_train_vec]).T

        X_test = np.array([X.ravel() for X in X_test_vec]).T

        reg = LinearRegression(fit_intercept = False).fit(X_train, y_train)
        if fit_bias:
            print('\nbias : ', reg.coef_[:d-1], '\n')
        ilr_predictions = reg.predict(X_test).reshape((n,d-1)) + ilr(np.array([V_train_mean]*V_test.shape[0]))
        V_prediction = ilr_inv(ilr_predictions)

        report_evaluation(V_test, V_prediction)
        
    else: # Do not center the data
        Y_train = ilr(V_train)
        X_train_vec = [ilr(U) for U in U_train_vec]
        X_test_vec = [ilr(U) for U in U_test_vec]
        
        
        if fit_bias:
            #The following prepadding of the X_vec allows us to learn the bias term
            n_train = V_train.shape[0]
            train_bias_pad = [np.zeros((n_train,d-1)) for p in range(d-1)]
            for p, M in enumerate(train_bias_pad):
                M[:,p]=1

            X_train_vec = train_bias_pad + X_train_vec

            test_bias_pad = [np.zeros((n,d-1)) for p in range(d-1)]
            for p, M in enumerate(test_bias_pad):
                M[:,p]=1
            X_test_vec = test_bias_pad + X_test_vec

            
            
        y_train = Y_train.ravel()
        X_train = np.array([X.ravel() for X in X_train_vec]).T

        X_test = np.array([X.ravel() for X in X_test_vec]).T

        reg = LinearRegression(fit_intercept = False).fit(X_train, y_train)
        if fit_bias:
            print('\nbias : ', reg.coef_[:d-1], '\n')
        ilr_predictions = reg.predict(X_test).reshape((n,d-1))
        V_prediction = ilr_inv(ilr_predictions)

        report_evaluation(V_test, V_prediction)
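
The bias padding used above can be illustrated in isolation. Each padding matrix has ones in a single ilr column, so after flattening it becomes an indicator column that acts as an intercept for that coordinate. The sketch below uses NumPy's least squares solver instead of scikit-learn, with noiseless synthetic data; all values are assumptions chosen so the recovered parameters can be compared with the true ones.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 12, 4
X = rng.normal(size=(n, d - 1))                 # stand-in for ilr(U_1)
true_bias = np.array([0.5, -1.0, 2.0])          # one intercept per ilr coordinate
true_beta = 0.7
Y = true_bias + true_beta * X                   # noiseless ilr targets

# One indicator matrix per ilr coordinate: ones in column p, zeros elsewhere
bias_pad = []
for p in range(d - 1):
    M = np.zeros((n, d - 1))
    M[:, p] = 1
    bias_pad.append(M)

# Flatten every matrix into one column of the design matrix, bias columns first
design = np.column_stack([M.ravel() for M in bias_pad + [X]])
coef, *_ = np.linalg.lstsq(design, Y.ravel(), rcond=None)
bias_hat, beta_hat = coef[:d - 1], coef[d - 1]
```

The first $d-1$ coefficients recover the bias in ilr coordinates, matching how `reg.coef_[:d-1]` is reported as the bias in the cell above.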

In [8]:
def eval_all(V_train, V_test, U_train_vec, U_test_vec):
    print('=========    CENTERED    =========')
    print('\n-----  Unbiased  -----\n')
    
    print('----L2----\n')
    eval_model(L2_coeff, V_train, V_test, U_train_vec, U_test_vec)
    
    print('\n----KL----\n')
    eval_model(KL_coeff, V_train, V_test, U_train_vec, U_test_vec)
    
    print('\n------  Biased  ------\n')
    print('----L2----')
    eval_L2_model(V_train, V_test, U_train_vec, U_test_vec)
    print('\n----KL----')
    eval_biased_model(biasKL_coeff, V_train, V_test, U_train_vec, U_test_vec)
    
    print('\n\n=======    NOT CENTERED    =======')
    print('\n-----  Unbiased  -----\n')
    
    print('----L2----\n')
    eval_model(L2_coeff, V_train, V_test, U_train_vec, U_test_vec, center_data = False)
    
    print('\n----KL----\n')
    eval_model(KL_coeff, V_train, V_test, U_train_vec, U_test_vec, center_data = False)
    
    print('\n------  Biased  ------\n')
    print('----L2----')
    eval_L2_model(V_train, V_test, U_train_vec, U_test_vec, center_data = False)
    print('\n----KL----')
    eval_biased_model(biasKL_coeff, V_train, V_test, U_train_vec, U_test_vec, center_data = False)

# Economic Data from (H. Wang et al, Multiple linear regression for Compositional Data, 2013)

https://www.sciencedirect.com/science/article/pii/S0925231213005808

In [9]:
V, U1, U2 = cd.Wang_data

In [10]:
V_train, V_test = V[::2], V[1::2]
U1_train, U1_test = U1[::2], U1[1::2]
U2_train, U2_test = U2[::2], U2[1::2]
U_train_vec = [U1_train, U2_train]
U_test_vec = [U1_test, U2_test]

In [11]:
eval_all(V_train, V_test, U_train_vec, U_test_vec)


=========    CENTERED    =========

-----  Unbiased  -----

----L2----

Fisher-Rao:   0.38971
Symmetric KL: 0.02385
L2 error:     0.25552
L1 error:     0.37015

----KL----

Fisher-Rao:   0.39384
Symmetric KL: 0.02408
L2 error:     0.25747
L1 error:     0.37224

------  Biased  ------

----L2----

bias :  [-2.34549485e-16  1.11022302e-16] 

Fisher-Rao:   0.38971
Symmetric KL: 0.02385
L2 error:     0.25552
L1 error:     0.37015

----KL----

bias :  [0.01081073 0.00458697] 

Fisher-Rao:   0.39469
Symmetric KL: 0.02389
L2 error:     0.25629
L1 error:     0.37141



=======    NOT CENTERED    =======

-----  Unbiased  -----

----L2----]n
Fisher-Rao:   1.19664
Symmetric KL: 0.23603
L2 error:     0.80810
L1 error:     1.15722

----KL----

Fisher-Rao:   1.14186
Symmetric KL: 0.20762
L2 error:     0.74897
L1 error:     1.08256

------  Biased  ------

----L2----

bias :  [-1.04067439 -0.50696972] 

Fisher-Rao:   0.37744
Symmetric KL: 0.02114
L2 error:     0.25523
L1 error:     0.36988

----KL----

bias :  [-1.22145815 -0.63834078] 

Fisher-Rao:   0.29758
Symmetric

# D17 Aitchison

In [12]:
# Dataset 17 Aitchison (with V[13] adjusted due to error in book)
V, U1 = cd.A17_data

In [13]:
V_train, V_test = V[:12], V[12:]
U1_train, U1_test = U1[:12], U1[12:]
U_train_vec = [U1_train]
U_test_vec = [U1_test]

In [14]:
eval_all(V_train, V_test, U_train_vec, U_test_vec)


=========    CENTERED    =========

-----  Unbiased  -----

----L2----

Fisher-Rao:   1.17167
Symmetric KL: 0.47661
L2 error:     0.51419
L1 error:     0.79439

----KL----

Fisher-Rao:   1.24382
Symmetric KL: 0.53451
L2 error:     0.55754
L1 error:     0.88819

------  Biased  ------

----L2----

bias :  [-3.36387723e-17  2.77555756e-17] 

Fisher-Rao:   1.17167
Symmetric KL: 0.47661
L2 error:     0.51419
L1 error:     0.79439

----KL----

bias :  [0.06051754 0.20292682] 

Fisher-Rao:   1.44485
Symmetric KL: 0.72773
L2 error:     0.66173
L1 error:     1.06088



=======    NOT CENTERED    =======

-----  Unbiased  -----

----L2----]n
Fisher-Rao:   2.09073
Symmetric KL: 1.63014
L2 error:     1.10491
L1 error:     1.77260

----KL----

Fisher-Rao:   2.08713
Symmetric KL: 1.62530
L2 error:     1.10268
L1 error:     1.76901

------  Biased  ------

----L2----

bias :  [-0.00201023 -1.31974866] 

Fisher-Rao:   1.16504
Symmetric KL: 0.46014
L2 error:     0.51615
L1 error:     0.80451

----KL----

bias :  [ 0.07677281 -1.97680232] 

Fisher-Rao:   1.35250
Symmetric

# GDP vs Employment by Sector for 158 Countries
### Source: en.wikipedia.org/wiki/List_of_countries_by_GDP_sector_composition

In [15]:
V, U1 = cd.GDPwiki_data
V_train, V_test = V[:120], V[120:]
U1_train, U1_test = U1[:120], U1[120:]
U_train_vec = [U1_train]
U_test_vec = [U1_test]

In [16]:
eval_all(V_train, V_test, U_train_vec, U_test_vec)


=========    CENTERED    =========

-----  Unbiased  -----

----L2----

Fisher-Rao:   9.79281
Symmetric KL: 3.92245
L2 error:     4.87165
L1 error:     7.52357

----KL----

Fisher-Rao:   9.86570
Symmetric KL: 3.98067
L2 error:     4.91057
L1 error:     7.59648

------  Biased  ------

----L2----

bias :  [-6.62252891e-16 -2.22044605e-16] 

Fisher-Rao:   9.79281
Symmetric KL: 3.92245
L2 error:     4.87165
L1 error:     7.52357

----KL----

bias :  [0.00116505 0.01462628] 

Fisher-Rao:   9.91331
Symmetric KL: 3.99530
L2 error:     4.94072
L1 error:     7.64811



=======    NOT CENTERED    =======

-----  Unbiased  -----

----L2----]n
Fisher-Rao:   17.09993
Symmetric KL: 11.02420
L2 error:     8.54415
L1 error:     13.52680

----KL----

Fisher-Rao:   17.15592
Symmetric KL: 11.22285
L2 error:     8.63693
L1 error:     13.69218

------  Biased  ------

----L2----

bias :  [-0.86329099 -0.5252872 ] 

Fisher-Rao:   9.78037
Symmetric KL: 3.94411
L2 error:     4.86933
L1 error:     7.49494

----KL----

bias :  [-0.91568625 -0.49034554] 

Fisher-Rao:   9.99490
Sym

# Artificial Dataset

Created by generating a dataset in $\mathbb{R}^{d-1}$ and then using the inverse ilr map to produce compositional data.
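
A dataset of this kind could be produced along the following lines. This is only a sketch under stated assumptions: the sizes, the linear generating model, the noise level and the Helmert-type basis are all choices made for this example, and the actual `cd.artificial_data` may have been generated differently.

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal (d-1) x d basis of the clr hyperplane (rows sum to zero)."""
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))
    return basis

def ilr_inv(coords, basis):
    """Map real coordinates to compositions: go to clr space, exponentiate, close."""
    expd = np.exp(coords @ basis)
    return expd / expd.sum(axis=1, keepdims=True)

def ilr(mat, basis):
    """Map compositions to real coordinates: clr transform, then project on the basis."""
    logs = np.log(mat)
    clr_mat = logs - logs.mean(axis=1, keepdims=True)
    return clr_mat @ basis.T

rng = np.random.default_rng(5)
n, d, m = 30, 10, 2                 # assumed sizes
W = helmert_basis(d)

X_vec = [rng.normal(size=(n, d - 1)) for _ in range(m)]   # inputs in R^(d-1)
true_beta = [0.6, -0.3]                                   # assumed coefficients
Y = sum(b * X for b, X in zip(true_beta, X_vec)) + 0.05 * rng.normal(size=(n, d - 1))

U_vec = [ilr_inv(X, W) for X in X_vec]   # independent compositional variables
V = ilr_inv(Y, W)                        # target compositions
```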

In [17]:
V, U_vec = cd.artificial_data

In [18]:
V_train, V_test = V[:20], V[20:]
U_train_vec = [U[:20] for U in U_vec]
U_test_vec = [U[20:] for U in U_vec]

In [19]:
eval_all(V_train, V_test, U_train_vec, U_test_vec)


=========    CENTERED    =========

-----  Unbiased  -----

----L2----

Fisher-Rao:   8.80424
Symmetric KL: 11.80289
L2 error:     3.61605
L1 error:     5.98129

----KL----

Fisher-Rao:   8.79927
Symmetric KL: 11.83832
L2 error:     3.61056
L1 error:     5.97627

------  Biased  ------

----L2----

bias :  [ 4.46856148e-16  1.33226763e-15 -1.11022302e-15  7.74328490e-16
 -8.14792772e-16 -1.34765133e-15 -2.14250857e-15  1.36800942e-15
  1.08794302e-15] 

Fisher-Rao:   8.80424
Symmetric KL: 11.80289
L2 error:     3.61605
L1 error:     5.98129

----KL----

bias :  [-0.03272113 -0.06001659 -0.00839264  0.01281397  0.03260636  0.13333889
 -0.02286804 -0.14949615 -0.04539334] 

Fisher-Rao:   8.87079
Symmetric KL: 12.11430
L2 error:     3.67002
L1 error:     6.06674



=======    NOT CENTERED    =======

-----  Unbiased  -----

----L2----]n
Fisher-Rao:   0.82543
Symmetric KL: 0.08732
L2 error:     0.34704
L1 error:     0.56739

----KL----

Fisher-Rao:   0.72519
Symmetric KL: 0.06514
L2 error:     0.28410
L1 error:     0.48218

------  Biased  ------

----L2----
