# Bayesian Gaussian CP decomposition

**Published**: September 30, 2020

**Author**: Xinyu Chen [[**GitHub homepage**](https://github.com/xinychen)]

**Download**: This Jupyter notebook is available in our GitHub repository. To run the code, please download the notebook from the [**transdim**](https://github.com/xinychen/transdim/blob/master/imputer/BGCP.ipynb) repository.

This notebook shows how to implement the Bayesian Gaussian CP decomposition (BGCP) model on some real-world data sets. In the following, we will discuss:

- What the Bayesian Gaussian CP decomposition is.

- How to implement BGCP efficiently, mainly with Python `numpy`.

- How to perform imputation on several real-world spatiotemporal datasets.

To handle missing values in multivariate time series data, this model exploits the low-rank structure of the tensor obtained by folding the data along the day dimension. For an in-depth discussion of BGCP, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Zhaocheng He, Lijun Sun (2019). <b>A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation</b>. Transportation Research Part C: Emerging Technologies, 98: 73-84. <a href="https://doi.org/10.1016/j.trc.2018.11.003" title="PDF"><b>[PDF]</b></a> 
</font>
</div>

We start by importing the necessary dependencies. We will make use of `numpy` and `scipy`.

```python
import numpy as np
from numpy.random import multivariate_normal as mvnrnd
from scipy.stats import wishart
from numpy.random import normal as normrnd
from scipy.linalg import khatri_rao as kr_prod
from numpy.linalg import inv as inv
from numpy.linalg import solve as solve
from numpy.linalg import cholesky as cholesky_lower
from scipy.linalg import cholesky as cholesky_upper
from scipy.linalg import solve_triangular as solve_ut
```

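```python
def mvnrnd_pre(mu, Lambda):
    """Draw one sample from N(mu, Lambda^{-1}), parameterized by the precision matrix Lambda."""
    src = normrnd(size = (mu.shape[0],))
    # With Lambda = R^T R (R upper triangular), mu + R^{-1} z has covariance Lambda^{-1}.
    # overwrite_a / overwrite_b let SciPy reuse the input buffers, so Lambda should not be reused afterwards.
    return solve_ut(cholesky_upper(Lambda, overwrite_a = True, check_finite = False),
                    src, lower = False, check_finite = False, overwrite_b = True) + mu
```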
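`mvnrnd_pre` samples a Gaussian from its precision matrix rather than its covariance, which avoids forming $\Lambda^{-1}$ explicitly. As a sanity check, the hypothetical batched variant below applies the same Cholesky-plus-triangular-solve idea to many draws at once and compares the empirical covariance with $\Lambda^{-1}$; the function name and the test values are illustrative only.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_precision_gaussian(mu, Lambda, num_samples, rng):
    """Hypothetical batched variant of mvnrnd_pre: draws from N(mu, Lambda^{-1})."""
    R = cholesky(Lambda, lower=False)                  # Lambda = R^T R
    z = rng.standard_normal((mu.shape[0], num_samples))
    # Each column R^{-1} z has covariance R^{-1} R^{-T} = Lambda^{-1}
    return solve_triangular(R, z, lower=False).T + mu

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Lambda = np.array([[4.0, 1.0], [1.0, 2.0]])           # precision matrix
samples = sample_precision_gaussian(mu, Lambda, 200_000, rng)
emp_cov = np.cov(samples, rowvar=False)
print(np.round(emp_cov, 3))
print(np.round(np.linalg.inv(Lambda), 3))
```

The two printed matrices should agree up to Monte Carlo error.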

### CP decomposition

#### CP Combination (`cp_combine`)

- **Definition**:

The CP decomposition factorizes a tensor into a sum of outer products of vectors. For example, for a third-order tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, the CP decomposition can be written as

$$\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s},$$
or element-wise,

$$\hat{y}_{ijt}=\sum_{s=1}^{r}u_{is}v_{js}x_{ts},\forall (i,j,t),$$
where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of factor matrices $U\in\mathbb{R}^{m\times r},V\in\mathbb{R}^{n\times r},X\in\mathbb{R}^{f\times r}$, respectively. The symbol $\circ$ denotes vector outer product.

- **Example**:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 3 \\ 2 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$, and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$ (so $r=2$), the frontal slices of $\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s}$ are given below (for instance, $\hat{y}_{111}=u_{11}v_{11}x_{11}+u_{12}v_{12}x_{12}=1\cdot 1\cdot 1+2\cdot 3\cdot 5=31$):

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$
$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$
$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$
$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$

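```python
def cp_combine(var):
    """Rebuild a third-order tensor from CP factors var = [U, V, X]: y_ijt = sum_s u_is * v_js * x_ts."""
    return np.einsum('is, js, ts -> ijt', var[0], var[1], var[2])
```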

```python
factor = [np.array([[1, 2], [3, 4]]), np.array([[1, 3], [2, 4], [5, 6]]),
          np.array([[1, 5], [2, 6], [3, 7], [4, 8]])]
print(cp_combine(factor))
print()
print('tensor size:')
print(cp_combine(factor).shape)
```

```
[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)
```
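As a cross-check of the `einsum` expression, the same tensor can be built literally as a sum of $r$ rank-one outer products $\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s}$, using the factor matrices from the example above. This is only an illustration; `einsum` is the more efficient choice in practice.

```python
import numpy as np

U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])

# Sum of r rank-one tensors u_s ∘ v_s ∘ x_s, built explicitly with np.multiply.outer
Y_outer = sum(np.multiply.outer(np.multiply.outer(U[:, s], V[:, s]), X[:, s])
              for s in range(U.shape[1]))

# Same tensor via einsum, as in cp_combine
Y_einsum = np.einsum('is, js, ts -> ijt', U, V, X)

print(np.array_equal(Y_outer, Y_einsum))  # True
print(Y_outer[:, :, 0])                   # first frontal slice, matches the first slice above
```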


### Tensor Unfolding (`ten2mat`)

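The mode-$k$ unfolding arranges the mode-$k$ fibers of a tensor as the columns of a matrix. Following the Stack Overflow discussion "Using numpy reshape to perform 3rd rank tensor unfold operation" [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)], `ten2mat` moves mode $k$ to the front and then reshapes in Fortran (column-major) order.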

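```python
def ten2mat(tensor, mode):
    """Unfold `tensor` along `mode`: move that axis to the front, then flatten the rest in Fortran order."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')
```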
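For a tensor that follows the CP model, this unfolding satisfies $Y_{(1)}=U\,(X\odot V)^{T}$, where $\odot$ is the Khatri-Rao product, and analogously for the other modes. `sample_factor` relies on this column ordering when it calls `kr_prod(factor[idx[1]], factor[idx[0]])`. The sketch below checks the identity numerically on a random low-rank tensor; the sizes are hypothetical.

```python
import numpy as np
from scipy.linalg import khatri_rao

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

rng = np.random.default_rng(0)
m, n, f, r = 4, 3, 5, 2
U, V, X = rng.standard_normal((m, r)), rng.standard_normal((n, r)), rng.standard_normal((f, r))
Y = np.einsum('is, js, ts -> ijt', U, V, X)

# Mode-1 unfolding: column index is j + n * t (j varies fastest)
Y1 = ten2mat(Y, 0)
print(Y1.shape)                                             # (4, 15)
print(np.allclose(Y1, U @ khatri_rao(X, V).T))              # True
# Mode-2 and mode-3 analogues
print(np.allclose(ten2mat(Y, 1), V @ khatri_rao(X, U).T))   # True
print(np.allclose(ten2mat(Y, 2), X @ khatri_rao(V, U).T))   # True
```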

### Computing Covariance Matrix (`cov_mat`)

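For any matrix $X\in\mathbb{R}^{m\times n}$ with rows $\boldsymbol{x}_i$ and mean row $\bar{\boldsymbol{x}}$, `cov_mat` returns the $n\times n$ scatter matrix $\sum_{i=1}^{m}(\boldsymbol{x}_i-\bar{\boldsymbol{x}})(\boldsymbol{x}_i-\bar{\boldsymbol{x}})^{T}$, i.e., an unnormalized covariance matrix, which is used when sampling the hyperparameters of the factor priors.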

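```python
def cov_mat(mat, mat_bar):
    """Scatter matrix of the rows of `mat` around the mean row `mat_bar` (not divided by the row count)."""
    mat = mat - mat_bar
    return mat.T @ mat
```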
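Because `cov_mat` does not divide by the number of rows, it equals $(m-1)$ times the sample covariance returned by `np.cov`. A quick check on random data (hypothetical sizes):

```python
import numpy as np

def cov_mat(mat, mat_bar):
    mat = mat - mat_bar
    return mat.T @ mat

rng = np.random.default_rng(1)
U = rng.standard_normal((50, 3))
S = cov_mat(U, U.mean(axis=0))
# np.cov normalizes by (m - 1) by default, so the scatter matrix is (m - 1) times larger
print(np.allclose(S, (U.shape[0] - 1) * np.cov(U, rowvar=False)))  # True
```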

## Bayesian Gaussian CP decomposition (BGCP)

### Model Description

#### Gaussian assumption

Given a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$ with missing values, the CP factorization can be used to reconstruct the missing values within $\mathcal{Y}$ by assuming

$$y_{ijt}\sim\mathcal{N}\left(\sum_{s=1}^{r}u_{is} v_{js} x_{ts},\tau^{-1}\right),\forall (i,j,t),$$
where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of the latent factor matrices, and $u_{is},v_{js},x_{ts}$ are their elements. The precision term $\tau$ is the inverse of the Gaussian noise variance.

#### Bayesian framework

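Based on the Gaussian assumption over tensor elements $y_{ijt},(i,j,t)\in\Omega$ (where $\Omega$ is an index set indicating the observed tensor elements), the conjugate priors of the model parameters (i.e., latent factors and precision term) and hyperparameters are given below. Here $\boldsymbol{u}_{i},\boldsymbol{v}_{j},\boldsymbol{x}_{t}\in\mathbb{R}^{r}$ denote the $i$th row of $U$, the $j$th row of $V$, and the $t$th row of $X$, respectively: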

$$\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_{u},\Lambda_{u}^{-1}\right),\forall i,$$
$$\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_{v},\Lambda_{v}^{-1}\right),\forall j,$$
$$\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_{x},\Lambda_{x}^{-1}\right),\forall t,$$
$$\tau\sim\text{Gamma}\left(a_0,b_0\right),$$
$$\boldsymbol{\mu}_{u}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_u\right)^{-1}\right),\Lambda_u\sim\mathcal{W}\left(W_0,\nu_0\right),$$
$$\boldsymbol{\mu}_{v}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_v\right)^{-1}\right),\Lambda_v\sim\mathcal{W}\left(W_0,\nu_0\right),$$
$$\boldsymbol{\mu}_{x}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_x\right)^{-1}\right),\Lambda_x\sim\mathcal{W}\left(W_0,\nu_0\right).$$


### Posterior Inference

In the following, we apply Gibbs sampling to perform Bayesian inference for the tensor factorization task.
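Before the rows of a factor matrix are drawn, `sample_factor` (defined below) also redraws the corresponding hyperparameters from their Normal-Wishart conditional. That update is not written out in the text; the following is a sketch of the standard conjugate result for $(\boldsymbol{\mu}_u,\Lambda_u)$, assuming the values the code appears to use ($\boldsymbol{\mu}_0=\boldsymbol{0}$, $W_0=I$, $\nu_0=r$, $\beta_0=1$). The updates for $(\boldsymbol{\mu}_v,\Lambda_v)$ and $(\boldsymbol{\mu}_x,\Lambda_x)$ are analogous, with $n$ and $f$ rows in place of $m$.

$$\bar{\boldsymbol{u}}=\frac{1}{m}\sum_{i=1}^{m}\boldsymbol{u}_{i},\quad S_u=\sum_{i=1}^{m}\left(\boldsymbol{u}_{i}-\bar{\boldsymbol{u}}\right)\left(\boldsymbol{u}_{i}-\bar{\boldsymbol{u}}\right)^{T},$$

$$\Lambda_u\sim\mathcal{W}\left(W_0^{*},\nu_0^{*}\right),\quad\boldsymbol{\mu}_u\sim\mathcal{N}\left(\boldsymbol{\mu}_0^{*},\left(\beta_0^{*}\Lambda_u\right)^{-1}\right),$$

$$\beta_0^{*}=\beta_0+m,\quad\nu_0^{*}=\nu_0+m,\quad\boldsymbol{\mu}_0^{*}=\frac{\beta_0\boldsymbol{\mu}_0+m\bar{\boldsymbol{u}}}{\beta_0+m},$$

$$\left(W_0^{*}\right)^{-1}=W_0^{-1}+S_u+\frac{m\beta_0}{m+\beta_0}\left(\bar{\boldsymbol{u}}-\boldsymbol{\mu}_0\right)\left(\bar{\boldsymbol{u}}-\boldsymbol{\mu}_0\right)^{T}.$$

In the code, $S_u$ is `cov_mat(factor[k], factor_bar)`, $W_0^{*}$ is `var_W_hyper`, and `df = dim + rank` corresponds to $\nu_0^{*}=m+r$.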

#### - Sampling latent factors $\boldsymbol{u}_{i},i\in\left\{1,2,...,m\right\}$

Draw $\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_i^{*},(\Lambda_{i}^{*})^{-1}\right)$ with the following parameters, where $\circledast$ denotes the element-wise (Hadamard) product:

$$\boldsymbol{\mu}_{i}^{*}=\left(\Lambda_{i}^{*}\right)^{-1}\left\{\tau\sum_{j,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)+\Lambda_u\boldsymbol{\mu}_u\right\},$$

$$\Lambda_{i}^{*}=\tau\sum_{j,t:(i,j,t)\in\Omega}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_u.$$


#### - Sampling latent factors $\boldsymbol{v}_{j},j\in\left\{1,2,...,n\right\}$

Draw $\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_j^{*},(\Lambda_{j}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{j}^{*}=\left(\Lambda_{j}^{*}\right)^{-1}\left\{\tau\sum_{i,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)+\Lambda_v\boldsymbol{\mu}_v\right\}$$

$$\Lambda_{j}^{*}=\tau\sum_{i,t:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_v.$$


#### - Sampling latent factors $\boldsymbol{x}_{t},t\in\left\{1,2,...,f\right\}$

Draw $\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_t^{*},(\Lambda_{t}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{t}^{*}=\left(\Lambda_{t}^{*}\right)^{-1}\left\{\tau\sum_{i,j:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)+\Lambda_x\boldsymbol{\mu}_x\right\}$$

$$\Lambda_{t}^{*}=\tau\sum_{i,j:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)^{T}+\Lambda_x.$$


```python
def sample_factor(tau_sparse_tensor, tau_ind, factor, k, beta0 = 1):
    """Gibbs update of factor matrix k: hyperparameters first, then every row."""
    dim, rank = factor[k].shape
    # Sample (mu, Lambda) from the Normal-Wishart conditional (mu0 = 0, W0 = I, nu0 = rank)
    factor_bar = np.mean(factor[k], axis = 0)
    temp = dim / (dim + beta0)
    var_mu_hyper = temp * factor_bar
    var_W_hyper = inv(np.eye(rank) + cov_mat(factor[k], factor_bar) + temp * beta0 * np.outer(factor_bar, factor_bar))
    var_Lambda_hyper = wishart.rvs(df = dim + rank, scale = var_W_hyper)
    var_mu_hyper = mvnrnd_pre(var_mu_hyper, (dim + beta0) * var_Lambda_hyper)

    # Khatri-Rao product of the other two factors: each column is the Hadamard product
    # of one row from each of them, ordered to match ten2mat(., k)
    idx = list(filter(lambda x: x != k, range(len(factor))))
    var1 = kr_prod(factor[idx[1]], factor[idx[0]]).T
    var2 = kr_prod(var1, var1)
    # Posterior precision matrices, stacked along the last axis (one per row of factor[k])
    var3 = (var2 @ ten2mat(tau_ind, k).T).reshape([rank, rank, dim]) + var_Lambda_hyper[:, :, np.newaxis]
    # Posterior precision times posterior mean, one column per row of factor[k]
    var4 = var1 @ ten2mat(tau_sparse_tensor, k).T + (var_Lambda_hyper @ var_mu_hyper)[:, np.newaxis]
    for i in range(dim):
        factor[k][i, :] = mvnrnd_pre(solve(var3[:, :, i], var4[:, i]), var3[:, :, i])
    return factor[k]
```
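To see why the Khatri-Rao construction in `sample_factor` reproduces the formulas for $\Lambda_i^{*}$ and $\boldsymbol{\mu}_i^{*}$, the sketch below compares it with a direct loop over the observed entries on a small random tensor. The sizes, the mask, $\tau$, and the hyperparameter values are all hypothetical, and the check covers only the precision matrices and the right-hand sides $\Lambda_i^{*}\boldsymbol{\mu}_i^{*}$, not the random draws.

```python
import numpy as np
from scipy.linalg import khatri_rao

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

rng = np.random.default_rng(0)
m, n, f, r = 4, 3, 5, 2
V, X = rng.standard_normal((n, r)), rng.standard_normal((f, r))
Y = rng.standard_normal((m, n, f))          # hypothetical data
ind = rng.random((m, n, f)) < 0.6           # hypothetical observation mask (True = observed)
tau = 2.0
Lambda_u, mu_u = np.eye(r), np.zeros(r)     # hypothetical hyperparameter values

# Vectorized form, as in sample_factor for k = 0
W = khatri_rao(X, V).T                      # (r, n*f); the column for (j, t) is v_j * x_t
Lam_vec = (khatri_rao(W, W) @ ten2mat(tau * ind, 0).T).reshape([r, r, m]) + Lambda_u[:, :, np.newaxis]
rhs_vec = W @ ten2mat(tau * ind * Y, 0).T + (Lambda_u @ mu_u)[:, np.newaxis]

# Direct transcription of the formulas for Lambda_i^* and Lambda_i^* mu_i^*
Lam_loop = np.zeros((r, r, m))
rhs_loop = np.zeros((r, m))
for i in range(m):
    Lam_loop[:, :, i] = Lambda_u
    rhs_loop[:, i] = Lambda_u @ mu_u
    for j in range(n):
        for t in range(f):
            if ind[i, j, t]:
                w = V[j] * X[t]             # Hadamard product of v_j and x_t
                Lam_loop[:, :, i] += tau * np.outer(w, w)
                rhs_loop[:, i] += tau * Y[i, j, t] * w

print(np.allclose(Lam_vec, Lam_loop), np.allclose(rhs_vec, rhs_loop))  # True True
```

The vectorized version replaces the inner loops with two matrix products, which is what makes the per-iteration cost manageable on the larger datasets below.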

#### - Sampling precision term $\tau$

Draw $\tau\sim\text{Gamma}\left(a^{*},b^{*}\right)$, with shape $a^{*}$ and rate $b^{*}$ given by the following (the code below uses $a_0=b_0=10^{-6}$):

$$a^{*}=a_0+\frac{1}{2}|\Omega|,~b^{*}=b_0+\frac{1}{2}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{s=1}^{r}u_{is}v_{js}x_{ts}\right)^2.$$


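```python
def sample_precision_tau(sparse_tensor, tensor_hat, ind):
    # a0 = b0 = 1e-6: a vague Gamma prior on tau
    var_alpha = 1e-6 + 0.5 * np.sum(ind)
    var_beta = 1e-6 + 0.5 * np.sum(((sparse_tensor - tensor_hat) ** 2) * ind)
    # numpy's gamma takes shape and scale, so pass scale = 1 / rate
    return np.random.gamma(var_alpha, 1 / var_beta)
```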
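NumPy's gamma sampler takes a shape and a *scale*, whereas $b^{*}$ above is a rate, hence the `1 / var_beta`. The sketch below (hypothetical $a^{*}$ and $b^{*}$) shows that this gives the intended mean $a^{*}/b^{*}$, and that passing the rate directly as the scale would give a very different distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
a_star, b_star = 50.0, 12.5                          # hypothetical shape and rate
draws = rng.gamma(a_star, 1 / b_star, size=200_000)  # scale = 1 / rate, as in sample_precision_tau
wrong = rng.gamma(a_star, b_star, size=200_000)      # rate mistakenly passed as scale
print(draws.mean(), a_star / b_star)                 # both close to 4.0
print(wrong.mean())                                  # close to a_star * b_star = 625, not 4.0
```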

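### Define Performance Metrics

Both metrics are computed over the held-out entries, i.e., entries that are observed in the ground truth but masked in the input:

- **RMSE** (root mean squared error): $\sqrt{\frac{1}{N}\sum (y_{ijt}-\hat{y}_{ijt})^2}$
- **MAPE** (mean absolute percentage error): $\frac{1}{N}\sum \frac{|y_{ijt}-\hat{y}_{ijt}|}{y_{ijt}}$, where $N$ is the number of held-out entries.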

```python
def compute_mape(var, var_hat):
    # Mean absolute percentage error; assumes var contains no zeros
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    # Root mean squared error
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
```
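A small worked example of the two metrics, with hypothetical values. Note that MAPE divides by the ground truth, which is one reason `BGCP` only evaluates positions where `dense_tensor != 0`.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

truth = np.array([100.0, 50.0])
estimate = np.array([90.0, 55.0])
print(compute_mape(truth, estimate))  # (10/100 + 5/50) / 2 = 0.1
print(compute_rmse(truth, estimate))  # sqrt((100 + 25) / 2) ≈ 7.906
```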

### Define BGCP with `numpy`

```python
def BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter):
    """Bayesian Gaussian CP (BGCP) decomposition."""

    dim = np.array(sparse_tensor.shape)
    rank = factor[0].shape[1]
    # Missing entries in sparse_tensor may be encoded either as 0 or as NaN
    if not np.isnan(sparse_tensor).any():
        ind = sparse_tensor != 0
        pos_obs = np.where(ind)
        pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    else:
        pos_test = np.where((dense_tensor != 0) & (np.isnan(sparse_tensor)))
        ind = ~np.isnan(sparse_tensor)
        pos_obs = np.where(ind)
        sparse_tensor[np.isnan(sparse_tensor)] = 0  # note: modifies the caller's array in place
    show_iter = 200
    tau = 1
    factor_plus = []
    for k in range(len(dim)):
        factor_plus.append(np.zeros((dim[k], rank)))
    temp_hat = np.zeros(dim)
    tensor_hat_plus = np.zeros(dim)
    for it in range(burn_iter + gibbs_iter):
        tau_ind = tau * ind
        tau_sparse_tensor = tau * sparse_tensor
        # One Gibbs sweep over the three factor matrices, then over tau
        for k in range(len(dim)):
            factor[k] = sample_factor(tau_sparse_tensor, tau_ind, factor, k)
        tensor_hat = cp_combine(factor)
        temp_hat += tensor_hat
        tau = sample_precision_tau(sparse_tensor, tensor_hat, ind)
        # After burn-in, accumulate samples to estimate posterior means
        if it + 1 > burn_iter:
            factor_plus = [factor_plus[k] + factor[k] for k in range(len(dim))]
            tensor_hat_plus += tensor_hat
        # During burn-in, report the error of the estimate averaged over the last show_iter samples
        if (it + 1) % show_iter == 0 and it < burn_iter:
            temp_hat = temp_hat / show_iter
            print('Iter: {}'.format(it + 1))
            print('MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], temp_hat[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], temp_hat[pos_test])))
            temp_hat = np.zeros(sparse_tensor.shape)
            print()
    factor = [i / gibbs_iter for i in factor_plus]
    tensor_hat = tensor_hat_plus / gibbs_iter
    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()

    return tensor_hat, factor
```

## Data Organization

### Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$, which may contain missing elements. We express the spatiotemporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals):

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

### Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$, which may contain missing elements. We partition each time series into $n$ intervals of predefined length $f$ and express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.
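When the $nf$ columns of the matrix are stored chronologically, day after day, folding it into this tensor is a plain row-major reshape; the Seattle and London cells below build their tensors this way. A minimal sketch on a toy array (sizes are hypothetical):

```python
import numpy as np

m, n, f = 3, 2, 4                              # hypothetical: 3 locations, 2 days, 4 intervals per day
mat = np.arange(m * n * f).reshape(m, n * f)   # matrix form, shape (m, n*f)

# Fold the time axis into (day, time of day); C-order reshape keeps consecutive intervals together
tensor = mat.reshape([m, n, f])
print(tensor.shape)                            # (3, 2, 4)
print(tensor[1, 1, 2] == mat[1, 1 * f + 2])    # True
# Unfolding back to the matrix form is the inverse reshape
print(np.array_equal(tensor.reshape(m, n * f), mat))   # True
```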

## Evaluation on Guangzhou Speed Data

**Scenario setting**:

- Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
- Random missing (RM)
- 40% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```
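The random-missing mask is built with `np.round(random_tensor + 0.5 - missing_rate)`. Assuming `random_tensor` holds i.i.d. Uniform[0, 1) values (the `.mat` file itself is not shown here), an entry is kept roughly when its value exceeds `missing_rate`, so about that fraction of entries is removed. The sketch below checks this on synthetic uniform draws:

```python
import numpy as np

rng = np.random.default_rng(0)
missing_rate = 0.4
random_tensor = rng.random((100, 60, 50))   # assumption: i.i.d. Uniform[0, 1) entries

# round(u + 0.5 - p) is 1 when u > p and 0 when u < p
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
print(1 - binary_tensor.mean())             # close to 0.4
```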

**Model setting**:

- Low rank: 80
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 80
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.084101
RMSE: 3.62712

Iter: 400
MAPE: 0.0835954
RMSE: 3.61192

Iter: 600
MAPE: 0.0833965
RMSE: 3.60484

Iter: 800
MAPE: 0.0832195
RMSE: 3.59929

Iter: 1000
MAPE: 0.0831062
RMSE: 3.59552

Imputation MAPE: 0.0830618
Imputation RMSE: 3.59306

Running time: 5216 seconds
```


**Scenario setting**:

- Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
- Random missing (RM)
- 60% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 80
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 80
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.08519
RMSE: 3.67201

Iter: 400
MAPE: 0.0847316
RMSE: 3.66272

Iter: 600
MAPE: 0.0845525
RMSE: 3.65554

Iter: 800
MAPE: 0.0844206
RMSE: 3.65049

Iter: 1000
MAPE: 0.0842867
RMSE: 3.64605

Imputation MAPE: 0.0841852
Imputation RMSE: 3.64372

Running time: 3766 seconds
```


**Scenario setting**:

- Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
- Non-random missing (NM)
- 40% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')['random_matrix']
missing_rate = 0.4

## Non-random missing (NM)
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```
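In the non-random missing scenario, one random value per (location, day) decides whether that whole day of observations is removed, which is what the nested loop above does. A broadcasting version gives the same mask without Python loops. The sizes here are hypothetical, and `np.broadcast_to` returns a read-only view, so call `.copy()` if the mask needs to be modified afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
missing_rate = 0.4
m, n, f = 5, 7, 6                            # hypothetical small sizes
random_matrix = rng.random((m, n))           # one uniform draw per (location, day)

# Loop version, as in the cell above
loop_mask = np.zeros((m, n, f))
for i1 in range(m):
    for i2 in range(n):
        loop_mask[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcasting version: repeat each (location, day) decision across the f time-of-day slots
day_mask = np.round(random_matrix + 0.5 - missing_rate)
bcast_mask = np.broadcast_to(day_mask[:, :, np.newaxis], (m, n, f))
print(np.array_equal(loop_mask, bcast_mask))   # True
```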

**Model setting**:

- Low rank: 10
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.102851
RMSE: 4.33134

Iter: 400
MAPE: 0.102636
RMSE: 4.33806

Iter: 600
MAPE: 0.102428
RMSE: 4.3298

Iter: 800
MAPE: 0.102428
RMSE: 4.32899

Iter: 1000
MAPE: 0.102387
RMSE: 4.32615

Imputation MAPE: 0.10241
Imputation RMSE: 4.32794

Running time: 202 seconds
```


## Evaluation on Birmingham Parking Data

**Scenario setting**:

- Tensor size: $30\times 77\times 18$ (parking slot, day, time of day)
- Random missing (RM)
- 40% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.0795175
RMSE: 28.2892

Iter: 400
MAPE: 0.0752152
RMSE: 27.3606

Iter: 600
MAPE: 0.0742204
RMSE: 26.9547

Iter: 800
MAPE: 0.0754261
RMSE: 26.8998

Iter: 1000
MAPE: 0.0751039
RMSE: 26.7704

Imputation MAPE: 0.0724925
Imputation RMSE: 26.2367

Running time: 16 seconds
```


**Scenario setting**:

- Tensor size: $30\times 77\times 18$ (parking slot, day, time of day)
- Random missing (RM)
- 60% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.0904082
RMSE: 33.2204

Iter: 400
MAPE: 0.087897
RMSE: 32.606

Iter: 600
MAPE: 0.0874777
RMSE: 32.373

Iter: 800
MAPE: 0.0871677
RMSE: 32.2292

Iter: 1000
MAPE: 0.0869017
RMSE: 32.2502

Imputation MAPE: 0.0865899
Imputation RMSE: 32.0476

Running time: 17 seconds
```


**Scenario setting**:

- Tensor size: $30\times 77\times 18$ (parking slot, day, time of day)
- Non-random missing (NM)
- 40% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')['random_matrix']
missing_rate = 0.4

## Non-random missing (NM)
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.146139
RMSE: 75.6561

Iter: 400
MAPE: 0.157742
RMSE: 76.762

Iter: 600
MAPE: 0.167641
RMSE: 92.2227

Iter: 800
MAPE: 0.167714
RMSE: 98.3551

Iter: 1000
MAPE: 0.170823
RMSE: 102.301

Imputation MAPE: 0.167442
Imputation RMSE: 103.055

Running time: 16 seconds
```


## Evaluation on Hangzhou Flow Data

**Scenario setting**:

- Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
- Random missing (RM)
- 40% missing rate


```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.197877
RMSE: 33.0393

Iter: 400
MAPE: 0.195282
RMSE: 38.3077

Iter: 600
MAPE: 0.196952
RMSE: 39.2784

Iter: 800
MAPE: 0.196789
RMSE: 38.7399

Iter: 1000
MAPE: 0.19573
RMSE: 40.7125

Imputation MAPE: 0.196447
Imputation RMSE: 42.1937

Running time: 98 seconds
```


**Scenario setting**:

- Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
- Random missing (RM)
- 60% missing rate

```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')['random_tensor']
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.201009
RMSE: 31.9734

Iter: 400
MAPE: 0.200853
RMSE: 34.0132

Iter: 600
MAPE: 0.20077
RMSE: 34.5279

Iter: 800
MAPE: 0.202308
RMSE: 35.7216

Iter: 1000
MAPE: 0.202688
RMSE: 35.6643

Imputation MAPE: 0.201724
Imputation RMSE: 34.8941

Running time: 92 seconds
```


**Scenario setting**:

- Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
- Non-random missing (NM)
- 40% missing rate

```python
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')['random_matrix']
missing_rate = 0.4

## Non-random missing (NM)
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.207448
RMSE: 32.0295

Iter: 400
MAPE: 0.211257
RMSE: 40.8803

Iter: 600
MAPE: 0.209879
RMSE: 43.8077

Iter: 800
MAPE: 0.207846
RMSE: 44.4036

Iter: 1000
MAPE: 0.207392
RMSE: 45.1708

Imputation MAPE: 0.207332
Imputation RMSE: 45.6098

Running time: 84 seconds
```


## Evaluation on Seattle Speed Data

**Scenario setting**:

- Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
- Random missing (RM)
- 40% missing rate


```python
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(RM_mat.reshape([RM_mat.shape[0], 28, 288]) + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 50
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 50
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.0761436
RMSE: 4.56253

Iter: 400
MAPE: 0.0753259
RMSE: 4.53309

Iter: 600
MAPE: 0.0749637
RMSE: 4.51853

Iter: 800
MAPE: 0.0746534
RMSE: 4.50695

Iter: 1000
MAPE: 0.0744455
RMSE: 4.50012

Imputation MAPE: 0.074248
Imputation RMSE: 4.49303

Running time: 2489 seconds
```


**Scenario setting**:

- Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
- Random missing (RM)
- 60% missing rate


```python
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(RM_mat.reshape([RM_mat.shape[0], 28, 288]) + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 50
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 50
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.076408
RMSE: 4.56933

Iter: 400
MAPE: 0.0756105
RMSE: 4.54629

Iter: 600
MAPE: 0.0749344
RMSE: 4.52416

Iter: 800
MAPE: 0.0748887
RMSE: 4.52134

Iter: 1000
MAPE: 0.0747836
RMSE: 4.51837

Imputation MAPE: 0.0746966
Imputation RMSE: 4.51645

Running time: 2677 seconds
```


**Scenario setting**:

- Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
- Non-random missing (NM)
- 40% missing rate


```python
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0).values
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0).values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])
missing_rate = 0.4

## Non-random missing (NM)
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
```

**Model setting**:

- Low rank: 10
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.100606
RMSE: 5.68501

Iter: 400
MAPE: 0.100466
RMSE: 5.70087

Iter: 600
MAPE: 0.100782
RMSE: 5.7248

Iter: 800
MAPE: 0.100941
RMSE: 5.74121

Iter: 1000
MAPE: 0.100902
RMSE: 5.74453

Imputation MAPE: 0.100886
Imputation RMSE: 5.74875

Running time: 327 seconds
```


## Evaluation on London Movement Speed Data

**Scenario setting**:

- Tensor size: $35912\times 30\times 24$ (road segment, day, time of day)
- Random missing (RM)
- 40% missing rate


```python
import numpy as np
np.random.seed(1000)

missing_rate = 0.4

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, binary_mat)

dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])
sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])
del dense_mat, sparse_mat
```
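The London cells keep only road segments whose hourly speed is nonzero in more than 70% of the time slots. An equivalent and slightly more direct way to write that filter uses `np.count_nonzero` along each row; the toy matrix below is hypothetical and also shows that the threshold is strict.

```python
import numpy as np

# Hypothetical toy matrix: 3 road segments x 10 hourly slots, 0 marks a missing speed
dense_mat = np.array([[5, 6, 0, 7, 8, 9, 5, 6, 7, 8],
                      [0, 0, 0, 0, 3, 4, 0, 0, 5, 0],
                      [4, 4, 4, 4, 4, 4, 4, 0, 0, 0]], dtype=float)

# Same rule as the cell above: keep rows with more than 70% nonzero entries
keep = np.count_nonzero(dense_mat, axis=1) > 0.7 * dense_mat.shape[1]
print(keep)                # [ True False False]  (row 2 has exactly 70%, so it is dropped)
filtered = dense_mat[keep, :]
print(filtered.shape)      # (1, 10)
```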

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

```python
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))
```

```
Iter: 200
MAPE: 0.0922686
RMSE: 2.24709

Iter: 400
MAPE: 0.0921095
RMSE: 2.24406

Iter: 600
MAPE: 0.0920888
RMSE: 2.24348

Iter: 800
MAPE: 0.0920806
RMSE: 2.24304

Iter: 1000
MAPE: 0.0920749
RMSE: 2.24273

Imputation MAPE: 0.0920605
Imputation RMSE: 2.24246

Running time: 12148 seconds
```


**Scenario setting**:

- Tensor size: $35912\times 30\times 24$ (road segment, day, time of day)
- Random missing (RM)
- 60% missing rate


In [23]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.6

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, binary_mat)

dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])
sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])
del dense_mat, sparse_mat
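The data cell above performs two preprocessing steps. First, it keeps only road segments whose speed is nonzero in more than 70% of the 720 hourly slots. Second, it folds the (segment × 720 hours) matrix into a (segment × 30 days × 24 hours) tensor. Because `reshape` uses C order by default, hourly column `t` lands at day `t // 24`, hour `t % 24`. The toy example below reproduces both steps on a 2-day matrix. `np.count_nonzero` is used here as an equivalent of the notebook's binarize-then-sum approach.

```python
import numpy as np

# Toy matrix: 3 segments x (2 days * 24 hours); 0 marks "no record"
mat = np.arange(1, 3 * 48 + 1, dtype=float).reshape(3, 48)
mat[1, :30] = 0                      # segment 1 is empty for 30 of 48 slots

# Step 1: keep rows that are nonzero in more than 70% of columns
keep = np.count_nonzero(mat, axis=1) > 0.7 * mat.shape[1]
mat = mat[keep, :]
print(mat.shape)                     # (2, 48): segment 1 (18/48 nonzero) is dropped

# Step 2: fold the time axis into (day, hour); column t goes to [t // 24, t % 24]
tensor = mat.reshape(mat.shape[0], 2, 24)
t = 30
print(tensor[0, t // 24, t % 24] == mat[0, t])   # True
```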

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [24]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.0935783
RMSE: 2.27746

Iter: 400
MAPE: 0.0934787
RMSE: 2.27442

Iter: 600
MAPE: 0.0934875
RMSE: 2.27368

Iter: 800
MAPE: 0.09349
RMSE: 2.2728

Iter: 1000
MAPE: 0.0934329
RMSE: 2.27185

Imputation MAPE: 0.0934048
Imputation RMSE: 2.27141

Running time: 11621 seconds


**Scenario setting**:

- Tensor size: $35912\times 30\times 24$ (road segment, day, time of day)
- Non-random missing (NM)
- 40% missing rate


In [25]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.4

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Non-random missing (NM)
binary_mat = np.zeros(dense_mat.shape)
random_mat = np.random.rand(dense_mat.shape[0], 30)
for i1 in range(dense_mat.shape[0]):
    for i2 in range(30):
        binary_mat[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - missing_rate)
sparse_mat = np.multiply(dense_mat, binary_mat)

dense_tensor = dense_mat.reshape([dense_mat.shape[0], 30, 24])
sparse_tensor = sparse_mat.reshape([sparse_mat.shape[0], 30, 24])
del dense_mat, sparse_mat
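In the non-random scenario, the nested loop draws one random number per (segment, day) and applies it to all 24 hours of that day, so whole days are masked together. Assuming columns are ordered day by day as above, the same mask can be built without Python loops by repeating each daily draw 24 times along the time axis with `np.repeat`. The sketch below compares the two constructions on a small random matrix. `day_block_mask` is a hypothetical helper.

```python
import numpy as np

def day_block_mask(random_mat, missing_rate, block=24):
    """Expand one draw per (row, day) into `block` consecutive columns."""
    daily = np.round(random_mat + 0.5 - missing_rate)   # 1 = keep the day, 0 = drop it
    return np.repeat(daily, block, axis=1)              # (n, days) -> (n, days * block)

rng = np.random.default_rng(1000)
random_mat = rng.random((5, 30))
fast = day_block_mask(random_mat, 0.4)

# Reference: the loop used in the notebook cell
slow = np.zeros((5, 30 * 24))
for i1 in range(5):
    for i2 in range(30):
        slow[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - 0.4)
print(np.array_equal(fast, slow))   # True
```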

**Model setting**:

- Low rank: 20
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [26]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.0953323
RMSE: 2.32739

Iter: 400
MAPE: 0.0954726
RMSE: 2.33105

Iter: 600
MAPE: 0.0954886
RMSE: 2.33127

Iter: 800
MAPE: 0.0954646
RMSE: 2.33132

Iter: 1000
MAPE: 0.0954386
RMSE: 2.33098

Imputation MAPE: 0.0954079
Imputation RMSE: 2.33034

Running time: 11481 seconds


## Evaluation on New York Taxi Data

**Scenario setting**:

- Tensor size: $30\times 30\times 1464$ (origin, destination, time)
- Random missing (RM)
- 40% missing rate


In [11]:
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor'].astype(np.float32)
rm_tensor = scipy.io.loadmat('../datasets/NYC-data-set/rm_tensor.mat')['rm_tensor']
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(rm_tensor + 0.5 - missing_rate)
sparse_tensor = dense_tensor.copy()
sparse_tensor[binary_tensor == 0] = np.nan
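Unlike the London cells, which mark missing entries with 0 via `np.multiply`, this cell marks them with `np.nan`. An origin-destination pair can legitimately record zero trips, and the NaN convention keeps such observed zeros distinguishable from masked entries. The sketch below shows how observed and held-out positions can be recovered under this convention. It assumes, since BGCP's body is not shown in this section, that held-out entries are those with nonzero ground truth that were set to NaN.

```python
import numpy as np

dense = np.array([[3.0, 0.0], [5.0, 7.0]])
binary = np.array([[1, 1], [0, 0]])           # 1 = observed, 0 = masked

sparse = dense.copy()
sparse[binary == 0] = np.nan                  # mask with NaN, as in the cell above

pos_obs = np.where(~np.isnan(sparse))         # includes the observed zero at (0, 1)
pos_test = np.where((dense != 0) & np.isnan(sparse))
print(len(pos_obs[0]), len(pos_test[0]))      # 2 2
```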

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [12]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.482715
RMSE: 4.83652

Iter: 400
MAPE: 0.48757
RMSE: 4.90807

Iter: 600
MAPE: 0.487783
RMSE: 4.90695

Iter: 800
MAPE: 0.487116
RMSE: 4.90089

Iter: 1000
MAPE: 0.486995
RMSE: 4.8972

Imputation MAPE: 0.487589
Imputation RMSE: 4.88986

Running time: 832 seconds


**Scenario setting**:

- Tensor size: $30\times 30\times 1464$ (origin, destination, time)
- Random missing (RM)
- 60% missing rate


In [48]:
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor'].astype(np.float32)
rm_tensor = scipy.io.loadmat('../datasets/NYC-data-set/rm_tensor.mat')['rm_tensor']
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(rm_tensor + 0.5 - missing_rate)
sparse_tensor = dense_tensor.copy()
sparse_tensor[binary_tensor == 0] = np.nan

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [49]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.488698
RMSE: 5.02758

Iter: 400
MAPE: 0.488083
RMSE: 5.1028

Iter: 600
MAPE: 0.487799
RMSE: 5.11637

Iter: 800
MAPE: 0.487201
RMSE: 5.10741

Iter: 1000
MAPE: 0.487443
RMSE: 5.10497

Imputation MAPE: 0.487178
Imputation RMSE: 5.10753

Running time: 912 seconds


**Scenario setting**:

- Tensor size: $30\times 30\times 1464$ (origin, destination, time)
- Non-random missing (NM)
- 40% missing rate


In [11]:
import scipy.io

dense_tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')['tensor']
nm_tensor = scipy.io.loadmat('../datasets/NYC-data-set/nm_tensor.mat')['nm_tensor']
missing_rate = 0.4

## Non-random missing (NM): one draw per (origin, destination, day), applied to all 24 hours of that day
binary_tensor = np.zeros(dense_tensor.shape)
num_days = dense_tensor.shape[2] // 24  # 1464 hourly steps = 61 days
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        for i3 in range(num_days):
            binary_tensor[i1, i2, i3 * 24 : (i3 + 1) * 24] = np.round(nm_tensor[i1, i2, i3] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
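This NM cell builds `sparse_tensor` with `np.multiply`, so masked entries become 0 instead of NaN as in the RM cells. If BGCP treats zeros as unobserved whenever the input contains no NaN (an assumption here, not shown in this section), then observed zero-trip counts are also dropped from training. The toy comparison below counts the entries that remain distinguishable as observed under each convention.

```python
import numpy as np

dense = np.array([[0.0, 4.0, 2.0], [0.0, 0.0, 6.0]])   # zeros = genuine zero-trip counts
binary = np.array([[1, 1, 0], [1, 0, 1]])              # 1 = observed, 0 = masked

# Zero convention (np.multiply, as in the NM cell above)
sparse_zero = np.multiply(dense, binary)
n_obs_zero = np.count_nonzero(sparse_zero)            # observed zeros are lost

# NaN convention (as in the RM cells)
sparse_nan = dense.copy()
sparse_nan[binary == 0] = np.nan
n_obs_nan = np.count_nonzero(~np.isnan(sparse_nan))   # observed zeros are kept

print(n_obs_zero, n_obs_nan, int(binary.sum()))       # 2 4 4
```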

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [12]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.523096
RMSE: 4.83399

Iter: 400
MAPE: 0.530243
RMSE: 4.86244

Iter: 600
MAPE: 0.528467
RMSE: 4.88077

Iter: 800
MAPE: 0.527664
RMSE: 4.87587

Iter: 1000
MAPE: 0.527124
RMSE: 4.87593

Imputation MAPE: 0.528696
Imputation RMSE: 4.87214

Running time: 942 seconds


## Evaluation on Pacific Temperature Data

**Scenario setting**:

- Tensor size: $30\times 84\times 396$ (grid, grid, time)
- Random missing (RM)
- 40% missing rate


In [11]:
import numpy as np
np.random.seed(1000)

dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)
pos = np.where(dense_tensor[:, 0, :] > 50)
dense_tensor[pos[0], :, pos[1]] = 0
random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], dense_tensor.shape[2])
missing_rate = 0.4

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = dense_tensor.copy()
sparse_tensor[binary_tensor == 0] = np.nan
sparse_tensor[sparse_tensor == 0] = np.nan
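The lines `pos = np.where(dense_tensor[:, 0, :] > 50)` and `dense_tensor[pos[0], :, pos[1]] = 0` rely on NumPy's mixed advanced and basic indexing. For every pair `(i, t)` whose value at `j = 0` exceeds 50, the whole fiber `dense_tensor[i, :, t]` is zeroed. The final line `sparse_tensor[sparse_tensor == 0] = np.nan` then marks those fibers as unobserved. Reading values above 50 as a fill value for grid cells without a valid temperature is our assumption. The toy tensor below demonstrates only the indexing behavior.

```python
import numpy as np

tensor = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4) + 1.0   # values 1..24
tensor[1, 0, 2] = 99.0                         # flagged value at j = 0

pos = np.where(tensor[:, 0, :] > 50)           # (i, t) pairs: here i = [1], t = [2]
tensor[pos[0], :, pos[1]] = 0                  # zero the whole fiber tensor[i, :, t]

print(tensor[1, :, 2])                         # [0. 0. 0.]
print(np.count_nonzero(tensor == 0))           # 3
```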

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [12]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.0155704
RMSE: 0.520268

Iter: 400
MAPE: 0.0150931
RMSE: 0.50321

Iter: 600
MAPE: 0.014952
RMSE: 0.498047

Iter: 800
MAPE: 0.0148674
RMSE: 0.495509

Iter: 1000
MAPE: 0.0148097
RMSE: 0.493761

Imputation MAPE: 0.0147723
Imputation RMSE: 0.492445

Running time: 525 seconds


**Scenario setting**:

- Tensor size: $30\times 84\times 396$ (grid, grid, time)
- Random missing (RM)
- 60% missing rate


In [13]:
import numpy as np
np.random.seed(1000)

dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)
pos = np.where(dense_tensor[:, 0, :] > 50)
dense_tensor[pos[0], :, pos[1]] = 0
random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], dense_tensor.shape[2])
missing_rate = 0.6

## Random missing (RM)
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = dense_tensor.copy()
sparse_tensor[binary_tensor == 0] = np.nan
sparse_tensor[sparse_tensor == 0] = np.nan

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [14]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.0158356
RMSE: 0.531336

Iter: 400
MAPE: 0.0153585
RMSE: 0.514397

Iter: 600
MAPE: 0.0151223
RMSE: 0.506048

Iter: 800
MAPE: 0.0149601
RMSE: 0.500249

Iter: 1000
MAPE: 0.0148669
RMSE: 0.496963

Imputation MAPE: 0.0148135
Imputation RMSE: 0.495079

Running time: 447 seconds


**Scenario setting**:

- Tensor size: $30\times 84\times 396$ (grid, grid, time)
- Non-random missing (NM)
- 40% missing rate


In [15]:
import numpy as np
np.random.seed(1000)

dense_tensor = np.load('../datasets/Temperature-data-set/tensor.npy').astype(np.float32)
pos = np.where(dense_tensor[:, 0, :] > 50)
dense_tensor[pos[0], :, pos[1]] = 0
random_tensor = np.random.rand(dense_tensor.shape[0], dense_tensor.shape[1], dense_tensor.shape[2] // 3)
missing_rate = 0.4

## Non-random missing (NM): one draw per block of 3 consecutive time steps
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        for i3 in range(dense_tensor.shape[2] // 3):
            binary_tensor[i1, i2, i3 * 3 : (i3 + 1) * 3] = np.round(random_tensor[i1, i2, i3] + 0.5 - missing_rate)
sparse_tensor = dense_tensor.copy()
sparse_tensor[binary_tensor == 0] = np.nan
sparse_tensor[sparse_tensor == 0] = np.nan
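Because the last line also turns the zeroed (flagged) entries into NaN, the overall fraction of NaN in `sparse_tensor` is larger than `missing_rate`. The nominal rate applies only to entries with valid ground truth. The synthetic sketch below, which is not the temperature data, builds a 3-step block mask with `np.repeat`, zeroes 3 of 10 rows to mimic flagged entries, and compares the two rates. With 30% of entries invalid, the NaN rate equals `0.3 + 0.7 * masked_rate` exactly.

```python
import numpy as np

rng = np.random.default_rng(1000)
dense = rng.uniform(20.0, 30.0, size=(10, 12, 6)).astype(np.float32)
dense[:3, :, :] = 0                               # pretend 3 of 10 rows are invalid (zeroed)

missing_rate = 0.4
random_tensor = rng.random((10, 12, 2))           # one draw per 3-step block (6 / 3 = 2)
binary = np.repeat(np.round(random_tensor + 0.5 - missing_rate), 3, axis=2)

sparse = dense.copy()
sparse[binary == 0] = np.nan
sparse[sparse == 0] = np.nan                      # zeroed entries become NaN as well

nan_rate = np.isnan(sparse).mean()                # invalid + masked entries
valid = dense != 0
masked_rate = np.isnan(sparse[valid]).mean()      # missing rate among valid entries only
print(nan_rate, masked_rate)
```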

**Model setting**:

- Low rank: 30
- The number of burn-in iterations: 1000
- The number of Gibbs iterations: 200

In [16]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 30
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burn_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
MAPE: 0.0154962
RMSE: 0.518449

Iter: 400
MAPE: 0.0151833
RMSE: 0.506537

Iter: 600
MAPE: 0.0149846
RMSE: 0.499765

Iter: 800
MAPE: 0.0148942
RMSE: 0.496228

Iter: 1000
MAPE: 0.0147725
RMSE: 0.491612

Imputation MAPE: 0.0146422
Imputation RMSE: 0.486257

Running time: 446 seconds


### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>