# Bayesian Gaussian CP decomposition

**Author**: Xinyu Chen [[**GitHub homepage**](https://github.com/xinychen)]

**Download**: This Jupyter notebook is hosted in our GitHub repository. To run the code, please download the notebook from the [**transdim**](https://github.com/xinychen/transdim/blob/master/large-imputer/BGCP.ipynb) repository.

This notebook shows how to implement the Bayesian Gaussian CP decomposition (BGCP) model on some real-world data sets. In the following, we will discuss:

- What the Bayesian Gaussian CP decomposition is.

- How to implement BGCP efficiently, mainly with Python `numpy`.

- How to impute missing values in some real-world spatiotemporal datasets.

To handle missing values in multivariate time series data, this model exploits low-rank tensor structure by folding the data along the day dimension. For an in-depth discussion of BGCP, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Zhaocheng He, Lijun Sun (2019). <b>A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation</b>. Transportation Research Part C: Emerging Technologies, 98: 73-84. <a href="https://doi.org/10.1016/j.trc.2018.11.003" title="PDF"><b>[PDF]</b></a> 
</font>
</div>

We start by importing the necessary dependencies. We will make use of `numpy` and `scipy`.

In [1]:
import numpy as np
from numpy.random import multivariate_normal as mvnrnd
from scipy.stats import wishart
from numpy.random import normal as normrnd
from scipy.linalg import khatri_rao as kr_prod
from numpy.linalg import inv as inv
from numpy.linalg import solve as solve
from numpy.linalg import cholesky as cholesky_lower
from scipy.linalg import cholesky as cholesky_upper
from scipy.linalg import solve_triangular as solve_ut

In [2]:
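def mvnrnd_pre(mu, Lambda):
    """Draw one sample from N(mu, Lambda^{-1}), where Lambda is a precision matrix."""
    src = normrnd(size = (mu.shape[0],))
    # With Lambda = R^T R (R upper triangular), solving R z = src gives z ~ N(0, Lambda^{-1}).
    return solve_ut(cholesky_upper(Lambda, overwrite_a = True, check_finite = False), 
                    src, lower = False, check_finite = False, overwrite_b = True) + mu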
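
To see why a Cholesky factor of the precision matrix is enough to sample from $\mathcal{N}(\boldsymbol{\mu},\Lambda^{-1})$, here is a minimal `numpy`-only sketch. The helper name `sample_from_precision` and the toy values of `mu` and `Lambda` are our own assumptions for illustration; the sketch uses a lower Cholesky factor and a general solver rather than the triangular solver used above, and it only checks empirically that the sample covariance approaches $\Lambda^{-1}$.

```python
import numpy as np

def sample_from_precision(mu, Lambda, rng):
    """Draw one sample from N(mu, Lambda^{-1}) using a Cholesky factor of Lambda."""
    L = np.linalg.cholesky(Lambda)            # Lambda = L @ L.T, with L lower triangular
    eps = rng.standard_normal(mu.shape[0])    # eps ~ N(0, I)
    # Solving L.T z = eps gives z with covariance (L @ L.T)^{-1} = Lambda^{-1}.
    return mu + np.linalg.solve(L.T, eps)

rng = np.random.default_rng(0)
Lambda = np.array([[2.0, 0.5], [0.5, 1.0]])   # toy precision matrix (assumed)
mu = np.array([1.0, -1.0])                    # toy mean vector (assumed)

samples = np.array([sample_from_precision(mu, Lambda, rng) for _ in range(5000)])
emp_cov = np.cov(samples.T)
max_cov_error = np.abs(emp_cov - np.linalg.inv(Lambda)).max()
print(max_cov_error)
```

Working with the precision matrix directly avoids forming $\Lambda^{-1}$, which is convenient because the Gibbs updates in this notebook are all expressed in terms of precisions.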

### CP decomposition

#### CP Combination (`cp_combine`)

- **Definition**:

The CP decomposition factorizes a tensor into a sum of outer products of vectors. For example, for a third-order tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, the CP decomposition can be written as

$$\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s},$$
or element-wise,

$$\hat{y}_{ijt}=\sum_{s=1}^{r}u_{is}v_{js}x_{ts},\forall (i,j,t),$$
where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of factor matrices $U\in\mathbb{R}^{m\times r},V\in\mathbb{R}^{n\times r},X\in\mathbb{R}^{f\times r}$, respectively. The symbol $\circ$ denotes vector outer product.

- **Example**:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 3 \\ 2 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$ and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$, if $\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s}$ with $r=2$, then we have

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$
$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$
$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$
$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$

In [3]:
def cp_combine(var):
    return np.einsum('is, js, ts -> ijt', var[0], var[1], var[2])

In [4]:
factor = [np.array([[1, 2], [3, 4]]), np.array([[1, 3], [2, 4], [5, 6]]), 
          np.array([[1, 5], [2, 6], [3, 7], [4, 8]])]
print(cp_combine(factor))
print()
print('tensor size:')
print(cp_combine(factor).shape)

[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)
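
As a cross-check of the definition, the sketch below rebuilds the same tensor by explicitly summing one rank-1 outer product per component and compares it with the `einsum` call. The variable names `Y_loop` and `Y_einsum` are ours; the factor matrices are the ones used in the cell above.

```python
import numpy as np

U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])

# Explicit CP reconstruction: add one rank-1 tensor u_s o v_s o x_s per component s.
Y_loop = np.zeros((U.shape[0], V.shape[0], X.shape[0]))
for s in range(U.shape[1]):
    Y_loop += np.multiply.outer(np.multiply.outer(U[:, s], V[:, s]), X[:, s])

# The same reconstruction in a single einsum call.
Y_einsum = np.einsum('is, js, ts -> ijt', U, V, X)

print(np.array_equal(Y_loop, Y_einsum))
```

The loop makes the "sum of outer products" reading explicit, while `einsum` expresses the element-wise formula $\hat{y}_{ijt}=\sum_{s}u_{is}v_{js}x_{ts}$ directly.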


### Tensor Unfolding (`ten2mat`)

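`ten2mat` unfolds a third-order tensor into a matrix along a given mode using `numpy` reshape, and `mat2ten` folds such a matrix back into a tensor. For background on the reshape trick, see this discussion [[**link**](https://stackoverflow.com/questions/49970141/using-numpy-reshape-to-perform-3rd-rank-tensor-unfold-operation)].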

In [5]:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

In [6]:
def mat2ten(mat, dim, mode):
    index = list()
    index.append(mode)
    for i in range(dim.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(dim[index]), order = 'F'), 0, mode)
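
The following sketch illustrates how the two helpers fit together on a small toy tensor: each mode-$k$ unfolding has `tensor.shape[k]` rows, and folding it back with `mat2ten` recovers the original tensor. The functions are restated (with `mat2ten` written slightly more compactly) so that the block runs on its own; the toy tensor is an assumption for illustration.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, dim, mode):
    index = [mode] + [i for i in range(dim.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(dim[index]), order='F'), 0, mode)

tensor = np.arange(24).reshape(2, 3, 4)     # toy tensor of size 2 x 3 x 4
dim = np.array(tensor.shape)

unfoldings = [ten2mat(tensor, mode) for mode in range(3)]
print([mat.shape for mat in unfoldings])

# Folding each unfolding back should recover the original tensor.
roundtrip_ok = all(np.array_equal(mat2ten(unfoldings[mode], dim, mode), tensor)
                   for mode in range(3))
print(roundtrip_ok)

# With order='F', column j + 3 * t of the mode-0 unfolding holds the fiber tensor[:, j, t].
j, t = 1, 2
print(np.array_equal(unfoldings[0][:, j + 3 * t], tensor[:, j, t]))
```

The column ordering in the last check is what makes the Khatri-Rao products in the sampler line up with the unfolded data.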

### Computing Covariance Matrix (`cov_mat`)

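For any matrix $X\in\mathbb{R}^{m\times n}$ and a vector `mat_bar` of length $n$ (typically the column means of $X$), `cov_mat` returns the $n\times n$ scatter matrix $(X-\bar{X})^T(X-\bar{X})$, i.e., a covariance matrix that is not divided by $m$. It is used below when sampling the hyperparameters.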

In [7]:
def cov_mat(mat, mat_bar):
    mat = mat - mat_bar
    return mat.T @ mat
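
To make the scaling explicit, this small sketch compares `cov_mat` with `numpy`'s own covariance estimate on random toy data (the data and variable names are assumptions for illustration). With the column means passed as `mat_bar`, the result should equal the biased covariance multiplied by the number of rows.

```python
import numpy as np

def cov_mat(mat, mat_bar):
    mat = mat - mat_bar
    return mat.T @ mat

rng = np.random.default_rng(0)
mat = rng.standard_normal((50, 3))          # 50 samples of a 3-dimensional vector
scatter = cov_mat(mat, mat.mean(axis=0))

# np.cov with bias=True divides by the number of samples (rows of `mat`).
same = np.allclose(scatter, mat.shape[0] * np.cov(mat.T, bias=True))
print(scatter.shape, same)
```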

## Bayesian Gaussian CP decomposition (BGCP)

### Model Description

#### Gaussian assumption

Given a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$ with missing values, the factorization can be applied to reconstruct the missing values within $\mathcal{Y}$ by

$$y_{ijt}\sim\mathcal{N}\left(\sum_{s=1}^{r}u_{is} v_{js} x_{ts},\tau^{-1}\right),\forall (i,j,t),$$
where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of latent factor matrices, and $u_{is},v_{js},x_{ts}$ are their elements. The precision term $\tau$ is the inverse of the Gaussian variance.

#### Bayesian framework

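Based on the Gaussian assumption over tensor elements $y_{ijt},(i,j,t)\in\Omega$ (where $\Omega$ is an index set indicating observed tensor elements), the conjugate priors of model parameters (i.e., latent factors and precision term) and hyperparameters are given below, where $\boldsymbol{u}_{i},\boldsymbol{v}_{j},\boldsymbol{x}_{t}\in\mathbb{R}^{r}$ now denote the $i$-th, $j$-th, and $t$-th rows of the factor matrices $U$, $V$, and $X$, respectively: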

$$\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_{u},\Lambda_{u}^{-1}\right),\forall i,$$
$$\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_{v},\Lambda_{v}^{-1}\right),\forall j,$$
$$\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_{x},\Lambda_{x}^{-1}\right),\forall t,$$
$$\tau\sim\text{Gamma}\left(a_0,b_0\right),$$
$$\boldsymbol{\mu}_{u}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_u\right)^{-1}\right),\Lambda_u\sim\mathcal{W}\left(W_0,\nu_0\right),$$
$$\boldsymbol{\mu}_{v}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_v\right)^{-1}\right),\Lambda_v\sim\mathcal{W}\left(W_0,\nu_0\right),$$
$$\boldsymbol{\mu}_{x}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_x\right)^{-1}\right),\Lambda_x\sim\mathcal{W}\left(W_0,\nu_0\right).$$


### Posterior Inference

In the following, we will apply Gibbs sampling to implement our Bayesian inference for the tensor factorization task.

#### - Sampling latent factors $\boldsymbol{u}_{i},i\in\left\{1,2,...,m\right\}$

Draw $\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_i^{*},(\Lambda_{i}^{*})^{-1}\right)$ with the following parameters, where $\circledast$ denotes the element-wise (Hadamard) product:

$$\boldsymbol{\mu}_{i}^{*}=\left(\Lambda_{i}^{*}\right)^{-1}\left\{\tau\sum_{j,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)+\Lambda_u\boldsymbol{\mu}_u\right\},$$

$$\Lambda_{i}^{*}=\tau\sum_{j,t:(i,j,t)\in\Omega}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_u.$$
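
To make the two formulas concrete, the sketch below evaluates $\Lambda_i^{*}$ and $\boldsymbol{\mu}_i^{*}$ in the most literal way, by looping over the observed entries of row $i$, and then checks the result against a vectorized `einsum` version. Everything here is a toy setup under our own assumptions: the sizes, the values of `tau`, `Lambda_u`, and `mu_u`, and the helper name `posterior_params_naive` are hypothetical, and the vectorization shown is not the one used in `sample_factor`.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, f, r = 3, 4, 5, 2
U, V, X = (rng.standard_normal((d, r)) for d in (m, n, f))
Y = np.einsum('is,js,ts->ijt', U, V, X) + 0.1 * rng.standard_normal((m, n, f))
ind = rng.random((m, n, f)) < 0.7     # True marks an observed entry, i.e., (i, j, t) in Omega
tau = 2.0                             # assumed precision for the demo
Lambda_u = np.eye(r)                  # assumed hyperparameter values for the demo
mu_u = np.zeros(r)

def posterior_params_naive(i):
    """Literal evaluation of mu_i^* and Lambda_i^* from the formulas."""
    Lambda_star = Lambda_u.copy()
    rhs = Lambda_u @ mu_u
    for j in range(n):
        for t in range(f):
            if ind[i, j, t]:
                w = V[j] * X[t]       # element-wise product of v_j and x_t
                Lambda_star += tau * np.outer(w, w)
                rhs += tau * Y[i, j, t] * w
    return np.linalg.solve(Lambda_star, rhs), Lambda_star

# Vectorized version over all rows i at once.
W = np.einsum('js,ts->jts', V, X)
Lambda_all = tau * np.einsum('ijt,jta,jtb->iab', ind.astype(float), W, W) + Lambda_u
rhs_all = tau * np.einsum('ijt,jts->is', Y * ind, W) + Lambda_u @ mu_u
mu_all = np.linalg.solve(Lambda_all, rhs_all[..., None])[..., 0]

mu_0, Lambda_0 = posterior_params_naive(0)
print(np.allclose(mu_0, mu_all[0]), np.allclose(Lambda_0, Lambda_all[0]))
```

Only observed entries contribute to the sums, which is why the mask `ind` appears in both terms. The updates for $\boldsymbol{v}_j$ and $\boldsymbol{x}_t$ follow the same pattern with the roles of the factor matrices exchanged.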


#### - Sampling latent factors $\boldsymbol{v}_{j},j\in\left\{1,2,...,n\right\}$

Draw $\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_j^{*},(\Lambda_{j}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{j}^{*}=\left(\Lambda_{j}^{*}\right)^{-1}\left\{\tau\sum_{i,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)+\Lambda_v\boldsymbol{\mu}_v\right\},$$

$$\Lambda_{j}^{*}=\tau\sum_{i,t:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_v.$$


#### - Sampling latent factors $\boldsymbol{x}_{t},t\in\left\{1,2,...,f\right\}$

Draw $\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_t^{*},(\Lambda_{t}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{t}^{*}=\left(\Lambda_{t}^{*}\right)^{-1}\left\{\tau\sum_{i,j:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)+\Lambda_x\boldsymbol{\mu}_x\right\},$$

$$\Lambda_{t}^{*}=\tau\sum_{i,j:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)^{T}+\Lambda_x.$$


#### - Sampling precision term $\tau$

Draw $\tau\sim\text{Gamma}\left(a^{*},b^{*}\right)$ with the following parameters:

$$a^{*}=a_0+\frac{1}{2}|\Omega|,~b^{*}=b_0+\frac{1}{2}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{s=1}^{r}u_{is}v_{js}x_{ts}\right)^2.$$


#### - Sampling hyperparameters $\left(\boldsymbol{\mu}_{u},\Lambda_{u}\right)$

Draw

- $\Lambda_{u}\sim\mathcal{W}\left(W_u^{*},\nu_u^{*}\right)$
- $\boldsymbol{\mu}_{u}\sim\mathcal{N}\left(\boldsymbol{\mu}_{u}^{*},\left(\beta_u^{*}\Lambda_u\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{u}^{*}=\frac{m\boldsymbol{\bar{u}}+\beta_0\boldsymbol{\mu}_0}{m+\beta_0},~\beta_u^{*}=m+\beta_0,~\nu_u^{*}=m+\nu_0,$$
$$\left(W_u^{*}\right)^{-1}=W_0^{-1}+mS_u+\frac{m\beta_0}{m+\beta_0}\left(\boldsymbol{\bar{u}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{u}}-\boldsymbol{\mu}_0\right)^T,$$
where $\boldsymbol{\bar{u}}=\frac{1}{m}\sum_{i=1}^{m}\boldsymbol{u}_{i},~S_u=\frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol{u}_{i}-\boldsymbol{\bar{u}}\right)\left(\boldsymbol{u}_{i}-\boldsymbol{\bar{u}}\right)^T$.


#### - Sampling hyperparameters $\left(\boldsymbol{\mu}_{v},\Lambda_{v}\right)$

Draw

- $\Lambda_{v}\sim\mathcal{W}\left(W_v^{*},\nu_v^{*}\right)$
- $\boldsymbol{\mu}_{v}\sim\mathcal{N}\left(\boldsymbol{\mu}_{v}^{*},\left(\beta_v^{*}\Lambda_v\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{v}^{*}=\frac{n\boldsymbol{\bar{v}}+\beta_0\boldsymbol{\mu}_0}{n+\beta_0},~\beta_v^{*}=n+\beta_0,~\nu_v^{*}=n+\nu_0,$$
$$\left(W_v^{*}\right)^{-1}=W_0^{-1}+nS_v+\frac{n\beta_0}{n+\beta_0}\left(\boldsymbol{\bar{v}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{v}}-\boldsymbol{\mu}_0\right)^T,$$
where $\boldsymbol{\bar{v}}=\frac{1}{n}\sum_{j=1}^{n}\boldsymbol{v}_{j},~S_v=\frac{1}{n}\sum_{j=1}^{n}\left(\boldsymbol{v}_{j}-\boldsymbol{\bar{v}}\right)\left(\boldsymbol{v}_{j}-\boldsymbol{\bar{v}}\right)^T$.


#### - Sampling hyperparameters $\left(\boldsymbol{\mu}_{x},\Lambda_{x}\right)$

Draw

- $\Lambda_{x}\sim\mathcal{W}\left(W_x^{*},\nu_x^{*}\right)$
- $\boldsymbol{\mu}_{x}\sim\mathcal{N}\left(\boldsymbol{\mu}_{x}^{*},\left(\beta_x^{*}\Lambda_x\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{x}^{*}=\frac{f\boldsymbol{\bar{x}}+\beta_0\boldsymbol{\mu}_0}{f+\beta_0},~\beta_x^{*}=f+\beta_0,~\nu_x^{*}=f+\nu_0,$$
$$\left(W_x^{*}\right)^{-1}=W_0^{-1}+fS_x+\frac{f\beta_0}{f+\beta_0}\left(\boldsymbol{\bar{x}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{x}}-\boldsymbol{\mu}_0\right)^T,$$
where $\boldsymbol{\bar{x}}=\frac{1}{f}\sum_{t=1}^{f}\boldsymbol{x}_{t},~S_x=\frac{1}{f}\sum_{t=1}^{f}\left(\boldsymbol{x}_{t}-\boldsymbol{\bar{x}}\right)\left(\boldsymbol{x}_{t}-\boldsymbol{\bar{x}}\right)^T$.
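
The three hyperparameter updates share one form, so they can be sketched with a single helper that takes any factor matrix. The function name `hyper_posterior` is hypothetical, and the sketch only computes the posterior parameters; it does not draw the Wishart and Gaussian samples. The values $\boldsymbol{\mu}_0=\boldsymbol{0}$, $W_0=I$, $\beta_0=1$, and $\nu_0=r$ used in the example appear to be the ones hard-coded in the implementation in this notebook.

```python
import numpy as np

def hyper_posterior(factor, mu0, W0, beta0, nu0):
    """Return (mu_star, beta_star, nu_star, W_star) for one factor matrix."""
    dim = factor.shape[0]
    bar = factor.mean(axis=0)                       # sample mean of the rows
    centered = factor - bar
    scatter = centered.T @ centered                 # equals dim * S in the formulas
    diff = bar - mu0
    W_star_inv = (np.linalg.inv(W0) + scatter
                  + dim * beta0 / (dim + beta0) * np.outer(diff, diff))
    mu_star = (dim * bar + beta0 * mu0) / (dim + beta0)
    return mu_star, dim + beta0, dim + nu0, np.linalg.inv(W_star_inv)

rng = np.random.default_rng(0)
rank = 3
U = rng.standard_normal((10, rank))                 # toy factor matrix with m = 10 rows
mu_star, beta_star, nu_star, W_star = hyper_posterior(
    U, mu0=np.zeros(rank), W0=np.eye(rank), beta0=1.0, nu0=rank)
print(beta_star, nu_star)
```

Given these parameters, $\Lambda$ would be drawn from a Wishart distribution with scale `W_star` and `nu_star` degrees of freedom, and $\boldsymbol{\mu}$ from a Gaussian with mean `mu_star` and precision `beta_star` times the sampled $\Lambda$.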

In [8]:
def sample_factor(tau_sparse_tensor, tau_ind, factor, k, beta0 = 1):
    """Gibbs update of the k-th factor matrix and its hyperparameters (with mu_0 = 0, W_0 = I, nu_0 = rank)."""
    dim, rank = factor[k].shape
    factor_bar = np.mean(factor[k], axis = 0)
    temp = dim / (dim + beta0)
    # Sample the hyperparameters (mu, Lambda) from their Gaussian-Wishart posterior.
    var_mu_hyper = temp * factor_bar
    var_W_hyper = inv(np.eye(rank) + cov_mat(factor[k], factor_bar) + temp * beta0 * np.outer(factor_bar, factor_bar))
    var_Lambda_hyper = wishart.rvs(df = dim + rank, scale = var_W_hyper)
    var_mu_hyper = mvnrnd_pre(var_mu_hyper, (dim + beta0) * var_Lambda_hyper)
    
    # Build the posterior precisions (var3) and the right-hand sides (var4) for all rows at once.
    idx = list(filter(lambda x: x != k, range(len(factor))))
    var1 = kr_prod(factor[idx[1]], factor[idx[0]]).T
    var2 = kr_prod(var1, var1)
    var3 = (var2 @ ten2mat(tau_ind, k).T).reshape([rank, rank, dim]) + var_Lambda_hyper[:, :, np.newaxis]
    var4 = var1 @ ten2mat(tau_sparse_tensor, k).T + (var_Lambda_hyper @ var_mu_hyper)[:, np.newaxis]
    # Sample each row of the factor matrix from its Gaussian posterior.
    for i in range(dim):
        factor[k][i, :] = mvnrnd_pre(solve(var3[:, :, i], var4[:, i]), var3[:, :, i])
    return factor[k]

#### - Sampling precision term $\tau$

Draw $\tau\sim\text{Gamma}\left(a^{*},b^{*}\right)$ with the following parameters (the implementation below sets $a_0=b_0=10^{-6}$):

$$a^{*}=a_0+\frac{1}{2}|\Omega|,~b^{*}=b_0+\frac{1}{2}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{s=1}^{r}u_{is}v_{js}x_{ts}\right)^2.$$


In [9]:
def sample_precision_tau(sparse_tensor, tensor_hat, ind):
    var_alpha = 1e-6 + 0.5 * np.sum(ind)
    var_beta = 1e-6 + 0.5 * np.sum(((sparse_tensor - tensor_hat) ** 2) * ind)
    return np.random.gamma(var_alpha, 1 / var_beta)
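
As a sanity check of this update, the sketch below simulates noisy observations around a known reconstruction and verifies that draws of $\tau$ concentrate near the true precision. The toy sizes, `true_tau`, and the variable names are assumptions for illustration. Note that `numpy` parameterizes the gamma distribution by shape and scale, so the rate $b^{*}$ enters as `1 / b_star`.

```python
import numpy as np

rng = np.random.default_rng(0)
true_tau = 4.0                                   # noise variance is 1 / true_tau = 0.25
tensor_hat = rng.standard_normal((20, 30, 10))   # stands in for the current reconstruction
ind = rng.random(tensor_hat.shape) < 0.6         # True marks an observed entry
noise = rng.standard_normal(tensor_hat.shape) / np.sqrt(true_tau)
sparse_tensor = (tensor_hat + noise) * ind       # unobserved entries are stored as zeros

a0 = b0 = 1e-6
a_star = a0 + 0.5 * ind.sum()
b_star = b0 + 0.5 * np.sum(((sparse_tensor - tensor_hat) ** 2) * ind)

# Shape a_star and scale 1 / b_star, i.e., rate b_star.
tau_draws = rng.gamma(a_star, 1.0 / b_star, size=1000)
print(tau_draws.mean())
```

Multiplying the squared residuals by `ind` keeps the zeros stored at unobserved positions from contributing to $b^{*}$.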

### Define Performance Metrics

- **RMSE**
- **MAPE**

In [10]:
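def compute_mape(var, var_hat):
    """Mean absolute percentage error; expects 1-D arrays and nonzero entries in `var`."""
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    """Root mean square error; expects 1-D arrays."""
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])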
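
A tiny worked example may help to read the two metrics. The numbers below are made up for illustration, and the functions are restated so the block runs on its own.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

truth = np.array([10.0, 20.0, 40.0])
estimate = np.array([11.0, 18.0, 40.0])

mape = compute_mape(truth, estimate)   # (0.1 + 0.1 + 0.0) / 3
rmse = compute_rmse(truth, estimate)   # sqrt((1 + 4 + 0) / 3)
print(mape, rmse)
```

Both functions divide by `var.shape[0]`, so they are meant for 1-D arrays such as `dense_tensor[pos_test]`, and MAPE additionally requires that the true values are nonzero.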

### Define BGCP with `Numpy`

In [11]:
def BGCP(dense_tensor, sparse_tensor, factor, burn_iter, gibbs_iter):
    """Bayesian Gaussian CP (BGCP) decomposition."""
    
    dim = np.array(sparse_tensor.shape)
    rank = factor[0].shape[1]
    if not np.isnan(sparse_tensor).any():
        # Missing entries are encoded as zeros.
        ind = sparse_tensor != 0
        pos_obs = np.where(ind)
        pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    else:
        # Missing entries are encoded as NaN; they are replaced by zeros (in place) once the mask is built.
        pos_test = np.where((dense_tensor != 0) & (np.isnan(sparse_tensor)))
        ind = ~np.isnan(sparse_tensor)
        pos_obs = np.where(ind)
        sparse_tensor[np.isnan(sparse_tensor)] = 0
    show_iter = 200
    tau = 1
    factor_plus = []
    for k in range(len(dim)):
        factor_plus.append(np.zeros((dim[k], rank)))
    temp_hat = np.zeros(dim)
    tensor_hat_plus = np.zeros(dim)
    for it in range(burn_iter + gibbs_iter):
        tau_ind = tau * ind
        tau_sparse_tensor = tau * sparse_tensor
        for k in range(len(dim)):
            factor[k] = sample_factor(tau_sparse_tensor, tau_ind, factor, k)
        tensor_hat = cp_combine(factor)
        temp_hat += tensor_hat
        tau = sample_precision_tau(sparse_tensor, tensor_hat, ind)
        if it + 1 > burn_iter:
            factor_plus = [factor_plus[k] + factor[k] for k in range(len(dim))]
            tensor_hat_plus += tensor_hat
        if (it + 1) % show_iter == 0 and it < burn_iter:
            temp_hat = temp_hat / show_iter
            print('Iter: {}'.format(it + 1))
            print('MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], temp_hat[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], temp_hat[pos_test])))
            temp_hat = np.zeros(sparse_tensor.shape)
            print()
    factor = [i / gibbs_iter for i in factor_plus]
    tensor_hat = tensor_hat_plus / gibbs_iter
    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()
    
    return tensor_hat, factor

### Experiments on Guangzhou-2M Data Set

In [12]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
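
The masking expression above can look cryptic, so here is a short sketch of what it does, on a toy shape of our own choosing. Since `rand` returns values in $[0,1)$, `rand + 0.5 - missing_rate` rounds to 1 when the random number exceeds `missing_rate` and to 0 otherwise, so each entry is kept with probability `1 - missing_rate`.

```python
import numpy as np

rng = np.random.default_rng(1000)
missing_rate = 0.3
u = rng.random((50, 40, 30))                        # toy shape, not the Guangzhou tensor

mask_round = np.round(u + 0.5 - missing_rate)       # the rounding trick
mask_compare = (u > missing_rate).astype(float)     # an equivalent, more explicit comparison

observed_fraction = mask_round.mean()
agreement = np.mean(mask_round == mask_compare)
print(observed_fraction, agreement)
```

Multiplying the data by this 0/1 mask sets the dropped entries to zero, which is the encoding of missing values that `BGCP` expects when the input contains no NaN.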

In [13]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 80
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0844794
RMSE: 3.6278

Iter: 400
MAPE: 0.0839146
RMSE: 3.61087

Iter: 600
MAPE: 0.0836881
RMSE: 3.60329

Iter: 800
MAPE: 0.0835097
RMSE: 3.59796

Iter: 1000
MAPE: 0.0834138
RMSE: 3.59485

Imputation MAPE: 0.083347
Imputation RMSE: 3.59277

Running time: 55.19 minutes


In [14]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)

In [15]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 80
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0860124
RMSE: 3.69509

Iter: 400
MAPE: 0.0857756
RMSE: 3.69618

Iter: 600
MAPE: 0.0855684
RMSE: 3.69159

Iter: 800
MAPE: 0.0854787
RMSE: 3.69049

Iter: 1000
MAPE: 0.0854447
RMSE: 3.68967

Imputation MAPE: 0.0853548
Imputation RMSE: 3.68634

Running time: 56.72 minutes


In [16]:
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
bgcp_ghat = dense_tensor[pos_test] - tensor_hat[pos_test]
np.save('bgcp_ghat.npy', bgcp_ghat)

In [17]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
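
In the non-random scenario the mask is drawn once per pair of first-axis and last-axis indices and then broadcast over the middle axis. Assuming the last axis indexes days, as the transpose suggests, this removes whole days of a series at a time. The sketch below shows the effect on a toy tensor; the sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1000)
missing_rate = 0.3
dim1, dim2, dim3 = 6, 4, 5                           # toy sizes: series, time slots per day, days
dense = rng.random((dim1, dim2, dim3)) + 1.0         # strictly positive toy data

# One 0/1 value per (series, day) pair.
day_mask = np.round(rng.random((dim1, dim3)) + 0.5 - missing_rate)
# Inserting a length-1 middle axis lets the mask broadcast over all time slots of a day.
sparse = dense * day_mask[:, None, :]

# Each (series, day) fiber is either fully kept or fully dropped.
observed_per_day = (sparse != 0).sum(axis=1)
print(np.unique(observed_per_day))
```

Compared with entry-wise random missing, this pattern leaves no observations inside a missing day, so the model has to rely on other days and other series to fill it in.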

In [18]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.105478
RMSE: 4.38065

Iter: 400
MAPE: 0.105073
RMSE: 4.39039

Iter: 600
MAPE: 0.104957
RMSE: 4.38944

Iter: 800
MAPE: 0.10485
RMSE: 4.38626

Iter: 1000
MAPE: 0.10472
RMSE: 4.37648

Imputation MAPE: 0.104695
Imputation RMSE: 4.374

Running time: 2.85 minutes


In [19]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]

In [20]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.107123
RMSE: 4.56218

Iter: 400
MAPE: 0.108416
RMSE: 4.72312

Iter: 600
MAPE: 0.108664
RMSE: 4.7655

Iter: 800
MAPE: 0.107668
RMSE: 4.69038

Iter: 1000
MAPE: 0.107409
RMSE: 4.66156

Imputation MAPE: 0.107403
Imputation RMSE: 4.66324

Running time: 2.90 minutes


### Experiments on London-1M Data Set

In [21]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.3

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Random missing (RM)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = sparse_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
del dense_mat, sparse_mat
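
The reshape-and-transpose step is what folds each time series into a (time of day, day) slice. The sketch below traces one entry through it, assuming that the columns of the matrix are ordered day by day with 24 hourly values per day; the toy matrix, in which each entry stores its own column index, is made up for illustration.

```python
import numpy as np

n_series, n_days, n_hours = 2, 30, 24
# Toy matrix: every row stores the column index, i.e., value = day * 24 + hour.
dense_mat = np.tile(np.arange(n_days * n_hours), (n_series, 1))

# Same folding as above: split the columns into (day, hour), then swap the two axes.
dense_tensor = dense_mat.reshape([n_series, n_days, n_hours]).transpose(0, 2, 1)

day, hour = 3, 7
value = dense_tensor[0, hour, day]
print(dense_tensor.shape, value)
```

After folding, the tensor is indexed as (series, hour, day), so `value` is the column index `day * 24 + hour` of the original matrix.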

In [22]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0923704
RMSE: 2.24761

Iter: 400
MAPE: 0.0920785
RMSE: 2.24273

Iter: 600
MAPE: 0.0919728
RMSE: 2.24107

Iter: 800
MAPE: 0.0919203
RMSE: 2.23963

Iter: 1000
MAPE: 0.0918943
RMSE: 2.23878

Imputation MAPE: 0.0918797
Imputation RMSE: 2.23831

Running time: 240.07 minutes


In [23]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.7

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Random missing (RM)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = sparse_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
del dense_mat, sparse_mat

In [24]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0946519
RMSE: 2.30115

Iter: 400
MAPE: 0.0945316
RMSE: 2.29761

Iter: 600
MAPE: 0.0944499
RMSE: 2.29631

Iter: 800
MAPE: 0.0944288
RMSE: 2.29582

Iter: 1000
MAPE: 0.0944131
RMSE: 2.29549

Imputation MAPE: 0.0944034
Imputation RMSE: 2.29526

Running time: 178.31 minutes


In [25]:
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
bgcp_lhat = dense_tensor[pos_test] - tensor_hat[pos_test]
np.save('bgcp_lhat.npy', bgcp_lhat)

In [26]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.3

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Non-random missing (NM)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, 30) + 0.5 - missing_rate)[:, None, :]
del dense_mat

In [27]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0945315
RMSE: 2.30887

Iter: 400
MAPE: 0.0947113
RMSE: 2.31177

Iter: 600
MAPE: 0.0946104
RMSE: 2.30884

Iter: 800
MAPE: 0.0945902
RMSE: 2.30817

Iter: 1000
MAPE: 0.0945896
RMSE: 2.30836

Imputation MAPE: 0.0945965
Imputation RMSE: 2.30861

Running time: 175.74 minutes


In [28]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.7

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Non-random missing (NM)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, 30) + 0.5 - missing_rate)[:, None, :]
del dense_mat

In [29]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 20
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
tensor_hat, factor = BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start) / 60.0))

Iter: 200
MAPE: 0.0994217
RMSE: 2.4298

Iter: 400
MAPE: 0.099987
RMSE: 2.44432

Iter: 600
MAPE: 0.0999745
RMSE: 2.44414

Iter: 800
MAPE: 0.0999687
RMSE: 2.4439

Iter: 1000
MAPE: 0.0999433
RMSE: 2.44297

Imputation MAPE: 0.0998856
Imputation RMSE: 2.44155

Running time: 191.90 minutes


### Experiments on PeMS-4W Data Set

In [12]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [13]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0509573
RMSE: 4.34123

Iter: 400
MAPE: 0.050683
RMSE: 4.33272

Iter: 600
MAPE: 0.0505847
RMSE: 4.3285

Iter: 800
MAPE: 0.0504597
RMSE: 4.32072

Iter: 1000
MAPE: 0.0504452
RMSE: 4.31881

Imputation MAPE: 0.0502532
Imputation RMSE: 4.31065

Running time: 287.48 minutes


In [14]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [15]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0508234
RMSE: 4.3172

Iter: 400
MAPE: 0.0502252
RMSE: 4.31527

Iter: 600
MAPE: 0.0502002
RMSE: 4.31466

Iter: 800
MAPE: 0.0501868
RMSE: 4.31439

Iter: 1000
MAPE: 0.050184
RMSE: 4.31425

Imputation MAPE: 0.0501825
Imputation RMSE: 4.31415

Running time: 286.09 minutes


In [16]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [17]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0541532
RMSE: 4.57425

Iter: 400
MAPE: 0.0540382
RMSE: 4.5875

Iter: 600
MAPE: 0.0539228
RMSE: 4.58336

Iter: 800
MAPE: 0.0538353
RMSE: 4.58434

Iter: 1000
MAPE: 0.0538185
RMSE: 4.58871

Imputation MAPE: 0.0537746
Imputation RMSE: 4.59074

Running time: 284.77 minutes


In [18]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [19]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0599587
RMSE: 5.07417

Iter: 400
MAPE: 0.0609349
RMSE: 5.40117

Iter: 600
MAPE: 0.0606259
RMSE: 5.40451

Iter: 800
MAPE: 0.0608938
RMSE: 5.46919

Iter: 1000
MAPE: 0.0610082
RMSE: 5.49707

Imputation MAPE: 0.0610061
Imputation RMSE: 5.49654

Running time: 285.08 minutes


### Experiments on PeMS-8W Data Set

In [20]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [21]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0536342
RMSE: 4.54811

Iter: 400
MAPE: 0.0528521
RMSE: 4.51873

Iter: 600
MAPE: 0.0528394
RMSE: 4.51789

Iter: 800
MAPE: 0.052841
RMSE: 4.51758

Iter: 1000
MAPE: 0.0528411
RMSE: 4.51738

Imputation MAPE: 0.0528409
Imputation RMSE: 4.51715

Running time: 585.60 minutes


In [22]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_tensor, binary_tensor

In [23]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0532732
RMSE: 4.52602

Iter: 400
MAPE: 0.0527793
RMSE: 4.51802

Iter: 600
MAPE: 0.0527547
RMSE: 4.51666

Iter: 800
MAPE: 0.0527249
RMSE: 4.51404

Iter: 1000
MAPE: 0.0527285
RMSE: 4.51299

Imputation MAPE: 0.0527291
Imputation RMSE: 4.5127

Running time: 553.88 minutes


In [24]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor
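
In the non-random missing (NM) scenario above, a single draw per (sensor, day) pair decides whether that whole day of readings is kept or dropped. The double loop can also be written with broadcasting. The sketch below checks on a toy tensor that the two forms give the same mask; the toy shapes, the seed and the names `loop_mask`, `day_mask` and `broadcast_mask` are assumptions for illustration, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for this illustration
missing_rate = 0.3

# Toy stand-ins: 6 sensors, 288 intervals per day, 5 days.
toy_dense = rng.uniform(10.0, 70.0, size=(6, 288, 5))
toy_random_matrix = rng.random((toy_dense.shape[0], toy_dense.shape[2]))

# Loop form, mirroring the cell above: one decision per (sensor, day).
loop_mask = np.zeros(toy_dense.shape)
for i1 in range(toy_dense.shape[0]):
    for i2 in range(toy_dense.shape[2]):
        loop_mask[i1, :, i2] = np.round(toy_random_matrix[i1, i2] + 0.5 - missing_rate)

# Broadcast form: compute the (sensor, day) decisions once,
# then repeat them along the time-of-day axis.
day_mask = np.round(toy_random_matrix + 0.5 - missing_rate)
broadcast_mask = np.broadcast_to(day_mask[:, None, :], toy_dense.shape)

print('Masks identical:', np.array_equal(loop_mask, broadcast_mask))
toy_sparse = toy_dense * broadcast_mask
```

`np.broadcast_to` returns a read-only view, which is sufficient here because the mask is only multiplied with the data.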

In [25]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0559252
RMSE: 4.69442

Iter: 400
MAPE: 0.0551243
RMSE: 4.67289

Iter: 600
MAPE: 0.0548693
RMSE: 4.66503

Iter: 800
MAPE: 0.0547806
RMSE: 4.66047

Iter: 1000
MAPE: 0.0547388
RMSE: 4.65913

Imputation MAPE: 0.0547163
Imputation RMSE: 4.65863

Running time: 563.42 minutes


In [26]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

In [27]:
import time
start = time.time()
dim = np.array(sparse_tensor.shape)
rank = 10
factor = []
for k in range(len(dim)):
    factor.append(0.1 * np.random.randn(dim[k], rank))
burnin_iter = 1000
gibbs_iter = 200
BGCP(dense_tensor, sparse_tensor, factor, burnin_iter, gibbs_iter)
end = time.time()
print('Running time: %.2f minutes'%((end - start)/60.0))

Iter: 200
MAPE: 0.0573149
RMSE: 4.81128

Iter: 400
MAPE: 0.0562119
RMSE: 4.77796

Iter: 600
MAPE: 0.0562737
RMSE: 4.80638

Iter: 800
MAPE: 0.056286
RMSE: 4.81353

Iter: 1000
MAPE: 0.0563006
RMSE: 4.81698

Imputation MAPE: 0.0563073
Imputation RMSE: 4.81831

Running time: 551.31 minutes


### Experiments on PeMS-4W Data Set with Graph Partitioning
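
The cells in this section split the sensors into groups, impute each group separately, write each result back into a full-size tensor, and only then score the held-out entries. The sketch below reproduces that flow on toy data. It is not the BGCP model: `mean_fill` is a hypothetical stand-in imputer that returns a single array, `toy_labels` is an invented partition, and the two metric functions are assumed definitions of MAPE and RMSE that may differ from the helpers defined earlier in this notebook.

```python
import numpy as np

def compute_mape(var, var_hat):
    # Assumed definition: mean absolute percentage error on held-out entries.
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    # Assumed definition: root mean squared error on held-out entries.
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def mean_fill(sparse):
    # Hypothetical stand-in for BGCP: replace each sensor's missing
    # entries (stored as zeros) with the mean of its observed entries.
    observed = sparse != 0
    counts = np.maximum(observed.sum(axis=(1, 2), keepdims=True), 1)
    means = sparse.sum(axis=(1, 2), keepdims=True) / counts
    return np.where(observed, sparse, means)

rng = np.random.default_rng(0)  # seed chosen only for this illustration
toy_dense = rng.uniform(10.0, 70.0, size=(12, 24, 7))
toy_sparse = toy_dense * np.round(rng.random(toy_dense.shape) + 0.5 - 0.3)
toy_labels = np.repeat(np.arange(4), 3)  # invented: 4 groups of 3 sensors

# Held-out entries: present in the ground truth, hidden from the model.
pos_test = np.where((toy_dense != 0) & (toy_sparse == 0))

tensor_hat = np.zeros(toy_dense.shape)
for d in range(4):
    pos = np.where(toy_labels == d)
    # Impute one group and write it back into its rows of the full tensor.
    tensor_hat[pos[0], :, :] = mean_fill(toy_sparse[pos[0], :, :])

mape = compute_mape(toy_dense[pos_test], tensor_hat[pos_test])
rmse = compute_rmse(toy_dense[pos_test], tensor_hat[pos_test])
print('Final MAPE: %.4f' % mape)
print('Final RMSE: %.4f' % rmse)
```

Scoring after stitching, rather than averaging per-group errors, weights every held-out entry equally regardless of group size.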

In [12]:
import pandas as pd

graph_pems = pd.read_csv('../datasets/California-data-set/graph_pems.csv')
graph_pems.head()

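|   | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|----|----|----|
| 0 | 0 | 1 | 2 | 6 | 5 | 47 |
| 1 | 0 | 1 | 2 | 6 | 5 | 47 |
| 2 | 0 | 1 | 2 | 6 | 5 | 47 |
| 3 | 0 | 1 | 2 | 6 | 5 | 47 |
| 4 | 0 | 1 | 2 | 6 | 5 | 47 |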
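
As the partition loop later in this section reads it, each column of `graph_pems` assigns every sensor a group label for one partition count, and the loop visits labels `0` to `2 ** i - 1`. A quick check that every expected label actually occurs can catch an empty group before a long run starts. The sketch below uses a small invented label table, `toy_graph`, with the same layout; it is an assumption for illustration and not the PeMS graph.

```python
import numpy as np
import pandas as pd

# Invented label table with the same layout as graph_pems:
# one row per sensor, one column per partition count.
toy_graph = pd.DataFrame({'2': [0, 0, 1, 1, 0, 1, 1, 0],
                          '4': [1, 1, 3, 2, 0, 3, 2, 0]})

for i in range(1, toy_graph.shape[1] + 1):
    road = toy_graph.values[:, i - 1]
    labels, counts = np.unique(road, return_counts=True)
    # The partition loop iterates d = 0, ..., 2**i - 1,
    # so each of those labels should appear at least once.
    assert np.array_equal(labels, np.arange(2 ** i))
    print('Partitions: {}, sensors per group: {}'.format(2 ** i, counts.tolist()))
```

Group sizes also matter for the model itself: a group with very few sensors gives the low-rank factorization little cross-sensor information to work with.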


In [13]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.3

### Random missing (RM)
sparse_tensor = dense_tensor * np.round(random_tensor + 0.5 - missing_rate)
del data, random_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

for i in range(1, 7):
    print('Graph partitioning: {}.'.format(2 ** i))
    tensor_hat = np.zeros(dense_tensor.shape)
    road = graph_pems.values[:, i - 1]
    for d in range(2 ** i):
        pos = np.where(road == d)
        dense = dense_tensor[pos[0], :, :]
        sparse = sparse_tensor[pos[0], :, :]
        dim = np.array(sparse.shape)
        factor = []
        for k in range(len(dim)):
            factor.append(0.1 * np.random.randn(dim[k], rank))
        small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
        tensor_hat[pos[0], :, :] = small_tensor
    print('Final MAPE:')
    print(compute_mape(var, tensor_hat[pos_test]))
    print('Final RMSE:')
    print(compute_rmse(var, tensor_hat[pos_test]))
    print()

Graph partitioning: 2.
Iter: 200
MAPE: 0.0502087
RMSE: 4.29454

Iter: 400
MAPE: 0.0495423
RMSE: 4.27226

Iter: 600
MAPE: 0.0494254
RMSE: 4.26652

Iter: 800
MAPE: 0.0493991
RMSE: 4.26518

Iter: 1000
MAPE: 0.0493924
RMSE: 4.26445

Imputation MAPE: 0.0493873
Imputation RMSE: 4.26357

Iter: 200
MAPE: 0.0496438
RMSE: 4.25193

Iter: 400
MAPE: 0.0494466
RMSE: 4.25616

Iter: 600
MAPE: 0.0494399
RMSE: 4.25577

Iter: 800
MAPE: 0.0494345
RMSE: 4.25547

Iter: 1000
MAPE: 0.0494262
RMSE: 4.2552

Imputation MAPE: 0.0494155
Imputation RMSE: 4.25494

Final MAPE:
0.04940142262468619
Final RMSE:
4.259255210421078

Graph partitioning: 4.
Iter: 200
MAPE: 0.0417617
RMSE: 3.83091

Iter: 400
MAPE: 0.0416754
RMSE: 3.8292

Iter: 600
MAPE: 0.0416767
RMSE: 3.82931

Iter: 800
MAPE: 0.0416714
RMSE: 3.82885

Iter: 1000
MAPE: 0.0416682
RMSE: 3.82853

Imputation MAPE: 0.041667
Imputation RMSE: 3.82836

Iter: 200
MAPE: 0.0572416
RMSE: 4.63006

Iter: 400
MAPE: 0.0570139
RMSE: 4.62617

Iter: 600
MAPE: 0.0569363
RMSE: 4.6

Iter: 800
MAPE: 0.0329992
RMSE: 3.19763

Iter: 1000
MAPE: 0.0329828
RMSE: 3.1948

Imputation MAPE: 0.032998
Imputation RMSE: 3.1928

Iter: 200
MAPE: 0.0572542
RMSE: 4.54685

Iter: 400
MAPE: 0.0572193
RMSE: 4.54376

Iter: 600
MAPE: 0.0569106
RMSE: 4.53295

Iter: 800
MAPE: 0.0567537
RMSE: 4.5322

Iter: 1000
MAPE: 0.0566861
RMSE: 4.53081

Imputation MAPE: 0.0565982
Imputation RMSE: 4.5287

Iter: 200
MAPE: 0.0398163
RMSE: 3.53199

Iter: 400
MAPE: 0.0392721
RMSE: 3.49295

Iter: 600
MAPE: 0.0391493
RMSE: 3.48555

Iter: 800
MAPE: 0.0391506
RMSE: 3.48699

Iter: 1000
MAPE: 0.0391625
RMSE: 3.48693

Imputation MAPE: 0.039167
Imputation RMSE: 3.48691

Iter: 200
MAPE: 0.0485309
RMSE: 3.99376

Iter: 400
MAPE: 0.0484121
RMSE: 3.99316

Iter: 600
MAPE: 0.0483009
RMSE: 3.98606

Iter: 800
MAPE: 0.0482804
RMSE: 3.98162

Iter: 1000
MAPE: 0.0482576
RMSE: 3.97985

Imputation MAPE: 0.0482437
Imputation RMSE: 3.97858

Iter: 200
MAPE: 0.0447911
RMSE: 3.97163

Iter: 400
MAPE: 0.0442035
RMSE: 3.95264

Iter: 600
[output truncated]

Iter: 200
MAPE: 0.0616803
RMSE: 4.88808

Iter: 400
MAPE: 0.0610836
RMSE: 4.85353

Iter: 600
MAPE: 0.061136
RMSE: 4.84788

Iter: 800
MAPE: 0.0611897
RMSE: 4.84676

Iter: 1000
MAPE: 0.0612244
RMSE: 4.84565

Imputation MAPE: 0.0612324
Imputation RMSE: 4.8435

Iter: 200
MAPE: 0.0539028
RMSE: 4.43986

Iter: 400
MAPE: 0.054273
RMSE: 4.46

Iter: 600
MAPE: 0.054147
RMSE: 4.45518

Iter: 800
MAPE: 0.0541259
RMSE: 4.45463

Iter: 1000
MAPE: 0.0541294
RMSE: 4.454

Imputation MAPE: 0.0541151
Imputation RMSE: 4.4533

Iter: 200
MAPE: 0.0291252
RMSE: 2.84957

Iter: 400
MAPE: 0.0290848
RMSE: 2.86467

Iter: 600
MAPE: 0.0290696
RMSE: 2.86761

Iter: 800
MAPE: 0.0290325
RMSE: 2.8643

Iter: 1000
MAPE: 0.0289529
RMSE: 2.86018

Imputation MAPE: 0.0289412
Imputation RMSE: 2.86711

Iter: 200
MAPE: 0.0324982
RMSE: 3.20374

Iter: 400
MAPE: 0.0324404
RMSE: 3.20178

Iter: 600
MAPE: 0.0324135
RMSE: 3.20023

Iter: 800
MAPE: 0.0324101
RMSE: 3.19995

Iter: 1000
MAPE: 0.0324121
RMSE: 3.19962

Imputation MAPE: 0.0324136
[output truncated]

Iter: 200
MAPE: 0.0752649
RMSE: 5.43595

Iter: 400
MAPE: 0.074534
RMSE: 5.40611

Iter: 600
MAPE: 0.0740664
RMSE: 5.39056

Iter: 800
MAPE: 0.073761
RMSE: 5.37459

Iter: 1000
MAPE: 0.0738964
RMSE: 5.37011

Imputation MAPE: 0.0739482
Imputation RMSE: 5.37086

Iter: 200
MAPE: 0.0525603
RMSE: 4.33758

Iter: 400
MAPE: 0.052241
RMSE: 4.32125

Iter: 600
MAPE: 0.0513543
RMSE: 4.27135

Iter: 800
MAPE: 0.0515303
RMSE: 4.27978

Iter: 1000
MAPE: 0.0515495
RMSE: 4.2794

Imputation MAPE: 0.0515788
Imputation RMSE: 4.27886

Iter: 200
MAPE: 0.04094
RMSE: 3.73873

Iter: 400
MAPE: 0.0410121
RMSE: 3.7374

Iter: 600
MAPE: 0.0411207
RMSE: 3.74202

Iter: 800
MAPE: 0.0410965
RMSE: 3.74044

Iter: 1000
MAPE: 0.0410735
RMSE: 3.73853

Imputation MAPE: 0.0410581
Imputation RMSE: 3.73517

Iter: 200
MAPE: 0.0269572
RMSE: 2.77505

Iter: 400
MAPE: 0.0265325
RMSE: 2.75543

Iter: 600
MAPE: 0.0265758
RMSE: 2.76589

Iter: 800
MAPE: 0.0265145
RMSE: 2.75908

Iter: 1000
MAPE: 0.0265769
RMSE: 2.76174

Imputation MAPE: 0.02657

Final MAPE:
0.04417594463639179
Final RMSE:
3.843017844653421



In [14]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.7

### Random missing (RM)
sparse_tensor = dense_tensor * np.round(random_tensor + 0.5 - missing_rate)
del data, random_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

for i in range(1, 7):
    print('Graph partitioning: {}.'.format(2 ** i))
    tensor_hat = np.zeros(dense_tensor.shape)
    road = graph_pems.values[:, i - 1]
    for d in range(2 ** i):
        pos = np.where(road == d)
        dense = dense_tensor[pos[0], :, :]
        sparse = sparse_tensor[pos[0], :, :]
        dim = np.array(sparse.shape)
        factor = []
        for k in range(len(dim)):
            factor.append(0.1 * np.random.randn(dim[k], rank))
        small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
        tensor_hat[pos[0], :, :] = small_tensor
    print('Final MAPE:')
    print(compute_mape(var, tensor_hat[pos_test]))
    print('Final RMSE:')
    print(compute_rmse(var, tensor_hat[pos_test]))
    print()

Graph partitioning: 2.
Iter: 200
MAPE: 0.0501214
RMSE: 4.2978

Iter: 400
MAPE: 0.0494994
RMSE: 4.27408

Iter: 600
MAPE: 0.0495413
RMSE: 4.27394

Iter: 800
MAPE: 0.0495404
RMSE: 4.27343

Iter: 1000
MAPE: 0.0495385
RMSE: 4.27318

Imputation MAPE: 0.0495375
Imputation RMSE: 4.27302

Iter: 200
MAPE: 0.0501903
RMSE: 4.2712

Iter: 400
MAPE: 0.0493949
RMSE: 4.24883

Iter: 600
MAPE: 0.0493933
RMSE: 4.24866

Iter: 800
MAPE: 0.0493923
RMSE: 4.24848

Iter: 1000
MAPE: 0.0493935
RMSE: 4.24839

Imputation MAPE: 0.0493954
Imputation RMSE: 4.24834

Final MAPE:
0.04946643043480968
Final RMSE:
4.260700469528457

Graph partitioning: 4.
Iter: 200
MAPE: 0.0421763
RMSE: 3.85313

Iter: 400
MAPE: 0.041799
RMSE: 3.84778

Iter: 600
MAPE: 0.0417882
RMSE: 3.84641

Iter: 800
MAPE: 0.0417741
RMSE: 3.84547

Iter: 1000
MAPE: 0.0417507
RMSE: 3.84436

Imputation MAPE: 0.0416546
Imputation RMSE: 3.83901

Iter: 200
MAPE: 0.05766
RMSE: 4.65573

Iter: 400
MAPE: 0.057269
RMSE: 4.64866

Iter: 600
MAPE: 0.0572093
RMSE: 4.6454

Iter: 800
MAPE: 0.0332621
RMSE: 3.22131

Iter: 1000
MAPE: 0.0332819
RMSE: 3.22406

Imputation MAPE: 0.0332694
Imputation RMSE: 3.22368

Iter: 200
MAPE: 0.0582894
RMSE: 4.61127

Iter: 400
MAPE: 0.057903
RMSE: 4.60246

Iter: 600
MAPE: 0.0571069
RMSE: 4.5602

Iter: 800
MAPE: 0.0568283
RMSE: 4.55332

Iter: 1000
MAPE: 0.0562893
RMSE: 4.51972

Imputation MAPE: 0.0561082
Imputation RMSE: 4.51157

Iter: 200
MAPE: 0.0402176
RMSE: 3.53963

Iter: 400
MAPE: 0.0399939
RMSE: 3.53823

Iter: 600
MAPE: 0.0399807
RMSE: 3.53722

Iter: 800
MAPE: 0.0399786
RMSE: 3.53662

Iter: 1000
MAPE: 0.0399787
RMSE: 3.53646

Imputation MAPE: 0.0399762
Imputation RMSE: 3.53611

Iter: 200
MAPE: 0.0483219
RMSE: 3.98516

Iter: 400
MAPE: 0.0482803
RMSE: 3.9895

Iter: 600
MAPE: 0.0481887
RMSE: 3.98576

Iter: 800
MAPE: 0.0481883
RMSE: 3.98555

Iter: 1000
MAPE: 0.048167
RMSE: 3.98419

Imputation MAPE: 0.048177
Imputation RMSE: 3.98256

Iter: 200
MAPE: 0.0445876
RMSE: 3.96072

Iter: 400
MAPE: 0.0442919
RMSE: 3.96256

Iter: 600


Iter: 200
MAPE: 0.0624691
RMSE: 4.91819

Iter: 400
MAPE: 0.0623202
RMSE: 4.92758

Iter: 600
MAPE: 0.0623091
RMSE: 4.92523

Iter: 800
MAPE: 0.0622235
RMSE: 4.91994

Iter: 1000
MAPE: 0.0621722
RMSE: 4.91596

Imputation MAPE: 0.0619301
Imputation RMSE: 4.90335

Iter: 200
MAPE: 0.0546664
RMSE: 4.49511

Iter: 400
MAPE: 0.0542495
RMSE: 4.49159

Iter: 600
MAPE: 0.0541629
RMSE: 4.47843

Iter: 800
MAPE: 0.0541501
RMSE: 4.47482

Iter: 1000
MAPE: 0.0540071
RMSE: 4.46772

Imputation MAPE: 0.0538709
Imputation RMSE: 4.46762

Iter: 200
MAPE: 0.0300028
RMSE: 2.94765

Iter: 400
MAPE: 0.030138
RMSE: 3.02899

Iter: 600
MAPE: 0.0301679
RMSE: 3.03764

Iter: 800
MAPE: 0.0301602
RMSE: 3.03892

Iter: 1000
MAPE: 0.0301631
RMSE: 3.03901

Imputation MAPE: 0.0301515
Imputation RMSE: 3.03854

Iter: 200
MAPE: 0.0336105
RMSE: 3.26877

Iter: 400
MAPE: 0.0332028
RMSE: 3.25686

Iter: 600
MAPE: 0.0330659
RMSE: 3.24233

Iter: 800
MAPE: 0.0329328
RMSE: 3.23341

Iter: 1000
MAPE: 0.0329239
RMSE: 3.2338

Imputation MAPE: 0.

Iter: 200
MAPE: 0.0745759
RMSE: 5.41812

Iter: 400
MAPE: 0.0741727
RMSE: 5.39881

Iter: 600
MAPE: 0.0740613
RMSE: 5.39382

Iter: 800
MAPE: 0.0739819
RMSE: 5.39085

Iter: 1000
MAPE: 0.0739422
RMSE: 5.38918

Imputation MAPE: 0.0739313
Imputation RMSE: 5.38824

Iter: 200
MAPE: 0.0515911
RMSE: 4.26283

Iter: 400
MAPE: 0.0506782
RMSE: 4.22281

Iter: 600
MAPE: 0.050636
RMSE: 4.21769

Iter: 800
MAPE: 0.0505176
RMSE: 4.20207

Iter: 1000
MAPE: 0.050561
RMSE: 4.20342

Imputation MAPE: 0.0505284
Imputation RMSE: 4.20286

Iter: 200
MAPE: 0.03996
RMSE: 3.69247

Iter: 400
MAPE: 0.0402234
RMSE: 3.69611

Iter: 600
MAPE: 0.0401611
RMSE: 3.6909

Iter: 800
MAPE: 0.0399981
RMSE: 3.68286

Iter: 1000
MAPE: 0.039908
RMSE: 3.68313

Imputation MAPE: 0.0398766
Imputation RMSE: 3.68307

Iter: 200
MAPE: 0.0275362
RMSE: 2.83256

Iter: 400
MAPE: 0.0273026
RMSE: 2.82746

Iter: 600
MAPE: 0.0272365
RMSE: 2.82562

Iter: 800
MAPE: 0.0271944
RMSE: 2.82291

Iter: 1000
MAPE: 0.0271436
RMSE: 2.8209

Imputation MAPE: 0.02705

Imputation MAPE: 0.0271784
Imputation RMSE: 2.81304

Final MAPE:
0.04445674627225727
Final RMSE:
3.8677860628309184



In [15]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

for i in range(1, 3):
    print('Graph partitioning: {}.'.format(2 ** i))
    tensor_hat = np.zeros(dense_tensor.shape)
    road = graph_pems.values[:, i - 1]
    for d in range(2 ** i):
        pos = np.where(road == d)
        dense = dense_tensor[pos[0], :, :]
        sparse = sparse_tensor[pos[0], :, :]
        dim = np.array(sparse.shape)
        factor = []
        for k in range(len(dim)):
            factor.append(0.1 * np.random.randn(dim[k], rank))
        small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
        tensor_hat[pos[0], :, :] = small_tensor
    print('Final MAPE:')
    print(compute_mape(var, tensor_hat[pos_test]))
    print('Final RMSE:')
    print(compute_rmse(var, tensor_hat[pos_test]))
    print()

Graph partitioning: 2.
Iter: 200
MAPE: 0.0542215
RMSE: 4.5651

Iter: 400
MAPE: 0.0541471
RMSE: 4.57107

Iter: 600
MAPE: 0.0540703
RMSE: 4.56889

Iter: 800
MAPE: 0.0540221
RMSE: 4.56543

Iter: 1000
MAPE: 0.0539749
RMSE: 4.56156

Imputation MAPE: 0.0539417
Imputation RMSE: 4.55892

Iter: 200
MAPE: 0.0538144
RMSE: 4.56465

Iter: 400
MAPE: 0.053888
RMSE: 4.61358

Iter: 600
MAPE: 0.0537842
RMSE: 4.61169

Iter: 800
MAPE: 0.0534141
RMSE: 4.58739

Iter: 1000
MAPE: 0.0532095
RMSE: 4.57825

Imputation MAPE: 0.0529795
Imputation RMSE: 4.56508

Final MAPE:
0.05346087450954932
Final RMSE:
4.561998778200497

Graph partitioning: 4.
Iter: 200
MAPE: 0.044191
RMSE: 4.04141

Iter: 400
MAPE: 0.0446217
RMSE: 4.10339

Iter: 600
MAPE: 0.0445585
RMSE: 4.09167

Iter: 800
MAPE: 0.0444608
RMSE: 4.08268

Iter: 1000
MAPE: 0.0443609
RMSE: 4.0753

Imputation MAPE: 0.0438272
Imputation RMSE: 4.03014

Iter: 200
MAPE: 0.062265
RMSE: 4.91926

Iter: 400
MAPE: 0.0616108
RMSE: 4.92458

Iter: 600
MAPE: 0.0615068
RMSE: 4.926

In [16]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

i = 3
print('Graph partitioning: {}.'.format(2 ** i))
tensor_hat = np.zeros(dense_tensor.shape)
road = graph_pems.values[:, i - 1]
for d in range(2 ** i):
    pos = np.where(road == d)
    dense = dense_tensor[pos[0], :, :]
    sparse = sparse_tensor[pos[0], :, :]
    dim = np.array(sparse.shape)
    factor = []
    for k in range(len(dim)):
        factor.append(0.1 * np.random.randn(dim[k], rank))
    small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
    tensor_hat[pos[0], :, :] = small_tensor
print('Final MAPE:')
print(compute_mape(var, tensor_hat[pos_test]))
print('Final RMSE:')
print(compute_rmse(var, tensor_hat[pos_test]))
print()

Graph partitioning: 8.
Iter: 200
MAPE: 0.0505229
RMSE: 4.48282

Iter: 400
MAPE: 0.0501333
RMSE: 4.49224

Iter: 600
MAPE: 0.0493048
RMSE: 4.42449

Iter: 800
MAPE: 0.0491856
RMSE: 4.41725

Iter: 1000
MAPE: 0.0491646
RMSE: 4.42121

Imputation MAPE: 0.0492311
Imputation RMSE: 4.43015

Iter: 200
MAPE: 0.0378236
RMSE: 3.56499

Iter: 400
MAPE: 0.0373933
RMSE: 3.53334

Iter: 600
MAPE: 0.037252
RMSE: 3.53322

Iter: 800
MAPE: 0.0370593
RMSE: 3.5266

Iter: 1000
MAPE: 0.0369912
RMSE: 3.52481

Imputation MAPE: 0.0369821
Imputation RMSE: 3.53131

Iter: 200
MAPE: 0.0498742
RMSE: 4.26603

Iter: 400
MAPE: 0.0501878
RMSE: 4.33354

Iter: 600
MAPE: 0.0498952
RMSE: 4.28586

Iter: 800
MAPE: 0.0497015
RMSE: 4.2594

Iter: 1000
MAPE: 0.0495635
RMSE: 4.24844

Imputation MAPE: 0.0494772
Imputation RMSE: 4.24721

Iter: 200
MAPE: 0.0753886
RMSE: 5.55998

Iter: 400
MAPE: 0.0741768
RMSE: 5.53431

Iter: 600
MAPE: 0.0738511
RMSE: 5.52855

Iter: 800
MAPE: 0.0737909
RMSE: 5.5281

Iter: 1000
MAPE: 0.0737771
RMSE: 5.53011

In [None]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

for i in range(4, 7):
    print('Graph partitioning: {}.'.format(2 ** i))
    tensor_hat = np.zeros(dense_tensor.shape)
    road = graph_pems.values[:, i - 1]
    for d in range(2 ** i):
        pos = np.where(road == d)
        dense = dense_tensor[pos[0], :, :]
        sparse = sparse_tensor[pos[0], :, :]
        dim = np.array(sparse.shape)
        factor = []
        for k in range(len(dim)):
            factor.append(0.1 * np.random.randn(dim[k], rank))
        small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
        tensor_hat[pos[0], :, :] = small_tensor
    print('Final MAPE:')
    print(compute_mape(var, tensor_hat[pos_test]))
    print('Final RMSE:')
    print(compute_rmse(var, tensor_hat[pos_test]))
    print()

In [None]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
del data, random_matrix, binary_tensor

## Test BGCP Model
pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
var = dense_tensor[pos_test]

rank = 10
burnin_iter = 1000
gibbs_iter = 200

for i in range(1, 7):
    print('Graph partitioning: {}.'.format(2 ** i))
    tensor_hat = np.zeros(dense_tensor.shape)
    road = graph_pems.values[:, i - 1]
    for d in range(2 ** i):
        pos = np.where(road == d)
        dense = dense_tensor[pos[0], :, :]
        sparse = sparse_tensor[pos[0], :, :]
        dim = np.array(sparse.shape)
        factor = []
        for k in range(len(dim)):
            factor.append(0.1 * np.random.randn(dim[k], rank))
        small_tensor, _ = BGCP(dense, sparse, factor, burnin_iter, gibbs_iter)
        tensor_hat[pos[0], :, :] = small_tensor
    print('Final MAPE:')
    print(compute_mape(var, tensor_hat[pos_test]))
    print('Final RMSE:')
    print(compute_rmse(var, tensor_hat[pos_test]))
    print()

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>