# Tensor Decomposition with Alternating Least Square (ALS)

---
**About this notebook**: In many real-world applications, data are multi-dimensional tensors by nature rather than table matrices. Then, we provide tensor decomposition techniques using an iterative Alternating Least Square (ALS), which is a good starting point for understanding tensor decomposition. For an in-depth discussion of tensor decomposition, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis and C. Faloutsos (2017). <b>Tensor Decomposition for Signal Processing and Machine Learning</b>. IEEE Transactions on Signal Processing. <a href="https://arxiv.org/pdf/1607.01668.pdf" title="PDF"><b>[PDF]</b></a> 
</font>
</div>

---

## 1.1 Tensor Decomposition Family

### 1.1.1 Tucker Decomposition

The idea of tensor decomposition/factorization is to find a low-rank structure approximating the original data. In mathematics, Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor [[**wiki**](https://en.wikipedia.org/wiki/Tucker_decomposition)]. Formally, given a third-order tensor $\mathcal{Y}\in\mathbb{R}^{M\times N\times T}$, the Tucker form of a tensor (also known as Tucker decomposition) with low-rank $\left(R_1,R_2,R_3\right)$ is defined as

$$\mathcal{Y}\approx\mathcal{G}\times_1 U\times_2 V\times_3 X,$$
where $\mathcal{G}\in\mathbb{R}^{R_1\times R_2\times R_3}$ is core tensor, and $U\in\mathbb{R}^{M\times R_1},V\in\mathbb{R}^{N\times R_2},X\in\mathbb{R}^{T\times R_3}$ are factor matrices.

Element-wise, for any $(i,j,t)$-th entry in tensor $\mathcal{Y}$, the above formula of Tucker decomposition can be rewritten as

$$y_{ijt}\approx\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{tr_3},$$
where $i=1,2,...,M$, $j=1,2,...,N$, and $t=1,2,...,T$.

In [1]:
import numpy as np
def tucker_combine(core_tensor, mat1, mat2, mat3):
    return np.einsum('abc, ia, jb, tc -> ijt', core_tensor, mat1, mat2, mat3)

In [2]:
import numpy as np
dim1 = 2
dim2 = 2
dim3 = 3
r1 = 2
r2 = 2
r3 = 2
core_tensor = np.random.rand(r1, r2, r3)
mat1 = np.random.rand(dim1, r1)
mat2 = np.random.rand(dim2, r2)
mat3 = np.random.rand(dim3, r3)
tensor = tucker_combine(core_tensor, mat1, mat2, mat3)
print(tensor)
print()
print('tensor size:')
print(tensor.shape)

[[[0.30204409 0.45306185 0.44171491]
  [0.39689848 0.47625818 0.52061651]]

 [[0.69817681 1.25778476 1.12677462]
  [0.88807325 1.35070885 1.30808337]]]

tensor size:
(2, 2, 3)


### 1.1.2 CP Decomposition

Another common-used type of tensor decomposition is CANDECOMP/PARAFAC (CP) decomposition. This form assumes that a data tensor is approximated by a sum of outer products of few factor vectors. Specifically, given a third-order tensor $\mathcal{Y}\in\mathbb{R}^{M\times N\times T}$, CP decomposition is

$$\mathcal{Y}\approx\sum_{r=1}^{R}\boldsymbol{u}_{r}\circ\boldsymbol{v}_{r}\circ\boldsymbol{x}_{r},$$
where vector $\boldsymbol{u}_{r}\in\mathbb{R}^{M}$ is $r$-th column of factor matrix $U\in\mathbb{R}^{M\times R}$, and there are same definitions for vectors $\boldsymbol{v}_{r}\in\mathbb{R}^{N}$ and $\boldsymbol{x}_{r}\in\mathbb{R}^{T}$ in factor matrices $V\in\mathbb{R}^{N\times R}$ and $X\in\mathbb{R}^{T\times R}$, respectively. In fact, the outer product of these vectors is a rank-one tensor, therefore, we could approximate original data by $R$ rank-one tensors.

Element-wise, for any $(i,j,t)$-th entry in tensor $\mathcal{Y}$, we have

$$y_{ijt}\approx\sum_{r=1}^{R}u_{ir}v_{jr}x_{tr},$$
where $i=1,2,...,M$, $j=1,2,...,N$, and $t=1,2,...,T$. The symbol $\circ$ denotes vector outer product.

- **Example**:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$ and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$, then if $\hat{\mathcal{Y}}=\sum_{r=1}^{R}\boldsymbol{u}_{r}\circ\boldsymbol{v}_{r}\circ\boldsymbol{x}_{r}$, then, we have

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$
$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$
$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$
$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$

In [3]:
import numpy as np
def cp_combine(mat1, mat2, mat3):
    return np.einsum('ir, jr, tr -> ijt', mat1, mat2, mat3)

In [4]:
U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
print(cp_combine(U, V, X))
print()
print('tensor size:')
print(cp_combine(U, V, X).shape)

[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)


## 1.2 Optimization Problem

In this section, we explain Tucker decomposition and CP decomposition using the Alternating Least Square (ALS) algorithm.

### 1.2.1 Tucker Decomposition using ALS

In Tucker decomposition, learning is performed by minimizing the loss function (i.e., sum of residual errors) over core tensor and factor matrices:

$$\min_{\mathcal{G},U,V,X}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{tr_3}\right)^2,$$
where $y_{ijt}$ is observed if $(i,j,t)\in\Omega$, and $\Omega$ denotes a set of 3-tuple indices.

The main challenge of solving this optimization is the need for learning core tensors and factor matrices simultaneously. One way is through the alternating Least Square (ALS) algorithm, which devides parameters into several disjoint sub-problems and iteratively minimizes the loss function with respect to the parameter of each sub-problem. To be specific, ALS optimizes alternatively over one of these parameters, such as $\mathcal{G}$, $U$, $V$, or $X$ in Tucker decomposition, while keeping others fixed. The reason is that the separated sub-problem in one parameter is convex.

We could, for example, consider the following optimization problem for core tensor $\mathcal{G}\in\mathbb{R}^{R_1\times R_2\times R_3}$:

$$\min_{\mathcal{G}}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{tr_3}\right)^2,$$

$$\Rightarrow\min_{\mathcal{G}}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\text{vec}\left(\mathcal{G}\right)\right)^2,$$

$$\Rightarrow\min_{\mathcal{G}}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\text{vec}\left(\mathcal{G}\right)\right)^\top\left(y_{ijt}-\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\text{vec}\left(\mathcal{G}\right)\right),$$
where the symbol $\text{vec}\left(\cdot\right)$ denotes vectorization on matrix (or tensor).

---
**Theorem 1**: Suppose $d$-th order tensor $\mathcal{G}\in\mathbb{R}^{n_1\times n_2\times\cdots\times n_d}$ and matrices $U_{k}\in\mathbb{R}^{m_k\times n_k}$ for $k=1,2,...,d$. If the tensor $\mathcal{A}\in\mathbb{R}^{m_1\times m_2\times\cdots\times m_d}$ is the multi-linear product

$$\mathcal{A}=\mathcal{G}\times_1 U_1\times_2 U_2\times_3\cdots\times_d U_d,$$
then

$$\mathcal{A}_{(k)}=U_{k}\mathcal{G}_{(k)}\left(U_d\otimes\cdots\otimes U_{k+1}\otimes U_{k-1}\otimes\cdots\otimes U_1\right)^\top,$$
and

$$\text{vec}\left(\mathcal{A}\right)=\left(U_d\otimes\cdots\otimes U_2\otimes U_1\right)\text{vec}\left(\mathcal{G}\right).$$

If $U_1,U_2,...,U_d$ are all non-singluar, then $\mathcal{G}=\mathcal{A}\times_1U_1^{-1}\times_2U_2^{-1}\times_3\cdots\times_dU_d^{-1}$.

**Reference**: Gene H. Golub, Charles F. Van Loan, 2012. Matrix Computations (4th Edition). (page: 728-729)

---

Obviously, the solution for updating core tensor $\mathcal{G}$ is

$$\text{vec}\left(\mathcal{G}\right)\Leftarrow\left(\sum_{(i,j,t)\in\Omega}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\right)^{-1}\left(\sum_{(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\right).$$

Similar to core tensor $\mathcal{G}$, the optimization problem for factor matrix $U\in\mathbb{R}^{M\times R_1}$ can be written as follows,

$$\min_{U}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{r_1=1}^{R_1}\sum_{r_2=1}^{R_2}\sum_{r_3=1}^{R_3}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{tr_3}\right)^2,$$
and this optimzation can be decomposed into independent least square problems with $\boldsymbol{u}_{i}\in\mathbb{R}^{R},i=1,2,...,M$:

$$\min_{\boldsymbol{u}_i}\sum_{j,t:(i,j,t)\in\Omega}\left(y_{ijt}-\boldsymbol{u}_i^\top\mathcal{G}_{(1)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\right)\left(y_{ijt}-\boldsymbol{u}_i^\top\mathcal{G}_{(1)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\right)^\top.$$

In such case, we could derive the least square as

$$\boldsymbol{u}_{i}\Leftarrow\left(\sum_{j,t:(i,j,t)\in\Omega}\mathcal{G}_{(1)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)^\top\mathcal{G}_{(1)}^\top\right)^{-1}\left(\sum_{j,t:(i,j,t)\in\Omega}y_{ijt}\mathcal{G}_{(1)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\right),\forall i\in\left\{1,2,...,M\right\}.$$

The alternating least squares for $V\in\mathbb{R}^{N\times R_2}$ and $X\in\mathbb{R}^{T\times R_3}$ are

$$\boldsymbol{v}_{j}\Leftarrow\left(\sum_{i,t:(i,j,t)\in\Omega}\mathcal{G}_{(2)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)^\top\mathcal{G}_{(2)}^\top\right)^{-1}\left(\sum_{i,t:(i,j,t)\in\Omega}y_{ijt}\mathcal{G}_{(2)}\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)\right),\forall j\in\left\{1,2,...,N\right\},$$

$$\boldsymbol{x}_{t}\Leftarrow\left(\sum_{i,j:(i,j,t)\in\Omega}\mathcal{G}_{(3)}\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\mathcal{G}_{(3)}^\top\right)^{-1}\left(\sum_{i,j:(i,j,t)\in\Omega}y_{ijt}\mathcal{G}_{(3)}\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\right),\forall t\in\left\{1,2,...,T\right\}.$$


In [5]:
'''Prerequisite functions:'''

import numpy as np

def kr_prod(a, b):
    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)

def mat2vec(mat):
    dim1, dim2 = mat.shape
    return mat.T.reshape([dim1 * dim2])

def vec2mat(vec, mat_size):
    return vec.reshape([mat_size[1], mat_size[0]]).T

def Tucker_ALS(sparse_tensor, rank, maxiter):
    dim1, dim2, dim3 = sparse_tensor.shape
    rank1 = rank[0]
    rank2 = rank[1]
    rank3 = rank[2]
    
    G = 0.1 * np.random.rand(rank1, rank2, rank3)
    U = 0.1 * np.random.rand(dim1, rank1)
    V = 0.1 * np.random.rand(dim2, rank2)
    X = 0.1 * np.random.rand(dim3, rank3)
    
    pos = np.where(sparse_tensor != 0)
    binary_tensor = np.zeros((dim1, dim2, dim3))
    binary_tensor[pos] = 1
    tensor_hat = np.zeros((dim1, dim2, dim3))
    
    for iters in range(maxiter):
        small_mat = np.zeros((rank1 * rank2 * rank3, rank1 * rank2 * rank3))
        small_vec = np.zeros((rank1 * rank2 * rank3))
        for ind in range(pos[0].shape[0]):
            vec0 = kr_prod(kr_prod(X[pos[2][ind], :].reshape([rank3, 1]), 
                                   V[pos[1][ind], :].reshape([rank2, 1])), 
                           U[pos[0][ind], :].reshape([rank1, 1]))
            vec0 = vec0.reshape([rank1 * rank2 * rank3])
            small_mat += np.outer(vec0, vec0)
            small_vec += sparse_tensor[pos[0][ind], pos[1][ind], pos[2][ind]] * vec0
        small_mat = (small_mat + small_mat.T)/2
        G_vec = np.matmul(np.linalg.inv(small_mat), small_vec)
        G = mat2ten(vec2mat(G_vec, np.array([rank1, rank2 * rank3])), np.array([rank1, rank2, rank3]), 0)

        G1 = ten2mat(G, 0)
        var1 = np.matmul(G1, np.kron(X, V).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 0).T).reshape([rank1, rank1, dim1])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 0).T)
        for i in range(dim1):
            var_Lambda = var3[ :, :, i]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            U[i, :] = np.matmul(inv_var_Lambda, var4[:, i])

        G2 = ten2mat(G, 1)
        var1 = np.matmul(G2, np.kron(X, U).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 1).T).reshape([rank2, rank2, dim2])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 1).T)
        for j in range(dim2):
            var_Lambda = var3[ :, :, j]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            V[j, :] = np.matmul(inv_var_Lambda, var4[:, j])

        G3 = ten2mat(G, 2)
        var1 = np.matmul(G3, np.kron(V, U).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 2).T).reshape([rank3, rank3, dim3])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 2).T)
        for t in range(dim3):
            var_Lambda = var3[ :, :, t]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            X[t, :] = np.matmul(inv_var_Lambda, var4[:, t])

        tensor_hat = tucker_combine(G, U, V, X)
        mape = np.sum(np.abs(sparse_tensor[pos] - tensor_hat[pos])/sparse_tensor[pos])/sparse_tensor[pos].shape[0]
        rmse = np.sqrt(np.sum((sparse_tensor[pos] - tensor_hat[pos]) ** 2)/sparse_tensor[pos].shape[0])
        
        if (iters + 1) % 5 == 0:
            print('Iter: {}'.format(iters + 1))
            print('Training MAPE: {:.6}'.format(mape))
            print('Training RMSE: {:.6}'.format(rmse))
            print()
    
    return tensor_hat, G, U, V, X

- **Example: Data Imputation**

Given a small dataset with form of third-order tensor $\mathcal{X}\in\mathbb{R}^{7\times 4\times 3}$, i.e.,

$$\mathcal{X}(:, :, 1)=\left[\begin{array}{cccc}{155} & {74} & {493} & {426} \\ {108} & {44} & {350} & {359} \\ {175} & {78} & {567} & {581} \\ {181} & {111} & {517} & {552} \\ {137} & {53} & {489} & {485} \\ {90} & {44} & {306} & {290} \\ {139} & {55} & {398} & {390}\end{array}\right],\mathcal{X}( :, :, 2)=\left[\begin{array}{cccc}{172} & {69} & {590} & {386} \\ {104} & {39} & {310} & {304} \\ {158} & {74} & {505} & {546} \\ {176} & {90} & {525} & {552} \\ {150} & {64} & {438} & {459} \\ {73} & {32} & {281} & {299} \\ {127} & {51} & {358} & {382}\end{array}\right],\mathcal{X}( :, :, 3)=\left[\begin{array}{cccc}{225} & {92} & {443} & {436} \\ {94} & {44} & {355} & {356} \\ {139} & {77} & {575} & {604} \\ {175} & {98} & {574} & {553} \\ {126} & {67} & {593} & {484} \\ {58} & {49} & {348} & {301} \\ {144} & {71} & {444} & {396}\end{array}\right],$$
where 7 indicates 7 spatial locations (or sensors), 4 indicates 4 days, and 3 indicates 3 15-minute time intervals. The unit of tensor entries is vehicle per 15 min.

Suppose an incomplete tensor $\mathcal{Y}$:

$$\mathcal{X}(:, :, 1)=\left[\begin{array}{cccc}{155} & {74} & {493} & {426} \\ {108} & {0} & {0} & {0} \\ {175} & {78} & {0} & {0} \\ {0} & {111} & {517} & {0} \\ {137} & {53} & {489} & {0} \\ {90} & {44} & {0} & {0} \\ {139} & {0} & {398} & {0}\end{array}\right],\mathcal{X}( :, :, 2)=\left[\begin{array}{cccc}{172} & {69} & {590} & {0} \\ {104} & {0} & {310} & {304} \\ {0} & {0} & {505} & {546} \\ {0} & {90} & {525} & {552} \\ {0} & {64} & {0} & {0} \\ {73} & {32} & {281} & {299} \\ {127} & {0} & {0} & {0}\end{array}\right],\mathcal{X}( :, :, 3)=\left[\begin{array}{cccc}{225} & {0} & {0} & {436} \\ {0} & {44} & {0} & {356} \\ {0} & {0} & {575} & {604} \\ {175} & {98} & {574} & {0} \\ {126} & {67} & {0} & {0} \\ {58} & {49} & {348} & {0} \\ {144} & {0} & {444} & {0}\end{array}\right],$$
and the problem is to estimate the tensor entries filled with 0.

In [6]:
dim1 = 7
dim2 = 4
dim3 = 3
dense_tensor = np.zeros((dim1, dim2, dim3))
dense_tensor[:, :, 0] = np.array([[155, 74, 493, 426], [108, 44, 350, 359],
                                  [175, 78, 567, 581], [181, 111, 517, 552],
                                  [137, 53, 489, 485], [90, 44, 306, 290],
                                  [139, 55, 398, 390]])
dense_tensor[:, :, 1] = np.array([[172, 69, 590, 386], [104, 39, 310, 304],
                                  [158, 74, 505, 546], [176, 90, 525, 552],
                                  [150, 64, 438, 459], [73, 32, 281, 299],
                                  [127, 51, 358, 382]])
dense_tensor[:, :, 2] = np.array([[225, 92, 443, 436], [94, 44, 355, 356],
                                  [139, 77, 575, 604], [175, 98, 574, 553],
                                  [126, 67, 593, 484], [58, 49, 348, 301],
                                  [144, 71, 444, 396]])
sparse_tensor = np.zeros((dim1, dim2, dim3))
sparse_tensor[:, :, 0] = np.array([[155, 74, 493, 426], [108, 0, 0, 0],
                                   [175, 78, 0, 0], [0, 111, 517, 0],
                                   [137, 53, 489, 0], [90, 44, 0, 0],
                                   [139, 0, 398, 0]])
sparse_tensor[:, :, 1] = np.array([[172, 69, 590, 0], [104, 0, 310, 304],
                                   [0, 0, 505, 546], [0, 90, 525, 552],
                                   [0, 64, 0, 0], [73, 32, 281, 299],
                                   [127, 0, 0, 0]])
sparse_tensor[:, :, 2] = np.array([[225, 0, 0, 436], [0, 44, 0, 356],
                                   [0, 0, 575, 604], [175, 98, 574, 0],
                                   [126, 67, 0, 0], [58, 49, 348, 0],
                                   [144, 0, 444, 0]])

In [7]:
rank1 = 1
rank2 = 1
rank3 = 1
rank = np.array([rank1, rank2, rank3])
maxiter = 20
tensor_hat, G, U, V, X = Tucker_ALS(sparse_tensor, rank, maxiter)
pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos])/dense_tensor[pos])/dense_tensor[pos].shape[0]
final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
print('Final Imputation MAPE: {:.6}'.format(final_mape))
print('Final Imputation RMSE: {:.6}'.format(final_rmse))
print()

Iter: 5
Training MAPE: 0.105292
Training RMSE: 25.7634

Iter: 10
Training MAPE: 0.105465
Training RMSE: 25.7619

Iter: 15
Training MAPE: 0.105465
Training RMSE: 25.7619

Iter: 20
Training MAPE: 0.105465
Training RMSE: 25.7619

Final Imputation MAPE: 0.109442
Final Imputation RMSE: 40.2588



In [8]:
print('Core tensor:')
print(G)
print()
print('Factor matrix U:')
print(U)
print()
print('Factor matrix V:')
print(V)
print()
print('Factor matrix X:')
print(X)
print()

Core tensor:
[[[392054.98021001]]]

Factor matrix U:
[[0.14168981]
 [0.09099751]
 [0.15429428]
 [0.15228069]
 [0.13678592]
 [0.08353816]
 [0.11748884]]

Factor matrix V:
[[0.03487343]
 [0.01656731]
 [0.11047979]
 [0.10554358]]

Factor matrix X:
[[0.07879392]
 [0.08353061]
 [0.08723042]]



---
From the factorized results, we could observe that core tensor $\mathcal{G}\in\mathbb{R}$ is an extreme large value. In that case, Tucker decomposition suffers from the overfitting issue. Therefore, we would like to use the $\ell_{1}$ (or $\ell_{2}$) norm in the following to improve robustness against data outliers.

---

### 1.2.2 CP Decomposition using ALS

Indeed, the formula of CP decomposition is a special case of Tucker decomposition. Mathematically, for any $(i,j,t)$-th entry of a given third-order tensor $\mathcal{Y}$, the form of CP decomposition can be written as

$$y_{ijt}\approx\sum_{r=1}^{R}u_{ir}v_{jr}x_{tr}=\sum_{r_1=1}^{R}\sum_{r_2=1}^{R}\sum_{r_3=1}^{R}g_{r_1r_2r_3}u_{ir_1}v_{jr_2}x_{jr_3},$$
where hyper-diagonal entries of the core tensor $\mathcal{G}$ are 1. In other words, $g_{r_1r_2r_3}=1$ for any $r_1=r_2=r_3$ and $g_{r_1r_2r_3}=0$ otherwise.

Regarding CP decomposition as a machine learning problem, we could perform a learning task by minimizing the loss function over factor matrices like aforementioned Tucker decomposition, that is,

$$\min _{U, V, X} \sum_{(i, j, t) \in \Omega}\left(y_{i j t}-\sum_{r=1}^{R}u_{ir}v_{jr}x_{tr}\right)^{2}.$$

Within this optimization problem, multiplication among three factor matrices (acted as parameters) makes this problem difficult. Therefore, following the aforementioned Tucker fdecomposition scheme, we apply the ALS algorithm for CP decomposition.

In particular, the optimization problem for each row $\boldsymbol{u}_{i}\in\mathbb{R}^{R},\forall i\in\left\{1,2,...,M\right\}$ of factor matrix $U\in\mathbb{R}^{M\times R}$ is given by

$$\min _{\boldsymbol{u}_{i}} \sum_{j,t:(i, j, t) \in \Omega}\left[y_{i j t}-\boldsymbol{u}_{i}^\top\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\right]\left[y_{i j t}-\boldsymbol{u}_{i}^\top\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\right)\right]^\top.$$

The least square for this optimization is

$$u_{i} \Leftarrow\left(\sum_{j, t, i, j, t ) \in \Omega} \left(x_{t} \odot v_{j}\right)\left(x_{t} \odot v_{j}\right)^{\top}\right)^{-1}\left(\sum_{j, t :(i, j, t) \in \Omega} y_{i j t} \left(x_{t} \odot v_{j}\right)\right), \forall i \in\{1,2, \ldots, M\}.$$

The alternating least squares for $V\in\mathbb{R}^{N\times R}$ and $X\in\mathbb{R}^{T\times R}$ are

$$\boldsymbol{v}_{j}\Leftarrow\left(\sum_{i,t:(i,j,t)\in\Omega}\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)^\top\right)^{-1}\left(\sum_{i,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{x}_{t}\odot\boldsymbol{u}_{i}\right)\right),\forall j\in\left\{1,2,...,N\right\},$$

$$\boldsymbol{x}_{t}\Leftarrow\left(\sum_{i,j:(i,j,t)\in\Omega}\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\right)^{-1}\left(\sum_{i,j:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\right),\forall t\in\left\{1,2,...,T\right\}.$$


---
**Theorem 2**: Suppose matrices $U^{(k)}\in\mathbb{R}^{m_k\times n}$ for $k=1,2,...,d$. If the tensor $\mathcal{A}\in\mathbb{R}^{m_1\times m_2\times\cdots\times m_d}$ has such CP model

$$\mathcal{A}=\sum_{r=1}^{n}\boldsymbol{u}_{r}^{(1)}\circ\boldsymbol{u}_{r}^{(2)}\circ\cdots\circ\boldsymbol{u}_{r}^{(d)},$$
then

$$\mathcal{A}_{(k)}=U^{(k)}\left(U^{(d)}\odot\cdots\odot U^{(k+1)}\odot U^{(k-1)}\odot\cdots\odot U^{(1)}\right)^\top.$$

---

In [9]:
'''Prerequisite functions:'''

import numpy as np

def kr_prod(a, b):
    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def CP_ALS(sparse_tensor, rank, maxiter):
    dim1, dim2, dim3 = sparse_tensor.shape
    
    U = 0.1 * np.random.rand(dim1, rank)
    V = 0.1 * np.random.rand(dim2, rank)
    X = 0.1 * np.random.rand(dim3, rank)
    
    pos = np.where(sparse_tensor != 0)
    binary_tensor = np.zeros((dim1, dim2, dim3))
    binary_tensor[pos] = 1
    tensor_hat = np.zeros((dim1, dim2, dim3))
    
    for iters in range(maxiter):
        var1 = kr_prod(X, V).T
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 0).T).reshape([rank, rank, dim1])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 0).T)
        for i in range(dim1):
            var_Lambda = var3[ :, :, i]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            U[i, :] = np.matmul(inv_var_Lambda, var4[:, i])

        var1 = kr_prod(X, U).T
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 1).T).reshape([rank, rank, dim2])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 1).T)
        for j in range(dim2):
            var_Lambda = var3[ :, :, j]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            V[j, :] = np.matmul(inv_var_Lambda, var4[:, j])

        var1 = kr_prod(V, U).T
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 2).T).reshape([rank, rank, dim3])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 2).T)
        for t in range(dim3):
            var_Lambda = var3[ :, :, t]
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            X[t, :] = np.matmul(inv_var_Lambda, var4[:, t])

        tensor_hat = cp_combine(U, V, X)
        mape = np.sum(np.abs(sparse_tensor[pos] - tensor_hat[pos])/sparse_tensor[pos])/sparse_tensor[pos].shape[0]
        rmse = np.sqrt(np.sum((sparse_tensor[pos] - tensor_hat[pos]) ** 2)/sparse_tensor[pos].shape[0])
        
        if (iters + 1) % 20 == 0:
            print('Iter: {}'.format(iters + 1))
            print('Training MAPE: {:.6}'.format(mape))
            print('Training RMSE: {:.6}'.format(rmse))
            print()
    
    return tensor_hat, U, V, X

## 1.3 $\ell_{2}$ Norm Regularization

Re-write the optimization problem over core tensor $\mathcal{G}$:

$$\min_{\mathcal{G}}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\text{vec}\left(\mathcal{G}\right)\right)^\top\left(y_{ijt}-\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top\text{vec}\left(\mathcal{G}\right)\right)+\lambda_g\text{vec}\left(\mathcal{G}\right)^\top\text{vec}\left(\mathcal{G}\right).$$

Thus, we have

$$\text{vec}\left(\mathcal{G}\right)\Leftarrow\left(\sum_{(i,j,t)\in\Omega}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)^\top+\lambda_gI\right)^{-1}\left(\sum_{(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{x}_{t}\odot\boldsymbol{v}_{j}\odot\boldsymbol{u}_{i}\right)\right).$$

In [10]:
'''Prerequisite functions:'''

import numpy as np

def kr_prod(a, b):
    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)

def mat2vec(mat):
    dim1, dim2 = mat.shape
    return mat.T.reshape([dim1 * dim2])

def vec2mat(vec, mat_size):
    return vec.reshape([mat_size[1], mat_size[0]]).T

def Tucker_ALS_L2(sparse_tensor, rank, hyper_lambda, maxiter):
    dim1, dim2, dim3 = sparse_tensor.shape
    rank1 = rank[0]
    rank2 = rank[1]
    rank3 = rank[2]
    
    G = 0.1 * np.random.rand(rank1, rank2, rank3)
    U = 0.1 * np.random.rand(dim1, rank1)
    V = 0.1 * np.random.rand(dim2, rank2)
    X = 0.1 * np.random.rand(dim3, rank3)
    
    pos = np.where(sparse_tensor != 0)
    binary_tensor = np.zeros((dim1, dim2, dim3))
    binary_tensor[pos] = 1
    tensor_hat = np.zeros((dim1, dim2, dim3))
    
    for iters in range(maxiter):
        small_mat = np.zeros((rank1 * rank2 * rank3, rank1 * rank2 * rank3))
        small_vec = np.zeros((rank1 * rank2 * rank3))
        for ind in range(pos[0].shape[0]):
            vec0 = kr_prod(kr_prod(X[pos[2][ind], :].reshape([rank3, 1]), 
                                   V[pos[1][ind], :].reshape([rank2, 1])), 
                           U[pos[0][ind], :].reshape([rank1, 1]))
            vec0 = vec0.reshape([rank1 * rank2 * rank3])
            small_mat += np.outer(vec0, vec0)
            small_vec += sparse_tensor[pos[0][ind], pos[1][ind], pos[2][ind]] * vec0
        small_mat += hyper_lambda[0] * np.eye(r1 * r2 * r3)
        small_mat = (small_mat + small_mat.T)/2
        G_vec = np.matmul(np.linalg.inv(small_mat), small_vec)
        G = mat2ten(vec2mat(G_vec, np.array([rank1, rank2 * rank3])), np.array([rank1, rank2, rank3]), 0)

        G1 = ten2mat(G, 0)
        var1 = np.matmul(G1, np.kron(X, V).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 0).T).reshape([rank1, rank1, dim1])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 0).T)
        for i in range(dim1):
            var_Lambda = var3[ :, :, i] + hyper_lambda[1] * np.eye(rank1)
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            U[i, :] = np.matmul(inv_var_Lambda, var4[:, i] + hyper_lambda[1])

        G2 = ten2mat(G, 1)
        var1 = np.matmul(G2, np.kron(X, U).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 1).T).reshape([rank2, rank2, dim2])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 1).T)
        for j in range(dim2):
            var_Lambda = var3[ :, :, j] + hyper_lambda[2] * np.eye(rank2)
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            V[j, :] = np.matmul(inv_var_Lambda, var4[:, j] + hyper_lambda[2])

        G3 = ten2mat(G, 2)
        var1 = np.matmul(G3, np.kron(V, U).T)
        var2 = kr_prod(var1, var1)
        var3 = np.matmul(var2, ten2mat(binary_tensor, 2).T).reshape([rank3, rank3, dim3])
        var4 = np.matmul(var1, ten2mat(sparse_tensor, 2).T)
        for t in range(dim3):
            var_Lambda = var3[ :, :, t] + hyper_lambda[3] * np.eye(rank3)
            inv_var_Lambda = np.linalg.inv((var_Lambda + var_Lambda.T)/2)
            X[t, :] = np.matmul(inv_var_Lambda, var4[:, t] + hyper_lambda[3])

        tensor_hat = tucker_combine(G, U, V, X)
        mape = np.sum(np.abs(sparse_tensor[pos] - tensor_hat[pos])/sparse_tensor[pos])/sparse_tensor[pos].shape[0]
        rmse = np.sqrt(np.sum((sparse_tensor[pos] - tensor_hat[pos]) ** 2)/sparse_tensor[pos].shape[0])
        
        if (iters + 1) % 5 == 0:
            print('Iter: {}'.format(iters + 1))
            print('Training MAPE: {:.6}'.format(mape))
            print('Training RMSE: {:.6}'.format(rmse))
            print()
    
    return tensor_hat, G, U, V, X

## 1.4 Application: Spatiotemporal Data Imputation

### Tucker Factorization on the Guangzhou Dataset

In [12]:
import scipy.io

tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [None]:
import time
start = time.time()

rank1 = 6
rank2 = 6
rank3 = 6
rank = np.array([rank1, rank2, rank3])
maxiter = 20
tensor_hat, G, U, V, X = Tucker_ALS(sparse_tensor, rank, maxiter)
pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos])/dense_tensor[pos])/dense_tensor[pos].shape[0]
final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
print('Final Imputation MAPE: {:.6}'.format(final_mape))
print('Final Imputation RMSE: {:.6}'.format(final_rmse))
print()

end = time.time()
print('Running time: %d seconds.'%(end - start))

Iter: 5
Training MAPE: 0.107667
Training RMSE: 4.44874

Iter: 10
Training MAPE: 0.107464
Training RMSE: 4.43889

Iter: 15
Training MAPE: 0.107405
Training RMSE: 4.4368



**Table 1**: Summarized results of Tucker factorization on missing data imputation (Guanghzou dataset).

|  scenario |    `rank`| `maxiter`|       mape |      rmse | running time |
|:----------|---------:|---------:|-----------:|----------:|-------------:|
|**40%, NM**|   (2,2,2)|       20 |     0.1319 |    5.2762 |     393 sec. |
|**40%, NM**|   (3,3,3)|       20 |     0.1210 |    4.8816 |     460 sec. |
|**40%, NM**|   (4,4,4)|       20 |     0.1129 |    4.6304 |     609 sec. |
|**40%, NM**|   (5,5,5)|       20 |     0.1095 |    4.5260 |    1041 sec. |
|**40%, NM**|   (6,6,6)|       20 |     0.1070 |    4.4568 |    1944 sec. |
|**40%, NM**|   (7,7,7)|       20 |     0.1046 |    4.3955 |     sec. |


### Tucker Factorization with Regularization on the Guangzhou Dataset


In [13]:
import scipy.io

tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [14]:
import time
start = time.time()

rank1 = 2
rank2 = 2
rank3 = 2
rank = np.array([rank1, rank2, rank3])
lambda_G = 1e+2
lambda_U = 2e+1
lambda_V = 1e-0
lambda_X = 1e-0
hyper_lambda = np.array([lambda_G, lambda_U, lambda_V, lambda_X])
maxiter = 20
tensor_hat, G, U, V, X = Tucker_ALS_L2(sparse_tensor, rank, hyper_lambda, maxiter)
pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos])/dense_tensor[pos])/dense_tensor[pos].shape[0]
final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
print('Final Imputation MAPE: {:.6}'.format(final_mape))
print('Final Imputation RMSE: {:.6}'.format(final_rmse))
print()

end = time.time()
print('Running time: %d seconds.'%(end - start))


Iter: 5
Training MAPE: 0.136286
Training RMSE: 5.45497

Iter: 10
Training MAPE: 0.134713
Training RMSE: 5.3794

Iter: 15
Training MAPE: 0.134585
Training RMSE: 5.37518

Iter: 20
Training MAPE: 0.134518
Training RMSE: 5.37364

Final Imputation MAPE: 0.131941
Final Imputation RMSE: 5.27799

Running time: 429 seconds.


### CP Factorization on the Guangzhou Dataset

In [15]:
import scipy.io

tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

In [16]:
import time
start = time.time()

rank = 10
maxiter = 100
tensor_hat, U, V, X = CP_ALS(sparse_tensor, rank, maxiter)
pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))
final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos])/dense_tensor[pos])/dense_tensor[pos].shape[0]
final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
print('Final Imputation MAPE: {:.6}'.format(final_mape))
print('Final Imputation RMSE: {:.6}'.format(final_rmse))
print()

end = time.time()
print('Running time: %d seconds.'%(end - start))

Iter: 20
Training MAPE: 0.104495
Training RMSE: 4.34865

Iter: 40
Training MAPE: 0.103633
Training RMSE: 4.32009

Iter: 60
Training MAPE: 0.103208
Training RMSE: 4.30102

Iter: 80
Training MAPE: 0.102848
Training RMSE: 4.28226

Iter: 100
Training MAPE: 0.102578
Training RMSE: 4.27393

Final Imputation MAPE: 0.10446
Final Imputation RMSE: 4.37996

Running time: 19 seconds.


**Table 2**: Summarized results of CP factorization on missing data imputation (Guanghzou dataset).

|  scenario |    `rank`| `maxiter`|       mape |      rmse | running time |
|:----------|---------:|---------:|-----------:|----------:|-------------:|
|**40%, NM**|        2 |       20 |     0.1323 |    5.2920 |      14 sec. |
|**40%, NM**|        3 |      100 |     0.1226 |    4.9319 |      13 sec. |
|**40%, NM**|        4 |      100 |     0.1154 |    4.7171 |      14 sec. |
|**40%, NM**|        5 |      100 |     0.1120 |    4.6026 |      15 sec. |
|**40%, NM**|        6 |      100 |     0.1094 |    4.5436 |      15 sec. |
|**40%, NM**|        7 |      100 |     0.1076 |    4.4770 |      15 sec. |
|**40%, NM**|       10 |      100 |     0.1045 |    4.3800 |      19 sec. |
|**40%, NM**|       15 |      100 |     0.1011 |    4.2905 |      21 sec. |
