## Low-Tubal-Rank Smoothing Tensor Completion Imputer (LSTC-Tubal)

This notebook shows how to implement an LSTC-Tubal imputer on some real-world large-scale data sets. To handle missing values in multivariate time series data, the method takes into account both low-rank structure and time series regression. To keep the model scalable, it also integrates a linear transform into the LATC model. For an in-depth discussion of the LSTC-Tubal imputer, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Xinyu Chen, Yixian Chen, Lijun Sun (2020). <b>Scalable low-rank tensor learning for spatiotemporal traffic data imputation</b>. arXiv: 2008.03194. <a href="https://arxiv.org/abs/2008.03194" title="PDF"><b>[PDF]</b></a> <a href="https://doi.org/10.5281/zenodo.3939792" title="data"><b>[data]</b></a> 
</font>
</div>


### Define LATC-imputer kernel

We start by introducing some necessary functions that rely on `NumPy`.

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>ten2mat</code>:</b> <font color="black">Unfold tensor as matrix by specifying mode.</font></li>
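<li><b><code>mat2ten</code>:</b> <font color="black">Fold matrix as tensor by specifying dimension (i.e., tensor size) and mode.</font></li>
<li><b><code>unitary_transform</code> / <code>inv_unitary_transform</code>:</b> <font color="black">Apply a unitary transform <code>Phi</code> along the third mode of a tensor, and invert it.</font></li>
<li><b><code>tsvt_unitary</code> / <code>tsvt_dct</code>:</b> <font color="black">Implement the process of Singular Value Thresholding (SVT) on each frontal slice in the transformed domain (unitary transform or discrete cosine transform).</font></li>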
</ul>
</div>

In [1]:
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, tensor_size[index].tolist(), order = 'F'), 0, mode)
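As a quick sanity check, the following sketch unfolds a small tensor along each mode and folds it back. The toy tensor `toy` and the variable names are illustrative assumptions; the two functions are repeated (with `mat2ten` written slightly more compactly) so that the block runs on its own.

```python
import numpy as np

def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, tensor_size[index].tolist(), order='F'), 0, mode)

toy = np.arange(24).reshape(2, 3, 4)   # hypothetical 2 x 3 x 4 tensor
size = np.array(toy.shape)

# The mode-k unfolding has shape (size of mode k, product of the other sizes).
shapes = [ten2mat(toy, mode).shape for mode in range(3)]

# Folding an unfolding back with the same mode should recover the tensor.
round_trip_ok = all(
    np.array_equal(mat2ten(ten2mat(toy, mode), size, mode), toy)
    for mode in range(3)
)
print(shapes, round_trip_ok)
```

Note that `mat2ten` expects `tensor_size` as a NumPy array rather than a tuple, because it indexes the sizes with a list.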

In [2]:
def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)
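The two functions above are inverses of each other only when `Phi` is orthogonal (unitary). The sketch below checks this on a random toy tensor. It builds `Phi` from the Gram matrix of the mode-3 unfolding, as the imputer later does, but uses `np.linalg.eigh` instead of `np.linalg.eig`; that substitution is an assumption made here because `eigh` is designed for symmetric matrices and returns real, orthonormal eigenvectors.

```python
import numpy as np

def unitary_transform(tensor, Phi):
    return np.einsum('kt, ijk -> ijt', Phi, tensor)

def inv_unitary_transform(tensor, Phi):
    return np.einsum('kt, ijt -> ijk', Phi, tensor)

rng = np.random.default_rng(0)
toy = rng.standard_normal((4, 5, 6))           # hypothetical toy tensor

# Mode-3 unfolding (6 x 20); the column ordering does not affect the Gram matrix.
unfold3 = np.moveaxis(toy, 2, 0).reshape(6, -1)
_, Phi = np.linalg.eigh(unfold3 @ unfold3.T)   # columns are orthonormal eigenvectors

transformed = unitary_transform(toy, Phi)
recovered = inv_unitary_transform(transformed, Phi)

orthogonal = np.allclose(Phi.T @ Phi, np.eye(6))
round_trip = np.allclose(recovered, toy)
energy_kept = np.isclose(np.linalg.norm(transformed), np.linalg.norm(toy))
print(orthogonal, round_trip, energy_kept)
```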

In [3]:
def tsvt_unitary(tensor, Phi, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = unitary_transform(tensor, Phi)
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return inv_unitary_transform(X, Phi)

from scipy.fftpack import dctn, idctn

def tsvt_dct(tensor, tau):
    dim = tensor.shape
    X = np.zeros(dim)
    tensor = dctn(tensor, axes = (2,), norm = 'ortho')
    for t in range(dim[2]):
        u, s, v = np.linalg.svd(tensor[:, :, t], full_matrices = False)
        r = len(np.where(s > tau)[0])
        if r >= 1:
            s = s[: r]
            s[: r] = s[: r] - tau
            X[:, :, t] = u[:, : r] @ np.diag(s) @ v[: r, :]
    return idctn(X, axes = (2,), norm = 'ortho')
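Both `tsvt_unitary` and `tsvt_dct` apply the same operation to every frontal slice of the transformed tensor: each singular value is reduced by `tau`, and values that fall below zero are discarded. The sketch below isolates that step for a single matrix. The helper name `svt_matrix` and the toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def svt_matrix(mat, tau):
    """Soft-threshold the singular values of one matrix by tau (illustrative helper)."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink, then clip at zero
    return (u * s_shrunk) @ vt            # same as u @ diag(s_shrunk) @ vt

rng = np.random.default_rng(1)
mat = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 8))   # rank-3 toy matrix
tau = 1.0

s_before = np.linalg.svd(mat, compute_uv=False)
s_after = np.linalg.svd(svt_matrix(mat, tau), compute_uv=False)
print(np.round(s_before, 3))
print(np.round(s_after, 3))
```

A larger `tau` removes more singular values and therefore yields a lower-rank slice; in the imputer, `tau` is set to `1 / rho`, so the thresholding weakens as `rho` grows.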

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>compute_mape</code>:</b> <font color="black">Compute the value of Mean Absolute Percentage Error (MAPE).</font></li>
<li><b><code>compute_rmse</code>:</b> <font color="black">Compute the value of Root Mean Square Error (RMSE).</font></li>
</ul>
</div>

> Note that $$\mathrm{MAPE}=\frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_{i}-\hat{y}_{i}\right|}{y_{i}}, \quad\mathrm{RMSE}=\sqrt{\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},$$ where $n$ is the total number of estimated values, and $y_i$ and $\hat{y}_i$ are the actual value and its estimation, respectively. The implementation returns MAPE as a fraction rather than a percentage, so a reported value of 0.075 corresponds to 7.5%.

In [4]:
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return  np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
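A small worked example may help when reading the printed metrics. The three values below are made up for illustration; the two functions are copied from the cell above so that the block runs on its own.

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

actual = np.array([10.0, 20.0, 40.0])     # hypothetical ground truth
estimate = np.array([9.0, 22.0, 40.0])    # hypothetical imputed values

mape = compute_mape(actual, estimate)     # (0.1 + 0.1 + 0.0) / 3
rmse = compute_rmse(actual, estimate)     # sqrt((1 + 4 + 0) / 3)
print(mape, rmse)
```

Because MAPE divides by the actual value, both functions should only be given entries where the ground truth is nonzero, which is how `pos_test` is built in the imputer.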

The main idea behind this imputer is to approximate partially observed data using both low-rank structure and time series dynamics. The `imputer` kernel below takes the following inputs:

<div class="alert alert-block alert-warning">
<ul>
<li><b><code>dense_tensor</code>:</b> <font color="black">This is an input which has the ground truth for validation. If this input is not available, you could use <code>dense_tensor = sparse_tensor.copy()</code> instead.</font></li>
<li><b><code>sparse_tensor</code>:</b> <font color="black">This is a partially observed tensor which has many missing entries.</font></li>
<li><b><code>rho0</code>:</b> <font color="black">Initial learning rate for ADMM, e.g., <code>rho0 = 0.0005</code>. The kernel multiplies it by 1.05 in every iteration, up to a cap of <code>1e5</code>. </font></li>
<li><b><code>lambda0</code>:</b> <font color="black">Weight for the temporal smoothing term, which penalizes first-order differences, e.g., <code>lambda0 = 5 * rho0</code>. If <code>lambda0 = 0</code>, the smoothing term is dropped and the imputer reduces to low-tubal-rank tensor completion without smoothing.</font></li>
<li><b><code>epsilon</code>:</b> <font color="black">Stopping criterion on the relative change between successive estimates, e.g., <code>epsilon = 0.001</code>. </font></li>
<li><b><code>maxiter</code>:</b> <font color="black">Maximum number of iterations, e.g., <code>maxiter = 50</code>. </font></li>
<li><b><code>transform</code>:</b> <font color="black">Linear transform along the third mode, either <code>"unitary"</code> (default, learned from the data) or <code>"dct"</code> (discrete cosine transform). </font></li>
</ul>
</div>


In [5]:
def imputer(dense_tensor, sparse_tensor, rho0, lambda0, epsilon, maxiter, transform = "unitary"):
    """Low-Tubal-Rank Smoothing Tensor Completion, LSTC-Tubal-imputer."""
    
    dim = np.array(sparse_tensor.shape)
    dt = int(np.prod(dim) / dim[0])
    sparse_mat = ten2mat(sparse_tensor, 0)
    pos_missing = np.where(sparse_mat == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))
    var = dense_tensor[pos_test]
    
    T = np.zeros(dim)                         # \boldsymbol{\mathcal{T}}
    Z_tensor = sparse_tensor.copy()           # \boldsymbol{\mathcal{Z}}
    Z_tensor[Z_tensor == 0] = np.mean(sparse_tensor[sparse_tensor != 0])
    Z = sparse_mat.copy()                     # \boldsymbol{Z}
    it = 0
    last_mat = sparse_mat.copy()
    snorm = np.linalg.norm(sparse_mat, 'fro')
    del dense_tensor, sparse_tensor, sparse_mat
    rho = rho0
    Phis = []
    if transform == "unitary":
        temp1 = ten2mat(Z_tensor, 2)
        _, Phi = np.linalg.eig(temp1 @ temp1.T)
        Phis.append(Phi)
        del temp1
    if lambda0 > 0:
        from scipy import sparse
        from scipy.sparse.linalg import inv as inv
        from scipy.sparse.linalg import spsolve as spsolve
        Psi1 = sparse.coo_matrix((np.ones(dt - 1), (np.arange(0, dt - 1), np.arange(0, dt - 1))), 
                                 shape = (dt - 1, dt)).tocsr()
        Psi2 = sparse.coo_matrix((np.ones(dt - 1), (np.arange(0, dt - 1), np.arange(0, dt - 1) + 1)), 
                                 shape = (dt - 1, dt)).tocsr()
        temp0 = Psi2 - Psi1
        Imat = sparse.coo_matrix((np.ones(dt), (np.arange(0, dt), np.arange(0, dt))), shape = (dt, dt)).tocsr()
        const = lambda0 * temp0.T @ temp0
        del Psi1, Psi2, temp0
    while True:
        rho = min(rho * 1.05, 1e5)
        if transform == "unitary":
            X = tsvt_unitary(Z_tensor - T / rho, Phi, 1 / rho)
        elif transform == "dct":
            X = tsvt_dct(Z_tensor - T / rho, 1 / rho)
        mat_hat = ten2mat(X, 0)
        temp = ten2mat(X + T / rho, 0)
        if lambda0 > 0:
            Z[pos_missing] = (spsolve(const / rho + Imat, temp.T).T)[pos_missing]
        elif lambda0 == 0:
            Z[pos_missing] = temp[pos_missing]
        Z_tensor = mat2ten(Z, dim, 0)
        T = T + rho * (X - Z_tensor)
        tol = np.linalg.norm((mat_hat - last_mat), 'fro') / snorm
        last_mat = mat_hat.copy()
        it += 1
        if it % 10 == 0:
            if transform == "unitary":
                temp1 = ten2mat(Z_tensor - T / rho, 2)
                _, Phi = np.linalg.eig(temp1 @ temp1.T)
                Phis.append(Phi)
                del temp1
        if it % 5 == 0:
            print('Iter: {}'.format(it))
            print('Tolerance: {:.6}'.format(tol))
            print('MAPE: {:.6}'.format(compute_mape(var, X[pos_test])))
            print('RMSE: {:.6}'.format(compute_rmse(var, X[pos_test])))
            print()
        if (tol < epsilon) or (it > maxiter):
            break

    print('Total iteration: {}'.format(it))
    print('Tolerance: {:.6}'.format(tol))
    print('Imputation MAPE: {:.6}'.format(compute_mape(var, X[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(var, X[pos_test])))
    print()
    
    return X, Phis
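When `lambda0 > 0`, the kernel updates the missing entries by solving a sparse linear system whose matrix is the identity plus a scaled product of a first-order difference operator with its transpose. The sketch below reproduces that step on a single noisy series to show its smoothing effect. The toy signal, the value of `weight` (standing in for `lambda0 / rho`), and the use of `sparse.diags` to build the operator are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 50
rng = np.random.default_rng(2)
signal = np.sin(np.linspace(0, 2 * np.pi, n)) + 0.3 * rng.standard_normal(n)

# First-order difference operator D of shape (n - 1, n): (D z)[t] = z[t + 1] - z[t]
D = sparse.diags([-np.ones(n - 1), np.ones(n - 1)], offsets=[0, 1],
                 shape=(n - 1, n), format='csr')
identity = sparse.identity(n, format='csr')

weight = 5.0   # hypothetical stand-in for lambda0 / rho
# Solves min_z ||z - signal||^2 + weight * ||D z||^2 in closed form.
smoothed = spsolve((weight * (D.T @ D) + identity).tocsc(), signal)

rough_before = np.sum(np.diff(signal) ** 2)
rough_after = np.sum(np.diff(smoothed) ** 2)
print(rough_before, rough_after)
```

Since `rho` grows in every iteration, the ratio `lambda0 / rho` shrinks, so the smoothing is strongest in the early iterations.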

To choose these parameters sensibly, tune them by cross-validation on your own data set.

### Guangzhou-2M

We generate **random missing (RM)** values on the Guangzhou traffic speed data set.

In [6]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)
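The masking expression `np.round(np.random.rand(...) + 0.5 - missing_rate)` produces a binary mask in which each entry is kept with probability `1 - missing_rate`. The sketch below checks this empirically on a flat array; the sample size is an arbitrary choice for illustration.

```python
import numpy as np

np.random.seed(1000)
missing_rate = 0.3

# A uniform draw u is rounded to 1 when u + 0.5 - missing_rate > 0.5, i.e. u > missing_rate.
mask = np.round(np.random.rand(100000) + 0.5 - missing_rate)
observed_fraction = mask.mean()
print(observed_fraction)   # close to 1 - missing_rate
```

Because the imputer identifies missing entries by testing for zeros, any genuine zero reading in the data is also treated as missing.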

In [7]:
import time
rho = 2e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.5]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0411897
MAPE: 0.076646
RMSE: 3.35077

Iter: 10
Tolerance: 0.0169797
MAPE: 0.0717252
RMSE: 3.08456

Iter: 15
Tolerance: 0.0120458
MAPE: 0.0763688
RMSE: 3.23954

Iter: 20
Tolerance: 0.00474478
MAPE: 0.0748609
RMSE: 3.17313

Iter: 25
Tolerance: 0.00907038
MAPE: 0.0777603
RMSE: 3.29503

Iter: 30
Tolerance: 0.00276514
MAPE: 0.0750324
RMSE: 3.1841

Iter: 35
Tolerance: 0.00680763
MAPE: 0.0773624
RMSE: 3.2729

Iter: 40
Tolerance: 0.00186234
MAPE: 0.07506
RMSE: 3.18081

Iter: 45
Tolerance: 0.00481429
MAPE: 0.0761962
RMSE: 3.22791

Iter: 50
Tolerance: 0.00122326
MAPE: 0.0750875
RMSE: 3.18397

Iter: 55
Tolerance: 0.00340529
MAPE: 0.0748327
RMSE: 3.1692

Iter: 60
Tolerance: 0.000826861
MAPE: 0.0749525
RMSE: 3.17508

Total iteration: 60
Tolerance: 0.000826861
Imputation MAPE: 0.0749525
Imputation RMSE: 3.17508

Running time: 0.61 minutes

- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0347718
MAPE: 0.080869
RM

In [8]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Random Missing (RM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim2, dim3) + 0.5 - missing_rate)

In [9]:
import time
rho = 2e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.5]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0578648
MAPE: 0.121979
RMSE: 5.21623

Iter: 10
Tolerance: 0.037995
MAPE: 0.0926346
RMSE: 3.91614

Iter: 15
Tolerance: 0.0262457
MAPE: 0.0964137
RMSE: 4.02559

Iter: 20
Tolerance: 0.00985876
MAPE: 0.0942203
RMSE: 3.92872

Iter: 25
Tolerance: 0.0166135
MAPE: 0.0965021
RMSE: 4.02553

Iter: 30
Tolerance: 0.00595028
MAPE: 0.0936823
RMSE: 3.9139

Iter: 35
Tolerance: 0.0125205
MAPE: 0.0942961
RMSE: 3.93024

Iter: 40
Tolerance: 0.00417277
MAPE: 0.0933013
RMSE: 3.89209

Iter: 45
Tolerance: 0.00954919
MAPE: 0.092108
RMSE: 3.84689

Iter: 50
Tolerance: 0.00292844
MAPE: 0.0924548
RMSE: 3.86182

Iter: 55
Tolerance: 0.00728532
MAPE: 0.0903992
RMSE: 3.78063

Iter: 60
Tolerance: 0.00224549
MAPE: 0.0913338
RMSE: 3.81651

Iter: 65
Tolerance: 0.00519976
MAPE: 0.0897565
RMSE: 3.75821

Iter: 70
Tolerance: 0.00180495
MAPE: 0.0905089
RMSE: 3.78598

Iter: 75
Tolerance: 0.00396312
MAPE: 0.0893262
RMSE: 3.74196

Iter: 80
Tolerance: 0.0

We generate **non-random missing (NM)** values on the Guangzhou traffic speed data set.

In [10]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.3

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]
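In the non-random scenario, the mask is drawn once per pair of first-mode and third-mode indices and then broadcast along the second mode through `[:, None, :]`, so every fiber along the second mode is either fully observed or fully missing. The sketch below verifies this on a small toy tensor; the sizes and the strictly positive toy data are assumptions for illustration.

```python
import numpy as np

np.random.seed(1000)
missing_rate = 0.3
dim1, dim2, dim3 = 4, 6, 5                         # hypothetical tensor sizes

dense = np.random.rand(dim1, dim2, dim3) + 1.0     # strictly positive toy data
fiber_mask = np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)
sparse_toy = dense * fiber_mask[:, None, :]        # broadcast over the second mode

# Count the observed entries in each fiber along the second mode.
observed_per_fiber = (sparse_toy != 0).sum(axis=1)
all_or_nothing = np.all((observed_per_fiber == 0) | (observed_per_fiber == dim2))
print(observed_per_fiber)
print(all_or_nothing)
```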

In [11]:
import time
rho = 2e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.5]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0353347
MAPE: 0.106494
RMSE: 4.40503

Iter: 10
Tolerance: 0.0153025
MAPE: 0.111326
RMSE: 4.52854

Iter: 15
Tolerance: 0.0115617
MAPE: 0.11055
RMSE: 4.53114

Iter: 20
Tolerance: 0.00490041
MAPE: 0.109476
RMSE: 4.48613

Iter: 25
Tolerance: 0.00822304
MAPE: 0.110389
RMSE: 4.51808

Iter: 30
Tolerance: 0.00270463
MAPE: 0.108865
RMSE: 4.45921

Iter: 35
Tolerance: 0.00625655
MAPE: 0.109751
RMSE: 4.48702

Iter: 40
Tolerance: 0.00176821
MAPE: 0.108556
RMSE: 4.44463

Iter: 45
Tolerance: 0.00456562
MAPE: 0.108785
RMSE: 4.45094

Iter: 50
Tolerance: 0.00116295
MAPE: 0.108232
RMSE: 4.4342

Iter: 55
Tolerance: 0.00321602
MAPE: 0.107633
RMSE: 4.4072

Iter: 60
Tolerance: 0.000804156
MAPE: 0.107807
RMSE: 4.41544

Total iteration: 60
Tolerance: 0.000804156
Imputation MAPE: 0.107807
Imputation RMSE: 4.41544

Running time: 0.46 minutes

- coefficient = 0.5
- lambda = 0.001

Iter: 5
Tolerance: 0.0350269
MAPE: 0.106463
RMSE: 4.4010

In [12]:
import numpy as np
import scipy.io
np.random.seed(1000)

missing_rate = 0.7

## Non-random Missing (NM)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor'].transpose(0, 2, 1)
dim1, dim2, dim3 = dense_tensor.shape
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, dim3) + 0.5 - missing_rate)[:, None, :]

In [13]:
import time
rho = 2e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.5]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.5
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0551982
MAPE: 0.137431
RMSE: 5.4028

Iter: 10
Tolerance: 0.023387
MAPE: 0.127802
RMSE: 5.14998

Iter: 15
Tolerance: 0.0188327
MAPE: 0.119332
RMSE: 4.839

Iter: 20
Tolerance: 0.00850892
MAPE: 0.117434
RMSE: 4.76802

Iter: 25
Tolerance: 0.014026
MAPE: 0.118345
RMSE: 4.80256

Iter: 30
Tolerance: 0.00507432
MAPE: 0.116579
RMSE: 4.73893

Iter: 35
Tolerance: 0.0107842
MAPE: 0.116356
RMSE: 4.72646

Iter: 40
Tolerance: 0.00356765
MAPE: 0.115812
RMSE: 4.70846

Iter: 45
Tolerance: 0.00882121
MAPE: 0.115194
RMSE: 4.67771

Iter: 50
Tolerance: 0.00277168
MAPE: 0.115383
RMSE: 4.68753

Iter: 55
Tolerance: 0.00717984
MAPE: 0.113613
RMSE: 4.61833

Iter: 60
Tolerance: 0.00225445
MAPE: 0.114233
RMSE: 4.64457

Iter: 65
Tolerance: 0.0055269
MAPE: 0.112791
RMSE: 4.59115

Iter: 70
Tolerance: 0.00192375
MAPE: 0.113428
RMSE: 4.61558

Iter: 75
Tolerance: 0.00411486
MAPE: 0.112439
RMSE: 4.57886

Iter: 80
Tolerance: 0.00161414
MAPE: 0.1

In [14]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.3

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Random missing (RM)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = sparse_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
del dense_mat, sparse_mat
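The `reshape([dim1, 30, 24]).transpose(0, 2, 1)` step turns a matrix with 720 columns into a tensor of shape `(dim1, 24, 30)`. Assuming the columns are consecutive hourly readings over 30 days (which the file name and the reshape suggest, but which is not stated explicitly), column `day * 24 + hour` ends up at position `[hour, day]`. The toy matrix below is illustrative.

```python
import numpy as np

num_series, num_days, num_hours = 3, 30, 24        # toy sizes matching the reshape above
mat = np.arange(num_series * num_days * num_hours, dtype=float).reshape(num_series, -1)

# Same reshaping as in the cell above: (series, day, hour) -> (series, hour, day)
tensor = mat.reshape([num_series, num_days, num_hours]).transpose(0, 2, 1)

i, hour, day = 2, 7, 11
same_entry = tensor[i, hour, day] == mat[i, day * num_hours + hour]
print(tensor.shape, same_entry)
```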

In [15]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0374107
MAPE: 0.0897797
RMSE: 2.23049

Iter: 10
Tolerance: 0.00298839
MAPE: 0.0916584
RMSE: 2.20764

Iter: 15
Tolerance: 0.000907228
MAPE: 0.0914233
RMSE: 2.20164

Total iteration: 15
Tolerance: 0.000907228
Imputation MAPE: 0.0914233
Imputation RMSE: 2.20164

Running time: 1.18 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0373575
MAPE: 0.089788
RMSE: 2.23064

Iter: 10
Tolerance: 0.00298253
MAPE: 0.0916681
RMSE: 2.20787

Iter: 15
Tolerance: 0.000906567
MAPE: 0.0914308
RMSE: 2.20181

Total iteration: 15
Tolerance: 0.000906567
Imputation MAPE: 0.0914308
Imputation RMSE: 2.20181

Running time: 1.45 minutes

Test LSTC-DCT model:
- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0368929
MAPE: 0.0920211
RMSE: 2.28519

Iter: 10
Tolerance: 0.00282188
MAPE: 0.0941668
RMSE: 2.26919

Total iteration: 13
Tolerance: 0.000926284
Imputation MAPE: 0.0940803
Imputation RMSE: 2.26747

Running

In [16]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.7

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Random missing (RM)
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - missing_rate)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = sparse_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
del dense_mat, sparse_mat

In [17]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.08396
MAPE: 0.105052
RMSE: 2.6108

Iter: 10
Tolerance: 0.0246202
MAPE: 0.0969436
RMSE: 2.37307

Iter: 15
Tolerance: 0.00455316
MAPE: 0.0979582
RMSE: 2.34971

Iter: 20
Tolerance: 0.00101655
MAPE: 0.0976523
RMSE: 2.34589

Iter: 25
Tolerance: 0.000857907
MAPE: 0.0973259
RMSE: 2.33951

Total iteration: 25
Tolerance: 0.000857907
Imputation MAPE: 0.0973259
Imputation RMSE: 2.33951

Running time: 2.05 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0838812
MAPE: 0.105039
RMSE: 2.60997

Iter: 10
Tolerance: 0.0245838
MAPE: 0.0969553
RMSE: 2.37343

Iter: 15
Tolerance: 0.00454852
MAPE: 0.0979694
RMSE: 2.34997

Iter: 20
Tolerance: 0.00101591
MAPE: 0.0976636
RMSE: 2.34614

Iter: 25
Tolerance: 0.000857839
MAPE: 0.0973338
RMSE: 2.33968

Total iteration: 25
Tolerance: 0.000857839
Imputation MAPE: 0.0973338
Imputation RMSE: 2.33968

Running time: 2.51 minutes

Test LSTC-DCT model:
- coefficient = 0.001
-

In [18]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.3

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Non-random missing (NM)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, 30) + 0.5 - missing_rate)[:, None, :]
del dense_mat

In [19]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0403371
MAPE: 0.151649
RMSE: 3.86826

Iter: 10
Tolerance: 0.00686086
MAPE: 0.102021
RMSE: 2.43877

Iter: 15
Tolerance: 0.0011371
MAPE: 0.0987449
RMSE: 2.36934

Total iteration: 16
Tolerance: 0.000731852
Imputation MAPE: 0.0986828
Imputation RMSE: 2.36868

Running time: 1.22 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0403245
MAPE: 0.151632
RMSE: 3.86765

Iter: 10
Tolerance: 0.00685606
MAPE: 0.102023
RMSE: 2.43877

Iter: 15
Tolerance: 0.00113632
MAPE: 0.0987487
RMSE: 2.3694

Total iteration: 16
Tolerance: 0.000731276
Imputation MAPE: 0.0986866
Imputation RMSE: 2.36874

Running time: 1.51 minutes

Test LSTC-DCT model:
- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0401213
MAPE: 0.15845
RMSE: 3.99805

Iter: 10
Tolerance: 0.00699059
MAPE: 0.108031
RMSE: 2.58125

Iter: 15
Tolerance: 0.000956217
MAPE: 0.104951
RMSE: 2.52133

Total iteration: 15
Tolerance: 0.000956217
Imputati

In [20]:
import numpy as np
np.random.seed(1000)

missing_rate = 0.7

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
dim1, dim2 = dense_mat.shape
del binary_mat

## Non-random missing (NM)
dense_tensor = dense_mat.reshape([dim1, 30, 24]).transpose(0, 2, 1)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim1, 30) + 0.5 - missing_rate)[:, None, :]
del dense_mat

In [21]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.05952
MAPE: 0.257325
RMSE: 6.95042

Iter: 10
Tolerance: 0.0245373
MAPE: 0.190124
RMSE: 5.26781

Iter: 15
Tolerance: 0.016655
MAPE: 0.153902
RMSE: 4.17287

Iter: 20
Tolerance: 0.01111
MAPE: 0.134258
RMSE: 3.50915

Iter: 25
Tolerance: 0.00732728
MAPE: 0.123181
RMSE: 3.12002

Iter: 30
Tolerance: 0.0046157
MAPE: 0.117243
RMSE: 2.90907

Iter: 35
Tolerance: 0.00301777
MAPE: 0.113822
RMSE: 2.79089

Iter: 40
Tolerance: 0.00191368
MAPE: 0.111886
RMSE: 2.72527

Iter: 45
Tolerance: 0.00137568
MAPE: 0.11063
RMSE: 2.68459

Total iteration: 49
Tolerance: 0.000969445
Imputation MAPE: 0.109987
Imputation RMSE: 2.66388

Running time: 3.89 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0595187
MAPE: 0.257317
RMSE: 6.94997

Iter: 10
Tolerance: 0.0245365
MAPE: 0.190111
RMSE: 5.26716

Iter: 15
Tolerance: 0.0166543
MAPE: 0.15389
RMSE: 4.17223

Iter: 20
Tolerance: 0.0111087
MAPE: 0.13425
RMSE: 3.50863

Iter: 

### California data - 4W

We generate **random missing (RM)** values on the California traffic speed data set.

In [6]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.3

### Random missing (RM)
sparse_tensor = dense_tensor * np.round(random_tensor + 0.5 - missing_rate)
del data, random_tensor

In [7]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0133711
MAPE: 0.0169521
RMSE: 1.62061

Iter: 10
Tolerance: 0.00420351
MAPE: 0.0169313
RMSE: 1.57827

Iter: 15
Tolerance: 0.00115539
MAPE: 0.0172504
RMSE: 1.61513

Total iteration: 16
Tolerance: 0.000611106
Imputation MAPE: 0.0171711
Imputation RMSE: 1.61091

Running time: 6.19 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.013354
MAPE: 0.0169553
RMSE: 1.62087

Iter: 10
Tolerance: 0.00420101
MAPE: 0.0169354
RMSE: 1.57856

Iter: 15
Tolerance: 0.00115282
MAPE: 0.017255
RMSE: 1.61551

Total iteration: 16
Tolerance: 0.000609762
Imputation MAPE: 0.0171757
Imputation RMSE: 1.61128

Running time: 8.16 minutes

Test LSTC-DCT model:
- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0134723
MAPE: 0.0169014
RMSE: 1.61972

Iter: 10
Tolerance: 0.00426787
MAPE: 0.0169175
RMSE: 1.57444

Total iteration: 14
Tolerance: 0.000634277
Imputation MAPE: 0.017208
Imputation RMSE: 1.61022

Running tim

In [8]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 4 * 7)

missing_rate = 0.7

### Random missing (RM)
sparse_tensor = dense_tensor * np.round(random_tensor + 0.5 - missing_rate)
del data, random_tensor

In [9]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0299499
MAPE: 0.0352405
RMSE: 2.90882

Iter: 10
Tolerance: 0.0119233
MAPE: 0.0233871
RMSE: 2.16897

Iter: 15
Tolerance: 0.00513148
MAPE: 0.0257791
RMSE: 2.31612

Iter: 20
Tolerance: 0.0012986
MAPE: 0.0248579
RMSE: 2.27907

Iter: 25
Tolerance: 0.00145629
MAPE: 0.0248136
RMSE: 2.27383

Total iteration: 27
Tolerance: 0.000872863
Imputation MAPE: 0.0246834
Imputation RMSE: 2.26713

Running time: 9.61 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0299404
MAPE: 0.0352257
RMSE: 2.90844

Iter: 10
Tolerance: 0.0119198
MAPE: 0.0233867
RMSE: 2.16925

Iter: 15
Tolerance: 0.0051255
MAPE: 0.0257847
RMSE: 2.31648

Iter: 20
Tolerance: 0.00129524
MAPE: 0.0248629
RMSE: 2.27944

Iter: 25
Tolerance: 0.00145486
MAPE: 0.0248168
RMSE: 2.27409

Total iteration: 27
Tolerance: 0.000871685
Imputation MAPE: 0.0246862
Imputation RMSE: 2.26736

Running time: 12.83 minutes

Test LSTC-DCT model:
- coefficient = 0.001

We generate **non-random missing (NM)** values on the California traffic speed data set. Then, we conduct the imputation experiment.
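
The non-random scenario drops whole days at once: a single random draw per (row, day) pair decides whether all time slots of that day are kept. A minimal vectorized sketch of that idea follows. The helper name `nonrandom_missing_mask`, the toy tensor size, and the use of `np.random.default_rng` are assumptions made for illustration; the notebook itself uses an explicit double loop with `np.random.rand`, and the two generators do not produce the same random stream.

```python
import numpy as np

def nonrandom_missing_mask(shape, missing_rate, rng):
    """Return a 0/1 mask that removes whole (row, day) slices along the middle axis."""
    n_rows, n_slots, n_days = shape
    u = rng.random((n_rows, n_days))              # one uniform draw per (row, day)
    day_mask = np.round(u + 0.5 - missing_rate)   # 1 = keep the day, 0 = drop it
    # Broadcast the per-day decision to every time slot of that day.
    return np.broadcast_to(day_mask[:, None, :], shape).copy()

rng = np.random.default_rng(1000)                 # hypothetical seed for this sketch
mask = nonrandom_missing_mask((5, 288, 28), 0.3, rng)
```

Multiplying a dense tensor of the same shape by `mask` would then zero out the dropped days, which is what `np.multiply(dense_tensor, binary_tensor)` does in the cells of this notebook.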

In [10]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario: one random draw per (row, day) masks that whole day.
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [11]:
import time
rho = 1e-4
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0216712
MAPE: 0.0531881
RMSE: 4.50298

Iter: 10
Tolerance: 0.011685
MAPE: 0.0564941
RMSE: 4.5622

Iter: 15
Tolerance: 0.00746292
MAPE: 0.0575385
RMSE: 4.62012

Iter: 20
Tolerance: 0.00495434
MAPE: 0.0574953
RMSE: 4.61382

Iter: 25
Tolerance: 0.00459904
MAPE: 0.0566022
RMSE: 4.56514

Iter: 30
Tolerance: 0.0036427
MAPE: 0.0566408
RMSE: 4.56634

Iter: 35
Tolerance: 0.00371153
MAPE: 0.0565057
RMSE: 4.55013

Iter: 40
Tolerance: 0.00281312
MAPE: 0.0565188
RMSE: 4.55214

Iter: 45
Tolerance: 0.00252051
MAPE: 0.0559637
RMSE: 4.52152

Total iteration: 48
Tolerance: 0.000903572
Imputation MAPE: 0.0558843
Imputation RMSE: 4.51882

Running time: 16.34 minutes

- coefficient = 0.001
- lambda = 1.0000000000000001e-07

Iter: 5
Tolerance: 0.0216712
MAPE: 0.0531881
RMSE: 4.50298

Iter: 10
Tolerance: 0.011685
MAPE: 0.0564941
RMSE: 4.5622

Iter: 15
Tolerance: 0.00746284
MAPE: 0.0575385
RMSE: 4.62012

Iter: 20
Tolerance: 0.004954

In [12]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 4 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 4 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario: one random draw per (row, day) masks that whole day.
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [13]:
import time
rho = 1e-4
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0308453
MAPE: 0.0698433
RMSE: 5.43514

Iter: 10
Tolerance: 0.0178718
MAPE: 0.064626
RMSE: 5.12181

Iter: 15
Tolerance: 0.0107694
MAPE: 0.066522
RMSE: 5.10754

Iter: 20
Tolerance: 0.00696388
MAPE: 0.0665386
RMSE: 5.10687

Iter: 25
Tolerance: 0.00682244
MAPE: 0.0667357
RMSE: 5.11923

Iter: 30
Tolerance: 0.00468575
MAPE: 0.0667742
RMSE: 5.11726

Iter: 35
Tolerance: 0.00590226
MAPE: 0.0668982
RMSE: 5.12267

Iter: 40
Tolerance: 0.00366018
MAPE: 0.0668135
RMSE: 5.11507

Iter: 45
Tolerance: 0.00510035
MAPE: 0.0667153
RMSE: 5.10923

Iter: 50
Tolerance: 0.00258236
MAPE: 0.0665255
RMSE: 5.0977

Iter: 55
Tolerance: 0.00399879
MAPE: 0.0664064
RMSE: 5.09089

Iter: 60
Tolerance: 0.00122398
MAPE: 0.0661253
RMSE: 5.07663

Iter: 65
Tolerance: 0.00258044
MAPE: 0.0660908
RMSE: 5.07304

Total iteration: 69
Tolerance: 0.000752238
Imputation MAPE: 0.0659465
Imputation RMSE: 5.06558

Running time: 23.72 minutes

- coefficient = 0.0

### California data - 8W

We generate **random missing (RM)** values on the California traffic speed data set.
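
The random-missing mask in this notebook relies on a rounding trick: for a uniform draw `u` in [0, 1), `np.round(u + 0.5 - missing_rate)` gives 1 when `u` exceeds `missing_rate` and 0 otherwise, so each entry is kept with probability `1 - missing_rate`. The sketch below isolates that step on a small toy tensor. The helper name `random_missing_mask`, the toy shape, and the `np.random.default_rng` generator are assumptions for illustration and do not reproduce the notebook's random stream.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, rng):
    """Return a 0/1 mask that keeps each entry with probability 1 - missing_rate."""
    u = rng.random(shape)                         # uniform draws in [0, 1)
    # u + 0.5 - missing_rate rounds to 1 when u > missing_rate, else to 0.
    return np.round(u + 0.5 - missing_rate)

rng = np.random.default_rng(1000)                 # hypothetical seed for this sketch
mask = random_missing_mask((20, 30, 40), 0.3, rng)
observed_fraction = mask.mean()                   # expected to be close to 0.7
```

Checking `observed_fraction` against `1 - missing_rate` is a quick way to confirm that the mask matches the intended missing rate before a long imputation run.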

In [14]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.3

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor

We use `imputer` to fill in the missing entries and measure performance metrics on the ground truth.
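
As a rough sketch of how such metrics can be computed, the block below evaluates MAPE and RMSE only on held-out entries, that is, entries that are nonzero in the ground truth but were zeroed out in the sparse tensor. This selection rule and the helper names `mape`, `rmse`, and `heldout_scores` are assumptions made for illustration; the `imputer` defined earlier in the notebook may differ in detail.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error (actual values must be nonzero)."""
    return np.mean(np.abs(actual - predicted) / actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return np.sqrt(np.mean((actual - predicted) ** 2))

def heldout_scores(dense, sparse, estimate):
    """Score an estimate on entries observed in `dense` but masked in `sparse`."""
    test = (dense != 0) & (sparse == 0)
    return mape(dense[test], estimate[test]), rmse(dense[test], estimate[test])

# Toy example: two held-out entries (20 and 40), estimated as 22 and 36.
dense = np.array([[10.0, 20.0], [40.0, 0.0]])
sparse = np.array([[10.0, 0.0], [0.0, 0.0]])
estimate = np.array([[10.0, 22.0], [36.0, 5.0]])
toy_mape, toy_rmse = heldout_scores(dense, sparse, estimate)
```

In the toy example the entry that is zero in `dense` is excluded, since it has no ground truth to compare against.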

In [15]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0135602
MAPE: 0.017267
RMSE: 1.64321

Iter: 10
Tolerance: 0.00420663
MAPE: 0.0172588
RMSE: 1.60299

Iter: 15
Tolerance: 0.00122171
MAPE: 0.0175667
RMSE: 1.64141

Total iteration: 16
Tolerance: 0.000646544
Imputation MAPE: 0.0174727
Imputation RMSE: 1.63657

Running time: 14.25 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0135429
MAPE: 0.0172703
RMSE: 1.64348

Iter: 10
Tolerance: 0.00420397
MAPE: 0.0172629
RMSE: 1.6033

Iter: 15
Tolerance: 0.00121915
MAPE: 0.0175714
RMSE: 1.6418

Total iteration: 16
Tolerance: 0.000645254
Imputation MAPE: 0.0174773
Imputation RMSE: 1.63695

Running time: 18.65 minutes

Test LSTC-DCT model:
- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0136591
MAPE: 0.0171879
RMSE: 1.64184

Iter: 10
Tolerance: 0.00426697
MAPE: 0.0172291
RMSE: 1.59896

Total iteration: 14
Tolerance: 0.000583978
Imputation MAPE: 0.0175044
Imputation RMSE: 1.63639

Running t

In [16]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_tensor = np.random.rand(data.values.shape[0], 288, 8 * 7)

missing_rate = 0.7

### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_tensor, binary_tensor

In [17]:
import time
rho = 1e-3
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0303374
MAPE: 0.0363159
RMSE: 2.97343

Iter: 10
Tolerance: 0.011981
MAPE: 0.02394
RMSE: 2.21178

Iter: 15
Tolerance: 0.00523145
MAPE: 0.0263124
RMSE: 2.35885

Iter: 20
Tolerance: 0.00129191
MAPE: 0.0252845
RMSE: 2.31819

Iter: 25
Tolerance: 0.00187607
MAPE: 0.0254077
RMSE: 2.32118

Total iteration: 28
Tolerance: 0.000817618
Imputation MAPE: 0.0251339
Imputation RMSE: 2.30798

Running time: 25.20 minutes

- coefficient = 0.001
- lambda = 1e-06

Iter: 5
Tolerance: 0.0303274
MAPE: 0.0362982
RMSE: 2.97296

Iter: 10
Tolerance: 0.0119773
MAPE: 0.0239398
RMSE: 2.21208

Iter: 15
Tolerance: 0.00522538
MAPE: 0.026318
RMSE: 2.35922

Iter: 20
Tolerance: 0.00128866
MAPE: 0.0252894
RMSE: 2.31857

Iter: 25
Tolerance: 0.00187401
MAPE: 0.0254106
RMSE: 2.32143

Total iteration: 28
Tolerance: 0.00081627
Imputation MAPE: 0.0251364
Imputation RMSE: 2.30819

Running time: 31.90 minutes

Test LSTC-DCT model:
- coefficient = 0.001
-

We generate **non-random missing (NM)** values on the California traffic speed data set. Then, we conduct the imputation experiment.

In [18]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.3

### Non-random missing (NM) scenario: one random draw per (row, day) masks that whole day.
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [19]:
import time
rho = 1e-4
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0220123
MAPE: 0.0507498
RMSE: 4.42466

Iter: 10
Tolerance: 0.0120932
MAPE: 0.0534321
RMSE: 4.45877

Iter: 15
Tolerance: 0.00737546
MAPE: 0.0540124
RMSE: 4.49339

Iter: 20
Tolerance: 0.00507274
MAPE: 0.0539177
RMSE: 4.4851

Iter: 25
Tolerance: 0.0046852
MAPE: 0.0535293
RMSE: 4.46633

Iter: 30
Tolerance: 0.00369473
MAPE: 0.0535336
RMSE: 4.4663

Iter: 35
Tolerance: 0.00388795
MAPE: 0.0534147
RMSE: 4.45185

Iter: 40
Tolerance: 0.0028647
MAPE: 0.0534115
RMSE: 4.45361

Iter: 45
Tolerance: 0.00280569
MAPE: 0.0529411
RMSE: 4.42568

Total iteration: 48
Tolerance: 0.000984302
Imputation MAPE: 0.0528287
Imputation RMSE: 4.42233

Running time: 39.11 minutes

- coefficient = 0.001
- lambda = 1.0000000000000001e-07

Iter: 5
Tolerance: 0.0220122
MAPE: 0.0507498
RMSE: 4.42466

Iter: 10
Tolerance: 0.0120932
MAPE: 0.0534321
RMSE: 4.45878

Iter: 15
Tolerance: 0.00737537
MAPE: 0.0540124
RMSE: 4.49339

Iter: 20
Tolerance: 0.00507

In [20]:
import numpy as np
import pandas as pd
np.random.seed(1000)

data = pd.read_csv('../datasets/California-data-set/pems-8w.csv', header = None)
dense_tensor = mat2ten(data.values, np.array([data.values.shape[0], 288, 8 * 7]), 0)
random_matrix = np.random.rand(data.values.shape[0], 8 * 7)

missing_rate = 0.7

### Non-random missing (NM) scenario: one random draw per (row, day) masks that whole day.
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[2]):
        binary_tensor[i1, :, i2] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)

del data, random_matrix, binary_tensor

In [21]:
import time
rho = 1e-4
epsilon = 1e-3
maxiter = 100

## Test LSTC-Tubal model
print('Test LSTC-Tubal model:')
for c in [0, 0.001]:
    start = time.time()
    lambda0 = c * rho
    print('- coefficient = {}'.format(c))
    print('- lambda = {}'.format(lambda0))
    print()
    tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter)
    end = time.time()
    print('Running time: %.2f minutes' % ((end - start) / 60.0))
    print()
    
## Test LSTC-DCT model
print('Test LSTC-DCT model:')
c = 0.001
start = time.time()
lambda0 = c * rho
print('- coefficient = {}'.format(c))
print('- lambda = {}'.format(lambda0))
print()
tensor_hat, Phis = imputer(dense_tensor, sparse_tensor, rho, lambda0, epsilon, maxiter, transform = "dct")
end = time.time()
print('Running time: %.2f minutes' % ((end - start) / 60.0))
print()

Test LSTC-Tubal model:
- coefficient = 0
- lambda = 0.0

Iter: 5
Tolerance: 0.0354722
MAPE: 0.0628432
RMSE: 5.14213

Iter: 10
Tolerance: 0.0201066
MAPE: 0.0589919
RMSE: 4.85231

Iter: 15
Tolerance: 0.0119143
MAPE: 0.0612398
RMSE: 4.84985

Iter: 20
Tolerance: 0.00726715
MAPE: 0.060695
RMSE: 4.83258

Iter: 25
Tolerance: 0.00748687
MAPE: 0.0606767
RMSE: 4.83341

Iter: 30
Tolerance: 0.00486401
MAPE: 0.0605809
RMSE: 4.82458

Iter: 35
Tolerance: 0.00629017
MAPE: 0.0608722
RMSE: 4.84532

Iter: 40
Tolerance: 0.00376783
MAPE: 0.0607261
RMSE: 4.83619

Iter: 45
Tolerance: 0.0057309
MAPE: 0.0608748
RMSE: 4.84268

Iter: 50
Tolerance: 0.00270467
MAPE: 0.0605864
RMSE: 4.82752

Iter: 55
Tolerance: 0.00462296
MAPE: 0.0605437
RMSE: 4.81632

Iter: 60
Tolerance: 0.0013696
MAPE: 0.0601193
RMSE: 4.79429

Iter: 65
Tolerance: 0.00320108
MAPE: 0.0600529
RMSE: 4.78395

Iter: 70
Tolerance: 0.00076217
MAPE: 0.0597234
RMSE: 4.76778

Total iteration: 70
Tolerance: 0.00076217
Imputation MAPE: 0.0597234
Imputation RM

### License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>