# Sparse Non-Negative PARAFAC

This notebook is based on [sparse_demo.ipynb](sparse_demo.ipynb#parafac). 

As before, we start with a random sparse tensor, constructed so that it has a tensor factorization of rank 5.

Because non-negative PARAFAC can take longer to converge than non-masked PARAFAC and also produces dense factors, we use a smaller tensor than in the other notebook.

In [12]:
shape = (1000, 1001, 1002)
rank = 5

import sparse
starting_factors = [sparse.random((i, rank)) for i in shape]
starting_factors

[<COO: shape=(1000, 5), dtype=float64, nnz=50, fill_value=0.0>,
 <COO: shape=(1001, 5), dtype=float64, nnz=50, fill_value=0.0>,
 <COO: shape=(1002, 5), dtype=float64, nnz=50, fill_value=0.0>]

In [13]:
from tensorly.contrib.sparse.kruskal_tensor import kruskal_to_tensor
tensor = kruskal_to_tensor(starting_factors)
tensor

<COO: shape=(1000, 1001, 1002), dtype=float64, nnz=5194, fill_value=0.0>
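
The low `nnz` follows from how a Kruskal tensor is assembled: each rank-one component is nonzero only at indices where all of its factor columns are nonzero. The pure-Python sketch below illustrates this on a toy example. `kruskal_support` is a hypothetical helper written for illustration, not part of `tensorly` or `sparse`, and it assumes no exact cancellation between components (which cannot occur when all factor entries are non-negative).

```python
from itertools import product

def kruskal_support(factors):
    """Return the index tuples at which a Kruskal tensor can be nonzero.

    Each factor is a list of rows, and each row is a list of `rank` values.
    """
    rank = len(factors[0][0])
    support = set()
    for r in range(rank):
        # For every mode, the row indices with a nonzero entry in column r.
        rows = [[i for i, row in enumerate(f) if row[r] != 0] for f in factors]
        # Component r is nonzero exactly on the Cartesian product of those rows.
        support.update(product(*rows))
    return support

# Toy factors for a 3 x 3 x 2 tensor of rank 2 (made-up values).
A = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
B = [[3.0, 0.0], [0.0, 0.0], [4.0, 5.0]]
C = [[0.0, 6.0], [7.0, 0.0]]

support = kruskal_support([A, B, C])
print(sorted(support))  # 3 of the 18 possible indices
```

In the notebook, each factor has 50 nonzeros spread over 5 columns, so roughly 10 per column. Each component then contributes on the order of 10 × 10 × 10 = 1000 entries, which is consistent with the 5194 nonzeros reported above.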

In [14]:
tensor.nbytes / 1e9                # Actual memory usage in GB

0.000166208

In [15]:
import numpy as np
np.prod(tensor.shape) * 8 / 1e9    # Memory usage if array was dense, in GB

8.024016
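
Both figures can be reproduced by hand. The sketch below assumes that the COO format stores, for each nonzero, one 8-byte value plus one 8-byte integer coordinate per dimension. That layout is an assumption not stated in this notebook, but it matches the `nbytes` value reported above.

```python
import math

shape = (1000, 1001, 1002)
nnz = 5194        # number of stored nonzeros, taken from the output above
itemsize = 8      # bytes per float64 value
coord_size = 8    # bytes per coordinate (assumed 64-bit integers)

# Dense storage: one value for every entry of the tensor.
dense_bytes = math.prod(shape) * itemsize

# COO storage: one value plus one coordinate per dimension for each nonzero.
coo_bytes = nnz * (itemsize + len(shape) * coord_size)

print(dense_bytes / 1e9)  # 8.024016
print(coo_bytes / 1e9)    # 0.000166208
```

Under these assumptions the sparse representation is smaller than the dense one by a factor of about 48,000.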

Now we factor the tensor. Note that at this time, you have to use the `non_negative_parafac` function from the sparse backend when using a sparse mask to avoid memory blowups.
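
For intuition about what a non-negative factorization step does, here is a minimal pure-Python sketch of the classical multiplicative update rule for non-negative CP. It is an illustration under stated assumptions, not the code path used by `tensorly.contrib.sparse`: the helper names (`gram`, `multiplicative_update`, `squared_error`), the dictionary-based sparse storage, and the toy data are all hypothetical. The numerator is accumulated over the nonzeros only, which is why a sparse input keeps this step cheap.

```python
import itertools
import random

def gram(F, rank):
    """rank x rank Gram matrix F^T F of a factor stored as a list of rows."""
    return [[sum(row[p] * row[q] for row in F) for q in range(rank)]
            for p in range(rank)]

def multiplicative_update(entries, factors, mode, rank, eps=1e-12):
    """Update factors[mode] in place; entries maps index tuples to nonzeros."""
    F = factors[mode]
    others = [m for m in range(len(factors)) if m != mode]
    # Numerator: matricized tensor times Khatri-Rao product, nonzeros only.
    num = [[0.0] * rank for _ in F]
    for idx, val in entries.items():
        for r in range(rank):
            term = val
            for m in others:
                term *= factors[m][idx[m]][r]
            num[idx[mode]][r] += term
    # Denominator: F times the Hadamard product of the other Gram matrices.
    had = [[1.0] * rank for _ in range(rank)]
    for m in others:
        g = gram(factors[m], rank)
        had = [[had[p][q] * g[p][q] for q in range(rank)] for p in range(rank)]
    for i, row in enumerate(F):
        den = [sum(row[p] * had[p][r] for p in range(rank)) for r in range(rank)]
        F[i] = [row[r] * num[i][r] / (den[r] + eps) for r in range(rank)]

def squared_error(entries, factors, shape, rank):
    """Squared Frobenius error, looping over every index (toy sizes only)."""
    err = 0.0
    for idx in itertools.product(*(range(n) for n in shape)):
        approx = 0.0
        for r in range(rank):
            term = 1.0
            for m, i in enumerate(idx):
                term *= factors[m][i][r]
            approx += term
        err += (entries.get(idx, 0.0) - approx) ** 2
    return err

random.seed(0)
shape, rank = (4, 3, 2), 2
# Toy non-negative sparse tensor stored as {index: value}.
entries = {(0, 0, 0): 1.0, (1, 2, 1): 2.0, (3, 1, 0): 0.5}
# Strictly positive random initialization.
factors = [[[random.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
           for n in shape]

err_before = squared_error(entries, factors, shape, rank)
for _ in range(20):
    for mode in range(len(shape)):
        multiplicative_update(entries, factors, mode, rank)
err_after = squared_error(entries, factors, shape, rank)
print(err_before, err_after)
```

Because each update only rescales entries by a non-negative ratio, the factors can never become negative. Starting from a strictly positive initialization, entries are also not set to exactly zero in general, which fits the dense factors that non-negative PARAFAC produces.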

In [16]:
import time
%load_ext memory_profiler
from tensorly.contrib.sparse.decomposition import non_negative_parafac

The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler


In [17]:
%%memit
start_time = time.time()
factors = non_negative_parafac(tensor, rank=rank, init='random', verbose=True)
end_time = time.time()
total_time = end_time - start_time
print('Took %d mins %d secs' % (divmod(total_time, 60)))

Starting iteration 0
Mode 0 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 1 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 2 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
reconstruction error=0.9370602016434976
Starting iteration 1
Mode 0 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 1 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 2 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
reconstruction error=0.802866953157342, variation=0.13419324848615555.
Starting iteration 2
Mode 0 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 1 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
Mode 2 of 3
 Rank 0 of 5
 Rank 1 of 5
 Rank 2 of 5
 Rank 3 of 5
 Rank 4 of 5
reconstruction error=0.6595522697003181, variation=0.14331468345702392.
Starting iteration 3
Mode 0 of 3
 Rank 0 of 5
 Rank 1 of 5
 ...

Let's look at one of the nonzero values of the original tensor.

In [20]:
tensor.coords.T[0]

array([  6, 154,  10])

In [19]:
orig_val = tensor[tuple(tensor.coords.T[0])]
orig_val

0.07592262228073125

See [sparse_demo.ipynb](sparse_demo.ipynb) for how to calculate individual values from the factors. Note that we do not compare the entire tensor, because the reconstruction would be dense and its memory usage would be prohibitive.

In [21]:
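idx = tuple(tensor.coords.T[0])                                 # index of the first stored nonzero
rows = [factors[mode][i] for mode, i in enumerate(idx)]         # one factor row per mode
computed_val = np.sum(np.prod(sparse.stack(rows, 0), 0))        # multiply across modes, sum over rank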
computed_val

0.07600645353166938
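
The expression above evaluates a single entry of the reconstructed tensor: it takes one row from each factor, multiplies the rows element-wise, and sums the result over the rank. A dependency-free sketch of the same formula follows, using a hypothetical helper `kruskal_entry` and made-up toy rows rather than the notebook's factors.

```python
import math

def kruskal_entry(factor_rows):
    """Entry of a Kruskal tensor, given one row per mode (each of length rank)."""
    rank = len(factor_rows[0])
    # For each component r, multiply the r-th value of every row, then sum over r.
    return sum(math.prod(row[r] for row in factor_rows) for r in range(rank))

# Rows A[i], B[j], C[k] for some index (i, j, k) of a rank-2 tensor.
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
value = kruskal_entry(rows)
print(value)  # 1*3*5 + 2*4*6 = 63.0
```

Evaluating entries one at a time this way costs only a handful of multiplications per entry, so spot checks stay cheap even when the full reconstruction would not fit in memory.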

In [23]:
np.abs(orig_val - computed_val)

8.383125093812394e-05
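
To judge whether this absolute difference is small, it helps to express it relative to the original value. The snippet below simply reuses the two numbers printed above.

```python
orig_val = 0.07592262228073125      # entry of the original tensor
computed_val = 0.07600645353166938  # same entry rebuilt from the factors

abs_err = abs(orig_val - computed_val)
rel_err = abs_err / abs(orig_val)
print(f"{rel_err:.2%}")  # 0.11%
```

A relative error of about 0.1% on this entry suggests that the factorization reproduces it closely, although a single entry says nothing about the rest of the tensor.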

In [24]:
[f.density for f in factors]

[1.0, 1.0, 1.0]