# Introduction to HOTTBOX: core components
[Return to Table of Contents](./0_Table_of_contents.ipynb)

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
# Basic imports
import numpy as np

# Report the installed HOTTBOX version
from hottbox import __version__ as hottbox_version
print("HOTTBOX version: {}".format(hottbox_version))

HOTTBOX version: 0.3.1


<img src="./imgs/different-tensors.png" alt="Drawing" style="width: 600px;"/>

A tensor is a multi-dimensional array of data where each dimension is conventionally referred to as a **mode** and is associated with a particular characteristic/property of the data at hand. The **order** of a tensor is the number of its modes, which is equivalent to the number of indices required to identify a particular entry of the multi-dimensional array.

Creating a tensor starts with forming a multi-dimensional array of data. For ease of visualisation and compact notation, let's consider a third order tensor $\mathbf{\underline{X}} \in \mathbb{R}^{I \times J \times K}$. An element of such a tensor can be written in a general form as:

$$ x_{ijk} = \mathbf{\underline{X}}[i, j, k]$$

**Note:** To be consistent with Python indexing, the numbering of modes and of the elements within each mode starts from zero.
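As a small illustration of the zero-based convention, the sketch below uses plain NumPy only (it does not touch the HOTTBOX API). The variable names are illustrative, and the closed-form index check relies on the data being filled with `np.arange` in row-major order.

```python
import numpy as np

I, J, K = 2, 3, 4
x = np.arange(I * J * K).reshape(I, J, K)

# Indices run from 0 to I-1, J-1 and K-1 respectively
first = x[0, 0, 0]
last = x[I - 1, J - 1, K - 1]
print(first, last)  # 0 23

# Three indices (one per mode) identify a single entry of a third order tensor.
# For this row-major arange data, x[i, j, k] == i*J*K + j*K + k.
i, j, k = 1, 2, 3
entry = x[i, j, k]
print(entry == i * J * K + j * K + k)  # True
```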

In [3]:
from hottbox.core import Tensor

# Create 3-d array of data
array_3d = np.arange(24).reshape((2, 3, 4))

# Create tensor
tensor = Tensor(array_3d)

# Result preview
print(tensor.data)
print()
print(tensor)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.


Now you can use the top-level API for the conventional tensor properties and operations (e.g. order, unfolding, mode-n product).

In [4]:
print('This tensor is of order {}.'.format(tensor.order))
print('The sizes of its modes are {} respectively.'.format(tensor.shape))
print('It consists of {} elements.'.format(tensor.size))
print('Its Frobenius norm = {:.2f}'.format(tensor.frob_norm))

This tensor is of order 3.
The sizes of its modes are (2, 3, 4) respectively.
It consists of 24 elements.
Its Frobenius norm = 65.76
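
For reference, the same quantities can be computed directly from the underlying array. This is a plain NumPy sketch of the conventional definitions, not a description of how HOTTBOX computes them internally.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

order = x.ndim    # number of modes
shape = x.shape   # size of each mode
size = x.size     # total number of elements

# Frobenius norm: square root of the sum of squared entries
frob_norm = np.sqrt(np.sum(x.astype(float) ** 2))

print(order, shape, size)          # 3 (2, 3, 4) 24
print("{:.2f}".format(frob_norm))  # 65.76
```

With no `ord` or `axis` given, `np.linalg.norm(x)` returns the 2-norm of the flattened array, which is the same value.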


Future plans include extending the API with additional functionality for easier integration with other libraries for exploratory data analysis (EDA) (see [this repository](https://github.com/hottbox/hottbox-tutorials)).

# Transformations and representations

N-dimensional arrays of data can be represented in several forms. By applying numerical methods (tensor decomposition algorithms) to the raw data we can obtain, for example, a Kruskal or Tucker representation. At the same time, simple rearrangements of the raw data (e.g. folding, unfolding) also yield different representations.

<img src="./imgs/different-forms-of-data.png" alt="Drawing" style="width: 500px;"/>


## Unfolding and mode-n product

Conventionally, unfolding is a mapping of elements from a tensor to a matrix. In other words, it arranges the mode-$n$ fibers of a tensor to be the columns of a matrix. The mode-$n$ unfolding is denoted as:

$$\mathbf{\underline{A}} \xrightarrow{n} \mathbf{A}_{(n)}$$

This operation therefore requires specifying the mode along which the tensor will be unfolded. For a third order tensor, such an operation can be visualised as follows:

<img src="./imgs/unfolding.png" alt="Drawing" style="width: 600px;"/>

**Note:** unfolding a tensor, $\mathbf{\underline{X}} \in \mathbb{R}^{I \times I \times I}$, along different modes is not equivalent to permuting the rows of the unfolded matrix:

$$\mathbf{X}_{(n)} \neq \mathbf{P}\mathbf{X}_{(m)}$$
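
A quick numerical check of this note, as a plain NumPy sketch. It reads $\mathbf{P}$ as a row permutation matrix, and the helper `unfold` is an illustrative function written for this example, not the HOTTBOX method.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: rows index the chosen mode, columns are mode-n fibers."""
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1))

x = np.arange(27).reshape(3, 3, 3)
x0 = unfold(x, 0)
x1 = unfold(x, 1)

# Both unfoldings hold exactly the same 27 entries ...
same_entries = sorted(x0.ravel().tolist()) == sorted(x1.ravel().tolist())

# ... but a row permutation keeps the content of every row intact, so it can
# only map x1 to x0 if both matrices consist of the same rows. Sorting within
# each row makes the comparison insensitive to column order as well.
rows0 = {tuple(sorted(row.tolist())) for row in x0}
rows1 = {tuple(sorted(row.tolist())) for row in x1}

print(same_entries)    # True
print(rows0 == rows1)  # False
```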

In [5]:
from hottbox.core import Tensor

# Create 3-d array of data
array_3d = np.arange(24).reshape((2, 3, 4))

# Create tensor
tensor = Tensor(array_3d)

# Unfold tensor along the third mode (mode-2); this modifies the tensor in place
tensor.unfold(mode=2)

# Result preview
print(tensor)
tensor.data

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (4, 6) and ['mode-2', 'mode-0_mode-1'] respectively.


array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])
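
The same rearrangement can be sketched in plain NumPy. The helpers `unfold` and `fold` below are illustrative names written for this example; they reproduce the matrix printed above for this tensor, but they are not taken from the HOTTBOX source.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: rows index the chosen mode, columns are mode-n fibers."""
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1))

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor with the given full shape."""
    moved_shape = (shape[mode],) + tuple(s for i, s in enumerate(shape) if i != mode)
    return np.moveaxis(np.reshape(matrix, moved_shape), 0, mode)

x = np.arange(24).reshape(2, 3, 4)
x_2 = unfold(x, mode=2)
print(x_2.shape)  # (4, 6)
print(x_2)

# The first column is the mode-2 fiber x[0, 0, :]
print(np.array_equal(x_2[:, 0], x[0, 0, :]))  # True

# Folding along the same mode restores the original array
print(np.array_equal(fold(x_2, 2, x.shape), x))  # True
```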

<img src="./imgs/mode_n_product.png" alt="Drawing" style="width: 600px;"/>

The mode-$n$ product is a multiplication of a tensor by a matrix along the $n^{th}$ mode of the tensor. This essentially means that each mode-$n$ fiber is multiplied by this matrix. Mathematically, this can be expressed as:

$$\mathbf{\underline{X}} \times_n \mathbf{A} = \mathbf{\underline{Y}} \quad \Leftrightarrow  \quad \mathbf{Y}_{(n)} = \mathbf{A} \mathbf{X}_{(n)}  $$

This is equivalent to projection of a tensor unfolded along a certain mode onto the space spanned by a matrix.

In [6]:
from hottbox.core import Tensor

I, J, K = 2, 3, 4
mode_n_new_size = 5

# Create tensor
array_3d = np.arange(I * J * K).reshape(I, J, K)
X = Tensor(array_3d)

for mode_n, mode_n_size in enumerate(X.shape):    
    # Create matrix
    A = np.ones((mode_n_new_size, mode_n_size))

    # Perform Mode-n product 
    Y = X.mode_n_product(A, mode=mode_n, inplace=False)

    # Preview of resulting tensor
    print("\tResult of Mode-{} product:\n{}\n".format(mode_n, Y))    


	Result of Mode-0 product:
This tensor is of order 3 and consists of 60 elements.
Sizes and names of its modes are (5, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	Result of Mode-1 product:
This tensor is of order 3 and consists of 40 elements.
Sizes and names of its modes are (2, 5, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	Result of Mode-2 product:
This tensor is of order 3 and consists of 30 elements.
Sizes and names of its modes are (2, 3, 5) and ['mode-0', 'mode-1', 'mode-2'] respectively.
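
The defining identity $\mathbf{Y}_{(n)} = \mathbf{A} \mathbf{X}_{(n)}$ can be checked with a short NumPy sketch. The functions `unfold` and `mode_n_product` below are illustrative helpers written for this example and are assumptions, not the HOTTBOX implementation.

```python
import numpy as np

def unfold(x, mode):
    """Mode-n unfolding: rows index the chosen mode, columns are mode-n fibers."""
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1))

def mode_n_product(x, a, mode):
    """Multiply tensor `x` by matrix `a` along the given mode."""
    # Contract the columns of `a` with the chosen mode of `x`. The new axis
    # comes out first, so it is moved back to position `mode`.
    return np.moveaxis(np.tensordot(a, x, axes=(1, mode)), 0, mode)

x = np.arange(24, dtype=float).reshape(2, 3, 4)
rng = np.random.default_rng(0)

shapes = []
identity_holds = []
for mode, size in enumerate(x.shape):
    a = rng.standard_normal((5, size))
    y = mode_n_product(x, a, mode)
    shapes.append(y.shape)
    # Y_(n) = A X_(n)
    identity_holds.append(np.allclose(unfold(y, mode), a @ unfold(x, mode)))

print(shapes)          # [(5, 3, 4), (2, 5, 4), (2, 3, 5)]
print(identity_holds)  # [True, True, True]
```

Only the size of the multiplied mode changes, which matches the shapes reported in the output above.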



### Properties of mode-n product

1. For distinct modes in a series of multiplications, the order of the multiplications is irrelevant:

    $$\mathbf{\underline{X}} \times_n \mathbf{A} \times_m \mathbf{B} = \mathbf{\underline{X}} \times_m \mathbf{B} \times_n \mathbf{A} \quad (m \neq n)$$

1. However, this does not hold if the modes are the same. In that case the order matters and the two matrices combine into a single product:

    $$\mathbf{\underline{X}} \times_n \mathbf{A} \times_n \mathbf{B} = \mathbf{\underline{X}} \times_n (\mathbf{B}\mathbf{A})$$
    
1. Mode-n product of a tensor with the same matrix across different modes does not 
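
The first two properties above can be verified numerically. This is a plain NumPy sketch with an illustrative `mode_n_product` helper (an assumption for this example, not the HOTTBOX method) and randomly generated matrices.

```python
import numpy as np

def mode_n_product(x, a, mode):
    """Multiply tensor `x` by matrix `a` along the given mode."""
    return np.moveaxis(np.tensordot(a, x, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))

# Property 1: distinct modes, the order of multiplication does not matter
a = rng.standard_normal((5, 3))  # acts on mode-1
b = rng.standard_normal((6, 4))  # acts on mode-2
lhs = mode_n_product(mode_n_product(x, a, 1), b, 2)
rhs = mode_n_product(mode_n_product(x, b, 2), a, 1)
distinct_modes_commute = np.allclose(lhs, rhs)

# Property 2: same mode, the two matrices collapse into the product B A
b_same = rng.standard_normal((7, 5))  # applied to mode-1 after `a`
sequential = mode_n_product(mode_n_product(x, a, 1), b_same, 1)
combined = mode_n_product(x, b_same @ a, 1)
same_mode_combines = np.allclose(sequential, combined)

print(distinct_modes_commute, same_mode_combines)  # True True
```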

# Tensor decompositions and efficient representation of multidimensional data

<img src="./imgs/efficient_representations.png" alt="Drawing" style="width: 600px;"/>

There are three main forms for the efficient representation of multi-dimensional data. Each of them can either be obtained from the original tensor by applying the associated tensor decomposition algorithm or be constructed from scratch. For all representations and decomposition algorithms, we aim to keep the API as consistent as possible.

Here we will cover only the basics of the [**HOTTBOX**](https://github.com/hottbox/hottbox) API; for more information please visit [this page](https://github.com/hottbox/hottbox-tutorials).

### Decomposing original tensor

In [7]:
from hottbox.core import Tensor
from hottbox.algorithms.decomposition import CPD, HOSVD, TTSVD

# Define original tensor
I, J, K = 5, 6, 7
array_3d = np.random.randn(I, J, K)
tensor = Tensor(array_3d)

ranks = [
    (2,),       # rank of Kruskal representation
    (2, 3, 4),  # rank of Tucker representation
    (2, 3)      # rank of Tensor Train representation
]

# Initialise tensor decomposition algorithms
algorithms = [
    CPD(), 
    HOSVD(), 
    TTSVD()
]

# Compute different representations of the same tensor
tensor_representations = []
for i, alg in enumerate(algorithms):
    tensor_rep = alg.decompose(tensor, rank=ranks[i])
    tensor_representations.append(tensor_rep)

# Result preview
for representation in tensor_representations:    
    print("="*50)
    print(representation)
    print()

Kruskal representation of a tensor with rank=(2,).
Factor matrices represent properties: ['mode-0', 'mode-1', 'mode-2']
With corresponding latent components described by (5, 6, 7) features respectively.

Tucker representation of a tensor with multi-linear rank=(2, 3, 4).
Factor matrices represent properties: ['mode-0', 'mode-1', 'mode-2']
With corresponding latent components described by (5, 6, 7) features respectively.

Tensor train representation of a tensor with tt-rank=(2, 3).
Shape of this representation in the full format is (5, 6, 7).
Physical modes of its cores represent properties: ['mode-0', 'mode-1', 'mode-2']



### Construction from scratch

In [8]:
from hottbox.core import TensorCPD, TensorTKD, TensorTT

I, J, K = 5, 6, 7  # define shape of the tensor in full form

R_cpd = 2          # rank of Kruskal representation
Q, R, P = 2, 3, 4  # rank of Tucker representation
R_1, R_2 = 2, 3    # rank of Tensor Train representation

# Construct Kruskal representation
A = np.random.randn(I, R_cpd)
B = np.random.randn(J, R_cpd)
C = np.random.randn(K, R_cpd)
lambda_values = np.random.randn(R_cpd)
tensor_cpd = TensorCPD(fmat=[A, B, C], core_values=lambda_values)

# Construct Tucker representation
A = np.random.randn(I, Q)
B = np.random.randn(J, R)
C = np.random.randn(K, P)
core_tensor_values = np.random.randn(Q, R, P)
tensor_tkd = TensorTKD(fmat=[A, B, C], core_values=core_tensor_values)


# Construct Tensor Train representation
core_1_values = np.random.randn(I, R_1)
core_2_values = np.random.randn(R_1, J, R_2)
core_3_values = np.random.randn(R_2, K)
tensor_tt = TensorTT(core_values=[core_1_values, core_2_values, core_3_values])


tensor_representations = {
    "Kruskal" : tensor_cpd,
    "Tucker" : tensor_tkd,
    "Tensor Train" : tensor_tt
}
for name, representation in tensor_representations.items():
    print("="*50)
    print("Reconstruction of {} representation:".format(name))
    print(representation.reconstruct())
    print()

Reconstruction of Kruskal representation:
This tensor is of order 3 and consists of 210 elements.
Sizes and names of its modes are (5, 6, 7) and ['mode-0', 'mode-1', 'mode-2'] respectively.

Reconstruction of Tucker representation:
This tensor is of order 3 and consists of 210 elements.
Sizes and names of its modes are (5, 6, 7) and ['mode-0', 'mode-1', 'mode-2'] respectively.

Reconstruction of Tensor Train representation:
This tensor is of order 3 and consists of 210 elements.
Sizes and names of its modes are (5, 6, 7) and ['mode-0', 'mode-1', 'mode-2'] respectively.
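
The way each representation maps back to a full $5 \times 6 \times 7$ tensor can be written out with `np.einsum`. This is a NumPy sketch of the standard reconstruction formulas under the same shapes as above; the variable names are illustrative and it is not a description of what `reconstruct()` does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 5, 6, 7

# Kruskal: weighted sum of R_cpd rank-one terms
R_cpd = 2
A = rng.standard_normal((I, R_cpd))
B = rng.standard_normal((J, R_cpd))
C = rng.standard_normal((K, R_cpd))
lam = rng.standard_normal(R_cpd)
full_cpd = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

# Tucker: core tensor multiplied by one factor matrix per mode
Q, R, P = 2, 3, 4
A = rng.standard_normal((I, Q))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, P))
G = rng.standard_normal((Q, R, P))
full_tkd = np.einsum("qrp,iq,jr,kp->ijk", G, A, B, C)

# Tensor Train: chain of cores contracted over the shared tt-rank indices
R_1, R_2 = 2, 3
G1 = rng.standard_normal((I, R_1))
G2 = rng.standard_normal((R_1, J, R_2))
G3 = rng.standard_normal((R_2, K))
full_tt = np.einsum("ia,ajb,bk->ijk", G1, G2, G3)

print(full_cpd.shape, full_tkd.shape, full_tt.shape)  # (5, 6, 7) (5, 6, 7) (5, 6, 7)
```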

