## UMPCA Implementation with TensorLy
---
### The UMPCA Objective

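A set of tensor objects $ \{X_{1},X_{2},\ldots,X_{M}\} $ is available for training, where each sample $ X_{m}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}},\ m=1,2,\ldots,M $. The UMPCA objective is to find a tensor-to-vector projection (TVP) consisting of **$ P $ elementary multilinear projections (EMPs)** that maps each sample to a vector $ y_{m}\in\mathbb{R}^{P} $ so that: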

* the variance of the projected samples, $ S_{T_{p}}=\sum_{m=1}^{M}(y_{m_{p}}-\overline{y}_{p})^{2} $, is maximized in each of the $ P $ directions,
* the projection vectors are of unit length, and
* the feature (coordinate) vectors $ g_{p}\in\mathbb{R}^{M},\ p=1,2,\ldots,P $, whose $ m $-th element is $ y_{m_{p}} $, are all uncorrelated (Eq. 6.57 in [1]).
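
The sketch below shows what a single EMP does. It contracts each sample with one unit vector per mode, which yields one scalar $ y_{m_{p}} $ per sample. The $ M $ scalars together form the coordinate vector $ g_{p} $. This is a minimal NumPy illustration under assumptions, not the notebook's implementation (which uses TensorLy). The helper names `emp_project` and `total_scatter` and the toy data sizes are hypothetical.

```python
import numpy as np

def emp_project(samples, vectors):
    """Project every sample with one EMP {u^(1), ..., u^(N)} (hypothetical helper).

    samples: array of shape (I_1, ..., I_N, M); the last axis indexes samples.
    vectors: list of N unit-length 1D arrays, vectors[n] has length I_n.
    Returns the coordinate vector g_p of shape (M,), with g_p[m] = y_{m_p}.
    """
    g = samples
    # Contract the leading mode each time; the next sample mode then moves to axis 0
    for u in vectors:
        g = np.tensordot(u, g, axes=(0, 0))
    return g

def total_scatter(g):
    """S_{T_p}: sum of squared deviations of the projected samples from their mean."""
    return np.sum((g - g.mean()) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 10))            # toy data: M = 10 samples of size 4 x 3
u1 = rng.standard_normal(4); u1 /= np.linalg.norm(u1)
u2 = rng.standard_normal(3); u2 /= np.linalg.norm(u2)
g_p = emp_project(X, [u1, u2])                 # for N = 2: y_{m_p} = u1^T X_m u2
print(g_p.shape, total_scatter(g_p))
```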

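In short, for the $ p $-th EMP, with $ y_{m_{p}}=X_{m}\times_{1}u_{p}^{(1)^{T}}\times_{2}u_{p}^{(2)^{T}}\cdots\times_{N}u_{p}^{(N)^{T}} $ and the data centered (so that $ \overline{y}_{p}=0 $):
$$ \{u_{p}^{(1)},u_{p}^{(2)},\ldots,u_{p}^{(N)}\}=\arg\max_{u_{p}^{(1)},u_{p}^{(2)},\ldots,u_{p}^{(N)}}\sum_{m=1}^{M}y_{m_{p}}^{2} $$
subject to:
$$ u_{p}^{(n)^{T}}u_{p}^{(n)}=1,\quad\frac{g_{p}^{T}g_{q}}{\lVert g_{p}\rVert\,\lVert g_{q}\rVert}=\delta_{pq}=\begin{cases}
1 & p=q\\
0 & \text{otherwise}
\end{cases},\quad p,q=1,\ldots,P $$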
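
Because the data is centered, the normalized inner product in the constraint equals the Pearson correlation between two feature vectors. The hypothetical helper `normalized_correlation` below evaluates it for every pair of columns of a feature matrix. Columns that satisfy the constraint give the identity matrix. The orthogonalized random features are only a stand-in for real UMPCA output.

```python
import numpy as np

def normalized_correlation(G):
    """Matrix of g_p^T g_q / (||g_p|| ||g_q||) for the columns g_p of G (shape M x P)."""
    norms = np.linalg.norm(G, axis=0)
    return (G.T @ G) / np.outer(norms, norms)

rng = np.random.default_rng(1)
G = rng.standard_normal((50, 3))
G -= G.mean(axis=0)            # centered features, as assumed in the objective
Q, _ = np.linalg.qr(G)         # orthonormal columns that are still centered
C = normalized_correlation(Q)  # should be (numerically) the identity matrix
print(np.round(C, 6))
```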


### Algorithm

**Input:** A set of tensor objects $ \{X_{1},X_{2},\ldots,X_{M}\} $, $ X_{m}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}},\ m=1,2,\ldots,M $, the number of EMPs $ P $ and the number of iterations $ K $

**Output:** The TVP $ \{u_{p}^{(n)}\},\ p=1,2,\ldots,P,\ n=1,2,\ldots,N $ that maximizes the captured variance while producing uncorrelated features.

    1. Center the data
    2. Initialize the projection vectors (uniform)
    3. UMPCA loop
        for p=1:P
            for k=1:K
                for n=1:N
                    3a. Calculate the n-mode partial projection of every sample
                    3b. Calculate the n-mode total scatter matrix and, for p > 1, Psi_p(n) from the coordinate vectors g_1, ..., g_(p-1) of the previous p-1 EMPs
                    3c. Set u_p(n) to the (normalized) eigenvector corresponding to the largest eigenvalue of Psi_p(n) @ (n-mode total scatter), with Psi_1(n) = I
            3d. Update the coordinate vector g_p (i.e. the p-th projected feature of every sample)
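
The loop above can be sketched compactly for second-order samples ($ N=2 $, e.g. images). This is a minimal NumPy illustration under stated assumptions, not the notebook's class. `umpca_2d` is a hypothetical name. The partial projections use `np.einsum` instead of TensorLy. Step 3c uses `np.linalg.eig` in place of the power method. The uncorrelatedness constraint is enforced through the matrix $ \Psi $ of [2] (Eqs. 14 to 16), the same quantity the notebook code calls `Psi_p`.

```python
import numpy as np

def umpca_2d(X, P, K=10):
    """Minimal UMPCA sketch for second-order samples.

    X: array of shape (I1, I2, M), samples on the last axis.
    Returns U, a nested list with U[n][p] the unit projection vector of mode n
    for the p-th EMP, and G, the (M x P) matrix of coordinate vectors g_p.
    """
    I1, I2, M = X.shape
    X = X - X.mean(axis=2, keepdims=True)                      # 1. center the data
    U = [[np.ones(I) / np.sqrt(I) for _ in range(P)] for I in (I1, I2)]  # 2. uniform init
    G = np.zeros((M, 0))                                       # no features yet
    for p in range(P):                                         # 3. UMPCA loop
        for _ in range(K):
            for n in range(2):
                # 3a. n-mode partial projection: project the other mode -> (I_n x M)
                if n == 0:
                    Y = np.einsum('ijm,j->im', X, U[1][p])
                else:
                    Y = np.einsum('ijm,i->jm', X, U[0][p])
                S = Y @ Y.T                                    # 3b. n-mode total scatter
                if p > 0:
                    # Psi = I - Y G (G^T Y^T Y G)^{-1} G^T Y^T projects out the
                    # directions that would correlate with the previous p features
                    YG = Y @ G
                    Psi = np.eye(Y.shape[0]) - YG @ np.linalg.solve(YG.T @ YG, YG.T)
                    S = Psi @ S
                # 3c. eigenvector of the largest eigenvalue (eig: Psi @ S is not symmetric)
                vals, vecs = np.linalg.eig(S)
                u = np.real(vecs[:, np.argmax(np.real(vals))])
                u = u / np.linalg.norm(u)
                U[n][p] = u if u[0] >= 0 else -u               # same sign convention as the class
        # 3d. coordinate vector g_p: project both modes of every sample
        g = np.einsum('ijm,i,j->m', X, U[0][p], U[1][p])
        G = np.column_stack([G, g])
    return U, G

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 40))
U, G = umpca_2d(X, P=3, K=5)
```

The last update of each EMP picks a vector in the range of $ \Psi $, which is orthogonal to the columns of $ \tilde{Y}G $. That orthogonality makes the new feature $ g_{p} $ orthogonal, and hence (for centered data) uncorrelated, to the previous ones.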

---
#### References
[1] Haiping Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, *Multilinear Subspace Learning: Dimensionality Reduction of Multidimensional Data*, Chapman & Hall/CRC Press Machine Learning and Pattern Recognition Series, Taylor and Francis, 2013. ISBN: 978-1-4398572-4-3.

[2] Haiping Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, "Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning", *IEEE Transactions on Neural Networks*, vol. 20, no. 11, pp. 1820-1836, Nov. 2009.

In [1]:
# Import necessary libraries

import tensorly as tl
from scipy.io import loadmat
import numpy as np


In [2]:
# Import dataset

FERETC70A15S8 = tl.tensor(loadmat('FERETC70A15S8_80x80.mat')['fea2D'],dtype=np.double)
print(FERETC70A15S8.shape)

(80, 80, 721)
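
`fit()` treats the last axis as the sample index, so the FERET tensor above is laid out as $ (I_1, I_2, M) = (80, 80, 721) $. Some datasets are instead stored sample-first, as $ (M, I_1, I_2) $. For those, `np.moveaxis` can bring the data into the expected layout. The random array below is a hypothetical stand-in, not FERET data.

```python
import numpy as np

rng = np.random.default_rng(0)
faces_sample_first = rng.random((721, 80, 80))   # hypothetical (M, I1, I2) layout
faces = np.moveaxis(faces_sample_first, 0, -1)   # -> (I1, I2, M), samples on the last axis
print(faces.shape)                               # (80, 80, 721)
```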


In [52]:
class myUMPCA():
    '''
    An object-oriented implementation of UMPCA (Uncorrelated Multilinear Principal Component Analysis).
    
    Methods
    -------
    __init__(): Initializes the myUMPCA object and validates the user-supplied parameters.
    power_method_(): Mainly for internal use. Applies the power method to a given matrix to estimate its
    largest eigenvalue and the corresponding eigenvector (also returns the number of iterations taken).
    fit(): Performs UMPCA on the given samples.
    transform(): Returns the projected training data computed by fit().
    
    Attributes
    ----------
    self.maxK: Number of iterations per EMP (user-defined)
    self.numP: Number of EMPs per TVP, i.e. number of extracted features (user-defined)
    self.sample_modes: Number of modes of the whole dataset tensor (N+1)
    self.sample_dims: Number of modes of a single sample (N)
    self.samples_no: Number of samples (M)
    self.samples_mean: Mean tensor of the training samples
    self.projection_matrices: Nested list (N x P) containing the projection vectors
    self.projected_data: Coordinate vectors/projected data, shape: (no_of_samples, P)
    '''
    
    def __init__(self, numP, maxK):
        '''
        numP: The number of EMPs per TVP (i.e. the number of extracted features).
        maxK: The number of iterations per EMP.
        '''

        # Error handling on input: both parameters must be positive integers

        for name, value in (('numP', numP), ('maxK', maxK)):

            try:
                value = int(value)
            except (TypeError, ValueError):
                raise ValueError(f'{name} has to be a positive integer.') from None

            if value <= 0:
                raise ValueError(f'{name} has to be a positive integer.')

        self.numP = int(numP)
        self.maxK = int(maxK)
        
        
    def power_method_(self, A):
        '''
        Returns the largest eigenvalue of A, the corresponding unit eigenvector
        and the number of iterations taken.
        '''
        # Start from the first canonical basis vector
        x = np.zeros((A.shape[0], 1))
        x[0] = 1

        l_new = 1
        l_old = 0

        maxItr = 300
        itr = 0
        tol = 2.2204e-16  # approximately machine epsilon, np.finfo(float).eps

        while abs(l_old - l_new) > tol and itr < maxItr:

            l_old = l_new

            z = A @ x

            x = z / np.linalg.norm(z)

            # Rayleigh quotient: current estimate of the largest eigenvalue
            l_new = (np.transpose(x) @ A @ x).item()

            itr = itr + 1

        return l_new, x, itr
        
    def fit(self, samples):
        '''
        Performs UMPCA on the given samples.
        samples: Tensor with N+1 modes; the first N modes are the modes of each
                 sample and the last mode indexes the M samples. The input is not
                 modified (centering is done on a copy).
        '''
        
        # Gather and store basic info on the training data
    
        self.sample_modes = len(samples.shape)
    
        self.sample_dims = len(samples.shape) - 1

        self.samples_no = samples.shape[-1]
        
        # Check that numP uncorrelated features can exist (Corollary 1 in [2])
        
        if self.numP > min(min(samples.shape[:-1]), self.samples_no):

            raise ValueError('numP must be less than or equal to min{min(I_n), M}.')
        
        ########################################################
        # 1. Center the data
        ########################################################
        
        # Find the mean tensor (average over the sample mode)
        
        self.samples_mean = tl.mean(samples, axis=self.sample_dims)
        
        # Center the data; this creates a new array, so the caller's tensor is left unchanged

        samples = samples - self.samples_mean[..., np.newaxis]

        ########################################################
        # 2. Projection matrix initialization
        ########################################################
        
        # Create the nested list that will hold the projection vectors
        # Dimensions: N x P; each entry is an (I_n x 1) column vector,
        # initialized to the normalized all-ones (uniform) vector
        
        self.projection_matrices = [
            [np.ones((samples.shape[n], 1)) / np.linalg.norm(np.ones((samples.shape[n], 1))) for p in range(self.numP)]
            for n in range(self.sample_dims)
        ]
                
        ########################################################
        # 3. UMPCA Loop
        ########################################################
                
        for p in range(self.numP):  # step p: calculate the p-th EMP
                
            for k in range(self.maxK):
                
                for n in range(self.sample_dims):
                    
                    # Calculate the n-mode partial projection: project every mode
                    # except the current one (n) with the p-th projection vectors
                    
                    projection_modes = [m for m in range(self.sample_dims) if m != n]
                    
                    # Build a plain list: an np.array of vectors with different lengths I_n
                    # is ragged and is rejected by recent NumPy versions
                    
                    projection_matrices2use = [self.projection_matrices[m][p] for m in projection_modes]
                    
                    partial_projection = tl.tenalg.multi_mode_dot(samples, projection_matrices2use, modes=projection_modes, transpose=True)
                    
                    # Every projected mode now has size 1: reshape to an (I_n x M) matrix
                    # (works for any N, unlike squeezing a fixed axis)
                    
                    partial_projection = tl.reshape(partial_projection, (samples.shape[n], self.samples_no))
                                                
                    # Compute scatter matrix
            
                    scatter_tensor = np.zeros((samples.shape[n],samples.shape[n]))

                    for m in range(self.samples_no): # Eq. 6.65 [1]

                        Xm = partial_projection[...,m].reshape(partial_projection[...,m].shape[0],1) # Get m-th sample

                        scatter_tensor = scatter_tensor + Xm @ np.transpose(Xm)
                                          
                    # Update projection vectors
                    
                    if p > 0:
                        
                        # Eq 16. [2]
                        
                        Phi = np.transpose(Gps) @ np.transpose(partial_projection) @ partial_projection @ Gps 
                        
                        # Eq 15. [2]
                   
                        Psi_p = np.eye(samples.shape[n],samples.shape[n]) - partial_projection @ Gps @ np.linalg.inv(Phi) @ np.transpose(Gps) @ np.transpose(partial_projection)
                        
                        # Eq 14. [2]
                        
                        ST = Psi_p @ scatter_tensor
                        
                        largest_eigenvalue, respective_eigenvector, itr = self.power_method_(ST)
                        
                    else:
            
                        # This is the first EMP: there are no correlation constraints yet
                        
                        largest_eigenvalue, respective_eigenvector, itr = self.power_method_(scatter_tensor)
                        
                    # In order to get consistent results, force the first component of
                    # each eigenvector to be positive

                    if respective_eigenvector[0] < 0.0:

                        respective_eigenvector = respective_eigenvector * (-1)

                    # Normalize and save eigenvector
            
                    self.projection_matrices[n][p] = np.array(respective_eigenvector / np.linalg.norm(respective_eigenvector),copy=True)
            
            # Update the coordinate/feature vector g_p by projecting all N modes
        
            projection_matrices2use = [self.projection_matrices[n][p] for n in range(self.sample_dims)]
            
            gp = tl.tenalg.multi_mode_dot(samples, projection_matrices2use, modes=list(range(self.sample_dims)), transpose=True)
            
            # Every sample mode now has size 1: reshape to an (M x 1) column vector
            
            gp = tl.reshape(gp, (self.samples_no, 1))
            
            if p == 0:
                
                # Gps: Coordinate vectors/projected data, shape: (no_of_samples,P)
        
                Gps = gp
                
            else:

                Gps = np.append(Gps,gp,axis=1)
                
        # Store projected data to an attribute for easier access
            
        self.projected_data = Gps
        
    def transform(self):
        '''
        Returns projected data.
        '''

        return self.projected_data
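
`power_method_()` is plain power iteration with a Rayleigh-quotient estimate of the eigenvalue. Below is a standalone copy that works on 1D vectors. It is checked against a symmetric matrix with a known spectrum. The name `power_method` and the test matrix are assumptions for illustration. One caveat: the absolute tolerance of about $ 2.2\times10^{-16} $ is below the floating-point spacing of large eigenvalues, such as those of image scatter matrices. The loop may therefore run up to `maxItr` iterations. A relative tolerance would be one possible alternative.

```python
import numpy as np

def power_method(A, max_iter=300, tol=2.2204e-16):
    """Power iteration mirroring power_method_(), but on 1D vectors.

    Returns (eigenvalue estimate, unit eigenvector estimate, iterations used).
    """
    x = np.zeros(A.shape[0])
    x[0] = 1.0                          # same start vector as power_method_()
    l_old, l_new = 0.0, 1.0
    itr = 0
    while abs(l_old - l_new) > tol and itr < max_iter:
        l_old = l_new
        z = A @ x
        x = z / np.linalg.norm(z)       # renormalize to avoid overflow/underflow
        l_new = x @ A @ x               # Rayleigh quotient estimate of the eigenvalue
        itr += 1
    return l_new, x, itr

# Symmetric test matrix with a known spectrum (largest eigenvalue 10)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))      # random orthonormal basis
eigvals = np.array([10.0, 5.0, 3.0, 2.0, 1.0, 0.5])
S = Q @ np.diag(eigvals) @ Q.T
lam, v, itr = power_method(S)
print(lam, itr)
```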


---
## Examples

In [53]:
# Example 1

# Create a UMPCA object and fit the dataset

umpca = myUMPCA(numP=5,maxK=10)
umpca.fit(FERETC70A15S8)

# Print size of projected data

print(f'Initial dataset dimensions: {FERETC70A15S8.shape}')
print(f'Projected dataset dimensions: {umpca.projected_data.shape}')

# Print last 5 samples

projection = umpca.transform()

print('\nLast 5 samples of dataset\n')

for row in projection[-5:]:
    print(row)

Initial dataset dimensions: (80, 80, 721)
Projected dataset dimensions: (721, 5)

Last 5 samples of dataset

[ 429.85902011  246.07388462  412.41512923  247.22049476 -300.41639218]
[ 747.83984431 -312.46753243 -105.17760367   42.55406415 -244.97867826]
[ 852.00691257 -467.09885636 -524.98453317  193.00136183 -400.0652324 ]
[ 172.9769887  -467.69845555 -197.95613391 -264.63551908  170.85526663]
[ 185.86222407 -578.19332358  -53.62526958   69.64092645 -306.2309778 ]
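
`transform()` only returns the features of the training samples. To map new samples, the stored `samples_mean` and `projection_matrices` attributes can be applied directly. The helper `project_new_samples` below is hypothetical and not part of `myUMPCA`. It assumes the new samples use the same layout as the training data, with samples on the last axis. The random "fitted" quantities in the toy check are placeholders, not FERET results.

```python
import numpy as np

def project_new_samples(X_new, samples_mean, projection_matrices):
    """Hypothetical helper (not part of myUMPCA): map new samples through a fitted TVP.

    X_new: array (I_1, ..., I_N, M_new) with samples on the last axis
    samples_mean: array (I_1, ..., I_N), the mean of the TRAINING samples
    projection_matrices: nested list (N x P) of (I_n x 1) column vectors
    Returns the features as an (M_new x P) array.
    """
    Xc = X_new - samples_mean[..., np.newaxis]        # center with the training mean
    N, P = len(projection_matrices), len(projection_matrices[0])
    features = np.empty((Xc.shape[-1], P))
    for p in range(P):
        g = Xc
        for n in range(N):
            # Contract the leading remaining sample mode with u_p^(n)
            g = np.tensordot(projection_matrices[n][p][:, 0], g, axes=(0, 0))
        features[:, p] = g
    return features

# Toy check with random placeholder quantities
rng = np.random.default_rng(0)
mean = rng.standard_normal((4, 3))
U = [[rng.standard_normal((I, 1)) for _ in range(2)] for I in (4, 3)]
X_new = rng.standard_normal((4, 3, 7))
F = project_new_samples(X_new, mean, U)
```

With the fitted object from Example 1, the call would be `project_new_samples(X_new, umpca.samples_mean, umpca.projection_matrices)`.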
