<a href="https://colab.research.google.com/github/dnguyend/lagrange_rayleigh/blob/master/TwoLeftInverses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

$\newcommand{\bC}{\boldsymbol{C}}$
$\newcommand{\bF}{\boldsymbol{F}}$
$\newcommand{\bI}{\boldsymbol{I}}$
$\newcommand{\bJ}{\boldsymbol{J}}$
$\newcommand{\bJtw}{\boldsymbol{J}^{(2)}}$
$\newcommand{\bJthr}{\boldsymbol{J}^{(3)}}$
$\newcommand{\bx}{\boldsymbol{x}}$
$\newcommand{\bH}{\boldsymbol{H}}$
$\newcommand{\bL}{\boldsymbol{L}}$
$\newcommand{\bT}{\boldsymbol{T}}$
$\newcommand{\bLx}{\boldsymbol{L}_{\bx}}$
$\newcommand{\blbd}{\boldsymbol{\lambda}}$
$\newcommand{\cL}{\mathcal{L}}$
$\newcommand{\PibH}{\boldsymbol{\Pi}_{\bH}}$

# Two left inverses

We want to demonstrate that there is a lot of freedom in choosing the Rayleigh quotient and the projection operator.
For an equation of the form
$$\bL(\bx, \blbd)=\bF(\bx) - \bH(\bx)\blbd = 0$$
with a constraint $\bC(\bx) = 0$, we can pick one left inverse $\bH^-_1$ to compute $\blbd = \bH^-_1(\bx)\bF(\bx)$, and another left inverse $\bH^-_2$ in the Riemannian Newton equation for the tangent step $\eta$:
$$\PibH = \bI - \bH\bH^-_2$$
$$\PibH \bLx(\bx,\blbd)\eta = - \PibH\bL(\bx, \blbd)$$

**THE LEFT INVERSE RELATED TO LAMBDA CONTROLS THE RATE OF CONVERGENCE; THE LEFT INVERSE FOR THE PROJECTION DOES NOT CHANGE THE RATE OF CONVERGENCE.**

We can see this from the fact that only $\blbd$, and not the left inverse used for the projection, appears in the Schur form solution.
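
The reasoning behind this can be sketched as follows. The notation $\eta$ (the Newton step), $\delta$ (an auxiliary multiplier) and $\bC_{\bx}$ (the Jacobian of the constraint) is introduced here for illustration, and we assume that $\bLx$ and $\bC_{\bx}\bLx^{-1}\bH$ are invertible. Because $\bH^-_2\bH = \bI$, we have $\PibH\bH = 0$, and $\PibH v = 0$ forces $v = \bH\bH^-_2 v$, so the kernel of $\PibH$ is exactly the range of $\bH$. The projected equation $\PibH(\bLx\eta + \bL) = 0$ therefore holds precisely when $\bLx\eta + \bL = \bH\delta$ for some $\delta$. Imposing the tangency condition $\bC_{\bx}\eta = 0$ then determines $\delta$:

$$\eta = -\bLx^{-1}\bL + \bLx^{-1}\bH\delta, \qquad \delta = (\bC_{\bx}\bLx^{-1}\bH)^{-1}\bC_{\bx}\bLx^{-1}\bL$$

The left inverse $\bH^-_2$ has dropped out of this expression. The first left inverse still enters, but only through the value of $\blbd$ inside $\bL$ and $\bLx$.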

In the following, we consider the case $\bC(\bx) = \bx^T\bx-1$ (the unit sphere), with the function given by the tensor contraction $\bT(\bI,\bx,\cdots,\bx)$, and consider the eigentensor problem:
$$\bL(\bx, \blbd) = \bT(\bI,\bx,\cdots,\bx) - \bx\blbd =0$$
Let $A, B$ be two nondegenerate matrices, and let $a, b$ be two nonnegative integers.
For the Rayleigh quotient we consider the left inverse
$$\bH_1^-(\bx) = ((\bx^a)^T A\bx)^{-1}(\bx^a)^T A$$
where $\bx^a$ denotes the elementwise power, so that
$$\blbd = \bH_1^-(\bx) \bT(\bI,\bx,\cdots,\bx)$$
For the projection $\PibH$ we consider
$$\bH_2^-(\bx) = ((\bx^b)^T B\bx)^{-1}(\bx^b)^T B$$
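
To make these definitions concrete, here is a minimal NumPy sketch for an order-3 tensor. It does not use the notebook's `lagrange_rayleigh` helpers: the symmetric tensor is built by averaging over index permutations, and the helper `left_inverse` as well as the particular choices of $A$, $B$, $a$, $b$ are assumptions made for illustration (a diagonal positive $A$ with $a = 3$ and a symmetric positive definite $B$ with $b = 1$, so that both denominators are guaranteed to be positive).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Symmetric order-3 tensor: average a random tensor over all index permutations.
G = rng.standard_normal((n, n, n))
perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
T = sum(np.transpose(G, p) for p in perms) / 6.0

# A point on the unit sphere.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

# F(x) = T(I, x, x): contract the last two modes with x.
F = np.einsum('ijk,j,k->i', T, x, x)


def left_inverse(x, power, M):
    """Row vector ((x^power)^T M x)^{-1} (x^power)^T M, with an elementwise power."""
    row = (x ** power) @ M
    return row / (row @ x)


# Illustrative choices (assumptions, not the notebook's random draws).
A = np.diag(rng.uniform(1.0, 2.0, n))   # diagonal positive, used with a = 3
R = rng.standard_normal((n, n))
B = R @ R.T + n * np.eye(n)             # symmetric positive definite, used with b = 1

H1_inv = left_inverse(x, 3, A)
H2_inv = left_inverse(x, 1, B)

lbd = H1_inv @ F                        # Rayleigh quotient from the first left inverse
Pi = np.eye(n) - np.outer(x, H2_inv)    # projection I - H H_2^- with H = x

# Both row vectors are left inverses of x, and Pi is a projection that kills x.
print(np.isclose(H1_inv @ x, 1.0), np.isclose(H2_inv @ x, 1.0))
print(np.allclose(Pi @ x, 0.0), np.allclose(Pi @ Pi, Pi))
```

Any row vector $h$ with $h\bx = 1$ would serve as a left inverse here; the family above is just a convenient way to generate many of them from a matrix and an integer power.
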
We consider the Riemannian Newton algorithm with these two left inverses:


In [2]:
!git clone https://github.com/dnguyend/lagrange_rayleigh

Cloning into 'lagrange_rayleigh'...
remote: Enumerating objects: 217, done.
remote: Counting objects: 100% (217/217), done.
remote: Compressing objects: 100% (152/152), done.
remote: Total 217 (delta 113), reused 139 (delta 60), pack-reused 0
Receiving objects: 100% (217/217), 14.30 MiB | 16.89 MiB/s, done.
Resolving deltas: 100% (113/113), done.


In [0]:
import numpy as np
from numpy.linalg import norm, solve
from numpy import eye
from scipy.linalg import null_space
from lagrange_rayleigh.core import utils
from lagrange_rayleigh.core.eigen_tensor_solver import symmetric_tv_mode_product
    

def ortho_sphere_power(
        T, max_itr, delta, x_init=None, a=None,
        b=None, AA=None, BB=None, by_line=False):
    """Tangent form rayleigh with two different left inverses
    If a and b are odd integers, then (x^a).T @ x and (x^b).T @ x are
    positive (powers are elementwise).
    The left inverse for lambda is ((x^a).T AA x)^{-1} (x^a).T AA;
    the left inverse for the projection is ((x^b).T BB x)^{-1} (x^b).T BB.
    """
    def pw(x, a):
        if a == 0:
            return np.ones_like(x)
        ret = x.copy()
        for i in range(a-1):
            ret *= x
        return ret
        
    # get tensor dimensionality and order
    n_vec = T.shape
    m = len(n_vec)
    n = T.shape[0]
    R = 1

    if a is None:
        a = 1
    if b is None:
        b = 1
    if BB is None:
        BB = eye(n)
    if AA is None:
        AA = eye(n)

    converge = False

    # if not given as input, randomly initialize
    if x_init is None:
        x_init = np.random.randn(n)
        x_init = x_init/norm(x_init)

    # init lambda_(k) and x_(k)
    x_k = x_init / np.linalg.norm(x_init)
    T_x_m_2 = symmetric_tv_mode_product(T, x_k, m-2)
    T_x_m_1 = T_x_m_2 @ x_k
    # x_t_A_x = x_k.T @ x_k
    lbd = (pw(x_k.T, a) @ AA @ T_x_m_1) / (pw(x_k.T, a) @ AA @ x_k)
    ctr = 0

    while (R > delta) and (ctr < max_itr):
        # residual g(x_k) = T(I,x_k,...,x_k) - lbd * x_k
        g = -lbd * x_k + T_x_m_1
        # compute Hessian H(x_k)
        H = (m-1)*T_x_m_2-lbd*eye(n)
        # projection built from the second left inverse
        xB = pw(x_k, b).reshape(1, -1) @ BB
        PiB = np.eye(n) - x_k.reshape(-1, 1) @ xB.reshape(1, -1) /(xB @ x_k)
        # orthonormal bases: null space of xB (the range of PiB)
        # and the tangent space of the sphere at x_k
        U_x_k_b = null_space(xB)
        U_x_k = null_space(x_k.reshape(1, -1))
        H_p = U_x_k_b.T @ PiB @ H @ U_x_k
        # solve the projected Newton equation for the tangent step y
        y = U_x_k @ solve(H_p, -U_x_k_b.T @ PiB @ g)
        if by_line:
            print('y=%s' % str(y))
        x_k_n = (x_k + y)/(np.linalg.norm(x_k + y))

        #  update residual and lbd
        R = norm(x_k-x_k_n)
        x_k = x_k_n
        T_x_m_2 = symmetric_tv_mode_product(T, x_k, m-2)
        T_x_m_1 = T_x_m_2 @ x_k

        lbd = (pw(x_k.T, a) @ AA @ T_x_m_1) / (pw(x_k.T, a) @ AA @ x_k)
        # print('ctr=%d lbd=%f' % (ctr, lbd))
        ctr += 1
    x = x_k
    err = norm(symmetric_tv_mode_product(
        T, x, m-1) - lbd * x)

    if ctr < max_itr:
        converge = True

    return x, lbd, ctr, converge, err


We see that separate choices of the two left inverses still give a fast-converging algorithm:

In [43]:
  
  n = 10
  m = 3
  tol = 1e-10
  max_itr = 200
  np.random.seed(0)
  n_test = 10
  for i in range(n_test):
      a = np.random.randint(0, 5)
      b = np.random.randint(0, 5)
      x_init = np.random.randn(n)
      x_init /= np.linalg.norm(x_init)
      BB = utils.gen_random_symmetric_pos(n)
      AA = utils.gen_random_symmetric_pos(n)

      T = utils.generate_symmetric_tensor(n, m)
      x, lbd, ctr, converge, err = ortho_sphere_power(
          T, max_itr, tol, x_init, a=a,
          b=b, AA=AA, BB=BB)
      print('x=%s, lbd=%f, ctr=%d, converge=%d, err=%f' % (
          str(x), lbd, ctr, converge, err))


x=[ 0.29856281  0.24713928  0.36620796 -0.48561051  0.10320006 -0.6138202
 -0.02626536 -0.13007225 -0.27050474  0.04061554], lbd=1.529224, ctr=11, converge=1, err=0.000000
x=[0.34087415 0.27923109 0.33199413 0.31457023 0.2997952  0.30388138
 0.33195865 0.31484761 0.31845993 0.3220201 ], lbd=16.386986, ctr=31, converge=1, err=0.000000
x=[-0.02977056 -0.06175282 -0.00241597  0.51796464  0.32407133 -0.34131946
 -0.14831985 -0.65451882  0.2301945  -0.04585558], lbd=1.010379, ctr=17, converge=1, err=0.000000
x=[-0.00608114  0.59366995 -0.39779547 -0.34257716 -0.23540459 -0.06312969
  0.26614991  0.33923489 -0.08522748  0.34545358], lbd=2.383691, ctr=27, converge=1, err=0.000000
x=[-0.29404104 -0.32688864 -0.31055265 -0.31938101 -0.30648427 -0.30770691
 -0.28010165 -0.32443913 -0.31621864 -0.36865762], lbd=-15.564369, ctr=23, converge=1, err=0.000000
x=[ 0.39221132 -0.17581908  0.35237569  0.39380003  0.08897726 -0.21461213
  0.24978638 -0.28694705 -0.57487463  0.08260046], lbd=-3.094310, ct

## **We note that the left inverse for the projection has no effect on convergence: every choice of xB gives the same answer as the Schur form. There is a strong effect if we change A, the left inverse related to lambda.**
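
Before running the full iteration, the claim about the projection can be checked on a single Newton step. The sketch below is self-contained and only illustrative: `null_basis`, `projected_step` and `schur_step` are hypothetical helpers written for this check (they are not part of `lagrange_rayleigh`), the tensor is a small random symmetric one, and $\blbd$ is computed with $a = 1$, $A = \bI$. The projected step mirrors the linear solve inside `ortho_sphere_power`; the Schur step eliminates the multiplier directly and never sees `b` or `B`.

```python
import numpy as np


def null_basis(row):
    """Orthonormal basis (as columns) of the null space of a single row vector."""
    _, _, vh = np.linalg.svd(row.reshape(1, -1))
    return vh[1:].T


def projected_step(hess, g, x, b, B):
    """Tangent step from the projected Newton equation with left inverse (b, B)."""
    n = x.size
    xB = (x ** b) @ B
    Pi = np.eye(n) - np.outer(x, xB) / (xB @ x)
    U_b = null_basis(xB)   # spans the range of Pi
    U = null_basis(x)      # spans the tangent space of the sphere at x
    H_p = U_b.T @ Pi @ hess @ U
    return U @ np.linalg.solve(H_p, -U_b.T @ Pi @ g)


def schur_step(hess, g, x):
    """Tangent step in Schur form; no projection left inverse is involved."""
    hi_g = np.linalg.solve(hess, g)
    hi_x = np.linalg.solve(hess, x)
    return -hi_g + hi_x * (x @ hi_g) / (x @ hi_x)


rng = np.random.default_rng(1)
n, m = 5, 3

# Random symmetric order-3 tensor (stand-in for the notebook's generator).
G = rng.standard_normal((n, n, n))
perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
T = sum(np.transpose(G, p) for p in perms) / 6.0

x = rng.standard_normal(n)
x /= np.linalg.norm(x)

T_x_m_2 = np.einsum('ijk,k->ij', T, x)       # T(I, I, x)
T_x_m_1 = T_x_m_2 @ x                        # T(I, x, x)
lbd = x @ T_x_m_1                            # Rayleigh quotient with a = 1, A = I
g = T_x_m_1 - lbd * x
hess = (m - 1) * T_x_m_2 - lbd * np.eye(n)

# Three different projection left inverses (illustrative choices).
R = rng.standard_normal((n, n))
B_spd = R @ R.T + n * np.eye(n)
B_diag = np.diag(rng.uniform(1.0, 2.0, n))

y_id = projected_step(hess, g, x, 1, np.eye(n))
y_spd = projected_step(hess, g, x, 1, B_spd)
y_diag = projected_step(hess, g, x, 3, B_diag)
y_schur = schur_step(hess, g, x)

# Each comparison checks one projected step against the Schur-form step.
print(np.allclose(y_id, y_schur, atol=1e-6),
      np.allclose(y_spd, y_schur, atol=1e-6),
      np.allclose(y_diag, y_schur, atol=1e-6))
```

Since every iterate is obtained by normalizing `x_k + y`, identical steps imply identical iterates, which is consistent with the matching `x`, `lbd` and `ctr` values in the experiment that follows.
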

In [52]:
  
  n = 15
  m = 3
  tol = 1e-10
  max_itr = 200
  np.random.seed(0)
  n_test = 5

  x_init = np.random.randn(n)
  x_init /= np.linalg.norm(x_init)
  a = 1
  AA = np.eye(n)
  # a = np.random.randint(0, 5)
  # AA = utils.gen_random_symmetric_pos(n)
  T = utils.generate_symmetric_tensor(n, m)
  for i in range(n_test):
      b = np.random.randint(0, 7)
      BB = utils.gen_random_symmetric_pos(n)
      # BB = np.eye(n)
      # by_line will print the iteration line by line
      x, lbd, ctr, converge, err = ortho_sphere_power(
          T, max_itr, tol, x_init, a=a,
          b=b, AA=AA, BB=BB, by_line=False)
      print('b=%d x=%s, lbd=%f, ctr=%d, converge=%d, err=%f' % (
          b,  str(x), lbd, ctr, converge, err))


b=1 x=[ 0.04964858 -0.08937529 -0.00552724  0.21692838  0.09315258  0.03618049
  0.34047679 -0.34162985  0.69346359 -0.13014038 -0.12040857  0.35724421
 -0.15364912 -0.17791238  0.06801464], lbd=1.394927, ctr=16, converge=1, err=0.000000
b=5 x=[ 0.04964858 -0.08937529 -0.00552724  0.21692838  0.09315258  0.03618049
  0.34047679 -0.34162985  0.69346359 -0.13014038 -0.12040857  0.35724421
 -0.15364912 -0.17791238  0.06801464], lbd=1.394927, ctr=16, converge=1, err=0.000000
b=3 x=[ 0.04964858 -0.08937529 -0.00552724  0.21692838  0.09315258  0.03618049
  0.34047679 -0.34162985  0.69346359 -0.13014038 -0.12040857  0.35724421
 -0.15364912 -0.17791238  0.06801464], lbd=1.394927, ctr=16, converge=1, err=0.000000
b=2 x=[ 0.04964858 -0.08937529 -0.00552724  0.21692838  0.09315258  0.03618049
  0.34047679 -0.34162985  0.69346359 -0.13014038 -0.12040857  0.35724421
 -0.15364912 -0.17791238  0.06801464], lbd=1.394927, ctr=16, converge=1, err=0.000000
b=2 x=[ 0.04964858 -0.08937529 -0.00552724  0.21

In [53]:
  
  # n = 15
  # m = 3
  tol = 1e-10
  max_itr = 200
  np.random.seed(0)
  n_test = 5

  # x_init = np.random.randn(n)
  # x_init /= np.linalg.norm(x_init)
  # b = np.random.randint(0, 5)
  # BB = utils.gen_random_symmetric_pos(n)
  b = 1
  BB = np.eye(n)

  # T = utils.generate_symmetric_tensor(n, m)
  for i in range(n_test):
      a = np.random.randint(0, 7)
      AA = utils.gen_random_symmetric_pos(n)
      # AA = np.eye(n)

      x, lbd, ctr, converge, err = ortho_sphere_power(
          T, max_itr, tol, x_init, a=a,
          b=b, AA=AA, BB=BB, by_line=False)
      print('a=%d x=%s, lbd=%f, ctr=%d, converge=%d, err=%f' % (
          a, str(x), lbd, ctr, converge, err))


a=4 x=[0.25042159 0.26103617 0.2519741  0.24987413 0.2720132  0.24902325
 0.25955763 0.25865254 0.26273021 0.27921937 0.25616251 0.24152512
 0.2581056  0.27240664 0.24734178], lbd=28.703310, ctr=7, converge=1, err=0.000000
a=6 x=[-0.25042159 -0.26103617 -0.2519741  -0.24987413 -0.2720132  -0.24902325
 -0.25955763 -0.25865254 -0.26273021 -0.27921937 -0.25616251 -0.24152512
 -0.2581056  -0.27240664 -0.24734178], lbd=-28.703310, ctr=17, converge=1, err=0.000000
a=2 x=[-0.25042159 -0.26103617 -0.2519741  -0.24987413 -0.2720132  -0.24902325
 -0.25955763 -0.25865254 -0.26273021 -0.27921937 -0.25616251 -0.24152512
 -0.2581056  -0.27240664 -0.24734178], lbd=-28.703310, ctr=10, converge=1, err=0.000000
a=6 x=[-0.25042159 -0.26103617 -0.2519741  -0.24987413 -0.2720132  -0.24902325
 -0.25955763 -0.25865254 -0.26273021 -0.27921937 -0.25616251 -0.24152512
 -0.2581056  -0.27240664 -0.24734178], lbd=-28.703310, ctr=33, converge=1, err=0.000000
a=6 x=[0.25042159 0.26103617 0.2519741  0.24987413 0.2720