# Matrix Equivalents from the Hypercomplex Algebra

Two numbers from a hypercomplex algebra $z, w \in \mathbb{H}^N$ can be multiplied such that the result $m = zw, m \in \mathbb{H}^N$ is an element of the same algebra. The algebra is also closed under addition and under multiplication by real scalars, so hypercomplex algebras are vector spaces as well.

The elementwise product of two hypercomplex vectors $\hat{z}, \hat{w} \in \mathbb{H}^{M \times N}$ can be viewed as a linear transformation acting on the elements of the right factor of the product: $\hat{m} = \hat{z} \hat{w} = A \hat{w}$.

Here the matrix $A$ depends on the entries of the geometric vector $\hat{z}$. In this notebook, we explore a systematic way to generate these transformation matrices.
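Before estimating $A$ numerically, it can help to see what such a matrix looks like when built directly. The sketch below is an illustration, not the method used in this notebook: it assumes the same Cayley-Dickson product convention as the code further down and relies on the product being linear in the right factor, so that column $j$ of $A$ is $z$ multiplied by the $j$-th basis element. The helper names `cd_conj`, `cd_mul`, and `left_mul_matrix` are hypothetical and operate on a single hypercomplex number stored as a 1-D array.

```python
import numpy as np

def cd_conj(x):
    """Conjugate: negate every component except the real part."""
    out = -x.copy()
    out[0] = x[0]
    return out

def cd_mul(x, y):
    """Cayley-Dickson product of two 1-D arrays of length 2**k."""
    n = x.shape[0]
    if n == 1:
        return x * y
    p, q = x[:n // 2], x[n // 2:]
    r, s = y[:n // 2], y[n // 2:]
    # (p, q)(r, s) = (p r - conj(s) q, s p + q conj(r))
    return np.concatenate([
        cd_mul(p, r) - cd_mul(cd_conj(s), q),
        cd_mul(s, p) + cd_mul(q, cd_conj(r)),
    ])

def left_mul_matrix(z):
    """Matrix A with A @ w == cd_mul(z, w); column j is z times e_j."""
    n = z.shape[0]
    return np.stack([cd_mul(z, e) for e in np.eye(n)], axis=1)

# Complex example: z = 3 + 4i acts on (re, im) as a scaled rotation.
A_complex = left_mul_matrix(np.array([3.0, 4.0]))
```

For $z = 3 + 4i$ this yields the familiar matrix with rows $(3, -4)$ and $(4, 3)$. The same function applies unchanged to quaternions (length 4), octonions (length 8), and larger power-of-two sizes.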

$\hat{z}, \hat{w} \in \mathbb{H}^{M \times N}$

$\hat{m} \in \mathbb{H}^{M \times N}, \hat{z} \hat{w} = \hat{m} $

Frame this as a linear transformation with matrix $A_{[\hat{z}]}$ conditioned on $\hat{z}$:

$ A_{[\hat{z}]} \hat{w} = \hat{m} $ 

Estimate $A_{[\hat{z}]}$ from pairs $\hat{w}, \hat{m}$ for a fixed $\hat{z}$ by repeatedly sampling a random $\hat{w}$, computing $\hat{m} = \hat{z} \hat{w}$, and applying the recursive update:

$ A_{[\hat{z}]}[t] \in \mathbb{R}^{N \times N}, A_{[\hat{z}]}[t + 1] = (\hat{m} \hat{w}^{T} + \gamma A_{[\hat{z}]}[t])(\hat{w} \hat{w}^{T} + \gamma I)^{-1} $

Why does this converge to the right matrix?
Why does this converge to a full rank matrix?
Is this matrix different from, or equivalent to, the hypercomplex vector $\hat{z}$ (the variable `a` in the code)?
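
One way to approach the first question is sketched here for a single vector element, writing $\hat{w}$ for the sampled input and $\hat{m}$ for its product. It assumes the product is linear in the right factor, so that an exact matrix $A^{*}$ with $A^{*} \hat{w} = \hat{m}$ exists for every $\hat{w}$. The update is then the closed-form minimizer of a regularized least-squares step:

$ A[t + 1] = \arg\min_{A'} \| \hat{m} - A' \hat{w} \|^{2} + \gamma \| A' - A[t] \|_{F}^{2} $

Setting the gradient with respect to $A'$ to zero gives

$ A' (\hat{w} \hat{w}^{T} + \gamma I) = \hat{m} \hat{w}^{T} + \gamma A[t] $

which is the update rule. Substituting $\hat{m} = A^{*} \hat{w}$ and defining the error $E[t] = A[t] - A^{*}$:

$ E[t + 1] = \gamma E[t] (\hat{w} \hat{w}^{T} + \gamma I)^{-1} = E[t] \left( I - \frac{\hat{w} \hat{w}^{T}}{\gamma + \hat{w}^{T} \hat{w}} \right) $

So $A^{*}$ is a fixed point ($E = 0$ stays zero), the error is left unchanged in directions orthogonal to $\hat{w}$, and it is shrunk by the factor $\gamma / (\gamma + \hat{w}^{T} \hat{w})$ along $\hat{w}$. Since randomly sampled $\hat{w}$ eventually cover every direction, the error is expected to decay toward zero, which is consistent with the geometric drop in validation loss observed in the run.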

In [1]:
import numpy as np
import matplotlib.pyplot as plt

In [2]:
def hypercomplex_conjugate(a):
    c = np.ones(a.shape)
    c[..., 1:] *= -1
    return c * a
def hypercomplex_multiply(a, b):
    if a.shape[-1] == 1:
        return a * b
    else:
        def cayley_dickson(p, q, r, s):
            return np.concatenate([
                (hypercomplex_multiply(
                    p,
                    r) -
                hypercomplex_multiply(
                    hypercomplex_conjugate(s),
                    q)),
                (hypercomplex_multiply(
                    s,
                    p) +
                hypercomplex_multiply(
                    q,
                    hypercomplex_conjugate(r))),
            ], axis=(len(a.shape) - 1))
        return cayley_dickson(
            a[..., :(a.shape[-1] // 2)],
            a[..., (a.shape[-1] // 2):],
            b[..., :(a.shape[-1] // 2)],
            b[..., (a.shape[-1] // 2):])

In [3]:
def hypercomplex_conjugate_gradient(a, da):
    return hypercomplex_conjugate(da)   
def hypercomplex_multiply_gradient(a, b, da, db):
    return (hypercomplex_multiply(da, b),
        hypercomplex_multiply(a, db))
def hypercomplex_basis_gradient(a):
    basis = np.zeros((a.shape[-1], a.shape[-1]))
    np.fill_diagonal(basis, 1)
    basis = basis.reshape((1, a.shape[-1], a.shape[-1]))
    return basis

In [4]:
class HCX(object):
    def random(*kdims, hcx_size=1, mean=0, std=1):
        shape = [
            kdims[i] if i < len(kdims) 
            else 2**hcx_size
            for i in range(len(kdims) + 1)]
        return np.random.normal(mean, std, shape)
    def basis(x, dx=0, dir=1):
        if dir > 0:
            return x
        else:
            return hypercomplex_basis_gradient(x)
    def conj(x, dx=0, dir=1):
        if dir > 0:
            return hypercomplex_conjugate(x)
        else:
            return hypercomplex_conjugate_gradient(x, dx)
    def add(x, y, dx=0, dy=0, dir=1):
        if dir > 0:
            return x + y
        else:
            return dx, dy
    def sub(x, y, dx=0, dy=0, dir=1):
        if dir > 0:
            return x - y
        else:
            return dx, -dy
    def mul(x, y, dx=0, dy=0, dir=1):
        if dir > 0:
            return hypercomplex_multiply(x, y)
        else:
            return hypercomplex_multiply_gradient(x, y, dx, dy)
    def norm(x, dx=0, dir=1):
        if dir > 0:
            return np.sum(
                hypercomplex_multiply(HCX.conj(x), x),
                axis=(len(x.shape) - 1))**0.5
        else:
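            # Sum over the last (component) axis, as in the forward pass;
            # keepdims lets c broadcast against r below.
            c = 0.5 / np.sum(
                hypercomplex_multiply(HCX.conj(x), x),
                axis=(len(x.shape) - 1), keepdims=True)**0.5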
            g = hypercomplex_multiply_gradient(
                HCX.conj(x), 
                x, 
                HCX.conj(x, dx, dir=-1), 
                dx)
            r = HCX.conj(g[0]) + g[1]
            return c * r
    def inv(x, dx=0, dir=1):
        if dir > 0:
            return HCX.conj(x) / np.reshape(HCX.norm(x)**2, (-1, 1))
        else:
            return (HCX.conj(x, dx, dir=-1) 
                / HCX.norm(x)**2 
                - 2 * HCX.conj(x) 
                / HCX.norm(x)**3
                * HCX.norm(x, dx, dir=-1))

In [5]:
M = 5     # The hypercomplex size (each number has 2**M components)
N = 100   # The number of vector elements
T = 1e-20 # A convergence threshold
L = 0.05  # A hyperparameter to tune (gamma in the update rule)
V = 100   # The number of validation steps
D = 100   # The number of iterations before validating


def validate(_a, _A):
    _loss = 0.0
    for i in range(V):
        _b = HCX.random(N, 1, hcx_size=M).transpose(0, 2, 1)
        _c = HCX.mul(
            _a.transpose(0, 2, 1), 
            _b.transpose(0, 2, 1)).transpose(0, 2, 1)
        _loss += np.sum(_c - np.matmul(_A, _b))**2 / V
    return _loss


a = HCX.random(N, 1, hcx_size=M).transpose(0, 2, 1)
A = HCX.random(N, 2**M, hcx_size=M)
I = np.reshape(np.eye(2**M), (1, 2**M, 2**M))


loss = 1.0
iterations = 0
while loss > T:
    iterations += 1
    b = HCX.random(N, 1, hcx_size=M).transpose(0, 2, 1)
    c = HCX.mul(
        a.transpose(0, 2, 1), 
        b.transpose(0, 2, 1)).transpose(0, 2, 1)
    A = np.matmul(
        (c * b.transpose(0, 2, 1)) + L * A,
        np.linalg.inv(
            (b * b.transpose(0, 2, 1)) + L * I))
    if iterations % D == 0:
        loss = validate(a, A)
        print("Validation Loss:", loss)
    

print("Iterations until Convergence:", iterations)

Validation Loss: 8503.70241855
Validation Loss: 351.600597325
Validation Loss: 14.2963250144
Validation Loss: 0.580349162686
Validation Loss: 0.028797913631
Validation Loss: 0.00107115941327
Validation Loss: 3.70557734342e-05
Validation Loss: 1.55183561216e-06
Validation Loss: 6.17599993083e-08
Validation Loss: 2.17271455948e-09
Validation Loss: 9.67877590938e-11
Validation Loss: 5.18808848901e-12
Validation Loss: 2.19021587219e-13
Validation Loss: 7.97304141779e-15
Validation Loss: 2.71109297424e-16
Validation Loss: 1.27677876108e-17
Validation Loss: 4.56014582343e-19
Validation Loss: 2.09589293323e-20
Validation Loss: 1.9063466388e-21
Iterations until Convergence: 1900
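
To make the result easier to inspect, the same update can be run on the smallest nontrivial case, the complex numbers, where the target matrix is known in closed form. This is a minimal sketch rather than a reproduction of the run above: it uses Python's built-in complex type for the exact product, and the values of `z`, `gamma`, the seed, and the iteration count are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.05                       # plays the role of L in the notebook
z = complex(3.0, 4.0)              # fixed left factor (arbitrary choice)
A = rng.normal(size=(2, 2))        # random initial estimate
I = np.eye(2)

for _ in range(500):
    w = rng.normal(size=2)                   # random right factor as (re, im)
    prod = z * complex(w[0], w[1])           # exact product m = z w
    m = np.array([prod.real, prod.imag])
    # A <- (m w^T + gamma A)(w w^T + gamma I)^{-1}
    A = (np.outer(m, w) + gamma * A) @ np.linalg.inv(np.outer(w, w) + gamma * I)

# Known left-multiplication matrix of a complex number a + bi.
expected = np.array([[z.real, -z.imag], [z.imag, z.real]])
print(np.abs(A - expected).max())
```

The printed maximum deviation should be close to zero, suggesting that in this case the learned matrix is a matrix representation of $z$ itself rather than something different from it.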


In [6]:
print(np.linalg.det(A)) # One determinant per vector element; nonzero means that matrix is full rank

[  7.33654710e+16   1.31817327e+17   8.68601436e+18   1.20959725e+23
   1.42016163e+20   1.79508154e+16   1.31836607e+16   2.58928903e+19
   5.38538742e+22   1.84690720e+21   1.71293027e+17   4.90497290e+18
   4.81180129e+22   1.13645383e+20   4.09735653e+19   1.74161517e+18
   3.00236887e+22   6.04652357e+15   1.25438368e+23   8.79198493e+19
   2.78752145e+16   1.02076061e+20   6.17642413e+24   2.16488833e+21
   5.70675405e+18   8.99779116e+19   4.25840775e+17   1.67776126e+19
   2.06628834e+19   2.63165048e+20   1.76515731e+18   9.63157385e+18
   5.32036671e+22   6.16270434e+22   7.41605821e+20   1.90819544e+24
   1.09388195e+19   9.74264416e+23   7.86003041e+18   1.59675326e+19
   1.19821823e+22   2.17396215e+21   1.10810435e+21   5.91026352e+17
   8.35750448e+19   2.73518379e+20   7.06657283e+21   7.17386410e+17
   2.30308386e+20   3.19653651e+25   6.97293900e+19   3.29291631e+18
   1.76762887e+22   3.45546836e+22   4.43931531e+20   2.14366382e+22
   1.01318409e+15   5.41913675e+21