ckerce/hippo-s4-mamba-operator-dynamics

 
 

Operator evolution in HiPPO / S4 / Mamba

There appears to be a discrepancy between how operators are defined in the papers (HiPPO, S4, and Mamba) and how they are implemented in the code. The code uses elementwise exponentiation, whereas the papers use formulas for the fundamental matrix solution (see, for example, (1)). Most methods for computing the fundamental matrix solution $\exp(A)$ are undesirably expensive to differentiate, in both computation and memory; however, such implementations have previously been demonstrated in the numerical weather modeling community (see (2) and references therein).
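To make the distinction concrete, here is a minimal sketch comparing the two operations on a small non-diagonal matrix. It is an illustration only, not code from this repository: the helper names (`matmul`, `expm_taylor`, `exp_elementwise`) are hypothetical, and the truncated Taylor series stands in for a production matrix-exponential routine. Pure Python is used to keep the example dependency-free.

```python
import math

def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes A^k / k!
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def exp_elementwise(A):
    """Apply the scalar exponential to every entry independently."""
    return [[math.exp(v) for v in row] for row in A]

# Generator of a planar rotation: its matrix exponential is a rotation by 1 radian.
A = [[0.0, 1.0], [-1.0, 0.0]]
fundamental = expm_taylor(A)      # [[cos 1, sin 1], [-sin 1, cos 1]]
elementwise = exp_elementwise(A)  # [[1, e], [1/e, 1]]

print("matrix exponential:     ", fundamental)
print("elementwise exponential:", elementwise)
```

For a diagonal $A$ the diagonal entries of the two results coincide; the difference appears once $A$ has off-diagonal entries, as in this example.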

  • A first, obvious approach to investigating this possible discrepancy is to use the algebraic form of the Zassenhaus formula in conjunction with the existing S4/Mamba technique, which starts from a good initial approximation to the optimal state propagator, $\exp(tA_{opt})$: $$\exp(t(A + dA)) = \exp(tA) \, M.$$ Here $M$ is defined by matrix exponentials of $dA$ and of (higher-order) commutators of $A$ and $dA$, and is implemented as a trainable parameter initialized at the identity, $M = Id$ (`eye()`).
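The factorization above can be checked numerically. The sketch below is an assumption-laden illustration rather than this repository's implementation: it builds $M$ explicitly from the first two Zassenhaus factors, $\exp(t\,dA)\exp(-\tfrac{t^2}{2}[A, dA])$, instead of training it, and compares each truncation against $\exp(t(A + dA))$. The matrices, the step `t`, and all helper names are hypothetical.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * v for v in row] for row in X]

def expm_taylor(A, terms=30):
    """Reference matrix exponential via a truncated Taylor series."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, A))  # A^k / k!
        result = add(result, term)
    return result

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

# Hypothetical base operator A and small perturbation dA that do not commute.
A = [[-1.0, 2.0], [0.0, -3.0]]
dA = [[0.0, 0.1], [0.2, 0.0]]
t = 0.1

exact = expm_taylor(scale(t, add(A, dA)))               # exp(t(A + dA))
base = expm_taylor(scale(t, A))                         # exp(tA)
comm = add(matmul(A, dA), scale(-1.0, matmul(dA, A)))   # [A, dA]

M1 = expm_taylor(scale(t, dA))                                # first factor
M2 = matmul(M1, expm_taylor(scale(-0.5 * t * t, comm)))       # plus commutator factor

err0 = max_abs_diff(exact, base)               # M = identity (the initialization)
err1 = max_abs_diff(exact, matmul(base, M1))   # first-order M
err2 = max_abs_diff(exact, matmul(base, M2))   # second-order M

print(err0, err1, err2)
```

Each additional Zassenhaus factor reduces the error for small `t`, which suggests that a trainable $M$ started at the identity has a well-defined target to move toward.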

Other easy-to-implement options include the following, at progressively greater computational expense:

  • Use Runge-Kutta integration to obtain the initial $\exp(tA)$, and potentially make a small number of integration steps part of the training loop.

  • Use Padé approximants to get the initial $\exp(tA)$; see Golub and Van Loan (3).

  • Use other techniques from Moler and Van Loan's "Nineteen Dubious Ways" paper (see the most recent update, (4)).
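As a rough sketch of the first two options, the code below computes $\exp(tA)$ by (a) classical fourth-order Runge-Kutta integration of $\dot{\Phi} = A\Phi$, $\Phi(0) = Id$, and (b) the lowest-order diagonal Padé approximant $(Id - X/2)^{-1}(Id + X/2)$ combined with scaling and squaring. Both are checked against a Taylor-series reference. The function names, step counts, and test matrix are assumptions chosen for illustration; practical implementations typically use higher-order Padé approximants, and nothing here is taken from this repository's code.

```python
def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * v for v in row] for row in X]

def inverse(X):
    """Gauss-Jordan inverse with partial pivoting (assumes X is nonsingular)."""
    n = len(X)
    aug = [[float(v) for v in row] + identity(n)[i] for i, row in enumerate(X)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def expm_taylor(A, terms=40):
    """Reference matrix exponential via a truncated Taylor series."""
    result = identity(len(A))
    term = identity(len(A))
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, A))
        result = add(result, term)
    return result

def expm_rk4(A, t, steps=20):
    """Integrate dPhi/dt = A Phi from Phi(0) = Id with classical RK4."""
    h = t / steps
    Phi = identity(len(A))
    for _ in range(steps):
        k1 = matmul(A, Phi)
        k2 = matmul(A, add(Phi, scale(h / 2, k1)))
        k3 = matmul(A, add(Phi, scale(h / 2, k2)))
        k4 = matmul(A, add(Phi, scale(h, k3)))
        incr = add(add(k1, scale(2.0, k2)), add(scale(2.0, k3), k4))
        Phi = add(Phi, scale(h / 6, incr))
    return Phi

def expm_pade11(A, t, squarings=8):
    """[1/1] Pade approximant of exp with scaling and squaring."""
    n = len(A)
    X = scale(t / 2 ** squarings, A)          # scale so the norm is small
    R = matmul(inverse(add(identity(n), scale(-0.5, X))),
               add(identity(n), scale(0.5, X)))
    for _ in range(squarings):                # undo the scaling by squaring
        R = matmul(R, R)
    return R

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

A = [[-1.0, 2.0], [0.0, -3.0]]   # hypothetical test operator
t = 1.0
reference = expm_taylor(scale(t, A))
err_rk4 = max_abs_diff(reference, expm_rk4(A, t))
err_pade = max_abs_diff(reference, expm_pade11(A, t))
print(err_rk4, err_pade)
```

Both routines consist only of matrix products, sums, and one linear solve, so in an autodiff framework each step would be differentiable; the cost grows with the number of Runge-Kutta steps or squarings retained in the training loop.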

(1) On the exponential solution of differential equations for a linear operator; Wilhelm Magnus; Communications on Pure and Applied Mathematics, November 1954; https://doi.org/10.1002/cpa.3160070404

(2) Assimilation of angle of arrival measurements from an antenna of GPS receivers in the WRF model; F Vandenberghe, Clayton Kerce, Robert Bock; Assimilation of Remote Sensing and In Situ Data in Modern Numerical Weather and Environmental Prediction Models

(3) Matrix Computations; Golub and Van Loan

(4) Nineteen Dubious Ways to Compute the Exponential of a Matrix, 25 years later; Moler and Van Loan

About

Investigating alternative methods for defining operator dynamics in s4/mamba; examining the effect on trainability.

Languages

  • Python 50.9%
  • Cuda 34.0%
  • C++ 13.9%
  • C 1.2%