There appears to be a discrepancy between how the operators are defined in the papers (HiPPO, S4, and Mamba) and how they are implemented in the code. The code uses elementwise exponentiation, while the papers use formulas for the fundamental matrix solution (see, for example, (1)). Most methods for computing the fundamental matrix solution
- A first, obvious approach to investigating this possible discrepancy is to use the algebraic form of the Zassenhaus formula in conjunction with the existing S4/Mamba technique, which starts from a good initial approximation for the optimal state propagator, $\exp(tA_{opt})$:

  $$\exp(t(A + dA)) = \exp(tA)\, M.$$

  Here $M$ is defined by matrix exponentials of (high-order) commutators of $A$ and $dA$, and is implemented as a trainable parameter initialized at $M = Id = eye()$.
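As a rough numerical check of the factorization above, the following sketch compares three choices of $M$ against a reference value of $\exp(t(A + dA))$: the identity (the proposed initialization), the leading Zassenhaus factor $\exp(t\,dA)$, and that factor times the first commutator correction $\exp(-\tfrac{t^2}{2}[A, dA])$. The helper names (`expm_ref`, `zassenhaus_M`), the random test matrices, and the step size are illustrative assumptions, not anything taken from the S4 or Mamba code.

```python
import numpy as np

def expm_ref(X, squarings=10, terms=18):
    """Reference matrix exponential: Taylor series on X / 2**squarings, then repeated squaring."""
    Y = X / (2.0 ** squarings)
    term = np.eye(X.shape[0])
    total = np.eye(X.shape[0])
    for k in range(1, terms + 1):
        term = term @ Y / k          # Y**k / k!
        total = total + term
    for _ in range(squarings):
        total = total @ total        # undo the scaling
    return total

def commutator(X, Y):
    return X @ Y - Y @ X

def zassenhaus_M(A, dA, t, order=2):
    """Truncated Zassenhaus factor M such that exp(t(A+dA)) ~= exp(tA) @ M.

    order=1 keeps only exp(t dA); order=2 also keeps exp(-(t**2 / 2) [A, dA]).
    """
    M = expm_ref(t * dA)
    if order >= 2:
        M = M @ expm_ref(-(t ** 2) / 2.0 * commutator(A, dA))
    return M

# Hypothetical test problem: a random A and a small random perturbation dA.
rng = np.random.default_rng(0)
n, t = 4, 0.05
A = rng.standard_normal((n, n))
dA = 0.1 * rng.standard_normal((n, n))

exact = expm_ref(t * (A + dA))
base = expm_ref(t * A)

err_identity = np.linalg.norm(exact - base)  # M = Id, the proposed initialization
err_order1 = np.linalg.norm(exact - base @ zassenhaus_M(A, dA, t, order=1))
err_order2 = np.linalg.norm(exact - base @ zassenhaus_M(A, dA, t, order=2))

print("M = Id        :", err_identity)
print("order-1 factor:", err_order1)
print("order-2 factor:", err_order2)
```

In a training setup, $M$ would presumably be a learned matrix rather than this closed-form product; the sketch only illustrates that each additional Zassenhaus factor should shrink the gap that a trainable $M$ initialized at the identity would have to close.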
Other easy-to-implement options, at progressively greater computational expense, include the following:
- Use Runge-Kutta integration to get the initial $\exp(tA)$, and potentially make a small number of steps part of the training loop.
- Use Padé approximants to get the initial $\exp(tA)$; see Golub and Van Loan (3).
- Use other techniques from Moler and Van Loan's "Nineteen Dubious Ways" paper (the most recent update, (4)).
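The first two options above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the method used in any of the cited codebases: `expm_rk4` integrates $\dot{\Phi} = A\Phi$, $\Phi(0) = Id$ with classical fourth-order Runge-Kutta, and `expm_pade` uses a diagonal Padé approximant with scaling and squaring, in the spirit of the algorithm described in (3). The function names, step count, and Padé degree `q` are illustrative choices. The rotation generator is used as a test case because its exponential is known in closed form, and it also shows that elementwise exponentiation of a non-diagonal matrix does not give the matrix exponential.

```python
import numpy as np

def expm_rk4(A, t, steps=20):
    """Approximate exp(tA) by integrating dPhi/dt = A Phi, Phi(0) = I, with classical RK4."""
    h = t / steps
    Phi = np.eye(A.shape[0])
    for _ in range(steps):
        k1 = A @ Phi
        k2 = A @ (Phi + 0.5 * h * k1)
        k3 = A @ (Phi + 0.5 * h * k2)
        k4 = A @ (Phi + h * k3)
        Phi = Phi + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return Phi

def expm_pade(A, t, q=6):
    """Approximate exp(tA) with a diagonal [q/q] Pade approximant plus scaling and squaring."""
    X = t * A
    norm = np.linalg.norm(X, ord=np.inf)
    # Scale so that the infinity norm of X / 2**j is at most 1/2.
    j = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = X / (2.0 ** j)
    n = A.shape[0]
    N, D, P = np.eye(n), np.eye(n), np.eye(n)
    c = 1.0
    for k in range(1, q + 1):
        c = c * (q - k + 1) / ((2 * q - k + 1) * k)  # Pade coefficient c_k
        P = X @ P                                    # X**k
        N = N + c * P                                # numerator polynomial
        D = D + ((-1) ** k) * c * P                  # denominator polynomial
    F = np.linalg.solve(D, N)                        # F = D^{-1} N
    for _ in range(j):
        F = F @ F                                    # undo the scaling
    return F

# Test case with a known answer: A generates a plane rotation.
t = 1.0
A_rot = np.array([[0.0, 1.0], [-1.0, 0.0]])
exact = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

rk4_error = np.max(np.abs(expm_rk4(A_rot, t) - exact))
pade_error = np.max(np.abs(expm_pade(A_rot, t) - exact))
elementwise_error = np.max(np.abs(np.exp(t * A_rot) - exact))  # elementwise exp, for contrast

print("RK4 error        :", rk4_error)
print("Pade error       :", pade_error)
print("elementwise error:", elementwise_error)
```

For a diagonal $A$ the elementwise exponential of the diagonal entries does coincide with the diagonal of $\exp(tA)$, so whether the discrepancy matters likely depends on whether the implemented $A$ is diagonal. In practice a library routine such as `scipy.linalg.expm` would probably be preferable to a hand-rolled Padé scheme.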
(3) Matrix Computations; Golub and Van Loan
(4) Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later; Moler and Van Loan