# Rational Transfer Functions and the Power of Dynamic Representation
# On the Limitations of OLS Regression with Lagged Variables

Every budding modeler at some early point stumbles upon the apparent predictive power of regressing on lagged dependent variables. It happens long before any formal coursework on the subject because the idea is so intuitive. To borrow from [Tobler's first law of geography](https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography), "everything is related to everything else, but near things are more related than distant things," a quote which surely holds as true for time as it does for space. Of all the complex factors that come together to create a variable, only a fraction of which we may measure or fully understand, we can rely on the lagged value to have been produced by a similar formula, and hence lagged dependent variables encapsulate what we do not know. This is the essence of the reduced dynamic form, and the purpose of this article is to test its limits, especially when paired with Ordinary Least Squares regression.

## Dynamic challenge: convert a nonlinear, nonstationary, structural time-series model

There's a time-series model I've been studying because of its nonlinearity, its nonstationarity, and its encapsulation of latent hypothetical concepts into regression variables. It is the fitness-fatigue model of athletic performance and has the form

$$
p_t = \mu + k_1 \sum_{i=1}^{t - 1} w_i \exp\left(\frac{-(t - i)}{\tau_1}\right) - k_2 \sum_{i=1}^{t - 1} w_i \exp\left(\frac{-(t - i)}{\tau_2}\right) + \epsilon_t,
$$
where $p_t$ is a numeric measure of (athletic) performance, $w_t$ is training "dose" (i.e., time-weighted training intensity) occurring in time period $t$, and $\epsilon_t$ is i.i.d. Gaussian error. The model functions as a linear regression with two nonlinear features, these being:

$$
h_t =\sum_{i=1}^{t - 1} w_i \exp\left(\frac{-(t - i)}{\tau_1}\right), \, \text{and} \, 
g_t =\sum_{i=1}^{t - 1} w_i \exp\left(\frac{-(t - i)}{\tau_2}\right),
$$
latent representations of athletic "fitness" and "fatigue," respectively. These features are convolutions of the athlete's training history with exponential decay, and they differ only in the decay rate. The first convolution sum, $h_t$, typically has a much longer decay and represents fitness. Fatigue, or $g_t$, is much more transient and is associated with a faster exponential decay. The regression coefficient on fatigue is typically much larger than the one on fitness, as a counterpoint to its transience.
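
As a minimal sketch of these definitions, the Python below evaluates both features by direct summation. The function name `decay_feature`, the training doses, and the values of `tau_1` and `tau_2` are illustrative assumptions, not estimates from any data set.

```python
import math

def decay_feature(w, tau):
    """Direct evaluation of sum_{i=1}^{t-1} w_i * exp(-(t - i) / tau) for t = 1..n.

    `w` is a list whose element w[i - 1] holds the dose w_i, so the returned
    list stores the feature for period t at position t - 1.
    """
    n = len(w)
    feature = []
    for t in range(1, n + 1):
        total = 0.0
        for i in range(1, t):  # i = 1, ..., t - 1; empty when t = 1
            total += w[i - 1] * math.exp(-(t - i) / tau)
        feature.append(total)
    return feature

# Hypothetical inputs: one training dose in period 1, then rest.
w = [100.0, 0.0, 0.0, 0.0, 0.0]
tau_1, tau_2 = 60.0, 13.0  # assumed decay constants: slow fitness, fast fatigue

fitness = decay_feature(w, tau_1)
fatigue = decay_feature(w, tau_2)

for t, (h, g) in enumerate(zip(fitness, fatigue), start=1):
    print(f"t={t}  fitness={h:8.3f}  fatigue={g:8.3f}")
```

With a single dose, both features are zero in period 1, jump in period 2, and then decay, with fatigue fading faster because `tau_2` is smaller than `tau_1`.
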

Simple math (done here) shows that fitness and fatigue can be put into dynamic form as follows:
$$
\begin{aligned}
h_t &= \theta_1 h_{t - 1} + \theta_1 w_{t - 1}, \\
g_t &= \theta_2 g_{t - 1} + \theta_2 w_{t - 1},
\end{aligned}
$$
where $\theta_1 =  e^{-1 / \tau_1}$ and $\theta_2 = e^{-1 / \tau_2}$.
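
The equivalence between the convolution sums and the one-step recursions can be checked numerically. The sketch below uses hypothetical doses and an assumed decay constant; `feature_direct` and `feature_recursive` are names introduced here for illustration.

```python
import math

def feature_direct(w, tau):
    """Convolution sum from the definition, for periods t = 1..n."""
    n = len(w)
    return [
        sum(w[i - 1] * math.exp(-(t - i) / tau) for i in range(1, t))
        for t in range(1, n + 1)
    ]

def feature_recursive(w, tau):
    """Same feature via f_t = theta * f_{t-1} + theta * w_{t-1}, with f_1 = 0."""
    theta = math.exp(-1.0 / tau)
    feature = [0.0]  # f_1 is an empty sum
    for t in range(2, len(w) + 1):
        # feature[-1] is f_{t-1}; w[t - 2] holds the dose w_{t-1}
        feature.append(theta * feature[-1] + theta * w[t - 2])
    return feature

w = [80.0, 0.0, 120.0, 60.0, 0.0, 0.0, 90.0]  # hypothetical doses
tau = 13.0  # assumed decay constant

direct = feature_direct(w, tau)
recursive = feature_recursive(w, tau)
max_gap = max(abs(a - b) for a, b in zip(direct, recursive))
print(f"largest absolute difference: {max_gap:.2e}")
```

The two versions agree up to floating-point rounding, but the recursion needs only the previous value and the previous dose rather than the full training history.
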
This is a starting point for placing the model within a Kalman filter framework, which works quite well (link). Other approaches to fitting the model are brute-force nonlinear least squares (link needed) and a distributed lag approach with a flexible functional form (link needed). Those methods also work well, but they are complex, and here I pose the central question of the article: *what would happen if you just OLS regressed on lagged variables of performance and training? How far could you get?*

The answer is "pretty far," but there are caveats, and blindly applying OLS regression on lagged variables will never get you to the truth.

## Rational transfer functions to the rescue

If we plug the bivariate dynamic representation of fitness and fatigue into the original model, we arrive at
$$
p_t = \mu + k_1 \theta_1 h_{t - 1} - k_2 \theta_2 g_{t - 1} + (k_1 \theta_1 - k_2 \theta_2) w_{t - 1} + \epsilon_t,
$$
which looks nicer but does us little good since $h_t$ and $g_t$ are unavailable to the modeler. Working a little harder, we can use the "backshift operator" $\text{B}$, defined by $\text{B} y_t = y_{t-1}$ for an arbitrary time-indexed variable $y$, and arrive at
$$
\begin{aligned}
(1 - \theta_1 \text{B}) h_t &= \theta_1 \text{B} w_t, \\
(1 - \theta_2 \text{B}) g_t &= \theta_2 \text{B} w_t.
\end{aligned}
$$
Solving for $h_t$ and $g_t$ and plugging back into the original model, we arrive at
$$
p_t = \mu + k_1 \frac{\theta_1 \text{B}}{1 - \theta_1\text{B}} w_t - k_2 \frac{\theta_2 \text{B}}{1 - \theta_2\text{B}} w_t + \epsilon_t.
$$
Thus we have two rational transfer functions operating on the exogenous input series $w_t$ (the training load that comes from the coach!). With rational transfer functions, denominator terms of the form $(1 - \theta \text{B})$ correspond to an autoregressive impulse response, i.e., a process with a long memory, and this is a nuisance to us. There is an option to rid ourselves of the denominator component, but not one without a cost.
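
To see why the denominator is a nuisance, it helps to look at the impulse response of $\frac{\theta \text{B}}{1 - \theta \text{B}}$, which expands to $\theta \text{B} + \theta^2 \text{B}^2 + \theta^3 \text{B}^3 + \dots$ and never reaches zero. The sketch below computes those weights from the difference equation and counts how many lags are needed before a weight drops below a tolerance. The function names, the 1% tolerance, and the decay constants are assumptions chosen for illustration.

```python
import math

def impulse_response(theta, n_lags):
    """Weights on B^1..B^n_lags of theta*B / (1 - theta*B).

    Obtained by feeding a unit impulse through y_t = theta*y_{t-1} + theta*x_{t-1}.
    """
    x = [1.0] + [0.0] * n_lags  # unit impulse at lag 0
    y = [0.0]
    for t in range(1, n_lags + 1):
        y.append(theta * y[t - 1] + theta * x[t - 1])
    return y[1:]

def lags_needed(tau, tol=0.01):
    """Smallest lag j with theta**j <= tol, where theta = exp(-1 / tau)."""
    return math.ceil(-tau * math.log(tol))

tau_1, tau_2 = 60.0, 13.0  # assumed decay constants
theta_1 = math.exp(-1.0 / tau_1)

weights = impulse_response(theta_1, 5)
print("first five fitness weights:", [round(v, 4) for v in weights])
print("lags until weight <= 1%:", lags_needed(tau_1), "(fitness),", lags_needed(tau_2), "(fatigue)")
```

Under these assumed values, a finite distributed lag would need a very long lag window to approximate the slow fitness component, whereas the rational form captures it with a single parameter.
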

A "common filter," as discussed in [Identification of Multiple-Input Transfer Function Models](https://www.researchgate.net/publication/276953549_Identification_of_Multiple-Input_Transfer_Function_Models) by Liu & Hanssens (1982) premultiplies the right and left-hand side of a time-series equation by $(1 - \theta \text{B})$. It does not change the transfer function weights, so you can apply multiple common filters in succession, and besides causing complexity and losing some rows due to lags, you have not destroyed the relationship between the input and output series.

The point is, if we were to use the common filter $(1 - \theta_1 \text{B}) (1 - \theta_2 \text{B})$, we would be rid of the autoregressive components. Let's see.

$$
(1 - \theta_1 \text{B}) (1 - \theta_2 \text{B}) p_t = (1 - \theta_1 \text{B}) (1 - \theta_2 \text{B}) \mu + k_1 \theta_1 (\text{B} - \theta_2 \text{B}^2) w_t - k_2 \theta_2 (\text{B} - \theta_1 \text{B}^2) w_t + (1 - \theta_1 \text{B}) (1 - \theta_2 \text{B})\epsilon_t.
$$
It still looks ugly, but after expanding the polynomials and applying the backshift operations, we arrive at:

$$
p_t = (1 - \theta_1) (1 - \theta_2) \mu + (\theta_1 + \theta_2) p_{t - 1} - \theta_1 \theta_2 p_{t - 2} + (k_1\theta_1 - k_2 \theta_2) w_{t-1} - \theta_1 \theta_2 (k_1 - k_2) w_{t-2} + \epsilon_t - (\theta_1 + \theta_2) \epsilon_{t-1} + \theta_1\theta_2 \epsilon_{t-2}.
$$
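
The algebra can be sanity-checked numerically. The sketch below builds a noise-free performance series from the fitness and fatigue recursions and confirms that it satisfies the lagged equation with the stated coefficients. All parameter values and the random training doses are assumptions for illustration only.

```python
import math
import random

# Assumed, illustrative parameter values (not estimates).
mu, k_1, k_2 = 500.0, 0.1, 0.3
tau_1, tau_2 = 60.0, 13.0
theta_1, theta_2 = math.exp(-1 / tau_1), math.exp(-1 / tau_2)

rng = random.Random(0)
n = 200
w = [rng.uniform(0.0, 100.0) for _ in range(n)]  # hypothetical training doses

# Fitness, fatigue, and noise-free performance from the one-step recursions.
h, g = [0.0], [0.0]
for t in range(1, n):
    h.append(theta_1 * h[-1] + theta_1 * w[t - 1])
    g.append(theta_2 * g[-1] + theta_2 * w[t - 1])
p = [mu + k_1 * h_t - k_2 * g_t for h_t, g_t in zip(h, g)]

# Coefficients read off the lagged equation.
intercept = (1 - theta_1) * (1 - theta_2) * mu
a_1 = theta_1 + theta_2                 # on p_{t-1}
a_2 = -theta_1 * theta_2                # on p_{t-2}
b_1 = k_1 * theta_1 - k_2 * theta_2     # on w_{t-1}
b_2 = -theta_1 * theta_2 * (k_1 - k_2)  # on w_{t-2}

residuals = [
    p[t] - (intercept + a_1 * p[t - 1] + a_2 * p[t - 2]
            + b_1 * w[t - 1] + b_2 * w[t - 2])
    for t in range(2, n)
]
max_residual = max(abs(r) for r in residuals)
print(f"largest absolute residual: {max_residual:.2e}")
```

The check is noise-free by design. Once $\epsilon_t$ is added, the error term of the lagged equation is the three-term combination of $\epsilon_t$, $\epsilon_{t-1}$, and $\epsilon_{t-2}$ rather than $\epsilon_t$ alone.
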

TODO: Simplify as best you can. It's just a regression with lags, or is it?









