# Problem 4:

(8p)

**a)** Consider the problem of one-step-ahead prediction for a time series $\{y_t\}_{t \geq 1}$. The code cell below defines a three-layer Jordan-Elman RNN for solving this problem. Write out the mathematical expressions corresponding to this model, in terms of the hidden state variables, the state update equations, and the output equations. Also, for all parameters in your model, specify their respective dimensions.
<div style="text-align: right"> (5p) </div>


In [1]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.SimpleRNN(units = 10, input_shape=(None,1), return_sequences=True, activation='tanh'),
    layers.SimpleRNN(units = 15, input_shape=(None,1), return_sequences=True, activation='tanh'),
    layers.SimpleRNN(units = 20, input_shape=(None,1), return_sequences=True, activation='tanh'),
    layers.Dense(units = 60, activation='relu'),
    layers.Dense(units = 1, activation='linear')
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, None, 10)          120       
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, None, 15)          390       
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, None, 20)          720       
                                                                 
 dense (Dense)               (None, None, 60)          1260      
                                                                 
 dense_1 (Dense)             (None, None, 1)           61        
                                                                 
Total params: 2,551
Trainable params: 2,551
Non-trainable params: 0
_________________________________________________________________


**Solution**

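$$h_t^{(1)}=\sigma(W^{(1)}h^{(1)}_{t-1}+U^{(1)}y_{t-1}+b^{(1)})$$
$$h_t^{(2)}=\sigma(W^{(2)}h^{(2)}_{t-1}+U^{(2)}h_{t}^{(1)}+b^{(2)})$$
$$h_t^{(3)}=\sigma(W^{(3)}h^{(3)}_{t-1}+U^{(3)}h_{t}^{(2)}+b^{(3)})$$
$$\hat{y}_{t|t-1}=C\,\mathrm{ReLU}(W^{(4)}h_t^{(3)}+b^{(4)})+c$$

Here $\sigma = \tanh$ is applied elementwise, and the hidden states are $h_t^{(1)} \in \mathbb{R}^{10}$, $h_t^{(2)} \in \mathbb{R}^{15}$ and $h_t^{(3)} \in \mathbb{R}^{20}$. Because every recurrent layer returns its full output sequence, layer $k$ at step $t$ takes $h_t^{(k-1)}$ (the same time step) as its input.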
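The recursion can also be written out as a plain NumPy forward pass, which makes the data flow through the stacked layers explicit. This is only a sketch under stated assumptions: the weights are random rather than trained, the hidden states start at zero (the Keras default), the helper names `init_rnn`, `rnn_step` and `predict` are made up for illustration, and each recurrent layer reads the same-step output of the layer below, which is how Keras chains `SimpleRNN` layers that return sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_rnn(units, input_dim):
    """Random (untrained) parameters: W is units x units, U is units x input_dim, b has length units."""
    W = rng.normal(scale=0.1, size=(units, units))
    U = rng.normal(scale=0.1, size=(units, input_dim))
    b = np.zeros(units)
    return W, U, b

def rnn_step(params, h_prev, x):
    """One state update h_t = tanh(W h_{t-1} + U x_t + b)."""
    W, U, b = params
    return np.tanh(W @ h_prev + U @ x + b)

# Three recurrent layers with 10, 15 and 20 units, then the two dense layers.
rnn_layers = [init_rnn(10, 1), init_rnn(15, 10), init_rnn(20, 15)]
W4, b4 = rng.normal(scale=0.1, size=(60, 20)), np.zeros(60)
C, c = rng.normal(scale=0.1, size=(1, 60)), np.zeros(1)

def predict(y):
    """Return one forecast per input value: preds[k] is made after reading y[k]."""
    h = [np.zeros(10), np.zeros(15), np.zeros(20)]  # zero initial states
    preds = []
    for y_prev in y:
        x = np.array([y_prev])
        for k, params in enumerate(rnn_layers):
            h[k] = rnn_step(params, h[k], x)
            x = h[k]  # same-step output feeds the next layer
        hidden = np.maximum(W4 @ x + b4, 0.0)  # ReLU dense layer
        preds.append((C @ hidden + c)[0])      # linear output layer
    return np.array(preds)

y = np.sin(0.3 * np.arange(50))
y_hat = predict(y[:-1])  # forecasts for y[1], ..., y[49]
```

With untrained weights the forecasts are meaningless; the point is only that the shapes match and that one scalar prediction is produced per time step.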

## The parameters:

### RNN Layer 1

$W^{(1)}$ of size $10 \times 10$ (100 parameters)

$U^{(1)}$ of size $10 \times 1$ (10 parameters)

$b^{(1)}$ of size $10 \times 1$ (10 parameters)

Total: 120

### RNN Layer 2

$W^{(2)}$ of size $15 \times 15$ (225 parameters)

$U^{(2)}$ of size $15 \times 10$ (150 parameters)

$b^{(2)}$ of size $15 \times 1$ (15 parameters)

Total: 390

### RNN Layer 3

$W^{(3)}$ of size $20 \times 20$ (400 parameters)

$U^{(3)}$ of size $20 \times 15$ (300 parameters)

$b^{(3)}$ of size $20 \times 1$ (20 parameters)

Total: 720

### Fully Connected Layer 1

$W^{(4)}$ of size $60 \times 20$ (1200 parameters), so that $W^{(4)}h_t^{(3)}$ is a vector of length 60

$b^{(4)}$ of size $60 \times 1$ (60 parameters)

Total: 1260

### Fully Connected Layer 2

$C$ of size $1 \times 60$ (60 parameters)

$c$ of size $1 \times 1$ (1 parameter)

Total: 61
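As a quick sanity check, the per-layer totals can be recomputed from the layer sizes alone and compared with the `Param #` column of the model summary. The helper functions `simple_rnn_params` and `dense_params` below are illustrative names, not part of Keras.

```python
def simple_rnn_params(units, input_dim):
    """Recurrent weights + input weights + bias of one simple RNN layer."""
    return units * units + units * input_dim + units

def dense_params(units, input_dim):
    """Weight matrix + bias of one dense layer."""
    return units * input_dim + units

counts = [
    simple_rnn_params(10, 1),   # RNN layer 1
    simple_rnn_params(15, 10),  # RNN layer 2
    simple_rnn_params(20, 15),  # RNN layer 3
    dense_params(60, 20),       # fully connected layer 1
    dense_params(1, 60),        # fully connected layer 2
]
total = sum(counts)
print(counts, total)
```

The five counts agree with the summary (120, 390, 720, 1260, 61) and add up to the reported 2,551 trainable parameters.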

----

**b)** When working with a **state-space model**,
$$
    \begin{cases} \alpha_t = T \alpha_{t-1} + R \eta_t, & \eta_t \sim \mathcal{N}(0,Q), \\ y_t = Z \alpha_t + \varepsilon_t, & \varepsilon_t \sim \mathcal{N}(0,H), \end{cases}
$$
we are able to find equivalent models, for instance by setting $\tilde{\alpha}_t = \Gamma \alpha_t$ where $\Gamma$ is an invertible matrix. What is the equivalent model in this case? When we say that the two models are equivalent, what is it that is equivalent?

<div style="text-align: right"> (3p) </div>

**Solution**

Since $\Gamma$ is invertible, $\alpha_t=\Gamma^{-1}\tilde{\alpha}_t$. Substituting this into the state equation and multiplying from the left by $\Gamma$ gives

$\tilde{\alpha}_t=\Gamma T \Gamma^{-1} \tilde{\alpha}_{t-1}+\Gamma R \eta_t$, and the observation equation becomes

$y_t=Z\Gamma^{-1}\tilde{\alpha}_t + \varepsilon_t$

Let $\tilde{T} = \Gamma T \Gamma^{-1}$, $\tilde{R}=\Gamma R$, and $\tilde{Z}=Z\Gamma^{-1}$

We have

$$
    \begin{cases} \tilde{\alpha}_t = \tilde{T} \tilde{\alpha}_{t-1} + \tilde{R} \eta_t, & \eta_t \sim \mathcal{N}(0,Q), \\ y_t = \tilde{Z} \tilde{\alpha}_t + \varepsilon_t, & \varepsilon_t \sim \mathcal{N}(0,H), \end{cases}
$$

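The two models are equivalent in the sense that they define the same probability distribution for the observations $y_{1:n}$, provided the initial state distribution is transformed in the same way. They therefore give the same likelihood and the same predictions of $y_t$. What differs is only the latent state, which is expressed in a different coordinate system.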
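The equivalence can be illustrated numerically by driving both parameterisations with the same noise draws and comparing the resulting observations. The sketch below makes several assumptions that are not part of the problem: a 3-dimensional state, scalar observations, $Q = I$, $H = 1$, and arbitrarily chosen matrices $T$, $R$, $Z$ and $\Gamma$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Arbitrary example model (assumed values, chosen only for illustration).
T = np.array([[0.8, 0.1, 0.0],
              [0.0, 0.5, 0.2],
              [0.0, 0.0, 0.3]])
R = rng.normal(size=(3, 2))
Z = rng.normal(size=(1, 3))
Gamma = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0]])  # determinant 7, hence invertible

# Transformed system matrices.
Gamma_inv = np.linalg.inv(Gamma)
T_tilde = Gamma @ T @ Gamma_inv
R_tilde = Gamma @ R
Z_tilde = Z @ Gamma_inv

# Initial states related by the same transformation.
alpha = rng.normal(size=3)
alpha_tilde = Gamma @ alpha

y = np.empty(n)
y_tilde = np.empty(n)
for t in range(n):
    eta = rng.normal(size=2)  # state noise, Q = I
    eps = rng.normal()        # observation noise, H = 1
    alpha = T @ alpha + R @ eta
    alpha_tilde = T_tilde @ alpha_tilde + R_tilde @ eta
    y[t] = (Z @ alpha)[0] + eps
    y_tilde[t] = (Z_tilde @ alpha_tilde)[0] + eps

max_diff = np.max(np.abs(y - y_tilde))
print(max_diff)
```

The two observation sequences agree up to floating-point rounding, while the state trajectories differ by the factor $\Gamma$.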