# A Quick Tutorial for Implicit Deep Learning and Its Variants

This tutorial introduces the **Implicit Deep Learning** (IDL) framework using the `idl` package. It has three main parts:

1. **Overview of IDL framework**
    - Implicit Base model
    - Implicit RNN
    - Implicit Attention Head
    - State-driven Implicit Model (SIM)
2. **A toy example**
    - Training Implicit Models and Their Variants
3. **Custom Activation for Implicit framework**
    - Customize Activation Functions

In [None]:
# Install idl in editable mode from the current directory
!pip install -e .

## Overview of IDL framework

### Implicit Base model
Implicit Deep Learning (IDL) models compute their hidden state as a **fixed-point solution** of an equilibrium equation instead of explicitly stacking layers.

Assume we have a dataset with $m$ data samples, each represented by $p$ input features. The model maps these inputs to a hidden state of dimension $n$ and produces predictions over $q$ output classes.

Given an input matrix $ U \in \mathbb{R}^{p \times m} $, the model maintains a state matrix $ X \in \mathbb{R}^{n \times m} $ that satisfies:

$$
X = \phi(AX + BU)
$$

where:
- $ A \in \mathbb{R}^{n \times n} $, $ B \in \mathbb{R}^{n \times p} $: learnable parameters,
- $ \phi: \mathbb{R}^{n \times m} \to \mathbb{R}^{n \times m} $: the activation (e.g., ReLU, tanh, sigmoid),
- $ U \in \mathbb{R}^{p \times m} $:  the input matrix (each column is a data sample),
- $ X \in \mathbb{R}^{n \times m} $:  the hidden state.
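
The equilibrium equation above can be solved in several ways. The following is a minimal NumPy sketch of one simple option, plain fixed-point iteration starting from $X = 0$. It is not the solver used by the `idl` package: the helper names `relu` and `solve_fixed_point`, the tolerance, the iteration cap, and the row scaling of $A$ (chosen so the iteration contracts) are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def solve_fixed_point(A, B, U, phi=relu, tol=1e-8, max_iter=500):
    """Iterate X <- phi(A X + B U) from X = 0 until the update is below tol."""
    n, m = A.shape[0], U.shape[1]
    X = np.zeros((n, m))
    BU = B @ U  # does not change across iterations
    for _ in range(max_iter):
        X_next = phi(A @ X + BU)
        if np.max(np.abs(X_next - X)) < tol:
            return X_next
        X = X_next
    return X

rng = np.random.default_rng(0)
p, n, m = 4, 6, 5  # input features, hidden size, number of samples

A = rng.uniform(-1.0, 1.0, size=(n, n))
A *= 0.5 / np.abs(A).sum(axis=1, keepdims=True)  # every absolute row sum becomes 0.5
B = rng.normal(size=(n, p))
U = rng.normal(size=(p, m))  # each column is one data sample

X = solve_fixed_point(A, B, U)

# How far X is from satisfying X = phi(A X + B U)
residual = np.max(np.abs(X - relu(A @ X + B @ U)))
```

Because every absolute row sum of `A` is 0.5, each iteration shrinks the distance between successive iterates by at least half in the max norm, so the loop stops well before `max_iter`.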

The model predicts the output matrix $ \hat{Y} \in \mathbb{R}^{q \times m} $ using:

$$
\hat{Y} = CX + DU
$$

where:
- $ C \in \mathbb{R}^{q \times n} $, $ D \in \mathbb{R}^{q \times p} $: learnable parameters,
- $ \hat{Y} \in \mathbb{R}^{q \times m} $: final model output.
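
As a quick shape check, the output mapping can be sketched in NumPy as follows. The matrices here are random stand-ins rather than trained parameters, `X` is a placeholder for a hidden state that has already been solved, and taking the `argmax` over the $q$ rows to get a class label is an assumption about how the scores are used.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, q, m = 4, 6, 3, 5  # input features, hidden size, classes, samples

X = rng.normal(size=(n, m))  # placeholder for a solved hidden state
U = rng.normal(size=(p, m))  # each column is one data sample
C = rng.normal(size=(q, n))
D = rng.normal(size=(q, p))

Y_hat = C @ X + D @ U                    # shape (q, m): one column of scores per sample
predicted_class = Y_hat.argmax(axis=0)   # shape (m,): highest-scoring class per sample
```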

The fixed-point equation is well-posed, meaning that a **unique solution** $X$ exists for any input $U$, if the following sufficient conditions hold:
- The activation function $\phi$ acts elementwise and is 1-Lipschitz:
  $$
  |\phi(x) - \phi(y)| \leq |x - y|
  $$

- The matrix $ A $ satisfies:
  $$
  \| A \|_{\infty} < 1
  $$
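
The condition on $A$ is easy to check numerically, since $\| A \|_{\infty}$ is the largest absolute row sum. The sketch below computes it and shows one simple way to enforce the bound by shrinking oversized rows. The helper names `inf_norm` and `rescale_rows` and the margin `kappa = 0.95` are assumptions for illustration; row rescaling is a simpler substitute for an exact projection onto the norm ball and is not necessarily what the `idl` package does.

```python
import numpy as np

def inf_norm(A):
    """Induced infinity norm: the largest absolute row sum."""
    return np.abs(A).sum(axis=1).max()

def rescale_rows(A, kappa=0.95):
    """Shrink every row whose absolute sum exceeds kappa so that it equals kappa."""
    row_sums = np.abs(A).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, kappa / np.maximum(row_sums, 1e-12))
    return A * scale

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))   # an unconstrained matrix, typically with norm above 1
A_safe = rescale_rows(A)      # now satisfies ||A_safe||_inf <= kappa < 1

norm_before = inf_norm(A)
norm_after = inf_norm(A_safe)
```

Rows that already satisfy the bound are left untouched, so the operation only changes the parts of $A$ that violate the condition.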

In [6]:
from idl import ImplicitModel

### Implicit RNN
Instead of defining the hidden state update explicitly with a linear transformation, an Implicit RNN uses an Implicit Base model as its recurrent cell while keeping the conventional RNN form. Assume we have a dataset with $m$ data samples, each a sequence of length $T$. At each timestep, the input has $p$ features, and the model maintains a hidden state of dimension $n$.

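Given an input sequence $ U \in \mathbb{R}^{m \times T \times p} $, the implicit hidden state $ X_t \in \mathbb{R}^{m \times n} $ at each timestep satisfies the equilibrium equation below. Shapes are listed batch-first, while the equations are written per sample in the column convention of the base model, and $[U_t, H_{t-1}]$ denotes concatenation along the feature dimension: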

$$
X_t = \phi(AX_t + B [U_t, H_{t-1}]),
$$

followed by the output mapping, which yields the RNN hidden state $H_t$:

$$
H_t = CX_t + DU_t,
$$

where:
- $ A \in \mathbb{R}^{n \times n} $, $ B \in \mathbb{R}^{n \times (p+n)} $, $C \in \mathbb{R}^{n \times n}$, $D \in \mathbb{R}^{n \times p}$: learnable parameters,
- $ U_t \in \mathbb{R}^{m \times p} $: the input at timestep $ t $,
- $ X_t \in \mathbb{R}^{m \times n} $: the implicit hidden state at timestep $ t $ (solved via a fixed-point equation),
- $ H_t \in \mathbb{R}^{m \times n} $: the RNN hidden state at timestep $ t $,
- $ \phi $: the activation (e.g., ReLU, tanh, sigmoid).

Finally, a linear layer projects the final hidden state $H_T$ to the output dimension.
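
The recurrence can be sketched in NumPy as below, using batch-first arrays so that each parameter matrix is applied through its transpose. This is an illustrative forward pass only, not the `ImplicitRNN` implementation: the function names, the zero initial hidden state $H_0$, the plain fixed-point solver, the final projection matrix `W_out`, and the random parameters are all assumptions.

```python
import numpy as np

def solve_state(A, BZ, phi=np.tanh, tol=1e-8, max_iter=500):
    """Fixed-point iteration for X = phi(X A^T + BZ), batch-first layout."""
    X = np.zeros_like(BZ)
    for _ in range(max_iter):
        X_next = phi(X @ A.T + BZ)
        if np.max(np.abs(X_next - X)) < tol:
            return X_next
        X = X_next
    return X

def implicit_rnn_forward(U, A, B, C, D, W_out):
    """Run the implicit recurrence over U of shape (m, T, p) and project H_T."""
    m, T, p = U.shape
    n = A.shape[0]
    H = np.zeros((m, n))  # assumed initial hidden state H_0 = 0
    for t in range(T):
        U_t = U[:, t, :]                      # (m, p)
        Z = np.concatenate([U_t, H], axis=1)  # [U_t, H_{t-1}], shape (m, p + n)
        X_t = solve_state(A, Z @ B.T)         # implicit state, shape (m, n)
        H = X_t @ C.T + U_t @ D.T             # H_t, shape (m, n)
    return H @ W_out.T                        # linear projection of H_T

rng = np.random.default_rng(0)
m, T, p, n, q = 3, 4, 2, 5, 2

A = rng.uniform(-1.0, 1.0, size=(n, n))
A *= 0.5 / np.abs(A).sum(axis=1, keepdims=True)  # keep ||A||_inf below 1
B = rng.normal(size=(n, p + n))
C = rng.normal(size=(n, n))
D = rng.normal(size=(n, p))
W_out = rng.normal(size=(q, n))  # hypothetical output layer weights
U = rng.normal(size=(m, T, p))

out = implicit_rnn_forward(U, A, B, C, D, W_out)
```

Solving a fresh fixed-point problem at every timestep is the main cost of this design; the previous hidden state enters only through the concatenated input, so the well-posedness condition on $A$ is the same as for the base model.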

In [1]:
from idl import ImplicitRNN