# Intelligent Agents and Active Inference

### Preliminaries

- Goal 
  - Introduction to Active Inference and its application to the design of synthetic intelligent agents
- Materials        
  - Mandatory
    - These lecture notes
    - Karl Friston - 2016 - [The Free Energy Principle](https://www.youtube.com/watch?v=NIu_dJGyIQI) (video)
  - Optional
    - Raviv (2018), [The Genius Neuroscientist Who Might Hold the Key to True AI](https://www.wired.com/story/karl-friston-free-energy-principle-artificial-intelligence/).
        - Interesting article on Karl Friston, who is a leading theoretical neuroscientist working on a theory that relates life and intelligent behavior to physics (and Free Energy minimization). (**highly recommended**) 
  - References

### Agents

- In the previous lessons we assumed that a data set was given. 
- In this lesson we consider _agents_. An agent is a system that _interacts_ with its environment through both sensors and actuators.
- Crucially, by acting on the environment, the agent is able to affect the data that it will sense in the future.
  - As an example, by changing the direction in which I look, I can affect the sensory data that will be sensed by my retina.
- With this definition of an agent, (biological) organisms are agents, and so are robots, self-driving cars, etc.
- In an engineering context, we are particularly interested in agents that behave with purpose (with a goal in mind), e.g., to drive a car or to design a speech recognition algorithm.
- In this lesson, we will describe how __goal-directed behavior__ by biological (and synthetic) agents can also be interpreted as minimization of a free energy functional $F[q]$. 

### Motivation: Karl Friston on the Cost Function for Behavior

- Have a look at this [video segment by Karl Friston on the cost function for intelligent behavior](https://www.vibby.com/watch?vib=71iPtUJxd).




### What makes a good agent?

<img src="./figures/good-regulator.png" width="600px">

### Agent vs environment interaction

<img src="./figures/agent-environment-interaction.png" width="600px">

### System architecture 

- An active inference-based agent comprises
  1. A free energy functional $F[q] = \mathbb{E}_q\left[ \log\frac{q(z)}{p(x,z)}\right]$, where
    - $p(x,z) = \prod_k p(x_k,z_k|z_{k-1})$ with (latent) $z_k = \{ s_k, u_k, \theta_k\}$ is a _generative_ model.
    - $q(z)$ is a _recognition_ model.
  2. A recipe to minimize the free energy $F[q]$

- We also assume that the agent interacts with an environment, which we represent by a dynamic model
$$
(y_t,\tilde{s}_t) = R_t\left( a_t,\tilde{s}_{t-1}\right)
$$
where $a_t$ are _actions_, $y_t$ are _outcomes_, and $\tilde{s}_t$ holds the environmental _states_.

- The agent can push actions $a_t$ onto the environment and measure responses $y_t$, but has no access to the environmental states $\tilde{s}_t$.

- Interactions between the agent and environment are described by 
$$\begin{align*}
a_t &\sim q(u_t) \\
x_t &= y_t 
\end{align*}$$
In other words, actions are drawn from the posterior over control signals.

- Note that this system implies a recursive dependency since the agent's future observations depend on the agent's current (and past) actions: $$x_{t+1} = x_{t+1} \left( a_{t+1} \right) = x_{t+1} \left( a_{t+1} \left( u_{t+1}\left( x_t \left( a_t \left( \cdots \right) \right) \right)\right) \right)$$
  - The agent selects its own data set!
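
The interaction protocol above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the linear dynamics inside `make_environment`, the noise level, and the fixed categorical posterior `q_u` are all hypothetical placeholders (a real agent would update $q(u_t)$ by free energy minimization at every step).

```python
import random

def make_environment(seed=0):
    """Return a function R(a) -> y whose state is hidden from the caller.
    The linear dynamics below are a hypothetical stand-in for R_t."""
    rng = random.Random(seed)
    s_env = 0.0                               # environmental state (never exposed)

    def R(a):
        nonlocal s_env
        s_env = 0.9 * s_env + a               # state transition driven by the action
        y = s_env + rng.gauss(0.0, 0.1)       # noisy outcome
        return y

    return R

def sample_action(q_u, rng):
    """Draw a_t ~ q(u_t) from a categorical posterior given as {action: probability}."""
    actions = list(q_u)
    weights = [q_u[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def run(n_steps=5, seed=1):
    rng = random.Random(seed)
    R = make_environment(seed)
    q_u = {-1.0: 0.2, 0.0: 0.3, 1.0: 0.5}     # placeholder posterior over controls
    history = []
    for _ in range(n_steps):
        a = sample_action(q_u, rng)           # a_t ~ q(u_t)
        y = R(a)                              # environment executes the action
        x = y                                 # x_t = y_t: the outcome becomes the observation
        history.append((a, x))
    return history

history = run()
```

Each recorded observation depends on all earlier actions through the hidden state `s_env`, which is the recursive dependency noted above.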

### Biological interpretation
- In biotic parlance, 
  - _behavior_ is inference for the control signals ($u$)
  - _perception_ is inference for the internal states ($s$). 
  - _learning_ is inference for the parameters ($\theta$)
 
- The complexity-accuracy (CA) decomposition of free energy shows that _actions_ aim to maximize accuracy, since complexity is not a function of the data, whereas the data do depend on actions ($x = x(a)$)
$$ F[q]=  \underbrace{\sum_z q(z)\log\frac{q(z)}{p(z)}}_{\text{complexity}} - \underbrace{\sum_z q(z) \log p(x|z)}_{\text{accuracy}}$$

- The divergence-evidence (DE) decomposition reveals that _perception_ minimizes inference costs, since the log-evidence is not affected by inference (it is not a function of $q$)
$$F[q] = \underbrace{\sum_z q(z) \log \frac{q(z)}{p(z|x)}}_{\text{divergence}} - \underbrace{\log p(x)}_{\text{log-evidence}}$$

- Finally, the energy-entropy (EE) decomposition discloses a deep link with the 2nd law of thermodynamics (the drive towards maximum entropy). An agent aims to maximize the entropy of its beliefs subject to constraints imposed by its generative model and inference skills (the energy term)
$$F[q] = \underbrace{-\sum_z q(z) \log p(x,z)}_{\text{energy}} - \underbrace{\sum_z q(z) \log \frac{1}{q(z)}}_{\text{entropy}}$$
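
As a quick numerical sanity check, the three decompositions can be evaluated on a small discrete model and compared with the definition of $F[q]$. The numbers below (a binary latent $z$, one observed $x$, and an arbitrary $q$) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical discrete model: z in {0, 1}, with a single observed value x.
p_z = [0.7, 0.3]              # prior p(z)
p_x_given_z = [0.2, 0.9]      # likelihood p(x|z) evaluated at the observed x
q = [0.4, 0.6]                # an arbitrary recognition distribution q(z)

p_xz = [pz * px for pz, px in zip(p_z, p_x_given_z)]   # joint p(x,z)
p_x = sum(p_xz)                                        # evidence p(x)
p_z_given_x = [j / p_x for j in p_xz]                  # exact posterior p(z|x)

def expect(f):
    """Expectation of f(z) under q(z)."""
    return sum(qi * f(i) for i, qi in enumerate(q))

# Definition: F[q] = E_q[ log q(z) / p(x,z) ]
F_def = expect(lambda i: math.log(q[i] / p_xz[i]))

# Complexity minus accuracy
complexity = expect(lambda i: math.log(q[i] / p_z[i]))
accuracy = expect(lambda i: math.log(p_x_given_z[i]))
F_ca = complexity - accuracy

# Divergence minus log-evidence
divergence = expect(lambda i: math.log(q[i] / p_z_given_x[i]))
F_de = divergence - math.log(p_x)

# Energy minus entropy
energy = expect(lambda i: -math.log(p_xz[i]))
entropy = expect(lambda i: -math.log(q[i]))
F_ee = energy - entropy
```

All four values agree up to floating-point error, and because the divergence is non-negative, `F_def` is an upper bound on the surprise `-math.log(p_x)`.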

### Model specification

- We assume that agents live in a dynamic environment and consider the following generative model for the agent (omitting parameters $\theta$)
$$\begin{align*}
p^\prime(x,s,u) &= p(s_{t-1}) \prod_{k=t}^{t+T} \underbrace{p(x_k|s_k) \cdot p(s_k | s_{k-1}, u_k)}_{\text{internal dynamics}} \cdot\underbrace{p(u_k)}_{\substack{\text{control prior}}}
\end{align*}$$

- In order to infer _goal-driven_ (i.e., purposeful) behavior, we now add prior beliefs $p^+(x)$ about future outcomes, leading to an extended agent model:
$$\begin{align*}
p(x,s,u) &= \frac{p^\prime(x,s,u) p^+(x)}{\int p^\prime(x,s,u) p^+(x) \,\mathrm{d}x\,\mathrm{d}s\,\mathrm{d}u} \\
  &\propto p(s_{t-1}) \prod_{k=t}^{t+T} p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k) p^+(x_k)
\end{align*}$$
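
To see what the goal prior does, consider a minimal single-time-step sketch with binary states and outcomes and no controls. All probability tables here are hypothetical. Multiplying the generative model by $p^+(x)$ and renormalizing shifts probability mass towards the preferred outcome, and also towards the states that tend to produce it.

```python
# Hypothetical one-step model: s in {0, 1}, x in {0, 1}, controls omitted.
p_s = [0.5, 0.5]                  # state prior
p_x_given_s = [[0.8, 0.2],        # p(x|s=0)
               [0.3, 0.7]]        # p(x|s=1)
p_goal = [0.1, 0.9]               # goal prior p^+(x): outcome x=1 is preferred

# Unnormalized extended model p'(x,s) * p^+(x), indexed as [s][x]
unnorm = [[p_s[s] * p_x_given_s[s][x] * p_goal[x] for x in range(2)]
          for s in range(2)]
Z = sum(sum(row) for row in unnorm)               # normalize so the table sums to one
p_ext = [[v / Z for v in row] for row in unnorm]

# Marginals before and after adding the goal prior
marg_x_before = [sum(p_s[s] * p_x_given_s[s][x] for s in range(2)) for x in range(2)]
marg_x_after = [sum(p_ext[s][x] for s in range(2)) for x in range(2)]
marg_s_after = [sum(p_ext[s]) for s in range(2)]
```

In this example the marginal probability of the preferred outcome `x=1` rises, and state `s=1` (which is more likely to generate `x=1`) becomes more probable than under the original prior.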





### FFG for Agent Model

- After selecting an action $a_t$ and making an observation $y_t$, the Forney-style factor graph (FFG) for the model is given by:

<img src="./figures/fig-active-inference-model-specification.png" width="800px">

- The (brown) dashed box is the agent's Markov blanket.  


### Online Active Inference

- Online active inference proceeds by iteratively executing three stages: (1) act-execute-observe, (2) infer, (3) slide forward

<img src="./figures/fig-online-active-inference.png" width="600px">

### Specification of Free Energy 

- Consider the agent's inference task at time step $t$, right after having selected an action $a_t$ and having made an observation $y_t$.

- As usual, we record actions and observations by substituting their values into the generative model (in the Act-Execute-Observe phase):
$$\begin{align*}
p(x,s,u) &\propto  \underbrace{p(x_t=y_t|s_t)}_{\text{observation}} p(s_t|s_{t-1},u_t) p(s_{t-1}) \underbrace{p(u_t=a_t)}_{\text{action}} \\ & \quad \cdot \underbrace{\prod_{k=t+1}^{t+T} p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k) p^+(x_k)}_{\text{future}}
\end{align*}$$


- Note that (future) $x$ is also a latent variable and hence we include $x$ in the recognition model.  

- This leads to the following free energy functional
$$\begin{align*}
F[q] &\propto \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)} 
\end{align*}$$

### FE Decompositions 

- Lots of interesting FE decompositions are possible again. For instance
$$\begin{align*}
F[q] &\propto \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)} \\
&= \sum_{u} q(u) \underbrace{\sum_{x,s} q(x,s|u)\log \frac{q(x,s|u)}{p(x,s|u)}}_{F_u[q]} + \underbrace{\sum_{u} q(u) \log \frac{q(u)}{p(u)}}_{\text{complexity}}
\end{align*}$$
breaks the FE into a complexity term and a term $F_u[q]$ that is conditioned on the policy $u$. 

- It can be shown (exercise) that the optimal posterior for the policy is now given by
$$
q^*(u) \propto p(u) \exp \left( -F^*_u \right)
$$
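
The policy posterior can be checked numerically. The sketch below assumes three candidate policies with hypothetical prior probabilities `p_u` and hypothetical minimized conditional free energies `F_u` (in nats). It evaluates $\sum_u q(u) F^*_u + \sum_u q(u) \log \frac{q(u)}{p(u)}$ for several choices of $q(u)$.

```python
import math

p_u = [0.5, 0.3, 0.2]        # policy prior p(u) (hypothetical values)
F_u = [2.0, 1.0, 3.5]        # minimized conditional free energies F*_u (hypothetical values)

def policy_free_energy(q_u):
    """Expected conditional free energy plus the complexity term KL[q(u) || p(u)]."""
    expected = sum(q * f for q, f in zip(q_u, F_u))
    complexity = sum(q * math.log(q / p) for q, p in zip(q_u, p_u) if q > 0)
    return expected + complexity

# Optimal posterior: q*(u) proportional to p(u) * exp(-F*_u)
weights = [p * math.exp(-f) for p, f in zip(p_u, F_u)]
Z = sum(weights)
q_star = [w / Z for w in weights]

F_min = policy_free_energy(q_star)

# A few alternative posteriors for comparison
alternatives = [p_u, [1 / 3, 1 / 3, 1 / 3], [0.1, 0.8, 0.1]]
F_alt = [policy_free_energy(q) for q in alternatives]
```

At the optimum the free energy equals `-math.log(Z)`, and none of the alternatives attains a lower value. Policies with low $F^*_u$ receive high posterior probability, tempered by the prior.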

- Let's consider a break-up $x=(x_t,x_{>t})$ with $x_{>t} = (x_{t+1},\ldots,x_{t+T})$ that recognizes the distinction between already observed and future data. Then
$$\begin{align*}
F_u[q] &= \underbrace{-\log p(x_t)}_{\substack{-\log(\text{evidence})  \\ \text{(surprise)}}} + \underbrace{\sum_{x,s} q(x_{>t},s|u)\log \frac{q(x_{>t},s|u)}{p(x_{>t},s|u)}}_{\substack{\text{divergence}\\ \text{(inference costs)}}}\,.
\end{align*}$$

- The inference costs (divergence term) can be further decomposed to 
$$\begin{align*} \underbrace{-\sum_{x} q(x_{>t}) \log p(x_{>t})}_{\substack{\text{expected surprise}  \\ \text{(goal-directed, pragmatic costs)}}} + \underbrace{\sum_{x,s} q(x_{>t},s|u) \log \frac{q(x_{>t},s|u)}{p(s|x_{>t},u)}}_{\text{epistemic costs}}
\end{align*}$$

- Minimizing goal-directed costs selects actions that (are expected to) fulfill the priors over future observations. Minimization of epistemic ("knowledge seeking") costs leads to actions that maximize information gain about the environmental dynamics. This can be seen by further decomposition of the epistemic costs into
$$\begin{align*}
&\sum_{x,s} q(x_{>t},s|u) \log \frac{q(s|u)}{p(s|x_{>t},u)} + \sum_{x,s} q(x_{>t},s|u) \log q(x_{>t}|s,u) \\
\approx &\underbrace{\sum_{x,s} q(x_{>t},s|u) \log \frac{q(s|u)}{q(s|x_{>t},u)}}_{-\text{mutual information}} - \underbrace{\mathbb{E}_{q(s|u)}\left[ H\left[ q(x_{>t}|s,u)\right]\right]}_{\text{ambiguity}} 
\end{align*}$$
where we used the approximation $q(s|x_{>t},u) \approx p(s|x_{>t},u)$ to illuminate the link to the mutual information. 
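
The two quantities in this decomposition can be computed directly for a small discrete example. The sketch below uses hypothetical tables for $q(s|u)$ and $q(x_{>t}|s,u)$ under one fixed policy, and takes the approximation $p(s|x_{>t},u) = q(s|x_{>t},u)$ to hold exactly. It only verifies the identity as written above, including its signs.

```python
import math

# Hypothetical beliefs under one fixed policy u: s in {0, 1}, future x in {0, 1}.
q_s = [0.5, 0.5]                      # q(s|u)
q_x_given_s = [[0.9, 0.1],            # q(x|s=0,u): a fairly unambiguous state
               [0.5, 0.5]]            # q(x|s=1,u): a maximally ambiguous state

joint = [[q_s[s] * q_x_given_s[s][x] for x in range(2)] for s in range(2)]  # q(x,s|u)
q_x = [joint[0][x] + joint[1][x] for x in range(2)]                         # q(x|u)

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

# Mutual information between s and x under q(x,s|u)
mutual_information = sum(
    joint[s][x] * math.log(joint[s][x] / (q_s[s] * q_x[x]))
    for s in range(2) for x in range(2) if joint[s][x] > 0
)

# Ambiguity: expected entropy of the observation model, E_{q(s|u)}[ H[q(x|s,u)] ]
ambiguity = sum(q_s[s] * entropy(q_x_given_s[s]) for s in range(2))

# Epistemic term sum q(x,s|u) log q(x,s|u)/q(s|x,u), with q(s|x,u) = q(x,s|u)/q(x|u)
epistemic = sum(
    joint[s][x] * math.log(joint[s][x] / (joint[s][x] / q_x[x]))
    for s in range(2) for x in range(2) if joint[s][x] > 0
)
```

Here `epistemic` equals `-mutual_information - ambiguity` up to floating-point error, and the state with the flat observation distribution (`s=1`) contributes the larger share of the ambiguity.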

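- <font color="red">Caution: in this decomposition the ambiguity term enters with a minus sign, so minimizing these costs would increase ambiguity, whereas we want to minimize ambiguity. This point needs further checking.</font>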

- Minimizing FE leads (approximately) to mutual information maximization between internal states $s$ and observations $x$. In other words, free energy minimization (FEM) leads to actions that seek out observations that are maximally informative about the hidden causes of these observations. 

- Ambiguous states have uncertain mappings to observations. Minimizing FE leads to actions that try to avoid ambiguous states. 

- In short, if the generative model includes variables that represent (yet) unobserved future observations, then action selection by FEM leads to a very sophisticated behavioral strategy that is maximally consistent with  
  - Bayesian notions of model complexity
  - evidence from past observations
  - goal-directed imperatives by priors on future observations
  - epistemic (knowledge seeking) value maximization, both in terms of MI maximization and avoidance of ambiguous states
  
- All these imperatives are simultaneously represented and automatically balanced against each other in a single time-varying cost function (Free Energy) that needs no tuning parameters. 

- (Just to be sure, you don't need to memorize these derivations nor are you expected to derive them on-the-spot. We present these decompositions only to provide insight into the multitude of forces that underlie FEM-based action selection.)



### Free energy distribution in FFG 

(This figure is outdated and needs to be updated.)

<img src="./figures/ffg-active-inference-for-policy.png" width="800px">


### Active Inference Applications

[Friston at CCN-2016 on Active Inference Applications](https://www.vibby.com/v/Q1Vir9QKGd)
