# Intelligent Agents and Active Inference

### Preliminaries

- Goal 
  - Introduction to Active Inference and application to the design of synthetic intelligent agents 
- Materials        
  - Mandatory
    - These lecture notes
    - Karl Friston - 2016 - [The Free Energy Principle](https://www.youtube.com/watch?v=NIu_dJGyIQI) (video)
  - Optional
  - References

### Illustrative Example

### Agents

- In the previous lessons we assumed that a data set was given. 
- In this lesson we consider _agents_. An agent is a system that _interacts_ with its environment through both sensors and actuators.
- Crucially, by acting on the environment, the agent is able to affect the data that it will sense in the future.
  - As an example, by changing the direction in which I look, I can affect the sensory data that will be sensed by my retina.
- With this definition of an agent, (biological) organisms are agents, and so are robots, self-driving cars, etc.
- In an engineering context, we are particularly interested in agents that behave with purpose (with a goal in mind), e.g., to drive a car or to design a speech recognition algorithm.
- In this lesson, we will describe how __goal-directed behavior__ by biological (and synthetic) agents can also be interpreted as minimization of a free energy functional $F[q]$. 

Friston at CCN-2016 (2 fragments):

- https://vib.by/v/71iPtUJxd
- https://youtu.be/b1hEc6vay_k?start=254&end=475
- https://youtu.be/b1hEc6vay_k?t=4505&end4860

### What makes a good agent?

<img src="./figures/good-regulator.png" width="600px">

### Agent vs environment interaction

<img src="./figures/agent-environment-interaction.png" width="600px">

### System architecture 

- An agent comprises
  1. a generative model $p(x|z) p(z)$, where $z = \{ s, u, \theta\}$,
  2. a recognition model $q(z)$,
  3. a recipe to minimize the free energy (FE) $F[q]$.

- We also assume that the agent interacts with an environment, which we represent by a dynamic model
$$
(y_t,\tilde{s}_t) = R_t\left( a_t,\tilde{s}_{t-1}\right)
$$
where $a_t$ are _actions_, $y_t$ are _outcomes_, and $\tilde{s}_t$ holds the environmental _states_.

- The agent can push actions $a_t$ onto the environment and measure responses $y_t$, but has no access to the environmental states $\tilde{s}_t$.

- Interactions between the agent and environment are described by 
$$\begin{align*}
a_t &\sim q(u_t) \\
x_t &= y_t 
\end{align*}$$
In other words, actions are drawn from the posterior over control signals, and the agent's observations $x_t$ are the environmental outcomes $y_t$.
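
The interaction equations can be sketched as a simple loop. This is a minimal illustration, not the lecture's implementation: the environment function `environment_step`, its additive dynamics, and the fixed table `q_u` standing in for the posterior $q(u_t)$ are all hypothetical. In a real agent, $q(u_t)$ would be the result of free energy minimization rather than a constant.

```python
import random

def environment_step(action, hidden_state):
    """Hypothetical R_t: maps (a_t, previous hidden state) to (y_t, new hidden state)."""
    new_state = hidden_state + action               # assumed dynamics: state shifts by the action
    outcome = new_state + random.gauss(0.0, 0.1)    # assumed noisy measurement y_t
    return outcome, new_state

def sample_action(q_u):
    """Draw a_t ~ q(u_t) from a dict {control value: probability}."""
    values = list(q_u)
    weights = [q_u[v] for v in values]
    return random.choices(values, weights=weights, k=1)[0]

def run_interaction(q_u, n_steps, seed=0):
    """Run the agent-environment loop and return the (a_t, x_t) pairs the agent sees."""
    random.seed(seed)
    hidden_state = 0.0      # environmental state, never exposed to the agent
    trace = []
    for _ in range(n_steps):
        a_t = sample_action(q_u)                                 # a_t ~ q(u_t)
        y_t, hidden_state = environment_step(a_t, hidden_state)  # environment responds
        x_t = y_t                                                # x_t = y_t
        trace.append((a_t, x_t))
    return trace

trace = run_interaction({-1: 0.2, 0: 0.3, 1: 0.5}, n_steps=5)
```

Note that `hidden_state` only appears inside the loop and the environment function: the agent-side quantities are the actions and observations in `trace`.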

### Biological interpretation
- Biologically, 
  - _actions_ onto the environment follow from inference for control signals ($u$)
  - _perception_ is inference for the internal states ($s$). 
  - _learning_ relates to inference for the parameters ($\theta$)
  
- The complexity-accuracy (CA) decomposition of free energy shows that _actions_ aim to maximize accuracy: actions can only affect $F$ through the data $x$, and complexity is not a function of $x$
$$ F[q]=  \underbrace{\sum_z q(z)\log\frac{q(z)}{p(z)}}_{\text{complexity}} - \underbrace{\sum_z q(z) \log p(x|z)}_{\text{accuracy}}$$

- The divergence-evidence (DE) decomposition reveals that _perception_ minimizes inference costs (the divergence), since log-evidence is not affected by inference (it is not a function of $q$)
$$F[q] = \underbrace{\sum_z q(z) \log \frac{q(z)}{p(z|x)}}_{\text{divergence}} - \underbrace{\log p(x)}_{\text{log-evidence}}$$

- Finally, the energy-entropy (EE) decomposition discloses a deep link with the second law of thermodynamics (the drive towards maximum entropy). Agents aim to maximize entropy subject to the constraints imposed by their generative model and inference skills (the energy term)
$$F[q] = \underbrace{-\sum_z q(z) \log p(x,z)}_{\text{energy}} - \underbrace{\sum_z q(z) \log \frac{1}{q(z)}}_{\text{entropy}}$$
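
As a sanity check, the three decompositions can be evaluated numerically on a tiny discrete model. The sketch below assumes a binary latent variable $z$ and one fixed observation $x$; all probability values are made up for illustration.

```python
import math

# Hypothetical toy model: z in {0, 1}, one fixed observation x.
prior = [0.7, 0.3]        # p(z)
likelihood = [0.2, 0.9]   # p(x | z), evaluated at the observed x
q = [0.4, 0.6]            # an arbitrary recognition distribution q(z)

evidence = sum(pz * lx for pz, lx in zip(prior, likelihood))           # p(x)
posterior = [pz * lx / evidence for pz, lx in zip(prior, likelihood)]  # p(z | x)

def free_energy_decompositions(q):
    """Return F[q] computed via the CA, DE and EE decompositions."""
    complexity = sum(qz * math.log(qz / pz) for qz, pz in zip(q, prior))
    accuracy = sum(qz * math.log(lx) for qz, lx in zip(q, likelihood))
    divergence = sum(qz * math.log(qz / pzx) for qz, pzx in zip(q, posterior))
    log_evidence = math.log(evidence)
    energy = -sum(qz * math.log(pz * lx) for qz, pz, lx in zip(q, prior, likelihood))
    entropy = -sum(qz * math.log(qz) for qz in q)
    return {
        "CA": complexity - accuracy,
        "DE": divergence - log_evidence,
        "EE": energy - entropy,
    }

values = free_energy_decompositions(q)                 # all three entries coincide
at_posterior = free_energy_decompositions(posterior)   # divergence vanishes here
```

At `q = posterior` the divergence term is zero, so the free energy equals the negative log-evidence, which is its minimum over $q$.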

### Model specification

- We assume that agents live in a dynamic environment and consider the following generative model for the agent (omitting parameters $\theta$)
$$\begin{align*}
p^\prime(x,s,u) &= p(s_{t-1}) \prod_{k=t}^{t+T} \underbrace{p(x_k|s_k) \cdot p(s_k | s_{k-1}, u_k)}_{\text{internal dynamics}} \cdot\underbrace{p(u_k)}_{\substack{\text{control prior}}}
\end{align*}$$

- In order to infer _goal-driven_ (i.e., purposeful) behavior, we now add prior beliefs $p^+(x)$ about future outcomes, leading to an extended agent model:
$$\begin{align*}
p(x,s,u) &= \frac{p^\prime(x,s,u) p^+(x)}{\int_x p^\prime(x,s,u) p^+(x) \mathrm{d}x} \\
  &\propto p(s_{t-1}) \prod_{k=t}^{t+T} p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k) p^+(x_k)
\end{align*}$$
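
To see what the goal prior does, consider a single future outcome $x_k$ with three possible values. The sketch below assumes the predictive distribution $p^\prime(x_k)$ (states and controls already marginalized out) is given as a table; both tables are hypothetical numbers chosen for illustration.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

p_prime = [0.5, 0.3, 0.2]     # p'(x_k): what the dynamics alone predict (assumed)
p_plus = [0.05, 0.15, 0.80]   # p^+(x_k): goal prior, prefers the third outcome (assumed)

# Extended model: p(x_k) is proportional to p'(x_k) * p^+(x_k).
p_extended = normalize([a * b for a, b in zip(p_prime, p_plus)])
```

The product shifts probability mass towards outcomes that are both reachable under the dynamics and desired under the goal prior.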





### FFG for Agent Model

- After selecting an action $a_t$ and making an observation $y_t$, the Forney-style factor graph (FFG) for the model is given by:

<img src="./figures/fig-active-inference-model-specification.png" width="800px">

- The (brown) dashed box is the agent's Markov blanket.  


### Online Active Inference

- Online active inference proceeds by iteratively executing three stages: (1) act-execute-observe, (2) infer, (3) slide forward

<img src="./figures/fig-online-active-inference.png" width="600px">

### Specification of Free Energy 

- Consider the agent's inference task at time step $t$, right after having selected an action $a_t$ and having made an observation $y_t$.

- As usual, we record actions and observations by substituting the values into the generative model (in the Act-Execute-Observe phase):
$$\begin{align*}
p(x,s,u) &\propto  \underbrace{p(x_t=y_t|s_t)}_{\text{observation}} p(s_t|s_{t-1},u_t) p(s_{t-1}) \underbrace{p(u_t=a_t)}_{\text{action}} \\ & \quad \cdot \underbrace{\prod_{k=t+1}^{t+T} p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k) p^+(x_k)}_{\text{future}}
\end{align*}$$


- Note that (future) $x$ is also a latent variable and hence we include $x$ in the recognition model.  

- This leads to the following free energy functional
$$\begin{align*}
F[q] &\propto \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)} 
\end{align*}$$

### FE Decompositions 

- Again, many interesting FE decompositions are possible. For instance,
$$\begin{align*}
F[q] &= \sum_{u} q(u) \underbrace{\sum_{x,s} q(x,s|u)\log \frac{q(x,s|u)}{p(x,s|u)}}_{F_u[q]} + \underbrace{\sum_{u} q(u) \log \frac{q(u)}{p(u)}}_{\text{complexity}}
\end{align*}$$
breaks the FE into a complexity term and a term $F_u[q]$ that is conditioned on the policy $u$. 
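
A sketch of the intermediate step, assuming the factorizations $q(x,s,u) = q(u)\,q(x,s|u)$ and $p(x,s,u) = p(u)\,p(x,s|u)$ and treating $p$ as normalized:
$$\begin{align*}
F[q] &= \sum_{x,s,u} q(u)\, q(x,s|u) \left( \log \frac{q(x,s|u)}{p(x,s|u)} + \log \frac{q(u)}{p(u)} \right) \\
&= \sum_{u} q(u) \sum_{x,s} q(x,s|u) \log \frac{q(x,s|u)}{p(x,s|u)} + \sum_{u} q(u) \log \frac{q(u)}{p(u)}
\end{align*}$$
where the second term uses $\sum_{x,s} q(x,s|u) = 1$.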

- It can be shown (exercise) that the optimal posterior for the policy is now given by
$$
q^*(u) \propto p(u) \exp \left( -F^*_u \right)
$$
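
The policy posterior is a prior-weighted softmax over negative free energies. The following sketch uses two hypothetical policies with made-up values for $F^*_u$ and checks numerically that $q^*(u)$ attains a lower total free energy than a uniform alternative.

```python
import math

def optimal_policy_posterior(prior, free_energies):
    """Return q*(u), proportional to p(u) * exp(-F_u), normalized over policies."""
    # Subtracting the minimum F_u improves numerical stability and cancels on normalization.
    f_min = min(free_energies.values())
    weights = {u: prior[u] * math.exp(-(free_energies[u] - f_min)) for u in prior}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def total_free_energy(q, prior, free_energies):
    """F[q] = sum_u q(u) F_u + sum_u q(u) log(q(u) / p(u))."""
    return sum(
        q[u] * (free_energies[u] + math.log(q[u] / prior[u]))
        for u in q
        if q[u] > 0
    )

prior = {"left": 0.5, "right": 0.5}           # p(u), assumed uniform
free_energies = {"left": 2.0, "right": 0.5}   # hypothetical F*_u values

q_star = optimal_policy_posterior(prior, free_energies)
q_uniform = {"left": 0.5, "right": 0.5}

f_star = total_free_energy(q_star, prior, free_energies)
f_uniform = total_free_energy(q_uniform, prior, free_energies)
```

At the optimum, the total free energy equals $-\log \sum_u p(u) \exp(-F^*_u)$, which gives a convenient check on any implementation.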

- Let's split $x=(x_t,x_{>t})$ with $x_{>t} = (x_{t+1},\ldots,x_{t+T})$ to distinguish between already observed and future data (and similarly $s=(s_t,s_{>t})$). Then (see Schwoebel et al., 2019, eq. 2.15),
$$\begin{align*}
F_u[q] &= \sum_{x,s} q(x,s|u)\log \left( \frac{q(s_t|u)}{p(x_t,s_t|u)} \cdot \frac{q(x_{>t},s_{>t}|s_t, u)}{p(x_{>t},s_{>t}|s_t,u)} \right)\\
&= \underbrace{\sum_{s_t} q(s_t|u)\log \frac{q(s_t|u)}{p(x_t,s_t|u)}}_{\text{observed free energy }V_u[q]} +  \underbrace{\sum_{x_{>t},s_{>t},s_t} q(x_{>t},s_{>t},s_t|u) \log \frac{q(x_{>t},s_{>t}|s_t, u)}{p(x_{>t},s_{>t}|s_t,u)}}_{\text{predicted free energy }G_u[q]}
\end{align*}$$
where the observed free energy $V_u[q]$ is the free energy incurred by the already observed value $x_t$, and the predicted free energy $G_u[q]$ depends on the priors for the future observations $x_{>t}$.

- In particular, using $p(x_{>t},s_{>t}|s_t,u) \propto p^\prime(x_{>t},s_{>t}|s_t,u) \cdot p^+(x_{>t})$, we can further break down the free energies into divergence and evidence terms as
$$\begin{align*}
G_u[q] &\propto \underbrace{\sum_{x_{>t},s_{>t},s_t} q(x_{>t},s_{>t},s_t|u) \log \frac{q(x_{>t},s_{>t}|s_t, u)}{p^\prime(x_{>t},s_{>t}|s_t,u)}}_{\text{inference costs (as KL divergence)}}  - \underbrace{\sum_{x_{>t}} q(x_{>t}|u) \log p^+(x_{>t})}_{\substack{\text{goal-directed behavior} \\ \text{(expected log-evidence)}}} \\
V_u[q] &= \underbrace{\sum_{s_t} q(s_t|u)\log \frac{q(s_t|u)}{p(s_t|x_t,u)}}_{\text{inference costs}} - \underbrace{\log p(x_t|u)}_{\text{log-evidence}} 
\end{align*}$$

- Thus, minimizing FE $F_u[q]$ leads to a policy $u$ that is driven by a goal-directed term. Inaccuracies in the policy are due to inference costs. 
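
To illustrate the goal-directed term, the sketch below evaluates it for two policies in a hypothetical one-step-ahead model with binary states and outcomes. It makes a strong simplifying assumption: the recognition model for future variables is set equal to the model prediction $p^\prime$, so the inference-cost term vanishes and only $-\sum_{x_{>t}} q(x_{>t}|u) \log p^+(x_{>t})$ remains. All tables and policy names are invented for illustration.

```python
import math

# All numbers below are hypothetical.
belief_s_t = [0.9, 0.1]                      # q(s_t): current state belief
transition = {                               # p(s_{t+1} | s_t, u), rows indexed by s_t
    "stay": [[0.9, 0.1], [0.1, 0.9]],
    "move": [[0.2, 0.8], [0.8, 0.2]],
}
observation = [[0.95, 0.05], [0.05, 0.95]]   # p(x | s), rows indexed by s
goal_prior = [0.1, 0.9]                      # p^+(x): outcome 1 is desired

def predicted_outcomes(policy):
    """q(x_{t+1} | u), assuming q equals the model prediction p' for future variables."""
    next_state = [
        sum(belief_s_t[i] * transition[policy][i][j] for i in range(2))
        for j in range(2)
    ]
    return [sum(next_state[j] * observation[j][x] for j in range(2)) for x in range(2)]

def goal_term(policy):
    """Goal-directed part of G_u: -sum_x q(x | u) log p^+(x)."""
    q_x = predicted_outcomes(policy)
    return -sum(qx * math.log(px) for qx, px in zip(q_x, goal_prior))

G = {u: goal_term(u) for u in transition}
best_policy = min(G, key=G.get)   # policy with the lowest predicted free energy
```

Because the agent currently believes it is in state 0, and the goal prior favors outcome 1, the policy that moves to the other state obtains the lower value.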


