# Causal Entropic Forces

_Causal Entropic Forces_ [[Wissner-Gross & Freer, 2013a]](http://math.mit.edu/~freer/papers/PhysRevLett_110-168702.pdf) is a 2013 paper by [Alexander Wissner-Gross](https://www.alexwg.org/) and [Cameron Freer](https://www.cfreer.org/). The paper describes an agent that takes actions to maximize the diversity of future paths in an environment. The authors argue that such behavior mathematically formalizes the word "intelligence"—and justify their argument with computer simulations showing that intelligent behaviors of tool use and multi-agent cooperation emerge from maximizing causal entropy.

In this writeup, I use Python to reimplement the _Causal Entropic Forces_ agent in the simplest environment described in the paper: the particle-in-a-box. Under causal entropic forcing, a particle moves to the center of the box over time. My implementation differs from the original in three respects:
1. Use of a prefactor
2. Path subsampling (suggested by Google Gemini)
3. A different kernel and bandwidth used in kernel density estimation

The following works helped me better understand the original paper:

1. Causal Entropic Forces [[Wissner-Gross & Freer, 2013a]](http://math.mit.edu/~freer/papers/PhysRevLett_110-168702.pdf).
2. Supplementary Material to Causal Entropic Forces [[Wissner-Gross & Freer, 2013b]](https://journals.aps.org/prl/supplemental/10.1103/PhysRevLett.110.168702).
3. Comment: Causal entropic forces [[Kappen, 2013]](https://arxiv.org/abs/1312.4185).
4. Causal Entropic Forces: Intelligent Behaviour, Dynamics and Pattern Formation [[Hornischer, 2015]](https://pure.mpg.de/rest/items/item_2300851/component/file_2300850/content).
5. Fractal AI: A fragile theory of intelligence [[Cerezo & Ballester, 2018]](https://arxiv.org/abs/1803.05049).

My mathematical notation differs from that of _Causal Entropic Forces_ and is inspired in part by some of the above works.

I first import Python libraries

In [1]:
import numpy as np
import scipy.constants
import scipy.stats

from matplotlib import pyplot as plt
from matplotlib import patches as patches
from matplotlib import cm

import copy

set a seed and datatype

In [2]:
np.random.seed(0)

dtype = np.float32

and specify hyperparameters as described

In [3]:
EPSILON = .025                              # timestep length (seconds)
M       = 1e-21                             # mass (kg)
T_R     = 4e5                               # temperature of random agent (Kelvins)
T_C     = 2e6                               # temperature of causal agent (Kelvins)
K_B     = scipy.constants.Boltzmann         # Boltzmann constant (J/K, or m^2 kg / s^2 K)
TAU     = 10.                               # simulation time horizon (seconds)

L       = 400                               # length (meters) 
Q_MIN   = np.array([0., 0.  ], dtype=dtype) # minimum box displacement
Q_MAX   = np.array([L , L/5.], dtype=dtype) # maximum box displacement

In [4]:
NUM_SUBSAMPLES = 3

In [5]:
timesteps = int(TAU / EPSILON)

## Mathematical Background

_Causal Entropic Forces_ uses thermodynamics to describe an agent-environment system.
At time $t$ the agent-environment system is described by a point in phase space $\mathbf{x}_t = \{ \mathbf{q}_t, \mathbf{p}_t \}$, where $\mathbf{q}_t$ is a position vector and $\mathbf{p}_t$ is a momentum vector.

If the agent does not act under a causal entropic force, Wissner-Gross & Freer [[2013b](https://journals.aps.org/prl/supplemental/10.1103/PhysRevLett.110.168702), [2013a, p. 2, col. 2](http://math.mit.edu/~freer/papers/PhysRevLett_110-168702.pdf)] state that the agent-environment phase space evolves following the Langevin dynamics of
$$
\begin{align*}
    \dot{\mathbf{p}}_t
    &=
    - \frac{1}{\epsilon} \, \mathbf{p}_{\lfloor t / \epsilon \rfloor \epsilon}
    + \mathbf{f} \left( \left\lfloor \frac{t}{\epsilon} \right\rfloor \epsilon \right)
    + \mathbf{h} \left( \mathbf{x}_t \right)
\\
    \dot{\mathbf{q}}_t
    &=
    \frac{\mathbf{p}_t}{m}
\end{align*}
$$

where
* $\epsilon$ is the timestep length, which discretizes the time evolution
* $\mathbf{h}$ accounts for forces imposed on the agent by the system (here, the perfectly elastic collisions of the particle with the walls)
* $\mathbf{f}$ is a random force whose elements are sampled independently at every timestep as $f_j \sim \mathcal{N} \left(\mu = 0, \sigma = \frac{\sqrt{m_j \, k_B \, T_r}}{\epsilon} \right)$ (see [this Math StackExchange answer](https://math.stackexchange.com/a/1426406)), where $k_B$ is the Boltzmann constant and $T_r$ is the environment temperature
* $m$ is the mass of the particle (written $m_j$ when indexed by vector element $j$)

The following five cells specify the Langevin dynamics of a random agent

In [6]:
def generate_fs(num_samples):
    return np.random.normal( loc   = 0. ,
                             scale = np.sqrt(M * K_B * T_R) / EPSILON ,
                             size  = (num_samples, 2) ).astype(dtype)

In [7]:
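def update_momentum(p, force):
    # the -p/epsilon damping term cancels the previous momentum over one
    # timestep, so the new momentum depends only on the force
    p     = force * EPSILON
    # cap the momentum so one timestep cannot move the particle further than a box length
    p_max = M * np.abs(Q_MAX - Q_MIN) / EPSILON
    return np.sign(p) * np.minimum(p_max, np.abs(p))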

In [8]:
def update_position(q, p):
    return q  +  EPSILON * p / M

In [9]:
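def elastic_collisions(p, q):
    # reflect components that crossed a lower wall and reverse their momentum;
    # the crossing must be detected before q is reflected back into the box
    below = q < Q_MIN
    q     = np.where(below, 2 * Q_MIN - q, q)
    p     = np.where(below, -p, p)
    # same for the upper walls
    above = q > Q_MAX
    q     = np.where(above, 2 * Q_MAX - q, q)
    p     = np.where(above, -p, p)
    return p, q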

In [10]:
def update_phase_space(p, q, f):
    p    = update_momentum(p, f)
    q    = update_position(q, p)
    p, q = elastic_collisions(p, q)
    return p, q
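
To illustrate the wall-reflection rule on its own, here is a minimal standalone sketch. The helper name `reflect` and the example coordinates are my own, chosen for illustration. A position that overshoots a wall is mirrored back inside by the same distance, and the matching momentum component changes sign.

```python
import numpy as np

Q_MIN = np.array([0., 0.])      # lower corner of the 400 m x 80 m box
Q_MAX = np.array([400., 80.])   # upper corner of the box

def reflect(p, q):
    """Mirror positions that left the box and flip the matching momenta."""
    below = q < Q_MIN
    q = np.where(below, 2 * Q_MIN - q, q)
    p = np.where(below, -p, p)
    above = q > Q_MAX
    q = np.where(above, 2 * Q_MAX - q, q)
    p = np.where(above, -p, p)
    return p, q

# x overshoots the left wall by 3 m, y overshoots the top wall by 5 m
p, q = reflect(np.array([-1., 2.]), np.array([-3., 85.]))
print(q)  # mirrored back to x = 3, y = 75
print(p)  # both momentum components reversed
```

The sketch assumes an overshoot smaller than the box size, so a single reflection per wall is enough.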

The Langevin dynamics are calculated for $\tau / \epsilon$ timesteps, where $\tau$ is the time-horizon of the simulation [[Wissner-Gross & Freer 2013b, p. 2]](https://journals.aps.org/prl/supplemental/10.1103/PhysRevLett.110.168702). A rollout is a simulation of Langevin dynamics for a finite number of timesteps. The following function calculates rollouts of Langevin dynamics in parallel.

In [11]:
def rollouts(p, q, num_samples):
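    """
    Simulate num_samples independent rollouts from the state (p, q).

    Returns paths with shape (timesteps + 1, num_samples, q_dim), including
    the initial position, and the forces sampled for the first timestep.
    """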
    ps = np.tile(p[None, :], (num_samples, 1))
    qs = np.tile(q[None, :], (num_samples, 1))

    paths = copy.deepcopy(qs[None, :])

    fs       = generate_fs(num_samples)
    first_fs = copy.deepcopy(fs)
    
    for i in range(timesteps):
        ps, qs = update_phase_space(ps, qs, fs)
        paths  = np.append(paths, qs[None, :], axis=0)
        fs     = generate_fs(num_samples)

    return paths, first_fs
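
Because `np.append` copies the whole array on every call, the loop above does more copying as the path grows. A possible alternative, sketched here under my own assumptions, preallocates the path array once. The names `rollouts_preallocated` and `gaussian_step` are hypothetical, and `gaussian_step` is only a stand-in for one phase-space update, not the dynamics used in this notebook.

```python
import numpy as np

def rollouts_preallocated(q0, num_samples, timesteps, step_fn, rng):
    """Return paths with shape (timesteps + 1, num_samples, q_dim)."""
    qs = np.tile(q0[None, :], (num_samples, 1))
    paths = np.empty((timesteps + 1, num_samples, q0.shape[0]), dtype=qs.dtype)
    paths[0] = qs                      # initial positions
    for i in range(timesteps):
        qs = step_fn(qs, rng)          # advance every sample by one timestep
        paths[i + 1] = qs              # write in place instead of appending
    return paths

def gaussian_step(qs, rng, scale=1.0):
    # hypothetical stand-in for a single update of the dynamics
    return qs + rng.normal(scale=scale, size=qs.shape)

rng = np.random.default_rng(0)
paths = rollouts_preallocated(np.array([200., 40.]), num_samples=5,
                              timesteps=10, step_fn=gaussian_step, rng=rng)
print(paths.shape)  # (11, 5, 2)
```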

Visualize 1000 rollouts starting from the center of the box in 2D:

In [12]:
# num_samples = 1000

# p = np.array([0.   , 0.   ])
# # q = np.array([L/10., L/10.])
# q = np.array([L/2., L/10.])

# paths, _ = rollouts(p, q, num_samples)

In [13]:
# # Plot the random walks
# fig, ax = plt.subplots(figsize=(13, 13))
# ax.plot(*paths.swapaxes(0, 1).T, alpha=0.5)
# ax.set_xlim(Q_MIN[0], Q_MAX[0])
# ax.set_ylim(Q_MIN[1], Q_MAX[1])
# ax.set(aspect='equal')
# plt.show()

and in 3D with time as the vertical axis

In [14]:
# # Adapted from https://jakevdp.github.io/PythonDataScienceHandbook/04.12-three-dimensional-plotting.html
# ax = plt.axes(projection='3d')
# for ns in range(num_samples):
#     ax.plot3D( paths.swapaxes(0, 1).T[0, :, ns] ,
#                paths.swapaxes(0, 1).T[1, :, ns] ,
#                np.arange(1 + TAU / EPSILON) * EPSILON ,
#                alpha=.5 )
# ax.view_init(20, -70)

## Causal Entropic Forcing

Causal entropic forcing changes an agent's momentum by
$$
\begin{align*}
    \dot{\mathbf{p}}_t
    &=
    - \frac{1}{\epsilon} \, \mathbf{p}_{\lfloor t / \epsilon \rfloor \epsilon}
    + \mathbf{h} \left( \mathbf{x}_t \right)
    +
    \mathbf{f}^{(c)}(\mathbf{x}_t, \tau)
\end{align*}
$$

Here
$$
\begin{align*}
    \mathbf{f}^{(c)}(\mathbf{x}_t, \tau) = T_c \, \left. \nabla_{\mathbf{x}} \, s(\mathbf{x}, \tau) \right|_{\mathbf{x}_t}
\end{align*}
$$
is the causal entropic force on $\mathbf{x}_t$ with a time-horizon of $\tau$.

The causal entropic force is calculated from
* $T_c$: the temperature of the environment when causal entropic forcing is performed
* $s(\mathbf{x}_t, \tau)$: the causal entropy on $\mathbf{x}_t$ with time-horizon $\tau$ 

Causal entropy is defined by
$$
\begin{align*}
    s(\mathbf{x}_t, \tau) =
    - k_B \,
    \mathbb{E}_{p(\mathbf{x}_{t:t+\tau} \mid \mathbf{x}_t)}
    \left[
        \ln p(\mathbf{x}_{t:t+\tau} \mid \mathbf{x}_t)
    \right]
\end{align*}
$$
where $\mathbf{x}_{t:t+\tau}$ is a path in phase space from time $t$ to $t+\tau$.
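
To make the expectation concrete, the following sketch estimates an entropy of the form $-\mathbb{E}[\ln p]$ by Monte-Carlo sampling. It is a deliberately simplified illustration: a single Gaussian variable stands in for the path distribution, $k_B$ is set to 1, and the value of `sigma` is arbitrary. The estimate can then be compared with the closed-form differential entropy of a normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 2.0                                   # arbitrary example scale
samples = rng.normal(loc=0.0, scale=sigma, size=200_000)

# Monte-Carlo estimate of -E[ln p(x)], with k_B set to 1 for illustration
entropy_mc = -np.mean(stats.norm.logpdf(samples, loc=0.0, scale=sigma))

# closed-form differential entropy of a normal distribution
entropy_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(entropy_mc, entropy_exact)              # the two values should be close
```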

### Calculating The Causal Entropic Force

[Wissner-Gross & Freer [2013a, pp. 2–3]](http://math.mit.edu/~freer/papers/PhysRevLett_110-168702.pdf) and [Hornischer [2015, §3.1]](https://pure.mpg.de/rest/items/item_2300851/component/file_2300850/content) explain how to calculate $\mathbf{f}^{(c)}_j(\mathbf{x}_t, \tau)$.

As
$$
\begin{align*}
    \mathbf{f}^{(c)}(\mathbf{x}_t, \tau)
&=
    T_c \, \left. \nabla_{\mathbf{x}} \, s(\mathbf{x}, \tau) \right|_{\mathbf{x}_t}
\end{align*}
$$
there are two components: the derivatives with respect to $q$
$$
\begin{align*}
    T_c \, \left. \frac{\partial}{\partial \, q_j} \, s(\mathbf{x}, \tau) \right|_{q_t}
\end{align*}
$$
and the derivatives with respect to $p$
$$
\begin{align*}
    T_c \, \left. \frac{\partial}{\partial \, p_j} \, s(\mathbf{x}, \tau) \right|_{p_t}
\end{align*}
$$

The derivation below follows the position component, holding $\mathbf{p}_t$ fixed:
$$
\begin{align*}
    f^{(c)}_j(\mathbf{x}_t, \tau)
    &= T_c \, \left. \frac{\partial}{\partial q_j} \, s(\{ \mathbf{q}, \mathbf{p}_t \}, \tau) \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B \, \left. \frac{\partial}{\partial q_j} \,
    \mathbb{E}_{\mathbf{x}_{t+\tau} \sim p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})}
    \left[
        \ln p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
    \right]\right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B \, \left. \frac{\partial}{\partial q_j}
    \int
        p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \}) \ln p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \,\mathrm{d} \mathbf{x}_{t+\tau}
    \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B \, \left.
    \int
        \frac{\partial}{\partial q_j} \,
        p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \}) \ln p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \,\mathrm{d} \mathbf{x}_{t+\tau}
    \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B \, \left.
    \int
        \frac{\partial \, p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})}{\partial q_j}
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
        + p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \}) \frac{\partial \ln p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})}{\partial q_j}
        \,\mathrm{d} \mathbf{x}_{t+\tau}
    \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B \, \left.
    \int
        \frac{\partial \, p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})}{\partial q_j}
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
        + \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \,\mathrm{d} \mathbf{x}_{t+\tau}
    \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B
    \int
        \left.
            \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \right|_{\mathbf{q}_t}
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    \,\mathrm{d} \mathbf{x}_{t+\tau}
    + \left. \int
        \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
    \,\mathrm{d} \mathbf{x}_{t+\tau}
    \right|_{\mathbf{q}_t}
\\
    &= - T_c \, k_B
    \int
        \left.
            \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \right|_{\mathbf{q}_t}
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    \,\mathrm{d} \mathbf{x}_{t+\tau}
    + \frac{\partial}{\partial q_j} 1
\\
    &= - T_c \, k_B
    \int
        \left.
            \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \right|_{\mathbf{q}_t}
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    \,\mathrm{d} \mathbf{x}_{t+\tau}
\end{align*}
$$

As the system dynamics are Markovian and discrete-time
$$
\begin{align*}
    p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t) = 
    p(\mathbf{x}_{t+\epsilon} \mid \mathbf{x}_t)
    \prod_{n=1}^{\tau/\epsilon-1}
        p(\mathbf{x}_{t + (n+1)\epsilon} \mid \mathbf{x}_{t + n\epsilon})
\end{align*}
$$
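
As a small numerical illustration of this factorization, the sketch below builds a one-dimensional Gaussian random walk and checks that the log of the product of per-step transition densities equals the sum of their logs. The step scale `sigma` and the path length are arbitrary choices of mine, and walls (the $\mathbf{h}$ term) are ignored.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma = 1.5                                   # hypothetical per-step standard deviation
steps = rng.normal(scale=sigma, size=8)
path = np.concatenate([[0.0], np.cumsum(steps)])   # q_t, q_{t+eps}, ...

# per-step transition densities p(q_{n+1} | q_n) of the random walk
step_pdfs = stats.norm.pdf(path[1:], loc=path[:-1], scale=sigma)

path_prob = np.prod(step_pdfs)                # product over the Markov chain
log_path_prob = np.sum(np.log(step_pdfs))     # same quantity in log space

print(np.log(path_prob), log_path_prob)
```

Working in log space is usually preferable for long paths, since the product of many small densities can underflow.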

$$
\begin{align*}
    \left.
        \frac{\partial}{\partial q_j} p(\mathbf{x}_{t+\tau} \mid \{ \mathbf{q}, \mathbf{p}_t \})
    \right|_{\mathbf{q}_t}
&=
    \left.
        \frac{\partial}{\partial q_j} \,
        p(\mathbf{x}_{t+\epsilon} \mid \{ \mathbf{q}, \mathbf{p}_t \})
        \prod_{n=1}^{\tau/\epsilon-1}
            p(\mathbf{x}_{t + (n+1)\epsilon} \mid \mathbf{x}_{t + n\epsilon})
    \right|_{\mathbf{q}_t}
\\
&=
    \left.
        \frac{\partial}{\partial q_j} \,
        p(\mathbf{x}_{t+\epsilon} \mid \{ \mathbf{q}, \mathbf{p}_t \})
    \right|_{\mathbf{q}_t}
    \prod_{n=1}^{\tau/\epsilon-1}
        p(\mathbf{x}_{t + (n+1)\epsilon} \mid \mathbf{x}_{t + n\epsilon})
\end{align*}
$$

Using the dependence of $\mathbf{p}$ on $\mathbf{q}$,
$$
\begin{align*}
    p(\mathbf{x}_{t+\epsilon} \mid \mathbf{x}_t)
&=
    p( \{ \mathbf{q}_{t+\epsilon}, \mathbf{p}_{t+\epsilon} \} \mid \{ \mathbf{q}_t, \mathbf{p}_t \} )
\\
&=
    p( \mathbf{q}_{t+\epsilon} \mid \{ \mathbf{q}_t, \mathbf{p}_t \} )
\\
&=
    p( \mathbf{q}_{t+\epsilon} \mid \mathbf{q}_t )
\end{align*}
$$

As $f_{j, t}$ is a normal random variable, $q_{j, t+\epsilon} \mid q_{j, t}$ is also a normal random variable determined by
$$
\begin{align*}
    p(q_{j, t+\epsilon} \mid q_{j, t})
&=
    \mathcal{N} \left( \mu= q_{j, t} + h_j(\mathbf{x}_t) \frac{\epsilon^2}{m_{j}}, \sigma^2= k_B \, T_r \, \frac{\epsilon^2}{m_j} \right)
\end{align*}
$$

thanks to a reparameterization of $f$ and the Langevin dynamics (**differs by a factor of 2!**)
$$
\begin{align*}
    q_{j, t+\epsilon}
    &= q_{j, t} + \dot{q}_{j, t} \, \epsilon
\\
    &= q_{j, t} + \frac{p_{j, t}}{m_{j}} \epsilon
\\
    &= q_{j, t} + \frac{p_{j, t} + \dot{p}_{j, t} \, \epsilon}{m_{j}} \epsilon
\\
    &= q_{j, t} + \frac{p_{j, t}}{m_{j}} \epsilon + \frac{
    - \frac{1}{\epsilon} \, p_{j, t} + f_{j, t} + h_j(\mathbf{x}_t)}{m_{j}} \epsilon^2
\\
    &= q_{j, t} + \frac{f_{j, t} + h_j(\mathbf{x}_t)}{m_{j}} \epsilon^2
\\
    &= q_{j, t} + f_{j, t} \frac{\epsilon^2}{m_{j}} + h_j(\mathbf{x}_t) \frac{\epsilon^2}{m_{j}}
\end{align*}
$$
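
The cancellation of the initial momentum and the resulting step variance can be checked numerically. The sketch below is an illustration under stated assumptions: it is one-dimensional, sets $h = 0$ (no wall contact), and uses an arbitrary starting momentum. The constants mirror the hyperparameters defined at the top of this notebook.

```python
import numpy as np
from scipy import constants

EPSILON = 0.025   # timestep length (s)
M = 1e-21         # mass (kg)
T_R = 4e5         # temperature of the random agent (K)

rng = np.random.default_rng(0)
f = rng.normal(scale=np.sqrt(M * constants.Boltzmann * T_R) / EPSILON,
               size=100_000)

q, p = 40.0, 3e-20   # arbitrary starting position (m) and momentum (kg m/s)

# one step written as in the derivation, with h = 0
p_dot = -p / EPSILON + f
q_next = q + (p + p_dot * EPSILON) / M * EPSILON

# the initial momentum cancels, leaving only the random force
q_short = q + f * EPSILON**2 / M

# predicted standard deviation of the step: sqrt(k_B T_r / m) * epsilon
sigma_pred = np.sqrt(constants.Boltzmann * T_R / M) * EPSILON
print(np.std(q_next - q), sigma_pred)   # the two values should be close
```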

For an arbitrary normal distribution
$$
\begin{align*}
    \frac{\partial}{\partial x} p(x)
    &=
    \frac{\partial}{\partial x} \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(- \frac{(x - \mu)^2}{2 \sigma^2} \right)
\\
    &=
    - \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(- \frac{(x - \mu)^2}{2 \sigma^2} \right)
    \frac{\partial}{\partial x} \frac{(x - \mu)^2}{2 \sigma^2}
\\
    &=
    - \frac{1}{\sqrt{2 \pi} \sigma} \exp \left(- \frac{(x - \mu)^2}{2 \sigma^2} \right)
    \frac{x - \mu}{\sigma^2}
\\
    &=
    - p(x) \frac{x - \mu}{\sigma^2}
\end{align*}
$$
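
This identity is easy to verify numerically. The following sketch compares a central finite difference of the normal density with the closed form $-p(x)(x - \mu)/\sigma^2$; the values of `mu`, `sigma` and the grid are arbitrary example choices.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.7                 # arbitrary example parameters
x = np.linspace(-2.0, 4.0, 13)
h = 1e-5                             # finite-difference step

# central finite difference of the density
numeric = (stats.norm.pdf(x + h, mu, sigma)
           - stats.norm.pdf(x - h, mu, sigma)) / (2 * h)

# closed form from the derivation: -p(x) (x - mu) / sigma^2
analytic = -stats.norm.pdf(x, mu, sigma) * (x - mu) / sigma**2

print(np.max(np.abs(numeric - analytic)))   # should be tiny
```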

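so, after substituting $\mu$ and $\sigma$, and noting that the starting position $q_{j, t}$ enters the density only through $\mu$ (treating $h_j$ as constant, so that differentiating with respect to the starting position flips the sign),
$$
\begin{align*}
    \left.
        \frac{\partial}{\partial \, q} \, p( q_{j, t+\epsilon} \mid q )
    \right|_{q_{j, t}}
&=
    -
    \left.
        \frac{\partial}{\partial \, q} \, p( q \mid q_{j, t} )
    \right|_{q_{j, t+\epsilon}}
\\
&=
    p( q_{j, t+\epsilon} \mid q_{j, t} ) \frac{f_{j,t}}{k_B \, T_r}
\end{align*}
$$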

therefore
$$
\begin{align*}
    f^{(c)}_j(\mathbf{x}_t, \tau)
&= - \frac{T_c}{T_r}
    \int
        f_{j,t} \,
        p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    \,\mathrm{d} \mathbf{x}_{t+\tau}
\end{align*}
$$

### Approximating The Entropic Force

[Wissner-Gross & Freer [2013b, p. 10]](https://journals.aps.org/prl/supplemental/10.1103/PhysRevLett.110.168702) and [Hornischer [2015, §3.2]](https://pure.mpg.de/rest/items/item_2300851/component/file_2300850/content) present a method for approximating the integral in the above expression.

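Consider $M$ Monte-Carlo samples $\mathbf{x}_{t+\tau}^{(i)}$, for $i$ from $1$ to $M$, drawn from $p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)$. Here $M$ is the number of samples, not the mass `M` used in the code.
For convenience, write the conditional likelihood of each sample as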
$$
\begin{align*}
    p(\mathbf{x}_{t+\tau}^{(i)} \mid \mathbf{x}_t)
    =
    \frac{1}{M \, \Omega_i}
\end{align*}
$$

The $M$-sample Monte-Carlo estimate of the causal entropic force is
$$
\begin{align*}
    f^{(c)}_j(\mathbf{x}_t, \tau)
    &= - \frac{T_c}{T_r}
    \int
        f_{j,t} \,
        p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    \,\mathrm{d} \mathbf{x}_{t+\tau}
\\
    &= - \frac{T_c}{T_r} \,
    \mathbb{E}_{\mathbf{x}_{t+\tau} \sim p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)} [
        f_{j,t} \,
        \ln p(\mathbf{x}_{t+\tau} \mid \mathbf{x}_t)
    ]
\\
    &\approx - \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln p(\mathbf{x}_{t+\tau}^{(i)} \mid \mathbf{x}_t)
\\
    &= - \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln \frac{1}{M \, \Omega_i}
\\
    &= \frac{T_c}{T_r}
    \frac{1}{M}
    \left(
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln M
    + \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln \Omega_i
    \right)
\\
    &\approx \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln \Omega_i
\end{align*}
$$

The first term in the penultimate line vanishes because the forces are sampled with zero mean, so
$$
\begin{align*}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)}
&\approx
    0
\end{align*}
$$

Given
$$
\begin{align*}
    \left(
        \frac{1}{M}
        \sum_{i=1}^M
            f_{j,t}^{(i)}
    \right)
    \ln \sum_{k=1}^M
        \Omega_k
\approx
    0
\end{align*}
$$

$$
\begin{align*}
    \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln \Omega_i
&\approx
    \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)} \,
        \ln \Omega_i
    -
    \frac{T_c}{T_r}
    \left(
        \frac{1}{M}
        \sum_{i=1}^M
            f_{j,t}^{(i)}
    \right)
    \ln \sum_{k=1}^M
        \Omega_k
\\
&=
    \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \left(
            \ln \Omega_i
            - \ln \sum_{k=1}^M
                \Omega_k
        \right)
\\
&=
    \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \ln \frac{\Omega_i}{\sum_{k=1}^M \Omega_k}
\end{align*}
$$

and therefore
$$
\begin{align*}
    f^{(c)}_j(\mathbf{x}_t, \tau)
&\approx
    \frac{T_c}{T_r}
    \frac{1}{M}
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \ln \frac{\Omega_i}{\sum_{k=1}^M \Omega_k}
\end{align*}
$$

[Wissner-Gross & Freer [2013b, p. 10]](https://journals.aps.org/prl/supplemental/10.1103/PhysRevLett.110.168702) present an alternate derivation using an arithmetic average of the above quantity, represented by the angle brackets around the entire expression.

Given
$$
\begin{align*}
    \Omega_i
    =
    {\frac{1}{M \, p(\mathbf{x}_{t+\tau}^{(i)} \mid \mathbf{x}_t)}}
\end{align*}
$$

$$
\begin{align*}
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \ln \frac{\Omega_i}{\sum_{k=1}^M \Omega_k}
&=
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \ln \frac
            {\frac{1}{M \, p(\mathbf{x}_{t+\tau}^{(i)} \mid \mathbf{x}_t)}}
            {\sum_{k=1}^M \frac{1}{M \, p(\mathbf{x}_{t+\tau}^{(k)} \mid \mathbf{x}_t)}}
\\&=
    \sum_{i=1}^M
        f_{j,t}^{(i)}
        \ln \frac
            {\frac{1}{p(\mathbf{x}_{t+\tau}^{(i)} \mid \mathbf{x}_t)}}
            {\sum_{k=1}^M \frac{1}{p(\mathbf{x}_{t+\tau}^{(k)} \mid \mathbf{x}_t)}}
\end{align*}
$$

In practice, $p(\mathbf{x}_{t+\tau}^{(k)} \mid \mathbf{x}_t)$ is approximated by kernel density estimation.

In [15]:
def log_volume_fracs(paths):
    volume       = 1. / scipy.stats.gaussian_kde(paths).pdf(paths)
    volume_fracs = volume / volume.sum() 
    return np.log(volume_fracs)
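
If the estimated density underflows to zero for some sample, `1. / pdf` becomes infinite. A possible safeguard, sketched here as an assumption rather than as part of the original method, is to stay in log space using `gaussian_kde.logpdf` and `scipy.special.logsumexp`. The helper name `log_volume_fracs_stable` and the random test points are mine.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def log_volume_fracs_stable(points):
    """points.shape = (num_dims, num_samples), as expected by gaussian_kde."""
    log_density = stats.gaussian_kde(points).logpdf(points)
    log_volume = -log_density                    # ln(1 / p_i)
    return log_volume - logsumexp(log_volume)    # ln(Omega_i / sum_k Omega_k)

rng = np.random.default_rng(0)
points = rng.normal(size=(2, 500))               # arbitrary 2-D test data

# direct computation, following the same steps as log_volume_fracs
direct = 1.0 / stats.gaussian_kde(points).pdf(points)
direct = np.log(direct / direct.sum())

stable = log_volume_fracs_stable(points)
print(np.max(np.abs(stable - direct)))           # should be tiny
```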

In [16]:
def entropic_force(p, q, num_samples):
    paths, fs = rollouts(p, q, num_samples)
    
    # remove the first state because we calculate conditional density
    paths = paths[1:]

    # subsample paths: idea and code from Google Gemini to speed up KDE
    sub_indices = np.linspace( 0 ,
                               timesteps - 1 ,
                               NUM_SUBSAMPLES ,
                               dtype=int ) 
    sub_paths   = paths[sub_indices, :, :]
    sub_paths   = sub_paths.transpose((2, 0, 1)).reshape(-1, num_samples)
    
    # calculate entropic force
    return np.mean( fs * log_volume_fracs(sub_paths)[:, None] , axis=0 )
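
The subsampling and reshaping step can be inspected in isolation. The sketch below uses a stand-in array in place of real rollouts, with a small `num_samples` chosen only for readability, and shows which timesteps are kept and how each rollout becomes one column of features for the kernel density estimate.

```python
import numpy as np

timesteps, num_samples, num_subsamples = 400, 4, 3

# stand-in for rollout positions: shape (timesteps, num_samples, 2)
paths = np.arange(timesteps * num_samples * 2, dtype=float)
paths = paths.reshape(timesteps, num_samples, 2)

# keep evenly spaced timesteps, including the first and the last
sub_indices = np.linspace(0, timesteps - 1, num_subsamples, dtype=int)
sub_paths = paths[sub_indices, :, :]             # (3, num_samples, 2)

# one column per rollout: x at each kept timestep, then y at each kept timestep
features = sub_paths.transpose((2, 0, 1)).reshape(-1, num_samples)

print(sub_indices)       # first, middle and last timestep
print(features.shape)    # (6, 4)
```

With three subsamples, each rollout is summarized by six numbers, which keeps the dimensionality of the kernel density estimate low.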

Now we run the causal entropic forces agent:

In [None]:
p = np.array([0.   , 0.   ])
q = np.array([L/10., L/10.])

path = copy.deepcopy(q[None, :])

for timestep in range(200):
    print(timestep, q)
    num_samples = 50_000
    f_c  = 2 * T_C / T_R * entropic_force(p, q, num_samples)
    p, q = update_phase_space(p, q, f_c)
    path = np.append(path, q[None, :], axis=0)

0 [40. 40.]
1 [40.53854324 38.98063433]
2 [40.63174919 38.14109986]
3 [42.74087873 39.46971615]
4 [42.2120383  39.05657861]
5 [42.0749558  40.29409079]
6 [40.87258483 40.25702743]
7 [39.82521577 40.3922173 ]
8 [41.62625358 41.46699154]
9 [41.66222239 40.96857333]
10 [42.15771712 40.25155556]
11 [42.64673208 40.69738883]
12 [42.94242484 41.83499598]
13 [43.74577589 40.48147532]
14 [42.93671519 40.69812054]
15 [43.29059407 43.90268295]
16 [41.87503704 44.37111152]
17 [41.50343445 44.73426603]
18 [41.8313388  45.44414292]
19 [41.38712943 45.79934333]
20 [42.74510086 44.87525314]
21 [43.83538321 43.87029439]
22 [45.29638232 44.42943518]
23 [46.19765862 46.38287908]
24 [46.88931234 46.1710318 ]
25 [48.35955454 44.04930575]
26 [47.97545723 44.09194502]
27 [47.23850536 44.56015364]
28 [48.80075632 43.45598301]
29 [49.05059979 42.19691313]
30 [48.88712027 40.2810376 ]
31 [48.31486101 41.06879038]
32 [48.86645505 42.03304907]
33 [49.82950567 42.11020655]
34 [50.84933699 44.21376884]
35 [51.8405

In [None]:
fig, ax = plt.subplots(figsize=(13, 13))
# from https://stackoverflow.com/a/37437395
# ax.add_patch(patches.Rectangle(Q_MIN, L, L/5., edgecolor='k', fill=False))
ax.plot(*path.T)
ax.set_xlim(Q_MIN[0], Q_MAX[0])
ax.set_ylim(Q_MIN[1], Q_MAX[1])
ax.set(aspect='equal')
plt.show()