# Continuous Observation Spaces

In this notebook we develop the active inference agent for environments with continuous observation spaces. 

We start by modifying the minimal environment to emit continuous-valued observations that represent the fraction of repeated experiments yielding food in a given state. Then, we modify the components of the minimal agent that currently exploit the discreteness of the observation space, namely the belief update after a new observation and the estimation of information gain during action selection.

#### Housekeeping (run once per kernel restart)

In [None]:
# change directory to parent
import os
os.chdir('..')
print(os.getcwd())

# Imports

In [None]:
import importlib

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.markers import CARETUP, CARETDOWN
import pandas as pd
from scipy.stats import beta
import seaborn as sns
import torch

# Continuous Observation Environment

So far, the environment emitted binary observations `NO_FOOD` or `FOOD` with a probability that decayed exponentially with the distance of a state from the food source. To make observations continuous, we replace this binary observation with samples from the _Beta_ distribution with mean equal to the probability of observing `FOOD`. These samples represent (roughly) the proportion of positive observations in a finite set of repeated coin flips. The `sample_size` governs the number of repeated experiments: the variance decreases as the sample size increases. In the limit of infinitely many experiments, the _Beta_ distribution concentrates all of its probability mass at the mean.
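As a minimal sketch of this parameterisation, the snippet below checks that the mean stays fixed while the variance shrinks as `mean * (1 - mean) / (sample_size + 1)`. It assumes the convention `a = mean * sample_size`, `b = (1 - mean) * sample_size` used throughout this notebook. The helper name `beta_from_mean` is hypothetical and only serves this illustration.

```python
import numpy as np
from scipy.stats import beta

def beta_from_mean(mean, sample_size):
  """Hypothetical helper: frozen Beta distribution with the given mean."""
  return beta(a=mean * sample_size, b=(1 - mean) * sample_size)

mean = 0.3
for sample_size in [3, 10, 30, 200]:
  rv = beta_from_mean(mean, sample_size)
  # closed-form variance of Beta(a, b) when a + b = sample_size
  expected_var = mean * (1 - mean) / (sample_size + 1)
  assert np.isclose(rv.mean(), mean)
  assert np.isclose(rv.var(), expected_var)
  print(f'sample size {sample_size:3d}: mean={rv.mean():.3f}, var={rv.var():.5f}')
```

The variance formula follows from the standard Beta variance `a * b / ((a + b)**2 * (a + b + 1))` with `a + b = sample_size`.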

## The Beta distribution

Let's explore the density of the _Beta_ distribution with varying sample size.

In [None]:
mean = 0.3
params = [(mean, 3), (mean, 10), (mean, 30), (mean, 200)]
fig, ax = plt.subplots(figsize=(8, 6))
plt.sca(ax)
for mean, sample_size in params:
  
  x = np.linspace(0, 1, 1000)
  p = beta.pdf(x, a=mean * sample_size, b=(1-mean) * sample_size)
  plt.plot(x, p, label=f'mean:{mean}, sample size:{sample_size}')
  
plt.legend()
plt.title('Density of the Beta distribution with varying sample size')
plt.ylabel('density')
plt.xlabel('observation o')

## From discrete to continuous observations
We can now define, sample from, and visualise the distribution of observations generated in each state of the environment, where observations are sampled from the _Beta_ distribution with mean equal to the probability of observing food in a single coin flip experiment as in the `MinimalEnvironment`.

In [None]:
import minimal_environment as me
importlib.reload(me)

sample_size=10
n_samples = 1000

fig, ax = plt.subplots(figsize=(12, 6))
plt.sca(ax)
env = me.MinimalEnv(N=16, # number of states
                    s_food=0, # location of the food source
                    o_decay=0.2) # decay of observing food away from source

def emission_probability(sample_size):
  means = env.emission_probability()[:,1]
  return [beta(a=m*sample_size, b=(1-m)*sample_size) for m in means]

def sample_o(p_o_given_s, s, n_samples):
  return p_o_given_s[s].rvs(size=n_samples)
  
p_o_given_s = emission_probability(sample_size)
samples = []
for s in range(env.s_N):
  oo = sample_o(p_o_given_s, s, n_samples)
  samples.append(oo)
  
df = pd.DataFrame(np.array(samples).T)
sns.violinplot(df, cut=0, width=2)
plt.xlabel('state s')
plt.ylabel('observation o')
plt.title('Continuous environment emission probability')

## Full environment specification

We modify the code of `MinimalEnv` as follows:
- `emission_probability(sample_size)` returns a list of beta distributions, one for each state.
- `p_o_given_s` stores a copy of this list, which is computed once at initialization.
- `sample_o` samples from the beta distribution of the current state.

In [None]:
import numpy as np
from scipy.stats import beta

# environment
class ContinuousObservationEnv(object):
  """ Wrap-around 1D state space with single food source.
  
  The probability of sensing food at locations near the food source decays 
  exponentially with increasing distance.
  
  state (int): 1 of N discrete locations in 1D space.
  observation (float): proportion of times food detected in finite sample.
  action (int): index in {0, 1}, mapped to {-1, 1}: intention to move left or right.
  """
  def __init__(self, 
               N = 16, # how many discrete locations can the agent reside in
               s_0 = 0, # where does the agent start each episode?
               s_food = 0, # where is the food?
               p_move = 0.75, # execute intent with p, else don't move.
               o_sample_size=10, # observation Beta distribution parameter.
               p_o_max = 0.9, # maximum probability of sensing food
               o_decay = 0.2 # decay rate of observing distant food source
               ):
    
    self.o_decay = o_decay
    self.p_move = p_move
    self.o_sample_size = o_sample_size
    self.p_o_max = p_o_max
    self.s_0 = s_0
    self.s_food = s_food
    self.s_N = N
    self.a_N = 2 # {0, 1} to move left/ right in wrap-around 1D state-space
    """
    environment dynamics are governed by two probability distributions
    1. state transition probability p(s'|s, a)
    2. emission/ observation probability p(o|s)
    although we only need to be able to sample from these distributions to 
    implement the environment, we pre-compute the full conditional probability
    table (1.) and conditional emission random variables (2.) here so agents 
    can access the true dynamics if required.
    """
    self.p_s1_given_s_a = self.transition_dynamics() # Matrix B
    self.p_o_given_s = self.emission_probability() # Matrix A
    self.s_t = None # state at current timestep


  def transition_dynamics(self):
    """ computes transition probability p(s'| s, a) 
    
    Returns:
    p[s, a, s1] of size (s_N, a_N, s_N)
    """

    p = np.zeros((self.s_N, self.a_N, self.s_N))
    p[:,0,:] = self.p_move * np.roll(np.identity(self.s_N), -1, axis=1) \
              + (1-self.p_move) * np.identity(self.s_N)
    p[:,1,:] = self.p_move * np.roll(np.identity(self.s_N), 1, axis=1) \
              + (1-self.p_move) * np.identity(self.s_N)
    return p

  def emission_probability(self):
    """ initialises conditional random variables p(o|s). 
    
    Returns:
    p[s]: list of length s_N with one frozen scipy.stats beta distribution per state
    """
    s = np.arange(self.s_N)
    # distance from food source
    d = np.minimum(np.abs(s - self.s_food), 
                   np.minimum(
                   np.abs(s - self.s_N - self.s_food), 
                   np.abs(s + self.s_N - self.s_food)))
  
    # exponentially decaying concentration ~ probability of detection
    mean = self.p_o_max * np.exp(-self.o_decay * d)
    # continuous relaxation: proportion of food detected in finite sample
    sample_size = self.o_sample_size
    return [beta(a=m*sample_size, b=(1-m)*sample_size) for m in mean]

  def reset(self):
    self.s_t = self.s_0
    return self.sample_o()

  def step(self, a):
    if (self.s_t is None):
      print("Warning: reset environment before first action.")
      self.reset()

    if (a not in [0, 1]):
      # fail early: indexing [-1, 1][a] below is only meaningful for a in {0, 1}
      raise ValueError("only permitted actions are [0, 1].")

    # convert action index to action
    a = [-1,1][a]

    if np.random.random() < self.p_move:
      self.s_t = (self.s_t + a) % self.s_N
    return self.sample_o()

  def sample_o(self):
    return self.p_o_given_s[self.s_t].rvs()
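
As a sanity check on the transition dynamics, the following sketch rebuilds the table `p[s, a, s1]` with the same `np.roll` construction as in `transition_dynamics`. It verifies that each row is a proper distribution and that action `0` moves left and action `1` moves right, both with wrap-around. The standalone function `transition_table` is a hypothetical name used here so that the check runs without the environment module.

```python
import numpy as np

def transition_table(s_N=16, p_move=0.75):
  """Hypothetical standalone copy of the p[s, a, s1] construction."""
  eye = np.identity(s_N)
  p = np.zeros((s_N, 2, s_N))
  p[:, 0, :] = p_move * np.roll(eye, -1, axis=1) + (1 - p_move) * eye
  p[:, 1, :] = p_move * np.roll(eye, 1, axis=1) + (1 - p_move) * eye
  return p

p = transition_table()
# every (s, a) pair defines a distribution over next states
assert np.allclose(p.sum(axis=-1), 1.0)
# action 0 moves left with wrap-around: state 0 -> state 15
assert np.isclose(p[0, 0, 15], 0.75)
# action 1 moves right: state 0 -> state 1
assert np.isclose(p[0, 1, 1], 0.75)
# with probability 1 - p_move the agent stays where it is
assert np.isclose(p[0, 0, 0], 0.25)
```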

## Random Agent Behavior

To test the environment we simulate a random agent's interactions with it. Here, the random agent samples actions uniformly from the action set `{0, 1}` (move left or right).

In [None]:
import continuous_observation_environment as coe
importlib.reload(coe)

env = coe.ContinuousObservationEnv(N=16, # number of states
                    s_food=0, # location of the food source
                    o_sample_size=100) # variance of observation decreases with increasing sample size.

n_steps = 100
ss, oo, aa = [], [], []

o = env.reset()
ss.append(env.s_t)
oo.append(o)

for i in range(n_steps):
  a = np.random.choice([0,1]) # random agent
  o = env.step(a)
  ss.append(env.s_t)
  oo.append(o)
  aa.append(a)

We inspect the sequence of states, actions and emissions during this interaction.

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(16, 12))
ax[0].plot(ss, label='agent state $s_t$')
ax[0].plot(np.ones_like(ss) * env.s_food, 
           'r--', label='food source', linewidth=1)
for i in range(len(aa)):
  ax[0].plot([i, i], [ss[i], ss[i]+[-1,1][aa[i]]], 
             color='orange', 
             linewidth=0.5,
             marker= CARETUP if aa[i] > 0 else CARETDOWN,
             label=None if i > 0 else 'action')
  
ax[0].set_xlabel('timestep t')
ax[0].set_ylabel('state s')
ax[0].legend()
ax[1].plot(np.array(oo))
ax[1].set_xlabel('timestep t')
ax[1].set_ylabel('observation o')
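
As a numerical complement to the plots, the sketch below compares the average observation in each state against the expected value `p_o_max * exp(-o_decay * d)`. To stay self-contained it does not import `continuous_observation_environment`. Instead it re-derives the per-state Beta parameters from the default values in the class definition, which is an assumption about how the module is configured. The wrap-around distance is written as `min(|s - s_food|, N - |s - s_food|)`, which should agree with the three-way minimum used in `emission_probability` for states in `[0, N)`.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
s_N, s_food, p_o_max, o_decay, sample_size = 16, 0, 0.9, 0.2, 100

# wrap-around distance to the food source and expected observation per state
s = np.arange(s_N)
d = np.minimum(np.abs(s - s_food), s_N - np.abs(s - s_food))
mean = p_o_max * np.exp(-o_decay * d)

# draw many observations per state and compare sample mean to expected mean
n_samples = 2000
obs = beta.rvs(a=mean * sample_size, b=(1 - mean) * sample_size,
               size=(n_samples, s_N), random_state=rng)
max_error = np.abs(obs.mean(axis=0) - mean).max()
print(f'largest deviation of sample mean from expected mean: {max_error:.4f}')
```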