<a href="https://colab.research.google.com/github/CoAxLab/BiologicallyIntelligentExploration/blob/main/Labs/Lab6_Concepts_and_information.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6 - The concept of information

Here we will explore how our little artificial organisms do two things:

- Learn a *concept* of information.
- Use this concept of information to learn in noisy environments.

## Background 

In this lab we return to _taxic explorations_. We revisit the sniff world (aka _ScentGrid_), now with a new twist. We look at what happens when sense information is not just noisy, but also partially observed. In other words, when there is distortion in the channel of information.

Our environment this time just deletes scent information from the grid, with probability $(1 - p_{scent})$. The noisy background is, of course, unaffected by this deletion.
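
As a rough illustration of this kind of channel, here is a minimal sketch in plain Python. It is not how _ScentGrid_ or *create_grid_scent_patches* implement deletion; the helper names `delete_scent` and `add_background_noise` are made up for this example. Each grid cell keeps its scent with probability $p_{scent}$, and Gaussian noise is then added to every cell, deleted or not.

```python
import random

def delete_scent(grid, p_scent, rng):
    """Keep each cell's scent with probability p_scent, else set it to 0."""
    return [[x if rng.random() < p_scent else 0.0 for x in row] for row in grid]

def add_background_noise(grid, noise_sigma, rng):
    """Add independent Gaussian noise to every cell, deleted or not."""
    return [[x + rng.gauss(0.0, noise_sigma) for x in row] for row in grid]

rng = random.Random(0)
clean = [[1.0] * 5 for _ in range(5)]            # toy 5x5 scent grid (hypothetical)
partial = delete_scent(clean, p_scent=0.1, rng=rng)
observed = add_background_noise(partial, noise_sigma=1.0, rng=rng)

kept = sum(x != 0.0 for row in partial for x in row)
print(f"{kept} of 25 cells kept their scent")
```

The agent only ever sees something like `observed`: it cannot tell directly whether a low reading means "no scent here", "scent deleted", or "unlucky noise".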

The presence of this dual uncertainty makes decisions--of the kind common to decision theory--a necessity. So, we'll be using accumulator models again. 

The decisions to be made are: 

- Q1: Is there a scent at all?
- Q2: Is the gradient of the scent increasing or decreasing? 


## E. coli, again?
Recall that our basic model of E. coli exploration is as simple as can be.

- When the gradient is positive, meaning you are going "up" the gradient, the probability of turning is set to _p pos_. 
- When the gradient is negative, the turning probability is set to _p neg_. (See code below, for an example). 
- If the agent "decides" to turn, the direction it takes is uniform random.
- The length of travel before the next turn decision is sampled from an exponential distribution, just like the _DiffusionDiscrete_ agent.
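
The rules above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the explorationlib implementation: the function names, the four-move grid, and the choice to treat a zero gradient like a negative one are all hypothetical.

```python
import random

# Hypothetical move set for a grid world: up, down, right, left
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def turn_probability(gradient, p_pos, p_neg):
    """Return p_pos when going up the gradient, p_neg otherwise."""
    return p_pos if gradient > 0 else p_neg

def next_run(gradient, direction, p_pos, p_neg, scale, rng):
    """Decide whether to turn, then sample how far to travel."""
    if rng.random() < turn_probability(gradient, p_pos, p_neg):
        direction = rng.choice(MOVES)       # uniform random new direction
    length = rng.expovariate(1.0 / scale)   # exponential with mean `scale`
    return direction, length

rng = random.Random(42)
direction, length = next_run(0.5, (0, 1), p_pos=0.1, p_neg=0.9, scale=1.0, rng=rng)
print(direction, round(length, 3))
```

With a small _p pos_ and a large _p neg_, the agent tends to keep its heading while the scent is improving and to turn when it is getting worse.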

### Information agents
We will study three agents. One who does _chemotaxis_. One who does a kind of _infotaxis_. One that does random search (aka Diffusion). For fun, let's call this one a _randotaxis_ agent. This last rando-agent is really a control. A reference point.

In a sense the _chemotaxis_ agent only tries to answer question Q2 (above). While _infotaxis_ only tries to answer Q1. They are extreme strategies, in other words. The bigger question we will ask, in a very limited setting, is which extreme method is better _generally_? 


### Costly cognition
Both _chemo-_ and _infotaxis_ agents will use a DDM-style accumulator to try and make better decisions about the direction of the gradient. These decisions are of course statistical in nature. (We won't be tuning the accumulator parameters in this lab. Assume the parameters I give you, for the DDM, are "good enough".)

For the _randotaxis_ agent, the number of steps simply means the number of steps, or actions, the agent takes.


## A definition of _chemotaxis_:
Our _chemotaxis_ agent (_AccumulatorGradientGrid_) tries to directly estimate the gradient $\nabla$ in scent by comparing the current scent level ($o_t$) to the scent level at the last grid position it occupied ($o_{t-1}$). By last grid position we mean the position it occupied before its most recent move.

$$\nabla \approx o_t - o_{t-1}$$

Because an accumulator is present, our chemo- agent sequentially tries to estimate this gradient by repeatedly sampling the current location until the threshold is met.

Chemo-accumulators have what we can think of as two cognitive or behavioral steps:

1. Use an accumulator to (stably) estimate the chemo gradient
2. Use the gradient to make turning decisions
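
The two steps above can be sketched as follows. This is a minimal DDM-style sketch, not the _AccumulatorGradientGrid_ code: the function name, the `sample_scent` callback, and `max_samples` are assumptions made for illustration, while `drift_rate`, `threshold`, and `accumulate_sigma` borrow the parameter names used later in this lab.

```python
import random

def accumulate_gradient(sample_scent, last_scent, drift_rate=1.0, threshold=3.0,
                        accumulate_sigma=1.0, max_samples=1000, rng=None):
    """Accumulate noisy gradient samples until |evidence| reaches threshold.

    Returns (sign, n): the decided gradient sign (+1 or -1) and the
    number of samples it took. A tie at max_samples is reported as -1.
    """
    rng = rng or random.Random()
    evidence, n = 0.0, 0
    while abs(evidence) < threshold and n < max_samples:
        grad = sample_scent() - last_scent           # one estimate of o_t - o_{t-1}
        evidence += drift_rate * grad + rng.gauss(0.0, accumulate_sigma)
        n += 1
    return (1 if evidence > 0 else -1), n

# Step 1: estimate the gradient sign from a noisy sensor (made-up numbers)
rng = random.Random(0)
noisy_scent = lambda: 1.2 + rng.gauss(0.0, 1.0)      # true level 1.2
sign, n_samples = accumulate_gradient(noisy_scent, last_scent=1.0, rng=rng)

# Step 2: use the sign to pick a turning probability (hypothetical values)
p_turn = 0.1 if sign > 0 else 0.9
print(sign, n_samples, p_turn)
```

Every extra sample is time spent sitting still, which is the sense in which this kind of cognition is costly.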

## A definition of _infotaxis_:
Compared to the chemo- definition, the definition of infotaxis is a little more involved. It has what we can think of as five cognitive or behavioral steps:

1. Use an accumulator to (stably) estimate if there is a scent or not. AKA hits and misses.
2. Build a probability model of hits/misses (at every point)
3. Measure information gained when probability model changes 
4. Measure the gradient of information gains
5. Use the gradient to make turning decisions

_Note_: Even though the info-accumulator is more complex, it can take advantage of missing scent information to drive its behavior. It can also use positive scent hits, of course, too.

If you want to look at exactly how this agent works, check out line 1242 in the _agent.py_ file: /usr/local/lib/python3.7/dist-packages/explorationlib/agent.py
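
As a rough companion to the five infotaxis steps listed above, here is a toy sketch of steps 2 to 5. It makes several assumptions that may not match _AccumulatorInfoGrid_: it keeps a single hit/miss model rather than one per grid point, it uses pseudocounts to avoid zero probabilities, and it measures information gain as the KL divergence between the new and old probability models. The hit/miss sequence is made-up data standing in for the accumulator's output from step 1.

```python
import math

def hit_miss_probs(hits, misses, prior=1.0):
    """Probability model over (hit, miss) from counts plus a pseudocount."""
    total = hits + misses + 2.0 * prior
    return ((hits + prior) / total, (misses + prior) / total)

def kl_divergence(p, q):
    """KL(p || q) in bits for two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

observations = [0, 0, 1, 0, 1, 1]   # 1 = hit, 0 = miss (made-up data)

hits = misses = 0
p_old = hit_miss_probs(hits, misses)
last_gain = 0.0
info_gradients = []
for obs in observations:
    hits += obs
    misses += 1 - obs
    p_new = hit_miss_probs(hits, misses)       # step 2: update the model
    gain = kl_divergence(p_new, p_old)         # step 3: information gained
    info_gradients.append(gain - last_gain)    # step 4: gradient of the gains
    p_old, last_gain = p_new, gain

# Step 5: the sign of each gradient would drive the turning decision
print([round(g, 3) for g in info_gradients])
```

Note that a miss changes the probability model just as a hit does, which is how missing scent can still drive behavior in a scheme like this.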

## Section - Setup

First let's set things up for the two parts of the lab. You've done this before, so we don't need to specify each installation and module step.

In [None]:
!pip install --upgrade git+https://github.com/coaxlab/explorationlib
!pip install --upgrade git+https://github.com/MattChanTK/gym-maze.git

In [None]:
import shutil
import glob
import os

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from copy import deepcopy

import explorationlib
from explorationlib.local_gym import ScentGrid

from explorationlib.agent import DiffusionGrid
from explorationlib.agent import AccumulatorGradientGrid
from explorationlib.agent import AccumulatorInfoGrid

from explorationlib.run import experiment
from explorationlib.util import select_exp
from explorationlib.util import load
from explorationlib.util import save

from explorationlib.local_gym import uniform_targets
from explorationlib.local_gym import constant_values
from explorationlib.local_gym import create_grid_scent
from explorationlib.local_gym import add_noise
from explorationlib.local_gym import create_grid_scent_patches

from explorationlib.plot import plot_position2d
from explorationlib.plot import plot_length_hist
from explorationlib.plot import plot_length
from explorationlib.plot import plot_targets2d
from explorationlib.plot import plot_scent_grid

from explorationlib.score import total_reward
from explorationlib.score import num_death

In [None]:
# Pretty plots
%matplotlib inline
%config InlineBackend.figure_format='retina'
%config IPCompleter.greedy=True
plt.rcParams["axes.facecolor"] = "white"
plt.rcParams["figure.facecolor"] = "white"
plt.rcParams["font.size"] = "16"

# Dev
%load_ext autoreload
%autoreload 2

## Section 1 - Simulating noisy \& missing scents


To build some intuition, let's plot four versions of the "scent" emitted by a single target: the full scent; that same scent corrupted by Gaussian noise ($\sigma=1$ in the code below); that same signal with all but 10 percent of it deleted; and that same signal corrupted by noise _and_ with all but 10 percent of it deleted.

### Full Scent

Okay, let's first visualize what the scent diffusion around each target looks like in the environment using the diffusion parameters we have set up.

In [None]:
target_boundary = (10, 10)

In [None]:
amplitude = 1

coord, scent = create_grid_scent_patches(
        target_boundary, p=1.0, amplitude=amplitude, sigma=2)
        
plt.imshow(scent, interpolation=None)

### Noisy Scent

To corrupt the signal we can simply add Gaussian noise. In this case we will use the *add_noise* function with $\sigma=1$.

In [None]:
amplitude = 1
noise_sigma = 1.0

coord, scent = create_grid_scent_patches(target_boundary, p=1.0, amplitude=amplitude, sigma=2)
scent = add_noise(scent, noise_sigma)

plt.imshow(scent, interpolation=None)

Doesn't look resolvable does it? If you squint, maybe you can see it?

In order to confirm that there is signal there, let's take a look at the average over 100 noisy targets.

In [None]:
amplitude = 1
noise_sigma = 1.0
num_samples = 100

scents = []
for _ in range(num_samples):
    coord, scent = create_grid_scent_patches(target_boundary, p=1.0, amplitude=amplitude, sigma=2)
    scent = add_noise(scent, noise_sigma)
    scents.append(deepcopy(scent))

# Average across samples so that independent noise tends to cancel
scent = np.mean(scents, axis=0)

plt.imshow(scent, interpolation=None)

### Missing Scent

We can further distort or corrupt the signal by making some of the information simply missing. Imagine we're in our little agent's aquatic environment and currents move some of the scent signal away.

Here we can control the probability of a scent molecule being detected at any point in space with the *p_scent* parameter.

In [None]:
amplitude = 1000
p_scent = 0.1

coord, scent = create_grid_scent_patches(target_boundary, p=p_scent, amplitude=amplitude, sigma=2)

plt.imshow(scent, interpolation=None)

Again, let's average across 100 targets to see what the resolvable scent looks like over samples.

In [None]:
amplitude = 1
p_scent = 0.1
num_samples = 100

scents = []
for _ in range(num_samples):
    coord, scent = create_grid_scent_patches(target_boundary, p=p_scent, amplitude=amplitude, sigma=2)
    scents.append(deepcopy(scent))

# Average across samples to fill in the deleted scent
scent = np.mean(scents, axis=0)

plt.imshow(scent, interpolation=None)

### Noisy *and* Missing Scent

Now let's see the most distorted signal we can: one with both Gaussian noise added *and* partially observed.

In [None]:
amplitude = 1
noise_sigma = 1
p_scent = 0.1

coord, scent = create_grid_scent_patches(target_boundary, p=p_scent, amplitude=amplitude, sigma=2)
scent = add_noise(scent, noise_sigma)

plt.imshow(scent, interpolation=None)

And again, let's look at the average. But given how much noise we have added, we will need to average over more samples to see the pattern. Let's increase *num_samples* to 1000.

In [None]:
amplitude = 1
noise_sigma = 1
p_scent = 0.1
num_samples = 1000

scents = []
for _ in range(num_samples):
    coord, scent = create_grid_scent_patches(target_boundary, p=p_scent, amplitude=amplitude, sigma=2)
    scent = add_noise(scent, noise_sigma)
    scents.append(deepcopy(scent))

# Average across samples
scent = np.mean(scents, axis=0)

plt.imshow(scent, interpolation=None)

So, pretty noisy but resolvable.
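
A back-of-the-envelope calculation suggests why the noisy-only average looked fine with 100 samples while the noisy-and-missing one needs more. The sketch below is an assumption-laden illustration, not part of explorationlib: it considers a single hypothetical cell whose clean scent equals `amplitude`, treats deletion as an independent coin flip that keeps the scent with probability `p_scent`, and treats the noise as independent Gaussian draws with standard deviation `noise_sigma`. Under those assumptions the observed value has mean `p_scent * amplitude` and variance `noise_sigma**2 + amplitude**2 * p_scent * (1 - p_scent)`, and averaging `num_samples` observations improves the signal-to-noise ratio by a factor of `sqrt(num_samples)`.

```python
import math

def snr_of_average(p_scent, amplitude, noise_sigma, num_samples):
    """Signal-to-noise ratio of the mean of num_samples independent observations."""
    signal = p_scent * amplitude
    variance = noise_sigma**2 + amplitude**2 * p_scent * (1.0 - p_scent)
    return signal * math.sqrt(num_samples) / math.sqrt(variance)

for p_scent, n in [(1.0, 100), (0.1, 100), (0.1, 1000)]:
    snr = snr_of_average(p_scent, amplitude=1.0, noise_sigma=1.0, num_samples=n)
    print(f"p_scent={p_scent}, num_samples={n}: SNR ~ {snr:.2f}")
```

Under these assumptions the three cases come out at roughly 10, 1, and 3: with *p_scent* at 0.1, 100 samples leave the signal at about the level of the noise, and 1000 samples lift it clearly above.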

### Question 1.1

Adding noise and lowering detection probability both act to increase distortion to the signal channel that will be used by our agents. Will this help or hinder the agents that use sensory signals and/or information to drive their decisions? Explain your answer.

In [None]:
# Write your answer here, as a python comment

### Question 1.2

Re-run the Noisy *AND* Missing simulations above, playing with both the *p_scent* and *noise_sigma* terms. Do this one at a time (i.e., when changing *p_scent* keep *noise_sigma=1*, when changing *noise_sigma* keep *p_scent=0.1*).

What are the values for each parameter that lead to a complete loss in the scent signal even when averaging across 1000 samples?

In [None]:
# Write your answer here, as a python comment

## Section 2 - Using Sensory Evidence To Explore


In this section we take on accumulating evidence as a policy for decision making. Our venue is still chemotaxis, but now our sensors are noisy. The presence of this uncertainty makes decisions--of the kind common to decision theory--a necessity. 

Let's see just how helpful the concept of information can be for chemotaxic search.

In [None]:
# Noise and missing scents
p_scent = 0.1
noise_sigma = 1

# Shared 
num_experiments = 100
num_steps = 400
seed_value = 5838

# ! (leave alone)
detection_radius = 1
max_steps = 1
min_length = 1
num_targets = 50
target_boundary = (10, 10)

# Targets
prng = np.random.RandomState(seed_value)
targets = uniform_targets(num_targets, target_boundary, prng=prng)
values = constant_values(targets, 1)

# Scents (each point of scent is kept with probability p_scent)
scents = []
for _ in range(len(targets)):
    coord, scent = create_grid_scent_patches(
        target_boundary, p=p_scent, amplitude=1, sigma=2)
    scents.append(scent)

# Env
env = ScentGrid(mode=None)
env.seed(seed_value)
env.add_scents(targets, values, coord, scents, noise_sigma=noise_sigma)

Again we are working in a scent grid environment where each target emits noisy chemical signals (scents) according to our definitions above.

Here's an example of our environment:

In [None]:
plot_boundary = (10, 10)
num_experiment = 0
ax = None
ax = plot_targets2d(
    env,
    boundary=plot_boundary,
    color="black",
    alpha=1,
    label="Targets",
    ax=ax,
)

We will use 3 agents in these sims:

- Rando: Uses random Brownian motion search.
- Chemo: Uses only the detected scent gradient to make a decision.
- Info: Estimates how much *information* is encoded in the scent signal to make a decision.

In [None]:
# Agents

# Random search agent
diff = DiffusionGrid(min_length=min_length, scale=1)
diff.seed(seed_value)

drift_rate = 1
threshold = 3

# Chemotaxis agent
chemo = AccumulatorGradientGrid(
    min_length=min_length, 
    max_steps=max_steps, 
    drift_rate=drift_rate, 
    threshold=threshold,
    accumulate_sigma=1
)
chemo.seed(seed_value)


# Infotaxis agent
info = AccumulatorInfoGrid(
    min_length=min_length, 
    max_steps=max_steps, 
    drift_rate=drift_rate, 
    threshold=threshold,
    accumulate_sigma=1
)

info.seed(seed_value)



Now let's run the experiments.

In [None]:
# Experiments
rand_exp = experiment(
    f"rand",
    diff,
    env,
    num_steps=num_steps,
    num_experiments=num_experiments,
    dump=False,
    split_state=True,
    seed=seed_value
)
chemo_exp = experiment(
    f"chemo",
    chemo,
    env,
    num_steps=num_steps,
    num_experiments=num_experiments,
    dump=False,
    split_state=True,
    seed=seed_value
)
info_exp = experiment(
    f"info",
    info,
    env,
    num_steps=num_steps,
    num_experiments=num_experiments,
    dump=False,
    split_state=True,
    seed=seed_value
)

Let's plot an example experiment. Here I'm choosing experiment number 2 for each agent.

In [None]:
plot_boundary = (10, 10)

# -
num_experiment = 2
ax = None
ax = plot_position2d(
    select_exp(chemo_exp, num_experiment),
    boundary=plot_boundary,
    label="Chemo",
    color="blue",
    alpha=0.6,
    ax=ax,
)
ax = plot_position2d(
    select_exp(info_exp, num_experiment),
    boundary=plot_boundary,
    label="Info",
    color="green",
    alpha=0.6,
    ax=ax,
)
ax = plot_position2d(
    select_exp(rand_exp, num_experiment),
    boundary=plot_boundary,
    label="Rando",
    color="grey",
    alpha=0.8,
    ax=ax,
)
ax = plot_targets2d(
    env,
    boundary=plot_boundary,
    color="black",
    alpha=1,
    label="Targets",
    ax=ax,
)

Hard to distinguish their individual behaviors, but our agents seem to be exploring.

Now let's evaluate some metrics of performance.

In [None]:
# Results
results = [rand_exp, info_exp, chemo_exp]
names = ["Rando", "Info", "Chemo"]
colors = ["grey", "green", "blue"]  # same agent colors as the trajectory plot

# Score by total distance traveled, one value per experiment
scores = []
for res in results:
    scores.append([r["agent_total_l"][-1] for r in res])

# Tabulate the mean and SD across experiments
m, sd = [], []
for s in scores:
    m.append(np.mean(s))
    sd.append(np.std(s))

# Plot means
fig = plt.figure(figsize=(4, 3))
plt.bar(names, m, yerr=sd, color="black", alpha=0.6)
plt.ylabel("Total distance")
plt.tight_layout()
sns.despine()

In [None]:
# Results
results = [rand_exp, info_exp, chemo_exp]
names = ["Rando", "Info", "Chemo"]
colors = ["grey", "green", "blue"]  # same agent colors as the trajectory plot

# Score by number of deaths
scores = []
for name, res, color in zip(names, results, colors):
    scores.append(num_death(res))   

# Tabulate
m, sd = [], []
for (name, s, c) in zip(names, scores, colors):
    m.append(np.mean(s))
    sd.append(np.std(s))

# Plot means
fig = plt.figure(figsize=(4, 3))
plt.bar(names, m, yerr=sd, color="black", alpha=0.6)
plt.ylabel("Deaths")
plt.tight_layout()
sns.despine()

In [None]:
# Results
results = [rand_exp, info_exp, chemo_exp]
names = ["Rando", "Info", "Chemo"]
colors = ["grey", "green", "blue"]  # same agent colors as the trajectory plot

# Score by total reward
scores = []
for name, res, color in zip(names, results, colors):
    r = total_reward(res)
    scores.append(r)   

# Tabulate
m, sd = [], []
for (name, s, c) in zip(names, scores, colors):
    m.append(np.mean(s))
    sd.append(np.std(s))

# Plot means
fig = plt.figure(figsize=(5, 4))
plt.bar(names, m, yerr=sd, color="black", alpha=0.6)
plt.ylabel("Total reward")
plt.tight_layout()
sns.despine()

# Dists
fig = plt.figure(figsize=(5, 4))
for (name, s, c) in zip(names, scores, colors):
    plt.hist(s, label=name, color=c, alpha=0.5, bins=np.linspace(0, np.max(scores), 50))
    plt.legend()
    plt.xlabel("Score")
    plt.tight_layout()
    sns.despine()

### Question 2.1

How does each of our agents perform across the performance measures we have chosen? 

In [None]:
# Write your answer here as a comment. Explain yourself.

### Question 2.2

Is having a concept of information (i.e., the Info agent) helpful in these sorts of noisy environments? Why or why not, based on how the agents performed? Compare to both Rando and Chemo.

In [None]:
# Write your answer here as a comment. Explain yourself.

## Section 3 - Robustness of information searching

In this final section we will see how the distortion in the channel driven by missing information influences the efficiency of our Info agent. 

Here we will test a range of *p_scent* values. Essentially we will be turning *down* the distortion as *p_scent* increases. For these experiments we will hold the *noise_sigma* constant at 1.

In [None]:
# Our parameters 
p_scents = [0.05, 0.25, .50, .75, .95]

# For plotting
colors = ["darkgreen", "seagreen", "cadetblue", "steelblue", "mediumpurple"]
names = p_scents # list(range(5))

Let's run these experiments. All of the parameters for the agent and environment (aside from *p_scent*) are specified below.

In [None]:
# Define the accumulation parameters
drift_rate = 1
threshold = 3
accumulate_sigma = 1.0

# Define non-scent probability values
noise_sigma = 1
amplitude = 1
detection_radius = 1
max_steps = 1
min_length = 1
num_targets = 50
target_boundary = (10, 10)

# How many experiments to run, and for how long (same values as Section 2)
num_experiments = 100
num_steps = 400
seed_value = 5838

# Infotaxis agent
info = AccumulatorInfoGrid(
    min_length=min_length, 
    max_steps=max_steps, 
    drift_rate=drift_rate, 
    threshold=threshold,
    accumulate_sigma=accumulate_sigma
)
info.seed(seed_value)

# Run
results = []

for i, p_scent in zip(names, p_scents):
  # Targets
  prng = np.random.RandomState(seed_value)
  targets = uniform_targets(num_targets, target_boundary, prng=prng)
  values = constant_values(targets, 1)

  # Scents
  scents = []
  for _ in range(len(targets)):
      coord, scent = create_grid_scent_patches(
          target_boundary, p=p_scent, amplitude=amplitude, sigma=noise_sigma)
      scents.append(scent)

  # Env
  env = ScentGrid(mode=None)
  env.seed(seed_value)
  env.add_scents(targets, values, coord, scents, noise_sigma=noise_sigma)

  exp = experiment(
    f"info",
    info,
    env,
    num_steps=num_steps,
    num_experiments=num_experiments,
    dump=False,
    split_state=True,
    seed=seed_value
  )

  results.append(exp)


Now let us take a look at the performance of our agent across runs.

In [None]:
# Score
scores = []
for result in results:  
    l = 0.0
    for r in result:
        l += r["agent_total_l"][-1]
    scores.append(l)   

# Tabulate
m, sd = [], []
for s in scores:
    m.append(np.mean(s))

# -
fig = plt.figure(figsize=(5, 5))
plt.bar([str(n) for n in names], m, color="black", alpha=0.6)
plt.ylabel("Total distance")
plt.xlabel("p_scent")
plt.tight_layout()
sns.despine()

In [None]:
# Score
scores = []
for result in results:
    scores.append(num_death(result))   

# -
fig = plt.figure(figsize=(5, 5))
plt.bar([str(n) for n in names], scores, color="black", alpha=0.6)
plt.ylabel("Deaths")
plt.xlabel("p_scent")
plt.tight_layout()
sns.despine()

In [None]:
# Max Score
scores = []
for result in results:
    r = total_reward(result)
    scores.append(r)   

# Tabulate
m = []
for s in scores:
    m.append(np.max(s))

# -
fig = plt.figure(figsize=(5, 5))
plt.bar([str(n) for n in names], m, color="black", alpha=0.6)
plt.ylabel("Best agent's score")
plt.xlabel("p_scent")
plt.tight_layout()
sns.despine()

In [None]:
# Score
scores = []
for result in results:  
    r = total_reward(result)
    scores.append(r)   

# Tabulate
m, sd = [], []
for s in scores:
    m.append(np.mean(s))
    sd.append(np.std(s))

# Plot means
fig = plt.figure(figsize=(6, 3))
plt.bar([str(n) for n in names], m, yerr=sd, color="black", alpha=0.6)
plt.ylabel("Avg. score")
plt.xlabel("p_scent")
plt.tight_layout()
sns.despine()

# Dists of scores, one histogram per p_scent value
fig = plt.figure(figsize=(6, 3))
for (i, s, c) in zip(names, scores, colors):
    plt.hist(s, label=i, color=c, alpha=0.5, bins=list(range(1,50,1)))
    plt.legend(title="p_scent")
    plt.ylabel("Frequency")
    plt.xlabel("Score")
    plt.tight_layout()
    sns.despine()

### Question 3.1

How does increasing *p_scent* impact our Info agent's performance? Explain why this particular pattern emerges in the results.

In [None]:
# Write your answer here as a comment. Explain yourself.

### Question 3.2

Re-run the simulations from this section, but now change the drift-rate from 1.0 to 0.75. How and why does this influence the agent's behavior (compared to the higher drift-rate)?

In [None]:
# Write your answer here as a comment. Explain yourself.

### Question 3.3

Now set the drift-rate back to 1.0 and reduce the boundary height (the *threshold* parameter) from 3.0 to 1.5. Re-run the simulations again. How and why does this influence the agent's behavior (compared to the higher boundary height)?

In [None]:
# Write your answer here as a comment. Explain yourself.