# Diffusion Process Tutorial

This tutorial simulates the basic diffusion (random walk) model, which predicts accuracy and reaction times (RTs) for a two-alternative forced-choice (2AFC) decision.

- Written by G.M. Boynton, Summer 2008
- Translated to Python by ML Waskom, Spring 2020

For a great reference, see: Palmer, Huk & Shadlen (2005) Journal of Vision http://www.journalofvision.org/5/5/1/

In [None]:
import numpy as np
from scipy import integrate
import matplotlib.pyplot as plt

In [None]:
seed = sum(map(ord, "Diffusion tutorial"))
rng = np.random.default_rng(seed)

## What is a "diffusion process"?

The basic idea is that decisions are made by accumulating information over time. Specifically, evidence is represented as a variable that increments or decrements over time based on incoming information. I can't summarize it better than Palmer, Huk and Shadlen:

> "The internal representation of the relevant stimulus is assumed to be noisy and to vary over time. Each decision is based on repeated sampling of this representation and comparing some function of these samples to a criterion. For example, suppose samples of the noisy signal are taken at discrete times and are added together to represent the evidence accumulated over time. This accumulated evidence is compared to an upper and lower bound. Upon reaching one of these bounds, the appropriate response is initiated.
>
> If such a random walk model is modified by reducing the time steps and evidence increments to infinitesimals, then the model in continuous time is called a diffusion model (Ratcliff, 1978; Smith, 1990). For this model, the accumulated evidence has a Gaussian distribution, which makes it a natural generalization of the Gaussian version of signal detection theory (Ratcliff, 1980)."

(A diffusion process of this kind is also known as a Wiener process or Brownian motion.)

Intuitively, you can imagine how this simple model can predict error rates, and RTs for both correct and incorrect responses. In the simplest case, analytical solutions exist for accuracy and the distribution of RTs.

This tutorial simulates a simple random walk model, calculating accuracy and histograms of RTs. The results of the simulation are then compared to curves generated from the analytical solution.

## Define variables for the random walk

In [None]:
p = dict(
    a=+.1,  # Upper bound (correct)
    b=-.1,  # Lower bound (wrong)
    u=.1,   # Drift rate (units/sec)
    s=.1,   # Standard deviation of the noise (units/sqrt(sec))
    dt=.1,  # Step size for simulations (seconds)
)

## Quick simulation

First, let's do a quick simulation of the random walk model without worrying about keeping track of RTs.

In [None]:
n_reps = 5  # Number of random walks per simulation
n_steps = 30  # Number of time steps

Generate the entire matrix of step sizes in a single line.

In this case, step sizes are pulled from a normal distribution with mean $u \Delta t$ and standard deviation $s \sqrt{\Delta t}$.  Why $\sqrt{\Delta t}$?  Because the variances of independent steps add, the variance of the walk grows linearly with time, so its standard deviation grows with the square root of time.

In [None]:
drift = p["u"] * p["dt"]
diffusion = np.sqrt(p["dt"]) * rng.normal(0, p["s"], (n_steps, n_reps))
dy = drift + diffusion
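
The $\sqrt{\Delta t}$ scaling can be checked numerically. The sketch below is not part of the original tutorial: the helper name `endpoint_sd` and the values of `T`, `n_walks`, and `seed` are assumptions chosen for illustration. If each step has standard deviation $s\sqrt{\Delta t}$, the spread of a driftless walk at a fixed time $T$ should be close to $s\sqrt{T}$ no matter how finely time is divided.

```python
import numpy as np


def endpoint_sd(s, dt, T=1.0, n_walks=20000, seed=0):
    """Empirical SD of driftless walks at time T, built from steps of length dt."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    # Each step has SD s * sqrt(dt), so the step variances sum to s**2 * T
    steps = np.sqrt(dt) * rng.normal(0, s, (n_steps, n_walks))
    return steps.sum(axis=0).std()


s = 0.1
for dt in (0.1, 0.01):
    sd = endpoint_sd(s, dt)
    print(f"dt={dt}: SD at T=1 is {sd:.4f} (theory: {s * np.sqrt(1.0):.4f})")
```

Both step sizes should give nearly the same spread, which would not be the case if the step standard deviation were scaled by $\Delta t$ instead of $\sqrt{\Delta t}$.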

The random walk is a cumulative sum of the steps (`dy`):

In [None]:
y = dy.cumsum(axis=0)

Plot the random walks

In [None]:
t = p["dt"] * np.arange(0, n_steps)

f, ax = plt.subplots(figsize=(4, 5))

# Plot the criterion and bounds
ax.axhline(p["a"], c=".5", ls="--")
ax.axhline(p["b"], c=".5", ls="--")
ax.axhline(0, c=".5", ls=":")

# Plot the random walks
ax.step(t, y)

ax.set(
    xlim=(0, n_steps * p["dt"]),
    xlabel="Time (s)",
)
f.tight_layout()

A second way to implement the diffusion model is to fix the up and down step sizes, and flip a biased coin to determine whether the walk goes up or down.

The up and down step size is $s\sqrt{\Delta t}$ (the same as the standard deviation of the step size above).

In [None]:
y_step = p["s"] * np.sqrt(p["dt"])

The coin is biased by the drift rate.  Let's derive it.  If $p$ is the probability of going up, then the expected $\Delta y$ of a given step is

$$
py_\mathrm{step} - (1-p)y_\mathrm{step} 
= (2p-1)y_\mathrm{step}
= s(2p - 1) \sqrt{\Delta t}.
$$

This should equal the expected drift per step, $u\Delta t$.  So $s(2p - 1)\sqrt{\Delta t} = u\Delta t$.  Solving for $p$ (`prob`) gives $p = \frac{1}{2}\left(\frac{u}{s}\sqrt{\Delta t} + 1\right)$:

In [None]:
prob = .5 * (np.sqrt(p["dt"]) * p["u"] / p["s"] + 1)
dy = np.where(rng.binomial(1, prob, (n_steps, n_reps)), y_step, -y_step)
y = dy.cumsum(axis=0)

In [None]:
f, ax = plt.subplots(figsize=(4, 5))

# Plot the criterion and bounds
ax.axhline(p["a"], c=".5", ls="--")
ax.axhline(p["b"], c=".5", ls="--")
ax.axhline(0, c=".5", ls=":")

# Plot the random walks
ax.step(t, y)

ax.set(
    xlim=(0, n_steps * p["dt"]),
    xlabel="Time (s)",
)
f.tight_layout()
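
As a rough check on the biased-coin derivation above, the sketch below draws many coin-flip steps and compares their moments to the targets. It is an illustration rather than part of the original tutorial; the sample size and seed are arbitrary assumptions, and the parameter values simply repeat those in `p`. The mean step should match $u\Delta t$, while the standard deviation comes out slightly below $s\sqrt{\Delta t}$, by a factor of $\sqrt{1 - (2p - 1)^2}$ that approaches 1 as $\Delta t$ shrinks.

```python
import numpy as np

u, s, dt = 0.1, 0.1, 0.1  # Same values as the parameter dict above

y_step = s * np.sqrt(dt)
prob = 0.5 * (np.sqrt(dt) * u / s + 1)

rng = np.random.default_rng(0)
ups = rng.binomial(1, prob, 200_000).astype(bool)
dy = np.where(ups, y_step, -y_step)

# Exact moments of a two-valued step of size y_step with P(up) = prob
mean_theory = (2 * prob - 1) * y_step  # Equals u * dt by construction
sd_theory = y_step * np.sqrt(1 - (2 * prob - 1) ** 2)

print(f"Mean step: {dy.mean():.5f} (target u*dt = {u * dt:.5f})")
print(f"SD of step: {dy.std():.5f} (exact: {sd_theory:.5f}, s*sqrt(dt) = {y_step:.5f})")
```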

## Simulating RT distributions

The walk should stop when the accumulating variable hits one of the decision bounds. The time step when this happens is the RT for that trial.

We'll implement this with a loop over time.  We'll only keep track of the current values of y, and only increment the walks that haven't led to a decision.

In [None]:
n_reps = 5000
p["dt"] = .001

In [None]:
y = np.zeros(n_reps)
response = np.empty(n_reps)
rt = np.empty(n_reps)
alive = np.ones(n_reps, bool)
t = 0

while alive.any():

    t += 1

    # Take the next step (only for walks that haven't reached a bound)
    dy = p["u"] * p["dt"] + np.sqrt(p["dt"]) * rng.normal(0, p["s"], n_reps)
    y[alive] += dy[alive]

    # Find processes hitting the upper bound
    a_bound = (y >= p["a"]) & alive
    response[a_bound] = +1
    rt[a_bound] = t * p["dt"]
    alive[a_bound] = False

    # Find processes hitting the lower bound
    b_bound = (y <= p["b"]) & alive
    response[b_bound] = -1
    rt[b_bound] = t * p["dt"]
    alive[b_bound] = False
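
When the walks are already stored as a full matrix with a fixed number of steps (as in the quick simulation), the same stopping rule can be applied without a loop. The following is a sketch of that alternative, not the approach used in this tutorial; the function name `first_passage` and the convention of marking undecided walks with a response of 0 and an RT of NaN are assumptions made for illustration.

```python
import numpy as np


def first_passage(y, a, b, dt):
    """Find the first bound crossing for walks stored as the columns of y.

    Returns (response, rt): response is +1 or -1 for the bound hit first and
    0 if neither bound was reached; rt is NaN for undecided walks.
    """
    hit = (y >= a) | (y <= b)
    decided = hit.any(axis=0)
    first = hit.argmax(axis=0)  # Index of the first True (0 if there is none)
    cols = np.arange(y.shape[1])
    response = np.where(y[first, cols] >= a, 1, -1) * decided
    # Row 0 holds the walk after one step, so the crossing time is (index + 1) * dt
    rt = np.where(decided, (first + 1) * dt, np.nan)
    return response, rt


# Three short hand-made walks: one hits the upper bound, one the lower, one neither
y = np.array([
    [0.05, -0.05, 0.01],
    [0.12, -0.02, 0.02],
    [0.20, -0.11, 0.03],
])
response, rt = first_passage(y, a=0.1, b=-0.1, dt=0.1)
print(response, rt)
```

The loop is still the better fit when RTs have a long tail, because a fixed-size matrix must be long enough to contain the slowest walk.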

Now we'll plot the histograms of the correct and incorrect RTs:

In [None]:
f, ax = plt.subplots()

bins = np.linspace(0, 5, 51)
ax.hist(rt[response == +1], bins, label="Correct")
ax.hist(rt[response == -1], bins, label="Wrong")

ax.set(xlabel="RT (s)")
ax.legend()
f.tight_layout()

## Compare the simulated accuracy to the analytical solution

Here is an analytical solution for the expected probability correct (derived in the Palmer reference given above):

In [None]:
def diffusion_pc(a, b, u, s, **kws):
    A = np.exp(-2 * u * a / s ** 2)
    B = np.exp(-2 * u * b / s ** 2)
    return (B - 1) / (B - A)

In [None]:
expected = diffusion_pc(**p)
simulated = (response == +1).mean()
print(f"Expected: {expected:.2%}; simulated: {simulated:.2%}")
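
For the symmetric bounds used here ($b = -a$), the expression simplifies. With $A = e^{-2ua/s^2}$ and $B = 1/A$,

$$
\frac{B - 1}{B - A} = \frac{1 - A}{1 - A^2} = \frac{1}{1 + e^{-2ua/s^2}},
$$

so the probability correct is a logistic function of $ua/s^2$. The sketch below checks this numerically; `logistic_pc` is a hypothetical helper added for illustration, and `diffusion_pc` is repeated only so that the block runs on its own.

```python
import numpy as np


def diffusion_pc(a, b, u, s, **kws):
    # Same expression as the function above, repeated to keep this block standalone
    A = np.exp(-2 * u * a / s ** 2)
    B = np.exp(-2 * u * b / s ** 2)
    return (B - 1) / (B - A)


def logistic_pc(a, u, s):
    """Special case b = -a: probability correct is logistic in u * a / s**2."""
    return 1 / (1 + np.exp(-2 * u * a / s ** 2))


a, u, s = 0.1, 0.1, 0.1
general = diffusion_pc(a, -a, u, s)
special = logistic_pc(a, u, s)
print(f"General formula: {general:.4f}; logistic form: {special:.4f}")
```

With the default parameters both forms give roughly 88% correct.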

## Compare the simulated RTs to the analytical solution

We can also derive the shape of the reaction time distribution on correct trials:

In [None]:
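def diffusion_rt_pdf(t, a, b, u, s, dt, k=20):

    # Drift-dependent factor of the first-passage density at the upper bound
    A = np.exp(-(t * u - 2 * a) * u / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2 * t ** 3)
    B = 0

    # Truncated series; a - b is the distance between the bounds (b is negative)
    for j in range(-k, k + 1):
        B += (a + 2 * j * (a - b)) * np.exp(-(a + 2 * j * (a - b)) ** 2 / (2 * t * s ** 2))

    pc = diffusion_pc(a, b, u, s)
    y = A * B / pc
    y /= integrate.trapezoid(y, t)  # Renormalize over the sampled time grid

    return y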

In [None]:
t = np.linspace(0, 30, 1001)[1:]
pdf = diffusion_rt_pdf(t, **p)

f, ax = plt.subplots()
ax.plot(t, pdf, lw=3, label="Expected")
ax.hist(rt[response == 1], bins, density=True, alpha=.5, label="Simulated")
ax.set(xlim=(0, 5), xlabel="RT (s)")
ax.legend()
f.tight_layout()
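
A simpler summary to compare is the mean RT. For symmetric bounds at $\pm a$ and a walk starting at 0, a standard result (not derived in this tutorial) gives the mean decision time as $(a/u)\tanh(ua/s^2)$. The sketch below reruns a smaller simulation and compares the two; the function name `simulate_rts`, the number of walks, and the seed are assumptions made for illustration. Because the simulation checks the bounds only once per time step, its mean RT is expected to come out slightly above the continuous-time value.

```python
import numpy as np


def simulate_rts(a, u, s, dt, n_reps, rng):
    """Simulate walks with symmetric bounds +a / -a and return their decision times."""
    y = np.zeros(n_reps)
    rt = np.full(n_reps, np.nan)
    alive = np.ones(n_reps, bool)
    step = 0
    while alive.any():
        step += 1
        # Only advance the walks that have not yet reached a bound
        n_alive = alive.sum()
        y[alive] += u * dt + np.sqrt(dt) * rng.normal(0, s, n_alive)
        done = alive & (np.abs(y) >= a)
        rt[done] = step * dt
        alive[done] = False
    return rt


a, u, s, dt = 0.1, 0.1, 0.1, 0.001
rng = np.random.default_rng(0)
rt = simulate_rts(a, u, s, dt, n_reps=2000, rng=rng)

expected_mean = (a / u) * np.tanh(u * a / s ** 2)
print(f"Expected mean RT: {expected_mean:.3f} s; simulated: {rt.mean():.3f} s")
```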

## Exercises

1. Play around with the model parameters (drift rate, decision bounds) and see how they affect the accuracy and distribution of RTs. Think about how a given data set (either behavioral or physiological) can help constrain these variables. What is the signature of an increase in the decision bound? What is the signature of an increase in noise?

2. Generate a psychometric function by calculating the percent correct as a function of drift rate, either through simulation or through the analytical solution.  See how the slope and threshold of this function vary with the mean and standard deviation of the drift rates. Does this make sense?