# Linear 1D Filter Tutorial

- Written by G.M. Boynton for Psych 448 at the University of Washington
- Adapted for CSHL Computational Vision course in June 2010
- Translated to Python by Michael Waskom in 2018

In [None]:
%matplotlib inline
import numpy as np
from scipy import stats, ndimage
import matplotlib.pyplot as plt
from ipywidgets import interact
from IPython.display import display

## Linear filters for 1D time-series

A 1-D 'filter' is a function that takes in a 1-D vector, like a time-series, and returns another vector of the same size. Filtering shows up all over the behavioral sciences, from models of physiology including neuronal responses and hemodynamic responses, to methods for analyzing and viewing time-series data.

Typical filters are low-pass and band-pass filters that attenuate specific ranges of the frequency spectrum. In this tutorial we'll focus on the time-domain by developing a simple leaky integrator filter and show that it satisfies the properties of superposition and scaling that make it a linear filter.

## The leaky integrator

A 'leaky integrator' is possibly the simplest filter we can build, but it forms the basis of a wide range of physiologically plausible models of neuronal membrane potentials (such as the Hodgkin-Huxley model), hemodynamic responses (as measured with fMRI), and both neuronal and behavioral models of adaptation, such as light and contrast adaptation.

A physical example of a 'leaky integrator' is a bucket of water with a hole in the bottom. The rate that the water flows out is proportional to the depth of water in the bucket (because the pressure at the hole is proportional to the volume of water). If we let `y(t)` be the volume of water at any time `t`, then our bucket can be described by a simple differential equation:

$$\frac{dy}{dt} = -\frac{y}{k}$$

Where $k$ is the constant of proportionality. A large $k$ corresponds to a small hole where the water flows out slowly. You might know the closed-form solution to this differential equation, but hold on - we'll get to that later.

We need to add water to the bucket. Let $s(t)$ be the time-course of the flow of water into the bucket, so $s(t)$ adds directly to the rate of change of $y$:

$$\frac{dy}{dt} = s - \frac{y}{k}$$

A physiological example of a leaky integrator is the membrane potential of a neuron: $y$ is the membrane potential, which leaks away at a rate proportional to the voltage difference (potential), and $s$ is the current flowing into the cell. This is the basis of a whole class of models for membrane potentials, including the famous Hodgkin-Huxley model.

We can easily simulate this leaky integrator in discrete steps of time by updating the value of $y$ on each step according to the equation above:

In [None]:
def leaky_integrator(t, s, k):
    """Simulate the leaky integration filter."""
    dt = t[1] - t[0]  # time step, taken from the spacing of t
    y = np.zeros(t.shape)
    for i, t_i in enumerate(t[:-1]):
        dy = s[i] - y[i] / k
        y[i + 1] = y[i] + dy * dt
    return y

def boxcar(t, dur, start=0, amp=None):
    """Define a boxcar stimulus"""
    amp = 1 / dur if amp is None else amp
    s = np.zeros(t.shape)
    s[(t >= start) & (t < (start + dur))] = amp
    return s

def plot_resp(t, s, y, f=None):
    """Plot the input and output of a filter."""
    if f is None:
        f, (ax_i, ax_o) = plt.subplots(2, sharex=True)
    else:
        ax_i, ax_o = f.axes
    ax_i.plot(t, s, lw=2, color="C0")
    ax_o.plot(t, y, lw=2, color="C1")
    ax_i.set(ylabel="Input")
    ax_o.set(xlabel="Time (s)", ylabel="Output")
    f.tight_layout()
    return f
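
Written out, the loop in `leaky_integrator` takes one forward-Euler step per sample. With $\Delta t$ standing for the spacing of `t`, and $y_i$ and $s_i$ for the $i$-th samples of the output and input (notation introduced here for illustration), the update is

$$y_{i+1} = y_i + \left(s_i - \frac{y_i}{k}\right)\Delta t, \qquad y_0 = 0.$$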

Let's see what happens to an input that is $1$ for 1 second and then $0$ thereafter.

For our first example, we'll let $k = \infty$, so there's no hole in the bucket (or the hole is infinitely small).

In [None]:
dt = .001
maxt = 5
t = np.arange(0, maxt, dt)
s = boxcar(t, 1)
y = leaky_integrator(t, s, k=np.inf)

In [None]:
plot_resp(t, s, y);

Next we'll add the hole in the bucket by setting $k = 1$. In the plot above, with no leak, the bucket filled up during the first second while $s(t) = 1$, and the level of water was simply the integral of $s(t)$ over time. See how the leak changes this:

In [None]:
y = leaky_integrator(t, s, k=1)
plot_resp(t, s, y);

Try this again with a bigger hole by letting $k = \frac{1}{5}$:

In [None]:
y = leaky_integrator(t, s, k=.2)
plot_resp(t, s, y);

This time the water nearly reached an asymptotic level. This is because the flow rate out the bottom eventually reached the rate of flow into the bucket. With $k=\frac{1}{5}$, can you figure out why the asymptotic level is 0.2? Notice also how the water drains out more quickly. 
The parameter $k$, which gets larger as the hole gets smaller, is called the 'time-constant' of this leaky integrator.
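
One way to check the asymptote numerically: setting $\frac{dy}{dt} = 0$ in the equation above gives $y = k s$, so a constant input of 1 should settle near $k$. The sketch below is a self-contained check, not part of the original tutorial; it redefines the integrator, and the names `t_demo`, `s_const` and `final_levels` are made up for illustration.

```python
import numpy as np

def leaky_integrator(t, s, k):
    """Forward-Euler simulation of dy/dt = s - y / k, starting from y = 0."""
    dt = t[1] - t[0]
    y = np.zeros(t.shape)
    for i in range(t.size - 1):
        y[i + 1] = y[i] + (s[i] - y[i] / k) * dt
    return y

t_demo = np.arange(0, 5, .001)
s_const = np.ones(t_demo.shape)  # constant inflow of 1 that is never switched off

# Water level at the end of the simulation for two illustrative time constants
final_levels = []
for k_demo in (.2, .5):
    y_demo = leaky_integrator(t_demo, s_const, k_demo)
    final_levels.append(y_demo[-1])
    print(f"k = {k_demo}: final level = {y_demo[-1]:.4f}")
```

The printed levels should sit very close to the corresponding values of `k_demo`, because the simulation runs for many time constants.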

The following interactive widget will let you play with the time constant of the integrator and see how it interacts with the duration of the stimulus:

In [None]:
@interact
def leak_tutorial(k=(.1, 2, .1), dur=(0, 2, .1)):
    # k starts at .1 because k = 0 would divide by zero inside the integrator
    s = boxcar(t, dur, amp=1)
    y = leaky_integrator(t, s, k)
    f = plot_resp(t, s, y)
    plt.setp(f.axes, ylim=(0, 1.05))

## Responses to short impulses

Our previous example had water flowing into the bucket at 1 gallon/second for one second. What happens when we splash in that gallon of water in a much shorter period of time, say within 1/100th of a second?

In [None]:
k = .2
dur = .01
s = boxcar(t, dur)
y = leaky_integrator(t, s, k)
plot_resp(t, s, y);

Here's what happens when the same amount of water is splashed in over 1/1000th of a second (a single time step).

In [None]:
dur = .001
s = boxcar(t, dur)
y = leaky_integrator(t, s, k)
plot_resp(t, s, y);

Compare these two responses - they're nearly identical. This is because for short durations compared to the time-constant, the leaky integrator doesn't leak significantly during the input, so the inputs are effectively the same.

How long in duration can you let the stimulus get before it starts significantly affecting the shape of the response? How does this depend on the time-constant of the leaky integrator?
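
A rough way to explore this question is to sweep the stimulus duration while keeping the total amount of water fixed, and watch the peak of the response. The sketch below is self-contained and only illustrative: the list of durations, the time constant and the names `t_demo` and `peaks` are assumptions chosen here, not values from the original tutorial.

```python
import numpy as np

def leaky_integrator(t, s, k):
    """Forward-Euler simulation of dy/dt = s - y / k, starting from y = 0."""
    dt = t[1] - t[0]
    y = np.zeros(t.shape)
    for i in range(t.size - 1):
        y[i + 1] = y[i] + (s[i] - y[i] / k) * dt
    return y

def boxcar(t, dur, start=0):
    """Unit-area boxcar: amplitude 1 / dur for dur seconds."""
    s = np.zeros(t.shape)
    s[(t >= start) & (t < start + dur)] = 1 / dur
    return s

t_demo = np.arange(0, 2, .001)
k_demo = .2
durations = [.001, .01, .1, .5]  # all deliver the same total input

# Peak response for each duration; an ideal impulse would give a peak of 1
peaks = []
for dur in durations:
    y_demo = leaky_integrator(t_demo, boxcar(t_demo, dur), k_demo)
    peaks.append(y_demo.max())
    print(f"dur = {dur}: peak = {y_demo.max():.3f}")
```

Under these assumptions the peak stays near 1 while the duration is a small fraction of `k_demo` and drops noticeably once the duration becomes comparable to it.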

This behavior is an explanation for "Bloch's Law", the phenomenon that brief flashes of light are equally detectable as long as they are very brief, and contain the same amount of light. Indeed, the temporal properties of the early stages of the visual system are typically modeled as a leaky integrator.

This response to a brief '1-gallon' (or 1 unit) stimulus is called the 'impulse response' and has a special meaning which we'll get to soon.

## Scaling

It should be clear from the way the stimulus feeds into the response that doubling the input doubles the peak of the response. Since the recovery falls off in proportion to the current value, you can convince yourself that the whole response scales with the size of the input. Here's an example of the response to two brief pulses of different sizes separated in time. You'll see that the shapes of the two responses are identical - they only vary by a scale-factor. This is naturally called 'scaling'. Mathematically, if $L(s(t))$ is the response of the system to a stimulus $s(t)$, then for any scale factor $a$, $L(a\,s(t)) = a\,L(s(t))$.
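
The scaling property can also be checked numerically rather than by eye. This is a minimal, self-contained sketch; the scale factor `a_demo` and the other `_demo` names are arbitrary choices made here for illustration.

```python
import numpy as np

def leaky_integrator(t, s, k):
    """Forward-Euler simulation of dy/dt = s - y / k, starting from y = 0."""
    dt = t[1] - t[0]
    y = np.zeros(t.shape)
    for i in range(t.size - 1):
        y[i + 1] = y[i] + (s[i] - y[i] / k) * dt
    return y

t_demo = np.arange(0, 2, .001)
k_demo = .2
s_demo = np.zeros(t_demo.shape)
s_demo[:10] = 1  # brief pulse lasting the first 10 samples

a_demo = 3
y_single = leaky_integrator(t_demo, s_demo, k_demo)
y_scaled = leaky_integrator(t_demo, a_demo * s_demo, k_demo)

# Scaling: filtering a scaled input should equal scaling the filtered output
scaling_holds = np.allclose(y_scaled, a_demo * y_single)
print("scaling holds:", scaling_holds)
```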



In [None]:
dur = .01
s = boxcar(t, dur, start=0, amp=3) + boxcar(t, dur, start=3, amp=1)
y = leaky_integrator(t, s, k)
plot_resp(t, s, y);

## Superposition

In the last example, the second stimulus (splash of water) occurred long after the response to the first stimulus was over. What happens when the second stimulus happens sooner, while the response to the first splash is still going on?



In [None]:
k = .2
dt = .001
maxt = 2
t = np.arange(0, maxt, dt)

dur = .01

s1 = boxcar(t, dur, start=0)
y1 = leaky_integrator(t, s1, k)

s2 = boxcar(t, dur, start=.1)
y2 = leaky_integrator(t, s2, k)

y12 = leaky_integrator(t, s1 + s2, k)

If we plot the responses to each stimulus, then the sum of the filtered outputs ($y_1 + y_2$) *and* the filtered sum of the inputs ($y_{1+2}$), we can see that the latter two responses lie on top of each other:

In [None]:
f = plot_resp(t, s1, y1)
plot_resp(t, s2, y2, f=f);

_, ax = f.axes
ax.plot(t, y1 + y2, lw=2, color="C2")
ax.plot(t, y12, lw=2, color=".1", dashes=(3, 3));

In other words,

$$L(s_1 + s_2) = L(s_1) + L(s_2).$$

This property is called **superposition**.  By the way, we're also assuming
that the time-constant is fixed so that the shape of the response to a
stimulus doesn't vary with when it occurs.  This property is called
**shift invariance**.

A system that has both the properties of scaling and superposition is
called a **linear system**, and one with shift invariance is called (wait
for it) a **shift-invariant linear system**.
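
Both properties can be tested directly on the simulated filter. The sketch below is self-contained and uses single-sample impulses placed by index; the shift of 100 samples and the `_demo` names are illustrative assumptions.

```python
import numpy as np

def leaky_integrator(t, s, k):
    """Forward-Euler simulation of dy/dt = s - y / k, starting from y = 0."""
    dt = t[1] - t[0]
    y = np.zeros(t.shape)
    for i in range(t.size - 1):
        y[i + 1] = y[i] + (s[i] - y[i] / k) * dt
    return y

dt_demo = .001
t_demo = np.arange(0, 2, dt_demo)
k_demo = .2
shift = 100  # delay of the second impulse, in samples

s1 = np.zeros(t_demo.shape)
s1[0] = 1 / dt_demo
s2 = np.zeros(t_demo.shape)
s2[shift] = 1 / dt_demo

y1 = leaky_integrator(t_demo, s1, k_demo)
y2 = leaky_integrator(t_demo, s2, k_demo)
y12 = leaky_integrator(t_demo, s1 + s2, k_demo)

# Superposition: response to the sum equals the sum of the responses
superposition_ok = np.allclose(y12, y1 + y2)
# Shift invariance: the delayed input gives the same response, delayed
shift_ok = np.allclose(y2[shift:], y1[:-shift])
print(superposition_ok, shift_ok)
```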

Use the following interactive widget to see how the filter sums impulses that happen at different delays from each other:

In [None]:
@interact
def shift_tutorial(k=(.1, 2, .1), delay=(0, 2, .1)):
    # k starts at .1 because k = 0 would divide by zero inside the integrator
    s = boxcar(t, dt, amp=1) + boxcar(t, dt, delay, amp=1)
    y = leaky_integrator(t, s, k)
    f = plot_resp(t, s, y)
    f.axes[0].set(ylim=(0, 2.1))
    f.axes[1].set(ylim=(0, dt * 2.1))

---

##  Analytical solution for the leaky integrator

The differential equation that describes the leaky integrator is very easy to solve analytically. If

$$\frac{dy}{dt} = -\frac{y}{k},$$

then

$$\frac{dy}{y} = -\frac{dt}{k}.$$

Integrating both sides over time yields

$$\log(y) = -\frac{t}{k}+C.$$

Exponentiating:

$$y = e^{-t/k+C}.$$

If we let $y(0) = 1$, then $C = 0$, so

$$y = e^{-t/k}.$$

Let's compare the simulated to the analytical impulse response:

In [None]:
dt, maxt = .01, 1
t = np.arange(0, maxt, dt)

s = boxcar(t, dt)
h = leaky_integrator(t, s, k)
hh = np.exp(-t / k)

f, ax = plt.subplots()
ax.plot(t, h, label="Simulated")
ax.plot(t, hh, dashes=(3, 2), label="Analytical")
ax.set(xlabel="Time (s)", ylabel="Response")
ax.legend()
f.tight_layout()

It's pretty close. It'd be even closer if we used a smaller discrete time
step for our simulation.  It'd be *even* closer if we used a less bozo
simulation technique, like a [Runge-Kutta method](https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods) or other more
sophisticated method of numerically approximating a differential equation.
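
To make both claims concrete, here is a small sketch comparing the forward-Euler step with a second-order Runge-Kutta (midpoint) step on the pure decay $\frac{dy}{dt} = -\frac{y}{k}$ with $y(0) = 1$. It is an illustration under assumptions made here (the helper names `decay_euler` and `decay_midpoint` and the two step sizes are hypothetical), not the method used elsewhere in this tutorial.

```python
import numpy as np

def decay_euler(t, k):
    """Forward-Euler solution of dy/dt = -y / k with y(0) = 1."""
    dt = t[1] - t[0]
    y = np.ones(t.shape)
    for i in range(t.size - 1):
        y[i + 1] = y[i] + (-y[i] / k) * dt
    return y

def decay_midpoint(t, k):
    """Midpoint (second-order Runge-Kutta) solution of the same equation."""
    dt = t[1] - t[0]
    y = np.ones(t.shape)
    for i in range(t.size - 1):
        y_half = y[i] + (-y[i] / k) * dt / 2  # estimate at the middle of the step
        y[i + 1] = y[i] + (-y_half / k) * dt  # full step using the midpoint slope
    return y

k_demo = .2
err_euler, err_mid = [], []
for dt_demo in (.01, .001):
    t_demo = np.arange(0, 1, dt_demo)
    exact = np.exp(-t_demo / k_demo)
    err_euler.append(np.abs(decay_euler(t_demo, k_demo) - exact).max())
    err_mid.append(np.abs(decay_midpoint(t_demo, k_demo) - exact).max())
    print(f"dt = {dt_demo}: Euler error = {err_euler[-1]:.2e}, "
          f"midpoint error = {err_mid[-1]:.2e}")
```

The largest error should shrink when the step gets smaller, and the midpoint rule should beat Euler at the same step size.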

---

## Convolution

The properties of scaling and superposition have a significant consequence - if we think of any complicated input as a sequence of scaled impulses, then the output of the system to this input can be predicted by a sum of shifted and scaled impulse response functions. Here's an example where the input has three impulses at the 1st, 6th and 11th time-points (indices 0, 5 and 10 with Python's zero-based indexing):

In [None]:
t1, t2, t3 = tps = np.array([1, 6, 11]) - 1
s = np.zeros(t.shape)
s[[t1, t2, t3]] = 1 / dt
y = leaky_integrator(t, s, k)
f = plot_resp(t, s, y)

The response at time point, say 16, is predicted by the sum of the response to the three inputs. Each of these three inputs produces the same shaped impulse-response, so the response at time 16 is the sum of the impulse response function evaluated at three points in time.

Plot the response to the three impulses:

In [None]:
_, ax = f.axes
# Overlay the scaled, shifted impulse response for each of the three impulses
for i in tps:
    ax.plot(t[i:-1], s[i] * h[:-(i + 1)] * dt, c="C2")
display(f)

The response to the three inputs at time point 16 is the impulse response function evaluated at the time since each input occurred:

In [None]:
idx = 16
for tp_i in tps:
    ax.plot(t[idx], h[idx - tp_i], c="C2", marker="o")
display(f)

Superposition means that the response of the system is the sum of these three values:

In [None]:
r = h[idx - tps].sum()
ax.plot(t[idx], r, c="C1", marker="o")
display(f)

For any stimulus, the response at time-point 16 will be the sum of shifted, scaled versions of the impulse response function. This loop should give us the same number as the calculation above:

In [None]:
# Only inputs at or before idx contribute; a negative index would wrap around h
rr = sum(s[i] * h[idx - i] * dt for i in range(idx + 1))
ax.plot(t[idx], rr, c=".1", marker="x")
display(f)

The response at all time-points can be calculated as above by looping through time:

In [None]:
rr = np.zeros(t.shape)
for j in range(t.size):
    # Sum over past inputs only, so h is never indexed from its end
    rr[j] = sum(s[i] * h[j - i] * dt for i in range(j + 1))
ax.plot(t, rr, c=".1", dashes=(3, 3))
display(f)

This operation is called 'convolution', and can be implemented by the function `numpy.convolve`. This function takes in two vectors of length `m` and `n` and returns a vector of length `m + n - 1`. If the first vector is the stimulus and the second is the impulse response, then you'd think that the output would have length `m`. It's longer because the function pads the inputs with zeros so that we get the entire response to the very last input. We'll truncate the output to the length of the input:

In [None]:
rconv = np.convolve(s, h) * dt
ax.plot(t, rconv[:t.size], c="C1", marker="x")
display(f)
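
To see the output length and the truncation on a case small enough to check by hand, here is a tiny self-contained sketch. The two short vectors `x_demo` and `h_demo` are made up for illustration.

```python
import numpy as np

x_demo = np.array([1., 0., 2.])  # "stimulus" of length m = 3
h_demo = np.array([1., .5])      # "impulse response" of length n = 2

full = np.convolve(x_demo, h_demo)  # length m + n - 1 = 4
truncated = full[:x_demo.size]      # keep only the first m samples

# The same sum written as an explicit loop over past inputs
by_hand = np.zeros(full.size)
for j in range(full.size):
    for i in range(x_demo.size):
        if 0 <= j - i < h_demo.size:
            by_hand[j] += x_demo[i] * h_demo[j - i]

print(full)       # [1.  0.5 2.  1. ]
print(truncated)  # [1.  0.5 2. ]
```

The last element of `full` is the tail of the response to the final input, which is the part that truncation throws away.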

## Response to arbitrary stimulus

With convolution and the analytical solution to the leaky integrator, we can predict the response of the system to any input, like a random series of scaled impulses:

In [None]:
dt, maxt = .01, 4
t = np.arange(0, maxt, dt)
k = .2
h = np.exp(-t / k)

rng = np.random.RandomState(seed=100)
s = np.floor(rng.rand(t.size) + .05) * np.round(rng.rand(t.size) * 5) / dt
y = np.convolve(s, h)[:t.size] * dt
plot_resp(t, s, y);

## Cascades of leaky integrators

It is common to model sensory systems with a 'cascade' of leaky integrators, where the output of one integrator feeds into the input of the next one. You can think of this as a series of buckets hanging below each other, where the flow of water out the bottom bucket is the output of the system.

In [None]:
dt, maxt = .001, 4
t = np.arange(0, maxt, dt)

s = boxcar(t, dt)
y = s.copy()
k = .1
n = 4

for _ in range(n):
    y = leaky_integrator(t, y, k) / k
f = plot_resp(t, s, y)

This response of a cascade of leaky integrators to an impulse turns out to be the PDF of the Gamma distribution with shape $n$ and scale $k$:

In [None]:
h = stats.gamma(n, scale=k).pdf(t)
_, ax = f.axes
ax.plot(t, h, dashes=(3, 3), color=".1")
display(f)
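
For reference, the Gamma PDF with integer shape $n$ and scale $k$ (the parameterization that `stats.gamma(n, scale=k)` uses) can be written as

$$h(t) = \frac{t^{n-1} e^{-t/k}}{k^n (n-1)!}, \qquad t \ge 0.$$

With $n = 1$ this reduces to $e^{-t/k}/k$, a single leaky-integrator impulse response scaled to unit area, which is consistent with the division by $k$ at each stage of the cascade loop above.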

This interactive widget lets you play with the parameters of the cascade:

In [None]:
@interact
def cascade_tutorial(n=(2, 5), k=(.1, 1, .05)):
    s = boxcar(t, dt, amp=1)
    h = stats.gamma(n, scale=k).pdf(t)
    f = plot_resp(t, s, h)
    f.axes[1].set(ylim=(0, 4))

## Responses of a linear system to an arbitrary stimulus

Here's the response of the cascade of leaky integrators to a white noise stimulus:

In [None]:
s = rng.randn(t.size)
y = np.convolve(s, h)[:t.size]
plot_resp(t, s, y);

See how smooth the output is? If you think about how convolution acts, the response at each time point is the sum of the previous inputs weighted by the impulse-response function going back in time. The impulse response function of the cascade of leaky integrators (the Gamma PDF) is a smooth bump, so the response at any given time is a weighted average of the previous inputs. This effectively smooths out the bumps in the input. If you know something about the frequency domain, what do you think this does to the input in terms of frequencies?

---

## Response to sinusoids

Following up on this smoothing observation, we'll look at the output of our cascade of leaky integrators to sinusoids of different frequencies.

Note that we're going to use a different convolution function here. The `convolve` function in numpy doesn't do the circular convolution that we want, so we need to reach for a convolution function from the `scipy` image-processing library:

In [None]:
dt, maxt = .01, 10
t = np.arange(0, maxt, dt)
h = stats.gamma(4, scale=.2).pdf(t)

freq = .4  # Hz
s = np.sin(2 * np.pi * freq * t)
y = ndimage.convolve(s, h, mode="wrap") * dt
plot_resp(t, s, y);

Try again with a higher frequency:

In [None]:
freq = 2  # Hz
s = np.sin(2 * np.pi * freq * t)
y = ndimage.convolve(s, h, mode="wrap") * dt
plot_resp(t, s, y);

This illustrates a unique property of shift-invariant linear systems: the response to any sinusoid is a sinusoid of the same frequency, scaled in amplitude and delayed in phase. (Because the convolution here wraps around and the input contains a whole number of cycles, there is no start-up transient. With ordinary zero-padded convolution the beginning of the output would look wobbly, because the input into the filter isn't a complete sinusoid until time has reached the duration of the impulse response.)
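
As a numerical check of the "same frequency, scaled and shifted" claim, the sketch below filters a sinusoid by circular convolution (done here with the FFT instead of `ndimage.convolve`, which is a substitution made for this example) and compares the result to a pure sinusoid whose gain and phase are read off the FFT of the impulse response. It assumes a whole number of cycles in the window; the Gamma-shaped filter is written out with numpy, and all `_demo` names are illustrative.

```python
import numpy as np
from math import factorial

dt_demo = .01
t_demo = np.arange(1000) * dt_demo  # 10 s of samples
n_demo, k_demo = 4, .2

# Gamma-shaped impulse response (shape n_demo, scale k_demo)
h_demo = (t_demo ** (n_demo - 1) * np.exp(-t_demo / k_demo)
          / (k_demo ** n_demo * factorial(n_demo - 1)))

cycles = 4  # 4 cycles in 10 s, i.e. 0.4 Hz
freq_demo = cycles / (t_demo.size * dt_demo)
s_demo = np.sin(2 * np.pi * freq_demo * t_demo)

# Circular convolution computed in the frequency domain
y_demo = np.fft.ifft(np.fft.fft(s_demo) * np.fft.fft(h_demo)).real * dt_demo

# Gain and phase of the filter at the input frequency
H = np.fft.fft(h_demo)[cycles] * dt_demo
gain, phase = np.abs(H), np.angle(H)
y_pred = gain * np.sin(2 * np.pi * freq_demo * t_demo + phase)

print("output is a scaled, shifted sinusoid:", np.allclose(y_demo, y_pred, atol=1e-8))
print(f"gain = {gain:.3f}")
```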

This only works for sinusoids - other functions (like square waves or whatever) will change shape after being passed through the filter.

Let's calculate the amplitude of the output sinusoid for different frequencies to see how the filter attenuates higher and higher frequencies:

In [None]:
n_cycles = np.arange(1, 21)
amp = np.zeros(n_cycles.size)
for i, n in enumerate(n_cycles):
    s = np.sin(2 * np.pi * n * t / maxt)
    y = ndimage.convolve(s, h, mode="wrap") * dt
    amp[i] = (y.max() - y.min()) / 2

f, ax = plt.subplots(figsize=(5, 5))
ax.plot(n_cycles, amp, marker="o")
ax.set(xlabel="Frequency of input (cycles)",
       ylabel="Amplitude of output")
f.tight_layout()

The way a linear filter attenuates a sinusoid matches the amplitudes of the fft of the filter's impulse response function. What's the significance of this?

We know that:

1. Any time-series can be represented as a sum of scaled sinusoids
2. A linear system only scales and shifts sinusoids
3. The response to the sum of inputs is equal to the sum of the responses
4. The fft of the impulse response determines how the filter scales the sinusoids

Together, this means that there are two ways to calculate the response of a linear system: (1) convolving the input with the impulse response function and (2) multiplying the FFT of the input with the FFT of the impulse response function and taking the inverse FFT. Convolution in the time domain equals point-wise multiplication in the frequency domain.
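
The equivalence of the two routes can be checked on a small example. In this self-contained sketch both vectors are zero-padded to the full output length so that the FFT product matches ordinary (non-circular) convolution; the random input and the short exponential filter are assumptions chosen here for illustration.

```python
import numpy as np

rng_demo = np.random.RandomState(0)
s_demo = rng_demo.randn(200)                  # arbitrary input
h_demo = np.exp(-np.arange(50) * .01 / .2)    # short exponential impulse response

n_full = s_demo.size + h_demo.size - 1        # length of the full convolution

# Route 1: convolution in the time domain
y_time = np.convolve(s_demo, h_demo)

# Route 2: multiply the zero-padded FFTs, then transform back
S = np.fft.rfft(s_demo, n_full)
H = np.fft.rfft(h_demo, n_full)
y_freq = np.fft.irfft(S * H, n_full)

routes_agree = np.allclose(y_time, y_freq)
print("time-domain and frequency-domain results agree:", routes_agree)
```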

This will make more sense with an example. We'll make a band-pass filter by taking the FFT of an impulse, attenuating the amplitudes within a frequency band with a Gaussian, and taking the inverse FFT to get the filter in the time domain. First, we'll define some functions to help us with this task:

In [None]:
# If these functions come up elsewhere we'll probably want
# to define them in a helper library
def complex_to_real(F, t):
    """Return real-valued amplitudes and phases from fft convention."""
    nt = t.size
    dt = t[1] - t[0]

    dc = F[0] / nt
    amp = 2 * np.abs(F) / nt
    ph = -180 * np.angle(F) / np.pi

    nf = np.ceil(nt / 2) + 1
    idx = slice(1, int(nf))
    freq = np.abs(np.fft.fftfreq(t.size, dt))
    
    Y = dict(
        dc=dc,
        ph=ph[idx],
        amp=amp[idx],
        freq=freq[idx],
    )
    return Y


def real_to_complex(Y, t):
    """Return complex-valued vector in fft convention."""
    nt = t.size
    dt = t[1] - t[0]
    F = np.zeros(nt, complex)  # np.complex was removed from recent NumPy releases

    nf = Y["freq"].size
    amp = nt * Y["amp"] / 2
    ph = -np.pi * Y["ph"] / 180
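    # ph now holds the original angle of F, so use it with a positive sign;
    # the negated exponent would return the complex conjugate of the input
    z = amp * np.exp(ph * 1j)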
    
    F[0] = Y["dc"] * nt
    F[1:nf + 1] = z
    F[nf:] = z.conj()[::-1]

    return F


def plot_fft(t, y):
    """Plot input and positive frequency spectrum."""
    Y = complex_to_real(np.fft.fft(y), t)
    f, (ax_t, ax_f) = plt.subplots(1, 2, figsize=(8, 4))
    ax_t.plot(t, y)
    ax_f.stem(Y["freq"], Y["amp"], basefmt=" ", markerfmt=".")
    ax_t.set(xlabel="Time (s)", ylabel="Amplitude", xlim=(t.min(), t.max() + dt))
    ax_f.set(xlabel="Frequency (Hz)", ylabel="Amplitude", ylim=(0, None))
    f.tight_layout()

In [None]:
dt, maxt = .01, 1
t = np.arange(0, maxt, dt)

# Delta function at time point 50
y = (t == t[50]).astype(float)
F = np.fft.fft(y)
Y = complex_to_real(F, t)

# Attenuate the amplitudes with a Gaussian
g_center, g_width = 6, 2  # Hz
Y["amp"] *= np.exp(-(Y["freq"] - g_center) ** 2 / g_width ** 2)

# Take the inverse Fourier transform
y_recon = np.fft.ifft(real_to_complex(Y, t)).real

plot_fft(t, y_recon)

The impulse response function of this band-pass filter is a Gabor. We can
describe this filter entirely by either this impulse response function or
by its FFT (including the phase, which isn't plotted here).  Think about
what happens when you convolve a time-series with this Gabor.  At each
time step we center the Gabor on the time series, do a point-wise
multiplication, and add up the numbers.  If the time-series is a sinusoid
that modulates at the frequency of the Gabor, you can see how this leads
to a large response. This is an ideal input - anything else will lead to
a weaker output. Hence the band-pass property of the filter.

This filter is a little strange in the time-domain because it spreads
both forward and backward in time.  In a sense, it responds to parts of
the input that haven't happened yet. A more realistic impulse response
function for the time domain only responds to the past.  This is called a
'causal filter'. The leaky integrator is an example of a causal filter.

How do we build a causal band-pass filter?  One way is to build it in the
time domain as a difference of two low-pass filters:

In [None]:
k = 1 / 40
h1 = stats.gamma(4, scale=k).pdf(t)
h2 = stats.gamma(5, scale=k).pdf(t)
h = h1 - h2
plot_fft(t, h)

As you can see in the plot of the Fourier spectrum, this filter has a maximum
sensitivity to frequencies around 3 Hz. You can also see this from the
shape of the impulse response. It wiggles up and down one cycle in about
1/3 of a second, which is 3Hz. A convolution with a 3Hz sinusoid will
produce the largest response.
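
The location of the peak can be estimated with finer frequency resolution than the 1-second window above allows. The sketch below is a self-contained illustration: it writes the Gamma PDF out with numpy (the helper `gamma_pdf` is a hypothetical stand-in for `stats.gamma(n, scale=k).pdf`) and uses a longer, more finely sampled window chosen here as an assumption. For comparison, the transfer function of a Gamma PDF with shape $n$ and scale $k$ is $(1 + 2\pi i f k)^{-n}$, which puts the peak of this difference at $f = \frac{1}{4\pi k} \approx 3.2$ Hz for $k = 1/40$.

```python
import numpy as np
from math import factorial

def gamma_pdf(t, n, k):
    """Gamma PDF with integer shape n and scale k, evaluated at t >= 0."""
    return t ** (n - 1) * np.exp(-t / k) / (k ** n * factorial(n - 1))

dt_demo = .001
t_demo = np.arange(10000) * dt_demo  # 10 s window gives 0.1 Hz resolution
k_demo = 1 / 40

# Difference of two low-pass filters, as in the cell above
h_demo = gamma_pdf(t_demo, 4, k_demo) - gamma_pdf(t_demo, 5, k_demo)

amp_demo = np.abs(np.fft.rfft(h_demo)) * dt_demo
freq_demo = np.fft.rfftfreq(t_demo.size, dt_demo)
peak_freq = freq_demo[np.argmax(amp_demo)]

print(f"peak sensitivity near {peak_freq:.1f} Hz")
print(f"analytical estimate: {1 / (4 * np.pi * k_demo):.2f} Hz")
```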

---

## An example: the hemodynamic response function

Note: this last section overlaps with the beginning of the ER_fMRI tutorial.

Functional MRI (fMRI) measures changes in blood flow and oxygenation associated with the underlying neuronal response. The most common method for analyzing fMRI data uses the 'general linear model' that assumes that the 'hemodynamic coupling' process acts as a linear shift-invariant filter. Back in 1996 Boynton and Heeger tested this idea and found that the impulse response function acts like a cascade of leaky integrators with these typical parameters:

In [None]:
k = 1  # seconds
delay = 2  # seconds
n_cascades = 3

dt, maxt = .01, 15
t = np.arange(0, maxt, dt)
hdr = stats.gamma(n_cascades, loc=delay, scale=k).pdf(t)

f, ax = plt.subplots()
ax.plot(t, hdr)
ax.set(xlabel="Time (s)", ylabel="Amplitude")
f.tight_layout()

One kind of fMRI experimental design is a 'blocked design' where two conditions alternate back and forth. A typical period for a blocked design is something like 25 seconds. If we assume that the neuronal response is following the stimulus closely in time (compared to the hemodynamic response), the neuronal response might look something like this:

In [None]:
dt, maxt = .01, 120
t = np.arange(0, maxt, dt)
period = 25
s = np.sign(np.sin(2 * np.pi * t / period))
y = np.convolve(s, hdr)[:t.size] * dt
plot_resp(t, s, y);

The output is the expected shape of the fMRI response. Many software packages will use a convolution with the stimulus design to produce an expected response like this as a template to compare to the actual fMRI data. This template is correlated with each voxel's time-series to produce a number between -1 and 1, where 1 is a perfect fit. This produces a 'parameter map' that can tell us which brain areas are responding as expected to the experimental paradigm.
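
The template-correlation step might look roughly like the sketch below. Everything about the "voxels" here is simulated and hypothetical: one is the template plus Gaussian noise, the other is noise alone, and the noise level and scaling are arbitrary assumptions. The delayed Gamma response is written out with numpy so that the block is self-contained.

```python
import numpy as np
from math import factorial

rng_demo = np.random.RandomState(0)
dt_demo = .01
t_demo = np.arange(12000) * dt_demo  # 120 s of samples
t_hdr = np.arange(1500) * dt_demo    # 15 s support for the response function

# Delayed Gamma response: shape 3, scale 1 s, delay 2 s (zero before the delay)
lag = np.clip(t_hdr - 2, 0, None)
hdr_demo = lag ** 2 * np.exp(-lag) / factorial(2)

# Blocked design with a 25 s period, filtered to give the template
s_demo = np.sign(np.sin(2 * np.pi * t_demo / 25))
template = np.convolve(s_demo, hdr_demo)[:t_demo.size] * dt_demo

# Two simulated voxels: one follows the template, one is pure noise
voxel_active = 2 * template + rng_demo.randn(t_demo.size)
voxel_silent = rng_demo.randn(t_demo.size)

r_active = np.corrcoef(template, voxel_active)[0, 1]
r_silent = np.corrcoef(template, voxel_silent)[0, 1]
print(f"responsive voxel: r = {r_active:.2f}")
print(f"noise-only voxel: r = {r_silent:.2f}")
```

Under these assumptions the responsive voxel should correlate strongly with the template while the noise-only voxel stays near zero.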