Peri-Stimulus Time Histograms Estimation Through Poisson Regression Without Generalized Linear Models
Viewed “from the outside”, neurons generate brief electrical pulses: the action potentials.
Left: the brain of an insect with the recording probe on which 16 electrodes (the bright spots) have been etched. Each probe branch has a 80
After a “rather heavy” pre-processing called spike sorting, the raster plot representing the spike trains can be built:
- A key working hypothesis in neuroscience states that the spikes' occurrence times, as opposed to their waveforms, are the only information carriers between brain regions (Adrian and Zotterman, 1926).
- This hypothesis encourages the development of models whose goal is to predict the probability of occurrence of a spike at a given time, without necessarily considering the biophysical spike generation mechanisms.
- In the sequel we will identify spike trains with point process / counting process realizations.
The expected counting process of a homogeneous Poisson process—with the same mean frequency—is shown in red.
A renewal process is inadequate here: the ranks of successive inter-spike intervals are correlated. We split the ranks into 5 categories of equal size.
ECDF of the rank of interval k+1 conditioned on the rank class of interval k. 95% confidence bands would here have a width of 0.14.
Even if we focus on “isolated” neurons in the stationary / homogeneous regime, renewal processes won’t be adequate in general as models of our observed spike trains.
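A sketch of the rank-based dependence check just described; spike_times would normally come from the spike-sorting step, so a surrogate train with correlated inter-spike intervals is simulated here purely to make the snippet run on its own.

import numpy as np
import matplotlib.pyplot as plt

## Surrogate spike train with AR(1)-correlated log inter-spike intervals.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
log_isi = np.empty(2000)
log_isi[0] = z[0]
for i in range(1, 2000):
    log_isi[i] = 0.6 * log_isi[i - 1] + z[i]
spike_times = np.cumsum(np.exp(log_isi))

## Ranks of successive intervals; split interval-k ranks into 5 equal classes
## and plot the ECDF of the interval-(k+1) rank within each class.
isi = np.diff(spike_times)
ranks = np.argsort(np.argsort(isi)) / (isi.size - 1)
prev, nxt = ranks[:-1], ranks[1:]
cls = np.minimum((prev * 5).astype(int), 4)
for c in range(5):
    x = np.sort(nxt[cls == c])
    plt.step(x, np.arange(1, x.size + 1) / x.size, where='post',
             label=f'interval k rank class {c + 1}')
plt.xlabel('rank of interval k+1')
plt.ylabel('ECDF')
plt.legend()
plt.show()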
Neuron 3's spike times relative to neuron 2's (reference neuron) spike times.
\alert{Consequence}: our model should be able to handle interactions.
20 stimulations with citronellal. Stimulations are delivered for 500 ms (gray background). Is neuron 2 responding to the stimulation? Cockroach (Periplaneta americana) recordings and spike sorting by Antoine Chaffiol.
Neuron 1: 20 stimulations with citronellal, terpineol and a mixture of the two. \alert{Are the responses any different?}
Our model should make room for:
- The elapsed time since the last spike of the neuron (enough for homogeneous renewal processes).
- Variables related to the discharge history, like the duration of the last inter-spike interval.
- The elapsed time since the last spike of a “functionally coupled” neuron.
- The elapsed time since the beginning of an applied stimulation.
- We want to estimate the peri-stimulus time histogram (PSTH) considered as an observation from an inhomogeneous Poisson process.
- In addition to estimation we want to:
- Test if a neuron is responding to a given stimulation.
- Test if the responses of a given neuron to two different stimulations are different.
- This implies building some sort of confidence bands around our best estimation.
We go from the raw data to a histogram built with a tiny time step (25 ms), leading to an estimator with little bias and large variance.
- We model this “averaged process” as an inhomogeneous Poisson process with intensity λ(t).
- The histogram we just built can then be seen as the observation of a collection of Poisson random variables, $\{Y_1,\ldots,Y_k\}$, with parameters: $$n \, ∫_{t_i-δ/2}^{t_i+δ/2} λ(u) \, du \; ≈ \; n \, λ(t_i) \, δ \; , \quad i = 1,\ldots,k \; ,$$ where $t_i$ is the center of a class (bin), $δ$ is the bin width, $n$ is the number of stimulations and $k$ is the number of bins.
- A piecewise constant estimator of $λ(t)$ is then obtained with: $$\hat{λ}(t) = y_i/(n δ)\, , \quad \textrm{if} \quad t ∈ [t_i-δ/2,t_i+δ/2) \; .$$ This is the “classical” PSTH (a minimal construction is sketched below).
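The sketch below builds such a classical PSTH with numpy; the function and variable names (classical_psth, spike_times_per_trial) are illustrative, and the simulated trials only stand in for real stimulus-aligned spike trains.

import numpy as np

def classical_psth(spike_times_per_trial, t_min, t_max, delta):
    ## Piecewise constant estimate of lambda(t): counts pooled over the n
    ## trials in bins of width delta, divided by n * delta (spikes / s).
    n = len(spike_times_per_trial)
    edges = np.linspace(t_min, t_max, int(round((t_max - t_min) / delta)) + 1)
    counts = np.zeros(edges.size - 1)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    centers = edges[:-1] + delta / 2
    return centers, counts, counts / (n * delta)

## Illustrative use: 20 simulated trials with a constant 10 Hz rate on [0, 10] s.
rng = np.random.default_rng(1)
trials = [np.sort(rng.uniform(0, 10, rng.poisson(100))) for _ in range(20)]
t, y, lam_hat = classical_psth(trials, 0.0, 10.0, 0.025)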
- We are going to assume that λ(t) is \alert{smooth}—this is a very reasonable assumption given what we know about the insect olfactory system.
- We can then attempt to improve on the “classical” PSTH by trading a little bias increase for (a hopefully large) variance decrease.
- Many nonparametric methods are available to do that: kernel regression, local polynomials, smoothing splines, wavelets, etc.
- A problem in the case of the PSTH is that the observed counts, $\{y_1,\ldots,y_k\}$, follow Poisson distributions with different parameters, implying that they have different variances.
- We then have at least two possibilities: i) use a generalized linear model (GLM); ii) transform the data to stabilize the variance.
- We are going to use the second approach.
- Following Brown, Cai and Zhou (2010), let's consider $X_1,\ldots,X_n$ IID from a Poisson distribution with parameter $ν$.
- Define $X = ∑_{j=1}^n X_j$; the CLT gives us: $$\sqrt{n}\left(X/n-ν\right) \stackrel{L}{→} \mathcal{N}(0,ν) \quad \textrm{as} \; n → ∞ \, .$$
- A variance stabilizing transformation is a function $G : \mathbb{R} → \mathbb{R}$ such that: $$G'(x) = 1/\sqrt{x}\, .$$
- The delta method (or error propagation method; a first-order Taylor expansion) then yields: $$\sqrt{n}\left(G(X/n)-G(ν)\right) \stackrel{L}{→} \mathcal{N}(0,1)\, .$$
- It is known (Anscombe, 1948) that the variance stabilizing properties can be further improved by using transformations of the form: $$H_n(X) = G\left(\frac{X+a}{n+b}\right)$$ for suitable choices of $a$ and $b$.
- In nonparametric regression we want to set $a$ and $b$ such that $\mathrm{E}\left(H_n(X)\right)$ optimally matches $G(ν)$.
- Brown, Cai and Zhou (2010) show that in all relevant PSTH estimation problems we have: $$\mathrm{Var}\left(2 \sqrt{(X+1/4)/n}\right) = \frac{1}{n} + O(n^{-2}) \, .$$
- They also show that: $$\mathrm{E}\left(2 \sqrt{(X+1/4)/n}\right) - 2 \sqrt{ν} = O(n^{-2}) \, .$$
- They get similar transformations for binomial and negative binomial random variables.
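A quick Monte Carlo check of this stabilization (a sketch only; the grid of ν values, n = 20 and the number of replicates are arbitrary choices): the variance of $2\sqrt{(X+1/4)/n}$ stays close to $1/n$ while the variance of the raw $X/n$ grows with ν.

import numpy as np

rng = np.random.default_rng(2)
n = 20          ## number of trials summed in each count
reps = 100000   ## Monte Carlo replicates
for nu in [0.5, 1.0, 2.0, 5.0, 10.0]:
    X = rng.poisson(n * nu, size=reps)     ## X = sum of n Poisson(nu) counts
    raw = X / n
    stab = 2 * np.sqrt((X + 0.25) / n)
    print(f"nu={nu:5.1f}  Var(X/n)={raw.var():.4f}  "
          f"Var(stabilized)={stab.var():.4f}  target 1/n={1/n:.4f}")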
- Since our knowledge of the biophysics of these neurons and of the network they form is still in its infancy, we can hardly propose a reasonable parametric form for our PSTHs (or their variance stabilized versions).
- We therefore model our stabilized PSTH by:
$$Z_i \doteq 2 \sqrt{(Y_i+1/4)/n} = r(t_i) + ε_i σ \, ,$$ where the $ε_i \stackrel{\textrm{IID}}{∼} \mathcal{N}(0,1)$, $r$ is assumed “smooth” and is estimated with a linear smoother (kernel regression, local polynomials, smoothing splines) or with wavelets (or with any nonparametric method you like).
- Following Larry Wasserman (All of Nonparametric Statistics, 2006) we define a linear smoother by a collection of functions $l(t) = \left(l_1(t),\ldots,l_k(t)\right)^T$ such that: $$\hat{r}(t) = ∑_{i=1}^k l_i(t) Z_i\, .$$
- The simplest smoother we are going to use is built from the tricube kernel: $$K(t) = \frac{70}{81}\left(1 - \left|t\right|^3\right)^3 I(t) \, ,$$ where $I(t)$ is the indicator function of $[-1,1]$.
- The functions $l_i$ are then defined by: $$l_i(t) = \frac{K\left(\frac{t-t_i}{h}\right)}{∑_{j=1}^k K\left(\frac{t-t_j}{h}\right)}\, .$$ A minimal implementation is sketched below.
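A sketch of this tricube Nadaraya-Watson smoother, assuming t_grid holds the bin centers and z the variance-stabilized counts (the names are illustrative):

import numpy as np

def tricube(u):
    ## Tricube kernel K(u) = 70/81 * (1 - |u|^3)^3 on [-1, 1], 0 outside.
    u = np.abs(u)
    return np.where(u <= 1, (70 / 81) * (1 - u**3)**3, 0.0)

def smoother_matrix(t_grid, h):
    ## k x k matrix whose row i holds the weights l_j(t_i), so that r_hat = L @ z.
    W = tricube((t_grid[:, None] - t_grid[None, :]) / h)
    return W / W.sum(axis=1, keepdims=True)

def nw_estimate(t_grid, z, h):
    return smoother_matrix(t_grid, h) @ z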
- When using this kind of approach the choice of the bandwidth $h$ is clearly critical.
- Since after variance stabilization the variance is known, we can set our bandwidth by minimizing Mallows' $C_p$ criterion instead of using cross-validation. For (soft) wavelet thresholding we use the universal threshold, which also requires knowing (or estimating) the variance.
- More explicitly, with linear smoothers our estimates $\left(\widehat{r}(t_1),\ldots,\widehat{r}(t_k)\right)^T$ can be written in matrix form as: $$\widehat{\mathbf{r}} = L(h) \, \mathbf{Z} \, ,$$ where $L(h)$ is the $k × k$ symmetric matrix whose element $(i,j)$ is given by $l_i(t_j)$.
- Ideally we would like to set $\widehat{h}$ as: $$\arg\min_h \; \frac{1}{k} ∑_{i=1}^k \left(r(t_i) - \hat{r}(t_i)\right)^2 \, .$$
- But we don't know $r$ (that's what we want to estimate!), so we minimize Mallows' $C_p$ criterion: $$\frac{1}{k} ∑_{i=1}^k \left(Z_i - \hat{r}(t_i)\right)^2 + \frac{2 σ^2 \mathrm{tr}\left(L(h)\right)}{k} \, ,$$ where $\mathrm{tr}\left(L(h)\right)$ stands for the trace of $L(h)$.
- If we don't know $σ^2$, we minimize the cross-validation criterion: $$\frac{1}{k} ∑_{i=1}^k \frac{\left(Z_i - \hat{r}(t_i)\right)^2}{\left(1-L_{ii}(h)\right)^2} \, .$$ Both criteria are sketched in code after this list.
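A sketch of both criteria, reusing smoother_matrix from the previous sketch; t_grid, z and the bandwidth grid h_grid are placeholders for the bin centers, the stabilized counts and a user-chosen set of candidate bandwidths.

import numpy as np

def cp_score(t_grid, z, h, sigma2):
    ## Mallows' C_p: mean squared residual plus 2 * sigma^2 * tr(L(h)) / k.
    L = smoother_matrix(t_grid, h)
    resid = z - L @ z
    return np.mean(resid**2) + 2 * sigma2 * np.trace(L) / z.size

def cv_score(t_grid, z, h):
    ## Leave-one-out cross-validation via the usual linear-smoother shortcut.
    L = smoother_matrix(t_grid, h)
    resid = z - L @ z
    return np.mean((resid / (1 - np.diag(L)))**2)

def best_bandwidth(t_grid, z, sigma2, h_grid):
    scores = np.array([cp_score(t_grid, z, h, sigma2) for h in h_grid])
    return h_grid[int(np.argmin(scores))], scores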
Left: CV score in black, $C_p$ score in red. Right: variance-stabilized data (black) with the Nadaraya-Watson estimator (red) obtained with the “best” bandwidth.
Residuals obtained with the Nadaraya-Watson estimator. The red dashed lines correspond to
Nadaraya-Watson estimator (red), smoothing splines estimator (blue) and wavelet estimator (black; Haar wavelets, soft thresholding, universal threshold).
- Keeping in line with Wasserman (2006), we consider that providing an estimate $\hat{r}$ of a curve $r$ is not sufficient for drawing scientific conclusions.
- We would like to provide a \alert{confidence set} for $r$ in the form of a band: $$\mathcal{B}=\left\{s : l(t) ≤ s(t) ≤ u(t), \; ∀ t ∈ [a,b]\right\}\, ,$$ based on a pair of functions $\left(l(t),u(t)\right)$.
- We would like to have: $$\mathrm{Pr}\left\{r ∈ \mathcal{B} \right\} ≥ 1 - α$$ for all $r ∈ \mathcal{R}$, where $\mathcal{R}$ is a large class of functions.
- When working with smoothers, our estimators exhibit a bias that does not disappear even with large sample sizes.
- We will therefore try to build sets around $\overline{r} = \mathrm{E}(\hat{r})$; that will be sufficient to address some of the questions we started with.
- For a linear smoother, $\hat{r}(t) = ∑_{i=1}^k l_i(t) Z_i$, we have: $$\overline{r}(t) = \mathrm{E}\left(\hat{r}(t)\right) = ∑_{i=1}^k l_i(t) r(t_i)$$ and $$\mathrm{Var}\left(\hat{r}(t)\right) = σ^2 \, ∑_{i=1}^k l_i(t)^2 = (1/n) \|l(t)\|^2\, .$$ Remember that we stabilized the variance at $1/n$.
- We will consider a confidence band for $\overline{r}(t)$ of the form: $$I(t) = \left(\hat{r}(t) - c \|l(t)\|/\sqrt{n},\; \hat{r}(t) + c \|l(t)\|/\sqrt{n}\right) \, ,$$ for some $c > 0$ and $a ≤ t ≤ b$ (a minimal band computation is sketched below).
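A sketch of this band evaluated at the bin centers, again reusing smoother_matrix; the constant c is left to the caller (a pointwise Gaussian quantile such as 1.96 is only a simple placeholder choice, simultaneous bands need a larger value, as discussed in Wasserman 2006).

import numpy as np

def confidence_band(t_grid, z, h, n, c):
    ## Band  r_hat(t_i) +/- c * ||l(t_i)|| / sqrt(n)  at each bin center.
    L = smoother_matrix(t_grid, h)             ## rows are l(t_i)^T
    r_hat = L @ z
    half_width = c * np.sqrt((L**2).sum(axis=1)) / np.sqrt(n)
    return r_hat - half_width, r_hat + half_width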
Variance-stabilized data (black), Nadaraya-Watson estimator (blue) and 0.95 confidence band (red).
20 stimulations with citronellal. Stimulations are delivered for 500 ms (gray background). Is neuron 2 responding to the stimulation?
Since under the null hypothesis the rate is constant, there is no bias and we can increase the bandwidth (right side) if necessary.
Neuron 1: 20 stimulations with citronellal, terpineol and a mixture of the two. \alert{Are the responses any different?}
- We start as previously by building a “classical” PSTH with very fine bins (25 ms) with the citronellal and terpineol trials to get: $\{y_1^{\textrm{citron}},\ldots,y_k^{\textrm{citron}}\}$ and $\{y_1^{\textrm{terpi}},\ldots,y_k^{\textrm{terpi}}\}$.
- We stabilize the variance as we did before ($z_i = 2 \sqrt{(y_i+0.25)/n}$) to get: $\{z_1^{\textrm{citron}},\ldots,z_k^{\textrm{citron}}\}$ and $\{z_1^{\textrm{terpi}},\ldots,z_k^{\textrm{terpi}}\}$.
- Our null hypothesis is that the two underlying inhomogeneous Poisson processes are the same, therefore: $$z_i^{\textrm{citron}} = r(t_i) + ε_i^{\textrm{citron}} σ \quad \textrm{and} \quad z_i^{\textrm{terpi}} = r(t_i) + ε_i^{\textrm{terpi}} σ \, ,$$ which gives $$z_i^{\textrm{terpi}} - z_i^{\textrm{citron}} = \sqrt{2}\, ε_i σ \, .$$
- We then want to test if our collection of observed differences $\{z_1^{\textrm{terpi}} - z_1^{\textrm{citron}},\ldots,z_k^{\textrm{terpi}} - z_k^{\textrm{citron}}\}$ is compatible with $k$ IID draws from $\mathcal{N}(0,2σ^2)$.
- R. Durrett (2009) Probability: Theory and Examples. CUP. Sec. 7.6, pp. 323-329.
- P. Billingsley (1999) Convergence of Probability Measures. Wiley. p. 121.
- Under our null hypothesis (same inhomogeneous Poisson process for citronellal and terpineol), the random variables $$X_i = \frac{Z_i^{\textrm{terpi}} - Z_i^{\textrm{citron}}}{\sqrt{2}\, σ} \, ,$$ should correspond to the $X_i$ of Donsker's theorem.
- We can then construct the partial-sum process $$S_k(t) = \frac{1}{\sqrt{k}} ∑_{i=1}^{\lfloor k t \rfloor} X_i \, , \quad t ∈ [0,1] \, ,$$ and check if the observed trajectory looks Brownian or not (a sketch of this construction is given below).
- Ideally, we would like to define a domain in $[0,1] × \mathbb{R}$ containing the realizations of a canonical Brownian motion with a given probability.
- To have a reasonable power, we would like the surface of this domain to be minimal.
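A sketch of this construction with hypothetical names: z_citron and z_terpi are the two variance-stabilized PSTHs on the same k bins and n is the number of trials per odor.

import numpy as np

def donsker_path(z_citron, z_terpi, n):
    ## Under H0 each bin-wise difference is N(0, 2/n), so x_i = diff * sqrt(n/2)
    ## is standard normal and the partial sums S_k(t) should look Brownian.
    x = (z_terpi - z_citron) * np.sqrt(n / 2.0)
    k = x.size
    t = np.arange(1, k + 1) / k
    return t, np.cumsum(x) / np.sqrt(k)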
Does this look like the realization of a canonical Brownian motion?
- In a (non-trivial) paper, Kendall, Marin and Robert (2007) showed that the upper boundary of this minimal surface domain is given by: $$u^*(t) ≡ \sqrt{-W_{-1}\left(-(κ t)^2\right)} \, \sqrt{t}, \quad \mathrm{for} \quad κ \, t ≤ 1/\sqrt{e} \, ,$$ where $W_{-1}$ is the secondary real branch of the Lambert W function (defined as the solution of $W(z) \exp W(z) = z$); $κ$ is adjusted to get the desired probability.
- They also showed that a domain whose upper boundary is given by $u(t) = a + b \sqrt{t}$ is almost of minimal surface ($a > 0$ and $b > 0$ being adjusted to get the correct probability).
- Loader and Deely (1987) give a very efficient algorithm to adjust $a$ and $b$ or $κ$.
- The R package STAR (Spike Train Analysis with R) provides all that (and much more) out of the box; a small sketch of evaluating $u^*$ follows.
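For reference, the $u^*$ boundary is easy to evaluate with scipy's Lambert W implementation; this is only a sketch, and κ still has to be calibrated (with the Loader and Deely algorithm, or the values shipped with STAR) to reach the desired coverage probability.

import numpy as np
from scipy.special import lambertw

def u_star(t, kappa):
    ## u*(t) = sqrt(-W_{-1}(-(kappa*t)^2)) * sqrt(t), for kappa*t <= 1/sqrt(e).
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.full(t.shape, np.nan)
    ok = (t > 0) & (kappa * t <= 1 / np.sqrt(np.e))
    w = np.real(lambertw(-(kappa * t[ok])**2, k=-1))   ## secondary real branch
    out[ok] = np.sqrt(-w) * np.sqrt(t[ok])
    return out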
Almost minimal surface domains with probabilities 0.95 (dashed red) and 0.99 (red) of containing an observed canonical Brownian motion. Black: terpineol - citronellal; blue: odd terpineol trials - even terpineol trials.
## n2citron_x (bin centers, in s), n2citron_y (variance-stabilized counts) and
## the boundary functions c95 / c99 (the 0.95 and 0.99 almost-minimal-surface
## boundaries) are defined earlier in the analysis.
import numpy as np
import matplotlib.pyplot as plt
## Get the part of n2citron preceding the stimulation; the count of bins with
## centers <= 6 s gives the length (500) used in the slices below.
np.sum(n2citron_x <= 6)
n2citron_y_b = n2citron_y[:500]
## Get the part of the SAME length coming just after
n2citron_y_r = n2citron_y[500:1000]
## Get the normalized partial sum of the difference process: with 20 trials each
## stabilized bin has variance 1/20, so each difference has variance 1/10 and
## the partial sums are divided by sqrt(500).
n2citron_y_d = np.cumsum(n2citron_y_r - n2citron_y_b) * np.sqrt(10 / 500)
## Do the plot for the test
yy = np.linspace(0, 1, 500)
plt.plot(yy, n2citron_y_d)
plt.plot(yy, c95(yy), color='red', lw=2, linestyle='dashed')
plt.plot(yy, -c95(yy), color='red', lw=2, linestyle='dashed')
plt.plot(yy, c99(yy), color='red', lw=2)
plt.plot(yy, -c99(yy), color='red', lw=2)
- Rune, Susanne, Henrik and Massimiliano for inviting me to this wonderful workshop.
- The SynchNeuro ANR project for paying for my flight.
- Antoine Chaffiol for the data.
- Chong Gu for his R package gss.
- My colleagues Yves Rozenholc and Avner Bar-Hen for discussions.
- Vilmos Prokaj, Olivier Faugeras and Jonathan Touboul for pointing me to Donsker's theorem.
- You, for listening.
Confidence bands computed on [6,14]. 0.78 was chosen because
- Probabilists working on processes use the filtration or history: a family of increasing sigma algebras, $\left(\mathcal{F}_t\right)_{0\leq t \leq ∞}$, such that all the information related to the process at time $t$ can be represented by an element of $\mathcal{F}_t$.
- The conditional intensity of a counting process $N(t)$ is then defined by: $$λ(t \mid \mathcal{F}_t) ≡ \lim_{h ↓ 0} \frac{\mathrm{Prob}\{N(t+h)-N(t)=1 \mid \mathcal{F}_t\}}{h} \; .$$
- $λ$ constitutes an exhaustive description of the process / spike train.
As soon as we adopt a conditional intensity based formalism, we must:
- Find an estimator $\hat{λ}$ of $λ$.
- Find goodness-of-fit tests.
- We start by associating to each spike time $t_i$ its transformed time: $$Λ_i = Λ(t_i) = ∫_0^{t_i} λ(u \mid \mathcal{F}_u) \, du \, .$$
- If our model is correct ($\hat{λ} ≈ λ$), the intervals between successive spikes after the time transformation $$\{t_1,\ldots,t_n\} → \{Λ(t_1) = Λ_1,\ldots,Λ(t_n) = Λ_n\}$$ are IID exponential with parameter 1.
- Stated differently, the point process $\{Λ_1,\ldots,Λ_n\}$ is a homogeneous Poisson process with parameter 1.
The next slides illustrate this result.
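A self-contained sketch of the transformation and of Ogata's residual check; the intensity, its closed-form integral and the simulated spikes are purely illustrative, only the procedure matters.

import numpy as np
from scipy import stats

def lam(t):
    ## illustrative intensity (Hz)
    return 20.0 * (1.0 + 0.8 * np.sin(2 * np.pi * t))

def Lam(t):
    ## its integral from 0 to t (closed form for this example)
    return 20.0 * t + (16.0 / (2 * np.pi)) * (1.0 - np.cos(2 * np.pi * t))

## Simulate spikes from lam by thinning a rate-40 homogeneous Poisson process.
rng = np.random.default_rng(3)
cand = np.cumsum(rng.exponential(1 / 40.0, size=5000))
spikes = cand[rng.uniform(size=cand.size) < lam(cand) / 40.0]

## Time transformation t_i -> Lambda(t_i); if the model is right the increments
## are IID Exp(1), i.e. 1 - exp(-increment) is uniform on [0, 1] (Ogata, 1988).
u = 1.0 - np.exp(-np.diff(Lam(spikes)))
print(stats.kstest(u, 'uniform'))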
- If, for a good model, the transformed sequence of spike times, $\{\hat{Λ}_1,\ldots,\hat{Λ}_n\}$, is the realization of a homogeneous Poisson process with rate 1, we should test $\{\hat{Λ}_1,\ldots,\hat{Λ}_n\}$ against such a process.
- This is what Yosihiko Ogata proposed in 1988 (Statistical models for earthquake occurrences and residual analysis for point processes, Journal of the American Statistical Association, 83: 9-27).
- But an observation nevertheless suggests that another type of test could also be used…
- The intuition that a properly normalized version of the process $N(Λ) - Λ$ converges towards a Brownian motion is correct.
- This is an easy consequence of Donsker's theorem, as Vilmos Prokaj explained to me on the R mailing list and as Olivier Faugeras and Jonathan Touboul explained to me directly.
- It is moreover possible to find regions of minimal area having a given probability of containing the whole trajectory of a canonical Brownian motion (Kendall, Marin and Robert, 2007; Loader and Deely, 1987).
- We thereby get a new goodness-of-fit test (a minimal sketch follows).
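A sketch of the corresponding statistic: on the rescaled axis the counting process has unit rate, so $(N(s) - s)/\sqrt{T}$, with $T$ the total transformed duration and plotted against $s/T$, should stay inside a Brownian-motion domain such as the $a + b\sqrt{t}$ one. Here Lam_spikes stands for the transformed spike times and the boundary coefficients are placeholders to be calibrated with the Loader and Deely algorithm.

import numpy as np

def brownian_gof_path(Lam_spikes, n_grid=512):
    ## Path t -> (N(t*T) - t*T) / sqrt(T) on the rescaled time axis, with T the
    ## total transformed duration; approximately a standard Brownian motion on
    ## [0, 1] when the transformed train is a unit-rate Poisson process.
    Lam_spikes = np.sort(np.asarray(Lam_spikes, dtype=float))
    T = Lam_spikes[-1]
    s = np.linspace(0.0, T, n_grid)
    N = np.searchsorted(Lam_spikes, s, side='right')
    return s / T, (N - s) / np.sqrt(T)

def boundary(t, a, b):
    ## a + b * sqrt(t) boundary; a and b are placeholders to be calibrated.
    return a + b * np.sqrt(t)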