Peri-Stimulus Time Histograms Estimation Through Poisson Regression Without Generalized Linear Models

Setup

Setting up org

(require 'ox-beamer)
(setq org-beamer-outline-frame-options "")
(setq org-export-babel-evaluate nil)

The Data and the Questions

Data’s origin

Viewed “from the outside”, neurons generate brief electrical pulses: the action potentials.

figs/BrainProbeData.png

Left, the brain of an insect with the recording probe on which 16 electrodes (the bright spots) have been etched. Each branch of the probe is 80 $μm$ wide. Right, 1 sec of data from 4 electrodes. The spikes are the action potentials.

Spike trains

After a “rather heavy” pre-processing called spike sorting, the raster plot representing the spike trains can be built:

figs/exemple-raster.png

Modeling spike trains: Why and How?

  • A key working hypothesis in Neurosciences states that the spikes’ occurrence times, as opposed to their waveforms, are the only information carriers between brain regions (Adrian and Zotterman, 1926).
  • This hypothesis encourages the development of models whose goal is to predict the probability of occurrence of a spike at a given time, without necessarily considering the biophysical spike generation mechanisms.
  • In the sequel we will identify spike trains with point process / counting process realizations.

A tough case in the “stationary regime”

figs/exemple-de-compteurs-en-regime-spont-complique-1.png

The expected counting process of a homogeneous Poisson process—with the same mean frequency—is shown in red.

A tough case (2)

A renewal process is inadequate here: the ranks of successive inter-spike intervals are correlated.

figs/exemple-de-compteurs-en-regime-spont-complique-2.png

A tough case (2’)

We split the ranks into 5 categories of equal size.

figs/exemple-de-compteurs-en-regime-spont-complique-2prime.png

A tough case (3)

ECDF of the rank of interval k+1 conditioned on the rank class of interval k. 95% confidence bands would have a width of 0.14 here.

figs/exemple-compteur-complique-correllation-2.png

First Consequence

Even if we focus on “isolated” neurons in the stationary / homogeneous regime, renewal processes won’t be adequate in general as models of our observed spike trains.

Interactions between neurons

figs/exemple-interaction.png

Neuron 3’s spike times relative to neuron 2’s (reference neuron) spike times.

\alert{Consequence}: our model should be able to handle interactions.

Non-stationary regime: odor responses

figs/exemple-exemple-citronellal.png

20 stimulations with citronellal. Stimulations are delivered during 500 ms (gray background). Is neuron 2 responding to the stimulation? Cockroach (Periplaneta americana) recordings and spike sorting by Antoine Chaffiol.

Non-stationary regime: odor responses (2)

figs/exemple-n1-odeurs.png

Neuron 1: 20 stimulations with citronellal, terpineol and a mixture of the two. \alert{Are the responses any different?}

Model requirements

Our model should give room for:

  • The elapsed time since the last spike of the neuron (enough for homogeneous renewal processes).
  • Variables related to the discharge history—like the duration of the last inter-spike interval.
  • The elapsed time since the last spike of a “functionally coupled” neuron.
  • The elapsed time since the beginning of an applied stimulation.

What do we want?

  • We want to estimate the peri-stimulus time histogram (PSTH) considered as an observation from an inhomogeneous Poisson process.
  • In addition to estimation we want to:
    • Test if a neuron is responding to a given stimulation.
    • Test if the responses of a given neuron to two different stimulations are different.
  • This implies building some sort of confidence bands around our best estimation.

First tool

The PSTH

figs/exemple-construction-estimateur-freq-moyenne.png

We go from the raw data to a histogram built with a tiny bin width (25 ms), leading to an estimator with little bias and large variance.

The PSTH (2)

  • We model this “averaged process” as an inhomogeneous Poisson process with intensity λ(t).
  • The histogram we just built can then be seen as the observation of a collection of Poisson random variables, $\{Y_1,\ldots,Y_k\}$, with parameters: $$n \, ∫_{t_i-δ/2}^{t_i+δ/2} λ(u) \, du \; ≈ \; n \, λ(t_i) \, δ \; , \quad i = 1,\ldots,k \; ,$$ where $t_i$ is the center of a class (bin), $δ$ is the bin width, $n$ is the number of stimulations and $k$ is the number of bins.
  • A piecewise constant estimator of λ(t) is then obtained with:$$\hat{λ}(t) = y_i/(n δ)\, , \quad \textrm{if} \quad t ∈ [t_i-δ/2,t_i+δ/2) \; .$$ This is the “classical” PSTH.
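As a minimal sketch of this construction (Python; the pooled spike-time array and the time range are hypothetical names, not objects defined in this file):

import numpy as np

def classical_psth(pooled_spike_times, n_stim, t_min, t_max, delta=0.025):
    """Classical PSTH: bin the spike counts pooled over the n_stim trials
    and divide by n_stim * delta to get a piecewise constant rate estimate."""
    edges = np.arange(t_min, t_max + delta, delta)
    counts, _ = np.histogram(pooled_spike_times, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / (n_stim * delta)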

The PSTH (3)

  • We are going to assume that λ(t) is \alert{smooth}—this is a very reasonable assumption given what we know about the insect olfactory system.
  • We can then attempt to improve on the “classical” PSTH by trading a little bias increase for a (hopefully large) variance decrease.
  • Many nonparametric methods are available to do that: kernel regression, local polynomials, smoothing splines, wavelets, etc.
  • A problem in the case of the PSTH is that the observed counts ($\{y_1,\ldots,y_k\}$) follow Poisson distributions with different parameters implying that they have different variances.
  • We have then at least two possibilities: i) use a generalized linear model (GLM); ii) transform the data to stabilize the variance.
  • We are going to use the second approach.

Variance stabilisation

  • Following Brown, Cai and Zhou (2010), let’s consider $X_1,\ldots,X_n$ IID from a Poisson distribution with parameter $ν$.
  • Define $X = ∑_{j=1}^n X_j$; the CLT gives us: $$\sqrt{n}\left(X/n-ν\right) \stackrel{L}{→} \mathcal{N}(0,ν) \quad \textrm{as} \; n → ∞ \, .$$
  • A variance stabilizing transformation is a function $G : \mathbb{R} → \mathbb{R}$, such that:$$ G’(x) = 1/\sqrt{x}\, .$$
  • The delta method (or the error propagation method; a first order Taylor expansion) then yields:$$\sqrt{n}\left(G(X/n)-G(ν)\right) \stackrel{L}{→} \mathcal{N}(0,1)\, . $$

Variance stabilisation (2)

  • It is known (Anscombe, 1948) that the variance stabilizing properties can be further improved by using transformations of the form: $$H_n(X) = G\left(\frac{X+a}{n+b}\right)$$ for suitable choices of $a$ and $b$.
  • In nonparametric regression we want to set $a$ and $b$ such that $\mathrm{E}\left(H_n(X)\right)$ optimally matches $G(ν)$.
  • Brown, Cai and Zhou (2010) show that in all relevant PSTH estimation problems we have: $$\mathrm{Var}\left(2 \sqrt{(X+1/4)/n}\right) = \frac{1}{n} + O(n^{-2}) \, .$$
  • They also show that: $$\mathrm{E}\left(2 \sqrt{(X+1/4)/n}\right) - 2 \sqrt{ν} = O(n^{-2}) \, .$$
  • They get similar transformations for binomial and negative binomial random variables.
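A short simulation (a sketch with arbitrary values of $ν$ and the $n = 20$ of the odor data) makes the stabilization concrete: after the transform the variance is close to $1/n$ whatever $ν$ is:

import numpy as np

rng = np.random.default_rng(0)
n = 20                                     # number of stimulations
for nu in (0.5, 2.0, 10.0):                # arbitrary Poisson parameters
    X = rng.poisson(n * nu, size=200_000)  # X = sum of n IID Poisson(nu) draws
    Z = 2 * np.sqrt((X + 0.25) / n)        # the Brown-Cai-Zhou transform
    print(nu, Z.var(), 1 / n)              # Z.var() should be close to 1/n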

Example

figs/make-n1citron-histos-figure.png

Nonparametric estimation

  • Since our knowledge of the biophysics of these neurons and of the network they form is still in its infancy, we can hardly propose a reasonable parametric form for our PSTHs (or their variance stabilized versions).
  • We therefore model our stabilized PSTH by: $$Z_i \doteq 2 \sqrt{(Y_i+1/4)/n} = r(t_i) + ε_i σ \, ,$$ where the $ε_i \stackrel{\textrm{IID}}{∼} \mathcal{N}(0,1)$, $r$ is assumed “smooth” and is estimated with a linear smoother (kernel regression, local polynomials, smoothing splines) or with wavelets (or with any nonparametric method you like).

Nonparametric estimation (2)

  • Following Larry Wasserman (All of Nonparametric Statistics, 2006) we define a linear smoother by a collection of functions $l(t) = \left(l_1(t),\ldots,l_k(t)\right)^T$ such that: $$\hat{r}(t) = ∑_{i=1}^k l_i(t) Z_i\, . $$
  • The simplest smoother we are going to use is built from the tricube kernel: $$K(t) = \frac{70}{81}\left(1 - \left|t\right|^3\right)^3 I(t) \, ,$$ where $I(t)$ is the indicator function of $[-1,1]$.
  • The functions $l_i$ are then defined by: $$l_i(t) = \frac{K\left(\frac{t-t_i}{h}\right)}{∑_{j=1}^k K\left(\frac{t-t_j}{h}\right)}\, .$$
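A sketch of this smoother (the function and variable names are mine; z_obs stands for the variance stabilized PSTH values at the bin centers t_obs):

import numpy as np

def tricube(u):
    # K(u) = 70/81 (1 - |u|^3)^3 on [-1, 1], 0 elsewhere
    return np.where(np.abs(u) <= 1, (70 / 81) * (1 - np.abs(u) ** 3) ** 3, 0.0)

def nw_fit(t_grid, t_obs, z_obs, h):
    """Nadaraya-Watson estimate r_hat(t) = sum_i l_i(t) z_i with bandwidth h;
    h must be large enough for every t in t_grid to have neighbors within h.
    Returns the fitted values on t_grid and the matrix whose rows are l(t)^T."""
    W = tricube((t_grid[:, None] - t_obs[None, :]) / h)  # kernel weights
    L = W / W.sum(axis=1, keepdims=True)                 # normalize each row
    return L @ z_obs, L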

Nonparametric estimation (3)

  • When using this kind of approach the choice of the bandwidth $h$ is clearly critical.
  • Since after variance stabilization the variance is known we can set our bandwidth by minimizing Mallows’ $C_p$ criterion instead of using cross-validation. For (soft) wavelet thresholding we use the universal threshold that requires the knowledge (or an estimation) of the variance.
  • More explicitly, with linear smoothers our estimations $\left(\widehat{r}(t_1),\ldots,\widehat{r}(t_k)\right)^T$ can be written in matrix form as: $$\widehat{\mathbf{r}} = L(h) \, \mathbf{Z} \, ,$$ where $L(h)$ is the $k × k$ matrix whose element $(i,j)$ is given by $l_j(t_i)$.

Nonparametric estimation (4)

  • Ideally we would like to set $\widehat{h}$ as: $$\arg\min_h \, (1/k) ∑_{i=1}^k \left(r(t_i) - \hat{r}(t_i)\right)^2 \, .$$
  • But we don’t know $r$ (that’s what we want to estimate!) so we minimize Mallows’ $C_p$ criterion: $$ (1/k) ∑_{i=1}^k \left(Z_i - \hat{r}(t_i)\right)^2 + 2 σ^2 \mathrm{tr}\left(L(h)\right)/k \, ,$$ where $\mathrm{tr}\left(L(h)\right)$ stands for the trace of $L(h)$.
  • If we don’t know $σ^2$, we minimize the cross-validation criterion: $$\frac{1}{k} ∑_{i=1}^k \left(\frac{Z_i - \hat{r}(t_i)}{1-L_{ii}(h)}\right)^2 \, .$$
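With the nw_fit sketch above, bandwidth selection by Mallows’ $C_p$ takes a few lines (here $σ^2 = 1/n$ is known from the variance stabilization):

import numpy as np

def cp_score(t_obs, z_obs, h, sigma2):
    # Mallows' Cp: mean squared residual + 2 sigma^2 tr(L(h)) / k
    r_hat, L = nw_fit(t_obs, t_obs, z_obs, h)
    k = len(z_obs)
    return np.mean((z_obs - r_hat) ** 2) + 2 * sigma2 * np.trace(L) / k

def best_bandwidth(t_obs, z_obs, h_grid, sigma2):
    # return the bandwidth of h_grid minimizing the Cp criterion
    scores = [cp_score(t_obs, z_obs, h, sigma2) for h in h_grid]
    return h_grid[int(np.argmin(scores))]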

Nonparametric estimation (5)

figs/n1citron-Nadaraya-Watson-estimator.png

Left: CV score in black, Cp score in red. Right: Variance stabilized data (black) with Nadaraya-Watson estimator (red) with “best” bandwidth.

Nonparametric estimation (6)

figs/n1citron-Nadaraya-Watson-residual.png

Residuals obtained with the Nadaraya-Watson estimator. The red dashed lines correspond to $± σ$.

Nonparametric estimation (7)

figs/n1citron-all-estimators.png

Nadaraya-Watson estimator (red), smoothing splines estimator (blue) and wavelet estimator (black; Haar wavelets, soft thresholding, universal threshold).

Confidence sets

Confidence sets

  • Keeping in line with Wasserman (2006), we consider that providing an estimate $\hat{r}$ of a curve $r$ is not sufficient for drawing scientific conclusions.
  • We would like to provide a \alert{confidence set} for $r$ in the form of a band: $$\mathcal{B}=\left\{s : l(t) ≤ s(t) ≤ u(t), \; ∀ t ∈ [a,b]\right\}\, $$ based on a pair of functions $\left(l(t),u(t)\right)$.
  • We would like to have: $$\mathrm{Pr}\left\{r ∈ \mathcal{B} \right\} ≥ 1 - α $$ for all $r ∈ \mathcal{R}$ where $\mathcal{R}$ is a large class of functions.

Confidence sets (2)

  • When working with smoothers, our estimators exhibit a bias that does not disappear even with large sample sizes.
  • We will therefore try to build sets around $\overline{r} = \mathrm{E}(\hat{r})$; that will be sufficient to address some of the questions we started with.
  • For a linear smoother, $\hat{r}(t) = ∑_{i=1}^k l_i(t) Z_i$, we have: $$\overline{r}(t) = \mathrm{E}\left(\hat{r}(t)\right) = ∑_{i=1}^k l_i(t) r(t_i)$$ and $$\mathrm{Var}\left(\hat{r}(t)\right) = σ^2 \, ∑_{i=1}^k l_i(t)^2 = (1/n) \|l(t)\|^2\, .$$ Remember that we stabilized the variance at $1/n$.
  • We will consider a confidence band for $\overline{r}(t)$ of the form: $$I(t) = \left(\hat{r}(t) - c \|l(t)\|/\sqrt{n},\hat{r}(t) + c \|l(t)\|/\sqrt{n}\right) \, ,$$ for some $c > 0$ and $a ≤ t ≤ b$.

Confidence set (3)

Following Sun and Loader (1994), we have: $$\begin{array}{lll} \mathrm{Pr}\left\{\overline{r}(t) ∉ I(t) \textrm{ for some } t ∈ [a,b]\right\} & = & \mathrm{Pr}\left\{\max_{t ∈ [a,b]} \frac{|\hat{r}(t)-\overline{r}(t)|}{\|l(t)\|/\sqrt{n}} > c\right\} \, ,\\ & = & \mathrm{Pr}\left\{\max_{t ∈ [a,b]} \frac{|∑_{i=1}^k (ε_i/\sqrt{n}) l_i(t)|}{\|l(t)\|/\sqrt{n}} > c\right\} \, ,\\ & = & \mathrm{Pr}\left\{\max_{t ∈ [a,b]} |W(t)| > c\right\} \, ,\end{array}$$ where $W(t) = ∑_{i=1}^k ε_i l_i(t)/\|l(t)\|$ is a Gaussian process. To find $c$ we need to know the distribution of the maximum of a Gaussian process. Sun and Loader (1994) showed the tube formula: $$\mathrm{Pr}\left\{\max_{t ∈ [a,b]} \left|∑_{i=1}^k ε_i l_i(t)/\|l(t)\|\right| > c\right\} ≈ 2\left(1 - Φ(c)\right) + \frac{κ_0}{π} \exp\left(-\frac{c^2}{2}\right) \, ,$$ for large $c$, where, in our case, $κ_0 ≈ \frac{b-a}{h} \left(∫_a^b K'(t)^2 \, dt\right)^{1/2}$. We get $c$ by solving: $$2\left(1 - Φ(c)\right) + \frac{κ_0}{π} \exp\left(-\frac{c^2}{2}\right) = α \, .$$
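Solving the last equation for $c$ is a one-dimensional root finding problem; a sketch with scipy, where $κ_0$ is evaluated numerically over the support of the tricube kernel (the values of $b-a$ and $h$ are illustrative only):

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def tube_c(kappa0, alpha=0.05):
    # solve 2 (1 - Phi(c)) + (kappa0 / pi) exp(-c^2 / 2) = alpha for c
    f = lambda c: 2 * (1 - norm.cdf(c)) + (kappa0 / np.pi) * np.exp(-c**2 / 2) - alpha
    return brentq(f, 1.0, 10.0)

# kappa0 ~ (b - a)/h * sqrt(int K'(t)^2 dt), with K' computed numerically
u = np.linspace(-1, 1, 100_001)
K = (70 / 81) * (1 - np.abs(u) ** 3) ** 3
kappa0 = (8.0 / 0.5) * np.sqrt(np.trapz(np.gradient(K, u) ** 2, u))  # b-a = 8 s, h = 0.5 s
print(tube_c(kappa0, alpha=0.05))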

Confidence set (4)

figs/n1citron-Nadaraya-Watson-Confidence-Bands.png

Variance stabilized data (black), Nadaraya-Watson estimator (blue) and 0.95 confidence band (red).

Do you remember this slide?

figs/exemple-exemple-citronellal.png

20 stimulations with citronellal. Stimulations are delivered during 500 ms (gray background). Is neuron 2 responding to the stimulation?

Confidence set (5)

figs/n2citron-figure.png

Since the null hypothesis is a constant, there is no bias and we can increase the bandwidth (right side) if necessary.

Second Tool

Remember again?

figs/exemple-n1-odeurs.png

Neuron 1: 20 stimulations with citronellal, terpineol and a mixture of the two. \alert{Are the responses any different?}

Setting the test

  • We start as previously by building a “classical” PSTH with very fine bins (25 ms) with the citronellal and terpineol trials to get: $\{y_1^{citron},\ldots,y_k^{citron}\}$ and $\{y_1^{terpi},\ldots,y_k^{terpi}\}$.
  • We stabilize the variance as we did before ($z_i = 2 \sqrt{(y_i+0.25)/n}$) to get: $\{z_1^{citron},\ldots,z_k^{citron}\}$ and $\{z_1^{terpi},\ldots,z_k^{terpi}\}$.
  • Our null hypothesis is that the two underlying inhomogeneous Poisson processes are the same, therefore: $$z_i^{citron} = r(t_i) + ε_i^{citron} σ \quad \textrm{and} \quad z_i^{terpi} = r(t_i) + ε_i^{terpi} σ \, ,$$ then $$z_i^{terpi} - z_i^{citron} = \sqrt{2} ε_i σ \, .$$
  • We then want to test if our collection of observed differences $\{z_1^{terpi} - z_1^{citron},\ldots,z_k^{terpi} - z_k^{citron}\}$ is compatible with $k$ IID draws from $\mathcal{N}(0,2σ^2)$.

Invariance principle / Donsker theorem

Theorem

If $X_1, X_2,\ldots$ is a sequence of IID random variables such that $\mathrm{E}(X_i)=0$ and $\mathrm{E}(X_i^2)=1$, then the sequence of processes: $$ S_k(t) = \frac{1}{\sqrt{k}} ∑_{i=0}^{⌊ k t ⌋} X_i, \quad 0 ≤ t ≤ 1, \quad X_0=0$$ converges in law towards a canonical Brownian motion.

Proof

You can find a proof in:
  • R Durrett (2009) Probability: Theory and Examples. CUP. Sec. 7.6, pp 323-329;
  • P Billingsley (1999) Convergence of Probability Measures. Wiley. p 121.

Recognizing a Brownian motion when we see one

  • Under our null hypothesis (same inhomogeneous Poisson process for citronellal and terpineol), the random variables: $$\frac{Z_i^{terpi} - Z_i^{citron}}{\sqrt{2} σ} \, ,$$ should correspond to the $X_i$ of Donsker’s theorem.
  • We can then construct $S_k(t)$ and check if the observed trajectory looks Brownian or not.
  • Ideally, we would like to define a domain in $[0,1] × \mathbb{R}$ containing the realizations of a canonical Brownian motion with a given probability.
  • To have a reasonable power, we would like the area of this domain to be minimal.
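A sketch of the construction ($σ = 1/\sqrt{n}$ is known here from the variance stabilization):

import numpy as np

def donsker_path(z_a, z_b, sigma):
    """Partial sum process S_k(t) of the standardized differences.
    Under H0 the x_i are ~IID N(0, 1) and the returned path should look
    like a canonical Brownian motion on [0, 1]."""
    x = (z_b - z_a) / (np.sqrt(2) * sigma)   # standardized differences
    k = len(x)
    t = np.arange(1, k + 1) / k              # t = 1/k, 2/k, ..., 1
    return t, np.cumsum(x) / np.sqrt(k)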

Recognizing a Brownian motion when we see one (2)

figs/n1-citron-terpi-comp0.png

Does this look like the realization of a canonical Brownian motion?

Recognizing a Brownian motion when we see one (3)

  • In a (non trivial) paper, Kendall, Marin and Robert (2007) showed that the upper boundary of this minimal area domain is given by: $$u(t) ≡ \sqrt{-W_{-1}\left(-(κ t)^2\right)} \, \sqrt{t}, \quad \mathrm{for} \quad κ \, t ≤ 1/\sqrt{e}\,,$$ where $W_{-1}$ is the secondary real branch of the Lambert W function (defined as the solution of $W(z) \exp W(z) = z$); $κ$ being adjusted to get the desired probability.
  • They also showed that a domain whose upper boundary is given by $u(t) = a + b \sqrt{t}$ is almost of minimal area ($a > 0$ and $b > 0$ being adjusted to get the correct probability).
  • Loader and Deely (1987) give a very efficient algorithm to adjust $a$ and $b$ or $κ$.
  • The R package STAR (Spike Train Analysis with R) provides all that (and much more) out of the box.
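Without STAR at hand, both boundaries are easy to evaluate with scipy (a sketch; $κ$, $a$ and $b$ still have to be calibrated, e.g. with the Loader and Deely algorithm, to reach the target probability):

import numpy as np
from scipy.special import lambertw

def kmr_boundary(t, kappa):
    # Kendall-Marin-Robert boundary sqrt(-W_{-1}(-(kappa t)^2)) * sqrt(t),
    # defined for 0 < kappa * t <= 1/sqrt(e); k=-1 selects the secondary
    # real branch of the Lambert W function
    w = lambertw(-(kappa * np.asarray(t)) ** 2, k=-1).real
    return np.sqrt(-w * t)

def almost_minimal_boundary(t, a, b):
    # almost minimal area boundary a + b * sqrt(t)
    return a + b * np.sqrt(t)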

Recognizing a Brownian motion when we see one (4)

figs/n1-citron-terpi-comp.png

Almost minimal area domains with probabilities 0.95 (dashed red) and 0.99 (red) of containing a canonical Brownian motion trajectory. Black: terpineol - citronellal; blue: odd terpineol trials - even terpineol trials.

Alternative no-response test

import numpy as np              # assumed already imported earlier in the file
import matplotlib.pyplot as plt

## Get the part of n2citron preceding the stimulation:
## the number of bins before stimulation onset (at 6 s) tells us how many
## bins to keep below
print(np.sum(n2citron_x <= 6))
n2citron_y_b = n2citron_y[:500]
## Get the part of the SAME length coming just after
n2citron_y_r = n2citron_y[500:1000]
## Get the normalized partial sums of the difference process:
## each stabilized difference has variance 2/n = 1/10 (n = 20 trials),
## hence the sqrt(10)/sqrt(500) normalization (500 = number of bins)
n2citron_y_d = np.cumsum(n2citron_y_r - n2citron_y_b) * np.sqrt(10 / 500)
## Do the plot for the test; c95 and c99 are the almost minimal boundary
## functions at the 0.95 and 0.99 levels
yy = np.linspace(0, 1, 500)
plt.plot(yy, n2citron_y_d)
plt.plot(yy, c95(yy), color='red', lw=2, linestyle='dashed')
plt.plot(yy, -c95(yy), color='red', lw=2, linestyle='dashed')
plt.plot(yy, c99(yy), color='red', lw=2)
plt.plot(yy, -c99(yy), color='red', lw=2)

Thank you!

I want to thank:
  • Rune, Susanne, Henrik and Massimiliano for inviting me to this wonderful workshop.
  • The SynchNeuro ANR project for paying for my flight.
  • Antoine Chaffiol for the data.
  • Chong Gu for his R package gss.
  • My colleagues Yves Rozenholc and Avner Bar-Hen for discussions.
  • Vilmos Prokaj, Olivier Faugeras and Jonathan Touboul for pointing me to Donsker’s theorem.
  • You, for listening.

Confidence set (6)

figs/n1-citron-terpi-mix-comp.png

Confidence bands computed on [6,14]. 0.78 was chosen because $(1-0.78)^2 ≈ 0.05$. We fail to establish a difference here.

Conditional intensity

Filtration, history and conditional intensity

  • Probabilists working on processes use the filtration or history: a family of increasing sigma algebras, $\left(\mathcal{F}_t\right)_{0\leq t \leq ∞}$, such that all the information related to the process at time $t$ can be represented by an element of $\mathcal{F}_t$.
  • The conditional intensity of a counting process $N(t)$ is then defined by: $$ λ(t \mid \mathcal{F}_t) ≡ \lim_{h ↓ 0} \frac{\mathrm{Prob}\{N(t+h)-N(t)=1 \mid \mathcal{F}_t\}}{h} \; .$$
  • $λ$ constitutes an exhaustive description of the process / spike train.

Two problems

As soon as we adopt a conditional intensity based formalism, we must:

  • Find an estimator $\hat{λ}$ of $λ$.
  • Find goodness of fit tests.

Time transformation

What to do with $λ$: A summary

We start by associating to $λ$ the integrated intensity: $$ Λ(t) = ∫_0^t λ(u \mid \mathcal{F}_u) \, du \, ;$$ it is then easy—but a bit too long for such a brief talk—to show that:

  • If our model is correct ($\hat{λ} ≈ λ$), the intervals between successive spikes after the time transformation: $$\{t_1,\ldots,t_n\} → \{Λ(t_1) = Λ_1,\ldots,Λ(t_n) = Λ_n\}$$ are IID exponential with parameter 1.
  • Stated differently, the point process $\{Λ_1,\ldots,Λ_n\}$ is the realization of a homogeneous Poisson process with rate 1.
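The result is easy to check by simulation; a sketch with a hypothetical sinusoidal intensity, spikes simulated by thinning and $Λ$ computed by trapezoidal integration:

import numpy as np

rng = np.random.default_rng(1)
t_max, lam_max = 200.0, 9.0
lam = lambda t: 5 + 4 * np.sin(t)        # hypothetical intensity, <= lam_max

# simulate the inhomogeneous Poisson process by thinning
spikes, t = [], 0.0
while True:
    t += rng.exponential(1 / lam_max)
    if t >= t_max:
        break
    if rng.uniform() < lam(t) / lam_max:
        spikes.append(t)
spikes = np.array(spikes)

# Lambda(t) = int_0^t lam(u) du, tabulated on a fine grid
grid = np.linspace(0, t_max, 200_001)
lam_grid = lam(grid)
Lam = np.concatenate(([0.0],
                      np.cumsum(0.5 * (lam_grid[1:] + lam_grid[:-1]) * np.diff(grid))))
Lam_spikes = np.interp(spikes, grid, Lam)

# after transformation the intervals should be ~IID Exponential(1)
print(np.mean(np.diff(Lam_spikes)), np.var(np.diff(Lam_spikes)))  # both ~ 1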

The next slides illustrate this result.

Time transformation illustration

figs/illustration-transformation-du-temps-1.png

Time transformation illustration (2)

figs/illustration-transformation-du-temps-2.png

Time transformation illustration (3)

figs/illustration-transformation-du-temps-3.png

Ogata’s tests

  • If, for a good model, the transformed sequence of spike times, $\{\hat{Λ}_1,\ldots,\hat{Λ}_n\}$, is the realization of a homogeneous Poisson process with rate 1, we should test $\{\hat{Λ}_1,\ldots,\hat{Λ}_n\}$ against such a process.
  • This is what Yosihiko Ogata proposed in 1988 (Statistical models for earthquake occurrences and residual analysis for point processes, Journal of the American Statistical Association, 83: 9-27).
  • But an observation nevertheless suggests that another type of test could also be used…

A Brownian motion?

figs/mouvement-brownien-1.png

A test based on Donsker’s theorem

Donsker’s theorem and minimal area region

  • The intuition of the convergence—of a properly normalized version—of the process $N(Λ) - Λ$ towards a Brownian motion is correct.
  • This is an easy consequence of Donsker’s theorem, as Vilmos Prokaj explained to me on the R mailing list and as Olivier Faugeras and Jonathan Touboul explained to me directly.
  • It is moreover possible to find regions of minimal area having a given probability of containing the whole trajectory of a canonical Brownian motion (Kendall, Marin and Robert, 2007; Loader and Deely, 1987).
  • We get thereby a new goodness of fit test.

Minimal area region at 95%

figs/region-de-prediction-de-Lambert.png