# Grid-based models

In (1), an explicitly grid-based self-exciting point process (SEPP) model is considered.  The paper is mostly concerned with two field trials of SEPP, or epidemic-type aftershock sequence (ETAS), prediction models.  It gives no real discussion of why a grid-based model was chosen over the earlier continuous-space model, which used kernel density estimation.

### References:

1. Mohler et al. "Randomized Controlled Field Trials of Predictive Policing". Journal of the American Statistical Association (2015) DOI:10.1080/01621459.2015.1077710

# The model

The area of interest is divided into a grid; 150 m squares were used in (1).  To quote from (1):

> The size of the grid cells on which μ is defined can be estimated by
maximum likelihood and in general the optimum size of the grid
cell will decrease with increasing data. However, for a fixed area
flagged for patrol, a greater number of small hotspots are more
difficult to patrol than a small number of large hotspots. The
150 × 150 m hotspots were chosen in this study to be the size
of a city block in Foothill and were then held constant across all
of the experimental regions.
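
Since the model treats each cell separately, the first practical step is to group event times by grid cell.  A minimal sketch, assuming events arrive as `(t, x, y)` tuples with coordinates in metres; the function name `bin_events` and the `origin` parameter are hypothetical and do not come from (1):

```python
import math
from collections import defaultdict

def bin_events(events, cell_size=150.0, origin=(0.0, 0.0)):
    """Group event times by square grid cell.

    events: iterable of (t, x, y), with x and y in metres (an assumption).
    Returns a dict mapping (column, row) cell indices to sorted event times.
    """
    cells = defaultdict(list)
    for t, x, y in events:
        key = (
            math.floor((x - origin[0]) / cell_size),
            math.floor((y - origin[1]) / cell_size),
        )
        cells[key].append(t)
    for times in cells.values():
        times.sort()
    return dict(cells)

# Made-up events, for illustration only.
example_cells = bin_events([(2.0, 10, 10), (1.0, 140, 20), (3.0, 160, 20)])
```

Each list of times can then be handled independently as the events $t^i_n$ of one cell.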

The model looks _at each grid cell individually_ and uses what we might recognise as a Hawkes process (in the original form studied by Hawkes, not the more general form).  In grid cell $n$, the conditional intensity $\lambda_n(t)$ is given by

$$ \lambda_n(t) = \mu_n + \sum_{t^i_n < t} \theta \omega e^{-\omega(t-t^i_n)} $$

where:

- $\mu_n$ is the background rate in cell $n$.
- $\omega$ and $\theta$ control the "near-repeat" behaviour.  Notice that these do not depend upon $n$.
- As always, we sum over events $t^i_n$ which have occurred before the time of interest $t$, and _which have occurred in the same grid cell_.

It seems surprising that inter-cell interactions are not considered at all in this model.
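
The conditional intensity for a single cell can be evaluated directly from the formula above.  This is a minimal sketch in plain Python; the function name and the parameter values are made up for illustration and are not taken from (1):

```python
import math

def conditional_intensity(t, event_times, mu, theta, omega):
    """Evaluate lambda_n(t) for one grid cell.

    event_times: times of events in this cell only.
    Only events strictly before t contribute to the sum.
    """
    triggered = sum(
        theta * omega * math.exp(-omega * (t - ti))
        for ti in event_times
        if ti < t
    )
    return mu + triggered

# Illustrative values only.
events = [1.0, 2.5, 4.0]
rate = conditional_intensity(5.0, events, mu=0.2, theta=0.5, omega=1.0)
```

Before the first event the intensity equals the background rate $\mu_n$; each event then adds a jump of size $\theta\omega$ that decays exponentially at rate $\omega$.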

# Model fitting

I do not quite understand the paper (1) here.  The following is what I think _should_ have been written.

We make initial estimates for $\mu_n$ (perhaps setting each $\mu_n$ to be the same) and $\omega, \theta$.  Then we alternate the following two steps until convergence:

### Expectation step:

Compute the upper triangular "probability matrix":

$$ p^{n}_{ji} \propto \theta\omega e^{-\omega(t^i_n - t^j_n)} \quad (j < i), \qquad p^n_{ii} \propto \mu_n $$

where $p^{n}_{ji}$ is the probability that, in cell $n$, event $j$ triggered event $i$, and $p^n_{ii}$ is the probability that event $i$ is a background event.  We set $p^n_{ji} = 0$ if $j > i$ and then normalise so that each column of the matrix sums to $1$ (which is why in the above formula I only give what $p^n_{ji}$ is proportional to).
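
The expectation step for one cell might be sketched as follows.  This follows my reading above rather than any code from (1); it assumes the event times are sorted in increasing order and that `mu > 0`, so that every column has a non-zero sum:

```python
import math

def e_step(times, mu, theta, omega):
    """Build the upper triangular probability matrix for one cell.

    p[j][i] (j < i) is the probability that event j triggered event i;
    p[i][i] is the probability that event i is a background event.
    Assumes `times` is sorted in increasing order and mu > 0.
    """
    k = len(times)
    p = [[0.0] * k for _ in range(k)]
    for i in range(k):
        p[i][i] = mu
        for j in range(i):
            p[j][i] = theta * omega * math.exp(-omega * (times[i] - times[j]))
        # Normalise column i so that it sums to 1.
        column_sum = sum(p[j][i] for j in range(i + 1))
        for j in range(i + 1):
            p[j][i] /= column_sum
    return p

# Illustrative values only.
p_example = e_step([0.0, 1.0, 3.0], mu=0.5, theta=0.5, omega=1.0)
```

The first event in a cell has nothing before it, so its column reduces to $p^n_{11} = 1$: it must be a background event.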

### Maximisation step:

We then compute the new parameters:

$$ \omega = \frac{\sum_n \sum_{i<j} p^n_{ij}}{\sum_n \sum_{i<j} p^n_{ij}(t_n^j - t_n^i)}, \qquad
\theta = \frac{\sum_n \sum_{i<j} p^n_{ij}}{\sum_n \sum_j 1}, \qquad \mu_n = \frac{\sum_i p^n_{ii}}{T} $$
where $T$ is the length of the time window of data under consideration, and $\sum_n \sum_j 1$ is simply the total number of events.

(Note here I have given an estimate for $\mu_n$, whereas (1) seems to sum over $n$ and give an estimate for $\mu$, the total rate across the study area, but does not specify a suitable estimate for $\mu_n$.)
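
The maximisation step, as I have written it above, could be sketched like this.  The function name and data layout are my own assumptions: `times_by_cell[n]` holds the sorted event times in cell $n$, and `p_by_cell[n]` holds the matching probability matrix from the expectation step, indexed so that `p[i][j]` with `i < j` is the probability that event `i` triggered event `j`:

```python
def m_step(times_by_cell, p_by_cell, T):
    """Update (omega, theta, mu) from the per-cell probability matrices.

    Returns omega, theta and a list mu with one background rate per cell.
    Assumes at least some probability mass is assigned to triggering,
    otherwise the estimate of omega is undefined (division by zero).
    """
    triggered = 0.0      # sum over n and i<j of p[i][j]
    weighted_gaps = 0.0  # sum over n and i<j of p[i][j] * (t_j - t_i)
    total_events = 0     # total number of events across all cells
    mu = []
    for times, p in zip(times_by_cell, p_by_cell):
        k = len(times)
        total_events += k
        for j in range(k):
            for i in range(j):
                triggered += p[i][j]
                weighted_gaps += p[i][j] * (times[j] - times[i])
        mu.append(sum(p[i][i] for i in range(k)) / T)
    omega = triggered / weighted_gaps
    theta = triggered / total_events
    return omega, theta, mu

# One cell with two events and a hand-written probability matrix,
# for illustration only.
omega_new, theta_new, mu_new = m_step(
    [[0.0, 1.0]], [[[1.0, 0.5], [0.0, 0.5]]], T=10.0
)
```

A useful sanity check is that, because each column of every matrix sums to $1$, the updated parameters satisfy $T\sum_n \mu_n + \theta N = N$, where $N$ is the total number of events.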