vignettes/pointprocess.Rmd

---
title: "Log-Gaussian Cox processes on metric graphs"
author: "David Bolin, Alexandre B. Simas"
date: "Created: 2023-01-30. Last modified: `r Sys.Date()`."
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Log-Gaussian Cox processes on metric graphs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
references:
- id: lgcp
  title: "Going off grid: computationally efficient inference for log-Gaussian Cox processes"
  author:
  - family: Simpson
    given: D
  - family: Illian
    given: J. B.
  - family: Lindgren
    given: F.
  - family: Sørbye
    given: S. H.
  - family: Rue
    given: H.
  container-title: Biometrika
  type: article
  issued:
  year: 2016
- id: xiong22
  title: "Covariance-based rational approximations of fractional SPDEs for computationally efficient Bayesian inference"
  author:
  - family: Xiong
    given: Zhen
  - family: Simas
    given: Alexandre B.
  - family: Bolin
    given: David
  container-title: Bernoulli
  type: article
  issue: 30
  pages: 1611-1639
  issued:
    year: 2024
- id: graph_fem
  title: "Regularity and numerical approximation of fractional elliptic differential equations on compact metric graphs"
  author:
  - family: Bolin
    given: David
  - family: Kovács
    given: Mihály
  - family: Kumar
    given: Vivek
  - family: Simas
    given: Alexandre B.
  container-title: Mathematics of Computation
  type: article
  issued:
  year: 2023
- id: BSW2023PPST
  title: "Log-Cox Gaussian processes and space-time models on compact metric graphs"
  author:
  - family: Bolin
    given: David
  - family: Simas
    given: Alexandre B.
  - family: Wallin
    given: Jonas
  container-title: In preparation
  type: preprint
  issued:
    year: 2023
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
set.seed(1)
```

## Introduction
        
In this vignette we will introduce how to work with log-Gaussian Cox processes
based on Whittle--Matérn fields on metric graphs. To simplify the integration
with `R-INLA` and `inlabru` hese models are constructed using finite element 
approximations as implemented in the `rSPDE` package.  The theoretical details will be given in the forthcoming article [@BSW2023PPST].
   
## Constructing the graph and the mesh
We begin by loading the `rSPDE`, `MetricGraph` and `INLA` packages:

```{r, message=FALSE, warning=FALSE}
library(rSPDE)
library(MetricGraph)
library(INLA)
```

As an example, we consider the default graph in the package:

```{r, fig.show='hold',fig.align = "center",echo=TRUE, warning=FALSE, message=FALSE}
graph <- metric_graph$new(tolerance = list(vertex_vertex = 1e-1, vertex_edge = 1e-3, edge_edge = 1e-3),
                          remove_deg2 = TRUE)
graph$plot()
```

To construct a FEM approximation of a Whittle--Matérn field,
we must first construct a mesh on the graph.
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
graph$build_mesh(h = 0.1)
graph$plot(mesh=TRUE)
```

The next step is to build the mass and stiffness matrices for the FEM basis.
```{r}
  graph$compute_fem()
```

We are now ready to specify the and sample from a log-Gaussian Cox process model
with intensity $\lambda = \exp(\beta + u)$ where $\beta$ is an intercept and 
$u$ is a Gaussian Whittle--Matérn field specified by
$$
(\kappa^2 - \Delta)^{\alpha/2} \tau u = \mathcal{W}.
$$
For this we can use the function `graph_lgcp` as follows:
```{r}
  sigma <- 0.5
  range <- 1.5
  alpha <- 2
  lgcp_sample <- graph_lgcp(intercept = 1, sigma = sigma,
                            range = range, alpha = alpha,
                            graph = graph)
```
The object returned by the function is a list with the simulated Gaussian process and the points on the graph. We can plot the simulated intensity function as
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
graph$plot_function(X = exp(lgcp_sample$u), vertex_size = 0)
```

To plot the simulated points, we can add them to the graph and then plot:
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
graph$add_observations(data = data.frame(y=rep(1,length(lgcp_sample$edge_loc)),
                                         edge_number = lgcp_sample$edge_numbers,
                                         distance_on_edge = lgcp_sample$edge_loc),
                       normalized = TRUE)
graph$plot(vertex_size = 0, data = "y")
```


## Fitting LGCP models in R-INLA


We are now in a position to fit the model with our `R-INLA` implementation. 
When working with log-Gaussian Cox processes, the likelihood has a term 
$\int_\Gamma \exp(u(s)) ds$ that needs to be handled separately. 
This is done by using the mid-point rule as suggested for SPDE models by [@lgcp](https://academic.oup.com/biomet/article/103/1/49/2389990)
where we approximate 
$$
\int_\Gamma \exp(u(s)) ds \approx \sum_{i=1}^p \widetilde{a}_i \exp\left(u(\widetilde{s}_i)\right).
$$
Using the fact that $u(s) = \sum_{j=1}^n \varphi(s) u_i$ from the FEM approximation, 
we can write the integral as $\widetilde{\alpha}^T\exp(\widetilde{A}u)$ where
$\widetilde{A}_{ij} = \varphi_j(\widetilde{s}_i)$ and $\widetilde{a}$ is a vector
with integration weights. These quantities can be obtained as
```{r}
Atilde <- graph$fem_basis(graph$mesh$VtE)
atilde <- graph$mesh$weights
```
The weights are used as exposure terms in the Poisson likelihiood in R-INLA. Because 
of this, the easiest way to construct the model is to add the integration points 
as zero observations in the graph, with corresponding exposure weights. We also 
need to add the exposure terms (which are zero) for the actual observation locations:
```{r}
#clear the previous data in the graph
graph$clear_observations()

#Add the data together with the exposure terms
graph$add_observations(data = data.frame(y = rep(1,length(lgcp_sample$edge_loc)),
                                         e = rep(0,length(lgcp_sample$edge_loc)),
                                         edge_number = lgcp_sample$edge_number,
                                         distance_on_edge = lgcp_sample$edge_loc),
                       normalized = TRUE)

#Add integration points
graph$add_observations(data = data.frame(y = rep(0,length(atilde)),
                                         e = atilde,
                                         edge_number = graph$mesh$VtE[,1],
                                         distance_on_edge = graph$mesh$VtE[,2]),
                       normalized = TRUE)

```
We now create the `inla` model object with the
`graph_spde` function. For simplicity, we assume that $\alpha$ is known and fixed
to the true value in the model.  

```{r}
rspde_model <- rspde.metric_graph(graph, nu = alpha - 1/2)
```

Next, we compute the auxiliary data:

```{r}
data_rspde <- graph_data_rspde(rspde_model, name="field")
```

We now create the `inla.stack` object with the `inla.stack()` function. 
At this stage, it is important that the data has been added to the `graph` since
it is supplied to the stack by using the `graph_spde_data()` function. 

```{r}
stk <- inla.stack(data = data_rspde[["data"]], 
                  A = data_rspde[["basis"]], 
                  effects = c(data_rspde[["index"]], list(Intercept = 1)))
```

We can now fit the model using `R-INLA`:

```{r, message=FALSE, warning=FALSE}
spde_fit <- inla(y ~ -1 + Intercept + f(field, model = rspde_model), 
                 family = "poisson", data = inla.stack.data(stk),
                 control.predictor = list(A = inla.stack.A(stk), compute = TRUE),
                 E = inla.stack.data(stk)$e)
```

Let us extract the estimates in the original scale
by using the `spde_metric_graph_result()` function, then
taking a `summary()`:
      
```{r}
spde_result <- rspde.result(spde_fit, "field", rspde_model)

summary(spde_result)
```

We will now compare the means of the estimated values with
the true values:

```{r}
  result_df <- data.frame(
    parameter = c("std.dev", "range"),
    true = c(sigma, range),
    mean = c(
      spde_result$summary.std.dev$mean,
      spde_result$summary.range$mean
    ),
    mode = c(
      spde_result$summary.std.dev$mode,
      spde_result$summary.range$mode
    )
  )
  print(result_df)
```


We can also plot the posterior marginal densities with the 
help of the `gg_df()` function:
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
  posterior_df_fit <- gg_df(spde_result)

  library(ggplot2)

  ggplot(posterior_df_fit) + geom_line(aes(x = x, y = y)) + 
  facet_wrap(~parameter, scales = "free") + labs(y = "Density")
```

Finally, we can plot the estimated field $u$:
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
n.obs <- length(graph$get_data()$y)
n.field <- dim(graph$mesh$VtE)[1]
u_posterior <- spde_fit$summary.linear.predictor$mean[(n.obs+1):(n.obs+n.field)]
graph$plot_function(X = u_posterior, vertex_size = 0)
```

This can be compared with the field that was used to generate the data:
```{r, fig.show='hold',fig.align = "center",echo=TRUE,  warning=FALSE}
graph$plot_function(X = lgcp_sample$u, vertex_size = 0)
```

# An example with replicates
Let us now test show an example with replicates. Let us first simulate replicates of a latent field
```{r}
  n.rep <- 30
  sigma <- 0.5
  range <- 1.5
  alpha <- 2
  lgcp_sample <- graph_lgcp(n = n.rep, intercept = 1, sigma = sigma,
                            range = range, alpha = alpha,
                            graph = graph)
```
  We now clear the previous data and add the new data together with the exposure terms
```{r}
  graph$clear_observations()
  df_rep <- data.frame(y=rep(1,length(lgcp_sample[[1]]$edge_loc)),
                                             e = rep(0,length(lgcp_sample[[1]]$edge_loc)),
                                         edge_number = lgcp_sample[[1]]$edge_number,
                                         distance_on_edge = lgcp_sample[[1]]$edge_loc,
                                         rep = rep(1,length(lgcp_sample[[1]]$edge_loc)))

  df_rep <- rbind(df_rep, data.frame(y = rep(0,length(atilde)),
                                         e = atilde,
                                         edge_number = graph$mesh$VtE[,1],
                                         distance_on_edge = graph$mesh$VtE[,2],
                                         rep = rep(1,length(atilde))))
  for(i in 2:n.rep){
    df_rep <- rbind(df_rep, data.frame(y=rep(1,length(lgcp_sample[[i]]$edge_loc)),
                                             e = rep(0,length(lgcp_sample[[i]]$edge_loc)),
                                         edge_number = lgcp_sample[[i]]$edge_number,
                                         distance_on_edge = lgcp_sample[[i]]$edge_loc,
                                         rep = rep(i,length(lgcp_sample[[i]]$edge_loc))))
    df_rep <- rbind(df_rep, data.frame(y = rep(0,length(atilde)),
                                         e = atilde,
                                         edge_number = graph$mesh$VtE[,1],
                                         distance_on_edge = graph$mesh$VtE[,2],
                                         rep = rep(i,length(atilde))))                                        

  }

      graph$add_observations(data = df_rep,
                       normalized = TRUE,
                        group = "rep")

```
We can now define and fit the model as previously
```{r}
rspde_model <- rspde.metric_graph(graph, nu = alpha - 1/2)

data_rspde <- graph_data_rspde(rspde_model, name="field", repl = ".all")

stk <- inla.stack(data = data_rspde[["data"]], 
                  A = data_rspde[["basis"]], 
                  effects = c(data_rspde[["index"]], list(Intercept = 1)))

spde_fit <- inla(y ~ -1 + Intercept + f(field, model = rspde_model, replicate = field.repl), 
                 family = "poisson", data = inla.stack.data(stk),
                 control.predictor = list(A = inla.stack.A(stk), compute = TRUE),
                 E = inla.stack.data(stk)$e, verbose=TRUE)
```
Let's look at the summaries
```{r}
spde_result <- rspde.result(spde_fit, "field", rspde_model)
summary(spde_result)
result_df <- data.frame(
    parameter = c("std.dev", "range"),
    true = c(sigma, range),
    mean = c(
      spde_result$summary.std.dev$mean,
      spde_result$summary.range$mean
    ),
    mode = c(
      spde_result$summary.std.dev$mode,
      spde_result$summary.range$mode
    )
  )
  print(result_df)
```
## References