Bayesian hierarchical models for estimating spatial and temporal patterns in vegetation phenology from Landsat time series
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Data
R
Stan
.gitignore
README.Rmd
phenoBayes.Rproj

README.Rmd

# phenoBayes: Bayesian hierarchical models for estimating spatial and temporal patterns in vegetation phenology from Landsat time series

This repository describes and documents a set of models used to estimate spatial and temporal patterns in vegetation phenology from Landsat data.

**References**

Senf, C., Pflugmacher D., Heurich, M. and Krueger T. (2017) A Bayesian hierarchical model for estimating spatial and temporal variation in vegetation phenology from Landsat time series. *Remote Sensing of Environment*, 194, 155-160.

**How to cite the code**

If you use the code you might want to cite the above mentioned publication and the following DOI:

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.260099.svg)](https://doi.org/10.5281/zenodo.260099)

**License**

This work is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International][1] license.
[1]: https://creativecommons.org/licenses/by-sa/4.0/

## Introduction

Yet there is only one type of models implemented, the **base phenological model**. The base phenological model takes a sample of dense Landsat time series and estimates a set of five spring phenological parameters (season spectral minimum, season spectral magnitude, start of season, green-up rate, and summer green-down). Simultaneously, the model estimates the temporal (inter-annual) variation in the start of season parameter as random effect. That way, changes in spring phenology can be investigated over the time period 1985 to 2016 at any location in the world.

In order to run the models presented here, you need to first install `rstan`. Information on how to installing `rstan` on your machine can be found here: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started.

## Base phenological model

### Simulated data

Using the `simulate_data` function in the R folder we can simulate a set of time series resembling the properties of Landsat time series acquired over a braodleaved forest. There are several parameters that can be adjusted, yet we stick to the defaults and simulate 9 time series:

```{r}
source("R/simulate_data.R")

dat <- simulate_data(M = 9)

dat_spring <- subset(dat$data, dat$data$doy <= 200)
```

**Note:** In order for the simulation function to work properly, the `MASS` package needs to be installed!

The function returns a list containing 1) a data frame with the simulated data, 2) the parameters used for creating the simulated data, and 3) a plot of the simulated data. Following the simulation of the data, we subset the data to only include winter and spring observations. This step simply reduces the number of data points and thus the computational resources needed for sampling the posteriors. 

Using the simulated data we can set up the model via the `rstan` interface. The  base phenological model for the start of season is called *base_pheno_spring.stan* and is located in the Stan folder:

```{r}
# Load library

library(rstan)

# Set rstan settings

rstan_options(auto_write = TRUE)
options(mc.cores = 4)

# Define a centering vector for centering the mean of the five phenological parameters

mu_beta <- c(0, # Season minimum 
             0.5, # Season amplitude
             0.15, # Green-up rate
             120, # Start of season
             0.0005) # Summer green-down
mu_beta <- matrix(mu_beta, ncol = max(dat_spring$pixel), nrow = 5) # Translate into a matrix with dimension: 5*n_pixel

# Define a scaling vector for scaling the variances of the five phenological parameters

sigma_beta_scale <- c(0, # Season minimum
                      0.05, # Season amplitude
                      0.05, # Green-up rate
                      5, # Start of season
                      0.00001) # Summer green-down

# Gather the data into a list in order to pass it to Stan

datalist <- list(N = nrow(dat_spring),
                 NY = max(dat_spring$year),
                 NP = max(dat_spring$pixel),
                 doy = dat_spring$doy,
                 vi = dat_spring$vi,
                 year = dat_spring$year,
                 pixel = dat_spring$pixel,
                 mu_beta = mu_beta,
                 sigma_beta_scale = sigma_beta_scale)

# Pass the datalist and the model to Stam

fit <- stan(file = "Stan/pheno_base_spring.stan", data = datalist, iter = 2000, chains = 4)
```

First, we load the `rstan` library and tell the package to use four cores in parallel. Second, we define two vectors used for centering and scaling the five phenological parameters. This step is purely for numerical reasons and giving informative centering and scaling vectors significantly increases sampling speed. The vectors can be formulated based on prior knowledge or based on other data sources (i.e., climate data or phenological records). Here we set the vector according to the parameters used to simulate the time series. Third, we need to translate the data into a list in order to call it with `rstan` The data list is then given to the `stan` function together with the phenological model in the final line. The sampling is set to 2,000 interations (including 1,000 warm-up iterations) and four chains (run in parallel using four cores).

There are several checks you need to perform on the resulting `stanfit` object in order to trust your results (i.e., convergence of chains, mixing between chains, among others). We won't go into detail here, but we strongly recommend reading the `rstan` documentation: https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html. A relativel frequent convergence issue is called **divergent transitions**. Divergent trasitions indicate a serious problem that can either be erased by adapting the sampler settings (see: http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup) or by giving more informative centering and scaling vectors.

We can inspect the posteriors using the `parameter_summary` function in the R folder. In particular, we can inspect the temporal variation in the start of season by plotting its median and 95% credible interval over the years:

```{r}
source("R/parameter_summary.R")

phi_summary <- parameter_summary(fit, type = "temporal")

# Note that we make use of the ggplot2 package, which you need to load

library(ggplot2)

ggplot(phi_summary, aes(x = year, y = Q50)) +
  geom_point() +
  geom_errorbar(aes(ymin = Q025, ymax = Q975)) +
  labs(x = "Year", y = "Variation in SOS")
```

Similarly, we can compare the spatial distribution of phenological parameters among the nine pixels:

```{r}
beta_summary <- parameter_summary(fit, type = "spatial")

ggplot(beta_summary, aes(x = factor(pixel), y = Q50)) +
  geom_point() +
  geom_errorbar(aes(ymin = Q025, ymax = Q975)) +
  labs(x = "Pixel", y = "Estimate") +
  facet_wrap(~parameter, scales = "free")
```

### Real data

We repeat the above shown exercise with real Landsat data aquired over broadleaved forest stands in the Bavarian Forest National Park in southern Germany. The data set is located in the data folder and stored as RData object. The data totals 100 Enhanced Vegetation Index (EVI) time series and are identical to the data used in the above mentioned publication.

```{r}
load("data/landsat_broadleaved_bavarian_forest.RData")

dat_spring <- subset(dat, dat$doy <= 200)

mu_beta <- matrix(c(0, 0.3, 0.15, 120, 0), ncol = max(dat$pixel), nrow = 5)

sigma_beta_scale <- c(0.1, 0.1, 0.1, 10, 0.001)

datalist <- list(N = nrow(dat_spring),
                 NY = max(dat_spring$year_index),
                 NP = max(dat_spring$pixel),
                 doy = dat_spring$doy,
                 vi = dat_spring$evi,
                 year = dat_spring$year_index,
                 pixel = dat_spring$pixel,
                 mu_beta = mu_beta,
                 sigma_beta_scale = sigma_beta_scale)

fit <- stan(file = "Stan/pheno_base_spring.stan", data = datalist, iter = 2000, chains = 4)
```

We here also can inspect the temporal and spatial patterns of phenology:

```{r}
# Temporal

phi_summary <- parameter_summary(fit, type = "temporal")

ggplot(phi_summary, aes(x = year + 1984, y = Q50)) +
  geom_point() +
  geom_errorbar(aes(ymin = Q025, ymax = Q975)) +
  labs(x = "Year", y = "Variation in SOS")

# Spatial

beta_summary <- parameter_summary(fit, type = "spatial")

ggplot(beta_summary, aes(x = factor(pixel), y = Q50)) +
  geom_point() +
  geom_errorbar(aes(ymin = Q025, ymax = Q975)) +
  labs(x = "Pixel", y = "Estimate") +
  facet_wrap(~parameter, scales = "free")
```

## Future models

We will gradually add a variety of models:

- A base autumn model
- Spring/autumn models containing temporal predictors (i.e., climate)
- Spring/autumn models containing spatial predictors (i.e., topography)