### Seasonality in MERS-CoV introductions
In this notebook we take the output of a structured coalescent analysis and estimate introduction probabilities for each month of each year.

To run the code here you'll need the R packages **rstan** and **parallel**.

The first step is to load in the "raw" data and discretise it into a grid.

In [None]:
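## Auxiliary functions
# Drop NaN entries from a vector
rmNan <- function(x) x[!is.nan(x)]
# Convert one decimal date (e.g. 2012.500) into a month (1-12) and a year
getMonthYear <- function(x){
  strx <- sprintf("%.3f", x)  # decimal date rounded to three decimal places
  year <- as.numeric(strsplit(strx, "\\.")[[1]][1])
  fractional <- as.numeric(paste("0.", strsplit(strx, "\\.")[[1]][2], sep = ""))
  # Twelve equal-width bins over the year; the small negative offset keeps 0 in bin 1
  month <- as.numeric(cut(fractional, breaks = seq(-1e-6, 1, 1/12)))
  return(data.frame(month = month, year = year))
}
# Vectorised version, so a whole vector of dates can be converted at once
getMonthYear <- Vectorize(getMonthYear)
# For each (month, year) row of the grid, return 1 if discdates contains
# at least one introduction in that month and year, and 0 otherwise
markOccurences <- function(discdates, grid){
  apply(grid, 1, function(timepoint){
    occurrences <- subset(discdates,
                          year == as.numeric(timepoint[2]) & month == as.numeric(timepoint[1]))
    ifelse(nrow(occurrences) > 0, 1, 0)
  }
  )
}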
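
To illustrate what the conversion in `getMonthYear` does, here is a minimal standalone sketch. The helper name `decimalToMonthYear` is hypothetical and not part of the notebook. It uses `floor` instead of string splitting and skips the three-decimal rounding, so it may disagree with `getMonthYear` for dates that sit right on a month boundary.

```r
# Hypothetical helper (not part of the notebook): split decimal dates into
# year and month index, assuming twelve equal-length months per year.
decimalToMonthYear <- function(x) {
  year <- floor(x)
  fractional <- x - year
  # floor(fractional * 12) lies in 0..11; pmin is a guard against
  # floating-point edge cases pushing the month past 12
  month <- pmin(floor(fractional * 12) + 1, 12)
  data.frame(month = month, year = year)
}

example <- decimalToMonthYear(c(2012.000, 2012.500, 2013.999))
print(example)
```

The three toy dates fall in January 2012, July 2012 and December 2013 respectively.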

In [None]:
## Loading data
raw_dates <- read.table("../data/seasonality/MERS_274_sCoal.combined.intros", header = TRUE)
dates <- raw_dates[, -1]

In [None]:
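## Discretising
# One vector of non-NaN introduction dates per posterior tree (row).
# Note: apply() returns a list only when rows differ in their number of non-NaN dates.
NonNaNDates <- apply(dates, 1, function(x) rmNan(unlist(x)))
# mclapply relies on forking; on Windows use mc.cores = 1
DiscretisedDates <- parallel::mclapply(X = NonNaNDates, FUN = getMonthYear, mc.cores = 8)
# Turn each result into a data frame with one row per introduction
DiscretisedDates <- lapply(DiscretisedDates, function(x) data.frame(t(x)))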

Next we map the discretised dates onto a grid. We create a grid of years and months and record, for each tree $k$ in the posterior distribution, whether at least one introduction occurred in year $i$ and month $j$ (1 if so, 0 otherwise).

In [None]:
## Create grid
Years <- seq(2010, 2014)
Months <- 1:12
fullGrid <- expand.grid(Months, Years)
names(fullGrid) <- c("month", "year")

In [None]:
Indicators <- parallel::mclapply(DiscretisedDates, markOccurences, grid = fullGrid, mc.cores = 12)
IndMat <- matrix(unlist(Indicators), ncol = length(DiscretisedDates), nrow = nrow(fullGrid))
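
As a rough illustration of the shape of `IndMat`, the sketch below builds an indicator matrix for two made-up trees on a tiny grid. All names and numbers here (`toyGrid`, `toyTrees`, `cellKey`) are hypothetical, and the key-matching approach is a compact stand-in for `markOccurences`, not the notebook's own implementation.

```r
# Toy grid: 3 months x 2 years, with month varying fastest (as in expand.grid)
toyGrid <- expand.grid(month = 1:3, year = 2012:2013)

# Two made-up "trees", each with its own discretised introduction dates
toyTrees <- list(
  data.frame(month = c(1, 1, 3), year = c(2012, 2012, 2013)),
  data.frame(month = 2, year = 2012)
)

# Build a "year-month" key so grid cells can be matched against introductions
cellKey <- function(d) paste(d$year, d$month, sep = "-")

# Rows are grid cells, columns are trees; an entry is 1 if the tree has
# at least one introduction in that cell (repeated hits still give 1)
toyIndMat <- sapply(toyTrees, function(d) as.integer(cellKey(toyGrid) %in% cellKey(d)))
print(toyIndMat)

# Number of trees with at least one introduction in each cell
print(rowSums(toyIndMat))
```

Each row corresponds to one grid cell and each column to one tree, so row sums give the number of trees supporting an introduction in that cell.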

In [None]:
## Export
write.csv(IndMat, file = "../data/seasonality/discretised_introductions_2010-2014.csv", row.names = FALSE)

Now let's run [Stan](http://mc-stan.org/) to get our estimates (see the paper and the Stan [script](https://github.com/blab/structured-mers/tree/master/scripts/stan/simple_binomial_seasonal.stan) for the model we employed).

In [None]:
getIndices <- function(x) match(x, sort(unique(x)))
stanData.bernoulli.seasonality <- list(
  N = nrow(IndMat),
  M = length(Months),
  J = length(Years), 
  L = ncol(IndMat),
  year_indices = getIndices(fullGrid$year),
  month_indices = getIndices(fullGrid$month),
  X = rowSums(IndMat) 
)
library(rstan)
options(mc.cores = 4)

# A one-iteration run, used here only to compile the model
simple <- stan(file = "stan/simple_binomial_seasonal.stan",
               data = stanData.bernoulli.seasonality, iter = 1, chains = 1)

# Reuse the compiled model for the actual sampling run
posterior.simple <- stan(fit = simple, data = stanData.bernoulli.seasonality, iter = 5000,
                         control = list(adapt_delta = .80))
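
Before looking at model output, it can help to see what the index vectors and the counts `X` look like on a small scale. The sketch below uses made-up numbers (`toyGrid`, `toyX`, `toyL` are hypothetical) and computes raw proportions per month; this is a descriptive check only, not the model in the Stan script.

```r
# Same helper as in the notebook: map values to 1-based positions
# among their sorted unique values
getIndices <- function(x) match(x, sort(unique(x)))

# Toy grid and made-up counts, for illustration only
toyGrid <- expand.grid(month = 1:3, year = c(2012, 2013))
toyX <- c(10, 40, 25, 5, 30, 20)  # trees with >= 1 introduction per grid cell
toyL <- 50                        # total number of trees

yearIdx <- getIndices(toyGrid$year)    # 1 1 1 2 2 2
monthIdx <- getIndices(toyGrid$month)  # 1 2 3 1 2 3

# Raw proportion of trees with an introduction, averaged over years per month
rawByMonth <- tapply(toyX / toyL, monthIdx, mean)
print(rawByMonth)
```

If the posterior estimates later disagree strongly with such raw proportions, that is a hint to re-check the data passed to Stan.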

In [None]:
## Extract and export coefficient results
Betas <- rstan::extract(posterior.simple, 'beta')$beta  # namespaced in case another package masks extract()
colnames(Betas) <- paste("beta_", month.name, sep = "")
write.csv(Betas,
          file = "../data/seasonality/posterior_betas_months.csv", row.names = FALSE)
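
One way to get a quick overview of the exported coefficients is a per-month posterior mean and 95% interval. The sketch below runs on simulated draws (`toyBetas` is made up and stands in for the real `Betas` matrix); the coefficients are on whatever scale the Stan model defines them, which this example does not assume.

```r
set.seed(1)
# Made-up draws standing in for the real posterior: 1000 draws x 12 months
toyBetas <- matrix(rnorm(1000 * 12), ncol = 12)
colnames(toyBetas) <- paste("beta_", month.name, sep = "")

# Posterior mean and 95% equal-tailed interval for each monthly coefficient
betaSummary <- t(apply(toyBetas, 2, function(b) {
  c(mean = mean(b), quantile(b, probs = c(0.025, 0.975)))
}))
print(round(betaSummary, 2))
```

With the real output, replacing `toyBetas` by `Betas` should give one summary row per month.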

WARNING: if the notebook misbehaves, copy and paste the code into an R console, where it should run as expected.