Commit

minor doc changes for v0.7.3
doserjef committed Mar 28, 2024
1 parent 1aad71b commit bb0124e
Showing 13 changed files with 66 additions and 55 deletions.
4 changes: 2 additions & 2 deletions NEWS.md
@@ -2,9 +2,9 @@

+ Fixed a problem that could arise when calculating Rhat in all models when running multiple chains (but usually only happened in multispecies models) when there was a high amount of correlation between parameter estimates. This would lead to the model running completely, but then failing after all chains have been run. This most often occurred when fitting a multispecies model with a lot of rare species. Thanks to Marc Kery for bringing this to my attention.
+ Added in a check at the top of all model fitting functions to return an error when the number of posterior samples saved based on the MCMC criteria (`n.batch`, `batch.length`, `n.samples`, `n.burn`, `n.thin`, `n.chains`) are specified in a way that leads to a non-integer value. In such situations, models would previously run and return without an error, but sometimes the last posterior sample in any given chain could have widely inaccurate values, or values that prevented subsequent functions from working. Thanks to Wendy Leuenberger and Colin Swider for bringing this to my attention.
+ Added in functionality for fitting spatially-explicit models where the spatial random effects (or spatially varying coefficients) are not specified at the individual site, but rather are specified at a larger spatial resolution. This is accomplished using a new component of the `data` list supplied to model fitting functions called `grid.index`. This is useful for data sets where there is some sort of nested structuring among the data collection protocol, such that you may wish to specify the spatial random effects at a lower resolution than each individual location. Further, it can be particularly useful for SVC models where you only want to specify nonstationarity at a lower spatial resolution (e.g., across a set of grid cells). This is currently implemented for the following functions: \texttt{sfMsPGOcc}, . See the documentation for a given model function for how to specify this. I am hoping to eventually write up a small example that shows how to do this, but for now documentation is fairly limited to just the manual pages for each function. Feel free to contact me if you want to use this functionality and have any questions.
+ Added in functionality for fitting spatially-explicit models where the spatial random effects (or spatially varying coefficients) are not specified at the individual site, but rather are specified at a larger spatial resolution. This is accomplished using a new component of the `data` list supplied to model fitting functions called `grid.index`. This is useful for data sets where there is some sort of nested structuring among the data collection protocol, such that you may wish to specify the spatial random effects at a lower resolution than each individual location. Further, it can be particularly useful for SVC models where you only want to specify nonstationarity at a lower spatial resolution (e.g., across a set of grid cells). This is currently implemented for the following functions: `spPGOcc`, `sfMsPGOcc`, `stMsPGOcc`, `stPGOcc`, `svcPGOcc`, `svcTMsPGOcc`, `svcTPGOcc`. See the documentation for a given model function for how to specify this. I am hoping to eventually write up a small example that shows how to do this, but for now documentation is fairly limited to just the manual pages for each function. Feel free to contact me if you want to use this functionality and have any questions.
+ Added in the `updateMCMC()` function. This function is in active development, but it will ultimately allow for all `spOccupancy` and `spAbundance` model objects to be updated with additional MCMC samples, instead of having to completely rerun an MCMC analysis if adequate burn-in/convergence was not reached. It currently works for the function `sfJSDM()` in `spOccupancy` and `msAbund()` in `spAbundance`.
+ Added in the ability to specify independent priors for the species-level regression coefficients for two functions: \texttt{svcTMsPGOcc} and \texttt{sfJSDM}. This is done by setting the tags \texttt{independent.betas} and \texttt{independent.alphas} to TRUE. This will fix the values of the community-level mean and variance parameters to the initial values specified in \code{inits}. This is equivalent to setting an independent Gaussian prior for each of the species-specific regression coefficients, which may potentially be useful in certain situations where the assumption of normality in the distribution of the species-level effects is not well met. This functionality will eventually be incorporated for all multi-species models.
+ Added in the ability to specify independent priors for the species-level regression coefficients for two functions: `svcTMsPGOcc` and `sfJSDM`. This is done by setting the tags `independent.betas` and `independent.alphas` to TRUE. This will fix the values of the community-level mean and variance parameters to the initial values specified in `inits`. This is equivalent to setting an independent Gaussian prior for each of the species-specific regression coefficients, which may potentially be useful in certain situations where the assumption of normality in the distribution of the species-level effects is not well met. This functionality will eventually be incorporated for all multi-species models.
+ Fixed a bug in `intMsPGOcc()` that caused the model to crash upon initialization of the MCMC algorithm when data were supplied in a way such that for a given data set, the maximum number of times a specific site was sampled was less than the total number of "replicate periods" (i.e., the third dimension of the data list). This may happen when the "replicates" are structured as specific time periods (i.e., weeks, years) instead of a specific "replicate". This was previously fixed in all other model fitting functions.
+ Wrote a new "vignette" (really more of a blog post) on some recommendations to help improve interpretability of inferences in SVC models.
+ Fixed a few typos in the MCMC sampler vignettes for factor models and SVC models.
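
The NEWS entry above on MCMC criteria describes a check that the number of saved posterior samples is an integer. A minimal base R sketch of that kind of check follows. It is not the package's actual code: `check_mcmc_samples()` is a hypothetical helper, and the formula `(n.samples - n.burn) / n.thin` per chain (with `n.samples = n.batch * batch.length` for batch-based samplers) is an assumption about how the criteria combine. The numeric settings are chosen only for illustration.

```r
# Hypothetical helper (not part of spOccupancy): verify that the MCMC settings
# give a whole, positive number of saved posterior samples per chain.
check_mcmc_samples <- function(n.samples, n.burn, n.thin, n.chains = 1) {
  # Assumed formula for the number of samples kept per chain.
  n.post <- (n.samples - n.burn) / n.thin
  if (n.post <= 0 || n.post != floor(n.post)) {
    stop("(n.samples - n.burn) / n.thin must be a positive integer, got ", n.post)
  }
  # Total samples saved across all chains.
  n.post * n.chains
}

# Illustrative settings; for batch-based samplers the total number of
# iterations per chain is assumed to be n.batch * batch.length.
n.batch <- 400
batch.length <- 25
total <- check_mcmc_samples(n.samples = n.batch * batch.length,
                            n.burn = 2000, n.thin = 4, n.chains = 3)
total

# A burn-in that leaves a non-integer number of samples triggers the error.
bad <- tryCatch(check_mcmc_samples(n.samples = 10000, n.burn = 2001, n.thin = 4),
                error = function(e) conditionMessage(e))
bad
```

Running a check like this before fitting makes it easy to adjust `n.burn` or `n.thin` until the saved sample count is a whole number.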
4 changes: 2 additions & 2 deletions README.Rmd
@@ -154,11 +154,11 @@ The `vignette("modelFitting")` provides a more detailed description and tutorial

Doser, J. W., Finley, A. O., Kery, M., and Zipkin, E. F. (2022a). spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. Methods in Ecology and Evolution. 13(8) 1670-1678. https://doi.org/10.1111/2041-210X.13897.

Doser, J. W., Finley, A. O., and Banerjee, S. (2023). Joint species distribution models with imperfect detection for high-dimensional spatial data. Ecology. https://doi.org/10.1002/ecy.4137.
Doser, J. W., Finley, A. O., and Banerjee, S. (2023). Joint species distribution models with imperfect detection for high-dimensional spatial data. Ecology, 104(9), e4137. https://doi.org/10.1002/ecy.4137.

Doser, J. W., Finley, A. O., Saunders, S. P., Kéry, M., Weed, A. S., & Zipkin, E. F. (2024A). Modeling complex species-environment relationships through spatially-varying coefficient occupancy models. Journal of Agricultural, Biological and Environmental Statistics. https://doi.org/10.1007/s13253-023-00595-6.

Doser, J. W., Kéry, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, e13814. https://doi.org/10.1111/geb.13814
Doser, J. W., Kéry, M., Saunders, S. P., Finley, A. O., Bateman, B. L., Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines for the use of spatially varying coefficients in species distribution models. Global Ecology and Biogeography, 33, e13814. https://doi.org/10.1111/geb.13814



6 changes: 3 additions & 3 deletions README.html
@@ -863,7 +863,7 @@ <h3 id="fit-a-spatial-occupancy-model-using-sppgocc">Fit a spatial
<span id="cb5-12"><a href="#cb5-12" tabindex="-1"></a><span class="co">#&gt; Thinning Rate: 4</span></span>
<span id="cb5-13"><a href="#cb5-13" tabindex="-1"></a><span class="co">#&gt; Number of Chains: 3</span></span>
<span id="cb5-14"><a href="#cb5-14" tabindex="-1"></a><span class="co">#&gt; Total Posterior Samples: 6000</span></span>
<span id="cb5-15"><a href="#cb5-15" tabindex="-1"></a><span class="co">#&gt; Run Time (min): 0.7788</span></span>
<span id="cb5-15"><a href="#cb5-15" tabindex="-1"></a><span class="co">#&gt; Run Time (min): 0.7627</span></span>
<span id="cb5-16"><a href="#cb5-16" tabindex="-1"></a><span class="co">#&gt; </span></span>
<span id="cb5-17"><a href="#cb5-17" tabindex="-1"></a><span class="co">#&gt; Occurrence (logit scale): </span></span>
<span id="cb5-18"><a href="#cb5-18" tabindex="-1"></a><span class="co">#&gt; Mean SD 2.5% 50% 97.5% Rhat ESS</span></span>
@@ -962,15 +962,15 @@ <h2 id="references">References</h2>
13(8) 1670-1678. <a href="https://doi.org/10.1111/2041-210X.13897">https://doi.org/10.1111/2041-210X.13897</a>.</p>
<p>Doser, J. W., Finley, A. O., and Banerjee, S. (2023). Joint species
distribution models with imperfect detection for high-dimensional
spatial data. Ecology. <a href="https://doi.org/10.1002/ecy.4137">https://doi.org/10.1002/ecy.4137</a>.</p>
spatial data. Ecology, 104(9), e4137. <a href="https://doi.org/10.1002/ecy.4137">https://doi.org/10.1002/ecy.4137</a>.</p>
<p>Doser, J. W., Finley, A. O., Saunders, S. P., Kéry, M., Weed, A. S.,
&amp; Zipkin, E. F. (2024A). Modeling complex species-environment
relationships through spatially-varying coefficient occupancy models.
Journal of Agricultural, Biological and Environmental Statistics. <a href="https://doi.org/10.1007/s13253-023-00595-6">https://doi.org/10.1007/s13253-023-00595-6</a>.</p>
<p>Doser, J. W., Kéry, M., Saunders, S. P., Finley, A. O., Bateman, B.
L., Grand, J., Reault, S., Weed, A. S., &amp; Zipkin, E. F. (2024B).
Guidelines for the use of spatially varying coefficients in species
distribution models. Global Ecology and Biogeography, e13814. <a href="https://doi.org/10.1111/geb.13814">https://doi.org/10.1111/geb.13814</a></p>
distribution models. Global Ecology and Biogeography, 33, e13814. <a href="https://doi.org/10.1111/geb.13814">https://doi.org/10.1111/geb.13814</a></p>

</body>
</html>
7 changes: 4 additions & 3 deletions README.md
@@ -147,7 +147,7 @@ summary(out)
#> Thinning Rate: 4
#> Number of Chains: 3
#> Total Posterior Samples: 6000
#> Run Time (min): 0.7788
#> Run Time (min): 0.7627
#>
#> Occurrence (logit scale):
#> Mean SD 2.5% 50% 97.5% Rhat ESS
@@ -267,7 +267,8 @@ integrated spatial occupancy models. Methods in Ecology and Evolution.

Doser, J. W., Finley, A. O., and Banerjee, S. (2023). Joint species
distribution models with imperfect detection for high-dimensional
spatial data. Ecology. <https://doi.org/10.1002/ecy.4137>.
spatial data. Ecology, 104(9), e4137.
<https://doi.org/10.1002/ecy.4137>.

Doser, J. W., Finley, A. O., Saunders, S. P., Kéry, M., Weed, A. S., &
Zipkin, E. F. (2024A). Modeling complex species-environment
@@ -278,5 +279,5 @@ Journal of Agricultural, Biological and Environmental Statistics.
Doser, J. W., Kéry, M., Saunders, S. P., Finley, A. O., Bateman, B. L.,
Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines
for the use of spatially varying coefficients in species distribution
models. Global Ecology and Biogeography, e13814.
models. Global Ecology and Biogeography, 33, e13814.
<https://doi.org/10.1111/geb.13814>
9 changes: 5 additions & 4 deletions _pkgdown.yml
@@ -48,6 +48,11 @@ reference:
- simTBinom
- simIntMsOcc
- simTMsOcc
- title: "Miscellaneous"
- contents:
- postHocLM
- getSVCSamples
- updateMCMC
- title: "Summary"
- contents:
- summary.PGOcc
@@ -118,10 +123,6 @@ reference:
- fitted.tMsPGOcc
- fitted.stMsPGOcc
- fitted.svcTMsPGOcc
- title: "Other"
- contents:
- postHocLM
- getSVCSamples
- title: "Data"
- contents:
- hbef2015
19 changes: 9 additions & 10 deletions man/postHocLM.Rd
@@ -4,7 +4,7 @@

\usage{
postHocLM(formula, data, inits, priors, verbose = FALSE,
n.report = 100, n.samples, n.chains = 1, ...)
n.report = 100, n.samples, n.chains = 1, ...)
}

\description{
@@ -146,7 +146,7 @@ if (p > 1) {
X[, i] <- rnorm(N)
} # i
}
mu <- X %*% as.matrix(beta)
mu <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]
y <- rnorm(N, mu, sqrt(tau.sq))
# Replicate y n.samples times and add a small amount of noise that corresponds
# to uncertainty from a first stage model.
@@ -156,19 +156,18 @@ y <- y + rnorm(length(y), 0, 0.25)

# Package data for use with postHocLM -------------------------------------
colnames(X) <- c('int', 'cov.1', 'cov.2')
data.list <- list(y = y,
covs = X)
data.list <- list(y = y, covs = X)
data <- data.list
inits <- list(beta = 0, tau.sq = 1)
priors <- list(beta.normal = list(mean = 0, var = 10000),
tau.sq.ig = c(0.001, 0.001))
tau.sq.ig = c(0.001, 0.001))

# Run the model -----------------------------------------------------------
out <- postHocLM(formula = ~ cov.1 + cov.2,
inits = inits,
data = data.list,
priors = priors,
verbose = FALSE,
n.chains = 1)
inits = inits,
data = data.list,
priors = priors,
verbose = FALSE,
n.chains = 1)
summary(out)
}
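
The example hunk above replaces the matrix product `X %*% as.matrix(beta)` with an explicit sum over the three columns of `X`. The commit does not say why. The sketch below only illustrates the one difference that is certain in base R: the matrix product returns an N x 1 matrix, while the explicit sum returns a plain numeric vector with the same values. The values of `N` and `beta` are illustrative, not taken from the manual page.

```r
# Illustrative values only; the manual page defines its own N and beta.
set.seed(1)
N <- 5
beta <- c(0, 0.5, 1.2)
# Design matrix with an intercept column and two standard normal covariates.
X <- cbind(1, rnorm(N), rnorm(N))

# Matrix-product form: the result is an N x 1 matrix.
mu.mat <- X %*% as.matrix(beta)
# Explicit-sum form: the result is a plain numeric vector.
mu.vec <- X[, 1] * beta[1] + X[, 2] * beta[2] + X[, 3] * beta[3]

dim(mu.mat)
is.null(dim(mu.vec))
# The two forms agree numerically once the matrix is flattened.
same <- isTRUE(all.equal(as.vector(mu.mat), mu.vec))
same
```

Either form can be passed to `rnorm(N, mu, sqrt(tau.sq))`. The vector form guarantees that `mu` carries no `dim` attribute into later steps.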
31 changes: 24 additions & 7 deletions man/spOccupancy-package.Rd
@@ -16,7 +16,7 @@ joint likelihood framework. Details on data integration are given in
Miller, Pacifici, Sanderlin, and Reich (2019). Details on single-species and
multi-species models are found in MacKenzie et al. (2002) and Dorazio and Royle (2005),
respectively. Details on the package functionality are given in Doser et al. (2022),
Doser, Finley, Banerjee (2023), and Doser, Finley, Saunders, Kery, Weed, and Zipkin (2023).
Doser, Finley, Banerjee (2023), and Doser et al. (2024a,b).
See \code{citation('spOccupancy')} for how to cite spOccupancy in publications.

\strong{Single-species models}
@@ -89,6 +89,14 @@ See \code{citation('spOccupancy')} for how to cite spOccupancy in publications.

\code{\link{simTMsOcc}} simulates multi-species multi-season occupancy data from multiple data sources.

\strong{Miscellaneous}

\code{\link{postHocLM}} fits post-hoc linear (mixed) models.

\code{\link{getSVCSamples}} extracts spatially varying coefficient MCMC samples.

\code{\link{updateMCMC}} updates a spOccupancy or spAbundance model object with more MCMC iterations.

All objects from model-fitting functions have support with the \code{summary} function for
displaying a concise summary of model results, the \code{fitted} function for extracting
model fitted values, and the \code{predict} function for predicting occupancy and/or detection
@@ -100,15 +108,24 @@ across an area of interest.
Doser, J. W., Finley, A. O., Kery, M., & Zipkin, E. F. (2022).
spOccupancy: An R package for single-species, multi-species, and
integrated spatial occupancy models.
Methods in Ecology and Evolution, 13, 1670-1678. \doi{10.1111/2041-210X.13897}
Methods in Ecology and Evolution, 13, 1670-1678. \doi{10.1111/2041-210X.13897}.

Doser, J. W., Finley, A. O., & Banerjee, S. (2023). Joint species
distribution models with imperfect detection for high-dimensional
spatial data. Ecology e4137. \doi{10.1002/ecy.4137}

Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed A. S., Zipkin, E. F. (2023).
Modeling complex species-environment relationships through spatially-varying coefficient
occupancy models. arXiv preprint.
spatial data. Ecology, 104(9), e4137. \doi{10.1002/ecy.4137}.

Doser, J. W., Finley, A. O., Saunders, S. P., Kery, M., Weed, A. S., &
Zipkin, E. F. (2024A). Modeling complex species-environment
relationships through spatially-varying coefficient occupancy models.
Journal of Agricultural, Biological and Environmental Statistics.
\doi{10.1007/s13253-023-00595-6}.

Doser, J. W., Kery, M., Saunders, S. P., Finley, A. O., Bateman, B. L.,
Grand, J., Reault, S., Weed, A. S., & Zipkin, E. F. (2024B). Guidelines
for the use of spatially varying coefficients in species distribution
models. Global Ecology and Biogeography, 33, e13814.
\doi{10.1111/geb.13814}.

}

\author{
4 changes: 2 additions & 2 deletions vignettes/factorModels.Rmd
@@ -239,11 +239,11 @@ summary(out.lfMsPGOcc)

We see the `summary()` function displays the posterior mean, standard deviation, and posterior quantiles (2.5%, 50%, and 97.5%) for a quick summarization of model findings, with all summaries of parameters on the logit scale. Note that all `spOccupancy` `summary()` functions have a `quantiles` argument where you can supply the specific quantiles you want to be displayed in the summary output (by default, this is set to `quantiles = c(0.025, 0.5, 0.975)`). Looking at the community-level parameters, we see there is large variation in average occurrence (i.e., the occurrence intercept) across the study region, and more moderate variation in the effect of elevation on occurrence of the 12 bird species across the region. On average, bird occurrence in the community tends to peak at mid-level elevations (i.e., the community-level quadratic effect of elevation is negative).

Additionally, `summary()` returns Rhat (the Gelman-Rubin diagnostic; @brooks1998) as well as the effective sample size (ESS) for convergence assessments. Here we see most Rhat values are less than 1.1 and the ESS values are sufficiently large. For a complete analysis, we would run the model for longer to ensure all Rhat values were less than 1.1 and ESS values were sufficiently large. Further, we can use the `coda::plot()` function to plot traceplots of the individual model parameters that are contained in the resulting `lfMsPGOcc` object. All posterior samples are stored in objects that end in "samples" in the resulting `out.lfMsPGOcc` object.
Additionally, `summary()` returns Rhat (the Gelman-Rubin diagnostic; @brooks1998) as well as the effective sample size (ESS) for convergence assessments. Here we see most Rhat values are less than 1.1 and the ESS values are sufficiently large. For a complete analysis, we would run the model for longer to ensure all Rhat values were less than 1.1 and ESS values were sufficiently large. Further, we can use the `plot()` function to plot traceplots of the individual model parameters that are contained in the resulting `lfMsPGOcc` object. The `plot()` function takes three arguments: `x` (the model object), `param` (a character string denoting the parameter name), and `density` (logical value indicating whether to also plot the density of MCMC samples along with the traceplot). See `?plot.lfMsPGOcc` for more details (similar functions exist for all `spOccupancy` model objects).

```{r, fig.width = 5, fig.height = 5, fig.align = 'center', units = 'in'}
# Check out traceplot of the community-level occurrence means.
plot(out.lfMsPGOcc$beta.comm.samples, density = FALSE)
plot(out.lfMsPGOcc, 'beta.comm', density = FALSE)
```

The `summary()` function does not present any information on the latent factor loadings or latent factors, but the full posterior samples are available in the `lambda.samples` and `w.samples` tags in the `out.lfMsPGOcc` object, respectively. Below we display the posterior summaries of the latent factor loadings.