diff --git a/DESCRIPTION b/DESCRIPTION index 6674608..33c9016 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,9 +1,9 @@ Package: virtualspecies Type: Package Title: Generation of Virtual Species Distributions -Version: 1.4-2 -Date: 2017-07-01 -Author: Boris Leroy, Christine N. Meynard, Celine Bellard and Franck Courchamp +Version: 1.4-4 +Date: 2018-09-10 +Author: Boris Leroy [cre, aut], Christine N. Meynard [ctb], Celine Bellard [ctb], Franck Courchamp [ctb], Robin Delsol [ctb], Willson Gaul [ctb] Maintainer: Boris Leroy Description: Provides a framework for generating virtual species distributions, a procedure increasingly used in ecology to improve species distribution @@ -12,11 +12,11 @@ Description: Provides a framework for generating virtual species distributions, realism. License: GPL (>= 2.0) Depends: raster -Imports: ade4, dismo, graphics, grDevices, methods, rworldmap, stats, +Imports: ade4, graphics, grDevices, methods, rworldmap, sp, stats, utils URL: http://borisleroy.com/virtualspecies -RoxygenNote: 6.0.1 +RoxygenNote: 6.1.0 NeedsCompilation: no -Packaged: 2017-07-03 11:51:34 UTC; BorisMNHN +Packaged: 2018-09-10 08:59:24 UTC; Farewell10 Repository: CRAN -Date/Publication: 2017-07-03 13:51:13 UTC +Date/Publication: 2018-09-10 15:10:02 UTC diff --git a/MD5 b/MD5 index 2c41aa1..7a120e6 100644 --- a/MD5 +++ b/MD5 @@ -1,36 +1,37 @@ -f86766f35df8a4d42e9abda1150f3088 *DESCRIPTION -96dcb550f105a0dfbc3fdd6901e8e0fa *NAMESPACE -d342159cc65142aa279d0619349e7d2b *NEWS -f558c06fca0d6fdfa8f3f4d60d9e98bc *R/convertToPA.R -ce49154d58f5e93c908eae10d79aaa2d *R/formatFunctions.R -8b4fe3586ac3e8fd8c4bfbdda5813bbf *R/generateRandomSp.R -76a95c2bf239ad84cf3cbf211344c942 *R/generateSpFromBCA.R -3d765a1ec95eabfbfbf06f8a023cf5ab *R/generateSpFromFun.R -ba6ababc5336b4b6bd76ff29c336f25e *R/generateSpFromPCA.R -853d84d2613d32b1facf0734f0943172 *R/genericfunctions.R -90fd7843a54d65934bcf714beaf75560 *R/limitDistribution.R -fcb90ae3b83402c834477df63ac2db52 *R/plotResponse.R -6fb36003db1a9094fcbe80b6a65abaa1 *R/removeCollinearity.R -9be6cc7f2b03a90c889bd026b6d2a540 *R/sampleOccurrences.R -6963ca727b7f595ee1aba35ad6d52861 *R/sp.env.functions.R +5f649a6a9f48b687ce9dddc97697a919 *DESCRIPTION +5f4101c3517305580c97031a652858e4 *NAMESPACE +ad1b5cf846b0c8770a80a0c0aeebef7a *NEWS +61162f5a2fc7c05d974960d0fcc7586d *R/convertToPA.R +1011145f8ea55c69597d1dcbd2309009 *R/formatFunctions.R +463daad993b6b4adf3415f26f69a9b1b *R/generateRandomSp.R +721f2b77b94763d39b891e72e29da1df *R/generateSpFromBCA.R +d69e74e40fb9dba9d3bfc7d0bfa0638e *R/generateSpFromFun.R +97f63384dd2ead3d4abe68f3ab63c068 *R/generateSpFromPCA.R +33b47ca44152acf48db01776e9a19ed2 *R/genericfunctions.R +24049629c3e8b6320d75a007fb021955 *R/limitDistribution.R +8ca36ba1eae858fac03c1ad1f2802ccb *R/plotResponse.R +73d567d59bd4fc06e42aa915d0b28f69 *R/removeCollinearity.R +7cd36c3439ad28bfb57a9d462674d61f *R/sampleOccurrences.R +ec22a004c2b14b46029b908bee6ce799 *R/sp.env.functions.R b15899277537256370881dc1485c9fd0 *R/synchroniseNA.R aae6ef3fc4f548f7c6ecc40d7a2e78fa *R/thermalFunctionDraft.R -293c34f8ab11dcd4b2f76c177d52663f *R/virtualspecies-package.R +a2b70eaf46bfc3bec906484369d08804 *R/utilityFunctions.R +8079f60c6704cc0e1b92f48dc0de4e98 *R/virtualspecies-package.R ef811ca3f8bf933c1ac15f303aed9ce0 *inst/CITATION -b242df4d91d9e87c20a9bcb74b5e2797 *man/betaFun.Rd -f841263ec05fcf2e98148d5e7574db74 *man/convertToPA.Rd +ada378869af7d494616910936570f04b *man/betaFun.Rd +242e89685b6baa807fe605a930d3c1a9 *man/convertToPA.Rd 6cf68e1d6295361f4b2cc08ada59678c *man/custnorm.Rd -cdcc5b66c190a20808c6137f592aefb8 *man/formatFunctions.Rd -e31f4c70496fbe367681aa4e198a5eaa *man/generateRandomSp.Rd -e69e89a4b587a46a44811570e68a3767 *man/generateSpFromBCA.Rd -a5a98a6b54f3887f7e6b81d3a4cdb107 *man/generateSpFromFun.Rd -7c94c2a500426da08b3ed7e819eb30ff *man/generateSpFromPCA.Rd -333cc6472e773547254fcf3de218036e *man/limitDistribution.Rd +4c00f7eff53e20b398a340941e359687 *man/formatFunctions.Rd +e8a93399c767b5af97e0bc735f06bbb3 *man/generateRandomSp.Rd +3895d9b4fe9d164c4b80bc985e97caa8 *man/generateSpFromBCA.Rd +b981a59621c574c855cb33482241e95c *man/generateSpFromFun.Rd +f7597b74812ed1bd1843a350c7588ab5 *man/generateSpFromPCA.Rd +365d324c599192e91855b8e5eb039f0c *man/limitDistribution.Rd cea125e4b6d7bfc151f6ff3db52a5ecb *man/linearFun.Rd 1af79f8aec422c37a3b212ed120934cb *man/logisticFun.Rd -ad12113562da0ccdefcfeeae55f626e6 *man/plotResponse.Rd +3ca9a2681d776c1f8c248fd1aa950983 *man/plotResponse.Rd 095b08a7b2bfd81c6d03235714756a12 *man/quadraticFun.Rd -30b4a69e038f3a4887a0fe6d0f470ae0 *man/removeCollinearity.Rd -10c9fe9b39e40b60daf7bafe4e1d841c *man/sampleOccurrences.Rd +173063845130f09a570fdf8c287cfc09 *man/removeCollinearity.Rd +705c2c5074d02c7eeafa31279a295d29 *man/sampleOccurrences.Rd 2b343cf59df59fc9f32bcc485381a388 *man/synchroniseNA.Rd -2770977a5e0f58328caed1eec59446c3 *man/virtualspecies-package.Rd +196ee8dfe37067d044872fb960e82921 *man/virtualspecies-package.Rd diff --git a/NAMESPACE b/NAMESPACE index 6801663..8a02cc6 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,7 +1,9 @@ # Generated by roxygen2: do not edit by hand S3method(plot,virtualspecies) +S3method(print,VSSampledPoints) S3method(print,virtualspecies) +S3method(str,VSSampledPoints) S3method(str,virtualspecies) export(betaFun) export(convertToPA) diff --git a/NEWS b/NEWS index 44ace43..0a0b58e 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,12 @@ +virtualspecies v1.4-4 (Release date: 2018-09-10) +============== + +Changes: +* Modified function sampleOccurrences (more features, better output) +* Added new generic print / str functions for sampleOccurrences outputs +* Improvements to generateSpFromBCA() +* Fixed a bug in convertToPA() where the generation of low prevalence species would generate warnings and sometimes fail to obtain a proper beta threshold + virtualspecies v1.4-1 (Release date: 2016-12-22) ============== diff --git a/R/convertToPA.R b/R/convertToPA.R index 2e44557..9244a3c 100644 --- a/R/convertToPA.R +++ b/R/convertToPA.R @@ -101,7 +101,7 @@ #' \item{\code{PA.conversion}: the parameters used to convert the suitability into presence-absence} #' \item{\code{pa.raster}: the presence-absence map, as a Raster object containing 0 (absence) / 1 (presence) / NA} #' } -#' The structure of the virtualspecies object can be seen using str() +#' The structure of the virtualspecies object can be seen using \code{str()} #' @examples #' # Create an example stack with two environmental variables #' a <- matrix(rep(dnorm(1:100, 50, sd = 25)), diff --git a/R/formatFunctions.R b/R/formatFunctions.R index 80023fa..c6b2fac 100644 --- a/R/formatFunctions.R +++ b/R/formatFunctions.R @@ -2,7 +2,7 @@ #' #' @description #' This function is a helper function to simplify the formatting of functions -#' for generateSpFromFun. +#' for \code{\link{generateSpFromFun}} #' @details #' This function formats the \code{parameters} argument of \code{\link{generateSpFromFun}}. #' For each environmental variable, provide a vector containing the function name, and its arguments. diff --git a/R/generateRandomSp.R b/R/generateRandomSp.R index 74f90dc..24ae896 100644 --- a/R/generateRandomSp.R +++ b/R/generateRandomSp.R @@ -26,7 +26,7 @@ #' (may result in environmental conditions that do not co-exist). #' @param species.type [response approach] \code{"additive"} or \code{"multiplicative"}. Defines #' how the final probability of presence is calculated: if \code{"additive"}, responses to each -#' variable are summed; if \code{"multiplicative"}, responses are multiplicated. +#' variable are summed; if \code{"multiplicative"}, responses are multiplied. #' See \code{\link{generateSpFromFun}} #' @param niche.breadth [pca approach] \code{"any"}, \code{"narrow"} or \code{"wide"}. This parameter #' defines how tolerant is the species regarding environmental conditions by adjusting @@ -51,6 +51,9 @@ #' \code{PA.method = "probability"}. The value of \code{alpha} will #' shape the logistic function transforming occurrences into presence-absences. #' See \code{\link{logisticFun}} and examples therein for the choice of \code{alpha} +#' @param adjust.alpha \code{TRUE} or \code{FALSE}. Only useful if +#' \code{rescale = FALSE}. If \code{adjust.alpha = TRUE}, then the value of \code{alpha} will +#' be adjusted to an appropriate value for the range of suitabilities. #' @param species.prevalence \code{NULL} or a numeric value between 0 and 1. #' The species prevalence is the proportion of sites actually occupied by the #' species. See \code{\link{convertToPA}} @@ -60,7 +63,7 @@ #' a response approach. In case of a response approach, only four response functions are #' currently used: gaussian, linear, logistic and quadratic functions. #' -#' Note that in case of numerouse predictor variables, the "response" approach will +#' Note that in case of numerous predictor variables, the "response" approach will #' not work well because it will often generate contradicting response functions #' (e.g., mean annual temperature optimum at degrees C and temperature of the coldest month at #' 10 degrees C). In these case, it is advised to use the PCA approach (by default, a PCA approach @@ -68,11 +71,11 @@ #' #' If \code{rescale.each.response = TRUE}, then the probability response to each #' variable will be normalised between 0 and 1 according to the following formula: -#' P.rescaled = (P - min(P)) / (max(P) - min (P)). Simlarly, if \code{rescale = TRUE}, +#' P.rescaled = (P - min(P)) / (max(P) - min (P)). Similarly, if \code{rescale = TRUE}, #' the final environmental suitability will be rescaled between 0 and 1 with the same formula. #' #' By default, the function will perform a probabilistic conversion into presence- -#' absence, with a randomly chosen beta threshold. If you want to custmose the +#' absence, with a randomly chosen beta threshold. If you want to customise the #' conversion parameters, you have to define \bold{two} of the three following parameters: #' \itemize{ #' \item{\code{beta}: the 'threshold' of the logistic function (i.e. the @@ -104,7 +107,7 @@ #' \item{\code{PA.conversion}: the parameters used to convert the suitability into presence-absence} #' \item{\code{pa.raster}: the presence-absence map, as a Raster object containing 0 (absence) / 1 (presence) / NA} #' } -#' The structure of the virtualspecies can object be seen using str() +#' The structure of the virtualspecies can object be seen using \code{str()} #' @examples #' # Create an example stack with six environmental variables #' a <- matrix(rep(dnorm(1:100, 50, sd = 25)), @@ -141,6 +144,7 @@ generateRandomSp <- function(raster.stack, nb.points = 10000, PA.method = "probability", alpha = -.1, + adjust.alpha = TRUE, beta = "random", species.prevalence = NULL, plot = TRUE) @@ -176,6 +180,7 @@ generateRandomSp <- function(raster.stack, if(approach == "pca") { results <- generateSpFromPCA(raster.stack, + rescale = rescale, niche.breadth = niche.breadth, sample.points = sample.points, nb.points = nb.points, @@ -274,26 +279,51 @@ generateRandomSp <- function(raster.stack, valid.cells <- valid.cells * (tmp.rast > 0.05) } message(" - Calculating species suitability\n") - results <- generateSpFromFun(raster.stack, parameters, species.type = species.type, plot = FALSE) + results <- generateSpFromFun(raster.stack, + parameters, + rescale = rescale, + species.type = species.type, + plot = FALSE, + rescale.each.response = rescale.each.response) } - if(convert.to.PA == TRUE) - { + if(convert.to.PA == TRUE) { message(" - Converting into Presence - Absence\n") - results <- convertToPA(results, - PA.method = PA.method, - alpha = alpha, - beta = beta, - species.prevalence = species.prevalence, - plot = FALSE) - if(plot) plot(stack(results$suitab.raster, results$pa.raster), main = c("Suitability", "Presence-absence")) - } else - { + # Need to adjust alpha to appropriate scale if rescale = FALSE + if(rescale == FALSE) { + if(adjust.alpha) + { + alpha <- diff(c(minValue(results$suitab.raster), + maxValue(results$suitab.raster))) * alpha + } + results <- convertToPA(results, + PA.method = PA.method, + alpha = alpha, + beta = beta, + species.prevalence = species.prevalence, + plot = FALSE) + + if(plot) plot(stack(results$suitab.raster, + results$pa.raster), main = c("Suitability", + "Presence-absence")) + } else { + + results <- convertToPA(results, + PA.method = PA.method, + alpha = alpha, + beta = beta, + species.prevalence = species.prevalence, + plot = FALSE) + + if(plot) plot(stack(results$suitab.raster, results$pa.raster), + main = c("Suitability", "Presence-absence")) + } + } else { if(plot) plot(results$suitab.raster, main = "Suitability") } diff --git a/R/generateSpFromBCA.R b/R/generateSpFromBCA.R index e4fd835..be4eb9a 100644 --- a/R/generateSpFromBCA.R +++ b/R/generateSpFromBCA.R @@ -63,13 +63,13 @@ #' a fraction of the axis: #' \itemize{ #' \item{\code{"any"}: the standard deviations can have values from 1\% to 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 0.1 to 5. +#' then sd values along this axis can range from 0.1 to 5. #' } #' \item{\code{"narrow"}: the standard deviations are limited between 1\% and 10\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 0.1 to 1. +#' then sd values along this axis can range from 0.1 to 1. #' } #' \item{\code{"wide"}: the standard deviations are limited between 10\% and 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 1 to 5. +#' then sd values along this axis can range from 1 to 5. #' } #' } #' If a \code{bca} object is provided, the output bca object will contain the new environments coordinates along the provided bca axes. @@ -86,7 +86,7 @@ #' \item{\code{suitab.raster.future}: the virtual species distribution, as a Raster object containing the #' future environmental suitability} #' } -#' The structure of the virtualspecies object can be seen using str() +#' The structure of the virtualspecies object can be seen using \code{str()} #' #' #' @@ -367,6 +367,6 @@ generateSpFromBCA <- function(raster.stack.current, raster.stack.future, rescale stack.lengths = stack.lengths), suitab.raster.current = suitab.raster.current, suitab.raster.future = suitab.raster.future) - class(results) <- append(class(results), "virtualspecies") + class(results) <- append("virtualspecies", class(results)) return(results) } diff --git a/R/generateSpFromFun.R b/R/generateSpFromFun.R index ddf2553..7c9d668 100644 --- a/R/generateSpFromFun.R +++ b/R/generateSpFromFun.R @@ -14,7 +14,7 @@ #' \code{species.type} #' @param species.type \code{"additive"} or \code{"multiplicative"}. Only used if \code{formula = NULL}. #' Defines how the final environmental suitability is calculated: if \code{"additive"}, responses to each -#' variable are summed; if \code{"multiplicative"}, responses are multiplicated. +#' variable are summed; if \code{"multiplicative"}, responses are multiplied. #' @param rescale.each.response \code{TRUE} or \code{FALSE}. If \code{TRUE}, the individual responses to #' each environmental variable are rescaled between 0 and 1 (see details). #' @param plot \code{TRUE} or \code{FALSE}. If \code{TRUE}, the generated virtual species will be plotted. @@ -24,7 +24,7 @@ #' \item{\code{details}: the details and parameters used to generate the species} #' \item{\code{suitab.raster}: the raster containing the environmental suitability of the virtual species} #' } -#' The structure of the virtualspecies object can be seen using str() +#' The structure of the virtualspecies object can be seen using \code{str()} #' @seealso \code{\link{generateSpFromPCA}} to generate a virtual species with a PCA approach #' @details #' This functions proceeds into several steps: @@ -57,8 +57,8 @@ #' Any response function that can be applied to the environmental variables can #' be chosen here. Several functions are proposed in this package: #' \code{\link{linearFun}}, \code{\link{logisticFun}} and \code{\link{quadraticFun}}. -#' Another classical example is the normal distribution: \code{\link[stats]{dnorm}}. -#' Ther users can also create and use their own functions. +#' Another classical example is the normal distribution: \code{\link[stats:Normal]{stats::dnorm()}}. +#' Users can also create and use their own functions very easily. #' #' #' If \code{rescale.each.response = TRUE}, then the probability response to each @@ -238,7 +238,7 @@ generateSpFromFun <- function(raster.stack, parameters, plot(results$suitab.raster, main = "Environmental suitability of the virtual species") } - class(results) <- append(class(results), "virtualspecies") + class(results) <- append("virtualspecies", class(results)) return(results) } diff --git a/R/generateSpFromPCA.R b/R/generateSpFromPCA.R index f7be498..152dc60 100644 --- a/R/generateSpFromPCA.R +++ b/R/generateSpFromPCA.R @@ -50,13 +50,13 @@ #' a fraction of the axis: #' \itemize{ #' \item{\code{"any"}: the standard deviations can have values from 1\% to 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 0.1 to 5. +#' then sd values along this axis can range from 0.1 to 5. #' } #' \item{\code{"narrow"}: the standard deviations are limited between 1\% and 10\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 0.1 to 1. +#' then sd values along this axis can range from 0.1 to 1. #' } #' \item{\code{"wide"}: the standard deviations are limited between 10\% and 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -#' then sd values along this axe can range from 1 to 5. +#' then sd values along this axis can range from 1 to 5. #' } #' } #' @import raster @@ -76,7 +76,7 @@ #' \item{\code{suitab.raster}: the virtual species distribution, as a Raster object containing the #' environmental suitability} #' } -#' The structure of the virtualspecies object can be seen using str() +#' The structure of the virtualspecies object can be seen using \code{str()} #' @examples #' # Create an example stack with four environmental variables #' a <- matrix(rep(dnorm(1:100, 50, sd = 25)), @@ -293,7 +293,7 @@ generateSpFromPCA <- function(raster.stack, rescale = TRUE, niche.breadth = "any means = means, sds = sds), suitab.raster = suitab.raster) - class(results) <- append(class(results), "virtualspecies") + class(results) <- append("virtualspecies", class(results)) return(results) } diff --git a/R/genericfunctions.R b/R/genericfunctions.R index fb956b1..dbfd1fc 100644 --- a/R/genericfunctions.R +++ b/R/genericfunctions.R @@ -160,4 +160,57 @@ plot.virtualspecies <- function(x, ...) } x <- y plot(x, ...) +} + +#' @export +#' @method print VSSampledPoints +print.VSSampledPoints <- function(x, ...) +{ + # Next line is to ensure retrocompatibility with earlier versions of + # virtualspecies where no print function was designed for VSSampledPoints + if(!is.list(x$detection.probability)) + { + print(x) + } else + { + cat(paste("Occurrence points sampled from a virtual species")) + cat(paste("\n\n- Number of points:", nrow(x$sample.points))) + if(length(x$bias)) + { + cat("\n- Sampling bias: ") + cat(paste("\n .Bias type:", + x$bias$bias)) + cat(paste("\n .Bias strength:", + x$bias$bias.strength)) + } else + { + cat("\n- No sampling bias") + } + cat(paste0("\n- Detection probability: ")) + cat(paste0("\n .Probability: ", x$detection.probability$detection.probability)) + cat(paste0("\n .Corrected by suitability: ", x$detection.probability$correct.by.suitability)) + cat(paste0("\n- Probability of identification error (false positive): ", x$error.probability)) + if(length(x$sample.prevalence)) + { + cat(paste0("\n- Sample prevalence: ")) + cat(paste0("\n .True:", x$sample.prevalence["true.sample.prevalence"])) + cat(paste0("\n .Observed:", x$sample.prevalence["observed.sample.prevalence"])) + } + cat(paste0("\n- Multiple samples can occur in a single cell: ", + ifelse(x$replacement, "Yes", "No"))) + cat("\n\n") + print(x$sample.points) + } +} + +#' @export +#' @method str VSSampledPoints +str.VSSampledPoints <- function(object, ...) +{ + args <- list(...) + if(is.null(args$max.level)) + { + args$max.level <- 2 + } + NextMethod("str", object = object, max.level = args$max.level) } \ No newline at end of file diff --git a/R/limitDistribution.R b/R/limitDistribution.R index c5c39a2..700a80a 100644 --- a/R/limitDistribution.R +++ b/R/limitDistribution.R @@ -83,7 +83,7 @@ #' \item{\code{occupied.area}: the area occupied by the virtual species as a #' Raster of presence-absence} #' } -#' The structure of the virtualspecies object can be seen using str() +#' The structure of the virtualspecies object can be seen using \code{str()} #' @export #' @import raster #' @importFrom utils installed.packages diff --git a/R/plotResponse.R b/R/plotResponse.R index 3ae7b2d..4c291a2 100644 --- a/R/plotResponse.R +++ b/R/plotResponse.R @@ -28,7 +28,7 @@ #' #' Otherwise, you can provide a raster layer/stack of environmental variables to #' \code{x} and a list of functions to \code{parameters} to perform the plot. -#' In that case, you have to specifiy the \code{approach}: \code{"reponse"} or +#' In that case, you have to specify the \code{approach}: \code{"reponse"} or #' \code{"PCA"}: #' \itemize{ #' \item{if \code{approach = "response"}: Provide to \code{parameters} a diff --git a/R/removeCollinearity.R b/R/removeCollinearity.R index beb7b68..82c55f9 100644 --- a/R/removeCollinearity.R +++ b/R/removeCollinearity.R @@ -1,8 +1,8 @@ #' Remove collinearity among variables of a a raster stack #' -#' This functions analyses the correlation among variables of the provideded +#' This functions analyses the correlation among variables of the provided #' stack of environmental variables (using Pearson's R), and can return a -#' vector containing names of variables that are not intercorrelated, or a list +#' vector containing names of variables that are not colinear, or a list #' containing grouping variables according to their degree of collinearity. #' #' @param raster.stack a RasterStack object, in which each layer represent an environmental @@ -34,9 +34,9 @@ #' according to their degree of correlation. #' #' If \code{select.variables = TRUE}, then the function will return a vector -#' containing variables that are not intercorrelated. +#' containing variables that are not colinear. #' The variables not correlated to any other variables are automatically included -#' in this vector. For each group of intercorrelated variables, one variable will +#' in this vector. For each group of colinear variables, one variable will #' be randomly chosen and included in this vector. #' #' diff --git a/R/sampleOccurrences.R b/R/sampleOccurrences.R index 207a5f2..774bae8 100644 --- a/R/sampleOccurrences.R +++ b/R/sampleOccurrences.R @@ -1,55 +1,68 @@ #' Sample occurrences in a virtual species distribution #' #' @description -#' This function samples presences within a species distribution, either -#' randomly or with a sampling bias. The sampling bias can be defined manually -#' or with a set of pre-defined biases. +#' This function samples occurrences/records (presence only or presence-absence) +#' within a species distribution, either randomly or with a sampling bias. +#' The sampling bias can be defined manually or with a set of predefined +#' biases. #' #' @param x a \code{rasterLayer} object or the output list from -#' \code{generateSpFromFun}, \code{generateSpFromPCA}, \code{generateRandomSp}, \code{convertToPA} +#' \code{generateSpFromFun}, \code{generateSpFromPCA}, \code{generateRandomSp}, +#' \code{convertToPA} #' or \code{limitDistribution} #' The raster must contain values of 0 or 1 (or NA). -#' @param n an integer. The number of occurrence points to sample. +#' @param n an integer. The number of occurrence points / records to sample. #' @param type \code{"presence only"} or \code{"presence-absence"}. The type of #' occurrence points to sample. -#' @param sampling.area a character string, a \code{polygon} or an \code{extent} object. +#' @param sampling.area a character string, a \code{polygon} or an \code{extent} +#' object. #' The area in which the sampling will take place. See details. -#' @param detection.probability a numeric value between 0 and 1, corresponding to the -#' probability of detection of the species. See details. -#' @param correct.by.suitability \code{TRUE} or \code{FALSE}. If \code{TRUE}, then -#' the probability of detection will be weighted by the suitability, such that -#' cells with lower suitabilities will further decrease the chance that the species -#' is detected when sampled. -#' @param error.probability \code{TRUE} or \code{FALSE}. Only useful if -#' \code{type = "presence-absence"}. Probability to attribute an erroneous presence -#' in cells where the species is absent. +#' @param detection.probability a numeric value between 0 and 1, corresponding +#' to the probability of detection of the species. See details. +#' @param correct.by.suitability \code{TRUE} or \code{FALSE}. If \code{TRUE}, +#' then the probability of detection will be weighted by the suitability, such +#' that cells with lower suitabilities will further decrease the chance that +#' the species is detected when sampled. NOTE: this will NOT increase +#' likelihood of samplings in areas of high suitability. In this case look for +#' argument weights. +#' @param error.probability \code{TRUE} or \code{FALSE}. Probability to +#' attribute an erroneous presence (False Positive) in cells where the species +#' is actually absent. #' @param bias \code{"no.bias"}, \code{"country"}, \code{"region"}, #' \code{"extent"}, \code{"polygon"} or \code{"manual"}. The method used to #' generate a sampling bias: see details. -#' @param bias.strength a positive numeric value. The strength of the bias to be applied -#' in \code{area} (as a multiplier). Above 1, \code{area} will be oversampled. Below 1, \code{area} -#' will be undersampled. -#' @param bias.area \code{NULL}, a character string, a \code{polygon} or an \code{extent} object. -#' The area in which the sampling will be biased: see details. If \code{NULL} -#' and \code{bias = "extent"}, then you will be asked to draw an -#' extent on the map. -#' @param weights \code{NULL} or a raster layer. Only used if \code{bias = "manual"}. -#' The raster of bias weights to be applied to the sampling of occurrences. -#' Higher weights mean a higher probability of sampling. -#' @param sample.prevalence \code{NULL} or a numeric value between 0 and 1. Only useful if -#' \code{type = "presence-absence"}. Defines the sample prevalence, i.e. the proportion of presences -#' sampled. Note that the probabilities of detection and error are applied AFTER this parameter, -#' so the final sample prevalence may not different if you apply probabilities of detection and/or error -#' @param plot \code{TRUE} or \code{FALSE}. If \code{TRUE}, the sampled occurrence -#' points will be plotted. +#' @param bias.strength a positive numeric value. The strength of the bias to be +#' applied in \code{area} (as a multiplier). Above 1, \code{area} will be +#' oversampled. Below 1, \code{area} will be undersampled. +#' @param bias.area \code{NULL}, a character string, a \code{polygon} or an +#' \code{extent} object. The area in which the sampling will be biased: see +#' details. If \code{NULL} and \code{bias = "extent"}, then you will be asked to +#' draw an extent on the map. +#' @param weights \code{NULL} or a raster layer. Only used if +#' \code{bias = "manual"}. The raster of bias weights to be applied to the +#' sampling of occurrences. Higher weights mean a higher probability of +#' sampling. For example, species suitability raster can be entered here to +#' increase likelihood of sampling occurrences in areas with high suitability. +#' @param sample.prevalence \code{NULL} or a numeric value between 0 and 1. +#' Only useful if \code{type = "presence-absence"}. Defines the sample +#' prevalence, i.e. the proportion of presences sampled. Note that the +#' probabilities of detection and error are applied AFTER this parameter, +#' so the final sample prevalence may not different if you apply probabilities +#' of detection and/or error +#' @param replacement \code{TRUE} or \code{FALSE}. If \code{TRUE}, multiple +#' samples can occur in the same cell. Can be useful to mimic real datasets +#' where samplings can be duplicated or repeated in time. +#' @param plot \code{TRUE} or \code{FALSE}. If \code{TRUE}, the sampled +#' occurrence points will be plotted. #' @details #' \bold{How the function works:} #' -#' The function randomly selects \code{n} cells in which samples occur. If a \code{bias} -#' is chosen, then the selection of these cells will be biased according to the type and -#' strength of bias chosen. If the sampling is of \code{type "presence only"}, then only -#' cells where the species is present will be chosen. If the sampling is of -#' \code{type "presence-absence"}, then all non-NA cells can be chosen. +#' The function randomly selects \code{n} cells in which samples occur. If a +#' \code{bias} is chosen, then the selection of these cells will be biased +#' according to the type and strength of bias chosen. If the sampling is of +#' \code{type "presence only"}, then only cells where the species is present +#' will be chosen. If the sampling is of \code{type "presence-absence"}, then +#' all non-NA cells can be chosen. #' #' The function then samples the species inside the chosen cells. In cells #' where the species is present the species will always be sampled unless @@ -57,24 +70,25 @@ #' species will be sampled with the associated probability of detection. #' #' In cells where the species is absent (in case of a \code{"presence-absence"} -#' sampling), the function will always assign absence unless \code{error.probability} -#' is greater than 1. In that case, the species can be found present with the -#' associated probability of error. Note that this step happens AFTER the detection -#' step. Hence, in cells where the species is present but not detected, it can -#' still be sampled due to a sampling error. +#' sampling), the function will always assign absence unless +#' \code{error.probability} is greater than 1. In that case, the species can be +#' found present with the associated probability of error. Note that this step +#' happens AFTER the detection step. Hence, in cells where the species is +#' present but not detected, it can still be sampled due to a sampling error. #' #' \bold{How to restrict the sampling area:} #' #' Use the argument \code{sampling.area}: #' \itemize{ -#' \item{Provide the name (s) (or a combination of names) of country(ies), region(s) or continent(s). +#' \item{Provide the name (s) (or a combination of names) of country(ies), +#' region(s) or continent(s). #' Examples: #' \itemize{ #' \item{\code{sampling.area = "Africa"}} #' \item{\code{sampling.area = c("Africa", "North America", "France")}} #' }} -#' \item{Provide a polygon (\code{SpatialPolygons} or \code{SpatialPolygonsDataFrame} -#' of package \code{sp})} +#' \item{Provide a polygon (\code{SpatialPolygons} or +#' \code{SpatialPolygonsDataFrame} of package \code{sp})} #' \item{Provide an \code{extent} object} #' } #' @@ -100,7 +114,8 @@ #' #' List of possible \code{bias.area} names: #' \itemize{ -#' \item{Countries: type \code{levels(getMap()@@data$SOVEREIGNT)} in the console} +#' \item{Countries: type \code{levels(getMap()@@data$SOVEREIGNT)} in the +#' console} #' \item{Regions: "Africa", "Antarctica", "Asia", "Australia", "Europe", #' "North America", "South America"} #' \item{Continents: "Africa", "Antarctica", "Australia", "Eurasia", @@ -114,31 +129,60 @@ #' \item{an extent object: #' #' Set \code{bias} to \code{"extent"}, and either provide your -#' extent object to \code{bias.area}, or leave it \code{NULL} to draw an extent on -#' the map.} +#' extent object to \code{bias.area}, or leave it \code{NULL} to draw an extent +#' on the map.} #' } #' -#' Otherwise you can define manually your sampling bias, \emph{e.g.} to create -#' biases along roads. In that case you have to provide to \code{weights} a raster layer in which -#' each cell contains the probability to be sampled. -#' @return a \code{list} with 3 (unbiased sampling) to 4 (biased sampling) elements: +#' Otherwise you can enter a raster of sampling probability. It can be useful +#' if you want to increase likelihood of samplings in areas of high +#' suitability (simply enter the suitability raster in weights; see examples +#' below), +#' or if you want to define sampling biases manually, \emph{e.g.} to to create +#' biases along roads. In that case you have to provide to \code{weights} a +#' raster layer in which each cell contains the probability to be sampled. +#' +#' #' The \code{\link{.Random.seed}} and \code{\link{RNGkind}} are stored as +#' \code{\link{attributes}} when the function is called, and can be used to +#' reproduce the results as shown in the examples (though +#' it is preferable to set the seed with \code{\link{set.seed}} before calling +#' \code{sampleRecords()} and to then use the same value in +#' \code{\link{set.seed}} to reproduce results later. Note that +#' reproducing the sampling will only work if the same original distribution map +#' is used. +#' +#' @return a \code{list} with 7 elements: #' \itemize{ #' \item{\code{sample.points}: the data.frame containing the coordinates of -#' samples, the real presence-absences (or presence-only) and the sampled presence- -#' absences} +#' samples, the real presence-absences (or presence-only) and the sampled +#' presence-absences} #' \item{\code{detection.probability}: the chosen probability of detection of #' the virtual species} #' \item{\code{error.probability}: the chosen probability to assign presence #' in cells where the species is absent} #' \item{\code{bias}: if a bias was chosen, then the type of bias and the #' associated \code{area} will be included.} +#' \item{\code{replacement}: indicates whether multiple samples could occur +#' in the same cells} +#' \item{\code{original.distribution.raster}: the distribution raster from +#' which samples were drawn} +#' \item{\code{sample.plot}: a recorded plot showing the sampled points +#' overlaying the original distribution.} #' } +#' @note +#' Setting \code{sample.prevalence} may at least partly +#' override \code{bias}, e.g. if \code{bias} is specified with \code{extent} to +#' an area that contains no presences, but sample prevalence is set to > 0, +#' then cells outside of the biased sampling extent will be sampled until +#' the number of presences required by \code{sample.prevalence} are obtained, +#' after which the sampling of absences will proceed according to the specified +#' bias. #' @export #' @import raster -#' @importFrom utils installed.packages #' @importFrom graphics points +#' @importFrom utils installed.packages #' @author #' Boris Leroy \email{leroy.boris@@gmail.com} +#' Willson Gaul \email{wgaul@@hotmail.com} #' #' with help from C. N. Meynard, C. Bellard & F. Courchamp #' @examples @@ -159,9 +203,10 @@ #' # Sampling of 25 presences #' sampleOccurrences(sp, n = 25) #' -#' # Sampling of 30 presences and absebces +#' # Sampling of 30 presences and absences #' sampleOccurrences(sp, n = 30, type = "presence-absence") #' +#' #' # Reducing of the probability of detection #' sampleOccurrences(sp, n = 30, type = "presence-absence", #' detection.probability = 0.5) @@ -171,6 +216,7 @@ #' detection.probability = 0.5, #' correct.by.suitability = TRUE) #' +#' #' # Creating sampling errors (far too much) #' sampleOccurrences(sp, n = 30, type = "presence-absence", #' error.probability = 0.5) @@ -190,7 +236,35 @@ #' bias.strength = 0, #' bias.area = biased.area) #' # Showing the area in which the sampling is biased -#' plot(biased.area, add = TRUE) +#' plot(biased.area, add = TRUE) +#' samps <- sampleOccurrences(sp, n = 50, +#' bias = "manual", +#' weights = sp$suitab.raster) +#' plot(sp$suitab.raster) +#' points(samps$sample.points[, c("x", "y")]) +#' +#' # Create a sampling bias so that more presences are sampled in areas with +#' # higher suitability +#' +#' +#' +#' +#' # Reproduce sampling based on the saved .Random.seed from a previous result +#' samps <- sampleOccurrences(sp, n = 100, +#' type = "presence-absence", +#' detection.probability = 0.7, +#' bias = "extent", +#' bias.strength = 50, +#' bias.area = biased.area) +#' # Reset the random seed using the value saved in the attributes +#' .Random.seed <- attr(samps, "seed") +#' reproduced_samps <- sampleOccurrences(sp, n = 100, +#' type = "presence-absence", +#' detection.probability = 0.7, +#' bias = "extent", +#' bias.strength = 50, +#' bias.area = biased.area) +#' identical(samps$sample.points, reproduced_samps$sample.points) @@ -205,9 +279,21 @@ sampleOccurrences <- function(x, n, bias.area = NULL, weights = NULL, sample.prevalence = NULL, + replacement = FALSE, plot = TRUE) { - results <- list() + results <- list(detection.probability = list( + detection.probability = detection.probability, + correct.by.suitability = correct.by.suitability), + error.probability = error.probability, + bias = NULL, + replacement = replacement, + original.distribution.raster = NULL, + sample.plot = NULL) + if(is.null(.Random.seed)) {stats::runif(1)} # initialize random seed if there is none + attr(results, "RNGkind") <- RNGkind() + attr(results, "seed") <- .Random.seed + if("virtualspecies" %in% class(x)) { @@ -217,13 +303,15 @@ sampleOccurrences <- function(x, n, } else if("RasterLayer" %in% class(x$pa.raster)) { sp.raster <- x$pa.raster - } else stop("x must be:\n- a raster layer object\nor\n- the output list from functions - generateRandomSp(), convertToPA() or limitDistribution()") + } else stop("x must be:\n- a raster layer object\nor\n- the output list from + functions generateRandomSp(), convertToPA() or + limitDistribution()") } else if ("RasterLayer" %in% class(x)) { sp.raster <- x - } else stop("x must be:\n- a raster layer object\nor\n- the output list from functions - generateRandomSp(), convertToPA() or limitDistribution()") + } else stop("x must be:\n- a raster layer object\nor\n- the output list from + functions generateRandomSp(), convertToPA() or + limitDistribution()") if(sp.raster@data@max > 1 | sp.raster@data@min < 0) { @@ -231,7 +319,8 @@ sampleOccurrences <- function(x, n, Please make sure that the provided raster is a correct P/A raster and not a suitability raster.") } - original.raster <- sp.raster + results$original.distribution.raster <- original.raster <- sp.raster + if(!is.null(sample.prevalence)) { @@ -252,10 +341,13 @@ sampleOccurrences <- function(x, n, { stop("The choosen sampling.area is incorrectly spelled.\n Type 'levels(getMap()@data$SOVEREIGNT)', 'levels(worldmap@data$REGION)' and levels(worldmap@data$continent) to obtain valid names.") } - sampling.area <- worldmap[which(worldmap@data$SOVEREIGNT %in% sampling.area | - worldmap@data$REGION %in% sampling.area | - worldmap@data$continent %in% sampling.area), ] - } else if(!(class(sampling.area) %in% c("SpatialPolygons", "SpatialPolygonsDataFrame", "Extent"))) + sampling.area <- worldmap[which( + worldmap@data$SOVEREIGNT %in% sampling.area | + worldmap@data$REGION %in% sampling.area | + worldmap@data$continent %in% sampling.area), ] + } else if(!(class(sampling.area) %in% c("SpatialPolygons", + "SpatialPolygonsDataFrame", + "Extent"))) { stop("Please provide to sampling.area either \n - the names of countries, region and/or continents in which to sample\n @@ -265,12 +357,12 @@ sampleOccurrences <- function(x, n, } sample.area.raster1 <- rasterize(sampling.area, - sp.raster, - field = 1, - background = NA, + sp.raster, + field = 1, + background = NA, silent = TRUE) sp.raster <- sp.raster * sample.area.raster1 - } + } if(correct.by.suitability) @@ -280,14 +372,16 @@ sampleOccurrences <- function(x, n, stop("If you choose to weight the probability of detection by the suitability of the species (i.e., correct.by.suitability = TRUE), then you need to provide an appropriate virtual species containing a suitability raster to x.") } - } + } - if(!is.numeric(detection.probability) | detection.probability > 1 | detection.probability < 0) + if(!is.numeric(detection.probability) | detection.probability > 1 | + detection.probability < 0) { stop("detection.probability must be a numeric value between 0 and 1") } - if(!is.numeric(error.probability) | error.probability > 1 | error.probability < 0) + if(!is.numeric(error.probability) | error.probability > 1 | + error.probability < 0) { stop("error.probability must be a numeric value between 0 and 1") } @@ -297,9 +391,11 @@ sampleOccurrences <- function(x, n, stop('Only one bias can be applied at a time') } - if (!(bias %in% c("no.bias", "country", "region", "continent", "extent", "polygon", "manual"))) + if (!(bias %in% c("no.bias", "country", "region", "continent", "extent", + "polygon", "manual"))) { - stop('Argument bias must be one of : "no.bias", "country", "region", "continent", "extent", "polygon", "manual"') + stop('Argument bias must be one of : "no.bias", "country", "region", + "continent", "extent", "polygon", "manual"') } if(!is.numeric(bias.strength) & bias != "no.bias") @@ -311,7 +407,8 @@ sampleOccurrences <- function(x, n, { if(!("rworldmap" %in% rownames(installed.packages()))) { - stop('You need to install the package "rworldmap" in order to use bias = "region" or bias = "country"') + stop('You need to install the package "rworldmap" in order to use bias = + "region" or bias = "country"') } worldmap <- rworldmap::getMap() @@ -319,7 +416,8 @@ sampleOccurrences <- function(x, n, { if (any(!(bias.area %in% levels(worldmap@data$SOVEREIGNT)))) { - stop("country name(s) must be correctly spelled. Type 'levels(getMap()@data$SOVEREIGNT)' to obtain valid names.") + stop("country name(s) must be correctly spelled. Type + 'levels(getMap()@data$SOVEREIGNT)' to obtain valid names.") } results$bias <- list(bias = bias, bias.strength = bias.strength, @@ -328,8 +426,10 @@ sampleOccurrences <- function(x, n, { if (any(!(bias.area %in% levels(worldmap@data$REGION)))) { - stop(paste("region name(s) must be correctly spelled, according to one of the following : ", - paste(levels(worldmap@data$REGION), collapse = ", "), sep = "\n")) + stop(paste("region name(s) must be correctly spelled, according to one + of the following : ", + paste(levels(worldmap@data$REGION), collapse = ", "), + sep = "\n")) } results$bias <- list(bias = bias, bias.strength = bias.strength, @@ -338,28 +438,36 @@ sampleOccurrences <- function(x, n, { if (any(!(bias.area %in% levels(worldmap@data$continent)))) { - stop(paste("region name(s) must be correctly spelled, according to one of the following : ", - paste(levels(worldmap@data$continent), collapse = ", "), sep = "\n")) + stop(paste("region name(s) must be correctly spelled, + according to one of the following : ", + paste(levels(worldmap@data$continent), collapse = ", "), + sep = "\n")) } results$bias <- list(bias = bias, bias.strength = bias.strength, bias.area = bias.area) } } - if (bias == "polygon") # Projections are not checked here. Perhaps we should add projection check between raster & polygon in the future? - # This is especially important given that randomPoints weights samplings by the cell area (because cells closer to - # the equator are larger) + if (bias == "polygon") # Projections are not checked here. Perhaps we should + # add projection check between raster & polygon in the future? + # This is especially important given that randomPoints weights samplings by + # the cell area (because cells closer to the equator are larger) { - if(!(class(bias.area) %in% c("SpatialPolygons", "SpatialPolygonsDataFrame"))) + if(!(class(bias.area) %in% c("SpatialPolygons", + "SpatialPolygonsDataFrame"))) { - stop("If you choose bias = 'polygon', please provide a polygon of class SpatialPolygons or SpatialPolygonsDataFrame to argument bias.area") + stop("If you choose bias = 'polygon', please provide a polygon of class + SpatialPolygons or SpatialPolygonsDataFrame to argument bias.area") } - warning("Polygon projection is not checked. Please make sure you have the same projections between your polygon and your presence-absence raster") + + warning("Polygon projection is not checked. Please make sure you have the + same projections between your polygon and your presence-absence + raster") results$bias <- list(bias = bias, bias.strength = bias.strength, bias.area = bias.area) } - + if (bias == "extent") { results$bias <- list(bias = bias, @@ -370,6 +478,7 @@ sampleOccurrences <- function(x, n, if(type == "presence-absence") { sample.raster <- sp.raster + # Set all non-NA cells to 1 so that they are included in p/a sampling sample.raster[!is.na(sample.raster)] <- 1 } else if (type == "presence only") { @@ -380,10 +489,12 @@ sampleOccurrences <- function(x, n, { if(!("RasterLayer" %in% class(weights))) { - stop("You must provide a raster layer of weights (to argument weights) if you choose bias == 'manual'") + stop("You must provide a raster layer of weights (to argument weights) + if you choose bias == 'manual'") } - bias.raster <- weights * sample.raster + bias.raster <- weights results$bias <- list(bias = bias, + bias.strength = "Defined by raster weights", weights = weights) } else { @@ -392,27 +503,30 @@ sampleOccurrences <- function(x, n, if(bias == "country") { - bias.raster1 <- rasterize(worldmap[which(worldmap@data$SOVEREIGNT %in% bias.area), ], - bias.raster, - field = bias.strength, - background = 1, - silent = TRUE) + bias.raster1 <- rasterize( + worldmap[which(worldmap@data$SOVEREIGNT %in% bias.area), ], + bias.raster, + field = bias.strength, + background = 1, + silent = TRUE) bias.raster <- bias.raster * bias.raster1 } else if(bias == "region") { - bias.raster1 <- rasterize(worldmap[which(worldmap@data$REGION %in% bias.area), ], - bias.raster, - field = bias.strength, - background = 1, - silent = TRUE) + bias.raster1 <- rasterize( + worldmap[which(worldmap@data$REGION %in% bias.area), ], + bias.raster, + field = bias.strength, + background = 1, + silent = TRUE) bias.raster <- bias.raster * bias.raster1 } else if(bias == "continent") { - bias.raster1 <- rasterize(worldmap[which(levels(worldmap@data$continent) %in% bias.area), ], - bias.raster, - field = bias.strength, - background = 1, - silent = TRUE) + bias.raster1 <- rasterize( + worldmap[which(levels(worldmap@data$continent) %in% bias.area), ], + bias.raster, + field = bias.strength, + background = 1, + silent = TRUE) bias.raster <- bias.raster * bias.raster1 } else if(bias == "extent") { @@ -422,8 +536,11 @@ sampleOccurrences <- function(x, n, plot(sp.raster) bias.area <- drawExtent(show = TRUE) } - bias.raster <- bias.raster * rasterize(bias.area, sp.raster, field = bias.strength, background = 1) + bias.raster <- bias.raster * rasterize(bias.area, sp.raster, + field = bias.strength, + background = 1) results$bias <- list(bias = bias, + bias.strength = bias.strength, bias.area = bias.area) } else if(bias == "polygon") { @@ -434,28 +551,53 @@ sampleOccurrences <- function(x, n, silent = TRUE) bias.raster <- bias.raster * bias.raster1 } - + if(bias != "no.bias") { if(type == "presence only") { - sample.points <- dismo::randomPoints(sample.raster * bias.raster, n = n, prob = TRUE, tryf = 1) + + number.errors <- stats::rbinom(n = 1, size = 50, prob = error.probability) + sample.points <- .randomPoints(sample.raster * bias.raster, + n = number.errors, + prob = TRUE, tryf = 1, + replaceCells = replacement) + sample.points <- rbind(sample.points, + .randomPoints(sample.raster * bias.raster, + n = n - number.errors, + prob = TRUE, tryf = 1, + replaceCells = replacement)) } else { if(is.null(sample.prevalence)) { - sample.points <- dismo::randomPoints(sample.raster * bias.raster, n = n, prob = TRUE, tryf = 1) + sample.points <- .randomPoints(sample.raster * bias.raster, n = n, + prob = TRUE, tryf = 1, + replaceCells = replacement) } else - { - tmp1 <- sample.points + { + if(replacement) + { + message("Argument replacement = TRUE implies that sample prevalence + may not necessarily be respected in geographical space. Sample + prevalence will be respected, but multiple samples can occur + in the same cell so that spatial prevalence will be different. + ") + } + tmp1 <- sample.raster tmp1[sp.raster != 1] <- NA - sample.points <- dismo::randomPoints(tmp1 * bias.raster, n = sample.prevalence * n, prob = TRUE, tryf = 1) + sample.points <- .randomPoints(tmp1 * bias.raster, + n = sample.prevalence * n, + prob = TRUE, tryf = 1, + replaceCells = replacement) tmp1 <- sample.raster tmp1[sp.raster != 0] <- NA sample.points <- rbind(sample.points, - dismo::randomPoints(tmp1 * bias.raster, n = (1 - sample.prevalence) * n, prob = TRUE, - tryf = 1)) + .randomPoints(tmp1 * bias.raster, + n = (1 - sample.prevalence) * n, + prob = TRUE, tryf = 1, + replaceCells = replacement)) rm(tmp1) } } @@ -463,21 +605,37 @@ sampleOccurrences <- function(x, n, { if(type == "presence only") { - sample.points <- dismo::randomPoints(sample.raster, n = n, prob = TRUE, tryf = 1) + + number.errors <- stats::rbinom(n = 1, size = n, prob = error.probability) + sample.points <- .randomPoints(sample.raster, n = number.errors, + prob = TRUE, tryf = 1, + replaceCells = replacement) + sample.points <- rbind(sample.points, + .randomPoints(sample.raster, n = n - number.errors, + prob = TRUE, tryf = 1, + replaceCells = replacement)) + } else { if(is.null(sample.prevalence)) { - sample.points <- dismo::randomPoints(sample.raster, n = n, prob = TRUE, tryf = 1) + sample.points <- .randomPoints(sample.raster, n = n, + prob = TRUE, tryf = 1, + replaceCells = replacement) } else { tmp1 <- sample.raster tmp1[sp.raster != 1] <- NA - sample.points <- dismo::randomPoints(tmp1, n = sample.prevalence * n, prob = TRUE, tryf = 1) + sample.points <- .randomPoints(tmp1, n = sample.prevalence * n, + prob = TRUE, tryf = 1, + replaceCells = replacement) tmp1 <- sample.raster tmp1[sp.raster != 0] <- NA sample.points <- rbind(sample.points, - dismo::randomPoints(tmp1, n = (1 - sample.prevalence) * n, prob = TRUE, tryf = 1)) + .randomPoints(tmp1, + n = (1 - sample.prevalence) * n, + prob = TRUE, tryf = 1, + replaceCells = replacement)) rm(tmp1) } } @@ -485,81 +643,86 @@ sampleOccurrences <- function(x, n, if(type == "presence only") { - if(error.probability != 0) - { - warning("The error probability has no impact when sampling 'presence only' - points, because these samplings occur only within the boundaries of the species range") - } sample.points <- data.frame(sample.points, - Real = 1, - Observed = sample(c(NA, 1), - size = nrow(sample.points), - prob = c(1 - detection.probability, - detection.probability), - replace = TRUE)) - } else if(type == "presence-absence") - { - sample.points <- data.frame(sample.points, - Real = extract(sp.raster, sample.points)) - - if(correct.by.suitability) - { - suitabs <- extract(x$suitab.raster, sample.points[, c("x", "y")]) - } else { suitabs <- rep(1, nrow(sample.points)) } - - sample.points$Observed <- NA - if(correct.by.suitability) - { - sample.points$Observed[which(sample.points$Real == 1)] <- - sapply(detection.probability * suitabs[which(sample.points$Real == 1)], - function(y) - { - sample(c(0, 1), - size = 1, - prob = c(1 - y, y)) - }) - } else + Real = extract(sp.raster, sample.points), + Observed = sample( + c(NA, 1), + size = nrow(sample.points), + prob = c(1 - detection.probability, + detection.probability), + replace = TRUE)) + } else if(type == "presence-absence") { - sample.points$Observed[which(sample.points$Real == 1)] <- - sample(c(0, 1), size = length(which(sample.points$Real == 1)), - prob = c(1 - detection.probability, detection.probability), + sample.points <- data.frame(sample.points, + Real = extract(sp.raster, sample.points)) + + if(correct.by.suitability) + { + suitabs <- extract(x$suitab.raster, sample.points[, c("x", "y")]) + } else { suitabs <- rep(1, nrow(sample.points)) } + + sample.points$Observed <- NA + if(correct.by.suitability) + { + sample.points$Observed[which(sample.points$Real == 1)] <- + sapply(detection.probability * suitabs[ + which(sample.points$Real == 1) + ], + function(y) + { + sample(c(0, 1), + size = 1, + prob = c(1 - y, y)) + }) + } else + { + sample.points$Observed[which(sample.points$Real == 1)] <- + sample(c(0, 1), size = length(which(sample.points$Real == 1)), + prob = c(1 - detection.probability, detection.probability), + replace = TRUE) + } + sample.points$Observed[which(sample.points$Real == 0 | + sample.points$Observed == 0)] <- + sample(c(0, 1), size = length(which(sample.points$Real == 0 | + sample.points$Observed == 0)), + prob = c(1 - error.probability, error.probability), replace = TRUE) + } - sample.points$Observed[which(sample.points$Real == 0 | sample.points$Observed == 0)] <- - sample(c(0, 1), size = length(which(sample.points$Real == 0 | sample.points$Observed == 0)), - prob = c(1 - error.probability, error.probability), - replace = TRUE) - - } if(plot) { plot(original.raster) + # par(new = TRUE) if(type == "presence only") { points(sample.points[, c("x", "y")], pch = 16, cex = .5) } else { - points(sample.points[sample.points$Observed == 1, c("x", "y")], pch = 16, cex = .8) - points(sample.points[sample.points$Observed == 0, c("x", "y")], pch = 1, cex = .8) + points(sample.points[sample.points$Observed == 1, c("x", "y")], + pch = 16, cex = .8) + # par(new = TRUE) + points(sample.points[sample.points$Observed == 0, c("x", "y")], + pch = 1, cex = .8) } + results$sample.plot <- grDevices::recordPlot() } - - + + results$sample.points <- sample.points - results$detection.probability <- detection.probability - results$error.probability <- error.probability if(type == "presence-absence") { - - true.prev <- length(sample.points$Real[which(sample.points$Real == 1)]) / nrow(sample.points) - obs.prev <- length(sample.points$Real[which(sample.points$Observed == 1)]) / nrow(sample.points) + + true.prev <- length(sample.points$Real[which( + sample.points$Real == 1)]) / nrow(sample.points) + obs.prev <- length(sample.points$Real[which( + sample.points$Observed == 1)]) / nrow(sample.points) results$sample.prevalence <- c(true.sample.prevalence = true.prev, observed.sample.prevalence = obs.prev) } - - class(results) <- append(class(results), "VSSampledPoints") + + class(results) <- append("VSSampledPoints", class(results)) return(results) } \ No newline at end of file diff --git a/R/sp.env.functions.R b/R/sp.env.functions.R index 5761a9d..1005f08 100644 --- a/R/sp.env.functions.R +++ b/R/sp.env.functions.R @@ -142,7 +142,7 @@ custnorm <- function(x, mean, diff, prob) #' @details #' p1 and p2 can be seen as the upper and lower critical threshold of the curve. #' \code{alpha} and \code{gamma} control the shape of the curve near p1 and p2, respectively. -#' When \code{alpha} = \code{gamma}, the curve is symetric. Low values of \code{alpha} and \code{gamma} +#' When \code{alpha} = \code{gamma}, the curve is symmetric. Low values of \code{alpha} and \code{gamma} #' result in smooth (< 1) to plateau (< 0.01) curves. Higher values result in #' peak (> 10) curves. #' @@ -169,28 +169,28 @@ betaFun <- function(x, p1, p2, alpha, gamma) ifelse(x > p1 & x < p2, k * ((x - p1)^alpha) * (p2 - x)^gamma, 0) } -#' Huisman-Olff-Fresco response function -#' -##' @description A Huisman-Olff-Fresco response function: -#' \deqn{P = \frac{1}{{1 + e^{a + b x}}} \frac{1}{1 + e^{c - dx}}}{ -#' P = (1 / (1 + exp(a + bx))) * (1 / (1 + exp(c -dx)))} -#' @param x a numeric value or vector -#' @param a a numeric value or vector -#' @param b a numeric value or vector -#' @param c a numeric value or vector -#' @return a numeric value or vector resulting from the function -#' @seealso \code{\link{linearFun}}, \code{\link{quadraticFun}} -#' @author -#' Boris Leroy \email{leroy.boris@@gmail.com} -#' -#' Maintainer: Boris Leroy \email{leroy.boris@@gmail.com} -#' @examples -#' temp <- seq(-10, 40, length = 100) -#' # A curve similar to a thermal performance curve -#' P <- HOFFun(x = temp, a = -200, b = 10, c = 10, d = 0.1) -#' plot(P ~ temp, type = "l") -#' -#' +# Huisman-Olff-Fresco response function +# +# @description A Huisman-Olff-Fresco response function: +# \deqn{P = \frac{1}{{1 + e^{a + b x}}} \frac{1}{1 + e^{c - dx}}}{ +# P = (1 / (1 + exp(a + bx))) * (1 / (1 + exp(c -dx)))} +# @param x a numeric value or vector +# @param a a numeric value or vector +# @param b a numeric value or vector +# @param c a numeric value or vector +# @return a numeric value or vector resulting from the function +# @seealso \code{\link{linearFun}}, \code{\link{quadraticFun}} +# @author +# Boris Leroy \email{leroy.boris@@gmail.com} +# +# Maintainer: Boris Leroy \email{leroy.boris@@gmail.com} +# @examples +# temp <- seq(-10, 40, length = 100) +# # A curve similar to a thermal performance curve +# P <- HOFFun(x = temp, a = -200, b = 10, c = 10, d = 0.1) +# plot(P ~ temp, type = "l") +# +# # .HOFFun <- function(x, a, b, c, d) # { # if (a == 0 & b == 0) diff --git a/R/utilityFunctions.R b/R/utilityFunctions.R new file mode 100644 index 0000000..c4382ed --- /dev/null +++ b/R/utilityFunctions.R @@ -0,0 +1,187 @@ +# These functions have been duplicated from package dismo (author RJ Hijmans) +# for modifications. +# We duplicated them here because we needed a change in randomPoints +# function, since we use it in a manner that was not initially intended. + + +.randomCellsLonLat <- function(r, n) { + # sampling cells with weights based on their acutal area, to avoid sampling + # too many cells when going towards the poles + # as suggested by Jane Elith + y <- yFromRow(r, 1:nrow(r)) # get y coordinates for all rows + dx <- pointDistance(cbind(0, y), cbind(xres(r), y), longlat=TRUE) # size of each cell in longitude directoin + dx <- dx / max(dx) # standardize to 1 + + row <- sample.int(nrow(r), n, replace = TRUE, prob = dx) # sample from rows using weights + col <- sample.int(ncol(r), n, replace = TRUE) # sample from cols, not using weights + rowcol <- unique(cbind(row, col)) + + maxrow <- pmax(1, round(dx * ncol(r))) + cellsel <- matrix(nrow=0, ncol=2) + for (i in unique(rowcol[,1])) { + a <- subset(rowcol, rowcol[,1]==i) + if (nrow(a) > maxrow[i]) { a <- a[1:maxrow[i]] } + cellsel <- rbind(cellsel, a) + } + + cells <- cellFromRowCol(r, cellsel[,1], cellsel[,2]) + return(cells) +} + + + +..randomCellsLonLat2 <- function(r, n) { + # only cells that are not NA + cells <- which(! is.na(getValues(r)) ) + # which rows are that? + rows <- rowFromCell(r, cells) + # what is the latitude? + y <- yFromRow(r, rows) + # what is the 'width' of a cell? + dx <- pointDistance(cbind(0, y), cbind(xres(r), y), longlat=TRUE) + + cells <- sample(cells, n, prob=dx) + return(cells) +} + + + +.randomPoints <- function(mask, n, p, ext=NULL, extf=1.1, excludep=TRUE, prob=FALSE, + cellnumbers=FALSE, tryf=3, warn=2, lonlatCorrection=TRUE, + replaceCells = FALSE) { + + if (nlayers(mask) > 1) { + mask <- raster(mask, 1) + } + + tryf <- max(round(tryf[1]), 1) + + if (missing(p)) { + excludep <- FALSE + } else { + if (inherits(p, 'SpatialPoints')) { + p <- sp::coordinates(p) + } + } + + if (class(ext)=='character') { + if (! ext %in% c('points')) { + stop("if ext is a character variable it should be 'points'") + } else if (missing(p)) { + warning("if p is missing, 'ext=points' is meaningless") + ext <- extent(mask) + } else { + ext <- extent(min(p[,1]), max(p[,1]), min(p[,2]), max(p[,2])) + } + } + + if (! is.null(ext)) { + ext <- extent(ext) + ext <- ext * extf + ext <- intersect(ext, extent(mask)) + mask2 <- crop(raster(mask), ext) + } else { + mask2 <- raster(mask) + } + + nn <- n * tryf + nn <- max(nn, 10) + + if (prob) { + stopifnot(hasValues(mask)) + cells <- crop(mask, mask2) + cells <- try( stats::na.omit(cbind(1:ncell(cells), getValues(cells)))) + if (class(cells) == 'try-error') { + stop("the raster is too large to be used with 'prob=TRUE'") + } + prob <- cells[,2] + cells <- cells[,1] + if (couldBeLonLat(mask)) { + rows <- rowFromCell(mask2, cells) + y <- yFromRow(mask2, rows) + dx <- pointDistance(cbind(0, y), cbind(xres(mask2), y), longlat=TRUE) + prob <- prob * dx + } + + cells <- sample(cells, nn, prob=prob, replace = replaceCells) + xy <- xyFromCell(mask2, cells) + cells <- cellFromXY(mask, xy) + + } else if (canProcessInMemory(mask2)) { + + cells <- crop(mask, mask2) + if (hasValues(cells)) { + cells <- which(! is.na(getValues(cells)) ) + } else { + cells <- 1:ncell(cells) + } + nn <- min(length(cells), nn) + + if (lonlatCorrection & couldBeLonLat(mask)) { + # which rows are that? + rows <- rowFromCell(mask2, cells) + # what is the latitude? + y <- yFromRow(mask2, rows) + # what is the 'width' of a cell? + dx <- pointDistance(cbind(0, y), cbind(xres(mask2), y), longlat=TRUE) + cells <- sample(cells, nn, prob=dx, replace = replaceCells) + + } else { + cells <- sample(cells, nn, replace = replaceCells) + } + xy <- xyFromCell(mask2, cells) + cells <- cellFromXY(mask, xy) + + } else { + + nn <- min(ncell(mask2), nn) + if (couldBeLonLat(mask2)) { + + cells <- .randomCellsLonLat(mask2, nn) + + } else { + if (nn >= ncell(mask2)) { + cells <- 1:ncell(mask2) + } else { + cells <- sampleInt(ncell(mask2), nn, replace = replaceCells) + } + + } + xy <- xyFromCell(mask2, cells) + cells <- cellFromXY(mask, xy) + + if (hasValues(mask)) { + + vals <- cbind(cells, extract(mask, cells)) + cells <- stats::na.omit(vals)[,1] + } + } + + if (excludep) { + pcells <- cellFromXY(mask, p) + cells <- cells[ ! (cells %in% pcells) ] + } + + if (length(cells) > n) { + + cells <- sample(cells, n) + + } else if (length(cells) < n) { + + frac <- length(cells) / n + if (frac < 0.1) { + stop("generated random points = ", frac," times requested number; Use a higher value for tryf" ) + } + if (frac < 0.5 & warn==1) { + warning("generated random points = ", frac," times requested number; Use a higher value for tryf" ) + } else if (warn > 1) { + warning("generated random points = ", frac," times requested number") + } + } + + if (cellnumbers) { + return(cells) + } else { + return(xyFromCell(mask, cells)) + } +} diff --git a/R/virtualspecies-package.R b/R/virtualspecies-package.R index 6af1f0e..012c928 100644 --- a/R/virtualspecies-package.R +++ b/R/virtualspecies-package.R @@ -2,7 +2,7 @@ #' #' This package allows generating virtual species distributions, for example #' for testing species distribution modelling protocols. For a complete tutorial, -#' see http://borisleroy.com/virtualspecies +#' see borisleroy.com/virtualspecies #' #' #' @details @@ -37,7 +37,7 @@ #' of the virtual species} #' \item{\code{\link{removeCollinearity}}: this function can be used to remove #' collinearity among variables of a stack by selecting a subset of -#' non-intercorrelated variables} +#' non-colinear variables} #' \item{\code{\link{synchroniseNA}}: this function can be used to synchronise #' NA values among layers of a stack} #' } @@ -49,13 +49,12 @@ #' \itemize{ #' \item{This package makes extensive use of the package \code{\link{raster}} to obtain spatialised #' environmental variables, and produce spatialised virtual species distributions.} -#' \item{\code{\link{dismo}} is used to sample occurrence data from the generated virtual species.} #' \item{\code{\link{ade4}} is used to generate species with a PCA approach.} #' \item{\code{\link{rworldmap}} is used to obtain free world shapefiles, in order to create #' dispersal limitations and sampling biases.} #' } #' @references -#' Leroy, B. et al. 2015. virtualspecies, an R package to generate virtual species distributions. Ecography. In press. +#' Leroy, B. et al. 2016. virtualspecies, an R package to generate virtual species distributions. Ecography. 39(6):599-607 #' @author #' Boris Leroy \email{leroy.boris@@gmail.com} #' diff --git a/man/betaFun.Rd b/man/betaFun.Rd index 0d02def..9dd8049 100644 --- a/man/betaFun.Rd +++ b/man/betaFun.Rd @@ -29,7 +29,7 @@ k is automatically estimated to have a maximum value of P equal to 1. \details{ p1 and p2 can be seen as the upper and lower critical threshold of the curve. \code{alpha} and \code{gamma} control the shape of the curve near p1 and p2, respectively. -When \code{alpha} = \code{gamma}, the curve is symetric. Low values of \code{alpha} and \code{gamma} +When \code{alpha} = \code{gamma}, the curve is symmetric. Low values of \code{alpha} and \code{gamma} result in smooth (< 1) to plateau (< 0.01) curves. Higher values result in peak (> 10) curves. diff --git a/man/convertToPA.Rd b/man/convertToPA.Rd index 71fa69d..bba70d2 100644 --- a/man/convertToPA.Rd +++ b/man/convertToPA.Rd @@ -4,8 +4,8 @@ \alias{convertToPA} \title{Convert a virtual species distribution (or a suitability raster) into presence-absence} \usage{ -convertToPA(x, PA.method = "probability", beta = "random", alpha = -0.05, - species.prevalence = NULL, plot = TRUE) +convertToPA(x, PA.method = "probability", beta = "random", + alpha = -0.05, species.prevalence = NULL, plot = TRUE) } \arguments{ \item{x}{a suitability raster, or the output from functions @@ -46,7 +46,7 @@ environmental suitability)} \item{\code{PA.conversion}: the parameters used to convert the suitability into presence-absence} \item{\code{pa.raster}: the presence-absence map, as a Raster object containing 0 (absence) / 1 (presence) / NA} } -The structure of the virtualspecies object can be seen using str() +The structure of the virtualspecies object can be seen using \code{str()} } \description{ This functions converts the probabilities of presence from the output of diff --git a/man/formatFunctions.Rd b/man/formatFunctions.Rd index 220cc72..6c33380 100644 --- a/man/formatFunctions.Rd +++ b/man/formatFunctions.Rd @@ -17,7 +17,7 @@ plots are rescaled between 0 and 1 with the formula (val - min) / (max - min).} } \description{ This function is a helper function to simplify the formatting of functions -for generateSpFromFun. +for \code{\link{generateSpFromFun}} } \details{ This function formats the \code{parameters} argument of \code{\link{generateSpFromFun}}. diff --git a/man/generateRandomSp.Rd b/man/generateRandomSp.Rd index f537b5b..37a20e3 100644 --- a/man/generateRandomSp.Rd +++ b/man/generateRandomSp.Rd @@ -8,8 +8,9 @@ generateRandomSp(raster.stack, approach = "automatic", rescale = TRUE, convert.to.PA = TRUE, relations = c("gaussian", "linear", "logistic", "quadratic"), rescale.each.response = TRUE, realistic.sp = TRUE, species.type = "multiplicative", niche.breadth = "any", - sample.points = FALSE, nb.points = 10000, PA.method = "probability", - alpha = -0.1, beta = "random", species.prevalence = NULL, plot = TRUE) + sample.points = FALSE, nb.points = 10000, + PA.method = "probability", alpha = -0.1, adjust.alpha = TRUE, + beta = "random", species.prevalence = NULL, plot = TRUE) } \arguments{ \item{raster.stack}{a RasterStack object, in which each layer represent an environmental @@ -42,7 +43,7 @@ species. If \code{FALSE}, the responses will be randomly generated \item{species.type}{[response approach] \code{"additive"} or \code{"multiplicative"}. Defines how the final probability of presence is calculated: if \code{"additive"}, responses to each -variable are summed; if \code{"multiplicative"}, responses are multiplicated. +variable are summed; if \code{"multiplicative"}, responses are multiplied. See \code{\link{generateSpFromFun}}} \item{niche.breadth}{[pca approach] \code{"any"}, \code{"narrow"} or \code{"wide"}. This parameter @@ -68,6 +69,10 @@ probabilities are converted according to a logistic function of threshold shape the logistic function transforming occurrences into presence-absences. See \code{\link{logisticFun}} and examples therein for the choice of \code{alpha}} +\item{adjust.alpha}{\code{TRUE} or \code{FALSE}. Only useful if +\code{rescale = FALSE}. If \code{adjust.alpha = TRUE}, then the value of \code{alpha} will +be adjusted to an appropriate value for the range of suitabilities.} + \item{beta}{\code{"random"}, a numeric value in the range of your probabilities or \code{NULL}. This is the threshold of conversion into presence-absence (= the inflexion point if \code{PA.method = "probability"}). @@ -90,7 +95,7 @@ environmental suitability)} \item{\code{PA.conversion}: the parameters used to convert the suitability into presence-absence} \item{\code{pa.raster}: the presence-absence map, as a Raster object containing 0 (absence) / 1 (presence) / NA} } -The structure of the virtualspecies can object be seen using str() +The structure of the virtualspecies can object be seen using \code{str()} } \description{ This function generates randomly a virtual species distribution. @@ -100,7 +105,7 @@ This function generate random virtual species, either using a PCA approach, or u a response approach. In case of a response approach, only four response functions are currently used: gaussian, linear, logistic and quadratic functions. -Note that in case of numerouse predictor variables, the "response" approach will +Note that in case of numerous predictor variables, the "response" approach will not work well because it will often generate contradicting response functions (e.g., mean annual temperature optimum at degrees C and temperature of the coldest month at 10 degrees C). In these case, it is advised to use the PCA approach (by default, a PCA approach @@ -108,11 +113,11 @@ will be used if there are more than 6 predictor variables). If \code{rescale.each.response = TRUE}, then the probability response to each variable will be normalised between 0 and 1 according to the following formula: -P.rescaled = (P - min(P)) / (max(P) - min (P)). Simlarly, if \code{rescale = TRUE}, +P.rescaled = (P - min(P)) / (max(P) - min (P)). Similarly, if \code{rescale = TRUE}, the final environmental suitability will be rescaled between 0 and 1 with the same formula. By default, the function will perform a probabilistic conversion into presence- -absence, with a randomly chosen beta threshold. If you want to custmose the +absence, with a randomly chosen beta threshold. If you want to customise the conversion parameters, you have to define \bold{two} of the three following parameters: \itemize{ \item{\code{beta}: the 'threshold' of the logistic function (i.e. the diff --git a/man/generateSpFromBCA.Rd b/man/generateSpFromBCA.Rd index 9d0fdb0..1172c87 100644 --- a/man/generateSpFromBCA.Rd +++ b/man/generateSpFromBCA.Rd @@ -4,9 +4,10 @@ \alias{generateSpFromBCA} \title{Generate a virtual species distribution from a Between Component Analysis of environmental variables} \usage{ -generateSpFromBCA(raster.stack.current, raster.stack.future, rescale = TRUE, - niche.breadth = "any", means = NULL, sds = NULL, bca = NULL, - sample.points = FALSE, nb.points = 10000, plot = TRUE) +generateSpFromBCA(raster.stack.current, raster.stack.future, + rescale = TRUE, niche.breadth = "any", means = NULL, sds = NULL, + bca = NULL, sample.points = FALSE, nb.points = 10000, + plot = TRUE) } \arguments{ \item{raster.stack.current}{a RasterStack object, in which each layer represent an environmental @@ -52,7 +53,7 @@ current environmental suitability} \item{\code{suitab.raster.future}: the virtual species distribution, as a Raster object containing the future environmental suitability} } -The structure of the virtualspecies object can be seen using str() +The structure of the virtualspecies object can be seen using \code{str()} } \description{ A Between Component Analysis is similar to a PCA, except that two sets of environmental conditions @@ -88,13 +89,13 @@ standard deviation values. This range of values is based on a fraction of the axis: \itemize{ \item{\code{"any"}: the standard deviations can have values from 1\% to 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 0.1 to 5. +then sd values along this axis can range from 0.1 to 5. } \item{\code{"narrow"}: the standard deviations are limited between 1\% and 10\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 0.1 to 1. +then sd values along this axis can range from 0.1 to 1. } \item{\code{"wide"}: the standard deviations are limited between 10\% and 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 1 to 5. +then sd values along this axis can range from 1 to 5. } } If a \code{bca} object is provided, the output bca object will contain the new environments coordinates along the provided bca axes. diff --git a/man/generateSpFromFun.Rd b/man/generateSpFromFun.Rd index 6f2bcb6..cc3f906 100644 --- a/man/generateSpFromFun.Rd +++ b/man/generateSpFromFun.Rd @@ -4,9 +4,9 @@ \alias{generateSpFromFun} \title{Generate a virtual species distributions with responses to environmental variables} \usage{ -generateSpFromFun(raster.stack, parameters, rescale = TRUE, formula = NULL, - species.type = "multiplicative", rescale.each.response = TRUE, - plot = FALSE) +generateSpFromFun(raster.stack, parameters, rescale = TRUE, + formula = NULL, species.type = "multiplicative", + rescale.each.response = TRUE, plot = FALSE) } \arguments{ \item{raster.stack}{a RasterStack object, in which each layer represent an environmental @@ -24,7 +24,7 @@ layername3 * layername4 etc."}). If \code{NULL} then partial responses will be a \item{species.type}{\code{"additive"} or \code{"multiplicative"}. Only used if \code{formula = NULL}. Defines how the final environmental suitability is calculated: if \code{"additive"}, responses to each -variable are summed; if \code{"multiplicative"}, responses are multiplicated.} +variable are summed; if \code{"multiplicative"}, responses are multiplied.} \item{rescale.each.response}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, the individual responses to each environmental variable are rescaled between 0 and 1 (see details).} @@ -38,7 +38,7 @@ a \code{list} with 3 elements: \item{\code{details}: the details and parameters used to generate the species} \item{\code{suitab.raster}: the raster containing the environmental suitability of the virtual species} } -The structure of the virtualspecies object can be seen using str() +The structure of the virtualspecies object can be seen using \code{str()} } \description{ This function generates a virtual species distribution from a RasterStack of environmental @@ -75,8 +75,8 @@ See the example section below for more examples.}} Any response function that can be applied to the environmental variables can be chosen here. Several functions are proposed in this package: \code{\link{linearFun}}, \code{\link{logisticFun}} and \code{\link{quadraticFun}}. -Another classical example is the normal distribution: \code{\link[stats]{dnorm}}. -Ther users can also create and use their own functions. +Another classical example is the normal distribution: \code{\link[stats:Normal]{stats::dnorm()}}. +Users can also create and use their own functions very easily. If \code{rescale.each.response = TRUE}, then the probability response to each diff --git a/man/generateSpFromPCA.Rd b/man/generateSpFromPCA.Rd index 451e4c3..0fa4c39 100644 --- a/man/generateSpFromPCA.Rd +++ b/man/generateSpFromPCA.Rd @@ -50,7 +50,7 @@ a \code{list} with 3 elements: \item{\code{suitab.raster}: the virtual species distribution, as a Raster object containing the environmental suitability} } -The structure of the virtualspecies object can be seen using str() +The structure of the virtualspecies object can be seen using \code{str()} } \description{ This functions generates a virtual species distribution by computing a @@ -75,13 +75,13 @@ standard deviation values. This range of values is based on a fraction of the axis: \itemize{ \item{\code{"any"}: the standard deviations can have values from 1\% to 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 0.1 to 5. +then sd values along this axis can range from 0.1 to 5. } \item{\code{"narrow"}: the standard deviations are limited between 1\% and 10\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 0.1 to 1. +then sd values along this axis can range from 0.1 to 1. } \item{\code{"wide"}: the standard deviations are limited between 10\% and 50\% of axes' ranges. For example if the first axis of the PCA ranges from -5 to +5, -then sd values along this axe can range from 1 to 5. +then sd values along this axis can range from 1 to 5. } } } diff --git a/man/limitDistribution.Rd b/man/limitDistribution.Rd index f83eceb..625fbb2 100644 --- a/man/limitDistribution.Rd +++ b/man/limitDistribution.Rd @@ -38,7 +38,7 @@ limit the distribution and the area in which the distribution is restricted} \item{\code{occupied.area}: the area occupied by the virtual species as a Raster of presence-absence} } -The structure of the virtualspecies object can be seen using str() +The structure of the virtualspecies object can be seen using \code{str()} } \description{ This function is designed to limit species distributions to a subsample of diff --git a/man/plotResponse.Rd b/man/plotResponse.Rd index d568903..752307f 100644 --- a/man/plotResponse.Rd +++ b/man/plotResponse.Rd @@ -44,7 +44,7 @@ then the function will automatically make the appropriate plots. Otherwise, you can provide a raster layer/stack of environmental variables to \code{x} and a list of functions to \code{parameters} to perform the plot. -In that case, you have to specifiy the \code{approach}: \code{"reponse"} or +In that case, you have to specify the \code{approach}: \code{"reponse"} or \code{"PCA"}: \itemize{ \item{if \code{approach = "response"}: Provide to \code{parameters} a diff --git a/man/removeCollinearity.Rd b/man/removeCollinearity.Rd index 5ad68f3..e32b7c0 100644 --- a/man/removeCollinearity.Rd +++ b/man/removeCollinearity.Rd @@ -36,9 +36,9 @@ a vector of non correlated variables, or a list where each element is a group of non correlated variables. } \description{ -This functions analyses the correlation among variables of the provideded +This functions analyses the correlation among variables of the provided stack of environmental variables (using Pearson's R), and can return a -vector containing names of variables that are not intercorrelated, or a list +vector containing names of variables that are not colinear, or a list containing grouping variables according to their degree of collinearity. } \details{ @@ -51,9 +51,9 @@ multicollinearity.cutoff} is detected, then the variables will be grouped according to their degree of correlation. If \code{select.variables = TRUE}, then the function will return a vector -containing variables that are not intercorrelated. +containing variables that are not colinear. The variables not correlated to any other variables are automatically included -in this vector. For each group of intercorrelated variables, one variable will +in this vector. For each group of colinear variables, one variable will be randomly chosen and included in this vector. } \examples{ diff --git a/man/sampleOccurrences.Rd b/man/sampleOccurrences.Rd index 65629bf..4c23ff8 100644 --- a/man/sampleOccurrences.Rd +++ b/man/sampleOccurrences.Rd @@ -8,86 +8,106 @@ sampleOccurrences(x, n, type = "presence only", sampling.area = NULL, detection.probability = 1, correct.by.suitability = FALSE, error.probability = 0, bias = "no.bias", bias.strength = 50, bias.area = NULL, weights = NULL, sample.prevalence = NULL, - plot = TRUE) + replacement = FALSE, plot = TRUE) } \arguments{ \item{x}{a \code{rasterLayer} object or the output list from -\code{generateSpFromFun}, \code{generateSpFromPCA}, \code{generateRandomSp}, \code{convertToPA} +\code{generateSpFromFun}, \code{generateSpFromPCA}, \code{generateRandomSp}, +\code{convertToPA} or \code{limitDistribution} The raster must contain values of 0 or 1 (or NA).} -\item{n}{an integer. The number of occurrence points to sample.} +\item{n}{an integer. The number of occurrence points / records to sample.} \item{type}{\code{"presence only"} or \code{"presence-absence"}. The type of occurrence points to sample.} -\item{sampling.area}{a character string, a \code{polygon} or an \code{extent} object. +\item{sampling.area}{a character string, a \code{polygon} or an \code{extent} +object. The area in which the sampling will take place. See details.} -\item{detection.probability}{a numeric value between 0 and 1, corresponding to the -probability of detection of the species. See details.} +\item{detection.probability}{a numeric value between 0 and 1, corresponding +to the probability of detection of the species. See details.} -\item{correct.by.suitability}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, then -the probability of detection will be weighted by the suitability, such that -cells with lower suitabilities will further decrease the chance that the species -is detected when sampled.} +\item{correct.by.suitability}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, +then the probability of detection will be weighted by the suitability, such +that cells with lower suitabilities will further decrease the chance that +the species is detected when sampled. NOTE: this will NOT increase +likelihood of samplings in areas of high suitability. In this case look for +argument weights.} -\item{error.probability}{\code{TRUE} or \code{FALSE}. Only useful if -\code{type = "presence-absence"}. Probability to attribute an erroneous presence -in cells where the species is absent.} +\item{error.probability}{\code{TRUE} or \code{FALSE}. Probability to +attribute an erroneous presence (False Positive) in cells where the species +is actually absent.} \item{bias}{\code{"no.bias"}, \code{"country"}, \code{"region"}, \code{"extent"}, \code{"polygon"} or \code{"manual"}. The method used to generate a sampling bias: see details.} -\item{bias.strength}{a positive numeric value. The strength of the bias to be applied -in \code{area} (as a multiplier). Above 1, \code{area} will be oversampled. Below 1, \code{area} -will be undersampled.} - -\item{bias.area}{\code{NULL}, a character string, a \code{polygon} or an \code{extent} object. -The area in which the sampling will be biased: see details. If \code{NULL} -and \code{bias = "extent"}, then you will be asked to draw an -extent on the map.} - -\item{weights}{\code{NULL} or a raster layer. Only used if \code{bias = "manual"}. -The raster of bias weights to be applied to the sampling of occurrences. -Higher weights mean a higher probability of sampling.} - -\item{sample.prevalence}{\code{NULL} or a numeric value between 0 and 1. Only useful if -\code{type = "presence-absence"}. Defines the sample prevalence, i.e. the proportion of presences -sampled. Note that the probabilities of detection and error are applied AFTER this parameter, -so the final sample prevalence may not different if you apply probabilities of detection and/or error} - -\item{plot}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, the sampled occurrence -points will be plotted.} +\item{bias.strength}{a positive numeric value. The strength of the bias to be +applied in \code{area} (as a multiplier). Above 1, \code{area} will be +oversampled. Below 1, \code{area} will be undersampled.} + +\item{bias.area}{\code{NULL}, a character string, a \code{polygon} or an +\code{extent} object. The area in which the sampling will be biased: see +details. If \code{NULL} and \code{bias = "extent"}, then you will be asked to +draw an extent on the map.} + +\item{weights}{\code{NULL} or a raster layer. Only used if +\code{bias = "manual"}. The raster of bias weights to be applied to the +sampling of occurrences. Higher weights mean a higher probability of +sampling. For example, species suitability raster can be entered here to +increase likelihood of sampling occurrences in areas with high suitability.} + +\item{sample.prevalence}{\code{NULL} or a numeric value between 0 and 1. +Only useful if \code{type = "presence-absence"}. Defines the sample +prevalence, i.e. the proportion of presences sampled. Note that the +probabilities of detection and error are applied AFTER this parameter, +so the final sample prevalence may not different if you apply probabilities +of detection and/or error} + +\item{replacement}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, multiple +samples can occur in the same cell. Can be useful to mimic real datasets +where samplings can be duplicated or repeated in time.} + +\item{plot}{\code{TRUE} or \code{FALSE}. If \code{TRUE}, the sampled +occurrence points will be plotted.} } \value{ -a \code{list} with 3 (unbiased sampling) to 4 (biased sampling) elements: +a \code{list} with 7 elements: \itemize{ \item{\code{sample.points}: the data.frame containing the coordinates of -samples, the real presence-absences (or presence-only) and the sampled presence- -absences} +samples, the real presence-absences (or presence-only) and the sampled +presence-absences} \item{\code{detection.probability}: the chosen probability of detection of the virtual species} \item{\code{error.probability}: the chosen probability to assign presence in cells where the species is absent} \item{\code{bias}: if a bias was chosen, then the type of bias and the associated \code{area} will be included.} +\item{\code{replacement}: indicates whether multiple samples could occur +in the same cells} +\item{\code{original.distribution.raster}: the distribution raster from +which samples were drawn} +\item{\code{sample.plot}: a recorded plot showing the sampled points +overlaying the original distribution.} } } \description{ -This function samples presences within a species distribution, either -randomly or with a sampling bias. The sampling bias can be defined manually -or with a set of pre-defined biases. +This function samples occurrences/records (presence only or presence-absence) +within a species distribution, either randomly or with a sampling bias. +The sampling bias can be defined manually or with a set of predefined +biases. } \details{ \bold{How the function works:} -The function randomly selects \code{n} cells in which samples occur. If a \code{bias} -is chosen, then the selection of these cells will be biased according to the type and -strength of bias chosen. If the sampling is of \code{type "presence only"}, then only -cells where the species is present will be chosen. If the sampling is of -\code{type "presence-absence"}, then all non-NA cells can be chosen. +The function randomly selects \code{n} cells in which samples occur. If a +\code{bias} is chosen, then the selection of these cells will be biased +according to the type and strength of bias chosen. If the sampling is of +\code{type "presence only"}, then only cells where the species is present +will be chosen. If the sampling is of \code{type "presence-absence"}, then +all non-NA cells can be chosen. The function then samples the species inside the chosen cells. In cells where the species is present the species will always be sampled unless @@ -95,24 +115,25 @@ the parameter \code{detection.probability} is lower than 1. In that case the species will be sampled with the associated probability of detection. In cells where the species is absent (in case of a \code{"presence-absence"} -sampling), the function will always assign absence unless \code{error.probability} -is greater than 1. In that case, the species can be found present with the -associated probability of error. Note that this step happens AFTER the detection -step. Hence, in cells where the species is present but not detected, it can -still be sampled due to a sampling error. +sampling), the function will always assign absence unless +\code{error.probability} is greater than 1. In that case, the species can be +found present with the associated probability of error. Note that this step +happens AFTER the detection step. Hence, in cells where the species is +present but not detected, it can still be sampled due to a sampling error. \bold{How to restrict the sampling area:} Use the argument \code{sampling.area}: \itemize{ -\item{Provide the name (s) (or a combination of names) of country(ies), region(s) or continent(s). +\item{Provide the name (s) (or a combination of names) of country(ies), +region(s) or continent(s). Examples: \itemize{ \item{\code{sampling.area = "Africa"}} \item{\code{sampling.area = c("Africa", "North America", "France")}} }} -\item{Provide a polygon (\code{SpatialPolygons} or \code{SpatialPolygonsDataFrame} -of package \code{sp})} +\item{Provide a polygon (\code{SpatialPolygons} or +\code{SpatialPolygonsDataFrame} of package \code{sp})} \item{Provide an \code{extent} object} } @@ -138,7 +159,8 @@ regions or continents to \code{bias.area} (see examples). List of possible \code{bias.area} names: \itemize{ -\item{Countries: type \code{levels(getMap()@data$SOVEREIGNT)} in the console} +\item{Countries: type \code{levels(getMap()@data$SOVEREIGNT)} in the +console} \item{Regions: "Africa", "Antarctica", "Asia", "Australia", "Europe", "North America", "South America"} \item{Continents: "Africa", "Antarctica", "Australia", "Eurasia", @@ -152,13 +174,35 @@ polygon to \code{area}. \item{an extent object: Set \code{bias} to \code{"extent"}, and either provide your -extent object to \code{bias.area}, or leave it \code{NULL} to draw an extent on -the map.} +extent object to \code{bias.area}, or leave it \code{NULL} to draw an extent +on the map.} } -Otherwise you can define manually your sampling bias, \emph{e.g.} to create -biases along roads. In that case you have to provide to \code{weights} a raster layer in which -each cell contains the probability to be sampled. +Otherwise you can enter a raster of sampling probability. It can be useful +if you want to increase likelihood of samplings in areas of high +suitability (simply enter the suitability raster in weights; see examples +below), +or if you want to define sampling biases manually, \emph{e.g.} to to create +biases along roads. In that case you have to provide to \code{weights} a +raster layer in which each cell contains the probability to be sampled. + +#' The \code{\link{.Random.seed}} and \code{\link{RNGkind}} are stored as +\code{\link{attributes}} when the function is called, and can be used to +reproduce the results as shown in the examples (though +it is preferable to set the seed with \code{\link{set.seed}} before calling +\code{sampleRecords()} and to then use the same value in +\code{\link{set.seed}} to reproduce results later. Note that +reproducing the sampling will only work if the same original distribution map +is used. +} +\note{ +Setting \code{sample.prevalence} may at least partly +override \code{bias}, e.g. if \code{bias} is specified with \code{extent} to +an area that contains no presences, but sample prevalence is set to > 0, +then cells outside of the biased sampling extent will be sampled until +the number of presences required by \code{sample.prevalence} are obtained, +after which the sampling of absences will proceed according to the specified +bias. } \examples{ # Create an example stack with six environmental variables @@ -178,9 +222,10 @@ sp <- generateRandomSp(env, niche.breadth = "wide") # Sampling of 25 presences sampleOccurrences(sp, n = 25) -# Sampling of 30 presences and absebces +# Sampling of 30 presences and absences sampleOccurrences(sp, n = 30, type = "presence-absence") + # Reducing of the probability of detection sampleOccurrences(sp, n = 30, type = "presence-absence", detection.probability = 0.5) @@ -190,6 +235,7 @@ sampleOccurrences(sp, n = 30, type = "presence-absence", detection.probability = 0.5, correct.by.suitability = TRUE) + # Creating sampling errors (far too much) sampleOccurrences(sp, n = 30, type = "presence-absence", error.probability = 0.5) @@ -209,10 +255,39 @@ sampleOccurrences(sp, n = 50, type = "presence-absence", bias.strength = 0, bias.area = biased.area) # Showing the area in which the sampling is biased -plot(biased.area, add = TRUE) +plot(biased.area, add = TRUE) +samps <- sampleOccurrences(sp, n = 50, + bias = "manual", + weights = sp$suitab.raster) +plot(sp$suitab.raster) +points(samps$sample.points[, c("x", "y")]) + +# Create a sampling bias so that more presences are sampled in areas with +# higher suitability + + + + +# Reproduce sampling based on the saved .Random.seed from a previous result +samps <- sampleOccurrences(sp, n = 100, + type = "presence-absence", + detection.probability = 0.7, + bias = "extent", + bias.strength = 50, + bias.area = biased.area) +# Reset the random seed using the value saved in the attributes +.Random.seed <- attr(samps, "seed") +reproduced_samps <- sampleOccurrences(sp, n = 100, + type = "presence-absence", + detection.probability = 0.7, + bias = "extent", + bias.strength = 50, + bias.area = biased.area) +identical(samps$sample.points, reproduced_samps$sample.points) } \author{ Boris Leroy \email{leroy.boris@gmail.com} +Willson Gaul \email{wgaul@hotmail.com} with help from C. N. Meynard, C. Bellard & F. Courchamp } diff --git a/man/virtualspecies-package.Rd b/man/virtualspecies-package.Rd index 976ba43..b87cd33 100644 --- a/man/virtualspecies-package.Rd +++ b/man/virtualspecies-package.Rd @@ -7,7 +7,7 @@ \description{ This package allows generating virtual species distributions, for example for testing species distribution modelling protocols. For a complete tutorial, -see http://borisleroy.com/virtualspecies +see borisleroy.com/virtualspecies } \details{ The process of generating a virtual species distribution is divided into @@ -41,7 +41,7 @@ illustrate the response functions as a correct input for \code{\link{generateSpF of the virtual species} \item{\code{\link{removeCollinearity}}: this function can be used to remove collinearity among variables of a stack by selecting a subset of -non-intercorrelated variables} +non-colinear variables} \item{\code{\link{synchroniseNA}}: this function can be used to synchronise NA values among layers of a stack} } @@ -53,14 +53,13 @@ This packages makes use of different other packages: \itemize{ \item{This package makes extensive use of the package \code{\link{raster}} to obtain spatialised environmental variables, and produce spatialised virtual species distributions.} -\item{\code{\link{dismo}} is used to sample occurrence data from the generated virtual species.} \item{\code{\link{ade4}} is used to generate species with a PCA approach.} \item{\code{\link{rworldmap}} is used to obtain free world shapefiles, in order to create dispersal limitations and sampling biases.} } } \references{ -Leroy, B. et al. 2015. virtualspecies, an R package to generate virtual species distributions. Ecography. In press. +Leroy, B. et al. 2016. virtualspecies, an R package to generate virtual species distributions. Ecography. 39(6):599-607 } \author{ Boris Leroy \email{leroy.boris@gmail.com}