When generating a given number of background points (e.g. 1000), SpeciesDistributionToolkit.sample will often return fewer background sites (~970), which is counterintuitive when trying to generate the same number of background points as occurrence ones.
This is because we use StatsBase.sample(keys(replace(layer, false => nothing)), n; kwargs...) internally, and StatsBase.sample uses replace=true by default, i.e. it samples with replacement. The same key can therefore be drawn more than once. Repeated draws land on the same cell, so the result has fewer distinct background sites than requested, especially with small layers. Passing replace=false fixes the issue.
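As a back-of-the-envelope check, assume every candidate cell of the mask is equally likely to be drawn (a simplification). If the mask has N candidate cells and n keys are drawn with replacement, the standard result for the expected number of distinct cells D is given below. The second-order expansion shows that the shortfall grows roughly with n² and shrinks as N grows, which matches the observation that small layers are hit hardest:

```math
\mathbb{E}[D] = N\left(1 - \left(1 - \frac{1}{N}\right)^{n}\right) \approx n - \frac{n(n-1)}{2N}
```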
@tpoisot Should we make replace=false the default? I think the users' intent will most often be to generate a given number of background sites. Otherwise, I'll add something to the function documentation and vignette to make the fix more obvious.
Here's an example adapted from the vignettes:
using SpeciesDistributionToolkit
using CairoMakie
using Random
spatial_extent = (left = 3.0, bottom = 55.2, right = 19.7, top = 64.9)
rangifer = taxon("Rangifer tarandus tarandus"; strict = false)
query = [
    "occurrenceStatus" => "PRESENT",
    "hasCoordinate" => true,
    "decimalLatitude" => (spatial_extent.bottom, spatial_extent.top),
    "decimalLongitude" => (spatial_extent.left, spatial_extent.right),
    "limit" => 300,
]
presences = occurrences(rangifer, query...)
for i in 1:3
    occurrences!(presences)
end
temperature = SimpleSDMPredictor(RasterData(WorldClim2, BioClim); spatial_extent...)
presencelayer = mask(temperature, presences, Bool)
absmask = pseudoabsencemask(SurfaceRangeEnvelope, presencelayer)
And then we have:
julia> sum(presencelayer)
265
julia> Random.seed!(42);
julia> abs = SpeciesDistributionToolkit.sample(absmask, sum(presencelayer));
julia> sum(abs) # fewer sites
251
julia> Random.seed!(42);
julia> abs2 = SpeciesDistributionToolkit.sample(absmask, sum(presencelayer); replace=false);
julia> sum(abs2) # same number of sites
265
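The same mechanism can be reproduced without any raster data. The minimal sketch below uses only Julia's Random standard library as a stand-in for StatsBase.sample: `rand` draws with replacement, and `shuffle` followed by truncation draws without replacement. The cell count `N = 2_000` is an arbitrary illustrative value, not the size of `absmask`.

```julia
using Random

# Stand-in for the candidate cells of a pseudo-absence mask.
# N is an arbitrary illustrative size; n matches the 265 presences above.
N = 2_000
n = 265
cells = collect(1:N)

rng = Random.MersenneTwister(42)

# With replacement: each of the n draws is independent, so the same cell can repeat.
with_repl = rand(rng, cells, n)

# Without replacement: shuffle a copy of the cells and keep the first n, so all draws are distinct.
without_repl = shuffle(rng, cells)[1:n]

# A Boolean layer only records *which* cells were picked, so repeated draws
# collapse and the layer's sum equals the number of unique cells.
count_with = length(unique(with_repl))
count_without = length(unique(without_repl))

println("with replacement:    ", count_with, " distinct cells from ", n, " draws")
println("without replacement: ", count_without, " distinct cells from ", n, " draws")
```

Sampling without replacement always yields exactly `n` distinct cells (as long as `n <= N`). Sampling with replacement yields at most `n`, and usually a few fewer.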
Two things. First, I think I will rename this method to rarefy, because that is closer to what it does conceptually.
Second, I think replace=true was there to handle layers with fewer points than requested, but there's obviously a better way to handle this. I will open a PR later.
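One way to handle layers with fewer candidate cells than requested points, without falling back to sampling with replacement, is to make that case explicit. The following is a minimal sketch of a hypothetical helper, `sample_sites`. It works on a plain vector of candidate keys (for example, the keys of the cells that are true in a mask). The name, the `strict` keyword, and the error/warning behaviour are assumptions for illustration only. They are not the toolkit's API or the planned PR.

```julia
using Random

# Hypothetical helper (not part of SpeciesDistributionToolkit): draw `n`
# distinct candidates without replacement, and make the "not enough cells"
# case explicit instead of silently returning duplicates.
function sample_sites(rng::AbstractRNG, candidates::AbstractVector, n::Integer; strict::Bool = true)
    n >= 0 || throw(ArgumentError("n must be non-negative, got $n"))
    available = length(candidates)
    if n > available
        if strict
            # Refuse rather than return fewer sites than requested.
            throw(ArgumentError("requested $n sites but only $available are available"))
        end
        # Lenient mode: return every candidate and tell the user.
        @warn "Requested $n sites but only $available are available; returning all of them."
        n = available
    end
    # Shuffle a copy and keep the first n: every returned candidate is distinct.
    return shuffle(rng, collect(candidates))[1:n]
end

# Convenience method using Julia's default random number generator.
sample_sites(candidates::AbstractVector, n::Integer; kwargs...) =
    sample_sites(Random.default_rng(), candidates, n; kwargs...)

# Usage: five distinct keys out of ten candidates.
rng = Random.MersenneTwister(1)
picked = sample_sites(rng, collect(1:10), 5)
```

Whether an oversized request should throw, warn and cap, or do something else is a design choice. This sketch just shows that it can be handled without changing the meaning of n for the common case.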