# Virtual species

In this vignette, we provide a demonstration of how the different
**SpeciesDistributionToolkit** functions can be chained together to
rapidly create a virtual species, generate its range map, and sample
points from it according to the predicted suitability.

In [1]:
using SpeciesDistributionToolkit
using CairoMakie
using Statistics
CairoMakie.activate!(px_per_unit = 6.0)

We start by defining the extent in which we want to create the virtual
species. For the purpose of this example, we will use the country of
Paraguay (ISO code `PRY`), a polygon of which is available in the GADM
database. Note that the `boundingbox` function returns the coordinates
*in WGS84*.

In [2]:
place = SpeciesDistributionToolkit.gadm("PRY")
extent = SpeciesDistributionToolkit.boundingbox(place)

(left = -62.642398834228516, right = -54.25859832763672, bottom = -27.60569953918457, top = -19.29520034790039)

We then download some environmental data. In this example, we use the
BioClim variables as distributed by CHELSA. In order to simplify the
code, we will only use BIO1 (mean annual temperature) and BIO12 (total
annual precipitation). Note that we collect these layers in a vector
typed as `SDMLayer{Float32}`, in order to ensure that future operations
already receive floating point values.

In [3]:
provider = RasterData(CHELSA2, BioClim)
L = SDMLayer{Float32}[SDMLayer(provider; layer=l, extent...) for l in ["BIO1", "BIO12"]]

2-element Vector{SDMLayer{Float32}}:
 SDMLayer{Float32}(Float32[2948.0 2948.0 … 2937.0 2938.0; 2948.0 2948.0 … 2935.0 2936.0; … ; 2975.0 2975.0 … 2982.0 2981.0; 2975.0 2975.0 … 2983.0 2981.0], Bool[1 1 … 1 1; 1 1 … 1 1; … ; 1 1 … 1 1; 1 1 … 1 1], (-62.65013935825001, -54.25847272515001), (-27.60847247174999, -19.291805838349994), "+proj=longlat +datum=WGS84 +no_defs")
 SDMLayer{Float32}(Float32[8438.0 8469.0 … 20889.0 20591.0; 8439.0 8469.0 … 20926.0 20737.0; … ; 9596.0 9575.0 … 13836.0 13811.0; 9606.0 9586.0 … 13767.0 13712.0], Bool[1 1 … 1 1; 1 1 … 1 1; … ; 1 1 … 1 1; 1 1 … 1 1], (-62.65013935825001, -54.25847272515001), (-27.60847247174999, -19.291805838349994), "+proj=longlat +datum=WGS84 +no_defs")
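
The `extent...` in the call above splats the named tuple returned by
`boundingbox` into keyword arguments. A minimal sketch of the same Julia
mechanism, using a hypothetical function (`span` is made up for
illustration and is not part of the package) and rounded coordinates:

```julia
# Hypothetical function that accepts the same keyword names as the bounding box
span(; left, right, bottom, top) = (width = right - left, height = top - bottom)

# A named tuple shaped like the output of `boundingbox` (values rounded)
bbox = (left = -62.64, right = -54.26, bottom = -27.61, top = -19.30)

# Splatting after the semicolon passes each field as a keyword argument
dims = span(; bbox...)
```

Because the field names must match the keyword names, this only works when the function accepts `left`, `right`, `bottom`, and `top`.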

We now mask the layers using the polygon we downloaded initially, so
that cells outside the country are excluded. Here, `mask!` is applied to
the whole vector of layers in a single call, and each masked layer is
then rescaled with `rescale!`, which brings its values into the unit
interval (as the output below shows).

In [4]:
rescale!.(mask!(L, place))

2-element Vector{SDMLayer{Float32}}:
 SDMLayer{Float32}(Float32[0.3857143 0.3857143 … 0.22857143 0.24285714; 0.3857143 0.3857143 … 0.2 0.21428572; … ; 0.7714286 0.7714286 … 0.87142855 0.85714287; 0.7714286 0.7714286 … 0.8857143 0.85714287], Bool[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], (-62.65013935825001, -54.25847272515001), (-27.60847247174999, -19.291805838349994), "+proj=longlat +datum=WGS84 +no_defs")
 SDMLayer{Float32}(Float32[0.092917636 0.09482018 … 0.85706395 0.83877504; 0.092979014 0.09482018 … 0.8593347 0.84773535; … ; 0.16398674 0.16269793 … 0.42420524 0.42267093; 0.16460046 0.16337302 … 0.41997054 0.41659507], Bool[0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], (-62.65013935825001, -54.25847272515001), (-27.60847247174999, -19.291805838349994), "+proj=longlat +datum=WGS84 +no_defs")
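
The values printed above are consistent with a min-max rescaling to the
unit interval. A minimal sketch of that operation on a plain vector,
assuming this is what `rescale!` does by default (`minmax_rescale` is a
hypothetical helper, not a package function):

```julia
# Hypothetical helper: map values linearly so the minimum becomes 0 and the maximum 1
# (assumes the values are not all identical, otherwise the range is zero)
function minmax_rescale(v::AbstractVector{<:Real})
    lo, hi = extrema(v)
    return (v .- lo) ./ (hi - lo)
end

# A few raw BIO1 values taken from the layer output shown earlier
temperature = Float32[2937, 2948, 2975, 2983]
scaled = minmax_rescale(temperature)
```

Working on rescaled layers means that a single set of response parameters can be applied to variables with very different units.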

In the next steps, we will generate a virtual species. It is defined by
an environmental response to each layer, linking the value of the layer
at a point to a suitability score. For the sake of expediency, we only
use logistic responses, and generate one function for each layer
(drawing $\alpha$ from a normal distribution, and $\beta$ uniformly in
$[0, 1)$).

In [5]:
logistic(x, α, β) = 1 / (1 + exp((x-β)/α))
logistic(α, β) = (x) -> logistic(x, α, β)
f = [logistic(randn(), rand()) for _ in eachindex(L)]

2-element Vector{var"#11#12"{Float64, Float64}}:
 #11 (generic function with 1 method)
 #11 (generic function with 1 method)
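
With this parameterization, the sign of $\alpha$ sets the direction of
the response, and $\beta$ is the value at which the suitability is 0.5.
The sketch below repeats the definition from the cell above so that it
runs on its own; the parameter values are chosen for illustration only.

```julia
logistic(x, α, β) = 1 / (1 + exp((x - β) / α))

xs = 0.0:0.25:1.0

# α > 0: suitability decreases as the (rescaled) variable increases
decreasing = logistic.(xs, 0.1, 0.5)

# α < 0: suitability increases with the variable
increasing = logistic.(xs, -0.1, 0.5)

# At x = β the response is 0.5, whatever the value of α
midpoint = logistic(0.5, 0.1, 0.5)
```

Since $\alpha$ is drawn from a normal distribution centered on zero, each layer is equally likely to get an increasing or a decreasing response.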

In the next step, we create a layer of suitability, by applying the
logistic function to each environmental variable layer, and taking the
product of all suitabilities:

In [6]:
S = prod([f[i].(L[i]) for i in eachindex(L)])

SDM Layer with 507665 Float64 cells
    Proj string: +proj=longlat +datum=WGS84 +no_defs
    Grid size: (998, 1007)

In order to generate the range of the species, we set a target
prevalence, and identify the quantile corresponding to this prevalence
in the suitability layer.

In [7]:
target_prevalence = 0.1626
cutoff = quantile(S, 1-target_prevalence)

0.7884535579782708
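
The link between prevalence and cutoff can be checked on a plain vector:
the fraction of values above the $1 - p$ quantile is approximately $p$.
A minimal sketch on simulated scores (the data and the seed are
arbitrary):

```julia
using Random, Statistics

rng = MersenneTwister(1234)
scores = rand(rng, 10_000)           # stand-in for the suitability values

p = 0.1626                           # same target prevalence as above
threshold = quantile(scores, 1 - p)

# Proportion of cells that would be counted as part of the range
realized = mean(scores .> threshold)
```

The match is approximate because the quantile is interpolated between observed values, but the difference shrinks as the number of cells grows.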

Random observations for the virtual species are generated by setting the
probability of inclusion to 0 for all cells with a suitability at or
below the cutoff, and then sampling proportionally to the suitability
for all remaining cells. Note that the method is called
`backgroundpoints`, as it is normally used for pseudo-absences. The
second argument of this method is the number of points to generate.

In [8]:
presencelayer = backgroundpoints((v -> v > cutoff ? v : 0.0).(S), 59)

SDM Layer with 507665 Bool cells
    Proj string: +proj=longlat +datum=WGS84 +no_defs
    Grid size: (998, 1007)
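
The same idea can be sketched without the package: give a weight of zero
to cells at or below the cutoff, then draw cells without replacement
with probability proportional to their weight. The sketch uses the
exponential-keys trick (Efraimidis and Spirakis) for weighted sampling;
it illustrates the principle and is not a description of how
`backgroundpoints` is implemented. The name `weighted_sample` and all
values are hypothetical.

```julia
using Random

# Hypothetical helper: draw k distinct indices with probability proportional to weight
# (assumes at least k weights are strictly positive)
function weighted_sample(rng, weights, k)
    # Positive weights get the key -Exp(1)/w; zero weights get -Inf and rank last
    priority = [w > 0 ? -randexp(rng) / w : -Inf for w in weights]
    # The k largest keys form the weighted sample without replacement
    return partialsortperm(priority, 1:k; rev=true)
end

rng = MersenneTwister(42)
scores = rand(rng, 5_000)            # stand-in for the suitability values
threshold = 0.8                      # stand-in for the cutoff
weights = [s > threshold ? s : 0.0 for s in scores]

picked = weighted_sample(rng, weights, 59)
```

Every index in `picked` refers to a cell above the threshold, and cells with higher scores are more likely to be drawn.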

We can finally plot the result:

In [9]:
# Use `fig` rather than `f`, which already holds the response functions
fig = Figure(size=(700, 700))
ax = Axis(fig[1,1], aspect=DataAspect())
# Range of the species: cells with a suitability above the cutoff
heatmap!(ax, S .> cutoff, colormap=["#cececebb", :green])
# Country outline
lines!(ax, place[1].geometry, color=:black)
# Sampled presences
scatter!(ax, presencelayer, color=:white, strokecolor=:black, strokewidth=2, markersize=10, label="Virtual presences")
tightlimits!(ax)
hidespines!(ax)
hidedecorations!(ax)
axislegend(ax, position=:lb, framevisible=false)
fig