Improving WithinRadius performance #196

gottacatchenall · 2023-08-18T16:14:47Z

After compilation, this change takes the example below (generating pseudoabsences with WithinRadius, copied from the docs) from ~120 seconds to ~20 seconds.

It uses a single mask to remove sites within the provided distance. This ends up being much faster than the repeated filter calls, which are particularly slow because comparing (long, lat) float pairs is much slower than comparing Cartesian indices.

It also scales much better: on a separate dataset with ~4000 occurrences, the old method took ~30 minutes and the new one takes ~30 seconds.

using SpeciesDistributionToolkit

spatial_extent = (left = 3.0, bottom = 55.2, right = 19.7, top = 64.9)
rangifer = taxon("Rangifer tarandus tarandus"; strict = false)
query = [
    "occurrenceStatus" => "PRESENT",
    "hasCoordinate" => true,
    "decimalLatitude" => (spatial_extent.bottom, spatial_extent.top),
    "decimalLongitude" => (spatial_extent.left, spatial_extent.right),
    "limit" => 300,
]
presences = occurrences(rangifer, query...)
for i in 1:3
    occurrences!(presences)
end

dataprovider = RasterData(CHELSA1, BioClim)
temperature = 0.1SimpleSDMPredictor(dataprovider; layer = "BIO1", spatial_extent...)

presencelayer = mask(temperature, presences, Bool)
@time background = pseudoabsencemask(WithinRadius, presencelayer; distance = 120.0)
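
For readers who want to see the single-mask idea in isolation, here is a minimal sketch in plain Julia. It is not the code from this PR: the function names `disc_offsets` and `within_radius` are hypothetical, the radius is measured in grid cells rather than kilometres, and no OffsetArrays are used. It only illustrates why working with Cartesian indices avoids repeated coordinate comparisons.

```julia
# Hypothetical helper: offsets of every cell within `radius` cells of a centre.
# Assumes square cells and a Euclidean distance measured in cells.
function disc_offsets(radius::Integer)
    offsets = CartesianIndex{2}[]
    for i in -radius:radius, j in -radius:radius
        if i^2 + j^2 <= radius^2
            push!(offsets, CartesianIndex(i, j))
        end
    end
    return offsets
end

# Hypothetical helper: flag every cell within `radius` cells of a presence.
function within_radius(presence::AbstractMatrix{Bool}, radius::Integer)
    offsets = disc_offsets(radius)        # the mask is built once and reused
    near = falses(size(presence))
    for centre in findall(presence)       # Cartesian indices of presence cells
        for offset in offsets
            cell = centre + offset        # slide the mask onto this presence
            if checkbounds(Bool, near, cell)
                near[cell] = true
            end
        end
    end
    return near
end

presence = falses(5, 5)
presence[3, 3] = true
near = within_radius(presence, 1)
```

Each presence costs a fixed number of integer index additions, independent of the total number of cells in the layer, which is consistent with the scaling reported above.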

github-actions · 2023-08-18T16:15:01Z

Documentation for this PR at https://poisotlab.github.io/SpeciesDistributionToolkit.jl/previews/PR196/

tpoisot · 2023-08-18T16:24:24Z

I will go and find tomatillos for you, thank you. Gonna wait until the doc builds, and I'll merge this.

tpoisot · 2023-08-18T17:05:49Z

This doesn't seem to work: https://poisotlab.github.io/SpeciesDistributionToolkit.jl/previews/PR196/vignettes/integration/02_pseudo_absences.html

gottacatchenall · 2023-08-18T20:33:48Z

I'll take a look this weekend

codecov-commenter · 2023-08-18T21:01:30Z

Codecov Report

Patch coverage has no change and project coverage change: -1.05% ⚠️

Comparison is base (dd9248f) 53.36% compared to head (ad99421) 52.32%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   53.36%   52.32%   -1.05%     
==========================================
  Files          59       38      -21     
  Lines        1692     1034     -658     
==========================================
- Hits          903      541     -362     
+ Misses        789      493     -296

Flag	Coverage Δ
unittests	`52.32% <ø> (-1.05%)`	⬇️


see 21 files with indirect coverage changes


gottacatchenall · 2023-08-23T18:28:46Z

This now builds the figure, but one of the doc blocks fails:

bgpoints = SpeciesDistributionToolkit.sample(
    bgmask,
    cellsize(bgmask),
    floor(Int, 0.5sum(presencelayer)),
)

  value = 
    MethodError: no method matching StatsBase.Weights(::Vector{Union{Nothing, Float64}})
    Closest candidates are:
      StatsBase.Weights(!Matched::AbstractVector{<:Real}) at ~/.julia/packages/StatsBase/iMkPf/src/weights.jl:21
      StatsBase.Weights(!Matched::AbstractVector{var"#16#T"}, !Matched::var"#15#S") where {var"#15#S"<:Real, var"#16#T"<:Real} at ~/.julia/packages/StatsBase/iMkPf/src/weights.jl:20

tpoisot · 2023-08-24T13:28:45Z

Yeah, for some reason cellsize sometimes returns nothing. Can you drop this block for now, and then I'll merge?
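
As an aside, the MethodError above comes from the element type: the closest candidate listed expects an AbstractVector{<:Real}, and a Vector{Union{Nothing, Float64}} does not match. A minimal sketch of one possible workaround follows. The input vector is made up for illustration, and treating an unknown cell size as a zero weight is an assumption, not something decided in this thread.

```julia
# Hypothetical stand-in for the values behind `cellsize(bgmask)`;
# the real values come from the layer and may contain `nothing`.
raw_sizes = Union{Nothing, Float64}[1.2, nothing, 0.8, nothing, 2.0]

# Assumption: a cell with no known size gets weight zero, so it is never drawn.
# `something(s, 0.0)` returns `s` unless it is `nothing`, in which case 0.0.
clean_sizes = Float64[something(s, 0.0) for s in raw_sizes]

# `clean_sizes` is a Vector{Float64}, which satisfies AbstractVector{<:Real};
# StatsBase itself is not loaded in this sketch.
```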

tpoisot · 2023-08-26T18:43:34Z

working on it

gottacatchenall · 2023-08-26T19:53:04Z

I think we still need OffsetArrays as a dependency. This line (SpeciesDistributionToolkit.jl/src/pseudoabsences.jl, line 114 in b121124)

radius_mask = OffsetArrays.OffsetArray(

fails when building the current docs on main.

tpoisot · 2023-08-26T22:04:04Z

Ah, because it's a top-level function... Yep, the dep needs to be in the main package, and so the version change needs to be there too.

gottacatchenall · 2023-08-26T22:42:40Z

Off the cuff idea: pseudoabsence models included with Fauxcurrences.jl as a rebranded PA package

gottacatchenall added 2 commits August 18, 2023 11:51

sliding mask for WithinRadius

0a416ad

slightly clearer variable names

876188f

gottacatchenall requested a review from tpoisot August 18, 2023 16:17

gottacatchenall added 2 commits August 18, 2023 16:40

diagnosis 📝

9a7b97c

light (and incomplete) refactoring

9d5d48c

resolving docs bug 📝

f6631da

semver(layers): offset arrays version

ad99421

tpoisot merged commit 9120a10 into main Aug 26, 2023

tpoisot deleted the mdc/withinradius_performance branch August 26, 2023 19:07

gottacatchenall restored the mdc/withinradius_performance branch August 29, 2023 12:53
