Improving WithinRadius performance #196

Merged 6 commits on Aug 26, 2023

Conversation

gottacatchenall
Member

@gottacatchenall gottacatchenall commented Aug 18, 2023

After compilation, this change takes the example below (generating pseudoabsences with WithinRadius, copied from the docs) from ~120 seconds to ~20 seconds.

It uses a single mask to remove sites within the provided distance. This ends up being way faster than the repeated filter calls, which are particularly slow because comparing (longitude, latitude) float pairs is much slower than comparing Cartesian indices.

This also scales much better: on a separate dataset with ~4000 occurrences, the old method took ~30 minutes and the new one takes ~30 seconds.

using SpeciesDistributionToolkit

spatial_extent = (left = 3.0, bottom = 55.2, right = 19.7, top = 64.9)
rangifer = taxon("Rangifer tarandus tarandus"; strict = false)
query = [
    "occurrenceStatus" => "PRESENT",
    "hasCoordinate" => true,
    "decimalLatitude" => (spatial_extent.bottom, spatial_extent.top),
    "decimalLongitude" => (spatial_extent.left, spatial_extent.right),
    "limit" => 300,
]
presences = occurrences(rangifer, query...)
for i in 1:3
    occurrences!(presences)
end

dataprovider = RasterData(CHELSA1, BioClim)
temperature = 0.1SimpleSDMPredictor(dataprovider; layer = "BIO1", spatial_extent...)

presencelayer = mask(temperature, presences, Bool)
@time background = pseudoabsencemask(WithinRadius, presencelayer; distance = 120.0)
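
The single-mask idea described in this comment can be sketched roughly as follows. This is a minimal illustration in plain Julia, not the code merged in this PR: the names `radius_offsets` and `within_radius_mask` are hypothetical, and it assumes square cells of constant size, which ignores how cell dimensions change with latitude on a real raster. The point is only that the set of neighbouring cells is computed once, then reused for every presence through integer Cartesian index arithmetic rather than coordinate comparisons.

```julia
# Illustrative sketch only: hypothetical names, square cells of constant size assumed.

# Precompute, once, the Cartesian offsets of every cell whose centre lies
# within `distance` of a central cell (`distance` and `cellsize` share units).
function radius_offsets(distance::Real, cellsize::Real)
    r = floor(Int, distance / cellsize)
    return [CartesianIndex(i, j) for i in -r:r for j in -r:r if hypot(i, j) * cellsize <= distance]
end

# Flag every cell that lies within the radius of at least one presence cell.
# Only integer index arithmetic is used inside the loops.
function within_radius_mask(presence::AbstractMatrix{Bool}, offsets)
    within = falses(size(presence))
    bounds = CartesianIndices(presence)
    for cell in findall(presence)
        for offset in offsets
            target = cell + offset
            # Skip offsets that fall outside the grid.
            if target in bounds
                within[target] = true
            end
        end
    end
    return within
end

# Usage on a toy 5x5 grid with a single presence in the middle.
presence = falses(5, 5)
presence[3, 3] = true
mask = within_radius_mask(presence, radius_offsets(1.0, 1.0))
count(mask)  # 5: the presence cell and its four direct neighbours
```

Whether the flagged cells are then kept or excluded as background is left to the caller; the sketch only shows how the flags are computed.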

@tpoisot
Member

tpoisot commented Aug 18, 2023

I will go and find tomatillos for you, thank you. Gonna wait until the doc builds, and I'll merge this.

@tpoisot
Member

tpoisot commented Aug 18, 2023

@gottacatchenall
Member Author

I'll take a look this weekend

@codecov-commenter

codecov-commenter commented Aug 18, 2023

Codecov Report

Patch coverage has no change and project coverage change: -1.05% ⚠️

Comparison is base (dd9248f) 53.36% compared to head (ad99421) 52.32%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #196      +/-   ##
==========================================
- Coverage   53.36%   52.32%   -1.05%     
==========================================
  Files          59       38      -21     
  Lines        1692     1034     -658     
==========================================
- Hits          903      541     -362     
+ Misses        789      493     -296     
Flag Coverage Δ
unittests 52.32% <ø> (-1.05%) ⬇️


see 21 files with indirect coverage changes


@gottacatchenall
Member Author

This now builds the figure, but one of the doc blocks fails:

bgpoints = SpeciesDistributionToolkit.sample(
    bgmask,
    cellsize(bgmask),
    floor(Int, 0.5sum(presencelayer)),
)

  value = 
    MethodError: no method matching StatsBase.Weights(::Vector{Union{Nothing, Float64}})
    Closest candidates are:
      StatsBase.Weights(!Matched::AbstractVector{<:Real}) at ~/.julia/packages/StatsBase/iMkPf/src/weights.jl:21
      StatsBase.Weights(!Matched::AbstractVector{var"#16#T"}, !Matched::var"#15#S") where {var"#15#S"<:Real, var"#16#T"<:Real} at ~/.julia/packages/StatsBase/iMkPf/src/weights.jl:20

@tpoisot
Member

tpoisot commented Aug 24, 2023

Yeah, for some reason cellsize sometimes returns nothing. Can you drop this block for now, and then I'll merge?
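
For context on the error above: `StatsBase.Weights` only accepts vectors of real numbers, so a `Vector{Union{Nothing, Float64}}` has no matching method. One possible workaround, sketched here in plain Julia, is to replace the `nothing` entries before building the weights. The helper name `clean_weights` is hypothetical, and treating a missing cell size as a zero weight is an assumption rather than something decided in this thread.

```julia
# Hypothetical helper, not part of SpeciesDistributionToolkit or StatsBase.
# Replaces every `nothing` with `fallback` and returns a Vector{Float64},
# which satisfies the AbstractVector{<:Real} signature shown in the error.
function clean_weights(raw::AbstractVector{<:Union{Nothing, Real}}; fallback::Float64 = 0.0)
    return Float64[something(w, fallback) for w in raw]
end

# Stand-in for the values that reached `Weights` in the failing doc block.
raw = Union{Nothing, Float64}[1.5, nothing, 2.0, nothing]
weights = clean_weights(raw)  # [1.5, 0.0, 2.0, 0.0]
```

With a zero fallback, cells without a known size would never be drawn in a weighted sample; a different fallback might suit other uses better.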

@tpoisot
Member

tpoisot commented Aug 26, 2023

working on it

@tpoisot tpoisot merged commit 9120a10 into main Aug 26, 2023
@tpoisot tpoisot deleted the mdc/withinradius_performance branch August 26, 2023 19:07
@gottacatchenall
Member Author

I think we still need OffsetArrays as a dependency. This line

radius_mask = OffsetArrays.OffsetArray(

fails when building the current docs on main

@tpoisot
Member

tpoisot commented Aug 26, 2023

Ah, because it's a top level function... Yep, the dep needs to be in the main package, and so the version change needs to be there too

@gottacatchenall
Member Author

gottacatchenall commented Aug 26, 2023

Off the cuff idea: pseudoabsence models included with Fauxcurrences.jl as a rebranded PA package

@gottacatchenall gottacatchenall restored the mdc/withinradius_performance branch August 29, 2023 12:53
3 participants