
extrema(skipmissing(cube)) much slower than extrema(skipmissing(readcubedata(cube))) #564

@felixcremer


When I try to get the range of the non-missing values, it is much faster to read the whole dataset into memory first. For plain extrema without skipmissing, it makes no difference. These timings are for a 15000x15000 array stored locally on my computer.

julia> @time extrema(skipmissing(readcubedata(clocal76)))
  0.878060 seconds (148 allocations: 891.717 MiB, 3.05% gc time)
(0x00, 0xfe)

julia> @time extrema(skipmissing(clocal76))
 56.791334 seconds (900.00 M allocations: 72.180 GiB, 18.44% gc time)
(0x00, 0xfe)

julia> @time extrema(clocal76)
  0.827030 seconds (98 allocations: 891.714 MiB, 1.68% gc time)
(missing, missing)

julia> @time extrema(readcubedata(clocal76))
  0.842818 seconds (141 allocations: 891.717 MiB, 5.14% gc time)
(missing, missing)

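extrema is only the motivating example: with a simple collect(skipmissing(cube)) I get very similar timings. I suspect that skipmissing on the cube accesses every element individually instead of reading the data in chunks. The allocation count fits this: 900 M allocations for the 225 M elements of a 15000x15000 array is four allocations per element.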
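To make the suspicion concrete, here is a minimal sketch in plain Base Julia. It assumes the cube behaves like an AbstractArray whose scalar getindex triggers a separate read. CountingArray and chunked_extrema are hypothetical names, not part of YAXArrays or DiskArrays. The column-block getindex method only simulates what a disk backend's batched read might do. The sketch counts reads to show that collect(skipmissing(A)) goes through one scalar getindex per element, and that reading column blocks into memory before calling skipmissing needs far fewer reads.

```julia
# Hypothetical wrapper standing in for a disk-backed cube: every scalar
# getindex counts as one "read", and a bulk column read also counts as one.
mutable struct CountingArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    reads::Int
end
CountingArray(data::Array{T,N}) where {T,N} = CountingArray{T,N}(data, 0)

Base.size(a::CountingArray) = size(a.data)
Base.IndexStyle(::Type{<:CountingArray}) = IndexLinear()

# Scalar access: one read per element (this is what skipmissing iteration uses).
function Base.getindex(a::CountingArray, i::Int)
    a.reads += 1
    return a.data[i]
end

# Simulated batched read of whole columns (assumption: a real disk backend
# would fetch such a block in a single request).
function Base.getindex(a::CountingArray{T,2}, ::Colon, cols::UnitRange{Int}) where {T}
    a.reads += 1
    return a.data[:, cols]
end

# Hypothetical helper: extrema of the non-missing values, computed block by
# block so that each block of columns is read into memory once.
# Returns (nothing, nothing) if every value is missing.
function chunked_extrema(A::AbstractMatrix, blockcols::Int)
    lo = hi = nothing
    ncol = size(A, 2)
    for j in 1:blockcols:ncol
        block = A[:, j:min(j + blockcols - 1, ncol)]  # one bulk read
        vals = skipmissing(block)                      # in-memory, cheap
        isempty(vals) && continue                      # block is all missing
        blo, bhi = extrema(vals)
        lo = lo === nothing ? blo : min(lo, blo)
        hi = hi === nothing ? bhi : max(hi, bhi)
    end
    return (lo, hi)
end
```

For a 3x3 matrix with three missing entries, collect(skipmissing(A)) performs nine scalar reads, while chunked_extrema(A, 2) performs two block reads and returns the same minimum and maximum as extrema(skipmissing(A)).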
