describe is slow #3411

Closed
jariji opened this issue Dec 30, 2023 · 3 comments

jariji (Contributor) commented Dec 30, 2023

julia> let xs = rand(10_000_000)
           @time describe(xs)
           end
Summary Stats:
Length:         10000000
Missing Count:  0
Mean:           0.499959
Minimum:        0.000000
1st Quartile:   0.249912
Median:         0.499909
3rd Quartile:   0.749991
Maximum:        1.000000
Type:           Float64
  2.162965 seconds (52 allocations: 76.298 MiB, 0.31% gc time)

julia> let xs = rand(10_000_000)
           @time maximum(xs)
           end
  0.006608 seconds

That describe call seems pretty slow to me: about 2.16 seconds, versus about 0.0066 seconds for maximum on an array of the same size.

DataFrames v1.6.1

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 3900XT 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver2)
  Threads: 35 on 24 virtual cores
jariji closed this as completed Dec 30, 2023
jariji reopened this Dec 30, 2023

jariji (Contributor, Author) commented Dec 30, 2023

It's because median and quantile are slow, but perhaps this could be parallelized so we don't have to wait for each of them sequentially.
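
As a rough illustration of that idea, here is a minimal sketch, not the StatsBase implementation. The helper name `quick_summary` is hypothetical. It assumes Julia was started with more than one thread (for example `julia -t auto`), and it asks for all three quartiles in a single `quantile` call so the data is copied and partially sorted only once rather than once per statistic. Whether this is actually faster would need to be measured on the machine in question.

```julia
using Statistics

# Hypothetical helper for illustration; not part of StatsBase or DataFrames.
function quick_summary(xs::AbstractVector{<:Real})
    # Independent reductions run as separate tasks. They only overlap with
    # the quantile work below when Julia has more than one thread available.
    t_mean = Threads.@spawn mean(xs)
    t_ext  = Threads.@spawn extrema(xs)

    # A single call with a vector of probabilities works on one copy of the
    # data instead of repeating the copy-and-sort work for each statistic.
    q1, med, q3 = quantile(xs, [0.25, 0.5, 0.75])

    lo, hi = fetch(t_ext)
    return (mean = fetch(t_mean), min = lo, q1 = q1, median = med, q3 = q3, max = hi)
end
```

Calling `@time quick_summary(rand(10_000_000))` a second time (after compilation) would give a number comparable to the `describe` timing above.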

bkamins (Member) commented Dec 30, 2023

This is an issue for StatsBase.jl:

julia> @which describe(xs)
describe(x)
     @ StatsBase ~\.julia\packages\StatsBase\WLz8A\src\scalarstats.jl:920

bkamins (Member) commented Dec 30, 2023

I opened JuliaStats/StatsBase.jl#912, since issues cannot be transferred across organizations.

bkamins closed this as completed Dec 30, 2023