remove CategoricalArrays dependency, for performance #2321

@vtjnash

Because DataFrames loads CategoricalArrays, issues such as JuliaData/CategoricalArrays.jl#177 can make everything you do with DataFrames a lot slower. For example:


With CategoricalArrays:

julia> @time begin
           @time using Plots
           @time using DelimitedFiles
           @time using HTTP
           @time using DataFrames
           @time file = HTTP.get("https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv")
           @time stats = DelimitedFiles.readdlm(IOBuffer(file.body), ',', header=true)
           @time statsdf = DataFrame(stats[1], map(Symbol, vec(stats[2])))
           @time stats2 = groupby(statsdf, :Country)
           @time ukstats = stats2[179]
           @time xy = [ukstats.Confirmed, ukstats.Deaths, ukstats.Recovered]
           @time plt = plot(ukstats.Date, xy, labels=["Confirmed" "Deaths" "Recovered"])
           @time display(plt)
       end;
  4.300588 seconds (9.47 M allocations: 622.370 MiB, 5.86% gc time)
  0.329027 seconds (588.77 k allocations: 33.346 MiB, 6.31% gc time)
  0.000152 seconds (293 allocations: 18.578 KiB)
  0.253159 seconds (745.07 k allocations: 53.301 MiB, 5.43% gc time)
  3.376627 seconds (9.46 M allocations: 539.943 MiB, 9.51% gc time)
  0.529758 seconds (2.02 M allocations: 103.027 MiB, 7.79% gc time)
  0.677973 seconds (2.58 M allocations: 172.079 MiB, 6.55% gc time)
  0.961566 seconds (4.36 M allocations: 245.538 MiB, 9.20% gc time)
  0.069549 seconds (148.21 k allocations: 10.844 MiB)
  0.034701 seconds (98.86 k allocations: 5.750 MiB)
  9.646682 seconds (37.82 M allocations: 2.197 GiB, 9.66% gc time)
  3.674751 seconds (7.06 M allocations: 423.162 MiB, 3.51% gc time)
 23.888708 seconds (74.41 M allocations: 4.358 GiB, 7.71% gc time)

Without CategoricalArrays:

  4.287277 seconds (9.47 M allocations: 622.460 MiB, 6.18% gc time)
  0.303269 seconds (588.77 k allocations: 33.346 MiB)
  0.000122 seconds (293 allocations: 18.578 KiB)
  0.123709 seconds (287.40 k allocations: 20.340 MiB, 14.98% gc time)
  2.913315 seconds (8.00 M allocations: 454.598 MiB, 7.58% gc time)
  0.586269 seconds (2.00 M allocations: 101.454 MiB, 16.35% gc time)
  0.237549 seconds (715.00 k allocations: 43.179 MiB)
  0.786341 seconds (3.92 M allocations: 212.980 MiB, 7.46% gc time)
  0.076198 seconds (121.19 k allocations: 8.256 MiB, 16.47% gc time)
  0.036821 seconds (98.81 k allocations: 5.746 MiB)
  2.549315 seconds (6.63 M allocations: 386.266 MiB, 3.65% gc time)
  3.423970 seconds (6.59 M allocations: 386.271 MiB, 2.62% gc time)
 15.357997 seconds (38.48 M allocations: 2.225 GiB, 5.56% gc time)
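
To make the comparison easier to read, here is a small sketch that subtracts the two sets of timings step by step. The numbers are copied from the two runs above. The step labels are an assumption: they pair each timing line with the `@time` statements in the order they appear in the block, with the outer `@time begin ... end` printed last. The variable names (`steps`, `with_ca`, `without_ca`, `saved`) are illustrative only.

```julia
# Timings in seconds, copied from the two runs above, in statement order.
# The labels assume each output line corresponds to the @time statement
# at the same position in the block; the last entry is the outer @time.
steps = ["using Plots", "using DelimitedFiles", "using HTTP", "using DataFrames",
         "HTTP.get", "readdlm", "DataFrame(...)", "groupby", "stats2[179]",
         "xy = [...]", "plot", "display", "total"]

with_ca = [4.300588, 0.329027, 0.000152, 0.253159, 3.376627, 0.529758,
           0.677973, 0.961566, 0.069549, 0.034701, 9.646682, 3.674751, 23.888708]

without_ca = [4.287277, 0.303269, 0.000122, 0.123709, 2.913315, 0.586269,
              0.237549, 0.786341, 0.076198, 0.036821, 2.549315, 3.423970, 15.357997]

# Positive values mean the run without CategoricalArrays was faster.
saved = with_ca .- without_ca

for (step, s) in zip(steps, saved)
    println(rpad(step, 22), round(s, digits=3), " s")
end

total_saved = saved[end]
percent_saved = 100 * total_saved / with_ca[end]
println("total saved: ", round(total_saved, digits=2), " s (",
        round(percent_saved, digits=1), "%)")
```

By this arithmetic the overall block is about 8.5 seconds faster without CategoricalArrays, and the `plot` call alone accounts for roughly 7.1 seconds of that difference. These are single runs, so the smaller per-step differences may be within noise.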
