Performance regression #97

@jmboehm

I have replicated the problem I was having in #95 in a smaller example.

On

DataFrames v0.19.4
FixedEffectModels v0.7.4
FixedEffects v0.1.2

running

using Random, DataFrames, FixedEffectModels, BenchmarkTools

Random.seed!(0)
n = 5_000_000
df = DataFrame(y = rand(Float64, n), x = rand(Float64, n), z = rand(Float64, n), fe1 = rand(collect(0:1_000), n), fe2 = rand(collect(0:1_000), n))
df[:fe1] = categorical(df[:fe1]);
df[:fe2] = categorical(df[:fe2]);
@btime rr1s1 = reg(df, @model(y ~ x , fe = fe1*z + fe2*z), save = :residuals)

yields a benchmark time of 2.223s.

On the other hand, on

DataFrames v0.20.2
FixedEffectModels v0.10.5
FixedEffects v0.7.2

running

using Random, DataFrames, FixedEffectModels, BenchmarkTools
Random.seed!(0)
n = 5_000_000
df = DataFrame(y = rand(Float64, n), x = rand(Float64, n), z = rand(Float64, n), fe1 = rand(collect(0:1_000), n), fe2 = rand(collect(0:1_000), n))
df[:fe1] = categorical(df[:fe1]);
df[:fe2] = categorical(df[:fe2]);
@btime rr1s1 = reg(df, @formula(y ~ x + fe(fe1)*z + fe(fe2)*z), save = :residuals)

yields a benchmark time of 2.95s.

Both runs use 4 threads (the timings are similar with 1).

I've also tried this on a larger example (500_000 levels for fe1 and 10_000 levels for fe2), and the gap is similar: 350s vs 495s.

Any sense of what the reason could be?
