
Fixed Effect Solver Incorrectly Demeaning  #92

@joe5saia


I think I may have stumbled across a bug. If you run the code below, you'll see that with a single fixed effect and a small number of groups, `partial_out` correctly demeans the variables within each fixed-effect group, and the fixed-effect regression returns the correct coefficient. With a larger number of groups, both the demeaning and the regression are wrong. The switch happens once the fixed effect has more than 15 groups. Interestingly, the wrong demeaning returns the same values regardless of the number of groups (`test(50)` matches `test(16)`). I assume the problem is actually in FixedEffects.jl. If you have any suggestions for where to start digging, I'm happy to poke around, or let me know if I'm using this incorrectly. Thanks!

```julia
using DataFrames, FixedEffectModels, Distributions, Random, Statistics

# m is the number of groups in the single fixed effect
function test(m)
    Random.seed!(123)
    n = 1000 # number of obs per group
    ids = repeat(1:m, inner=n)
    fes = repeat(rand(Normal(0,1), m), inner=n)
    x = rand(Normal(), m * n) .+ fes
    y = 2*x .+ fes .+ rand(m * n)
    df = DataFrame(x=x, y=y, ids=ids, fes=fes)
    # partial_out returns a tuple; its first element is the demeaned DataFrame
    df2 = partial_out(df, @formula(y + x + fes ~ fe(ids)))[1]
    df2[!, :ids] = df[!, :ids]
    # group means of the demeaned variables (all should be ~0)
    println(combine(groupby(df2, :ids), [:y, :x, :fes] .=> mean))
    println(reg(df, @formula(y ~ x + fe(ids))))
end

# Variables should be mean 0 and the coef on x should equal 2
test(5)  # correct
test(15) # correct
test(16) # wrong
test(50) # wrong, but identical to test(16)
```
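For comparison, the within-group demeaning that `partial_out` should produce for a single fixed effect can be computed directly. The sketch below is a hypothetical reference check using only the Julia standard library: `demean_by_group`, `within_slope`, and `reference_test` are illustrative names, not part of FixedEffectModels.jl or FixedEffects.jl. It uses the same data-generating process as `test(m)` (with `randn` in place of `rand(Normal())`, so the exact draws may differ from the package run) and relies on the Frisch-Waugh-Lovell theorem: the slope from regressing demeaned `y` on demeaned `x` without an intercept should equal the coefficient on `x` in `y ~ x + fe(ids)`.

```julia
using Statistics, Random

# Hypothetical helper: subtract each group's mean from the observations in that
# group (the "within" transformation expected from partial_out for one fe).
function demean_by_group(v::AbstractVector{<:Real}, ids::AbstractVector)
    sums = Dict{eltype(ids), Float64}()
    counts = Dict{eltype(ids), Int}()
    for (vi, g) in zip(v, ids)
        sums[g] = get(sums, g, 0.0) + vi
        counts[g] = get(counts, g, 0) + 1
    end
    return [vi - sums[g] / counts[g] for (vi, g) in zip(v, ids)]
end

# Frisch-Waugh-Lovell: the no-intercept OLS slope of demeaned y on demeaned x
# equals the coefficient on x in the fixed-effect regression.
within_slope(yd, xd) = sum(xd .* yd) / sum(xd .^ 2)

# Hypothetical reference version of test(m), without the packages.
function reference_test(m; n = 1000)
    Random.seed!(123)
    ids = repeat(1:m, inner = n)
    fes = repeat(randn(m), inner = n)
    x = randn(m * n) .+ fes
    y = 2 .* x .+ fes .+ rand(m * n)
    xd = demean_by_group(x, ids)
    yd = demean_by_group(y, ids)
    return ids, xd, yd, within_slope(yd, xd)
end
```

Running `reference_test(16)` should give group means of `xd` and `yd` that are zero up to floating-point error and a slope close to 2, which gives a baseline to compare the `partial_out` columns against when the group count crosses 15.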