
simulate! and thus parametricbootstrap for GLMM #418

Merged: 23 commits merged into master from pa/glmmsimulate on Oct 29, 2020

Conversation

@palday (Member) commented on Oct 13, 2020

Closes #245.

I had to add a few accessor-to-preallocated-buffer methods for GLMM elsewhere (fixef!, etc.).
stderror! currently does a hidden allocation, but I'll fix that once I've looked at StatsBase.stderror.

There's also still some debugging code in there, but it should otherwise be in a good state to check for conceptual errors.
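
For context, these in-place accessors follow the usual preallocated-buffer pattern, roughly sketched below (the body is illustrative only, not the code merged in this PR; it assumes using MixedModels as in the demo that follows):

# Sketch of the preallocated-buffer accessor pattern: write the fixed-effects
# estimates into a caller-supplied vector so that repeated calls (e.g. inside
# the bootstrap loop) do not allocate a fresh vector each time.
function fixef!(v::AbstractVector{T}, m::GeneralizedLinearMixedModel{T}) where {T}
    copyto!(v, m.β)
end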

julia> using DataFrames

julia> using MixedModels

julia> using StableRNGs

julia> contra = MixedModels.dataset(:contra)
Arrow.Table: (dist = ["D01", "D01", "D01", "D01", "D01", "D01", "D01", "D01", "D01", "D01"  …  "D61", "D61", "D61", "D61", "D61", "D61", "D61", "D61", "D61", "D61"], urban = ["Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y"  …  "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"], livch = ["3+", "0", "2", "3+", "0", "0", "3+", "3+", "1", "3+"  …  "1", "3+", "3+", "2", "2", "3+", "2", "3+", "0", "3+"], age = [18.44, -5.56, 1.44, 8.44, -13.56, -11.56, 18.44, -3.56, -5.56, 1.44  …  -5.56, 14.44, 19.44, -9.56, -2.56, 14.44, -4.56, 14.44, -13.56, 10.44], use = ["N", "N", "N", "N", "N", "N", "N", "N", "N", "N"  …  "N", "N", "N", "Y", "N", "N", "N", "N", "N", "N"])

julia> gm0 = fit(MixedModel, @formula(use ~ 1+age+abs2(age)+urban+livch+(1|urban&dist)), contra, Bernoulli(), fast=true);

julia> bs = parametricbootstrap(StableRNG(42), 1000, deepcopy(gm0));
Progress: 100%|███████████████████████████████████████ Time: 0:00:14

julia> combine(groupby(DataFrame(bs.β), :coefname), :β => first ∘ shortestcovint => :lower, :β => last ∘ shortestcovint => :upper)
7×3 DataFrame
│ Row │ coefname    │ lower       │ upper       │
│     │ Symbol      │ Float64     │ Float64     │
├─────┼─────────────┼─────────────┼─────────────┤
│ 1   │ (Intercept) │ -1.36078    │ -0.671704   │
│ 2   │ age         │ -0.01329    │ 0.0218545   │
│ 3   │ abs2(age)   │ -0.00565161 │ -0.00294269 │
│ 4   │ urban: Y    │ 0.439763    │ 1.09529     │
│ 5   │ livch: 1    │ 0.495756    │ 1.11646     │
│ 6   │ livch: 2    │ 0.493062    │ 1.23956     │
│ 7   │ livch: 3+   │ 0.549297    │ 1.23275     │

# CIs from the normal Wald approximation for comparison
julia> DataFrame(coef=fixefnames(gm0), lower=fixef(gm0) .- 2 .* stderror(gm0), upper=fixef(gm0) .+ 2 .* stderror(gm0))
7×3 DataFrame
│ Row │ coef        │ lower       │ upper       │
│     │ String      │ Float64     │ Float64     │
├─────┼─────────────┼─────────────┼─────────────┤
│ 1   │ (Intercept) │ -1.39311    │ -0.667259   │
│ 2   │ age         │ -0.0153779  │ 0.0218421   │
│ 3   │ abs2(age)   │ -0.00581783 │ -0.00289668 │
│ 4   │ urban: Y    │ 0.407463    │ 1.09011     │
│ 5   │ livch: 1    │ 0.486554    │ 1.14029     │
│ 6   │ livch: 2    │ 0.516467    │ 1.26265     │
│ 7   │ livch: 3+   │ 0.527929    │ 1.27843     │
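
The bootstrap intervals above come from shortestcovint, which returns the endpoints of the shortest contiguous interval covering a given proportion (0.95 by default) of the values. The idea can be sketched as follows (an illustrative reimplementation, not the MixedModels source):

# Find the narrowest window over the sorted values that covers `level` of them.
function shortestcovint_sketch(v::AbstractVector, level=0.95)
    sorted = sort(v)
    n = length(sorted)
    w = ceil(Int, level * n)                        # number of points to cover
    widths = [sorted[i + w - 1] - sorted[i] for i in 1:(n - w + 1)]
    i = argmin(widths)                              # left edge of the narrowest window
    (sorted[i], sorted[i + w - 1])
end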

@codecov (bot) commented on Oct 13, 2020

Codecov Report

Merging #418 into master will decrease coverage by 0.09%.
The diff coverage is 90.38%.

@@            Coverage Diff             @@
##           master     #418      +/-   ##
==========================================
- Coverage   94.31%   94.22%   -0.09%     
==========================================
  Files          23       23              
  Lines        1637     1681      +44     
==========================================
+ Hits         1544     1584      +40     
- Misses         93       97       +4     
Impacted Files                        Coverage Δ
src/linearmixedmodel.jl               94.23% <ø> (-0.02%) ⬇️
src/simulate.jl                       89.55% <87.50%> (-1.88%) ⬇️
src/generalizedlinearmixedmodel.jl    87.09% <91.66%> (+0.54%) ⬆️
src/bootstrap.jl                      98.13% <100.00%> (+0.03%) ⬆️
src/mixedmodel.jl                     81.81% <100.00%> (+0.86%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 52ca2ca...137cce7

@palday (Member, Author) commented on Oct 14, 2020

I need to fix the Gaussian with non-identity link pathway.

@palday requested a review from @dmbates on October 15, 2020

@dmbates (Collaborator) left a comment:

I like the idea of using the StableRNGs package in the tests.

I think it will be necessary to consider the scaling in building η in the simulate! method. I put a comment there. I'm pretty sure that using sdest applied to the penalized weighted least squares structure for the working response is not correct.

(Inline review comments on test/bootstrap.jl and src/simulate.jl were resolved; some refer to outdated code.)

src/simulate.jl (outdated):
end

# scale by lm.σ and add fixed-effects contribution
BLAS.gemv!('N', one(T), lm.X, β, lm.σ, η)
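
For reference, BLAS.gemv!('N', alpha, A, x, beta, y) overwrites y with alpha*A*x + beta*y, so the call above is equivalent to the allocating form:

# η enters holding the unscaled random-effects contribution; gemv! scales it
# by lm.σ while adding the fixed-effects term lm.X * β in place:
η .= lm.X * β .+ lm.σ .* η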

@dmbates (Collaborator) commented:

Are you sure that lm.σ is the correct multiple here? It just calls sdest(lm), which calls varest(m), which is based on pwrss(m). And that value is not necessarily related to any scale parameter.

@palday (Member, Author) replied:

Yep, my deep dive into the GLMM code has highlighted this to me. It 'works' here because sigma is usually close to 1 for families without a dispersion parameter, and I think unit scaling is correct.
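
The rule being described can be illustrated with a small sketch (the helper names are assumptions for illustration, not the MixedModels API):

using Distributions

# Sketch: families with no dispersion parameter (Bernoulli, Binomial, Poisson)
# keep the random-effects contribution on the unit scale; dispersed families
# (e.g. Normal, Gamma) scale it by the estimated dispersion σ.
has_dispersion(d::Distribution) = !(d isa Union{Bernoulli,Binomial,Poisson})
simulation_scale(d::Distribution, σ::Real) = has_dispersion(d) ? σ : one(σ)

simulation_scale(Bernoulli(), 0.98)   # 1.0 — unit scaling, as argued above
simulation_scale(Gamma(), 0.98)       # 0.98 — scale by the dispersion estimate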

@palday (Member, Author) commented on Oct 18, 2020

The failure on nightly looks unrelated:

   MethodError: propertynames(::LinearMixedModel{Float64}, ::Bool) is ambiguous. Candidates:
    propertynames(m::LinearMixedModel, private) in MixedModels at /home/runner/work/MixedModels.jl/MixedModels.jl/src/linearmixedmodel.jl:564
    propertynames(x, private::Bool) in Base at reflection.jl:1530
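
The ambiguity arises because the package method leaves private untyped while the Base method constrains it to Bool, so neither is more specific. A toy reproduction (illustrative type, not MixedModels code):

struct Toy end
Base.propertynames(::Toy, private) = (:a, :b)        # untyped `private` is ambiguous
                                                     # with Base.propertynames(x, private::Bool)
Base.propertynames(::Toy, private::Bool) = (:a, :b)  # annotating the argument resolves it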

@palday (Member, Author) commented on Oct 18, 2020

@dmbates Care to take a second look?

src/bootstrap.jl (outdated):

@@ -290,7 +304,7 @@ function tidyσs(bsamp::MixedModelBootstrap{T}) where {T}
     )
     for (iter, r) in enumerate(bstr)
         setθ!(bsamp, iter) # install r.θ in λ
-        σ = r.σ
+        σ = r.σ === missing ? 1 : r.σ

@dmbates (Collaborator) commented:

This could be written as

σ = coalesce(r.σ, one(T))

to keep type consistency and stay more in the spirit of the missing design.
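
For reference, coalesce returns its first argument that is not missing:

julia> coalesce(missing, 1.0)   # the missing value falls through to the fallback
1.0

julia> coalesce(0.5, 1.0)       # non-missing values pass through unchanged
0.5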

@palday (Member, Author) replied:

Nice, thanks for the tip!


@dmbates (Collaborator) left a comment:

Just one minor suggestion: use coalesce instead of x === missing ? ...

@palday merged commit 0bf61a2 into master on Oct 29, 2020
@palday deleted the pa/glmmsimulate branch on October 29, 2020