Harmonize figures
dmbates committed Sep 16, 2016
1 parent ed08ed2 commit ab841ca
Showing 8 changed files with 98 additions and 189 deletions.
86 changes: 86 additions & 0 deletions docs/src/man/bootstrap.jmd
@@ -0,0 +1,86 @@
# Parametric bootstrap for linear mixed-effects models

Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models.
The `bootstrap!` function in the [MixedModels package](https://github.com/dmbates/MixedModels.jl) provides
an efficient parametric bootstrap for linear mixed-effects models, assuming that the results of interest
from each simulated response vector can be incorporated into a vector of floating-point values.

## The parametric bootstrap

[Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is a family of procedures
for generating sample values of a statistic, allowing for visualization of the distribution of the
statistic or for inference from this sample of values.

A _parametric bootstrap_ is used with a parametric model, `m`, that has been fitted to data.
The procedure is to simulate `n` response vectors from `m` using the estimated parameter values
and refit `m` to these responses in turn, accumulating the statistics of interest at each iteration.
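
As an illustration of the procedure only, here is a minimal sketch of a parametric bootstrap for a much simpler model (i.i.d. Gaussian observations). It is written against the standard library of a current Julia release rather than MixedModels. The names `fitnormal`, `simulate`, and `parametricbootstrap`, and the data, are invented for this example.

```julia
using Random, Statistics

# Toy parametric model: observations are i.i.d. Normal(μ, σ).
# "Fitting" the model means estimating μ and σ from the data.
fitnormal(y) = (μ = mean(y), σ = std(y; corrected = false))

# Simulate one response vector of length n from a fitted model.
simulate(rng, fit, n) = fit.μ .+ fit.σ .* randn(rng, n)

# Parametric bootstrap: simulate from the fitted model, refit, and store
# the statistic of interest (here the estimate of σ²) at each iteration.
function parametricbootstrap(rng, y, nsamp)
    fit = fitnormal(y)
    stats = Vector{Float64}(undef, nsamp)
    for i in 1:nsamp
        ystar = simulate(rng, fit, length(y))
        stats[i] = abs2(fitnormal(ystar).σ)
    end
    return stats
end

rng = MersenneTwister(1234321)
y = 1500 .+ 50 .* randn(rng, 30)   # made-up data, for illustration only
varsamples = parametricbootstrap(rng, y, 1000)
```

The structure is the same as for a mixed-effects model: only the fitting and simulation steps become more expensive.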

The parameters of a linear mixed-effects model as fit by the `lmm` function are the fixed-effects
parameters, `β`, the standard deviation, `σ`, of the per-observation noise, and the covariance
parameter, `θ`, that defines the variance-covariance matrices of the random effects.

For example, a simple linear mixed-effects model for the `Dyestuff` data in the [`lme4`](http://github.com/lme4/lme4)
package for [`R`](https://www.r-project.org) is fit by
```{julia;term=true}
using DataFrames, Gadfly, MixedModels
```
```{julia;echo=false;results="hidden"}
include(Pkg.dir("MixedModels", "test", "data.jl"))
```
```{julia;term=true}
show(ds) # Dyestuff data set
```
```{julia;term=true}
m1 = fit!(lmm(Yield ~ 1 + (1 | Batch), ds))
```


## Using the `bootstrap!` function

This quick explanation is provided for those who only wish to use the `bootstrap!` method and do not need
detailed explanations of how it works.
The three arguments to `bootstrap!` are the matrix that will be overwritten with the results, the model to bootstrap,
and a function that overwrites a vector with the results of interest from the model.
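
The calling convention (a preallocated matrix plus a function that overwrites a vector) can be sketched independently of MixedModels. The driver `fillcolumns!` and the callback `savesummary!` below are hypothetical stand-ins. Whether `bootstrap!` itself hands the callback a view of a column or a temporary vector is not stated here, so the use of `view` is an assumption of the sketch.

```julia
# Hypothetical driver: `f!` overwrites a vector with the values of interest,
# and the driver hands it one column of `results` per item.
function fillcolumns!(results::AbstractMatrix, items, f!)
    size(results, 2) == length(items) ||
        throw(DimensionMismatch("need one column per item"))
    for (j, item) in enumerate(items)
        f!(view(results, :, j), item)   # write directly into column j
    end
    return results
end

# A callback in the same style as a results-saving function:
# store two summaries of `x` in the first two positions of `v`.
function savesummary!(v, x)
    v[1] = sum(x)
    v[2] = maximum(x)
    return v
end

res = fillcolumns!(zeros(2, 3), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], savesummary!)
```

Preallocating the matrix and mutating it in place avoids allocating a new vector on every iteration, which matters when the number of replications is large.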

Suppose the objective is to obtain 100,000 parametric bootstrap samples of the estimates of the "variance
components", `σ²` and `σ₁²`, in this model. In many implementations of mixed-effects models the
estimate of `σ₁²`, the variance of the scalar random effects, is reported along with a
standard error, as if the estimator could be assumed to have a Gaussian distribution.
Is this a reasonable assumption?

A suitable function to save the results is
```{julia;term=true}
function saveresults!(v, m)
    v[1] = varest(m)                  # estimate of σ²
    v[2] = abs2(getθ(m)[1]) * v[1]    # estimate of σ₁², i.e. θ² times the estimate of σ²
end
```
The `varest` extractor function returns the estimate of `σ²`. As seen above, the estimate of
`σ₁` is the product of `θ` and the estimate of `σ`. The expression `abs2(getθ(m)[1])` evaluates to
`θ²`. The `[1]` is necessary because the value returned by `getθ` is a vector and a scalar is needed
here.
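
In symbols, writing $\hat\theta$ for the single element of the estimated covariance parameter vector, the two stored values follow from the relation between the standard deviations. This is only a restatement of the paragraph above, with hats marking estimates.

```math
\hat\sigma_1 = \hat\theta\,\hat\sigma
\quad\Longrightarrow\quad
\hat\sigma_1^2 = \hat\theta^2\,\hat\sigma^2 ,
\qquad
v_1 = \hat\sigma^2 , \quad v_2 = \hat\theta^2\, v_1 .
```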

As with any simulation-based method, it is advisable to set the random number seed before calling
`bootstrap!` for reproducibility.
```{julia;term=true;}
srand(1234321);
```
```{julia;term=true;}
results = bootstrap!(zeros(2, 100000), m1, saveresults!);
```
The results for each bootstrap replication are stored in the columns of the matrix passed in as the first
argument. A density plot of the first row using the [`Gadfly`](https://github.com/dcjones/Gadfly.jl) package
is created as
```{julia;eval=false;term=true}
plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
```
```{julia;echo=false;fig_cap="Density of parametric bootstrap estimates of σ² from model m1"; fig_width=8;}
plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
```
```{julia;echo=false;fig_cap="Density of parametric bootstrap estimates of σ₁² from model m1"; fig_width=8;}
plot(x = sub(results, 2, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ₁²"))
```

The distribution of the bootstrap samples of `σ²` is a bit skewed but not terribly so. However, the
distribution of the bootstrap samples of the estimate of `σ₁²` is highly skewed and has a spike at
zero.
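
The visual impression can be backed up with a few numerical summaries. The sketch below assumes a current Julia release and assumes that the spike consists of exact zeros. The helper names `propzero` and `skewness` are invented here, and the vector `x` is a made-up stand-in for one row of `results`, not output from the model above.

```julia
using Statistics

# Fraction of samples that are exactly zero (the "spike at zero").
propzero(x) = count(iszero, x) / length(x)

# Sample skewness: third central moment divided by the cubed standard deviation.
function skewness(x)
    μ = mean(x)
    s = std(x; corrected = false)
    return mean((xi - μ)^3 for xi in x) / s^3
end

# Made-up values standing in for a row of bootstrap estimates.
x = [0.0, 0.0, 0.0, 100.0, 400.0, 900.0, 2500.0, 6400.0]

pz = propzero(x)                         # share of exact zeros
sk = skewness(x)                         # positive for a right-skewed sample
q = quantile(x, [0.025, 0.5, 0.975])     # percentile-style interval and median
```

A large proportion of zeros or a strongly positive skewness would both argue against summarizing the estimator by an estimate and a standard error.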
15 changes: 12 additions & 3 deletions docs/src/man/bootstrap.md
@@ -127,7 +127,7 @@ As with any simulation-based method, it is advisable to set the random number se
`bootstrap!` for reproducibility.
````julia
julia> srand(1234321);
-MersenneTwister(Base.dSFMT.DSFMT_state(Int32[-1066020669,1073631810,397127531,1072701603,-312796895,1073626997,1020815149,1073320576,650048908,1073512247 -352178910,1073735534,1816227101,1072823316,-1468787611,-2121692099,358864500,-310934288,382,0]),[1.09857,1.52278,1.29205,1.58248,1.76821,1.12729,1.91324,1.13434,1.86838,1.19769 1.91228,1.82615,1.801,1.58645,1.48315,1.6551,1.08701,1.22284,1.42061,1.41889],382,UInt32[0x0012d591])
+MersenneTwister(Base.dSFMT.DSFMT_state(Int32[-1066020669,1073631810,397127531,1072701603,-312796895,1073626997,1020815149,1073320576,650048908,1073512247 -352178910,1073735534,1816227101,1072823316,-1468787611,-2121692099,358864500,-310934288,382,0]),[2.11393e-315,2.11394e-315,2.11394e-315,0.0,NaN,2.11399e-315,2.11332e-315,4.24399e-314,2.11278e-315,6.36599e-314 3.95253e-323,5.06417e-321,0.0,0.0,2.11329e-315,0.0,7.90505e-323,5.06911e-321,0.0,0.0],382,UInt32[0x0012d591])
````


@@ -143,10 +143,19 @@ julia> results = bootstrap!(zeros(2, 100000), m1, saveresults!);



-![](figures/bootstrap_8_1.png)
+The results for each bootstrap replication are stored in the columns of the matrix passed in as the first
+argument. A density plot of the first row using the [`Gadfly`](https://github.com/dcjones/Gadfly.jl) package
+is created as
+````julia
+plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
+````
+
+
+
+![Density of parametric bootstrap estimates of σ² from model m1](figures/bootstrap_9_1.png)


-![Density of parametric bootstrap estimates of σ₁² from model m1](figures/bootstrap_9_1.png)
+![Density of parametric bootstrap estimates of σ₁² from model m1](figures/bootstrap_10_1.png)



Binary file added docs/src/man/figures/bootstrap_10_1.png
Binary file removed docs/src/man/figures/bootstrap_8_1.png
Binary file not shown.
66 changes: 0 additions & 66 deletions docs/src/man/figures/bootstrap_8_1.svg

This file was deleted.

60 changes: 0 additions & 60 deletions docs/src/man/figures/bootstrap_8_2.svg

This file was deleted.

Binary file modified docs/src/man/figures/bootstrap_9_1.png