Harmonize figures

JuliaStats · Sep 16, 2016 · ab841ca
1 parent ed08ed2
commit ab841ca

Showing 8 changed files with 98 additions and 189 deletions.
diff --git a/docs/src/man/bootstrap.jmd b/docs/src/man/bootstrap.jmd
@@ -0,0 +1,86 @@
+# Parametric bootstrap for linear mixed-effects models
+
+Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models.
+The `bootstrap!` function in the [MixedModels package](https://github.com/dmbates/MixedModels.jl) provides
+an efficient parametric bootstrap for linear mixed-effects models, assuming that the results of interest
+from each simulated response vector can be incorporated into a vector of floating-point values.
+
+## The parametric bootstrap
+
+[Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is a family of procedures
+for generating sample values of a statistic, allowing for visualization of the distribution of the
+statistic or for inference from this sample of values.
+
+A _parametric bootstrap_ is used with a parametric model, `m`, that has been fitted to data.
+The procedure is to simulate `n` response vectors from `m` using the estimated parameter values
+and refit `m` to these responses in turn, accumulating the statistics of interest at each iteration.
+
+The parameters of a linear mixed-effects model as fit by the `lmm` function are the fixed-effects
+parameters, `β`, the standard deviation, `σ`, of the per-observation noise, and the covariance
+parameter, `θ`, that defines the variance-covariance matrices of the random effects.
+
+For example, a simple linear mixed-effects model for the `Dyestuff` data in the [`lme4`](http://github.com/lme4/lme4)
+package for [`R`](https://www.r-project.org) is fit by
+```{julia;term=true}
+using DataFrames, Gadfly, MixedModels
+```
+```{julia;echo=false;results="hidden"}
+include(Pkg.dir("MixedModels", "test", "data.jl"))
+```
+```{julia;term=true}
+show(ds)   # Dyestuff data set
+```
+```{julia;term=true}
+m1 = fit!(lmm(Yield ~ 1 + (1 | Batch), ds))
+```
+
+
+## Using the `bootstrap!` function
+
+This quick explanation is provided for those who only wish to use the `bootstrap!` method and do not need
+detailed explanations of how it works.
+The three arguments to `bootstrap!` are the matrix that will be overwritten with the results, the model to bootstrap,
+and a function that overwrites a vector with the results of interest from the model.
+
+Suppose the objective is to obtain 100,000 parametric bootstrap samples of the estimates of the "variance
+components", `σ²` and `σ₁²`, in this model.  In many implementations of mixed-effects models the
+estimate of `σ₁²`, the variance of the scalar random effects, is reported along with a
+standard error, as if the estimator could be assumed to have a Gaussian distribution.
+Is this a reasonable assumption?
+
+A suitable function to save the results is
+```{julia;term=true}
+function saveresults!(v, m)
+    v[1] = varest(m)
+    v[2] = abs2(getθ(m)[1]) * v[1]
+end
+```
+The `varest` extractor function returns the estimate of `σ²`.  As seen above, the estimate of
+`σ₁` is the product of `θ` and the estimate of `σ`.  The expression `abs2(getθ(m)[1])` evaluates to
+`θ²`, so multiplying it by `v[1]`, the estimate of `σ²`, gives the estimate of `σ₁²`.  The `[1]` is
+necessary because the value returned by `getθ` is a vector and a scalar is needed here.
+
+As with any simulation-based method, it is advisable to set the random number seed before calling
+`bootstrap!` for reproducibility.
+```{julia;term=true;}
+srand(1234321);
+```
+```{julia;term=true;}
+results = bootstrap!(zeros(2, 100000), m1, saveresults!);
+```
+The results for each bootstrap replication are stored in the columns of the matrix passed in as the first
+argument.  A density plot of the first row using the [`Gadfly`](https://github.com/dcjones/Gadfly.jl) package
+is created as
+```{julia;eval=false;term=true}
+plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
+```
+```{julia;echo=false;fig_cap="Density of parametric bootstrap estimates of σ² from model m1"; fig_width=8;}
+plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
+```
+```{julia;echo=false;fig_cap="Density of parametric bootstrap estimates of σ₁² from model m1"; fig_width=8;}
+plot(x = sub(results, 2, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ₁²"))
+```
+
+The distribution of the bootstrap samples of `σ²` is mildly skewed.  However, the
+distribution of the bootstrap samples of the estimate of `σ₁²` is highly skewed and has a spike at
+zero, so reporting a standard error for it, as if the estimator were Gaussian, is not reasonable.
diff --git a/docs/src/man/bootstrap.md b/docs/src/man/bootstrap.md
@@ -127,7 +127,7 @@ As with any simulation-based method, it is advisable to set the random number se
 `bootstrap!` for reproducibility.
 ````julia
 julia> srand(1234321);
-MersenneTwister(Base.dSFMT.DSFMT_state(Int32[-1066020669,1073631810,397127531,1072701603,-312796895,1073626997,1020815149,1073320576,650048908,1073512247  …  -352178910,1073735534,1816227101,1072823316,-1468787611,-2121692099,358864500,-310934288,382,0]),[1.09857,1.52278,1.29205,1.58248,1.76821,1.12729,1.91324,1.13434,1.86838,1.19769  …  1.91228,1.82615,1.801,1.58645,1.48315,1.6551,1.08701,1.22284,1.42061,1.41889],382,UInt32[0x0012d591])
+MersenneTwister(Base.dSFMT.DSFMT_state(Int32[-1066020669,1073631810,397127531,1072701603,-312796895,1073626997,1020815149,1073320576,650048908,1073512247  …  -352178910,1073735534,1816227101,1072823316,-1468787611,-2121692099,358864500,-310934288,382,0]),[2.11393e-315,2.11394e-315,2.11394e-315,0.0,NaN,2.11399e-315,2.11332e-315,4.24399e-314,2.11278e-315,6.36599e-314  …  3.95253e-323,5.06417e-321,0.0,0.0,2.11329e-315,0.0,7.90505e-323,5.06911e-321,0.0,0.0],382,UInt32[0x0012d591])
 ````
 
 
@@ -143,10 +143,19 @@ julia> results = bootstrap!(zeros(2, 100000), m1, saveresults!);
 
 
 
-![](figures/bootstrap_8_1.png)
+The results for each bootstrap replication are stored in the columns of the matrix passed in as the first
+argument.  A density plot of the first row using the [`Gadfly`](https://github.com/dcjones/Gadfly.jl) package
+is created as
+````julia
+plot(x = sub(results, 1, :), Geom.density(), Guide.xlabel("Parametric bootstrap estimates of σ²"))
+````
+
+
+
+![Density of parametric bootstrap estimates of σ² from model m1](figures/bootstrap_9_1.png)
 
 
-![Density of parametric bootstrap estimates of σ₁² from model m1](figures/bootstrap_9_1.png)
+![Density of parametric bootstrap estimates of σ₁² from model m1](figures/bootstrap_10_1.png)
 
 
 

diff --git a/docs/src/man/figures/bootstrap_10_1.png b/docs/src/man/figures/bootstrap_10_1.png
diff --git a/docs/src/man/figures/bootstrap_8_1.png b/docs/src/man/figures/bootstrap_8_1.png
diff --git a/docs/src/man/figures/bootstrap_8_1.svg b/docs/src/man/figures/bootstrap_8_1.svg
diff --git a/docs/src/man/figures/bootstrap_8_2.svg b/docs/src/man/figures/bootstrap_8_2.svg
diff --git a/docs/src/man/figures/bootstrap_9_1.png b/docs/src/man/figures/bootstrap_9_1.png
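
Note on the procedure described in `bootstrap.jmd`: the simulate, refit, and accumulate loop can be sketched without MixedModels. The following is a minimal illustration only, written for current Julia (1.x) rather than the version used in this commit. `GaussianFit`, `fitgaussian`, `savestats!`, and `parbootstrap!` are hypothetical names that stand in for a fitted model, its fitting routine, `saveresults!`, and `bootstrap!`; they are not part of the MixedModels API, and the data are arbitrary simulated values, not the `Dyestuff` data.

```julia
using Random, Statistics

# Hypothetical stand-in for a fitted parametric model: i.i.d. Gaussian responses.
struct GaussianFit
    μ::Float64   # estimated mean
    σ::Float64   # estimated standard deviation
    n::Int       # number of observations
end

# Fit by maximum likelihood (corrected=false divides by n rather than n - 1).
fitgaussian(y) = GaussianFit(mean(y), std(y; corrected=false), length(y))

# Overwrite `v` with the statistics of interest, in the spirit of `saveresults!`.
function savestats!(v, m::GaussianFit)
    v[1] = m.μ
    v[2] = abs2(m.σ)
    return v
end

# Fill each column of `r` with the statistics from one simulated and refitted sample.
function parbootstrap!(r::AbstractMatrix, m::GaussianFit, f!::Function;
                       rng=Random.default_rng())
    y = Vector{Float64}(undef, m.n)
    for j in axes(r, 2)
        randn!(rng, y)               # standard normal draws
        y .= m.μ .+ m.σ .* y         # simulate a response from the fitted model
        f!(view(r, :, j), fitgaussian(y))   # refit and store the statistics
    end
    return r
end

rng = MersenneTwister(1234321)                 # seeded for reproducibility
m = fitgaussian(1500 .+ 50 .* randn(rng, 30))  # arbitrary simulated data
results = parbootstrap!(zeros(2, 1000), m, savestats!; rng=rng)
```

As in the documented example, each column of `results` holds one replication, so `results[2, :]` is the vector of bootstrap variance estimates that would be passed to a density plot.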