Over the next 2 months I'm planning to update StatisticalRethinking.jl to reflect the changes in the 2nd edition of the book. At the same time (but this will likely take longer) I'll also expand coverage of chapters 5 and beyond.
Version 0.9.0 introduced model simulations based on DynamicHMC, e.g. the nice explanation of HMC (and NUTS) in chapter 9.
In version 1.0 I plan to switch to predominantly using DynamicHMC, but I'm still experimenting with a useful replacement for quap().
StanModels will be updated to use the new suite of packages StanSample.jl, StanOptimize.jl, StanVariational.jl, etc. (all modeled after Tamas Papp's StanDump.jl, StanRun.jl and StanSamples.jl).
Documentation will also change substantially. I no longer plan to generate and store notebook (and chapter) versions as part of the documentation. Users interested in the notebook versions can generate them. This also has consequences for testing.
Figures will be stored in the chapter directories. This is one of the reasons why I am planning this change: I have noticed that the quality of the figures as generated and included by Literate.jl is less than optimal.
Towards the end of this year I also plan to update TuringModels.jl based on the new AdvancedHMC option.
In the meantime, Chris Fisher has made tremendous progress with MCMCBenchmarks.jl, which compares three NUTS mcmc options.
This package contains Julia versions of selected code snippets and mcmc models contained in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.
This package is part of the broader StatisticalRethinkingJulia Github organization.
In the book and associated R package rethinking, statistical models are defined as illustrated below:

    flist <- alist(
      height ~ dnorm( mu , sigma ) ,
      mu <- a + b*weight ,
      a ~ dnorm( 156 , 100 ) ,
      b ~ dnorm( 0 , 10 ) ,
      sigma ~ dunif( 0 , 50 )
    )

Posterior values can be approximated by

    # Simulate quadratic approximation (for simpler models)
    m4.31 <- quap(flist, data=d2)

or generated using Stan by:

    # Generate a Stan model and run a simulation
    m4.32 <- ulam(flist, data=d2)

The author of the book states: "If that (the statistical model) doesn't make much sense, good. ... you're holding the right textbook, since this book teaches you how to read and write these mathematical descriptions" (page 77).
StatisticalRethinkingJulia is intended to allow experimenting with this learning process using four available mcmc options in Julia: Turing.jl, CmdStan.jl, Mamba.jl and DynamicHMC.jl.
A secondary objective of StatisticalRethinkingJulia is to compare definition and execution of a variety of models in the above four mcmc packages.
As stated many times by the author in his online lectures, this package is not intended to take away the hands-on component of the course. The clips are just meant to get you going but learning means experimenting, in this case using Julia.
At least one other package (Klara) is available for mcmc in Julia. Time constraints prevented this option from being included in StatisticalRethinkingJulia. For similar reasons, the number of models implemented in MambaModels is very limited.
Layout of the package
Instead of having all snippets in a single file, the snippets are organized by chapter and grouped in clips by related snippets. E.g. chapter 0 of the R package has snippets 0.1 to 0.5. Those have been combined into 2 clips:
- clip-01-03.jl - contains snippets 0.1 through 0.3
- clip-04-05.jl - contains snippets 0.4 and 0.5.
These 2 files are in scripts/00 and are later processed by Literate.jl to create 3 derived versions, e.g. from clip-01-03.jl in scripts/00:
- clip-01-03.md - included in the documentation
- clip-01-03.ipynb - stored in the notebooks/chapter directory
- clip-01-03.jl - stored in the chapters/chapter directory
Occasionally lines in scripts are suppressed when Literate processes input source files, e.g. in Turing scripts the statement "#nb Turing.turnprogress(false);" is only inserted in the generated notebook but not in the corresponding chapter .jl script. Similarly "#src ..." will only be included in the .jl scripts in the chapters subdirectories.
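As a rough sketch of how one source clip can be turned into the three derived files, the following uses Literate.jl's markdown, notebook and script functions. The temporary directory layout and the toy clip contents are assumptions made for illustration; the package's own generation code may differ.

```julia
using Literate

# Build a tiny stand-in for scripts/00/clip-01-03.jl in a temporary directory
# (hypothetical layout, chosen to mirror the description above).
root = mktempdir()
srcdir = joinpath(root, "scripts", "00")
mkpath(srcdir)
input = joinpath(srcdir, "clip-01-03.jl")
write(input, """
# # Snippet 0.1
x = 1 + 1
#nb println("only in the notebook")
""")

# Output directories for the three derived versions.
docsdir = joinpath(root, "docs", "00")
nbdir = joinpath(root, "notebooks", "00")
chdir_out = joinpath(root, "chapters", "00")
foreach(mkpath, (docsdir, nbdir, chdir_out))

Literate.markdown(input, docsdir)                # writes clip-01-03.md
Literate.notebook(input, nbdir; execute=false)   # writes clip-01-03.ipynb
Literate.script(input, chdir_out)                # writes clip-01-03.jl
```

In this sketch the line tagged with #nb ends up in the notebook only; it is filtered out of the generated script and markdown files.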
A single snippet clip will be referred to as
Models with names such as 04/m4.5d.jl generate mcmc samples using Turing.jl, CmdStan.jl, Mamba.jl or DynamicHMC.jl respectively. In some cases the results of the mcmc chains have been stored and retrieved (or regenerated if missing) in other clips, e.g.
- STABLE — documentation of the most recently tagged version.
- DEVEL — documentation of the in-development version.
Richard Torkar has taken the lead in developing the Turing versions of the models in chapter 8 and subsequent chapters.
Tamas Papp has also been very helpful during the development of the DynamicHMC versions of the models.
The TuringLang team and #turing contributors on Slack have been extremely helpful! The Turing examples by Cameron Pfiffer are followed closely in several example scripts.
The documentation has been generated using Literate.jl and Documenter.jl based on several ideas demonstrated by Tamas Papp in DynamicHMCExamples.jl.
Questions and issues
Questions and contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or have a question.
Versions & notes
Developing rethinking must have been an ongoing process over several years; StatisticalRethinking.jl will likely follow a similar path.
The initial version (v1) of StatisticalRethinking is really just a first attempt to capture the models and show ways of setting up those models, executing them and post-processing the results using Julia.
As mentioned above, a second objective of v1 is to experiment and compare the four selected mcmc options in Julia in terms of results, performance, ease of expressing models, etc.
The R package rethinking, in the experimental branch on Github, contains 2 functions, quap and ulam (previously called map and map2stan), which are not in v1 of StatisticalRethinking.jl. It is my intention to study those and possibly include something similar to quap or ulam (or both) in a future version of StatisticalRethinking.jl. In clip-02-05.jl an initial example of using the maximum_a_posteriori estimate and associated quadratic (Normal) approximation is illustrated.
Several other interesting approaches that could become a good basis for such an endeavour are being explored in Julia, e.g. Soss.jl and Omega.jl.
Many other R functions such as precis(), link(), shade(), etc. are not in v1, although some very early versions are being tested. Expect significant refactoring of those in future versions and at the same time better integration with MCMCChains.Chains objects.
The Mamba examples should really use "@everywhere using Mamba" instead of "using Mamba". The plain "using Mamba" form is used to get around a limitation in Literate.jl when testing the notebooks in distributed mode.
In the src directory of all packages is a file scriptentry.jl which defines an object script_dict which is used to control the generation of documentation, notebooks and .jl scripts in chapters and testing of the notebooks. See ?ScriptEntry or enter e.g. script_dict["02"] in the REPL. In the model packages this file is suffixed by an indication of the used mcmc option. e.g.
A utility function, generate(), is part of each package to regenerate notebooks and chapter scripts, please see ?generate. Again, e.g. generate_t in TuringModels generates all model notebooks and chapter scripts for Turing models.
In a similar fashion, borrowed from DynamicHMCExamples, I define several variations on rel_path(). By itself, rel_path() points at the src directory of StatisticalRethinking.jl and e.g. rel_path_s() points to the src directory of StanModels. The rel_path() version is typically used to read in data files. All others are used to locate directories to read from or store generated files into.
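A minimal sketch of this pattern, assuming each helper simply joins path parts onto a fixed base directory. The make_rel_path factory, the relative base directories and the Howell1.csv file name are hypothetical and used only for illustration; the packages define their helpers against their own installed locations.

```julia
# Hypothetical factory: returns a function that resolves path parts
# relative to a fixed base directory.
make_rel_path(base) = (parts...) -> normpath(joinpath(base, parts...))

# Stand-ins for the src directories of two packages (assumed locations).
rel_path = make_rel_path(joinpath("StatisticalRethinking", "src"))
rel_path_s = make_rel_path(joinpath("StanModels", "src"))

# Locate a data file one level above src (illustrative file name).
howell = rel_path("..", "data", "Howell1.csv")

# Locate a chapter directory inside another package.
stan_ch4 = rel_path_s("..", "chapters", "04")
```

Keeping one helper per package means scripts never hard-code absolute paths, so they keep working wherever the packages happen to be installed.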