
Bayesian Data Analysis course material

This repository contains the course material for the Bayesian Data Analysis course at Aalto (CS-E5710). Aalto students should also check the MyCourses announcements.

The material will be updated during the course. Exercise instructions and slides will be updated at the latest on the Monday of the corresponding week. The best way to stay up to date is to clone the repo and pull before checking new material. If you don't want to learn git and can't find the Download ZIP link, click here.

Prerequisites

This course has been designed with a strong emphasis on the computational aspects of Bayesian data analysis and on using the latest computational tools.

If you find BDA3 too difficult to start with, I recommend

Assessment

Exercises (67%) and project work (33%). A minimum of 50% of the points must be obtained from both the exercises and the project work.

Course contents following BDA3

Bayesian Data Analysis, 3rd ed, by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. Home page for the book. Errata for the book. Electronic edition for non-commercial purposes only.

How to study

The recommended way to go through the material is:

  • Read the reading instructions for a chapter in chapter_notes.
  • Read the chapter in BDA3 and check that you find the terms listed in the reading instructions.
  • Watch the corresponding lecture video to get explanations of the most important parts.
  • Read the corresponding additional information in the chapter notes.
  • Run the corresponding demos in R demos or Python demos.
  • Read the exercise instructions and do the corresponding exercises. The demo codes in R demos and Python demos have a lot of useful examples for handling data and plotting figures. If you have problems, visit the TA sessions or ask in the course Slack channel.
  • If you want to learn more, also do the self study exercises listed below.

Slides and chapter notes

  • Slides
    • including code for reproducing some of the figures
  • Chapter notes
    • including reading instructions highlighting the most important parts and terms

Text licensed under CC-BY-NC 4.0. Code licensed under BSD-3.

Videos

The following video motivates why computational probabilistic methods and probabilistic programming are an important part of modern Bayesian data analysis.

Short video clips on selected introductory topics are available in a Panopto folder and listed below.

2019 fall lecture videos are in a Panopto folder and listed below.

  • Lecture 2.1 and Lecture 2.2 on basics of Bayesian inference, observation model, likelihood, posterior and binomial model, predictive distribution and benefit of integration, priors and prior information, and one parameter normal model.
  • Lecture 3 on multiparameter models, joint, marginal and conditional distribution, normal model, bioassay example, grid sampling and grid evaluation.
  • Lecture 4.1 on numerical issues, Monte Carlo, how many simulation draws are needed, how many digits to report, and Lecture 4.2 on direct simulation, curse of dimensionality, rejection sampling, and importance sampling.
  • Lecture 5.1 on Markov chain Monte Carlo, Gibbs sampling, and the Metropolis algorithm, and Lecture 5.2 on warm-up, convergence diagnostics, R-hat, and effective sample size.
  • Lecture 6.1 on HMC, NUTS, dynamic HMC and HMC specific convergence diagnostics, and Lecture 6.2 on probabilistic programming and Stan.
  • Lecture 7.1 on hierarchical models, and Lecture 7.2 on exchangeability.
  • Project work info
  • Lecture 8.1 on model checking, and Lecture 8.2 on cross-validation part 1.
  • Lecture 9.1 on PSIS-LOO and K-fold-CV, Lecture 9.2 on model comparison and selection, and Lecture 9.3, an extra lecture on variable selection with projection predictive variable selection.
  • Lecture 10.1 on decision analysis
  • Project presentation info
  • Lecture 11.1 on normal approximation (Laplace approximation) and Lecture 11.2 on large sample theory and counter examples.
  • Lecture 12.1 on frequency evaluation, hypothesis testing and variable selection, and Lecture 12.2 on an overview of modeling data collection (Ch. 8), linear models (Ch. 14-18), lasso, horseshoe and Gaussian processes (Ch. 21).
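
To give a flavour of the grid evaluation mentioned for the early lectures, here is a minimal sketch in plain Python. It is not taken from the course demos: the function name `grid_posterior` and the data (7 successes in 20 trials) are hypothetical, and a uniform prior on the success probability is assumed.

```python
def grid_posterior(y, n, grid_size=1001):
    """Evaluate a binomial-model posterior on an evenly spaced grid.

    Assumes a uniform prior on theta, so the unnormalized posterior
    is just the likelihood theta^y * (1 - theta)^(n - y).
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    unnormalized = [t**y * (1 - t) ** (n - y) for t in thetas]
    total = sum(unnormalized)
    # Normalize so that the grid weights sum to one.
    weights = [u / total for u in unnormalized]
    return thetas, weights


# Hypothetical data: 7 successes out of 20 trials.
thetas, weights = grid_posterior(y=7, n=20)
grid_mean = sum(t * w for t, w in zip(thetas, weights))

# With a uniform prior the exact posterior is Beta(y + 1, n - y + 1),
# whose mean is (y + 1) / (n + 2); used here only as a sanity check.
exact_mean = (7 + 1) / (20 + 2)

print(f"grid posterior mean:  {grid_mean:.4f}")
print(f"exact posterior mean: {exact_mean:.4f}")
```

Grid evaluation like this is only practical for one or two parameters, which is one motivation for the Monte Carlo methods covered in the later lectures.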

R and Python

We strongly recommend using R in the course, as there are more packages for Stan and statistical analysis in R. If you are already fluent in Python but not in R, then using Python may be easier, but it can still be useful to also learn R. Unless you are already experienced and have figured out your preferred way to work with R, we recommend installing RStudio Desktop. The TAs will give a brief introduction to using RStudio during the first week's TA sessions. See the FAQ for frequently asked questions about R problems in this course. The demo codes linked below provide useful starting points for all the exercises. If you are interested in learning more about making nice figures in R, I recommend Kieran Healy's "Data Visualization - A practical introduction".

Demos

Self study exercises

Good self study exercises for this course are listed below. Most of these also have model solutions available.

  • 1.1-1.4, 1.6-1.8 (model solutions for 1.1-1.6)
  • 2.1-2.5, 2.8, 2.9, 2.14, 2.17, 2.22 (model solutions for 2.1-2.5, 2.7-2.13, 2.16, 2.17, 2.20, and 2.14 is in slides)
  • 3.2, 3.3, 3.9 (model solutions for 3.1-3.3, 3.5, 3.9, 3.10)
  • 4.2, 4.4, 4.6 (model solutions for 3.2-3.4, 3.6, 3.7, 3.9, 3.10)
  • 5.1, 5.2 (model solutions for 5.3-5.5, 5.7-5.12)
  • 6.1 (model solutions for 6.1, 6.5-6.7)
  • 9.1
  • 10.1, 10.2 (model solution for 10.4)
  • 11.1 (model solution for 11.1)

Stan

Extra reading

Finnish terms

A few different spellings of the word "bayesilainen" (Bayesian) are in use in Finland. The form "bayesilainen" is formed according to the general inflection rules for foreign-language names:

"If a name is back-vowelled as written but front-vowelled as pronounced, the suffix is usually written with a back vowel instead of a front vowel, e.g. Birminghamissa, Thamesilla." Terho Itkonen, Kieliopas, 6th edition, Kirjayhtymä, 1997.

Frequently Asked Questions (FAQ)

We now have an FAQ for the exercises here. It has solutions to commonly asked questions related to RStudio setup, errors during package installation, etc.

Important dates for 2019 fall

| Task | Topic | Published | Deadline | Points |
|---|---|---|---|---|
| Assignment 1 | Background | 9.9 (week 37) | 15.9 at 23:59 | 3 |
| Assignment 2 | Chapters 1 and 2 | 16.9 (week 38) | 22.9 at 23:59 | 3 |
| Assignment 3 | Chapters 2 and 3 | 23.9 (week 39) | 29.9 at 23:59 | 9 |
| Assignment 4 | Chapters 3 and 10 | 30.9 (week 40) | 6.10 at 23:59 | 6 |
| Assignment 5 | Chapters 10 and 11 | 7.10 (week 41) | 13.10 at 23:59 | 6 |
| Assignment 6 | Chapters 10-12 and Stan | 14.10 (week 42) | 27.10 at 23:59 | 6 |
| Evaluation week (21-28.10) | | | | |
| Project | Projects introduced: form a group of 1-3 (2 is preferred) | 28.10 (week 44) | 3.11 at 23:59 | - |
| Assignment 7 | Chapter 5 | 28.10 (week 44) | 3.11 at 23:59 | 6 |
| Project | Decide topic and start the project (no assign. on week 45) | | 10.11 at 23:59 | - |
| Assignment 8 | Chapter 7 | 11.11 (week 46) | 17.11 at 23:59 | 6 |
| Assignment 9 | Chapter 9 | 18.11 (week 47) | 24.11 at 23:59 | 3 |
| Project | Finish the project work (no assign. on weeks 48 & 49) | | 8.12 at 23:59 | 24 |
| Project presentation | Present project work during week 50 (evaluation week) | | | |
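
The points in the schedule above can be checked against the weights given under Assessment. The following is a minimal sketch, assuming the 67% / 33% split simply reflects the share of maximum points; the helper `meets_minimum` and the example scores are hypothetical and only illustrate the 50% minimum rule.

```python
# Maximum points per assignment, copied from the 2019 schedule above.
assignment_points = {1: 3, 2: 3, 3: 9, 4: 6, 5: 6, 6: 6, 7: 6, 8: 6, 9: 3}
project_points = 24

assignment_total = sum(assignment_points.values())
course_total = assignment_total + project_points

# Share of the total points carried by each part of the course.
exercise_share = assignment_total / course_total
project_share = project_points / course_total


def meets_minimum(exercise_score, project_score):
    """Return True if at least 50% of the points are obtained in both parts."""
    return (
        exercise_score >= 0.5 * assignment_total
        and project_score >= 0.5 * project_points
    )


print(f"exercises: {assignment_total} points ({exercise_share:.0%})")
print(f"project:   {project_points} points ({project_share:.0%})")

# Hypothetical scores: the first clears both minimums, the second
# falls short on the project part.
print(meets_minimum(30, 12))
print(meets_minimum(40, 10))
```

The assignments add up to 48 points and the project to 24, which matches the stated 67% and 33% after rounding.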

Acknowledgements

The course material has been greatly improved by the previous and current course assistants (in alphabetical order): Michael Riis Andersen, Paul Bürkner, Akash Dakar, Alejandro Catalina, Kunal Ghosh, Joona Karjalainen, Juho Kokkala, Måns Magnusson, Janne Ojanen, Topi Paananen, Markus Paasiniemi, Juho Piironen, Jaakko Riihimäki, Eero Siivola, Tuomas Sivula, Teemu Säilynoja, Jarno Vanhatalo.
