\documentclass[a4paper,11pt]{article}

% \usepackage{babel}
\usepackage[utf8]{inputenc}
% \usepackage[T1]{fontenc}
\usepackage{times}
\usepackage{amsmath}
\usepackage{microtype}
\usepackage{url}
\urlstyle{same}
\usepackage{color}
% define navyblue, used as linkcolor by hyperref below
\definecolor{navyblue}{rgb}{0,0,0.5}

\usepackage[bookmarks=false]{hyperref}
\hypersetup{%
bookmarksopen=true,
bookmarksnumbered=true,
pdftitle={Bayesian data analysis},
pdfsubject={Comments},
pdfauthor={Aki Vehtari},
pdfkeywords={Bayesian probability theory, Bayesian inference, Bayesian data analysis},
pdfstartview={FitH -32768},
colorlinks=true,
linkcolor=navyblue,
citecolor=black,
filecolor=black,
urlcolor=blue
}


% if not draft, smaller printable area makes the paper more readable
\topmargin -4mm
\oddsidemargin 0mm
\textheight 225mm
\textwidth 160mm

%\parskip=\baselineskip
\def\eff{\mathrm{rep}}

\DeclareMathOperator{\E}{E}
\DeclareMathOperator{\Var}{Var}
\DeclareMathOperator{\var}{var}
\DeclareMathOperator{\Sd}{Sd}
\DeclareMathOperator{\sd}{sd}
\DeclareMathOperator{\Bin}{Bin}
\DeclareMathOperator{\Beta}{Beta}
\DeclareMathOperator{\Invchi2}{Inv-\chi^2}
\DeclareMathOperator{\NInvchi2}{N-Inv-\chi^2}
\DeclareMathOperator{\logit}{logit}
\DeclareMathOperator{\N}{N}
\DeclareMathOperator{\U}{U}
\DeclareMathOperator{\tr}{tr}
%\DeclareMathOperator{\Pr}{Pr}
\DeclareMathOperator{\trace}{trace}
\DeclareMathOperator{\rep}{\mathrm{rep}}

\pagestyle{empty}

\begin{document}
\thispagestyle{empty}

\section*{Bayesian data analysis -- reading instructions Part IV}
\smallskip
{\bf Aki Vehtari}
\smallskip
\bigskip

\noindent
Part IV, Chapters 14--18, discusses the basics of linear and generalized
linear models with several examples. The parts discussing computation
can be useful for additional insight into these models, or
sometimes for actual computation, but it's likely that most readers
will use some probabilistic programming framework for
computation. Regression and Other Stories (ROS) by Gelman, Hill, and
Vehtari discusses linear and generalized linear models from the
modeling perspective more thoroughly.

\subsection*{Chapter 14: Introduction to regression models}

Outline of Chapter 14:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[14.1] Conditional modeling
\begin{itemize}
\item formal justification of conditional modeling
\item if joint model factorizes $p(y,x|\theta,\phi)={\color{blue}p(y|x,\theta)}p(x|\phi)$\\
we can model just ${\color{blue}p(y|x,\theta)}$
\end{itemize}
\item[14.2] Bayesian analysis of classical regression
\begin{itemize}
\item uninformative prior on $\beta$ and $\sigma^2$
\item connection to the multivariate normal (cf.\ Chapter 3) is useful
  to understand, as it reveals what the conjugate prior would be
\item closed-form posterior and posterior predictive distribution
  (see the R sketch after this outline)
\item these properties are sometimes useful and thus good to know,
  but with probabilistic programming less often needed
\end{itemize}
\item[14.3] Regression for causal inference: incumbency and voting
\begin{itemize}
\item a modeling example with a bit of discussion of causal inference
    (see more in ROS Chs.~18--21)
\end{itemize}
\item[14.4] Goals of regression analysis
\begin{itemize}
\item discussion of what we can do with regression analysis (see
more in ROS)
\end{itemize}
\item[14.5] Assembling the matrix of explanatory variables
\begin{itemize}
\item transformations, nonlinear relations, indicator variables,
interactions (see more in ROS)
\end{itemize}
\item[14.6] Regularization and dimension reduction
\begin{itemize}
\item a bit outdated and short (Bayesian Lasso is not a good idea);
  see more in lecture 9.3,
  \url{https://avehtari.github.io/modelselection/}, and
  \url{https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html}
\end{itemize}
\item[14.7] Unequal variances and correlations
\begin{itemize}
\item useful concept, but computation is easier with probabilistic
programming frameworks
\end{itemize}
\item[14.8] Including numerical prior information
\begin{itemize}
\item useful conceptually, but probabilistic programming frameworks
  make it easier to define prior information, as the prior doesn't
  need to be conjugate
\item see more about priors in \url{https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations}
\end{itemize}
\end{list}
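To make the closed-form results in Section 14.2 concrete, here is a
minimal base R sketch (not from the book; the simulated data and
variable names are made up) that draws from the joint posterior
$p(\beta,\sigma^2|y)$ under the uninformative prior
$p(\beta,\sigma^2)\propto\sigma^{-2}$: first $\sigma^2$ from its
marginal scaled $\Invchi2(n-k,s^2)$ distribution, then $\beta$ from
its conditional normal distribution.
\begin{verbatim}
# simulate toy regression data (made-up example)
set.seed(1)
n <- 100; k <- 2
X <- cbind(1, rnorm(n))          # design matrix with intercept
beta_true <- c(1, 2)
y <- X %*% beta_true + rnorm(n, sd = 1.5)

# posterior quantities under p(beta, sigma^2) propto sigma^-2
V_beta   <- solve(t(X) %*% X)    # (X'X)^-1
beta_hat <- V_beta %*% t(X) %*% y
s2 <- sum((y - X %*% beta_hat)^2) / (n - k)

# sigma^2 | y ~ scaled Inv-chi^2(n-k, s^2), then
# beta | sigma^2, y ~ N(beta_hat, V_beta * sigma^2)
S <- 4000
sigma2 <- (n - k) * s2 / rchisq(S, df = n - k)
beta <- t(sapply(sigma2, function(s2draw)
  beta_hat + t(chol(V_beta * s2draw)) %*% rnorm(k)))
colMeans(beta)                   # should be close to beta_true
\end{verbatim}
The posterior predictive distribution for new $\tilde{X}$ then follows
by drawing $\tilde{y} \sim \N(\tilde{X}\beta, \sigma^2 I)$ for each
posterior draw of $(\beta,\sigma^2)$.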

\subsection*{Chapter 15: Hierarchical linear models}

Chapter 15 combines hierarchical models from Chapter 5 and linear
models from Chapter 14. The chapter discusses some computational
issues, but probabilistic programming frameworks make computation for
hierarchical linear models easy.

\vspace{\baselineskip}
\noindent
Outline of Chapter 15:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[15.1] Regression coefficients exchangeable in batches
\begin{itemize}
\item exchangeability of parameters
\item the discussion of fixed-, random- and mixed-effects models
is incomplete
\begin{itemize}
\item we don't recommend using these terms, but they are so
popular that it's useful to know them
\item a relevant comment is \emph{The terms `fixed' and `random'
      come from the non-Bayesian statistical tradition and are
      somewhat confusing in a Bayesian context where all unknown
      parameters are treated as `random' or, equivalently, as
      having fixed but unknown values.}
\item often fixed effects correspond to population-level
  coefficients, random effects to group- or individual-level
  coefficients, and a mixed model has both (see the first R sketch
  after this outline)\\
  \begin{tabular}[t]{ll}
    {\tt y $\sim$ 1 + x} & fixed / population effect; pooled model\\
    {\tt y $\sim$ 1 + (0 + x | g) } & random / group effects \\
    {\tt y $\sim$ 1 + x + (1 + x | g) } & mixed effects; hierarchical model
  \end{tabular}
\end{itemize}
\end{itemize}
\item[15.2] Example: forecasting U.S. presidential elections
\begin{itemize}
\item illustrative example
\end{itemize}
\item[15.3] Interpreting a normal prior distribution as extra data
\begin{itemize}
\item includes a very useful interpretation of the hierarchical linear
  model as a single linear model with a certain design matrix (see the
  second R sketch after this outline)
\end{itemize}
\item[15.4] Varying intercepts and slopes
\begin{itemize}
\item extends the hierarchical model for a scalar parameter to a
  joint hierarchical model for several parameters
\end{itemize}
\item[15.5] Computation: batching and transformation
\begin{itemize}
\item Gibbs sampling part is mostly outdated
\item transformations for HMC are useful if you write your own
  models, but the section is quite short and you can get more
  information from Stan user guide 21.7 Reparameterization and
\url{https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html}
\end{itemize}
\item[15.6] Analysis of variance and the batching of coefficients
\begin{itemize}
\item ANOVA as a Bayesian hierarchical linear model
\item the rstanarm and brms packages make it easy to fit ANOVA models
\end{itemize}
\item[15.7] Hierarchical models for batches of variance components
\begin{itemize}
\item more variance components
\end{itemize}
\end{list}
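The formula notation in the table in Section 15.1 above is the one
used by R packages such as lme4, rstanarm, and brms. As a minimal
sketch (assuming a data frame {\tt d} with outcome {\tt y}, predictor
{\tt x}, and grouping factor {\tt g}, which are made-up names), the
three models could be fitted with rstanarm as:
\begin{verbatim}
library(rstanarm)

# pooled model: one common intercept and slope
fit_pooled <- stan_glm(y ~ 1 + x, data = d)

# group-level slopes only
fit_group  <- stan_glmer(y ~ 1 + (0 + x | g), data = d)

# hierarchical model: population-level intercept and slope,
# plus group-level deviations for both
fit_mixed  <- stan_glmer(y ~ 1 + x + (1 + x | g), data = d)
\end{verbatim}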
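The extra-data interpretation in Section 15.3 can also be written out
directly: a prior $\beta_j \sim \N(\mu_j,\tau_j^2)$ corresponds to
augmenting the regression with a pseudo-observation $\mu_j$ whose
design-matrix row picks out $\beta_j$ and whose residual sd is
$\tau_j$. A minimal base R sketch (made-up data and prior values;
$\sigma$ is assumed known to keep the example short):
\begin{verbatim}
set.seed(2)
n <- 50
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 2) + rnorm(n, sd = 1.5)
sigma <- 1.5

# priors beta_j ~ N(mu_j, tau_j^2) as two extra rows in the regression
mu  <- c(0, 0)     # prior means (made-up values)
tau <- c(10, 1)    # prior sds: weak for intercept, stronger for slope
X_star <- rbind(X, diag(2))
y_star <- c(y, mu)
w <- c(rep(1 / sigma^2, n), 1 / tau^2)  # observation precisions

# weighted least squares on the augmented data gives the conditional
# posterior mean of beta
beta_post <- solve(t(X_star) %*% (w * X_star), t(X_star) %*% (w * y_star))
beta_post
\end{verbatim}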

\subsection*{Chapter 16: Generalized linear models}

Chapter 16 extends linear models to have non-normal observation
models. The model in the bioassay example in Chapter 3 is also a
generalized linear model. The chapter reviews the basics and discusses
some computational issues, but probabilistic programming frameworks
make computation for generalized linear models easy (especially with
rstanarm and brms). Regression and Other Stories (ROS) by Gelman,
Hill, and Vehtari discusses generalized linear models from the
modeling perspective more thoroughly.


\vspace{\baselineskip}
\noindent
Outline of Chapter 16:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[16 Intro:]
  Parts of a generalized linear model (GLM), illustrated in the first
  R sketch after this outline:
\begin{itemize}
\item[1.] The linear predictor $\eta = X\beta$
\item[2.] The link function $g(\cdot)$ and $\mu = g^{-1}(\eta)$
\item[3.] Outcome distribution model with location parameter $\mu$
\begin{itemize}
\item the distribution can also depend on dispersion
parameter $\phi$
\item originally just exponential family distributions
(e.g. Poisson, binomial, negative-binomial), which all have
natural location-dispersion parameterization
\item after MCMC made computation easy, GLM can also refer to
  models where the outcome distribution is not part of the
  exponential family and the dispersion parameter may have its own
  latent linear predictor
\end{itemize}
\end{itemize}
\item[16.1] Standard generalized linear model likelihoods
\begin{itemize}
\item section title says ``likelihoods'', but it would be better to say ``observation models''
\item continuous data: normal, gamma, Weibull mentioned, but common
are also Student's $t$, log-normal, log-logistic, and various
extreme value distributions like generalized Pareto distribution
\item binomial (Bernoulli as a special case) for binary and count
data with upper limit
\begin{itemize}
\item Bioassay model uses binomial observation model
\end{itemize}
\item Poisson for count data with no upper limit
\begin{itemize}
\item Poisson is a useful approximation of the binomial when the
  observed counts are much smaller than the upper limit
\end{itemize}
\end{itemize}
\item[16.2] Working with generalized linear models
\begin{itemize}
\item a bit of this and that on how to think about GLMs (see
  ROS for more)
\item the normal approximation to the likelihood is good for thinking
  about how much information non-normal observations provide; it can
  be useful for someone thinking about computation, but easy
  computation with probabilistic programming frameworks means not
  everyone needs this
\end{itemize}
\item[16.3] Weakly informative priors for logistic regression
\begin{itemize}
\item an excellent section, although the recommendation on using
  Cauchy has changed (see
  \url{https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations}
  and the second R sketch after this outline)
\item the problem of separation is useful to understand
\item computation part is outdated as probabilistic programming
frameworks make the computation easy
\end{itemize}
\item[16.4] Overdispersed Poisson regression for police stops
\begin{itemize}
\item an example
\end{itemize}
\item[16.5] State-level opinions from national polls
\begin{itemize}
\item another example
\end{itemize}
\item[16.6] Models for multivariate and multinomial responses
\begin{itemize}
\item extension to multivariate responses
\item polychotomous data with multivariate binomial or Poisson
\item models for ordered categories
\end{itemize}
\item[16.7] Loglinear models for multivariate discrete data
\begin{itemize}
\item multinomial or Poisson as loglinear models
\end{itemize}
\end{list}
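The three parts of a GLM listed at the start of the outline can be
made concrete with a small base R simulation of logistic regression
(made-up data, not an example from the book):
\begin{verbatim}
set.seed(3)
n <- 200
X <- cbind(1, rnorm(n))
beta <- c(-1, 2)

eta <- X %*% beta        # 1. linear predictor eta = X beta
mu  <- plogis(eta)       # 2. inverse link: mu = logit^-1(eta)
y   <- rbinom(n, 1, mu)  # 3. outcome model: y ~ Bernoulli(mu)
\end{verbatim}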
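For Section 16.3, the current recommendation (see the prior choice
wiki linked above) is a weakly informative normal or Student's $t$
prior rather than a Cauchy prior. A minimal rstanarm sketch continuing
the simulated data above (the prior scale 2.5 is just an illustrative
choice, not the book's code):
\begin{verbatim}
library(rstanarm)
d <- data.frame(y = y, x = X[, 2])
fit <- stan_glm(y ~ x, family = binomial(link = "logit"),
                prior = normal(0, 2.5),
                prior_intercept = normal(0, 2.5),
                data = d)
\end{verbatim}
With complete separation in the data the maximum likelihood estimate
would be infinite, but the weakly informative prior keeps the
posterior away from such extreme values.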

\subsection*{Chapter 17: Models for robust inference}

Chapter 17 discusses over-dispersed observation models. The discussion
is useful beyond generalized linear models. The discussion of
computation is outdated. See Regression and Other Stories (ROS) by
Gelman, Hill, and Vehtari for more examples.

\vspace{\baselineskip}
\noindent
Outline of Chapter 17:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[17.1] Aspects of robustness
\begin{itemize}
\item overdispersed models are often connected to robustness of
  inferences to outliers, but the observed data can be overdispersed
  without any observation being an outlier
\item an outlier is a sensible concept only in the context of a model,
  being something not well modelled or something requiring an extra
  model component
\item switching to a generic overdispersed model can help to recognize
  problems in the non-robust model (sensitivity analysis), but it
  can also throw away useful information in the ``outliers'', and it
  would be useful to think about the generative mechanism for
  observations which are not like the others
\end{itemize}
\item[17.2] Overdispersed versions of standard models (see the R
  sketch after this outline)\\
\begin{tabular}[t]{lcl}\small
normal & $\rightarrow$ & $t$-distribution\\
Poisson & $\rightarrow$ & negative-binomial \\
binomial & $\rightarrow$ & beta-binomial \\
probit & $\rightarrow$ & logistic / robit
\end{tabular}
\item[17.3] Posterior inference and computation
\begin{itemize}
\item computation part is outdated as probabilistic programming
frameworks and MCMC make the computation easy
\item posterior is more likely to be multimodal
\end{itemize}
\item[17.4] Robust inference for the eight schools
\begin{itemize}
\item the eight schools example is too small to see much difference
\end{itemize}
\item[17.5] Robust regression using t-distributed errors
\begin{itemize}
\item computation part is outdated as probabilistic programming
frameworks and MCMC make the computation easy
\item posterior is more likely to be multimodal
\end{itemize}
\end{list}
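The negative binomial in the table in Section 17.2 can be seen as a
gamma mixture of Poissons, which is one way to think about where the
extra dispersion comes from. A minimal base R sketch (made-up
parameter values) demonstrating that the mixture has variance larger
than its mean:
\begin{verbatim}
set.seed(4)
mu <- 5; phi <- 2   # mean and dispersion (made-up values)
# lambda ~ Gamma with E[lambda] = mu, Var[lambda] = mu^2 / phi
lambda <- rgamma(1e5, shape = phi, rate = phi / mu)
y <- rpois(1e5, lambda)          # gamma-Poisson mixture = negative binomial
c(mean = mean(y), var = var(y))  # var is approx mu + mu^2/phi > mu
\end{verbatim}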

\subsection*{Chapter 18: Models for missing data}

Chapter 18 extends the data collection modeling from Chapter 8. See
Regression and Other Stories (ROS) by Gelman, Hill, and Vehtari for
more examples.

\vspace{\baselineskip}
\noindent
Outline of Chapter 18:
\begin{list}{$\bullet$}{\parsep=0pt\itemsep=2pt}
\item[18.1] Notation
\begin{itemize}
\item Missing completely at random (MCAR)\\
missingness does not depend on missing values or other observed
values (including covariates)
\item Missing at random (MAR)\\
  missingness does not depend on missing values but may depend on
  other observed values (including covariates); see the first R
  sketch after this outline for a MAR illustration
\item Missing not at random (MNAR)\\
missingness depends on missing values
\end{itemize}
\item[18.2] Multiple imputation
\begin{itemize}
\item[1.] make a model predicting missing data
\item[2.] sample repeatedly from the missing data model to generate
multiple imputed data sets
\item[3.] make usual inference for each imputed data set
\item[4.] combine the results using Rubin's rules (see the second R
  sketch after this outline)
\item discussion of computation is partially outdated
\end{itemize}
\item[18.3] Missing data in the multivariate normal and $t$ models
\begin{itemize}
\item computation for a special continuous-data case, which can still
  be useful as a fast starting point
\end{itemize}
\item[18.4] Example: multiple imputation for a series of polls
\begin{itemize}
\item an example
\end{itemize}
\item[18.5] Missing values with counted data
\begin{itemize}
\item discussion of computation for count data (i.e., the computation
  in 18.3 is not applicable)
\end{itemize}
\item[18.6] Example: an opinion poll in Slovenia
\begin{itemize}
\item another example
\end{itemize}
\end{list}
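The distinction between the missingness mechanisms in Section 18.1 can
be illustrated with a small base R simulation (made-up example):
below, the probability that $y$ is missing depends only on the
observed $x$, so the data are MAR but not MCAR, and a naive
complete-case analysis is biased.
\begin{verbatim}
set.seed(5)
n <- 1000
x <- rnorm(n)
y <- x + rnorm(n)

# MAR: probability of y being missing depends on observed x only
miss <- rbinom(n, 1, plogis(-1 + 2 * x)) == 1
y_obs <- ifelse(miss, NA, y)

mean(y_obs, na.rm = TRUE)  # complete-case mean is biased low
mean(y)                    # full-data mean for comparison
\end{verbatim}
A model that conditions on $x$, as in the imputation approach of
Section 18.2, can correct this bias.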
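The combining step of multiple imputation (step 4 above) is usually
done with Rubin's rules. A minimal base R sketch, assuming we already
have a point estimate and a variance estimate for a scalar quantity of
interest from each of $M$ imputed data sets (the numbers below are
made up):
\begin{verbatim}
# estimates and their variances from M imputed data sets (made-up values)
q <- c(1.1, 0.9, 1.3, 1.0, 1.2)
U <- c(0.20, 0.22, 0.19, 0.21, 0.20)
M <- length(q)

q_bar <- mean(q)                 # combined point estimate
W <- mean(U)                     # within-imputation variance
B <- var(q)                      # between-imputation variance
T_total <- W + (1 + 1 / M) * B   # total variance
c(estimate = q_bar, sd = sqrt(T_total))
\end{verbatim}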

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End:
