{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Towards A Principled Bayesian Workflow\n",
"_*Michael Betancourt*_<br>\n",
"_*October 2018*_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given a probabilistic model and a particular observation, Bayesian inference is straightforward to implement. Inferences, and any decisions based upon them, follow immediately in the form of expectations with respect to the corresponding posterior distribution. Building a probabilistic model that is useful in a given application, however, is a far more open-ended challenge. Unfortunately the process of model building is often ignored in introductory texts, leaving practitioners to piece together their own model building workflow from potentially incomplete or even inconsistent heuristics.\n",
"\n",
"In this case study I introduce a _principled_ workflow for building\n",
"and evaluating probabilistic models in Bayesian inference. This workflow is\n",
"based on recent research in collaboration with Dan Simpson, Aki Vehtari, Sean\n",
"Talts, and others; see, for example, Gabry et al. (2017). The specific workflow\n",
"that I advocate here, however, is my particular take on this research that is \n",
"focused on the needs of modeling and inference in the physical sciences. It \n",
"does not necessarily reflect the perspectives of any of my collaborators and may \n",
"prove to be limiting in other applications.\n",
"\n",
"Moreover, as this workflow is an active topic of research it is subject to\n",
"evolution and refinement. Use it at your own risk. But at the same time _don't_\n",
"use it at your own risk. Better yet, build a calibrated model, infer the relative\n",
"risks, and then make a principled decision...\n",
"\n",
"We begin with a review of Bayesian inference and then a discussion of what \n",
"properties we want in a probabilistic model and in what ways we can validate\n",
"those properties. I then introduce a workflow that implements these validations\n",
"to guide the development of a useful model, emphasizing the role of each step \n",
"and the expertise needed to implement them well. Once the workflow has been \n",
"laid out I demonstrate its application on a seemingly simple analysis that hides \n",
"a few surprises."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bayesian Modeling and Inference\n",
"\n",
"In order to avoid any confusion due to the varied notations and vocabularies \n",
"that abound in the statistical literature, let's first review the foundational \n",
"concepts in Bayesian inference. Be aware that I'll be taking an experimental \n",
"perspective here, and that perspective may clash with different philosophical \n",
"perspectives on Bayesian inference, or statistics in general, that are often \n",
"emphasized in introductory texts. Fortunately, in most cases the differences \n",
"between these perspectives don't have any consequences for how the resulting\n",
"inferences are implemented in practice.\n",
"\n",
"Let's first consider some phenomenon about which we want to learn. Further \n",
"let's assume that this phenomenon manifests in some latent system with which we \n",
"can interact. Importantly, this latent system contains not only the phenomenon \n",
"of interest but also the surrounding environment that exhibits the phenomenon.\n",
"\n",
"In order to learn about this phenomenon we need to design an experiment that \n",
"implements a _measurement process_. The resulting measurement probes the \n",
"latent system, generating data that is hopefully sensitive to the phenomenon of \n",
"interest. Note that I use the term experiment generally here to include also \n",
"processes that collect, curate, and transform existing observations. More \n",
"formally we say that a realization of the measurement process is an \n",
"_observation_, $\\tilde{y}$, that is an element of an _observation space_, $Y$.\n",
"\n",
"The measurement process, however, is not deterministic. There may be \n",
"_ontological_ variability that results in different observations every time we \n",
"probe the latent system, or _epistemic_ uncertainty that limits how accurately \n",
"we can resolve the measurement. Regardless of its interpretation, we presume\n",
"that this complication is sufficiently regular that it can be quantified in a\n",
"probability distribution over the observation space that we denote the _true \n",
"data generating process_, $\\pi^{\\dagger}$. Because of this limitation of the\n",
"measurement process, the best we can ultimately learn about the underlying \n",
"latent system, and hence the phenomenon of interest, is this true data \n",
"generating process.\n",
"\n",
"Although we presume that a true data generating process exists, we don't know\n",
"what form it will take in a given application. Unfortunately the space of all \n",
"possible data generating processes, $\\mathcal{P}$, is far too large and \n",
"mathematically ungainly to consider, so in practice we have to limit our search \n",
"to a _small world_, $\\mathcal{S}$, or _observational model_ of possible data \n",
"generating processes. \n",
"\n",
"<br><br>\n",
"![](figures/small_world/small_world/small_world.png)\n",
"<br><br>\n",
"\n",
"Each element of the observational model, or each _model configuration_, defines \n",
"a probability distribution over the observation space which quantifies a \n",
"mathematical narrative of how the data could be generated, encapsulating the \n",
"phenomenon of interest as well as any systematic phenomena encountered from the \n",
"latent system through the measurement process. We can then use the assumed\n",
"observational model to learn from observations, making inferences and informing \n",
"decisions about the latent system that we're studying. In practice we specify \n",
"an observational model with a collection of probability densities, \n",
"$\\pi_{\\mathcal{S}} (y ; \\theta)$, with each _parameter_, $\\theta$, identifying a unique\n",
"model configuration.\n",
"\n",
"<br><br>\n",
"![](figures/model_config/model_config.png)\n",
"<br><br>\n",
"\n",
"Bayesian inference encodes information about the observational model in\n",
"probability distributions -- the more probability a distribution assigns to a \n",
"neighborhood of model configurations, the more consistent those model \n",
"configurations are with our available information. From this perspective the \n",
"model configuration space is no longer just a collection of data generating \n",
"processes but also a conditional probability distribution over the product \n",
"of the observation space and the small world, $Y \\times \\mathcal{S}$,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(y ; \\theta) = \\pi_{\\mathcal{S}}(y \\mid \\theta).\n",
"$$\n",
"\n",
"Introducing a _prior distribution_, $\\pi_{\\mathcal{S}}(\\theta)$, that encodes \n",
"domain expertise about the measurement process and latent system complements the \n",
"observational model to give a _Bayesian joint distribution_,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(y, \\theta) = \\pi_{\\mathcal{S}}(y ; \\theta) \\, \\pi_{\\mathcal{S}}(\\theta).\n",
"$$\n",
"Given an explicit observation, $\\tilde{y}$, this joint distribution then\n",
"defines a _posterior distribution_ that concentrates on those model\n",
"configurations consistent with both our domain expertise and the measurement,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}) = \n",
"\\frac{ \\pi_{\\mathcal{S}}(\\tilde{y}, \\theta) }{ \\int \\mathrm{d} \\theta \\, \\pi_{\\mathcal{S}}(\\tilde{y}, \\theta) }\n",
"\\propto \\pi_{\\mathcal{S}}(\\tilde{y}, \\theta).\n",
"$$\n",
"\n",
"We can also interpret this process as _updating_ the prior distribution into\n",
"a posterior distribution,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}) = \\left( \\frac{ \\pi_{\\mathcal{S}}(\\tilde{y} \\mid \\theta) }\n",
"{ \\pi_{\\mathcal{S}}(\\tilde{y}) } \\right) \\pi_{\\mathcal{S}}(\\theta),\n",
"$$ \n",
"which makes the inherent learning process more evident. Our information after \n",
"the observation, encoded in the posterior, is just our information before the \n",
"observation, encoded in the prior, plus the information we learn from the \n",
"observation, encoded in the ratio\n",
"$$\n",
"\\frac{ \\pi_{\\mathcal{S}}(\\tilde{y} \\mid \\theta) }\n",
"{ \\pi_{\\mathcal{S}}(\\tilde{y}) }.\n",
"$$\n",
"\n",
"<br><br>\n",
"![](figures/bayes/bayes.png)\n",
"<br><br>\n",
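"\n",
"To make the updating concrete, the following minimal Python sketch approximates the posterior for a hypothetical binomial observational model on a parameter grid. The flat prior, the observation of 7 successes in 10 trials, and the grid resolution are illustrative assumptions only.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Grid approximation of Bayesian updating for a binomial model:\n",
"# theta is a success probability with a flat prior on (0, 1).\n",
"theta = np.linspace(0.001, 0.999, 999)\n",
"dt = theta[1] - theta[0]\n",
"prior = np.ones_like(theta)                # pi(theta), unnormalized\n",
"k, n = 7, 10                               # hypothetical observation\n",
"lik = theta**k * (1.0 - theta)**(n - k)    # pi(y | theta), up to a constant\n",
"post = prior * lik\n",
"post /= np.sum(post) * dt                  # divide by the evidence\n",
"\n",
"# Inferential queries are posterior expectations.\n",
"post_mean = np.sum(theta * post) * dt\n",
"post_sd = np.sqrt(np.sum((theta - post_mean)**2 * post) * dt)\n",
"```\n",
"\n",
"Here the posterior concentrates near the empirical proportion, with the flat prior contributing only mild regularization.\n",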
"\n",
"Any inferential query we have can then be answered with an expectation value of\n",
"an appropriate function with respect to the posterior distribution. For \n",
"example, where in the small world the most consistent model configurations \n",
"concentrate can be quantified with the mean or the median of the posterior \n",
"distribution. Similarly, the breadth of this concentration can be quantified \n",
"with the standard deviation or tail quantiles of the posterior distribution."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Questioning Authority\n",
"\n",
"Given an observational model and a prior distribution, or equivalently the \n",
"joint distribution, $\\pi_{\\mathcal{S}}(y, \\theta)$, Bayesian inference is \n",
"straightforward to implement, at least conceptually. The utility of any \n",
"inferences we derive, however, is completely dependent on this assumed model.\n",
"A poor model might readily yield inferences, but those inferences may be of \n",
"little use in practice and may even be dangerous in critical applications.\n",
"\n",
"What, however, characterizes a useful probabilistic model? Ideally the model \n",
"would be consistent with our domain expertise, capturing not necessarily _all_ \n",
"of our knowledge of the experimental design but just enough to admit answers to \n",
"the scientific questions of interest. It would also help if the model didn't \n",
"obstruct our available computational tools so that we could accurately compute \n",
"posterior expectation values and faithfully implement our inferences in \n",
"practice. Finally, we would hope that our model is rich enough to capture the \n",
"structure of the true data generating process that is needed to answer our \n",
"scientific questions.\n",
"\n",
"Consequently we can verify the utility of a model in a given application by\n",
"trying to answer four questions.\n",
"\n",
"#### Question One: Domain Expertise Consistency\n",
"\n",
"_Is our model consistent with our domain expertise?_\n",
"\n",
"#### Question Two: Computational Faithfulness\n",
"\n",
"_Are our computational tools sufficient to accurately fit the model?_\n",
"\n",
"#### Question Three: Model Sensitivity\n",
"\n",
"_How do we expect our inferences to perform over the possible realizations of\n",
"the measurement process?_\n",
"\n",
"#### Question Four: Model Adequacy\n",
"\n",
"_Is our model rich enough to capture the relevant structure of the true data\n",
"generating process?_\n",
"\n",
"These questions can be challenging to answer even in simple applications, but\n",
"with careful analysis of a given model we can assemble the necessary evidence to \n",
"answer each of the four questions well enough to identify practical problems \n",
"with the model.\n",
"\n",
"## Domain Expertise Consistency \n",
"\n",
"A common refrain is to \"let the data speak for itself\". In a statistical\n",
"analysis, however, the data speak only through the observational model, and the\n",
"observational model isn't always so talkative.\n",
"\n",
"When the _likelihood function_, $\\pi_{\\mathcal{S}}(\\tilde{y} \\mid \\theta)$, \n",
"concentrates in a small neighborhood of parameter space the structure of the \n",
"prior isn't particularly important. The information we learn from the \n",
"observation dominates the form of the posterior distribution.\n",
"\n",
"<br><br>\n",
"![](figures/joint_triptych/joint_triptych_one/joint_triptych_one.png)\n",
"<br><br>\n",
"\n",
"The observational model, however, isn't always so well-behaved. Sometimes the\n",
"experimental design isn't sufficiently sensitive to the phenomena of interest,\n",
"which manifests in a _weakly-identified_ likelihood function that spans huge\n",
"regions of the model configuration space, or even a _non-identified_ likelihood\n",
"function that extends all the way to infinity. In these cases the form of the\n",
"posterior is strongly influenced by the form of the prior, and a careless prior \n",
"propagates to a careless posterior.\n",
"\n",
"<br><br>\n",
"![](figures/joint_triptych/joint_triptych_two/joint_triptych_two.png)\n",
"<br><br>\n",
"\n",
"In order to compensate for poor identification in the likelihood function we\n",
"need the prior distribution to incorporate just enough domain expertise to\n",
"suppress extreme, although not impossible, model configurations.\n",
"\n",
"<br><br>\n",
"![](figures/joint_triptych/joint_triptych_three/joint_triptych_three.png)\n",
"<br><br>\n",
"\n",
"That said, the success of a _weakly-informative prior_ like this depends on how \n",
"the prior interacts with the likelihood function, and in sophisticated models \n",
"these interactions can be difficult to analyze. In practice it's often easiest \n",
"to study how these interactions affect observations predicted by the model.\n",
"\n",
"The _prior predictive distribution_,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(y) = \\int \\mathrm{d} \\theta \\, \\pi_{\\mathcal{S}}(y, \\theta),\n",
"$$\n",
"averages each data generating process within the observational model over the\n",
"prior distribution, quantifying the range of observations deemed reasonable\n",
"within the scope of our modeling assumptions. A model predicting too many \n",
"observations that are considered extreme within the scope of our domain\n",
"expertise then indicates conflict between the modeling assumptions and that \n",
"domain expertise.\n",
"\n",
"Analyzing the prior predictive distribution, however, is not particularly\n",
"straightforward when the observation space is high-dimensional. In this case \n",
"we have to consider _summary statistics_ that project the observation space to \n",
"some lower dimensional space that's more amenable to visualization, and hence \n",
"practical comparison to our often implicit domain expertise. For example we \n",
"might project the observations to a one-dimensional summary, \n",
"$t : Y \\rightarrow \\mathbb{R}$, or even a structured summary like a histogram or \n",
"empirical cumulative distribution function. Constructing an interpretable yet\n",
"informative summary statistic is very much a fine art.\n",
"\n",
"Once a summary statistic has been chosen we can then identify which values\n",
"of the summary statistic correspond to observations that begin to become \n",
"extreme. For example, if we're observing how quickly a drug propagates through\n",
"a patient's blood stream we can be fairly confident that the observed \n",
"propagation speed will be less than the speed of sound in water as a drug dose\n",
"traveling that fast would have unfortunate physical consequences. Similarly,\n",
"if we're observing carbon in the air then we don't want our model predicting\n",
"carbon concentrations more characteristic of solid concrete than gaseous, \n",
"breathable air. If we're observing the migration of birds then we can be fairly \n",
"sure that we won't observe any particularly healthy individuals cruising near \n",
"the speed of light. In practice these thresholds are readily identified with a \n",
"cursory examination of the available domain expertise, especially if the summary \n",
"statistics are well-chosen.\n",
"\n",
"With summary statistics and thresholds in hand _prior predictive checks_\n",
"follow immediately. We simply push the prior predictive distribution forward \n",
"along each summary statistic, the corresponding probability density function\n",
"$\\pi_{t(Y)}(t)$ here shown in dark red, and then consider how much \n",
"probability leaks past the threshold and into the extreme values, here shown in \n",
"light red.\n",
"\n",
"<br><br>\n",
"![](figures/prior_predictive/prior_predictive_theory/prior_predictive_theory.png)\n",
"<br><br>\n",
"\n",
"In practice we typically can't construct the probability density function for\n",
"this pushforward distribution analytically, but we can approximate it using \n",
"samples from the joint distribution. Sampling\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta})\n",
"$$\n",
"yields samples from the joint distribution, $(\\tilde{y}, \\tilde{\\theta})$,\n",
"and evaluating the summary statistic at the simulated observations,\n",
"$$\n",
"t(\\tilde{y})\n",
"$$\n",
"yields samples from the pushforward of the prior predictive distribution. We\n",
"can then histogram these samples to implement the prior predictive check.\n",
"\n",
"<br><br>\n",
"![](figures/prior_predictive/prior_predictive_practice/prior_predictive_practice.png)\n",
"<br><br>\n",
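"\n",
"In code, this simulation scheme is only a few lines. The sketch below runs a prior predictive check for a hypothetical Poisson count model; the half-normal prior scale and the extremeness threshold stand in for genuine domain expertise and are assumptions of this example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(8675309)\n",
"N, T = 1000, 25.0   # N simulations; T is an assumed extremeness threshold\n",
"\n",
"# Sample from the joint distribution: theta from the prior,\n",
"# then an observation from the model given theta.\n",
"lam = np.abs(rng.normal(0.0, 6.0, size=N))  # hypothetical half-normal prior\n",
"y = rng.poisson(lam)                        # one simulated count per draw\n",
"\n",
"# Summary statistic t(y): here the count itself.\n",
"frac_extreme = np.mean(y > T)\n",
"# An excessive fraction past the threshold signals conflict between the\n",
"# modeling assumptions and our domain expertise.\n",
"```\n",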
"\n",
"The model shouldn't predict _no_ observations past the thresholds -- extreme\n",
"observations are unreasonable but not impossible, after all. Instead what we\n",
"want to look for is excessive predictions surpassing the thresholds, and this is \n",
"a more qualitative than quantitative judgement. Indeed visualizations of the \n",
"pushforward distributions are a powerful way to elicit qualitative information \n",
"from domain experts unfamiliar with statistics. The emphasis on the observation \n",
"space is often more interpretable than the configurations of a specific \n",
"observational model. Moreover, the emphasis on extremes allows those experts to \n",
"reject bad assumptions, which they are often far more eager to do than to\n",
"accept good assumptions.\n",
"\n",
"The idea of analyzing the prior predictive distribution goes back to Good's \n",
"_device of imaginary results_ (Good 1950), which conveniently also gives its \n",
"user +2 to Wisdom saving throws.\n",
"\n",
"## Computational Faithfulness\n",
"\n",
"A model is only as good as our ability to wield it. In Bayesian inference, a\n",
"model is only as good as our ability to compute expectation values with respect\n",
"to the posterior distributions it generates, which is often denoted _fitting_\n",
"the model. Robust methods that offer self-diagnostics capable of identifying \n",
"inaccurate computation, such as Hamiltonian Monte Carlo, are extremely powerful \n",
"in this regard as they provide some confidence that we can realize the \n",
"inferences of our model. \n",
"\n",
"If these diagnostics are available then the prior predictive distribution \n",
"proves further useful. Fitting observations simulated from the prior predictive \n",
"distribution allows us to check the performance of our chosen computational method \n",
"over the range of reasonable posterior distribution behaviors.\n",
"\n",
"What can we do, however, if our method isn't self-diagnostic? Fortunately\n",
"Bayesian inference implies a subtle consistency property that allows us to \n",
"check _any_ method capable of generating posterior samples. Averaging the\n",
"posterior distributions fit from observations drawn from the prior predictive \n",
"distribution will _always_ recover the prior distribution,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(\\theta') = \\int \\mathrm{d} y \\, \\mathrm{d} \\theta \\,\n",
"\\pi_{\\mathcal{S}} (\\theta' \\mid y) \\, \\pi_{\\mathcal{S}} (y, \\theta).\n",
"$$\n",
"In particular this implies that the ensemble of posterior samples,\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta})\n",
"$$\n",
"$$\n",
"\\tilde{\\theta}' \\sim \\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}),\n",
"$$\n",
"will always be identical to samples from the prior distribution. Any\n",
"deviation between the two samples indicates that either our model has been \n",
"incorrectly implemented or our computational method is generating biased\n",
"samples.\n",
"\n",
"_Simulation-based calibration_ (Talts et al. 2018) compares the ensemble posterior\n",
"sample and the prior sample using _ranks_. For each simulated observation we\n",
"generate $R$ samples from the corresponding posterior distribution,\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta})\n",
"$$\n",
"$$\n",
"(\\tilde{\\theta}'_{1}, \\ldots, \\tilde{\\theta}'_{R}) \\sim \\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}),\n",
"$$\n",
"and compute the rank of the prior sample within the posterior samples, i.e.\n",
"the number of posterior samples larger than the prior sample,\n",
"$$\n",
"\\rho = \\sharp \\left\\{ \\tilde{\\theta} < \\tilde{\\theta}'_{r} \\right\\}.\n",
"$$\n",
"If the ensemble posterior sample and the prior sample are equivalent then these\n",
"ranks should be uniformly distributed, which we can quickly analyze for each\n",
"parameter in the model configuration space using a histogram.\n",
"\n",
"<br><br>\n",
"![](figures/sbc/sbc.png)\n",
"<br><br>\n",
"\n",
"Here the grey band shows the expected variation of ranks distributed \n",
"according to an exact uniform distribution. Deviation outside of this band,\n",
"especially systematic deviation of many bins at once, indicates computational \n",
"problems. See Talts et al. (2018) for further discussion of common systematic \n",
"deviations and their interpretations.\n",
"\n",
"Technically this method requires exact posterior samples, which are not \n",
"available if we are using, for example, Markov chain Monte Carlo. To use \n",
"simulation-based calibration with Markov chain Monte Carlo we have to first thin our \n",
"chains to remove any effective autocorrelation. See Talts et al. (2018) for more \n",
"details.\n",
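"\n",
"As a concrete sketch, the rank computation can be implemented as follows for a hypothetical conjugate normal model, where exact posterior draws are available and the resulting ranks are therefore guaranteed to be uniform. The unit prior and noise scales are illustrative assumptions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"N, R = 500, 100   # N simulated observations, R posterior draws each\n",
"\n",
"ranks = np.empty(N, dtype=int)\n",
"for i in range(N):\n",
"    theta = rng.normal(0.0, 1.0)           # theta from the prior\n",
"    y = rng.normal(theta, 1.0)             # y from the model given theta\n",
"    # Conjugate normal posterior, N(y / 2, 1 / 2), sampled exactly.\n",
"    post_draws = rng.normal(y / 2.0, 0.5**0.5, size=R)\n",
"    ranks[i] = np.sum(post_draws > theta)  # rank of the prior draw\n",
"# A histogram of ranks over {0, ..., R} should look uniform.\n",
"```\n",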
"\n",
"## Model Sensitivity\n",
"\n",
"The scope of reasonable observations and model configurations quantified in the \n",
"Bayesian joint distribution also provides a way to analyze the range of \n",
"inferential outcomes we should expect, and in particular _calibrate_ them with\n",
"respect to the simulated truths (Betancourt 2018).\n",
"\n",
"For example, let's say that we have some _decision making process_ that takes in\n",
"an observation as input and returns an action from the space of possible \n",
"actions, $A$,\n",
"$$\n",
"\\begin{alignat*}{6}\n",
"a :\\; &Y& &\\rightarrow& \\; &A&\n",
"\\\\\n",
"&y& &\\mapsto& &a(y)&.\n",
"\\end{alignat*}\n",
"$$\n",
"This process might be a Bayesian decision theoretical process informed by\n",
"posterior expectations, but it need not be. It could just as easily be a\n",
"process to reject or accept a null hypothesis using an orthodox significance \n",
"test.\n",
"\n",
"Further let's say that the benefit of taking action $a \\in A$ when the true\n",
"data generating process is given by the model configuration $\\theta$ is\n",
"$$\n",
"\\begin{alignat*}{6}\n",
"U :\\; &A \\times \\mathcal{S}& &\\rightarrow& \\; &\\mathbb{R}&\n",
"\\\\\n",
"&(a, \\theta)& &\\mapsto& &U(a, \\theta)&.\n",
"\\end{alignat*}\n",
"$$\n",
"For example this might be a false discovery rate or the benefits of an \n",
"intervention minus any of its implementation costs. In this context the utility \n",
"of a decision making process for a given observation becomes \n",
"$$\n",
"\\begin{alignat*}{6}\n",
"U :\\; &Y \\times \\mathcal{S}& &\\rightarrow& \\; &\\mathbb{R}&\n",
"\\\\\n",
"&(y, \\theta)& &\\mapsto& &U(a(y), \\theta)&.\n",
"\\end{alignat*}\n",
"$$\n",
"\n",
"Before the measurement process resolves we don't know what the observation will\n",
"be, nor do we know what the true data generating process will be. If we \n",
"assume that our model is rich enough to capture the true data generating \n",
"process, however, then the joint distribution quantifies the possibilities.\n",
"Consequently we can construct the distribution of possible utilities by \n",
"pushing the Bayesian joint distribution forward along the utility function to\n",
"give $\\pi_{U(A, \\mathcal{S})}$.\n",
"\n",
"<br><br>\n",
"![](figures/sensitivity/sensitivity_theory/sensitivity_theory.png)\n",
"<br><br>\n",
"\n",
"As with the prior predictive check, we can approximate the probability density \n",
"function of this pushforward distribution in practice using samples from the \n",
"joint distribution,\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta})\n",
"$$\n",
"$$\n",
"U(a(\\tilde{y}), \\tilde{\\theta}).\n",
"$$\n",
"\n",
"<br><br>\n",
"![](figures/sensitivity/sensitivity_practice/sensitivity_practice.png)\n",
"<br><br>\n",
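"\n",
"For illustration, the following sketch pushes the joint distribution forward along a utility function for a hypothetical normal model; the decision rule of acting whenever the observation is positive, and the fixed action cost, are assumptions invented for this example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"N = 2000\n",
"\n",
"theta = rng.normal(1.0, 1.0, size=N)   # theta from a hypothetical prior\n",
"y = rng.normal(theta, 1.0)             # y from the model given theta\n",
"\n",
"# Hypothetical decision rule a(y) and utility U(a, theta): act when the\n",
"# observation is positive, earning theta minus a fixed cost of 0.5.\n",
"a = (y > 0.0).astype(float)\n",
"U = a * (theta - 0.5)\n",
"# The samples in U approximate the pushforward distribution of utilities.\n",
"```\n",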
"\n",
"This distribution of utilities is particularly useful for understanding how\n",
"sensitive the measurement process is to the relevant features of the latent\n",
"system that we're studying. By carefully quantifying what we want to learn in\n",
"the experiment into a utility function, we can formalize how capable our model\n",
"is at answering those questions.\n",
"\n",
"We can also take a less application-specific approach and use the Bayesian joint \n",
"distribution to characterize the general performance of our inferences and\n",
"capture any pathologies that might be lurking within the assumptions of our\n",
"model.\n",
"\n",
"Consider, for example, the _posterior $z$-score_ of a given parameter in the\n",
"model configuration space,\n",
"$$\n",
"z = \\left|\n",
"\\frac{ \\mu_{\\mathrm{post}} - \\tilde{\\theta} }\n",
"{ \\sigma_{\\mathrm{post}} } \\right|.\n",
"$$\n",
"The posterior $z$-score quantifies how accurately the posterior recovers the \n",
"true model configuration, with smaller values indicating a posterior that\n",
"concentrates more tightly around the corresponding true parameter and larger\n",
"values indicating a posterior that concentrates elsewhere.\n",
"\n",
"Similarly, the _posterior shrinkage_ of a given parameter,\n",
"$$\n",
"s = 1 -\n",
"\\frac{ \\sigma^{2}_{\\mathrm{post}} }\n",
"{ \\sigma^{2}_{\\mathrm{prior}} },\n",
"$$\n",
"quantifies how much the posterior learns about that parameter from a given \n",
"observation. Posterior shrinkage near zero indicates that the data provide \n",
"little information beyond the domain expertise encoded in the prior \n",
"distribution, while posterior shrinkage near one indicates observations that \n",
"strongly inform this parameter.\n",
"\n",
"Conveniently, these two posterior expectations can capture an array of \n",
"pathological behavior that typically compromises our inferences. For example,\n",
"a posterior with high posterior shrinkage and high posterior $z$-score \n",
"indicates a model that overfits to the variation in the measurement and is\n",
"unable to accurately recover the true model configuration. A posterior with\n",
"small posterior shrinkage and small posterior $z$-scores indicates a model that\n",
"is poorly informed by the observations, whereas a posterior with small \n",
"posterior shrinkage and large posterior $z$-scores indicates a model with\n",
"substantial conflict between the observational model and the prior distribution.\n",
"\n",
"If we fit the posterior for each observation in an ensemble of samples from\n",
"the Bayesian joint distribution,\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta}),\n",
"$$\n",
"then we can visualize the distribution of posterior $z$-scores and posterior \n",
"shrinkages for each parameter,\n",
"$$\n",
"z(\\pi_{\\mathcal{S}}(\\theta_{n} \\mid \\tilde{y}), \\tilde{\\theta}_{n})\n",
"$$\n",
"$$\n",
"s(\\pi_{\\mathcal{S}}(\\theta_{n} \\mid \\tilde{y}), \\pi_{\\mathcal{S}}(\\theta_{n}))\n",
"$$\n",
"with a scatter plot of each posterior's behavior and quickly identify potential \n",
"problems.\n",
"\n",
"<br><br>\n",
"![](figures/eye_chart/eye_chart_regimes.png)\n",
"<br><br>\n",
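"\n",
"The following sketch computes these two summaries over an ensemble of simulations for a hypothetical conjugate normal model, for which the posterior mean and variance are available in closed form; the unit prior and noise scales are assumptions of this example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(3)\n",
"N = 1000\n",
"prior_sd = 1.0\n",
"\n",
"zs = np.empty(N)\n",
"ss = np.empty(N)\n",
"for i in range(N):\n",
"    theta = rng.normal(0.0, prior_sd)  # simulated truth from the prior\n",
"    y = rng.normal(theta, 1.0)         # simulated observation\n",
"    # Conjugate normal posterior mean and variance.\n",
"    post_mean, post_var = y / 2.0, 0.5\n",
"    zs[i] = abs(post_mean - theta) / post_var**0.5\n",
"    ss[i] = 1.0 - post_var / prior_sd**2\n",
"# Scatter plotting ss against zs reveals the pathological regimes above.\n",
"```\n",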
"\n",
"## Model Adequacy\n",
"\n",
"Finally we can consider the adequacy of our model in capturing the relevant\n",
"structure of the true data generating process.\n",
"\n",
"If our model is sufficiently rich then the small world might contain the true\n",
"data generating process and inferences within the small world might identify \n",
"that true data generating process. In practice, however, reality is much richer \n",
"than any model we could feasibly implement and the small world is unlikely to\n",
"contain the true data generating process.\n",
"\n",
"<br><br>\n",
"![](figures/small_world/small_world_one/small_world_one.png)\n",
"<br><br>\n",
"\n",
"In this more realistic scenario the posteriors can't concentrate around the true\n",
"data generating process, and instead must be resort to concentrating around the\n",
"model configurations within the small world that best approximate the true data\n",
"generating process.\n",
"\n",
"<br><br>\n",
"![](figures/small_world/small_world_two/small_world_two.png)\n",
"<br><br>\n",
"\n",
"The pertinent question is then not whether or not our small world contains the\n",
"true data generating process but rather whether or not our small world is \n",
"_close enough_ to the true data generating process. To paraphrase George Box,\n",
"our model is probably wrong, but is it useful?\n",
"\n",
"Consequently, in order to quantify the adequacy of our model we need to \n",
"determine how close it is to the true data generating process, and in order to \n",
"do that we need to summarize our inferences into a predictive distribution that \n",
"we can directly compare to the true data generating process. For that we \n",
"appeal to the _posterior predictive distribution_,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(y \\mid \\tilde{y}) = \\int \\mathrm{d} \\theta \\, \n",
"\\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}) \\, \\pi_{\\mathcal{S}}(y \\mid \\theta),\n",
"$$\n",
"which averages all of the data generating process within the observational model \n",
"over the posterior distribution, quantifying the range of predictions consistent \n",
"with both our domain expertise and the observation, $\\tilde{y}$. \n",
"\n",
"If we want to compare the whole of the posterior predictive distribution and the \n",
"true data generating process, then some reasonable assumptions quickly lead us\n",
"to the _Kullback-Leibler divergence_, \n",
"$\\mathrm{KL}(\\pi^{\\dagger}(y) \\mid \\mid \\pi_{\\mathcal{S}}(y \\mid \\tilde{y}))$ \n",
"(Betancourt 2015), which vanishes when the two distributions are identical and \n",
"monotonically increases as they deviate. \n",
"\n",
"That said, in practice we can't compute this divergence without knowing the true \n",
"data generating process! Instead we have to _approximate_ the divergence using \n",
"just our observed data, which ultimately yields Bayesian cross validation and \n",
"the many information criteria that further approximate that. Importantly in \n",
"this series of approximations we lose the ability to estimate absolute \n",
"divergences and instead can estimate only _differences_ between divergences from \n",
"the posterior predictive distributions of different models. This makes these \n",
"approximations useful for model comparison, but not determining the adequacy of \n",
"a single model in isolation.\n",
"\n",
"Moreover, there's also a deeper issue with comprehensive measures like the \n",
"Kullback-Leibler divergence between the posterior predictive distribution and\n",
"the true data generating process. We know our model will not capture every \n",
"intimate detail of reality, and we really only want our model to capture those\n",
"details relevant to our scientific goals. If we're modeling bird migration then \n",
"we don't have any need to model the irrelevant effects of quantum mechanics or \n",
"general relativity! Comprehensive measures, however, are often strongly \n",
"affected by these irrelevant details, making it difficult to isolate the \n",
"structures about which we do care.\n",
"\n",
"Fortunately we've already thought about isolating the features of the \n",
"observation space relevant to our scientific questions -- they're exactly what\n",
"motivated the summary statistics we would use for a prior predictive check!\n",
"Consequently we can reuse those summary statistics to construct _posterior\n",
"predictive checks_ that visually compare the pushforwards of the posterior \n",
"predictive distribution, $\\pi_{t(Y) \\mid Y}(t \\mid \\tilde{y})$, to the observed\n",
"summary statistic, $t(\\tilde{y})$.\n",
"\n",
"If the observed summary statistic, here shown in light red, falls within the \n",
"bulk of the pushforward distribution, here shown in dark red, then our model is\n",
"doing an adequate job of modeling the behaviors captured by the summary \n",
"statistic.\n",
"\n",
"<br><br>\n",
"![](figures/posterior_predictive/posterior_predictive_good/posterior_predictive_good.png)\n",
"<br><br>\n",
"\n",
"If the observed summary statistic falls in the tails of the pushforward \n",
"distribution, however, then the situation is not so clear.\n",
"<br><br>\n",
"![](figures/posterior_predictive/posterior_predictive_bad/posterior_predictive_bad.png)\n",
"<br><br>\n",
"\n",
"Unfortunately we can't discriminate between our observation being a rare but not\n",
"impossible extreme and our model being insufficiently sophisticated to capture \n",
"the relevant structure of the true data generating process that manifests in the \n",
"summary statistic. Instead we have to rely on our domain expertise to identify\n",
"possible _model expansions_ that could resolve this discrepancy without \n",
"compromising our inferences if the discrepancy is just a mirage.\n",
"\n",
"As with prior predictive checks, in practice we typically can't construct the \n",
"probability density function for these pushforward distributions analytically \n",
"in order to implement a posterior predictive check. We can, however, readily\n",
"approximate it with samples from the posterior predictive distribution,\n",
"$$\n",
"\\tilde{y}' \\sim \\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y}),\n",
"$$\n",
"on which we can then evaluate our summary statistics, \n",
"$$\n",
"t(\\tilde{y}')\n",
"$$\n",
"to give samples from the pushforwards of the posterior predictive distribution.\n",
"\n",
"<br><br>\n",
"![](figures/posterior_predictive/posterior_predictive_practice/posterior_predictive_practice.png)\n",
"<br><br>\n",
"\n",
"Posterior predictive checks date back to Rubin (1984) and the beginnings of \n",
"the simulation revolution in statistics."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building a Mystery\n",
"\n",
"If there are no indications that a given model is failing any of these four\n",
"questions then we can proceed to apply it in practice confident that it will\n",
"provide reasonably robust inferences for the relevant scientific goals. If \n",
"there are signs of trouble for any of these four questions, however, then we\n",
"have to consider improving the model, or occasionally even reducing the \n",
"ambition of our scientific questions in the first place.\n",
"\n",
"The goal of a principled Bayesian workflow is to guide a practitioner through \n",
"a regimented model critique that addresses these four questions. In particular\n",
"any criticisms need to be _interpretable_ in order to inform the practitioner \n",
"how to improve the analysis and address those criticisms. This feedback then\n",
"further guides iterative model development that converges to a model capable of\n",
"tackling the relevant scientific goals.\n",
"\n",
"A robust model development is initialized with a conceptual analysis of the\n",
"model that a practitioner would fit without the practical limitations of finite\n",
"computational resources, time, and mathematical tools. This _aspirational \n",
"model_, $\\mathcal{S}_{A}$, would incorporate all of the systematic effects that \n",
"might influence the measurement process, such as heterogeneity across \n",
"individuals or variation across space and time, and richer structure than \n",
"typically presumed in the phenomenological models often used in practice. It\n",
"would also incorporate all of the domain expertise available within a field.\n",
"Reasoning about the aspirational model forces practitioners to reason about the \n",
"subtleties in the latent system being studied and the measurement process being \n",
"used to probe that system.\n",
"\n",
"Within this abstract aspirational model we then explicitly build our initial \n",
"model, $\\mathcal{S}_{1}$, to be just sophisticated enough to incorporate the \n",
"phenomena of scientific interest but no more. This initial model typically \n",
"includes few if any systematic effects and only as much domain expertise as \n",
"needed to motivate an initial prior distribution.\n",
"\n",
"<br><br>\n",
"![](figures/asp_simple/asp_simple.png)\n",
"<br><br>\n",
"\n",
"The aspirational model provides critical context for this initial model. In\n",
"particular, the _difference_ between the two models, \n",
"$\\mathcal{S}_{A} \\setminus \\mathcal{S}_{1}$, encapsulates all of the potentially\n",
"relevant structure ignored by the initial model.\n",
"\n",
"<br><br>\n",
"![](figures/asp_simple_diff/asp_simple_diff.png)\n",
"<br><br>\n",
"\n",
"This abstract difference then motivates principled summary statistics that \n",
"isolate these ignored structures, providing the foundations of constructive\n",
"prior and posterior predictive checks for the initial model.\n",
"\n",
"If the initial model proves inadequate then we have to improve it, and the\n",
"context of the aspirational model guides that improvement so that we're not\n",
"adding arbitrary complexity irrelevant to our scientific goals. In particular,\n",
"posterior predictive checks based on interpretable summary statistics \n",
"immediately suggest the structure isolated by those summary statistics, such as\n",
"heterogeneity or time dependence in the observational model or domain expertise\n",
"neglected by the prior distribution. Adding this structure yields an expanded\n",
"model, and if that expanded model still proves inadequate then we iterate, \n",
"sequentially expanding our model until it becomes capable of achieving our \n",
"scientific goals. \n",
"\n",
"<br><br>\n",
"![](figures/model_expansion/model_expansion.png)\n",
"<br><br>\n",
"\n",
"By reasoning about the aspirational model first we don't randomly walk through\n",
"the space of all possible models but rather take coherent steps from our simple\n",
"initial model towards the aspirational model.\n",
"\n",
"The emphasis on model _expansion_ is critical. By subsuming the previous model\n",
"a new model can always fall back to the previous model configurations if the\n",
"added structure isn't actually helpful in explaining the true data generating\n",
"process. Moreover, quantifying our inferences with an entire posterior\n",
"distribution, as opposed to say a point estimate of a single model \n",
"configuration, ensures that the initial model configurations continue to\n",
"influence our inferences so long as they remain consistent with the \n",
"observations. This makes our model development workflow particularly robust to\n",
"the faux pas of rushing towards unneeded complexity and overfitting, especially \n",
"when equipping each expanded model with priors distribution that incorporate the \n",
"qualitative features of _penalized complexity priors_ that favor the previous \n",
"model (Simpson et al. 2017).\n",
"\n",
"Finally it's important to recognize that a better model is not always sufficient\n",
"to obtain the answers we want. Sometimes our initial scientific goals are too \n",
"ambitious, or our experiment is too insensitive to the phenomena of interest. \n",
"In these cases a useful workflow will also identify the limitations of the \n",
"measurement process and suggest more realistic goals for the available data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Work it, Make it, Do it, Makes Us Harder, Better, Faster, Stronger\n",
"\n",
"Putting this all together we can now design a principled Bayesian workflow that\n",
"guides the development of robust models in practical applications. The\n",
"workflow introduced here begins by examining the experimental design and the\n",
"resulting measurement process. Only once the measurement process is understood\n",
"do we build our initial model and study its performance in the context of our\n",
"statistical and domain expertise. Finally we can analyze the fit of the model\n",
"on the observed data, verifying the computational accuracy of the fit and the \n",
"adequacy of the model to capture the relevant features of the true data \n",
"generating process. At any point we might identify failures which motivate \n",
"improved models for which the workflow restarts.\n",
"\n",
"Here _domain expertise_ refers to the experience of those responsible for\n",
"collecting, curating, or otherwise manipulating the data, as well as \n",
"stakeholders who will make decisions using the final inferences or any \n",
"intermediaries who will help stakeholders to make those decisions. _Statistical \n",
"expertise_ refers to proficiency in probabilistic modeling and computation.\n",
"\n",
"## Experimental Design\n",
"\n",
"Before modeling anything we need to first consider the design of our experiment, \n",
"especially in the context of our scientific goals. \n",
"\n",
"### Step One: Conceptual Analysis\n",
"\n",
"We begin by developing a conceptual understanding of the measurement process \n",
"from the latent phenomena of interest through its interactions with the \n",
"environment and how those interactions manifest as observations. This\n",
"interaction then needs to be contrasted with our scientific goals in order to \n",
"qualify how sensitive the observations are to the desired scientific phenomena.\n",
"Moreover, we want to pay particular attention to the many possible systematic \n",
"effects that influence this process but are not known with complete certainty. \n",
"\n",
"This conceptual analysis yields informal narratives describing how observations \n",
"are generated, forming the basis of the aspirational model to which all of our\n",
"explicit models will yearn to be when they grow up.\n",
"\n",
"**Requirements:** Domain Expertise\n",
"\n",
"### Step Two: Define Observations\n",
"\n",
"Given our conceptual understanding of the measurement process we can take our \n",
"first steps towards building a formal mathematical model by defining the space \n",
"in which observations take values. This may include, for example, the number of \n",
"repetitions, subjects, groups, or components in each observation as well as the \n",
"range of values each component can take.\n",
"\n",
"**Requirements:** Domain Expertise and Statistical Expertise\n",
"\n",
"### Step Three: Identify Relevant Summary Statistics\n",
"\n",
"With an observation space formally defined we can begin to construct summary \n",
"statistics that isolate relevant properties of the experimental design, \n",
"especially those properties that influence decision making and those that we \n",
"expect to be difficult to model and hence unlikely to be adequately described\n",
"by our early models. These summary statistics provide then provide the basis \n",
"for answering [Question One](#Question-One:-Domain-Expertise-Consistency) and \n",
"[Question Four](#Question-Four:-Model-Adequacy).\n",
"\n",
"In preparation for prior predictive checks we also want to complement these \n",
"summary statistics with conceptual expectations for what behaviors are \n",
"reasonable, or, as is often easier to elicit in practice, what behaviors are \n",
"_unreasonable_.\n",
"\n",
"**Requirements:** Domain Expertise\n",
"\n",
"## Modeling\n",
"\n",
"Once the experimental design has been fully considered we can begin the\n",
"modeling process in earnest. Before we utilize any observed data we want to \n",
"ensure that our model holds steadfast under the scrutiny of \n",
"[Question One](#Question-One:-Domain-Expertise-Consistency), [Question Two](#Question-Two: Computational-Faithfulness), and\n",
"[Question Three](#Question-Three:-Model-Sensitivity).\n",
"\n",
"### Step Four: Build a Model \n",
"\n",
"A Bayesian model requires an observation model, \n",
"$\\pi_{\\mathcal{S}}(y \\mid \\theta)$, comprised of possible data generating \n",
"processes and a prior model, $\\pi_{\\mathcal{S}}(\\theta)$, that encodes selected \n",
"domain expertise. As our models grow in sophistication, however, the \n",
"distinction between these two components is not always clear and so it is \n",
"prescient to focus more on building the Bayesian joint distribution.\n",
"\n",
"When first beginning the model development process our initial model should be \n",
"as simple as possible, perhaps ignoring many if not all of the systematic \n",
"effects of the environment in which the system of interest is contained. The\n",
"initial model should be just sophisticated enough to be able to answer the \n",
"scientific questions of interest in an idealized experiment.\n",
"\n",
"#### Build an Observational Model\n",
"\n",
"The observational model is built from many probability distributions over the \n",
"observation space, each defining a mathematical narrative of how observations\n",
"are generated. This collection of data generating processes should translate \n",
"the conceptual narrative produced in [Step One](#Step-One:-Conceptual-Analysis) into a more \n",
"rigorous probabilistic model.\n",
"\n",
"Often the observational model is a crude approximation to the complexity of the \n",
"true measurement process, but even crude approximations can be sufficient to \n",
"answer sophisticated questions in practice. Still, we want to keep the\n",
"approximate nature of our observational model, in particular the natural \n",
"extensions of the model to include for example heterogeneities, time\n",
"dependences, and other systematic effects, in mind.\n",
"\n",
"**Requirements:** Domain Expertise and Statistical Expertise\n",
"\n",
"#### Complete The Model With Prior Distributions\n",
"\n",
"In a Bayesian model the observational model is complemented with a prior \n",
"distribution over the space of all unknown model configuration parameters,\n",
"$\\pi_{\\mathcal{S}}(\\theta)$. The prior distribution does not need to \n",
"incorporate all of our available domain expertise, just enough to ensure that \n",
"the joint model,\n",
"$$\n",
"\\pi_{\\mathcal{S}}(y, \\theta) \n",
"= \n",
"\\pi_{\\mathcal{S}}(y \\mid \\theta) \\, \\pi_{\\mathcal{S}}(\\theta),\n",
"$$\n",
"is well-behaved (Gelman, Simpson, and Betancourt 2017). In practice it is often sufficient for the prior distribution to incorporate information about the _scales_, _units_, or _orders of magnitude_ which identify unreasonable but not impossible model configurations.\n",
"\n",
"Keep in mind that the separation of the Bayesian model into an observation\n",
"model and a prior model is not uniquely defined. Consider, for example, \n",
"unobserved latent structure, $\\pi_{\\mathcal{S}}(\\theta \\mid \\phi)$, such as that \n",
"arising in hierarchical models and hidden Markov models. We could equally well \n",
"define the observation model as \n",
"$\\pi_{\\mathcal{S}}(y \\mid \\theta) \\, \\pi_{\\mathcal{S}}(\\theta \\mid \\phi)$ with \n",
"the corresponding prior distribution $\\pi_{\\mathcal{S}}(\\phi)$, or the \n",
"observation model as $\\pi_{\\mathcal{S}}(y \\mid \\theta)$ with the prior \n",
"distribution $\\pi_{\\mathcal{S}}(\\theta \\mid \\phi) \\, \\pi_{\\mathcal{S}}(\\phi)$. \n",
"Which is more appropriate in a given application depends on what parameters one \n",
"expects to vary in the generation of new observations.\n",
"\n",
"Consequently the focus should be not on specifying prior distributions in\n",
"isolation but rather _completing_ the observation model developed\n",
"[above](#Build-an-Observational-Model) with necessary domain expertise about the model\n",
"configurations. This might take the form of independent, one-dimensional prior \n",
"distributions for each parameter or more sophisticated prior distributions that\n",
"incorporate multiple parameters. These joint priors often introduce their own \n",
"parameters that require another round of prior distributions. When building \n",
"sophisticated models it's probability distributions all the way down.\n",
"\n",
"**Requirements:** Domain Expertise and Statistical Expertise\n",
"\n",
"### Step Five: Identify New Summary Statistics\n",
"\n",
"Once a we have developed an explicit model we can reevaluate the context of\n",
"the aspirational model and modify existing summary statistics or introduce new\n",
"ones. \n",
"\n",
"If the current model is an expansion from a previous model then we can also \n",
"take this opportunity to exploit any information from failures of the previous\n",
"model or information about added features to motivate new summary statistics. \n",
"For example, if we add a new systematic effect then we might also introduce \n",
"summary statistics that are sensitive to the heterogeneity of that systematic \n",
"effect across configurations of the experiment, such as which individual \n",
"collecting the data or at which site the data were collected.\n",
"\n",
"### Step Six: Analyze the Joint Ensemble\n",
"\n",
"Before considering any observed data we first want to understand the \n",
"consequences of our modeling assumptions by answering \n",
"[Question One](#Question-One:-Domain-Expertise-Consistency), [Question Two](#Question-Two: Computational-Faithfulness), \n",
"and [Question Three](#Question-Three:-Model-Sensitivity).\n",
"\n",
"Fortunately this analysis is readily performed with samples of reasonable\n",
"model configurations and observations from the joint distribution, \n",
"$\\pi_{\\mathcal{S}}(y, \\theta)$. Typically we first simulate model configurations \n",
"from the prior distribution and then observations from the corresponding \n",
"data generating process,\n",
"$$\n",
"\\tilde{\\theta} \\sim \\pi_{\\mathcal{S}}(\\theta)\n",
"$$\n",
"$$\n",
"\\tilde{y} \\sim \\pi_{\\mathcal{S}}(y \\mid \\tilde{\\theta} ),\n",
"$$\n",
"although when the separation between observational and prior models isn't so\n",
"clear we can consider any means of sampling from the joint distribution.\n",
"\n",
"**Requirements:** Statistical Expertise\n",
"\n",
"#### Analyze the Prior Predictive Distribution\n",
"\n",
"Answers to [Question One](#Question-One:-Domain-Expertise-Consistency) are readily informed with prior\n",
"predictive checks of the summary statistics constructed in \n",
"[Step Three](#Step-Three:-Identify-Relevant-Summary-Statistics) or [Step Five](#Step-Five:-Identify-New-Summary-Statistics).\n",
"\n",
"If the prior predictive checks indicate conflict between the model and our \n",
"domain expertise then we have to return to [Step Four](#Step-Four:-Build-a-Model) and \n",
"modify our model, typically by incorporating additional domain expertise to \n",
"better constrain [the prior distribution](#Complete-The-Model-With-Prior-Distributions).\n",
"\n",
"**Requirements:** Domain Expertise\n",
"\n",
"#### Evaluate Simulated Fits\n",
"\n",
"For each simulated observation we can also construct a posterior distribution\n",
"with a given computational method, $\\pi_{\\mathcal{S}}(\\theta \\mid \\tilde{y})$, \n",
"or at least attempt as much. In particular, the distribution of reconstructed \n",
"posteriors allows to evaluate the accuracy of the computational method, \n",
"answering [Question Two](#Question-Two: Computational-Faithfulness).\n",
"\n",
"If the designated computational method features diagnostics, such as $\\hat{R}$ \n",
"for Markov chain Monte Carlo in general and divergences for Hamiltonian Monte \n",
"Carlo in particular, then we can look for failed diagnostics across the \n",
"distribution of posterior fits. Failures indicate that we need to consider \n",
"improving the tuning or implementation the computational method, or selecting \n",
"another method altogether, and then repeating the fits. \n",
"\n",
"Correlating failed diagnostics with the corresponding simulated model \n",
"configurations can also help to identify regions of the model configuration \n",
"space that may be ill-posed for practical computation. Moreover, the emergence \n",
"of these pathological fits implies extreme behavior in the posterior which often \n",
"hints at tension between the model and our domain expertise that may require \n",
"reevaluating [Step Four](#Step-Four:-Build-a-Model).\n",
"\n",
"**Requirements:** Statistical Expertise\n",
"\n",
"#### Implement Simulated-Based Calibration\n",
"\n",
"Even if our chosen computational method doesn't have its own diagnostics, we \n",
"can evaluate its performance by performing simulation-based calibration over the\n",
"joint sample. As above, if the performance of the method is unsatisfactory then \n",
"we need to consider retuning the method, or selecting another method altogether \n",
"and then repeating the fits.\n",
"\n",
"**Requirements:** Statistical Expertise\n",
"\n",
"#### Analyze General Model Sensitivity\n",
"\n",
"Once we trust the faithfulness of our reconstructed posterior distributions we\n",
"can consider the breadth of inferences they provide to answer \n",
"[Question Three](#Model-Sensitivity). In general we can always consider the \n",
"distribution of posterior $z$-scores against posterior shrinkages to identify \n",
"common pathologies in our inferences.\n",
"\n",
"If there are indications of undesired behavior, such as overfitting or\n",
"non-identifiability, then we might need to return to [Step One](#Step-One:-Conceptual-Analysis) \n",