-
Notifications
You must be signed in to change notification settings - Fork 0
/
Against_SPDs.Rmd
174 lines (147 loc) · 6.22 KB
/
Against_SPDs.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
---
title: "Why Not to Use SPDs"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Why Not to Use SPDs}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
bibliography: references.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
dev='png',
fig.width=7,
fig.height=5.5 # ,
# dev.args=list(antialias = "none")
)
```
```{r setup}
library(carbondate)
```
## Summed Probability Distributions
Currently, the most commonly-used approach to summarise calendar age information from multiple ^14^C determinations is via summed probability distributions (SPD). These are **not statistically valid** estimators of the calendar age of a potential future sample. It is our view that they should not be used in any dates-as-data approach to provide a population proxy.
For an SPD, the posterior calendar age density of each object is first calculated independently) from the others as using the function above. These individual densities are then summed/averaged to give an SPD estimate. The independence assumed in the separate calibration of each sample, followed by subsequent summarisation, generates a contradiction.
Additionally, the SPDs approach fundamentally does not model the samples in the calendar age domain. Consequently, it is also not able to deal with inversions in the calibration curve where there are multiple disjoint calendar periods which are consistent with the observed determinations; or with plateau periods.
The SPD function is **ONLY** provided here as a comparison with the other routines. To calculate the SPD for a set of radiocarbon determinations (here we use the example dataset `armit` [@armit2014]) see the example below, where we also plot the results.
```{r find_spd, out.width="100%"}
spd <- FindSummedProbabilityDistribution(
calendar_age_range_BP = c(1000, 4500),
rc_determinations = armit$c14_age,
rc_sigmas = armit$c14_sig,
F14C_inputs = FALSE,
calibration_curve = intcal20,
plot_output = TRUE)
```
**Note:** The summary functions for plotting the predictive joint calendar age density using the rigorous Bayesian non-parametric alternative to SPDs (described in the vignette [Non-parametric Summed Density](Non-parametric-summed-density.html)) can also optionally plot the SPD without having to calculate it separately first.
## Illustration of why not to use SPDs
### Fitting to two Normals
Consider the simulated `two_normals` dataset. We know that the calendar age density that underlies these ^14^C value is a mixture of two normal densities - one centred at 3500 cal yr BP (with a 1$\sigma$ standard deviation of 200 cal yrs); and another (more concentrated) centred at 5000 cal yr BP (with a 1$\sigma$ standard deviation of 100 cal yrs). However, when we calculate the SPD we obtain:
```{r two_normals_spd, echo = 2, out.width="100%"}
oldpar <- par(no.readonly = TRUE)
two_normals_spd <- FindSummedProbabilityDistribution(
calendar_age_range_BP=c(2500, 7000),
rc_determinations= two_normals$c14_age,
rc_sigmas = two_normals$c14_sig,
calibration_curve=intcal20,
plot_output = TRUE)
par(new = TRUE,
mgp = c(3, 0.7, 0),
xaxs = "i",
yaxs = "i",
mar = c(5, 4.5, 4, 2) + 0.1,
las = 1)
xlim <- rev(range(two_normals_spd$calendar_age_BP))
ylim <- c(0, 3 * max(two_normals_spd$probability))
plot(
NULL,
NULL,
type = "n",
ylim = ylim,
xlim = xlim,
axes = FALSE,
xlab = NA,
ylab = NA,
xaxs = "i",
yaxs = "i")
# Show true underlying calendar age density
weights_true <- c(0.45, 0.55)
cluster_means_true_calBP <- c(3500, 5000)
cluster_precisions_true <- 1 / c(200, 100)^2
# Find and plot true exact density
truedens <- function(t, w, truemean, trueprec) {
dens <- 0
for(i in 1:length(w)) {
dens <- dens + w[i] * dnorm(t, mean = truemean[i], sd = 1/sqrt(trueprec[i]))
}
dens
}
curve(truedens(
x,
w = weights_true,
truemean = cluster_means_true_calBP,
trueprec = cluster_precisions_true),
from = 2500, to = 7000, n = 401,
lwd = 2,
col = "red", add = TRUE)
# Reset plotting parameters
par(oldpar)
```
Here we have manually overlain the true (known) shared calendar age density in red. As we can see, the SPD captures does capture some broad features but does not reconstruct the truth well, and is hard to interpret. In particular, the SPD is highly variable, showing multiple peaks, due to the wiggliness of the calibration curve. The SPD peak shown around 5300 cal yr BP is entirely spurious, yet almost of the same magnitude as its peak around 3500 cal yr BP (which is a part of the genuine density).
### An improvement using our library approaches
While this is jumping forward somewhat, to evidence that our methods provide better reconstructions, we run the same example using our Bayesian non-parametric summarisation approach (shown in purple as _Polya Urn_) and obtain:
```{r two_normals_compare, echo = FALSE, out.width="100%"}
oldpar <- par(no.readonly = TRUE)
polya_urn_output <- PolyaUrnBivarDirichlet(
rc_determinations = two_normals$c14_age,
rc_sigmas = two_normals$c14_sig,
calibration_curve=intcal20,
n_iter = 1e4,
show_progress = FALSE)
two_normals_DPMM <- PlotPredictiveCalendarAgeDensity(
output_data = polya_urn_output,
show_SPD = TRUE)
par(new = TRUE,
mgp = c(3, 0.7, 0),
xaxs = "i",
yaxs = "i",
mar = c(5, 4.5, 4, 2) + 0.1,
las = 1)
xlim <- rev(range(two_normals_DPMM[[1]]$calendar_age_BP))
ylim <- c(0, 3 * max(two_normals_DPMM[[1]]$density_mean))
plot(
NULL,
NULL,
type = "n",
ylim = ylim,
xlim = xlim,
axes = FALSE,
xlab = NA,
ylab = NA,
xaxs = "i",
yaxs = "i")
# Show true underlying calendar age density
weights_true <- c(0.45, 0.55)
cluster_means_true_calBP <- c(3500, 5000)
cluster_precisions_true <- 1 / c(200, 100)^2
# Find and plot true exact density
truedens <- function(t, w, truemean, trueprec) {
dens <- 0
for(i in 1:length(w)) {
dens <- dens + w[i] * dnorm(t, mean = truemean[i], sd = 1/sqrt(trueprec[i]))
}
dens
}
curve(truedens(
x,
w = weights_true,
truemean = cluster_means_true_calBP,
trueprec = cluster_precisions_true),
from = 2500, to = 7000, n = 401,
lwd = 2,
col = "red", add = TRUE)
# Reset plotting parameters
par(oldpar)
```
# References