aoa() appears to return incorrect thresholds (different from Meyer & Pebesma 2021) #46
Comments
Looking at the source of boxplot.stats():

    > grDevices::boxplot.stats
    function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
    {
        if (coef < 0)
            stop("'coef' must not be negative")
        nna <- !is.na(x)
        n <- sum(nna)
        stats <- stats::fivenum(x, na.rm = TRUE)
        iqr <- diff(stats[c(2, 4)])
        if (coef == 0)
            do.out <- FALSE
        else {
            out <- if (!is.na(iqr)) {
                x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef *
                    iqr)
            }
            else !is.finite(x)
            if (any(out[nna], na.rm = TRUE))
                stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)
        }
        conf <- if (do.conf)
            stats[3L] + c(-1.58, 1.58) * iqr/sqrt(n)
        list(stats = stats, n = n, conf = conf, out = if (do.out) x[out &
            nna] else numeric())
    }

The relevant lines here are:

    out <- if (!is.na(iqr)) {
        x < (stats[2L] - coef * iqr) | x > (stats[4L] + coef * iqr)
    }
    #> [...]
    stats[c(1, 5)] <- range(x[!out], na.rm = TRUE)

In context, that means that the AOA threshold winds up equaling:

    max(di[!(di > (quantile(di, 0.75) + 1.5 * IQR(di)))])

Which means that the threshold is going to be the value in di closest to, but not more than
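The whisker logic quoted above can be checked by hand. This is only a minimal sketch on made-up numbers (not DI values from this issue), and the variable names are hypothetical. It rebuilds the upper whisker from fivenum() and compares it with what boxplot.stats() reports in the fifth element of $stats.

```r
# Made-up values; 20 is an obvious outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 20)

# boxplot.stats() takes Tukey's hinges from fivenum(), not from quantile().
hinges <- stats::fivenum(x)[c(2, 4)]
upper_fence <- hinges[2] + 1.5 * diff(hinges)

# Upper whisker: the largest observation that does not exceed the fence.
whisker_manual <- max(x[x <= upper_fence])
whisker_bp <- grDevices::boxplot.stats(x)$stats[5]

print(c(fence = upper_fence, manual = whisker_manual, boxplot = whisker_bp))
```

Because the hinges can differ slightly from the default quantile() quartiles (here the upper hinge is 6.5 while quantile(x, 0.75) is 6.25), the quantile/IQR expression is best read as a close approximation of what boxplot.stats() does rather than an exact equivalent.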
Thanks for finding that! I fixed it according to your suggestion.
Hi all,
Adapting some code from the MEE-AOA repo, I believe I can calculate an AOA like this:

According to the 2021 paper, I believe the AOA threshold after this should be equal to "the 75-percentile plus 1.5 times the IQR of the DI values of the cross-validated training data". Calculating that using quantile and IQR gives us these results:

But the AOA threshold returned by aoa() doesn't match that calculation:

If I'm right and this is unexpected, it seems to be due to the use of boxplot.stats() here:

CAST/R/trainDI.R, Line 221 in afcba3f

That gives us the threshold that CAST returns:
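As a rough illustration of the mismatch, the sketch below uses made-up DI values (not the data from this issue) and hypothetical variable names. It assumes the threshold is read from the fifth element of boxplot.stats()$stats, i.e. the upper whisker, and compares that with the formula quoted from the paper.

```r
# Made-up DI values for illustration only.
di <- c(0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25)

# Threshold as described in the paper: 75th percentile + 1.5 * IQR.
thr_paper <- quantile(di, 0.75, names = FALSE) + 1.5 * IQR(di)

# Upper whisker from boxplot.stats(): always an observed value of di.
thr_whisker <- grDevices::boxplot.stats(di)$stats[5]

print(c(paper = thr_paper, whisker = thr_whisker))
```

With no outliers in the vector, the whisker is simply max(di), so it sits below the paper's threshold whenever no observation reaches the fence.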
But I'm not entirely sure what boxplot.stats() actually does. For instance, imagine that we cut off the last di value in our vector:

Because it's a rather low number, both our 75% percentile and IQR increase:

But boxplot.stats() returns the same value as before:

Created on 2022-12-11 by the reprex package (v2.0.1)
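The effect of dropping a low value can be sketched with made-up numbers (again not the DI values from this issue; all names are hypothetical). The helper below computes the paper-style threshold, and the whisker is assumed to be boxplot.stats()$stats[5].

```r
# Made-up DI values; the last element is deliberately low.
di <- c(0.25, 0.375, 0.5, 0.625, 0.75, 1, 2, 2.5, 8, 0.125)
di_short <- di[-length(di)]  # drop the last (low) value

# Hypothetical helper: 75th percentile + 1.5 * IQR.
paper_threshold <- function(v) {
  quantile(v, 0.75, names = FALSE) + 1.5 * IQR(v)
}

thr_full  <- paper_threshold(di)
thr_short <- paper_threshold(di_short)

# The whisker is the largest observed value below the fence in both cases.
whisker_full  <- grDevices::boxplot.stats(di)$stats[5]
whisker_short <- grDevices::boxplot.stats(di_short)$stats[5]

print(c(thr_full = thr_full, thr_short = thr_short))
print(c(whisker_full = whisker_full, whisker_short = whisker_short))
```

In this toy case the quantile-based threshold rises after the low value is removed, while the whisker stays at the same observed value because the fence moves without crossing another data point.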
Apologies if I'm misunderstanding something here! The return here just didn't match my expectations.