Differences between get_predicted and get_predicted_ci for mixed models (#814)

* Differences between get_predicted and get_predicted_ci for mixed models
  Fixes #797
* news, version
* docs
* this is intentional, turn styler off
* typos
easystats · Sep 27, 2023 · d8c6e08
1 parent 0ef5213
commit d8c6e08

Showing 8 changed files with 88 additions and 17 deletions.
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,7 +1,7 @@
 Type: Package
 Package: insight
 Title: Easy Access to Model Information for Various Model Objects
-Version: 0.19.5.4
+Version: 0.19.5.5
 Authors@R: 
     c(person(given = "Daniel",
              family = "Lüdecke",

diff --git a/NEWS.md b/NEWS.md
@@ -1,9 +1,18 @@
 # insight 0.19.6
 
+## General
+
+* Improved documentation for `get_predicted_ci()`.
+
 ## Changes to functions
 
 * `model_info()` now recognized ordered beta families.
 
+## Bug fixes
+
+* `find_transformation()` better detects power-transformation of the response
+  variable.
+
 # insight 0.19.5
 
 ## Bug fixes

diff --git a/R/get_predicted.R b/R/get_predicted.R
@@ -12,11 +12,14 @@
 #' with lots of caveats and complications. Read the 'Details' section for more
 #' information.
 #'
-#' `get_predicted_ci()` returns the confidence (or prediction) interval (CI)
+#' [`get_predicted_ci()`] returns the confidence (or prediction) interval (CI)
 #' associated with predictions made by a model. This function can be called
 #' separately on a vector of predicted values. `get_predicted()` usually
 #' returns confidence intervals (included as attribute, and accessible via the
-#' `as.data.frame()` method) by default.
+#' `as.data.frame()` method) by default. It is preferred to rely on the
+#' `get_predicted()` function for standard errors and confidence intervals -
+#' use `get_predicted_ci()` only if standard errors and confidence intervals
+#' are not available otherwise.
 #'
 #' @param x A statistical model (can also be a data.frame, in which case the
 #'   second argument has to be a model).
@@ -734,7 +737,7 @@ get_predicted.phylolm <- function(x,
 
     # Transform iterations
     if ("iterations" %in% names(attributes(predictions))) {
-      attr(predictions, "iterations") <- as.data.frame(sapply(attributes(predictions)$iterations, link_inv))
+      attr(predictions, "iterations") <- as.data.frame(sapply(attributes(predictions)$iterations, link_inv)) # nolint
     }
 
     # Transform to response "type"
@@ -744,7 +747,7 @@ get_predicted.phylolm <- function(x,
       predictions <- .get_predict_transform_response(predictions, response = response)
       if ("iterations" %in% names(attributes(predictions))) {
         attr(predictions, "iterations") <- as.data.frame(
-          sapply(
+          sapply( # nolint
             attributes(predictions)$iterations,
             .get_predict_transform_response,
             response = response

diff --git a/R/get_predicted_ci.R b/R/get_predicted_ci.R
@@ -2,12 +2,39 @@
 #'
 #' @inheritParams get_predicted
 #' @param predictions A vector of predicted values (as obtained by
-#'   `stats::fitted()`, `stats::predict()` or
-#'   [get_predicted()]).
+#'   `stats::fitted()`, `stats::predict()` or [get_predicted()]).
 #' @param se Numeric vector of standard error of predicted values. If `NULL`,
 #'   standard errors are calculated based on the variance-covariance matrix.
 #' @inheritParams get_predicted
 #'
+#' @details
+#' Typically, `get_predicted()` returns confidence intervals based on the standard
+#' errors as returned by the `predict()`-function, assuming normal distribution
+#' (`+/- 1.96 * SE`) resp. a Student's t-distribution (if degrees of freedom are
+#' available). If `predict()` for a certain class does _not_ return standard
+#' errors (for example, *merMod*-objects), these are calculated manually, based
+#' on following steps: matrix-multiply `X` by the parameter vector `B` to get the
+#' predictions, then extract the variance-covariance matrix `V` of the parameters
+#' and compute `XVX'` to get the variance-covariance matrix of the predictions.
+#' The square-root of the diagonal of this matrix represent the standard errors
+#' of the predictions, which are then multiplied by the critical test-statistic
+#' value (e.g., ~1.96 for normal distribution) for the confidence intervals.
+#'
+#' If `ci_type = "prediction"`, prediction intervals are calculated. These are
+#' wider than confidence intervals, because they also take into account the
+#' uncertainty of the model itself. Before taking the square-root of the
+#' diagonal of the variance-covariance matrix, `get_predicted_ci()` adds the
+#' residual variance to these values. For mixed models, `get_variance_residual()`
+#' is used, while `get_sigma()^2` is used for non-mixed models.
+#'
+#' It is preferred to rely on standard errors returned by `get_predicted()` (i.e.
+#' returned by the `predict()`-function), because these are more accurate than
+#' manually calculated standard errors. Use `get_predicted_ci()` only if standard
+#' errors are not available otherwise. An exception are Bayesian models or
+#' bootstrapped predictions, where `get_predicted_ci()` returns quantiles of the
+#' posterior distribution or bootstrapped samples of the predictions. These are
+#' actually accurate standard errors resp. confidence (or uncertainty) intervals.
+#'
 #' @examplesIf require("boot") && require("datawizard") && require("bayestestR")
 #' # Confidence Intervals for Model Predictions
 #' # ------------------------------------------
@@ -274,7 +301,7 @@ get_predicted_ci.bracl <- get_predicted_ci.mlm
         # for multiple length, SE and predictions may match, could be intended?
         # could there be any cases where we have twice or x times the length of
         # predictions as standard errors?
-        format_warning("Predictions and standard errors are not of the same length. Please check if you need the `data` argument.")
+        format_warning("Predictions and standard errors are not of the same length. Please check if you need the `data` argument.") # nolint
       } else {
         format_error("Predictions and standard errors are not of the same length. Please specify the `data` argument.")
       }
@@ -298,7 +325,7 @@ get_predicted_ci.bracl <- get_predicted_ci.mlm
     format_error("The `data` argument should be a data frame.")
   }
   mm <- get_modelmatrix(x, data = data)
-  out <- sapply(
+  out <- sapply( # nolint
     seq_len(nrow(mm)), function(i) {
       suppressMessages(
         lmerTest::contestMD(x, mm[i, , drop = FALSE], ddf = type)[["DenDF"]]

diff --git a/man/get_predicted.Rd b/man/get_predicted.Rd
diff --git a/man/get_predicted_ci.Rd b/man/get_predicted_ci.Rd
diff --git a/man/get_transformation.Rd b/man/get_transformation.Rd
diff --git a/tests/testthat/test-find_transformation.R b/tests/testthat/test-find_transformation.R
@@ -49,6 +49,7 @@ test_that("find_transformation - strange bayestestR example", {
 })
 
 test_that("find_transformation - detect powers", {
+  # styler: off
   data(iris)
   m1 <- lm(Sepal.Length^(1 / 2) ~ Species, data = iris)
   m2 <- lm(Sepal.Length^2 ~ Species, data = iris)
@@ -81,4 +82,5 @@ test_that("find_transformation - detect powers", {
   expect_identical(insight::find_transformation(m4), "power")
   expect_identical(insight::find_transformation(m5), "power")
   expect_identical(insight::find_transformation(m6), "power")
+  # styler: on
 })
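
Note on the new `@details` text in `R/get_predicted_ci.R`: the manual steps it describes (predictions from `X` and `B`, then `XVX'`, then the square root of the diagonal) can be sketched in base R. This is a minimal illustration on a plain `lm()` fit, not the package's internal code. The model, the `mtcars` data, and all variable names are assumptions chosen for the example, and `stats::sigma()` stands in for the `get_sigma()` call mentioned in the docs.

```r
# Illustrative sketch only: an ordinary linear model on a built-in data set.
m <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(m) # design matrix X
B <- coef(m)         # parameter vector B
V <- vcov(m)         # variance-covariance matrix V of the parameters

# Predictions: matrix-multiply X by B
pred <- as.vector(X %*% B)

# Diagonal of XVX' gives the variance of each prediction
pred_var <- unname(diag(X %*% V %*% t(X)))
se <- sqrt(pred_var)

# Critical value from Student's t-distribution (residual df are available here)
crit <- qt(0.975, df = df.residual(m))
ci_low <- pred - crit * se
ci_high <- pred + crit * se

# Prediction interval: add the residual variance before taking the square root
se_pi <- sqrt(pred_var + sigma(m)^2)
pi_low <- pred - crit * se_pi
pi_high <- pred + crit * se_pi

# Sanity check against the standard errors reported by predict()
ref <- predict(m, se.fit = TRUE)
all.equal(se, unname(ref$se.fit))
```

For `lm()` the manual standard errors agree with those from `predict()`, so this only shows the mechanics. For mixed models the docs say that `get_variance_residual()` supplies the residual variance instead.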