Description
Getting conversation started about a proposal for functionality to correct outliers (assuming said outliers and the corrected values have been provided by some other functionality to be filed in a separate issue).
correct_outliers function
This is the function package users would interact with.
#' Correct and redistribute outliers in an `epi_signal`.
#'
#' Outliers are replaced with a corrected value. Optionally, the difference
#' between the initial outlying value and the corrected value may be
#' redistributed to other times.
#'
#' @param x An `epi_signal` data frame to correct
#' @param outliers A data frame specifying outliers and corrections to make
#' with the following columns:
#' * `<key name>`: a column for each column in the `epi_signal` key, specifying
#' values identifying the series to adjust
#' * `index` <date>: the date of the outlier
#' * `signal_name` <str>: the name of the signal (column) in `x` where the
#' outlier is found
#' * `replacement` <num>: the corrected value that will replace the outlier
#' * `redistribution_strategy` <str>, optional: specification of how the
#' difference between the initial value and the corrected value should be
#' redistributed to other times. This may be `"none"` to do no redistribution,
#' `"prop"` to redistribute proportionally to existing values (a larger
#' amount of the difference is distributed to dates where the signal has
#' larger values), `"zeros"` to redistribute to dates where the signal has a
#' value of 0, or `"equal"` to redistribute equally across a range of dates.
#' If this column is not provided or the value is missing in a given row, a
#' default strategy of `"none"` will be used.
#' * `redistribution_start` <date>: the earliest index value to which we may
#' redistribute a difference between the initial value and corrected value.
#' If this column is not provided or the value is missing in a given row, a
#' default of the earliest index value for the given `key` will be used.
#' * `redistribution_end` <date>: the last index value to which we may
#' redistribute a difference between the initial value and corrected value.
#' If this column is not provided or the value is missing in a given row, a
#' default of the `index` of the outlier minus one day will be used.
#'
#' @return a new `epi_signal` with corrected outliers
#'
#' @export
correct_outliers = function(x, outliers) {
}
This function validates its arguments and then iterates through the rows of the outliers argument, correcting the outliers sequentially (in the order they are provided by the user) by calling correct_one_outlier (see below).
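To make the row-by-row flow concrete, here is a minimal sketch of the loop. It assumes a plain data frame stands in for an `epi_signal`, that the key is a single hypothetical `geo` column, and that a hypothetical helper `replace_value` stands in for correct_one_outlier (it only replaces the value and does no redistribution). None of these names are part of the proposal.

```r
# Hypothetical stand-in for the per-outlier step: replaces one value only.
replace_value <- function(x, geo, index, signal_name, replacement) {
  hit <- x$geo == geo & x$index == index
  x[hit, signal_name] <- replacement
  x
}

# Assumed stand-in for an `epi_signal`: key column `geo`, time column `index`.
x <- data.frame(
  geo = rep(c("ma", "ca"), each = 3),
  index = rep(as.Date("2021-01-01") + 0:2, times = 2),
  cases = c(10, 500, 12, 20, 22, 900),
  stringsAsFactors = FALSE
)

# One row per outlier, with the columns described in the documentation above.
outliers <- data.frame(
  geo = c("ma", "ca"),
  index = as.Date(c("2021-01-02", "2021-01-03")),
  signal_name = c("cases", "cases"),
  replacement = c(11, 21),
  stringsAsFactors = FALSE
)

# Apply the corrections sequentially, in the order the user supplied them.
corrected <- x
for (i in seq_len(nrow(outliers))) {
  o <- outliers[i, ]
  corrected <- replace_value(corrected, o$geo, o$index, o$signal_name,
                             o$replacement)
}
```

Because corrections are applied in order, a later row sees the result of every earlier row, which matters once redistribution is involved.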
correct_one_outlier function
This function is not exported; it is called by correct_outliers. Open question: should we enforce that the signal passed to this function has only one value for the key, i.e., do the subsetting in correct_outliers?
#' Correct and redistribute one outlier in an `epi_signal`.
#'
#' The outlier is replaced with a corrected value. Optionally, the difference
#' between the initial outlying value and the corrected value may be
#' redistributed to other times.
#'
#' @param x An `epi_signal` to correct
#' @param index <date>: the date of the outlier
#' @param signal_name <str>: the name of the signal (column) in `x` where the
#' outlier is found
#' @param replacement <num>: the corrected value that will replace the outlier
#' @param redistribution_strategy <str>, optional: specification of how the
#' difference between the initial value and the corrected value should be
#' redistributed to other times. This may be `"none"` to do no redistribution,
#' `"prop"` to redistribute proportionally to existing values (a larger
#' amount of the difference is distributed to dates where the signal has
#' larger values), `"equal"` to redistribute equally across a range of dates,
#' or `"zeros"` to redistribute to dates where the signal has a value of 0,
#' distributing equally among those dates.
#' If this argument is not provided, a default strategy of `"none"` will be
#' used.
#' @param redistribution_start <date>: the earliest index value to which we may
#' redistribute a difference between the initial value and corrected value.
#' If this argument is not provided, a default of the earliest index value in
#' `x` will be used.
#' @param redistribution_end <date>: the last index value to which we may
#' redistribute a difference between the initial value and corrected value.
#' If this argument is not provided, a default of the `index` of the outlier
#' minus one day will be used.
#'
#' @return a new `epi_signal` with the outlier corrected.
correct_one_outlier = function(x, index, signal_name, replacement,
redistribution_strategy = "none",
redistribution_start = min(x$index),
redistribution_end = index - 1) {
}
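As a rough illustration of what the body might do, the sketch below replaces the outlier, computes the difference (initial value minus replacement), and handles only the `"none"` and `"equal"` strategies. It assumes a plain data frame with an `index` column and a single series (one key value); the name `correct_one_outlier_sketch` is hypothetical, and the real function would dispatch to the distribute_* helpers described below.

```r
correct_one_outlier_sketch <- function(x, index, signal_name, replacement,
                                       redistribution_strategy = "none",
                                       redistribution_start = min(x$index),
                                       redistribution_end = index - 1) {
  # Assumption: `x` holds one series, so the outlier date matches one row.
  at_outlier <- x$index == index
  stopifnot(sum(at_outlier) == 1)

  # Difference between the initial outlying value and the corrected value.
  excess <- x[[signal_name]][at_outlier] - replacement
  x[[signal_name]][at_outlier] <- replacement

  if (redistribution_strategy == "none") {
    return(x)
  }
  if (redistribution_strategy != "equal") {
    stop("this sketch only covers the 'none' and 'equal' strategies")
  }

  # Spread the difference evenly over the redistribution window.
  in_window <- x$index >= redistribution_start & x$index <= redistribution_end
  stopifnot(any(in_window))
  x[[signal_name]][in_window] <-
    x[[signal_name]][in_window] + excess / sum(in_window)
  x
}

x <- data.frame(
  index = as.Date("2021-01-01") + 0:3,
  cases = c(10, 10, 10, 40)
)
fixed <- correct_one_outlier_sketch(
  x, index = as.Date("2021-01-04"), signal_name = "cases",
  replacement = 10, redistribution_strategy = "equal"
)
```

With the default window (earliest date through the day before the outlier), the 30 excess cases are spread over the first three dates and the series total is unchanged.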
distribute_prop function
This function is not exported; it is called by correct_one_outlier.
#' Redistribute a value over a specified time range, proportionally to the
#' reported values on those dates.
#'
#' @param x an `epi_signal` to update
#' @param signal_name the name of the signal (column) in `x` that will be
#' updated
#' @param value the amount to redistribute
#' @param start the earliest index value to which we may redistribute some part
#' of the `value`.
#' @param end the last index value to which we may redistribute some part of the
#' `value`.
#'
#' @return an updated `epi_signal`
distribute_prop = function(x, signal_name, value, start, end) {
}
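One possible reading of proportional redistribution, sketched under assumptions: `x` is a plain data frame with an `index` column holding a single series, and each date in the window receives a share of `value` equal to its share of the window total. The name `distribute_prop_sketch` and the error on an all-zero window are illustrative choices, not part of the proposal.

```r
distribute_prop_sketch <- function(x, signal_name, value, start, end) {
  in_window <- x$index >= start & x$index <= end
  current <- x[[signal_name]][in_window]
  total <- sum(current)
  # Assumption: proportional weights are undefined when the window sums to 0.
  if (total == 0) {
    stop("cannot redistribute proportionally: window total is 0")
  }
  # Each date gets a share of `value` equal to its share of the window total.
  x[[signal_name]][in_window] <- current + value * current / total
  x
}

x <- data.frame(
  index = as.Date("2021-01-01") + 0:3,
  cases = c(10, 30, 0, 5)
)
res <- distribute_prop_sketch(
  x, "cases", value = 8,
  start = as.Date("2021-01-01"), end = as.Date("2021-01-02")
)
```

Here the two dates in the window hold 25% and 75% of the window total, so they receive 2 and 6 of the 8 units respectively.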
distribute_equal function
This function is not exported; it is called by correct_one_outlier.
#' Redistribute a value over a specified time range, distributing the value
#' across those dates approximately evenly.
#'
#' @param x an `epi_signal` to update
#' @param signal_name the name of the signal (column) in `x` that will be
#' updated
#' @param value the amount to redistribute
#' @param start the earliest index value to which we may redistribute some part
#' of the `value`.
#' @param end the last index value to which we may redistribute some part of the
#' `value`.
#'
#' @return an updated `epi_signal`
distribute_equal = function(x, signal_name, value, start, end) {
}
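"Approximately evenly" suggests the signal may hold integer counts that should stay integers. The sketch below takes that as an assumption: it gives every date in the window the integer quotient and hands the remainder, one unit each, to the first rows in the window. The name `distribute_equal_sketch`, the plain data frame input, and the choice of which rows absorb the remainder are all hypothetical.

```r
distribute_equal_sketch <- function(x, signal_name, value, start, end) {
  in_window <- x$index >= start & x$index <= end
  n <- sum(in_window)
  stopifnot(n > 0)
  # Integer split: every date gets `base`; the first `extra` dates get one more.
  base <- value %/% n
  extra <- value %% n
  add <- rep(base, n) + as.numeric(seq_len(n) <= extra)
  x[[signal_name]][in_window] <- x[[signal_name]][in_window] + add
  x
}

x <- data.frame(
  index = as.Date("2021-01-01") + 0:3,
  cases = c(1, 1, 1, 9)
)
res <- distribute_equal_sketch(
  x, "cases", value = 7,
  start = as.Date("2021-01-01"), end = as.Date("2021-01-03")
)
```

In this example 7 units over 3 dates become increments of 3, 2, and 2, so the amounts added still sum exactly to `value`. If fractional values are acceptable, adding `value / n` to every date would be simpler.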
distribute_zeros function
This function is not exported; it is called by correct_one_outlier.
We might need to change this function to work from a specified set of index dates, in case there are multiple outliers whose differences we'd like to redistribute to zero dates.
#' Redistribute a value over a specified time range, only updating dates with
#' reported values of 0. The value is evenly distributed across those dates.
#'
#' @param x an `epi_signal` to update
#' @param signal_name the name of the signal (column) in `x` that will be
#' updated
#' @param value the amount to redistribute
#' @param start the earliest index value to which we may redistribute some part
#' of the `value`.
#' @param end the last index value to which we may redistribute some part of the
#' `value`.
#'
#' @return an updated `epi_signal`
distribute_zeros = function(x, signal_name, value, start, end) {
}
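A minimal sketch of the zero-date strategy, assuming a plain data frame with an `index` column and a single series. Only dates in the window whose current value is exactly 0 are updated, and `value` is split evenly among them. The name `distribute_zeros_sketch` and the error when no zero dates exist are illustrative assumptions.

```r
distribute_zeros_sketch <- function(x, signal_name, value, start, end) {
  in_window <- x$index >= start & x$index <= end
  # Targets: dates inside the window whose reported value is exactly 0.
  target <- in_window & x[[signal_name]] == 0
  n <- sum(target)
  if (n == 0) {
    stop("no dates with a value of 0 in the redistribution window")
  }
  x[[signal_name]][target] <- value / n
  x
}

x <- data.frame(
  index = as.Date("2021-01-01") + 0:3,
  cases = c(0, 5, 0, 20)
)
res <- distribute_zeros_sketch(
  x, "cases", value = 6,
  start = as.Date("2021-01-01"), end = as.Date("2021-01-03")
)
```

The two zero dates each receive 3, and the non-zero date inside the window is left alone. Note that after one call those dates are no longer zero, which is the situation the note above about multiple outliers is concerned with.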