Open
Labels: P3 (very low priority), documentation (improvements or additions to documentation)
Description
It seems that detect_outlr_stl() requires gap-filled input. For example, if we manually remove two rows from the epi_df below and pass the result to detect_outlr_stl(), we get an error saying that the data contains implicit gaps in time. In contrast, detect_outlr_rm() accepts the same data without complaint (likely because it relies on epi_slide). So the documentation for each function should clearly explain how missing rows are handled. We should also probably update the epi_slide vignette to explain how it handles missing rows of data, since these are likely to be common for users of this package.
Example showing the error in detect_outlr_stl() and no problem in detect_outlr_rm():
library(epidatr)
library(epiprocess)
library(dplyr)
library(tidyr)
# Load #s of new confirmed COVID-19 cases, daily, for FL
# over a fairly large time window
x <- covidcast(
  data_source = "jhu-csse",
  signals = "confirmed_incidence_num",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220601),
  geo_values = "fl",
  as_of = 20220606
) %>%
  fetch_tbl() %>%
  select(geo_value, time_value, cases = value) %>%
  as_epi_df()
x <- x[-c(2, 10), ] # Remove some rows from x
y = x$cases
x = x$time_value
# The below should all be the default values from detect_outlr_stl()
n_trend = 21
n_seasonal = 21
n_threshold = 21
seasonal_period = NULL
log_transform = FALSE
detect_negatives = FALSE
detection_multiplier = 2.5
min_radius = 0
replacement_multiplier = 0
# Below is the first part of the detect_outlr_stl() function
# Transform if requested
if (log_transform) {
  # Replace all negative values with 0
  y = pmax(0, y)
  offset = as.integer(any(y == 0))
  y = log(y + offset)
}
# Make a tsibble for fabletools, setup and run STL
z_tsibble = tsibble::tsibble(x = x, y = y, index = x)
stl_formula = y ~ trend(window = n_trend) +
  season(period = seasonal_period, window = n_seasonal)
stl_components = z_tsibble %>%
  fabletools::model(feasts::STL(stl_formula, robust = TRUE)) %>%
  generics::components() %>%
  tibble::as_tibble() %>%
  dplyr::select(trend:remainder) %>%
  dplyr::rename_with(~ "seasonal", tidyselect::starts_with("season")) %>%
  dplyr::rename(resid = remainder)
# Now, the same data passed to detect_outlr_rm() causes no apparent problem
x <- covidcast(
  data_source = "jhu-csse",
  signals = "confirmed_incidence_num",
  time_type = "day",
  geo_type = "state",
  time_values = epirange(20200601, 20220601),
  geo_values = "fl",
  as_of = 20220606
) %>%
  fetch_tbl() %>%
  select(geo_value, time_value, cases = value) %>%
  as_epi_df()
x <- x[-c(2, 10), ] # Remove some rows from x
x <- x %>%
  group_by(geo_value) %>%
  mutate(outlier_info = detect_outlr_rm(
    x = time_value, y = cases,
    # Passed to detect_outlr_rm() rather than to mutate();
    # possibly change this to something larger?
    detection_multiplier = 2.5
  )) %>%
  unnest(outlier_info)
x
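As a possible workaround until the documentation is settled, the implicit gaps can be made explicit before calling the STL-based detector. The sketch below is only an illustration on made-up toy data (the `dates`, `toy`, and `toy_filled` names are hypothetical and not part of epiprocess); it uses `tsibble::has_gaps()` and `tsibble::fill_gaps()` to detect and fill the missing dates. The filled rows carry `NA` values, so some imputation would presumably still be needed before STL can run.

```r
library(tsibble)

# Hypothetical toy data: 12 consecutive days with the 2nd and 10th removed,
# mirroring the x[-c(2, 10), ] step in the example above
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 12)[-c(2, 10)]
toy <- tsibble(x = dates, y = seq_along(dates), index = x)

# Check whether the series has implicit gaps in time
gap_check <- has_gaps(toy)
gap_check$.gaps

# Turn the implicit gaps into explicit rows; y is NA for the inserted dates
toy_filled <- fill_gaps(toy)
nrow(toy_filled)
sum(is.na(toy_filled$y))
```

After this step the tsibble has one row per day, which should avoid the implicit-gaps error itself; how the resulting `NA` values ought to be treated is exactly the question the documentation should answer.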