Skip to content

Commit

Permalink
data resolution tools
Browse files Browse the repository at this point in the history
  • Loading branch information
stevencarlislewalker committed Jun 25, 2024
1 parent c44a832 commit 8bc5f91
Show file tree
Hide file tree
Showing 6 changed files with 95 additions and 8 deletions.
2 changes: 2 additions & 0 deletions R/iidda/NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ export(summarise_locations)
export(summarise_periods)
export(summarise_strings)
export(test_result)
export(time_series_islands)
export(tracking_table_keys)
export(tracking_tables_with_column)
export(two_field_format)
Expand All @@ -123,6 +124,7 @@ export(xlsx_to_csv)
importFrom(dplyr,`%>%`)
importFrom(dplyr,across)
importFrom(dplyr,anti_join)
importFrom(dplyr,arrange)
importFrom(dplyr,between)
importFrom(dplyr,bind_rows)
importFrom(dplyr,distinct)
Expand Down
49 changes: 43 additions & 6 deletions R/iidda/R/data_prep_tools.R
Original file line number Diff line number Diff line change
@@ -1,3 +1,23 @@
## operations
##
## * get longest time scale in each group
## * get shortest time scale in each group
## * flatten disease hierarchy in each group
## * distribute counts evenly ...
## * sum counts ...
## * find nesting time-scale
##
## questions
##
## * can we put the above (or more) operations together so that
## we remove double counting for the full archive?
## * if so, can we make an interface that clarifies for the user
## how to make alternative choices?
##




#' Write Tidy Digitized Data and Metadata
#'
#' @param tidy_data Data frame of prepared data that
Expand Down Expand Up @@ -685,19 +705,36 @@ is_leaf_disease = function(disease, nesting_disease) !disease %in% unique(nestin
#' grouping to determine the context.
#' @param basal_diseases_to_prune Character vector of `disease`s to
#' remove from `data`.
#' @param specials_pattern Optional regular expression to use to match
#' `disease` names in `data` that should be added to the lookup table. This
#' is useful for disease names that are not historical and produced for
#' harmonization purposes. The most common example is `"_unaccounted$"`,
#' which is the default. Setting this argument to `NULL` avoids adding
#' any special disease names to the lookup table.
#'
#' @export
flatten_disease_hierarchy = function(data
, disease_lookup
, grouping_columns = c("period_start_date", "period_end_date", "location")
, basal_diseases_to_prune = character()
, specials_pattern = "_unaccounted$"
) {
disease_lookup =
(disease_lookup
|> select(disease, nesting_disease)
|> distinct())
pruned_lookup =
(disease_lookup

# only need the lookup table to infer the hierarchy
disease_lookup = (disease_lookup
|> select(disease, nesting_disease)
|> distinct()
)

if (!is.null(specials_pattern)) {
specials = (data
|> filter(grepl(specials_pattern, canmod_cdi_api$disease))
|> select(disease, nesting_disease)
|> distinct()
)
disease_lookup = bind_rows(disease_lookup, specials)
}
pruned_lookup = (disease_lookup
|> filter(!disease %in% basal_diseases_to_prune)
|> mutate(nesting_disease = ifelse(
nesting_disease %in% basal_diseases_to_prune
Expand Down
19 changes: 19 additions & 0 deletions R/iidda/R/time_series_tools.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#' Time Series Islands
#'
#' Find 'island rows' in a dataset with ordered rows. Islands have a
#' series variable that is not `NA` surrounded by `NA` values in that
#' same variable.
#'
#' @param data A dataset (must be ordered if `time_variable` is `NULL`).
#' @param series_variable Name of a series variable.
#' @param time_variable Optional variable to use for ordering the dataset
#' before islands are located.
#'
#' @importFrom dplyr arrange mutate
#' @export
time_series_islands = function(data, series_variable, time_variable = NULL) {
if (!is.null(time_variable)) data = arrange(data, get(time_variable))
filter(data,
is.na(lag(get(series_variable))) & is.na(lead(get(series_variable)))
)
}
2 changes: 1 addition & 1 deletion R/iidda/man/filter_out_time_scales.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 9 additions & 1 deletion R/iidda/man/flatten_disease_hierarchy.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 21 additions & 0 deletions R/iidda/man/time_series_islands.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 8bc5f91

Please sign in to comment.