Commit
Merge pull request #1324 from cynkra/f-1272-flatten-recursive
api: deprecate `dm_squash_to_tbl()` and instead provide argument `.recursive` for `dm_flatten_to_tbl()`; rename `start` to `.start` and `join` to `.join` (#1272)
krlmlr committed Jul 21, 2022
2 parents c8de85c + 4a69bf6 commit 6eac8f8
Showing 12 changed files with 141 additions and 136 deletions.
8 changes: 4 additions & 4 deletions R/error-helpers.R
@@ -162,7 +162,7 @@ abort_tables_not_reachable_from_start <- function() {
}

error_txt_tables_not_reachable_from_start <- function() {
glue("All selected tables must be reachable from `start`.")
glue("All selected tables must be reachable from `.start`.")
}


@@ -287,9 +287,9 @@ abort_only_parents <- function() {

error_txt_only_parents <- function() {
paste0(
"When using `dm_join_to_tbl()` or `dm_flatten_to_tbl()` all join partners of table `start` ",
"When using `dm_join_to_tbl()` or `dm_flatten_to_tbl()` all join partners of table `.start` ",
"have to be its direct neighbors. For 'flattening' with `left_join()`, `inner_join()` or `full_join()` ",
"use `dm_squash_to_tbl()` as an alternative."
"use `dm_flatten_to_tbl(.recursive = TRUE)` as an alternative."
)
}

@@ -323,7 +323,7 @@ abort_squash_limited <- function() {
}

error_txt_squash_limited <- function() {
"`dm_squash_to_tbl()` only supports join methods `left_join`, `inner_join`, `full_join`."
"`dm_flatten_to_tbl(.recursive = TRUE)` only supports join methods `left_join`, `inner_join`, `full_join`."
}

abort_apply_filters_first <- function(join_name) {
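The guard behind the `abort_squash_limited()` message can be sketched in base R. This is an illustration only, not the package's code: `check_join_sketch()` is a hypothetical helper, and it captures the join name with `deparse(substitute())` (as the deprecated `cdm_*()` wrappers in this commit do) instead of the `as_label(enexpr())` call used in `dm_flatten_to_tbl()`.

```r
# Hypothetical helper: captures the unevaluated `.join` argument as a string
# and rejects join methods that are not supported for recursive flattening.
check_join_sketch <- function(.join, .recursive = FALSE) {
  # `substitute()` returns the expression passed as `.join` without evaluating it
  join_name <- deparse(substitute(.join))

  supported <- c("left_join", "full_join", "inner_join")
  if (.recursive && !(join_name %in% supported)) {
    stop(
      "`dm_flatten_to_tbl(.recursive = TRUE)` only supports join methods ",
      "`left_join`, `inner_join`, `full_join`.",
      call. = FALSE
    )
  }

  join_name
}

# The restriction applies only when `.recursive = TRUE`
ok_name <- check_join_sketch(left_join, .recursive = TRUE)
flat_name <- check_join_sketch(right_join)
rejected <- try(check_join_sketch(right_join, .recursive = TRUE), silent = TRUE)
```

Because `.join` is never evaluated in this sketch, the join functions do not need to be attached for the check to run.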
84 changes: 30 additions & 54 deletions R/flatten.R
@@ -1,103 +1,79 @@
#' Flatten a part of a `dm` into a wide table
#'
#' `dm_flatten_to_tbl()` and `dm_squash_to_tbl()` gather all information of interest in one place in a wide table.
#' Both functions perform a disambiguation of column names and a cascade of joins.
#' `dm_flatten_to_tbl()` gathers all information of interest in one place in a wide table.
#' It performs a disambiguation of column names and a cascade of joins.
#'
#' @inheritParams dm_join_to_tbl
#' @param start The table from which all outgoing foreign key relations are considered
#' @param .start The table from which all outgoing foreign key relations are considered
#' when establishing a processing order for the joins.
#' An interesting choice could be
#' for example a fact table in a star schema.
#' @param ...
#' `r lifecycle::badge("experimental")`
#'
#' Unquoted names of the tables to be included in addition to the `start` table.
#' Unquoted names of the tables to be included in addition to the `.start` table.
#' The order of the tables here determines the order of the joins.
#' If the argument is empty, all tables that can be reached will be included.
#' Only `dm_squash_to_tbl()` allows using tables that are not direct neighbors of `start`.
#' `tidyselect` is supported, see [dplyr::select()] for details on the semantics.
#' @param .recursive Logical, defaults to `FALSE`. Should not only parent tables be joined to `.start`, but also their ancestors?
#' @param .join The type of join to be performed, see [dplyr::join()].
#' @family flattening functions
#'
#' @details
#' With `...` left empty, this function will join together all the tables of your [`dm`]
#' object that can be reached from the `start` table, in the direction of the foreign key relations
#' object that can be reached from the `.start` table, in the direction of the foreign key relations
#' (pointing from the child tables to the parent tables), using the foreign key relations to
#' determine the argument `by` for the necessary joins.
#' The result is one table with unique column names.
#' Use the `...` argument if you would like to control which tables should be joined to the `start` table.
#' Use the `...` argument if you would like to control which tables should be joined to the `.start` table.
#'
#' How does filtering affect the result?
#'
#' **Case 1**, either no filter conditions are set in the `dm`, or set only in the part that is unconnected to the `start` table:
#' The necessary disambiguations of the column names are performed first.
#' Then all involved foreign tables are joined to the `start` table successively, with the join function given in the `join` argument.
#'
#' **Case 2**, filter conditions are set for at least one table that is connected to `start`:
#' First, disambiguation will be performed if necessary. The `start` table is then calculated using `dm[[start]]`.
#' This implies
#' that the effect of the filters on this table is taken into account.
#' For `right_join`, `full_join` and `nest_join`, an error
#' is thrown if any filters are set because filters will not affect the right hand side tables and the result will therefore be
#' incorrect in general (calculating the effects on all RHS-tables would also be time-consuming, and is not supported;
#' if desired, call `dm_apply_filters()` first to achieve that effect).
#' For all other join types, filtering only the `start` table is enough because the effect is passed on by
#' successive joins.
#'
#' Mind that calling `dm_flatten_to_tbl()` with `join = right_join` and no table order determined in the `...` argument
#' will not lead to a well-defined result if two or more foreign tables are to be joined to `start`.
#' Mind that calling `dm_flatten_to_tbl()` with `.join = right_join` and no table order determined in the `...` argument
#' will not lead to a well-defined result if two or more foreign tables are to be joined to `.start`.
#' The resulting
#' table would depend on the order in which the tables are listed in the `dm`.
#' Therefore, trying this will result in a warning.
#'
#' Since `join = nest_join()` does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints
#' Since `.join = nest_join` does not make sense in this direction (LHS = child table, RHS = parent table: for valid key constraints
#' each nested column entry would be a tibble of one row), an error will be thrown if this method is chosen.
#'
#' The difference between `dm_flatten_to_tbl()` and `dm_squash_to_tbl()` is
#' The difference between `.recursive = FALSE` and `.recursive = TRUE` is
#' the following (see the examples):
#'
#' - `dm_flatten_to_tbl()` allows only one level of hierarchy
#' (i.e., direct neighbors to table `start`), while
#' - `.recursive = FALSE` allows only one level of hierarchy
#' (i.e., direct neighbors to table `.start`), while
#'
#' - `dm_squash_to_tbl()` will go through all levels of hierarchy while joining.
#' - `.recursive = TRUE` will go through all levels of hierarchy while joining.
#'
#' Additionally, these functions differ from `dm_wrap_tbl()`, which always
#' returns a `dm` object.
#'
#' @return A single table that results from consecutively joining all affected tables to the `start` table.
#' @return A single table that results from consecutively joining all affected tables to the `.start` table.
#'
#' @examples
#'
#' dm_financial() %>%
#' dm_select_tbl(-loans) %>%
#' dm_flatten_to_tbl(start = cards)
#' dm_flatten_to_tbl(.start = cards)
#'
#' dm_financial() %>%
#' dm_select_tbl(-loans) %>%
#' dm_squash_to_tbl(start = cards)
#' dm_flatten_to_tbl(.start = cards, .recursive = TRUE)
#'
#' @export
dm_flatten_to_tbl <- function(dm, start, ..., join = left_join) {
check_not_zoomed(dm)
join_name <- as_label(enexpr(join))
start <- dm_tbl_name(dm, {{ start }})
dm_flatten_to_tbl_impl(dm, start, ..., join = join, join_name = join_name, squash = FALSE)
}

#' @rdname dm_flatten_to_tbl
#' @export
dm_squash_to_tbl <- function(dm, start, ..., join = left_join) {
dm_flatten_to_tbl <- function(dm, .start, ..., .recursive = FALSE, .join = left_join) {
check_not_zoomed(dm)
join_name <- as_label(enexpr(join))
if (!(join_name %in% c("left_join", "full_join", "inner_join"))) abort_squash_limited()
start <- dm_tbl_name(dm, {{ start }})
dm_flatten_to_tbl_impl(dm, start, ..., join = join, join_name = join_name, squash = TRUE)
}
join_name <- as_label(enexpr(.join))
if (.recursive && !(join_name %in% c("left_join", "full_join", "inner_join"))) abort_squash_limited()

start <- dm_tbl_name(dm, {{ .start }})

dm_flatten_to_tbl_impl <- function(dm, start, ..., join, join_name, squash, .position = "suffix") {
vars <- setdiff(src_tbls_impl(dm), start)
list_of_pts <- eval_select_table(quo(c(...)), vars)

dm_flatten_to_tbl_impl(dm, start, list_of_pts, join = .join, join_name = join_name, squash = .recursive)
}

dm_flatten_to_tbl_impl <- function(dm, start, list_of_pts, join, join_name, squash, .position = "suffix") {
if (join_name == "nest_join") abort_no_flatten_with_nest_join()

force(join)
@@ -146,10 +122,10 @@ dm_flatten_to_tbl_impl <- function(dm, start, ..., join, join_name, squash, .pos
squash
)

# rename dm and replace table `start` by its filtered, renamed version
# rename dm and replace table `.start` by its filtered, renamed version
prep_dm <- prepare_dm_for_flatten(dm, order_df$name, gotta_rename, position = .position)

# Drop the first table in the list of join partners. (We have at least one table, `start`.)
# Drop the first table in the list of join partners. (We have at least one table, `.start`.)
# (Working with `reduce2()` here and the `.init`-argument is the first table)
# in the case of only one table in the `dm` (table "start"), all code below is a no-op
order_df <- order_df[-1, ]
@@ -194,7 +170,7 @@ dm_join_to_tbl <- function(dm, table_1, table_2, join = left_join) {
start <- rel$child_table
other <- rel$parent_table

dm_flatten_to_tbl_impl(dm, start, !!other, join = join, join_name = join_name, squash = FALSE, .position = "prefix")
dm_flatten_to_tbl_impl(dm, start, other, join = join, join_name = join_name, squash = FALSE, .position = "prefix")
}

parent_child_table <- function(dm, table_1, table_2) {
@@ -244,7 +220,7 @@ check_flatten_to_tbl <- function(join_name,


# If called by `dm_join_to_tbl()` or by `dm_flatten_to_tbl()` with `.recursive = FALSE`, the argument `squash = FALSE`.
# Then only one level of hierarchy is allowed (direct neighbors to table `start`).
# Then only one level of hierarchy is allowed (direct neighbors to table `.start`).
if (!squash && has_grandparent) {
abort_only_parents()
}
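The difference between one-level and recursive flattening described in the roxygen text above can be illustrated with a small base R sketch. It is not the `dm` implementation: the tables, the `fks` data frame and `flatten_sketch()` are hypothetical, and `merge(..., all.x = TRUE)` stands in for `left_join()`.

```r
# Hypothetical toy schema: orders -> customers -> regions
orders <- data.frame(order_id = 1:3, customer_id = c(10L, 11L, 10L))
customers <- data.frame(customer_id = c(10L, 11L), region_id = c(100L, 101L))
regions <- data.frame(region_id = c(100L, 101L), region = c("north", "south"))
tables <- list(orders = orders, customers = customers, regions = regions)

# Foreign keys, pointing from child table to parent table
fks <- data.frame(
  child = c("orders", "customers"),
  parent = c("customers", "regions"),
  by = c("customer_id", "region_id"),
  stringsAsFactors = FALSE
)

flatten_sketch <- function(tables, fks, start, recursive = FALSE) {
  out <- tables[[start]]
  queue <- start
  visited <- start
  while (length(queue) > 0) {
    current <- queue[[1]]
    queue <- queue[-1]
    edges <- fks[fks$child == current, , drop = FALSE]
    for (i in seq_len(nrow(edges))) {
      parent <- edges$parent[[i]]
      if (parent %in% visited) next
      # Left join of the parent table onto the accumulated result
      out <- merge(out, tables[[parent]], by = edges$by[[i]], all.x = TRUE)
      visited <- c(visited, parent)
      # Only the recursive variant follows the parent's own foreign keys
      if (recursive) queue <- c(queue, parent)
    }
  }
  out
}

flat <- flatten_sketch(tables, fks, "orders")                    # direct parents only
deep <- flatten_sketch(tables, fks, "orders", recursive = TRUE)  # all ancestors
```

In this sketch `flat` stops at `customers`, so it carries `region_id` but not `region`, while `deep` also pulls in the columns of `regions`.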
2 changes: 1 addition & 1 deletion R/learn.R
@@ -95,7 +95,7 @@ dm_learn_from_db <- function(dest, dbname = NA, schema = NULL, name_format = "{t
arrange(ordinal_position) %>%
dm_update_zoomed() %>%
dm_select_tbl(-table_constraints) %>%
dm_squash_to_tbl(key_column_usage) %>%
dm_flatten_to_tbl(key_column_usage, .recursive = TRUE) %>%
select(constraint_catalog, constraint_schema, constraint_name, dm_name, column_name) %>%
group_by(constraint_catalog, constraint_schema, constraint_name, dm_name) %>%
summarize(pks = list(tibble(column = list(column_name)))) %>%
36 changes: 33 additions & 3 deletions R/zzx-deprecated.R
@@ -212,7 +212,11 @@ cdm_flatten_to_tbl <- function(dm, start, ..., join = left_join) {
deprecate_soft("0.1.0", "dm::cdm_flatten_to_tbl()", "dm::dm_flatten_to_tbl()")
join_name <- deparse(substitute(join))
start <- dm_tbl_name(dm, {{ start }})
dm_flatten_to_tbl_impl(dm, start, ..., join = join, join_name = join_name, squash = FALSE, .position = "prefix")

vars <- setdiff(src_tbls_impl(dm), start)
list_of_pts <- eval_select_table(quo(c(...)), vars)

dm_flatten_to_tbl_impl(dm, start, list_of_pts, join = join, join_name = join_name, squash = FALSE, .position = "prefix")
}

#' @rdname deprecated
@@ -223,7 +227,11 @@ cdm_squash_to_tbl <- function(dm, start, ..., join = left_join) {
join_name <- deparse(substitute(join))
if (!(join_name %in% c("left_join", "full_join", "inner_join"))) abort_squash_limited()
start <- dm_tbl_name(dm, {{ start }})
dm_flatten_to_tbl_impl(dm, start, ..., join = join, join_name = join_name, squash = TRUE)

vars <- setdiff(src_tbls_impl(dm), start)
list_of_pts <- eval_select_table(quo(c(...)), vars)

dm_flatten_to_tbl_impl(dm, start, list_of_pts, join = join, join_name = join_name, squash = TRUE, .position = "prefix")
}

#' @rdname deprecated
@@ -242,7 +250,7 @@ cdm_join_to_tbl <- function(dm, table_1, table_2, join = left_join) {
start <- rel$child_table
other <- rel$parent_table

dm_flatten_to_tbl_impl(dm, start, !!other, join = join, join_name = join_name, squash = FALSE, .position = "prefix")
dm_flatten_to_tbl_impl(dm, start, other, join = join, join_name = join_name, squash = FALSE, .position = "prefix")
}

#' @rdname deprecated
@@ -708,6 +716,28 @@ dm_bind <- function(..., repair = "check_unique", quiet = FALSE) {
new_dm3(new_def)
}

#' @description
#' `dm_squash_to_tbl()` is deprecated as of dm 1.0.0, because the same functionality
#' is offered by [dm_flatten_to_tbl()] with `.recursive = TRUE`.
#'
#' @rdname deprecated
#' @keywords internal
#'
#' @export
dm_squash_to_tbl <- function(dm, start, ..., join = left_join) {
deprecate_soft("1.0.0", "dm_squash_to_tbl()", details = "Please use `.recursive = TRUE` in `dm_flatten_to_tbl()` instead.")

check_not_zoomed(dm)
join_name <- as_label(enexpr(join))
if (!(join_name %in% c("left_join", "full_join", "inner_join"))) abort_squash_limited()
start <- dm_tbl_name(dm, {{ start }})

vars <- setdiff(src_tbls_impl(dm), start)
list_of_pts <- eval_select_table(quo(c(...)), vars)

dm_flatten_to_tbl_impl(dm, start, list_of_pts, join = join, join_name = join_name, squash = TRUE, .position = "prefix")
}

#' @description
#' `rows_truncate()` is deprecated as of dm 1.0.0, because it's a DDL operation
#' and requires different permissions than the `dplyr::rows_*()` functions.
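The general shape of such a deprecation shim can be sketched in base R. This is a simplified illustration under stated assumptions: `flatten_new_sketch()` and `squash_old_sketch()` are hypothetical stand-ins, base R's `.Deprecated()` replaces the `deprecate_soft()` call used in the commit, and the old function forwards to the new one, whereas the commit's `dm_squash_to_tbl()` calls `dm_flatten_to_tbl_impl()` directly.

```r
# Hypothetical stand-in for the new interface with dot-prefixed arguments
flatten_new_sketch <- function(data, .start, ..., .recursive = FALSE) {
  list(start = .start, recursive = .recursive)
}

# Hypothetical stand-in for the old interface: warns, then maps the old
# argument name `start` onto `.start` and switches on `.recursive`
squash_old_sketch <- function(data, start, ...) {
  .Deprecated(msg = "Please use `.recursive = TRUE` in the new function instead.")
  flatten_new_sketch(data, .start = start, ..., .recursive = TRUE)
}

# Old call sites keep working but emit a deprecation warning
res <- suppressWarnings(squash_old_sketch(NULL, start = "cards"))
warned <- tryCatch(
  {
    squash_old_sketch(NULL, start = "cards")
    FALSE
  },
  warning = function(w) TRUE
)
```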
6 changes: 6 additions & 0 deletions man/deprecated.Rd