tidyr 1.3.0

@hadley released this 24 Jan 21:16

New features

  • New family of consistent string separating functions:
    separate_wider_delim(), separate_wider_position(),
    separate_wider_regex(), separate_longer_delim(), and
    separate_longer_position(). These functions are thorough refreshes of
    separate() and extract(), featuring improved performance, greater
    consistency, a polished API, and a new approach for handling problems. They
    use stringr and supersede extract(), separate(), and separate_rows()
    (#1304).

  • nest() gains a .by argument which allows you to specify the columns to
    nest by (rather than the columns to nest, i.e. through ...). Additionally,
    the .key argument is no longer deprecated, and is used whenever ... isn't
    specified (#1458).

  • unnest_longer() gains a keep_empty argument like unnest() (#1339).

  • pivot_longer() gains a cols_vary argument for controlling the ordering of
    the output rows relative to their original row number (#1312).

  • New datasets who2, household, cms_patient_experience, and
    cms_patient_care to demonstrate various tidying challenges (#1333).
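
As a rough illustration of the new separating functions, nest(.by = ), and cols_vary, here is a minimal sketch. The data frames and column names (df, sales, scores, and so on) are made up for this example and do not come from the release notes.

```r
library(tidyr)
library(tibble)

# Hypothetical data for illustration only.
df <- tibble(id = 1:3, code = c("a-1", "b-2", "c-3"))

# Split `code` into two new columns at each "-".
wide <- separate_wider_delim(df, code, delim = "-", names = c("letter", "number"))

# Split `code` into extra rows instead of extra columns.
long <- separate_longer_delim(df, code, delim = "-")

# nest(.by = ) names the columns to nest by; the remaining columns are
# nested, and .key names the resulting list-column.
sales <- tibble(region = c("east", "east", "west"), amount = c(10, 20, 30))
nested <- nest(sales, .by = region, .key = "orders")

# cols_vary controls the row order of the pivoted output.
scores <- tibble(id = 1:2, x = c("a", "b"), y = c("c", "d"))
by_row <- pivot_longer(scores, c(x, y), cols_vary = "fastest") # the default
by_col <- pivot_longer(scores, c(x, y), cols_vary = "slowest")
```

With "fastest", values from the same input row stay next to each other; with "slowest", values from the same input column stay next to each other.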

Breaking changes

  • The ... argument of both pivot_longer() and pivot_wider() has been
    moved to the front of the function signature, after the required arguments
    but before the optional ones. Additionally, pivot_longer_spec(),
    pivot_wider_spec(), build_longer_spec(), and build_wider_spec() have
    all gained ... arguments in a similar location. This change allows us to
    more easily add new features to the pivoting functions without breaking
    existing CRAN packages and user scripts.

    pivot_wider() provides temporary backwards-compatible support for the case
    of a single unnamed argument that was previously matched positionally to
    id_cols. This one special case still works, but it throws a warning
    encouraging you to name the id_cols argument explicitly.

    To read more about this pattern, see "Data, dots, details" in the
    tidyverse design guide (#1350).
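
In practice, the change above means optional pivoting arguments are best passed by name. A minimal sketch, using a made-up data frame (df, key, and val are hypothetical names):

```r
library(tidyr)
library(tibble)

# Hypothetical data for illustration only.
df <- tibble(
  id  = c(1, 1, 2, 2),
  key = c("x", "y", "x", "y"),
  val = 1:4
)

# Name every optional argument, including id_cols, rather than
# relying on its position after the data argument.
wide <- pivot_wider(df, id_cols = id, names_from = key, values_from = val)
```

Naming id_cols explicitly avoids the warning described above.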

Lifecycle changes

  • All functions deprecated in tidyr 1.0 and 1.2 (the old lazyeval functions
    ending in _ and various arguments to unnest()) now warn on every use.
    They will be made defunct in 2024 (#1406).

Rectangling

  • unnest_longer() now consistently drops rows with either NULL or empty
    vectors (like integer()) by default. Set the new keep_empty argument to
    TRUE to retain them. Previously, keep_empty = TRUE was implicitly being
    used for NULL, while keep_empty = FALSE was being used for empty vectors,
    which was inconsistent with all other tidyr verbs with this argument (#1363).

  • unnest_longer() now uses "" in the index column for fully unnamed
    vectors. It also now consistently uses NA in the index column for empty
    vectors that are "kept" by keep_empty = TRUE (#1442).

  • unnest_wider() now errors if any values being unnested are unnamed and
    names_sep is not provided (#1367).

  • unnest_wider() now generates automatic names for partially unnamed
    vectors. Previously it only generated them for fully unnamed vectors,
    resulting in a strange mix of automatic names and name-repaired names (#1367).
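
The rectangling changes above can be sketched as follows. This is only an illustration with invented data (df, df2, and their columns are hypothetical).

```r
library(tidyr)
library(tibble)

# Hypothetical data: one ordinary element, one NULL, one empty vector.
df <- tibble(id = 1:3, x = list(1:2, NULL, integer()))

# By default, rows whose element is NULL or empty are dropped.
dropped <- unnest_longer(df, x)

# keep_empty = TRUE retains those rows, filling them with NA.
kept <- unnest_longer(df, x, keep_empty = TRUE)

# Unnamed values need names_sep so that column names can be generated.
df2 <- tibble(id = 1:2, y = list(c(1, 2), c(3, 4)))
wide <- unnest_wider(df2, y, names_sep = "_")
```

Here dropped keeps only the two rows for id 1, kept has one additional NA row each for id 2 and id 3, and wide has the columns id, y_1, and y_2.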

Bug fixes and minor improvements

General

  • Most tidyr functions now consistently disallow renaming during tidy-selection.
    Renaming was never meaningful in these functions, and previously either had no
    effect or caused problems (#1449, #1104).

  • tidyr errors (including input validation) have been thoroughly reviewed
    and should generally be more likely to point you in the right direction
    (#1313, #1400).

  • uncount() is now generic so implementations can be provided for objects
    other than data frames (@mgirlich, #1358).

  • uncount() gains a ... argument. It comes between the required and the
    optional arguments (@mgirlich, #1358).

  • nest(), complete(), expand(), and fill() now document their support
    for grouped data frames created by dplyr::group_by() (#952).

  • All built-in datasets are now standard tibbles (#1459).

  • R >= 3.4.0 is now required, in line with the tidyverse standard of supporting
    the previous 5 minor releases of R.

  • rlang >= 1.0.4 and vctrs >= 0.5.2 are now required (#1344, #1470).

  • Removed dependency on ellipsis in favor of equivalent functions in rlang
    (#1314).
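
Two of the points above, the placement of ... in uncount() and support for grouped data frames, can be sketched as follows. The data is invented for the example, and dplyr is assumed to be installed (tidyr imports it).

```r
library(tidyr)
library(tibble)

# Hypothetical data for illustration only.
counts <- tibble(item = c("a", "b"), n = c(1, 2))

# Required arguments come first; optional ones such as .id are
# passed by name because they sit after `...`.
expanded <- uncount(counts, n, .id = "copy")

# fill() respects groups created by dplyr::group_by(): values are
# carried down within each group, never across groups.
readings <- tibble(g = c("a", "a", "b", "b"), x = c(1, NA, NA, 4))
filled <- fill(dplyr::group_by(readings, g), x)
```

In filled, the missing value in group "a" is filled from the row above it, while the leading missing value in group "b" stays NA because no earlier value exists within that group.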

Nesting, packing, and chopping

  • unnest(), unchop(), unnest_longer(), and unnest_wider() better handle
    lists with additional classes (#1327).

  • pack(), unpack(), chop(), and unchop() all gain an error_call
    argument, which in turn improves some of the error calls shown in nest()
    and various unnest()-adjacent functions (#1446).

  • chop(), unpack(), and unchop() all gain ..., which must be empty
    (#1447).

  • unpack() does a better job of reporting column name duplication issues and
    gives better advice about how to resolve them using names_sep. This also
    improves errors from functions that use unpack(), like unnest() and
    unnest_wider() (#1425, #1367).
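
As an illustration of resolving duplicated column names with names_sep, here is a minimal sketch. The packed columns a and b are hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical data: two packed data frame columns that both contain `x`.
df <- tibble(a = tibble(x = 1:2), b = tibble(x = 3:4))

# Unpacking both at once would produce two columns named `x`;
# names_sep prefixes each inner name with its outer column name.
flat <- unpack(df, c(a, b), names_sep = "_")
```

The result has the columns a_x and b_x instead of a name clash.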

Pivoting

  • pivot_longer() no longer supports interpreting values_ptypes = list()
    and names_ptypes = list() as NULL. An empty list() is now interpreted as
    a <list> prototype to apply to all columns, which is consistent with how any
    other 0-length value is interpreted (#1296).

  • pivot_longer(values_drop_na = TRUE) is faster when there aren't any missing
    values to drop (#1392, @mgirlich).

  • pivot_longer() is now more memory efficient because it uses
    vctrs::vec_interleave() (#1310, @mgirlich).

  • pivot_longer() now throws a slightly better error message when
    values_ptypes or names_ptypes is provided and the coercion can't be made
    (#1364).

  • pivot_wider() now throws a better error message when a column selected by
    names_from or values_from is also selected by id_cols (#1318).

  • pivot_wider() is now faster when names_sep is provided (@mgirlich, #1426).

  • pivot_longer_spec(), pivot_wider_spec(), build_longer_spec(), and
    build_wider_spec() all gain an error_call argument, resulting in better
    error reporting in pivot_longer() and pivot_wider() (#1408).
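
For context on the values_drop_na item above, here is a minimal sketch of what the argument does. The data is made up for the example.

```r
library(tidyr)
library(tibble)

# Hypothetical data with one missing value.
df <- tibble(id = 1:2, x = c(1, NA), y = c(3, 4))

# values_drop_na = TRUE removes output rows whose value is NA.
long <- pivot_longer(df, c(x, y), values_drop_na = TRUE)
```

Only three of the four possible rows remain, because the row for id 2 and column x is dropped.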

Missing values

  • fill() now works correctly when there is a column named .direction in
    data (#1319, @tjmahr).

  • replace_na() is faster when there aren't any missing values to replace
    (#1392, @mgirlich).

  • The documentation of the replace argument of replace_na() now mentions
    that replace is always cast to the type of data (#1317).
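
The casting behavior of replace_na() can be sketched as follows; the vectors and data frame are invented for the example.

```r
library(tidyr)
library(tibble)

# The replacement is cast to the type of the data: here a double 0
# is cast to integer, so the result stays an integer vector.
filled_int <- replace_na(c(1L, NA, 3L), 0)

# For data frames, supply a named list with one replacement per column.
df <- tibble(x = c(1, NA), y = c("a", NA))
filled_df <- replace_na(df, list(x = 0, y = "unknown"))
```

Because the cast targets the type of the data, a replacement that cannot be converted to that type (for example, a string for a numeric column) is expected to raise an error rather than change the column type.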