janitor 2.2.0

@sfirke released this 03 Feb 16:19

Breaking changes

These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.

  • A new ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. If code previously used remove_row as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than TRUE or FALSE to remove_row, unexpected results may occur.

  • Microsoft Excel incorrectly treats 1900 as a leap year and therefore has a nonexistent leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() previously did not account for this error; it now does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 are now one day later than in previous versions (i.e., what was 1 Feb 1900 is now 2 Feb 1900), and the date that Excel presents as 29 Feb 1900 now becomes as.POSIXct(NA). (#423, thanks @billdenney for fixing)

  • The time zone is now always set for excel_numeric_to_date() and convert_date(). The default time zone is now Sys.timezone(); previously it was an empty string (""). (#422, thanks @billdenney for fixing)

  • get_dupes() results are now sorted first in descending order of dupe_count, then in ascending order of the variables used to identify duplicates. (#493)

  • There are several minor breaking changes resulting from enhancements to adorn_ns():

    • The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
    • adorn_ns() now displays numbers with more than three digits using big.mark = "," by default, as part of the default value of the new format_func argument. E.g., 1234 is now 1,234.
    • adorn_ns() no longer prints leading whitespace when position = "front"; this does not visibly change the printed result and is unlikely to affect any code.

  • When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" (or another user-specified totals name) added to its factor levels (#494).
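The leap-day correction in excel_numeric_to_date() comes down to using a different origin on each side of the phantom 29 February 1900. The sketch below shows that arithmetic in Python rather than R, purely for illustration. The function name excel_serial_to_date is hypothetical, and the sketch only handles whole-day serials in the 1900 date system: it ignores times, the 1904 system and time zones, and it returns None where janitor returns NA and warns.

```python
from datetime import date, timedelta
from typing import Optional


def excel_serial_to_date(serial: int) -> Optional[date]:
    """Convert an Excel 1900-system day serial to a date (illustrative sketch)."""
    if serial < 1:
        # Excel's 1900 date system starts at serial 1 (1 Jan 1900).
        return None
    if serial == 60:
        # Serial 60 is Excel's nonexistent 29 Feb 1900.
        return None
    # Before the phantom day, count from 31 Dec 1899; after it, count from
    # 30 Dec 1899 so the extra day Excel inserted is absorbed.
    origin = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return origin + timedelta(days=serial)
```

For example, serial 33 maps to 2 Feb 1900 (previously reported as 1 Feb 1900), and serial 61 maps to 1 Mar 1900, where both origins agree with Excel's own display.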
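The new get_dupes() ordering can be illustrated with a small Python sketch over a list of dicts. The helper name get_dupes_sorted and the input shape are hypothetical. The sketch assumes the key columns are sorted in ascending order after dupe_count, and it relies on Python's stable sort to keep remaining ties in input order, which is not a claim about janitor.

```python
from collections import Counter


def get_dupes_sorted(rows, keys):
    """Return duplicated rows with a dupe_count, most-duplicated first (sketch)."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    # Count how often each combination of key values occurs.
    counts = Counter(key_of(r) for r in rows)
    dupes = [{**r, "dupe_count": counts[key_of(r)]} for r in rows if counts[key_of(r)] > 1]
    # Descending dupe_count first, then ascending by each key column.
    return sorted(dupes, key=lambda r: (-r["dupe_count"], *key_of(r)))
```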

New features

  • row_to_names() has a new helper function, find_header(), that finds the row containing the names. It can be used by passing row_number = "find_header". See the documentation of row_to_names() and find_header() for examples. (fix #429)

  • remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of their values are missing. (fix #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)

  • A new function sas_numeric_to_date() has been added to convert SAS dates, times, and datetimes to R objects (fix #475, thanks to @billdenney for suggesting and implementing)

  • A new function single_value() has been added to ensure that a vector contains only one distinct value, apart from missing values (fix #428)

  • A new function get_one_to_one() has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix #291, @billdenney)

  • adorn_ns() has a new format_func argument so that users can format the Ns to their liking, e.g., by changing the big.mark character. (#444)

  • clean_names() can now be called on a remote database table in a dbplyr pipeline (#467)
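To make the remove_empty() cutoff rule concrete, here is a Python sketch of the column case, not janitor's R code. The names is_missing and remove_empty_cols are hypothetical, None and NaN stand in for R's NA, and the sketch assumes cutoff lies in (0, 1] with a default of 1, so that only completely empty columns are dropped unless a smaller cutoff is given.

```python
import math


def is_missing(value):
    # Treat None and float NaN as missing (stand-ins for R's NA).
    return value is None or (isinstance(value, float) and math.isnan(value))


def remove_empty_cols(table, cutoff=1.0):
    """Drop columns whose fraction of missing values is at least `cutoff`.

    table: dict mapping column name -> non-empty list of values.
    """
    if not 0 < cutoff <= 1:
        raise ValueError("cutoff must be in (0, 1]")
    kept = {}
    for name, values in table.items():
        frac_missing = sum(is_missing(v) for v in values) / len(values)
        if frac_missing < cutoff:
            kept[name] = values
    return kept
```

With cutoff = 0.75, a column that is three-quarters missing is removed, while the default cutoff of 1 keeps it.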
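SAS stores dates as days since 1 January 1960, datetimes as seconds since midnight at the start of that day, and times as seconds since midnight. A Python sketch of those conversions follows. The helper names are hypothetical, the results are naive (time-zone-free) values, sas_time assumes 0 <= seconds < 86400, and sas_numeric_to_date() itself may treat time zones and missing values differently.

```python
from datetime import date, datetime, time, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # SAS dates and datetimes count from 1 Jan 1960


def sas_date(days: int) -> date:
    """SAS date value (days since 1960-01-01) -> date."""
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_datetime(seconds: float) -> datetime:
    """SAS datetime value (seconds since 1960-01-01 00:00:00) -> naive datetime."""
    return SAS_EPOCH + timedelta(seconds=seconds)


def sas_time(seconds: float) -> time:
    """SAS time value (seconds since midnight) -> time of day."""
    return (datetime.min + timedelta(seconds=seconds)).time()
```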
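A Python sketch of the check that single_value() performs, under stated assumptions: None stands in for a missing value, values must be hashable, and an all-missing input returns None. The Python function borrows the janitor name for readability only. It is not the R implementation and does not reproduce its arguments.

```python
def single_value(values):
    """Return the one distinct non-missing value, ignoring missing (None) entries."""
    distinct = {v for v in values if v is not None}
    if len(distinct) > 1:
        # More than one distinct non-missing value violates the check.
        raise ValueError(f"More than one value found: {sorted(distinct, key=repr)}")
    # Exactly one distinct value, or None if everything was missing.
    return next(iter(distinct), None)
```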
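Two columns map 1:1 when each distinct value in one pairs with exactly one distinct value in the other. This can be tested by counting distinct pairs, which is what get_one_to_one() looks for. The Python sketch below applies that test and groups columns accordingly. The names maps_one_to_one and one_to_one_groups are hypothetical, missing values get no special treatment, and columns without a partner are simply omitted here.

```python
def maps_one_to_one(xs, ys):
    """True if pairing xs[i] with ys[i] is a bijection between distinct values."""
    pairs = set(zip(xs, ys))
    # Each x has one y and each y has one x exactly when these counts agree.
    return len(pairs) == len(set(xs)) == len(set(ys))


def one_to_one_groups(table):
    """Group column names (dict keys) whose values map 1:1 onto each other."""
    names = list(table)
    groups, seen = [], set()
    for i, a in enumerate(names):
        if a in seen:
            continue
        group = [a] + [
            b for b in names[i + 1:]
            if b not in seen and maps_one_to_one(table[a], table[b])
        ]
        if len(group) > 1:
            groups.append(group)
            seen.update(group)
    return groups
```

Because composing two 1:1 mappings gives another 1:1 mapping, comparing each column with the first member of a group is enough to build the groups.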

Minor features

  • make_clean_names() (and therefore clean_names()) issues a warning if the names contain the Greek letter mu or the micro sign and that character is not, or may not be, handled by the replace argument. (#448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "[mu]g" to "mg", whereas as a unit it is more typically converted to "ug". A new, unexported constant (janitor:::mu_to_u) was added to help with mu-to-"u" replacements.

  • excel_numeric_to_date() now warns when times are converted to NA because the hours do not exist due to daylight saving time (fix #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values of 1 or greater (#423).

  • If a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()) and adorn_totals() and/or adorn_percentages() is then called on it, followed by adorn_ns(), the Ns are now sorted correctly to match the rows of the tabyl they are adorned onto. (fix #407)

  • clean_names() now supports all object types that have either names or dimnames (#481, @DanChaltiel).

  • adorn_pct_formatting() uses the locale-dependent value of decimal.mark as the decimal separator; e.g., in locales where getOption("OutDec") is ",", it prints percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). (#451)

  • adorn_totals(where = "row") now preserves the factor class and levels of the first column of the input data.frame (#494).

  • make_clean_names() now allows duplicate names to be returned when the new allow_dupes argument is set to TRUE (#495, @JasonAizkalns).

  • Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., classes = "the_class_to_suppress"). To find the class of a warning, you typically must look at the code where the warning is raised. (#452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
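By default, make_clean_names() makes repeated names unique by appending a numeric suffix (e.g., a, a_2, a_3), and allow_dupes = TRUE skips that step. Here is a Python sketch of the idea, assuming that suffix scheme. dedupe_names is a hypothetical name, and the sketch does not guard against a suffixed name colliding with a name already present in the input.

```python
def dedupe_names(names, allow_dupes=False):
    """Suffix repeated names with _2, _3, ... unless allow_dupes is True."""
    if allow_dupes:
        # Return the names unchanged, duplicates included.
        return list(names)
    seen = {}
    out = []
    for name in names:
        seen[name] = seen.get(name, 0) + 1
        # First occurrence keeps its name; later ones get a numeric suffix.
        out.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return out
```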

Bug fixes

  • adorn_percentages() was refactored for compatibility with dplyr package versions >= 1.1.0 (#490)

  • When a numeric variable is supplied as the 2nd variable (columns) or 3rd variable (list) of a tabyl, the resulting columns or list elements are now sorted in numeric order rather than alphabetically. (#438, thanks @daaronr for reporting and @mattroumaya for fixing)

  • tabyl() now succeeds when the second variable is named "n" (#445).

  • adorn_ns() now works on a single-column data.frame input with custom Ns supplied, provided the variable to adorn is specified with ... (#456).

  • adorn_totals() on a one-way tabyl now preserves the tabyl_type attribute, so a subsequent call to adorn_pct_formatting() works correctly on one-way tabyls (#523).
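The tabyl() sorting fix for numeric variables is the difference between ordering values as text and ordering them as numbers. A short Python illustration of that difference, unrelated to janitor's internals:

```python
values = [10, 2, 9, 1]

# Sorting the values as text (the old alphabetic behavior) puts "10" before "2".
alphabetic = sorted(str(v) for v in values)  # ['1', '10', '2', '9']

# Sorting them as numbers gives the numeric order now used for numeric variables.
numeric = [str(v) for v in sorted(values)]  # ['1', '2', '9', '10']
```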