From 9b3b251d973a84c3304e0011ea1727faa7eb9f40 Mon Sep 17 00:00:00 2001
From: Jan Gorecki
Date: Fri, 8 Dec 2023 20:44:39 +0100
Subject: [PATCH] Pull 1.14.10 into master (#5814)

* updated NEWS and urls

fixed url issues and added final details for patch release in news

* add method for IDate

Added `S3method(as.IDate, IDate)`. This is related to #4777 as discussed in NEWS.md.

* Add `setDTthreads(1)` to vignettes

To reduce runtime on building vignettes.

* reset setDTthreads at end of vignettes

* reset threads at end of vignettes

---------

Co-authored-by: Tyson Barrett
---
 NAMESPACE                                     |  1 +
 NEWS.md                                       | 19 ++++++++++++++++---
 README.md                                     |  2 +-
 vignettes/datatable-faq.Rmd                   |  4 ++++
 vignettes/datatable-intro.Rmd                 |  4 ++++
 vignettes/datatable-keys-fast-subset.Rmd      |  6 ++++++
 vignettes/datatable-reference-semantics.Rmd   |  5 +++++
 vignettes/datatable-reshape.Rmd               |  5 +++++
 vignettes/datatable-sd-usage.Rmd              |  5 +++++
 ...le-secondary-indices-and-auto-indexing.Rmd |  6 ++++++
 10 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/NAMESPACE b/NAMESPACE
index ac5415082..75b490068 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -158,6 +158,7 @@ S3method(as.IDate, Date)
 S3method(as.IDate, POSIXct)
 S3method(as.IDate, default)
 S3method(as.IDate, numeric)
+S3method(as.IDate, IDate)
 S3method(as.ITime, character)
 S3method(as.ITime, default)
 S3method(as.ITime, POSIXct)
diff --git a/NEWS.md b/NEWS.md
index 1cfd582f7..513ac9bc5 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,6 +1,6 @@
 **If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**
 
-# data.table [v1.14.99](https://github.com/Rdatatable/data.table/milestone/20) (in development)
+# data.table [v1.14.99](https://github.com/Rdatatable/data.table/milestone/29) (in development)
 
 ## BREAKING CHANGE
 
@@ -610,6 +610,19 @@
 
 15. 
Thanks to @ssh352, Václav Tlapák, Cole Miller, András Svraka and Toby Dylan Hocking for reporting and bisecting a significant performance regression in dev. This was fixed before release thanks to a PR by Jan Gorecki, [#5463](https://github.com/Rdatatable/data.table/pull/5463).
 
+# data.table [v1.14.10](https://github.com/Rdatatable/data.table/milestone/20?closed=1) (8 Dec 2023)
+
+## NOTES
+
+1. Maintainer of the package for CRAN releases is from now on Tyson Barrett (@tysonstanley), [#5710](https://github.com/Rdatatable/data.table/issues/5710).
+
+2. Updated internal code for breaking change of `is.atomic(NULL)` in R-devel, [#5691](https://github.com/Rdatatable/data.table/pull/5691). Thanks to Martin Maechler for the patch.
+
+3. Fix multiple test concerning coercion to missing complex numbers, [#5695](https://github.com/Rdatatable/data.table/issues/5695) and [#5748](https://github.com/Rdatatable/data.table/issues/5748). Thanks to @MichaelChirico and @ben-schwen for the patches.
+
+4. Fix multiple format warnings (e.g., -Wformat) [#5712](https://github.com/Rdatatable/data.table/pull/5712), [#5781](https://github.com/Rdatatable/data.table/pull/5781), [#5880](https://github.com/Rdatatable/data.table/pull/5800), [#5786](https://github.com/Rdatatable/data.table/pull/5786). Thanks to @MichaelChirico and @jangorecki for the patches.
+
+
 # data.table [v1.14.8](https://github.com/Rdatatable/data.table/milestone/28?closed=1) (17 Feb 2023)
 
 ## NOTES
@@ -736,7 +749,7 @@
 
 ## NOTES
 
-1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). 
The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://h2o.ai/blog/behind-the-scenes-of-cran/).
+1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://h2o.ai/blog/2016/behind-the-scenes-of-cran/).
 
 2. `as.Date.IDate` is no longer exported as a function to solve a new error in R-devel `S3 method lookup found 'as.Date.IDate' on search path`, [#4777](https://github.com/Rdatatable/data.table/issues/4777). The S3 method is still exported; i.e. `as.Date(x)` will still invoke the `as.Date.IDate` method when `x` is class `IDate`. 
The function had been exported, in addition to exporting the method, to solve a compatibility issue with `zoo` (and `xts` which uses `zoo`) because `zoo` exports `as.Date` which masks `base::as.Date`. Happily, since zoo 1.8-1 (Jan 2018) made a change to its `as.IDate`, the workaround is no longer needed.
 
@@ -1008,7 +1021,7 @@ has a better chance of working on Mac.
  * `colClasses` now supports `'complex'`, `'raw'`, `'Date'`, `'POSIXct'`, and user-defined classes (so long as an `as.` method exists), [#491](https://github.com/Rdatatable/data.table/issues/491) [#1634](https://github.com/Rdatatable/data.table/issues/1634) [#2610](https://github.com/Rdatatable/data.table/issues/2610). Any error during coercion results in a warning and the column is left as the default type (probably `"character"`). Thanks to @hughparsonage for the PR.
  * `stringsAsFactors=0.10` will factorize any character column containing under `0.10*nrow` unique strings, [#2025](https://github.com/Rdatatable/data.table/issues/2025). Thanks to @hughparsonage for the PR.
  * `colClasses=list(numeric=20:30, numeric="ID")` will apply the `numeric` type to column numbers `20:30` as before and now also column name `"ID"`; i.e. all duplicate class names are now respected rather than only the first. This need may arise when specifying some columns by name and others by number, as in this example. Thanks to @hughparsonage for the PR.
- * gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (https://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. Thanks to @leeper at [`rio`](https://github.com/leeper/rio) for the inspiration and @MichaelChirico for implementing. 
+ * gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (https://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. Thanks to @leeper at [`rio`](https://github.com/gesistsa/rio) for the inspiration and @MichaelChirico for implementing.
  * `select` can now be used to specify types for just the columns selected, [#1426](https://github.com/Rdatatable/data.table/issues/1426). Just like `colClasses` it can be a named vector of `colname=type` pairs, or a named `list` of `type=col(s)` pairs. For example:
 
  ```R
diff --git a/README.md b/README.md
index 8455602f1..562799db4 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@
 [![CRAN status](https://badges.cranchecks.info/flavor/release/data.table.svg)](https://cran.r-project.org/web/checks/check_results_data.table.html)
 [![R-CMD-check](https://github.com/Rdatatable/data.table/workflows/R-CMD-check/badge.svg)](https://github.com/Rdatatable/data.table/actions)
 [![AppVeyor build status](https://ci.appveyor.com/api/projects/status/kayjdh5qtgymhoxr/branch/master?svg=true)](https://ci.appveyor.com/project/Rdatatable/data-table)
-[![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://codecov.io/github/Rdatatable/data.table?branch=master)
+[![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://app.codecov.io/github/Rdatatable/data.table?branch=master)
 [![GitLab CI build status](https://gitlab.com/Rdatatable/data.table/badges/master/pipeline.svg)](https://gitlab.com/Rdatatable/data.table/-/pipelines)
 [![downloads](https://cranlogs.r-pkg.org/badges/data.table)](https://www.rdocumentation.org/trends)
 [![CRAN usage](https://jangorecki.gitlab.io/rdeps/data.table/CRAN_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps)
diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd
index f1deaba78..a2de14a2f 100644
--- a/vignettes/datatable-faq.Rmd
+++ b/vignettes/datatable-faq.Rmd
@@ -29,6 +29,7 @@ knitr::opts_chunk$set(
 tidy = FALSE,
 cache = FALSE,
 collapse = TRUE)
+.old.th = setDTthreads(1)
 ```
 
 The first section, Beginner FAQs, is intended to be read in order, from start to finish. It's just written in a FAQ style to be digested more easily. It isn't really the most frequently asked questions. A better measure for that is looking on Stack Overflow.
@@ -615,3 +616,6 @@ Sure. You're more likely to get a faster answer from the Issues page or Stack Ov
 
 Please see [this answer](https://stackoverflow.com/a/10529888/403310).
 
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
\ No newline at end of file
diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd
index 04fd79e50..3624a7c5b 100644
--- a/vignettes/datatable-intro.Rmd
+++ b/vignettes/datatable-intro.Rmd
@@ -18,6 +18,7 @@ knitr::opts_chunk$set(
 cache = FALSE,
 collapse = TRUE
 )
+.old.th = setDTthreads(1)
 ```
 
 This vignette introduces the `data.table` syntax, its general form, how to *subset* rows, *select and compute* on columns, and perform aggregations *by group*. Familiarity with `data.frame` data structure from base R is useful, but not essential to follow this vignette. 
@@ -651,3 +652,6 @@ We will see how to *add/update/delete* columns *by reference* and how to combine
 
 ***
 
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
\ No newline at end of file
diff --git a/vignettes/datatable-keys-fast-subset.Rmd b/vignettes/datatable-keys-fast-subset.Rmd
index 3e9a4f23c..e73b71b92 100644
--- a/vignettes/datatable-keys-fast-subset.Rmd
+++ b/vignettes/datatable-keys-fast-subset.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
 tidy = FALSE,
 cache = FALSE,
 collapse = TRUE)
+.old.th = setDTthreads(1)
 ```
 
 This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.
@@ -494,3 +495,8 @@ In this vignette, we have learnt another method to subset rows in `i` by keying
 * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.
 
 Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not be always desirable to set key and physically reorder the *data.table*. In the next vignette, we will address this using a *new* feature -- *secondary indexes*. 
+
+
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
\ No newline at end of file
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index c96ed090f..7a9990ba4 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
 tidy = FALSE,
 cache = FALSE,
 collapse = TRUE)
+.old.th = setDTthreads(1)
 ```
 
 This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the *"Introduction to data.table"* vignette first.
@@ -348,6 +349,10 @@ However we could improve this functionality further by *shallow* copying instead
 
 * We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.
 
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
+
 #
 
 So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to perform *blazing fast subsets* by *keying data.tables*.
diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd
index 0b5d7a57d..d282bc7de 100644
--- a/vignettes/datatable-reshape.Rmd
+++ b/vignettes/datatable-reshape.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
 tidy = FALSE,
 cache = FALSE,
 collapse = TRUE)
+.old.th = setDTthreads(1)
 ```
 
 This vignette discusses the default usage of reshaping functions `melt` (wide to long) and `dcast` (long to wide) for *data.tables* as well as the **new extended functionalities** of melting and casting on *multiple columns* available from `v1.9.6`. 
@@ -314,6 +315,10 @@ DT.c2
 
 You can also provide *multiple functions* to `fun.aggregate` to `dcast` for *data.tables*. Check the examples in `?dcast` which illustrates this functionality.
 
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
+
 #
 
 ***
diff --git a/vignettes/datatable-sd-usage.Rmd b/vignettes/datatable-sd-usage.Rmd
index e7b08650e..ae0b5a84a 100644
--- a/vignettes/datatable-sd-usage.Rmd
+++ b/vignettes/datatable-sd-usage.Rmd
@@ -25,6 +25,7 @@ knitr::opts_chunk$set(
 out.width = '100%',
 dpi = 144
 )
+.old.th = setDTthreads(1)
 ```
 
 This vignette will explain the most common ways to use the `.SD` variable in your `data.table` analyses. It is an adaptation of [this answer](https://stackoverflow.com/a/47406952/3576984) given on StackOverflow.
@@ -254,3 +255,7 @@ abline(v = overall_coef, lty = 2L, col = 'red')
 While there is indeed a fair amount of heterogeneity, there's a distinct concentration around the observed overall value.
 
 The above is just a short introduction of the power of `.SD` in facilitating beautiful, efficient code in `data.table`!
+
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
\ No newline at end of file
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
index 6f2474c11..ff50ba97e 100644
--- a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
+++ b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -17,6 +17,7 @@ knitr::opts_chunk$set(
 tidy = FALSE,
 cache = FALSE,
 collapse = TRUE)
+.old.th = setDTthreads(1)
 ```
 
 This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first. 
@@ -325,3 +326,8 @@ In recent version we extended auto indexing to expressions involving more than o
 We will discuss fast *subsets* using keys and secondary indices to *joins* in the next vignette, *"Joins and rolling joins"*.
 
 ***
+
+```{r, echo=FALSE}
+setDTthreads(.old.th)
+```
+
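
Reviewer note: the vignette hunks above all follow the same save-and-restore pattern around `setDTthreads()`. A minimal sketch of that pattern outside knitr is given here, assuming `data.table` is installed; the table `DT` and the aggregation are hypothetical placeholders standing in for a vignette body, not code from the patch.

```r
library(data.table)

# setDTthreads() returns the previous thread setting invisibly, so it can be
# captured on entry and handed back on exit (same variable name as the patch).
.old.th = setDTthreads(1)
stopifnot(getDTthreads() == 1L)

# Hypothetical placeholder workload standing in for the vignette content.
DT = data.table(g = rep(c("a", "b"), each = 3L), x = 1:6)
res = DT[, .(total = sum(x)), by = g]
print(res)

# Restore whatever thread setting was active before the section started.
setDTthreads(.old.th)
```

Keeping the restore in a final hidden chunk (`echo=FALSE` in the patch) means the thread limit applies only while the vignette builds and does not leak into the session that rendered it.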