Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce 01 #3909

Merged
merged 43 commits into from
Sep 28, 2019
Merged
Show file tree
Hide file tree
Changes from 33 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
415689b
interim
mattdowle Sep 25, 2019
1b07fff
interim
mattdowle Sep 25, 2019
17d82df
interim
mattdowle Sep 25, 2019
6350d5a
interim
mattdowle Sep 25, 2019
92056d1
interim
mattdowle Sep 25, 2019
16e65e1
interim
mattdowle Sep 25, 2019
ce317ea
interim
mattdowle Sep 25, 2019
86a4c9a
interim
mattdowle Sep 26, 2019
d0b7ba7
coerce to logical warnings working again
mattdowle Sep 26, 2019
45f66cc
interim
mattdowle Sep 26, 2019
7a569ed
interim
mattdowle Sep 26, 2019
fc72fb6
interim
mattdowle Sep 26, 2019
ef1c6fd
interim
mattdowle Sep 26, 2019
78733d9
interim; passing
mattdowle Sep 26, 2019
b46e173
interm; passing
mattdowle Sep 26, 2019
8c3e064
tidy
mattdowle Sep 26, 2019
8497d02
extra test; passes all revdeps
mattdowle Sep 26, 2019
6a2519d
news item tweaked
mattdowle Sep 26, 2019
f5779c0
allNA coverage via :=
mattdowle Sep 26, 2019
328cefd
allNA used in bmerge, and covered
mattdowle Sep 26, 2019
94ffa0e
allNA empty tests
mattdowle Sep 26, 2019
2374b32
coverage
mattdowle Sep 26, 2019
60b3757
tidy
mattdowle Sep 26, 2019
510da29
don't create NA level when assigning NA_character_ to factor
mattdowle Sep 26, 2019
2be417d
rbindlist malformed NA factor level, #3915
mattdowle Sep 26, 2019
4d84982
coverage
mattdowle Sep 26, 2019
4ec649d
news item extra nx= fixed
mattdowle Sep 26, 2019
c0bad1a
snprintf
mattdowle Sep 26, 2019
b83d133
memcpy
mattdowle Sep 26, 2019
2525f9b
coverage
mattdowle Sep 27, 2019
a415095
NEWS improvements
jangorecki Sep 27, 2019
ba4b0cb
more verbose
jangorecki Sep 27, 2019
c8ce5a7
allNA now does not return false by default
jangorecki Sep 27, 2019
7cad29b
setnames news item
mattdowle Sep 27, 2019
9f78001
interim
mattdowle Sep 27, 2019
0e9e13f
saved coerce from NA to NA_character_
mattdowle Sep 27, 2019
8d871cc
assigning factor numbers works again
mattdowle Sep 27, 2019
6b521e1
type complex cases and tests
mattdowle Sep 27, 2019
475414a
added verbose message when coerce, more cases and tests
mattdowle Sep 28, 2019
47ad56e
more integer64 cases (expecting some lack of coverage to cover next),…
mattdowle Sep 28, 2019
598752a
any->all in comments
MichaelChirico Sep 28, 2019
5b799ea
Added CHECK_RANGE macro
mattdowle Sep 28, 2019
7f34680
code formatting only
mattdowle Sep 28, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 16 additions & 12 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -192,24 +192,24 @@
```R
set.seed(108)
x = rnorm(1e6); n = 1e3
rollfun = function(x, n, FUN) {
ans = rep(NA_real_, nx<-length(x))
base_rollapply = function(x, n, FUN) {
nx = length(x)
ans = rep(NA_real_, nx)
for (i in n:nx) ans[i] = FUN(x[(i-n+1):i])
ans
}
system.time(ans1<-rollfun(x, n, mean))
system.time(ans2<-zoo::rollapplyr(x, n, function(x) mean(x), fill=NA))
system.time(ans3<-zoo::rollmeanr(x, n, fill=NA))
system.time(ans4<-frollapply(x, n, mean))
system.time(ans5<-frollmean(x, n))
sapply(list(ans2,ans3,ans4,ans5), all.equal, ans1)
system.time(base_rollapply(x, n, mean))
system.time(zoo::rollapplyr(x, n, function(x) mean(x), fill=NA))
system.time(zoo::rollmeanr(x, n, fill=NA))
system.time(frollapply(x, n, mean))
system.time(frollmean(x, n))

### fun mean sum median
# base rollfun 8.815 5.151 60.175
# base_rollapply 8.815 5.151 60.175
# zoo::rollapply 34.373 27.837 88.552
# zoo::roll[fun] 0.215 0.185 NA
# zoo::roll[fun] 0.215 0.185 NA ## median not fully supported
# frollapply 5.404 1.419 56.475
# froll[fun] 0.003 0.002 NA
# froll[fun] 0.003 0.002 NA ## median not yet supported
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
```

28. `setnames()` now accepts functions in `old=` and `new=`, [#3703](https://github.com/Rdatatable/data.table/issues/3703). Thanks @smingerson for the feature request and @shrektan for the PR.
Expand All @@ -221,9 +221,11 @@
# [1] "A" "B" "C"
setnames(DT, 2:3, tolower)
names(DT)
# [1] "A" "b" "c"
# [1] "a" "b" "c"
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
```

29. `:=` and `set()` now use zero-copy type coercion. Accordingly, `DT[..., integerColumn:=0]` and `set(DT,i,j,0)` no longer warn about the `0` ('numeric') needing to be `0L` ('integer') because there is no longer any time or space used for this coercion. The old long warning was off-putting to new users ("what and why L?"), whereas advanced users appreciated the old warning so they could avoid the coercion. Although the time and space for one coercion in a single call is unmeasurably small, when placed in a loop the small overhead of any allocation on R's heap could start to become noticeable (more so for `set()` whose purpose is low-overhead looping). Further, when assigning a value across columns of varying types, it could be inconvenient to supply the correct type for every column. Hence, zero-copy coercion was introduced to satisfy all these requirements. A warning is still issued, as before, when fractional data is discarded; e.g. when 3.14 is assigned to an integer column. Zero-copy coercion applies to length>1 vectors as well as length-1 vectors.

## BUG FIXES

1. `first`, `last`, `head` and `tail` by group no longer error in some cases, [#2030](https://github.com/Rdatatable/data.table/issues/2030) [#3462](https://github.com/Rdatatable/data.table/issues/3462). Thanks to @franknarf1 for reporting.
Expand Down Expand Up @@ -327,6 +329,8 @@

44. Grouping could create a `malformed factor` and/or segfault when the factors returned by each group did not have identical levels, [#2199](https://github.com/Rdatatable/data.table/issues/2199) and [#2522](https://github.com/Rdatatable/data.table/issues/2522). Thanks to Václav Hausenblas, @franknarf1, @ben519, and @Henrik-P for reporting.

45. `rbindlist` (and printing a `data.table` with over 100 rows because that uses `rbindlist(head, tail)`) could error with `malformed factor` for unordered factor columns containing a used `NA_character_` level, [#3915](https://github.com/Rdatatable/data.table/issues/3915). This is an unusual input for unordered factors because NA_integer_ is recommended by default in R. Thanks to @sindribaldur for reporting.

## NOTES

1. `rbindlist`'s `use.names="check"` now emits its message for automatic column names (`"V[0-9]+"`) too, [#3484](https://github.com/Rdatatable/data.table/pull/3484). See news item 5 of v1.12.2 below.
Expand Down
4 changes: 2 additions & 2 deletions R/bmerge.R
Original file line number Diff line number Diff line change
Expand Up @@ -75,12 +75,12 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
if (xclass=="character" || iclass=="character" ||
xclass=="logical" || iclass=="logical" ||
xclass=="factor" || iclass=="factor") {
if (anyNA(i[[ic]]) && all(is.na(i[[ic]]))) { # TODO: allNA function in C
if (anyNA(i[[ic]]) && allNA(i[[ic]])) {
mattdowle marked this conversation as resolved.
Show resolved Hide resolved
if (verbose) cat("Coercing all-NA i.",names(i)[ic]," (",iclass,") to type ",xclass," to match type of x.",names(x)[xc],".\n",sep="")
set(i, j=ic, value=match.fun(paste0("as.", xclass))(i[[ic]]))
next
}
else if (anyNA(x[[xc]]) && all(is.na(x[[xc]]))) {
else if (anyNA(x[[xc]]) && allNA(x[[xc]])) {
if (verbose) cat("Coercing all-NA x.",names(x)[xc]," (",xclass,") to type ",iclass," to match type of i.",names(i)[ic],".\n",sep="")
set(x, j=xc, value=match.fun(paste0("as.", iclass))(x[[xc]]))
next
Expand Down
2 changes: 1 addition & 1 deletion R/fread.R
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir())
}
if (!missing(autostart)) warning("'autostart' is now deprecated and ignored. Consider skip='string' or skip=n");
if (is.logical(colClasses)) {
if (!all(is.na(colClasses))) stop("colClasses is type 'logical' which is ok if all NA but it has some TRUE or FALSE values in it which is not allowed. Please consider the drop= or select= argument instead. See ?fread.")
if (!allNA(colClasses)) stop("colClasses is type 'logical' which is ok if all NA but it has some TRUE or FALSE values in it which is not allowed. Please consider the drop= or select= argument instead. See ?fread.")
colClasses = NULL
}
if (!is.null(colClasses) && is.atomic(colClasses)) {
Expand Down
12 changes: 6 additions & 6 deletions R/test.data.table.R
Original file line number Diff line number Diff line change
Expand Up @@ -157,7 +157,7 @@ test.data.table = function(verbose=FALSE, pkg="pkg", silent=FALSE, with.other.pa
# inittime=PS_rss=GC_used=GC_max_used=NULL
# m = fread("memtest.csv")[inittime==.inittime]
# if (nrow(m)) {
# ps_na = all(is.na(m[["PS_rss"]])) # OS with no 'ps -o rss R' support
# ps_na = allNA(m[["PS_rss"]]) # OS with no 'ps -o rss R' support
# grDevices::png("memtest.png")
# p = graphics::par(mfrow=c(if (ps_na) 2 else 3, 2))
# if (!ps_na) {
Expand Down Expand Up @@ -282,9 +282,9 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,output=NULL,notOutput=NULL,
if (!missing(error) && !missing(y))
stop("Test ",numStr," is invalid: when error= is provided it does not make sense to pass y as well") # nocov

string_match = function(x, y) {
length(grep(x,y,fixed=TRUE)) || # try treating x as literal first; useful for most messages containing ()[]+ characters
length(tryCatch(grep(x,y), error=function(e)NULL)) # otherwise try x as regexp
string_match = function(x, y, ignore.case=FALSE) {
length(grep(x, y, fixed=TRUE)) || # try treating x as literal first; useful for most messages containing ()[]+ characters
length(tryCatch(grep(x, y, ignore.case=ignore.case), error=function(e)NULL)) # otherwise try x as regexp
}

xsub = substitute(x)
Expand Down Expand Up @@ -364,10 +364,10 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,output=NULL,notOutput=NULL,
fail = TRUE
# nocov end
}
if (length(notOutput) && string_match(notOutput, out)) {
if (length(notOutput) && string_match(notOutput, out, ignore.case=TRUE)) {
# nocov start
cat("Test",numStr,"produced output but should not have:\n")
cat("Expected absent: <<",gsub("\n","\\\\n",notOutput),">>\n",sep="")
cat("Expected absent (case insensitive): <<",gsub("\n","\\\\n",notOutput),">>\n",sep="")
cat("Observed: <<",gsub("\n","\\\\n",out),">>\n",sep="")
fail = TRUE
# nocov end
Expand Down
1 change: 1 addition & 0 deletions R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ if (base::getRversion() < "3.5.0") {
}
isTRUEorNA = function(x) is.logical(x) && length(x)==1L && (is.na(x) || x)
isTRUEorFALSE = function(x) is.logical(x) && length(x)==1L && !is.na(x)
allNA = function(x) .Call(C_allNAR, x)
mattdowle marked this conversation as resolved.
Show resolved Hide resolved

if (base::getRversion() < "3.2.0") { # Apr 2015
isNamespaceLoaded = function(x) x %chin% loadedNamespaces()
Expand Down
Loading