Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test 1960 still failing (grouping sets output mismatch) #3267

Closed
MichaelChirico opened this issue Jan 11, 2019 · 13 comments
Closed

test 1960 still failing (grouping sets output mismatch) #3267

MichaelChirico opened this issue Jan 11, 2019 · 13 comments
Assignees
Milestone

Comments

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Jan 11, 2019

@jangorecki this came up earlier in #3168: #3168 (comment)

The output on my machine differs from that on yours and on Travis/Codecov...

It's still going on... if we can figure out and squash before CRAN release that'd be great

DT = data.table(
  color = c("yellow", "red", "green", "red", "green", "red",
            "yellow", "yellow", "green", "green", "green", "yellow",
            "red", "yellow", "red", "green", "yellow", "red", "yellow",
            "red", "green", "yellow", "green", "green"),
  year = structure(c(15340, 15340, 14975, 15706, 15706, 15340,
                     16436, 15340, 15340, 14975, 16436, 15706,
                     16436, 15340, 14975, 14975, 16071, 15340,
                     15706, 16071, 15706, 15340, 16436, 16071), class = "Date"),
  status = structure(c(4L, 3L, 4L, 3L, 2L, 1L, 3L, 4L, 4L, 3L, 4L, 4L,
                       4L, 4L, 1L, 3L, 3L, 2L, 1L, 2L, 3L, 4L, 2L, 4L),
                     .Label = c("active", "archived", "inactive", "removed"),
                     class = "factor"),
  amount = c(1L, 4L, 2L, 3L, 1L, 5L, 1L, 1L, 4L, 2L, 3L, 1L,
             5L, 4L, 2L, 2L, 4L, 3L, 3L, 2L, 4L, 4L, 1L, 2L),
  value = c(2.5, 2, 3, 3, 2.5, 3.5, 2.5, 3.5, 3, 2.5, 3.5, 2.5, 2,
            2.5, 3, 3, 3, 3, 3, 3, 2, 2.5, 3, 3)
)
groupingsets(DT[ , .(amount, value)], j = 5, by = character(0L),
             sets = list(character()), id=TRUE)

Output in test.data.table() (also for R --vanilla):

Running test id 1960      Test 1960 ran without errors but failed check that x equals y:
> x = ans 
   grouping V1  [Key= Types=int,dou Classes=int,num]
1:       NA  5                                      
> y = data.table(grouping = 0L, V1 = 5) 
   grouping V1  [Key= Types=int,dou Classes=int,num]
1:        0  5                                      
Column 'grouping': 'is.NA' value mismatch: 0 in current 1 in target

My sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xts_0.11-2        zoo_1.8-4         bit64_0.9-7       bit_1.1-14       
[5] data.table_1.11.9

loaded via a namespace (and not attached):
[1] compiler_3.5.1  tools_3.5.1     grid_3.5.1      lattice_0.20-38

Related: #3173

@MichaelChirico MichaelChirico added this to the 1.12.0 milestone Jan 11, 2019
@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

I don't think it is an urgent issue for 1.12.0, using grouping sets and constant j doesn't make much sense.
You seems to have some extra packages attached, be sure to try on clean data.table-only session.
Grouping ID column is created using the following, check if you are getting NA or 0 already there

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
#[1] 0

you can also try putting browser() to investigate this branch

if (!is.data.table(empty)) {
if (length(empty)>0) empty = empty[0L] # fix for #3173 when no grouping and j constant
empty = setDT(list(empty)) # improve after #648, see comment in aggregate.set
}

and then finally also

data.table/R/groupingsets.R

Lines 108 to 111 in 6346a88

if (id) {
# integer bit mask of aggregation levels: http://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE
set(r, j = "grouping", value = strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
}

removing bug as of now as it is possible it is machine dependent

@jangorecki jangorecki removed the bug label Jan 11, 2019
@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

@jangorecki extra packages are attached by test.data.table(), are they supposed to be un-attached by the time test.data.table() finishes?

This was run via

$ R --vanilla
> library(data.table)
> test.data.table()

And yes, indeed we've narrowed the issue:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
#[[1]]
# [1] NA

and in fact:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
    dput(by.set)
    dput(by %chin% by.set + 1L)
    dput(c("1", "0")[by %chin% by.set + 1L])
    dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
    dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
    strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})
# character(0)
# integer(0)
# character(0)
# ""
# NA_integer_
# [[1]]
# [1] NA

From what I see I'm amazed other machines are coming out with 0!

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

Manual of strtoi

An integer vector of the same length as x. Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

maybe macOS is not equipped with proper strtol library, you can try to debug that outside of R

@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

I have a remote machine where 0L is coming out:

by = character(0L)
sets = list(character())
lapply(sets, function(by.set) {
    dput(by.set)
    dput(by %chin% by.set + 1L)
    dput(c("1", "0")[by %chin% by.set + 1L])
    dput(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""))
    dput(strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L))
    strtoi(paste(c("1", "0")[by %chin% by.set + 1L], collapse=""), base=2L)
})

# character(0)
# integer(0)
# character(0)
# ""
# 0L
# [[1]]
# [1] 0

so it's really down to strtoi output, not data.table...

sessionInfo() on a successful machine:

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /opt/conda/lib/R/lib/libRblas.so
LAPACK: /opt/conda/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.11.9

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

yes,

strtoi("", base=2L)
#[1] 0

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

try pasting this code into strtol.c

#include <stdio.h>
#include <stdlib.h>

int main () {
   char str[5] = "";
   char *ptr;
   int ans = strtol(str, &ptr, 2);
   printf("ans: %d\n", ans);
   printf("leftover: %s\n", ptr);
   return(0);
}

and

gcc strtol.c -o strtol.out
./strtol.out
#ans: 0
#leftover: 

@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

Hmm I get ans 0 from the C program...

Maybe time to raise a bug on R-devel... at very least the documentation needs to be clarified...

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

Values which cannot be interpreted as integers or would overflow are returned as NA_integer_.

doc is correct, the problem is lack of consistency to C lib

@jangorecki
Copy link
Member

@jangorecki jangorecki commented Jan 11, 2019

anyway we can ignore that and just add extra branch for !nzchar

@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

Yes sounds like the right solution. And I'll follow up with r-devel separately.

@mattdowle
Copy link
Member

@mattdowle mattdowle commented Jan 11, 2019

I'm not fully following. It sounds like some sort of change is needed to data.table in this 1.12.0 to ensure tests pass consistently on all platforms?

@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

@mattdowle yes, a very simple fix I think, testing now

@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Jan 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants