gmedian failing on simple case #2046

Closed
caneff opened this issue Feb 28, 2017 · 4 comments · Fixed by #2480

@caneff
caneff commented Feb 28, 2017

On 1.10.4:

dt <- data.table(x=c(1,1,1),y=c(3,0,0))
setkey(dt, x)
dt[, median(y), by=x]

Yields the following error:
Error in gmedian(y) : negative length vectors are not allowed

If dt has no key set, the error does not occur (though I can't tell whether gmedian is even being called in that case). Calling stats::median instead of median also works, because it bypasses gmedian.

@franknarf1
Contributor

I'm running 1.10.4 on R 3.2.5 and don't see that error. To see whether gmedian is being used, run dt[, median(y), by=x, verbose=TRUE]; not sure if that's what you meant. There's also the global option datatable.verbose.
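A minimal sketch of that check, assuming data.table is installed. The verbose text differs between versions, so no expected output is shown. The call is wrapped in try() because, on an affected version, it may itself raise the reported error. The last line is the stats::median workaround from the original report, which according to that report does not go through gmedian.

```r
library(data.table)

dt <- data.table(x = c(1, 1, 1), y = c(3, 0, 0))
setkey(dt, x)

# verbose = TRUE makes data.table print its optimization notes for this
# query, which is where a rewrite of median() to gmedian() would show up.
# try() keeps the script running if the error from this issue is raised.
res <- try(dt[, median(y), by = x, verbose = TRUE], silent = TRUE)

# Namespace-qualified call, the workaround described in the report.
res_plain <- dt[, stats::median(y), by = x]
print(res_plain)
```

Setting options(datatable.verbose = TRUE) instead enables the same diagnostics for every data.table call in the session.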

FYI, the maintainers prefer that you test on the devel version, currently 1.10.5: https://github.com/Rdatatable/data.table/wiki/Support

@caneff
Author

caneff commented Feb 28, 2017

Sorry, yes I should have updated to the latest version. Updated to 1.10.5, still happens. But it is weird.

I start with R --vanilla. sessionInfo() yields:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] gmotd_1.0

When I start it up, I paste in this exact block:

library(data.table)
dt <- data.table(x=c(1,1,1),y=c(3,0,0))
setkey(dt, x)
dt[, median(y), by=x]

About three times out of four, I get the error mentioned above; the rest of the time it works. The other code that uncovered this issue for me in the first place produced the same error when I ran it on networked machines, so I don't think this is some weird memory corruption thing. Thoughts?

@kmillar
kmillar commented Mar 1, 2017

What appears to be happening is that in [.data.table, if 'on' is missing, then it gets set to integer(0), which eventually gets passed down as the 'o' parameter to the C implementation of gforce.

In there we have:
maxgrpn = INTEGER(getAttrib(o, install("maxgrpn")))[0]
but since integer(0) has no attributes, this becomes INTEGER(R_NilValue)[0], which could return anything. When it returns a negative value, we get the error seen above.
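The missing attribute can be seen from the R side with base R alone. The sketch below is illustrative only: o_with_attr is a hypothetical stand-in for an ordering vector that carries the attribute, and maxgrpn_or_na is a made-up helper showing the kind of NULL check the C line lacks. It is not the fix that was merged for this issue.

```r
# An empty integer vector, which is what 'on' becomes when it is missing.
o_missing <- integer(0)
print(attr(o_missing, "maxgrpn"))   # NULL: there is no such attribute

# Hypothetical ordering vector that does carry a "maxgrpn" attribute.
o_with_attr <- structure(c(1L, 2L, 3L), maxgrpn = 3L)
print(attr(o_with_attr, "maxgrpn"))

# Hypothetical guard: return NA instead of reading a value that is not there.
maxgrpn_or_na <- function(o) {
  v <- attr(o, "maxgrpn")
  if (is.null(v)) NA_integer_ else as.integer(v)[1L]
}

print(maxgrpn_or_na(o_missing))
print(maxgrpn_or_na(o_with_attr))
```

In C there is no such safety net: getAttrib returns R_NilValue for a missing attribute, and calling INTEGER() on it reads memory that was never an integer vector.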

While I've been able to track this down, I don't understand the code well enough to fix it. Can someone work out a patch for it?

@MichaelChirico
Member

Linking this knitr issue I created which may be related (though it seems to be a knitr issue in my case):

yihui/knitr#1457
