Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong units displayed in data.table with POSIXct arithmetic #761

Closed
boethian opened this issue Aug 9, 2014 · 8 comments
Closed

Wrong units displayed in data.table with POSIXct arithmetic #761

boethian opened this issue Aug 9, 2014 · 8 comments
Milestone

Comments

@boethian
Copy link

boethian commented Aug 9, 2014

Recently posted on StackOverflow because didn't know how to report bug report / make feature request here.

http://stackoverflow.com/questions/25214170/wrong-units-displayed-in-data-table-with-posixct-arithmetic

Problem reproduced here for completeness, I can take down SO "question" if this bug is deemed legitimate here:

When durations are computed in data.table (v1.9.2), the wrong units can be printed with POSIXct arithmetic. It seems the first units are chosen.

require("data.table")
dt <- data.table(id=c(1,1,2,2), 
                  event=rep(c("start", "end"), times=2), 
                  time=c(as.POSIXct(c("2014-01-31 06:05:30", 
                                      "2014-01-31 06:45:30", 
                                      "2014-01-31 08:10:00", 
                                      "2014-01-31 09:30:00"))))
dt$time[2] - dt$time[1]  # in minutes
dt$time[4] - dt$time[3]  # in hours
dt[ , max(time) - min(time), by=id]  # wrong units printed for id 2

I realize that one of these is the correct way to do it to get expected behavior, but wanted to report this behavior. Not sure if it is really a data.table problem or POSIXct problem.

dt[ , difftime(max(time), min(time), units="mins"), by=id]  # both mins
dt[ , difftime(max(time), min(time), units="hours"), by=id]  # both hours
@mattdowle mattdowle added the bug label Aug 11, 2014
@mattdowle
Copy link
Member

Agreed it's a bug. Thanks for the nice report. I tested in latest dev just now and it occurs there too.

Good to have the S.O. question as well i.e. don't take it down. More chance people will find it and see the workarounds etc.

As a first step, the attributes on each group result could be checked for consistency and a warning issued if not. I thought it did that already but clearly not. Tagged High for the this detection and warning aspect which should be relatively straightforward and could be an issue in other cases unrelated to POSIXct. Making it return the right result without speed penalty could be hard, so likely best to have user call difftime(,units=), and that could be a suggestion in the warning message.

@MichaelChirico
Copy link
Member

MichaelChirico commented Aug 2, 2016

Was just trying to figure out if I couldn't fix this, but it seems the mistake is done entirely in C (everything's fine up until .Call(Cdogroups, ...) AFAICT), so it's above my weight class for now... basically the group concatenation must be ignoring the attr of the difftime objects

@MichaelChirico
Copy link
Member

MichaelChirico commented Apr 16, 2018

Some more details copied as this re-surfaced for me:

library(data.table)
DT = data.table(
  a = c(1, 1, 2, 2),
  b = structure(
    c(0, 10, 0, 100000000), 
    class = c('POSIXct', 'POSIXt')
  )
)
DT[ , diff(b), keyby = a]
#    a            V1
# 1: 1   10.000 secs
# 2: 2 1157.407 secs

The problem is that diff.POSIXt silently invokes -.POSIXt, which chooses units automatically given its inputs.

These units are ignored on group wrapup:

DT[ , {
  x = diff(b)
  dput(x)
  x
}, keyby = a]
# structure(10, units = "secs", class = "difftime")
# structure(1157.40740740741, units = "days", class = "difftime")
#    a            V1
# 1: 1   10.000 secs
# 2: 2 1157.407 secs

@mattdowle mattdowle removed this from the Candidate milestone May 10, 2018
@mattdowle mattdowle added this to the 1.12.4 milestone Sep 11, 2019
@mattdowle
Copy link
Member

Closed by #3837.
("Closes #n and #m" doesn't close #m. Have to use "closes" separately before each issue number.)

@MichaelChirico
Copy link
Member

oh really? I believe that works in the commit message... or maybe that's just gitlab?

@mattdowle
Copy link
Member

Yep. I don't know about gitlab or the commit message (I don't think so), but this one did not close this one. Before merge you can see if the merge is going to close it or not via a hyperlink / tooltip that appears in the linked PR on the issue labeled "will be closed by" (or similar).

@MichaelChirico
Copy link
Member

OK. something to raise with GitHub

@mattdowle
Copy link
Member

mattdowle commented Sep 12, 2019

I did look once and it has been raised already. Now that Microsoft bought GitHub for $7bn, perhaps things like that will get fixed faster and better.
Here's an image showing the tooltip I referred to ... the faint underline under the word close and then hover over it to see a pop-up. That's how to know, before merge, if the close signal has hooked up properly.
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants