calculating time difference by group might get the units messed up #3694

Closed
oliver-oliver opened this issue Jul 10, 2019 · 2 comments · Fixed by #3837

@oliver-oliver oliver-oliver commented Jul 10, 2019

# Minimal reproducible example

The objective is to calculate the time between events grouped by some id. Here is an example:

library(data.table)
library(lubridate)

dt <- data.table(id = c(1,1:3), 
                 start = c("2015-01-01 12:00:00", "2015-12-01 12:00:00", "2019-01-01 12:00:00", NA),
                 end = c("2016-01-01 12:00:01", "2016-01-01 12:00:01", "2019-01-01 12:00:01", "2019-01-01 12:00:02"))

dt[, start := ymd_hms(start)]
dt[, end := ymd_hms(end)]

dt[, time_diff_1 := min(end) - max(start), by = .(id)]
dt[, time_diff_2 := end - start]

which results in:

   id               start                 end   time_diff_1   time_diff_2
1:  1 2015-01-01 12:00:00 2016-01-01 12:00:01 31.00001 secs 31536001 secs
2:  1 2015-12-01 12:00:00 2016-01-01 12:00:01 31.00001 secs  2678401 secs
3:  2 2019-01-01 12:00:00 2019-01-01 12:00:01  1.00000 secs        1 secs
4:  3                <NA> 2019-01-01 12:00:02       NA secs       NA secs

Both time_diff_1 and time_diff_2 are labelled in seconds. However, time_diff_1, which comes from the grouped calculation, has its units mixed up: the value for id == 1 is really 31 days and one second (31.00001 days), not 31.00001 seconds. It seems as if the units were chosen automatically per group and then overwritten.
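The automatic unit choice can be seen in base R alone, without data.table. The sketch below is only an illustration: it uses as.POSIXct() in place of ymd_hms(), and the variable names are made up for this example. It shows that subtracting two date-times picks the unit from the size of the difference, which would explain why separate groups can come back in different units.

```r
# Base R only; as.POSIXct() stands in for lubridate::ymd_hms() here.
# All variable names are hypothetical and chosen for this illustration.
start_far  <- as.POSIXct("2015-12-01 12:00:00", tz = "UTC")
end_far    <- as.POSIXct("2016-01-01 12:00:01", tz = "UTC")
start_near <- as.POSIXct("2019-01-01 12:00:00", tz = "UTC")
end_near   <- as.POSIXct("2019-01-01 12:00:01", tz = "UTC")

# `-` on date-times calls difftime() with units = "auto"
d_far  <- end_far - start_far
d_near <- end_near - start_near

units(d_far)       # "days"
units(d_near)      # "secs"
as.numeric(d_far)  # about 31.00001, the number seen in time_diff_1

# On a whole vector, one unit is chosen for all elements (from the
# smallest difference), which is why time_diff_2 is entirely in seconds.
d_vec <- c(end_far, end_near) - c(start_far, start_near)
units(d_vec)       # "secs"
```

The numeric value 31.00001 is the same one that appears in the grouped column above, just carrying a "days" unit rather than "secs".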

To prevent this, one can use difftime() with an explicit units argument. However, I think there is room for improvement, e.g. a warning message when the units do not match across groups.
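A minimal sketch of the difftime() workaround, assuming the same data as in the example above. as.POSIXct() is used instead of ymd_hms() only to keep the snippet free of extra dependencies, and the column name time_diff_num is made up for this illustration. Fixing the unit explicitly means every group returns the same unit, so nothing depends on the automatic choice.

```r
library(data.table)

# Same data as in the report; as.POSIXct() replaces lubridate::ymd_hms()
dt <- data.table(
  id    = c(1, 1:3),
  start = as.POSIXct(c("2015-01-01 12:00:00", "2015-12-01 12:00:00",
                       "2019-01-01 12:00:00", NA), tz = "UTC"),
  end   = as.POSIXct(c("2016-01-01 12:00:01", "2016-01-01 12:00:01",
                       "2019-01-01 12:00:01", "2019-01-01 12:00:02"), tz = "UTC")
)

# Ask for seconds explicitly so that all groups agree on the unit
dt[, time_diff_1 := difftime(min(end), max(start), units = "secs"), by = .(id)]

# Alternative: drop the difftime class and keep plain numeric seconds
# (time_diff_num is a hypothetical column name)
dt[, time_diff_num := as.numeric(difftime(min(end), max(start), units = "secs")),
   by = .(id)]

print(dt)
```

With this change the rows for id == 1 should hold 2678401 seconds (31 days plus one second) rather than 31.00001.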

# Output of sessionInfo()

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.6.0      data.table_1.10.4    RevoUtilsMath_10.0.0

loaded via a namespace (and not attached):
[1] compiler_3.4.0   magrittr_1.5     RevoUtils_10.0.4 tools_3.4.0      stringi_1.1.5    stringr_1.2.0

@oliver-oliver oliver-oliver commented Jul 10, 2019

I just saw on my Stack Overflow question that this issue is already known.


@MichaelChirico MichaelChirico commented Jul 10, 2019
