Does padr have a maximum limit on year? #51

Closed
dareneiri opened this issue Oct 11, 2017 · 6 comments

dareneiri commented Oct 11, 2017

It seems that padr has an upper limit on the years it can process.
Some datasets, such as the MIMIC-III database, have their datetimes randomly shifted into the future, so some years are set to 2100, for example. If the year is more than about 20 years beyond the current year, padr cannot thicken the data.

For now I can work around this by subtracting years, since the year is not relevant to my analysis.

If I try to thicken the data without changing the year, I get an error. Here is some sample data:

> packageVersion("tidyverse")
[1] ‘1.1.1’
> packageVersion("lubridate")
[1] ‘1.6.0’
> packageVersion("padr")
[1] ‘0.3.0’
> library(tidyverse)
> library(lubridate)
> library(padr)
> 
> df <- read.csv("padr_data.csv")
> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(63)))
> 
> df$sbp <- as.numeric(df$sbp)
> 
> summary(df)
   charttime                        sbp       
 Min.   :2038-11-04 18:30:00   Min.   : 62.0  
 1st Qu.:2038-11-04 19:33:45   1st Qu.: 84.5  
 Median :2038-11-04 20:52:30   Median : 95.0  
 Mean   :2038-11-04 21:08:22   Mean   :100.9  
 3rd Qu.:2038-11-04 22:26:15   3rd Qu.:102.0  
 Max.   :2038-11-05 00:42:00   Max.   :217.0  
                               NA's   :12     
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt" 

$sbp
[1] "numeric"

> df$charttime %>% get_interval
[1] "min"
> 
> # this does not work
> df[!is.na(df$charttime),] %>%
+   thicken(interval = 'hour')
Error in if (to_date) x <- as.Date(x, tz = attr(x, "tzone")) : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In round_down_core(a, b) : NAs introduced by coercion to integer range
2: In round_down_core(a, b) : NAs introduced by coercion to integer range

Changing dyears(63) to dyears(64), so that the datetimes fall in 2037 instead of 2038, makes it work:

> df <- mutate_at(df, vars(ends_with("time")), funs(ymd_hms(., tz = "UTC", locale = Sys.getlocale("LC_TIME"))- dyears(64)))
> 
> df$sbp <- as.numeric(df$sbp)
> 
> summary(df)
   charttime                        sbp       
 Min.   :2037-11-04 18:30:00   Min.   : 62.0  
 1st Qu.:2037-11-04 19:33:45   1st Qu.: 84.5  
 Median :2037-11-04 20:52:30   Median : 95.0  
 Mean   :2037-11-04 21:08:22   Mean   :100.9  
 3rd Qu.:2037-11-04 22:26:15   3rd Qu.:102.0  
 Max.   :2037-11-05 00:42:00   Max.   :217.0  
                               NA's   :12     
> lapply(df, class)
$charttime
[1] "POSIXct" "POSIXt" 

$sbp
[1] "numeric"

> df$charttime %>% get_interval
[1] "min"
> 
> # this does work
> df[!is.na(df$charttime),] %>%
+   thicken(interval = 'hour')
             charttime sbp      charttime_hour
1  2037-11-04 18:30:00  NA 2037-11-04 18:00:00
2  2037-11-04 18:45:00  62 2037-11-04 18:00:00
3  2037-11-04 19:00:00  66 2037-11-04 19:00:00
4  2037-11-04 19:12:00  NA 2037-11-04 19:00:00
5  2037-11-04 19:14:00  NA 2037-11-04 19:00:00
6  2037-11-04 19:15:00 217 2037-11-04 19:00:00
7  2037-11-04 19:26:00  NA 2037-11-04 19:00:00
8  2037-11-04 19:30:00 102 2037-11-04 19:00:00
9  2037-11-04 19:45:00  94 2037-11-04 19:00:00
10 2037-11-04 19:59:00  NA 2037-11-04 19:00:00
11 2037-11-04 20:00:00  80 2037-11-04 20:00:00
12 2037-11-04 20:04:00  NA 2037-11-04 20:00:00
13 2037-11-04 20:15:00  91 2037-11-04 20:00:00
14 2037-11-04 20:30:00  86 2037-11-04 20:00:00
15 2037-11-04 20:45:00  96 2037-11-04 20:00:00
16 2037-11-04 21:00:00  73 2037-11-04 21:00:00
17 2037-11-04 21:15:00  84 2037-11-04 21:00:00
18 2037-11-04 21:30:00  96 2037-11-04 21:00:00
19 2037-11-04 21:45:00 100 2037-11-04 21:00:00
20 2037-11-04 21:51:00  NA 2037-11-04 21:00:00
21 2037-11-04 22:00:00  NA 2037-11-04 22:00:00
22 2037-11-04 22:15:00 123 2037-11-04 22:00:00
23 2037-11-04 22:30:00 125 2037-11-04 22:00:00
24 2037-11-04 22:45:00 132 2037-11-04 22:00:00
25 2037-11-04 23:00:00  88 2037-11-04 23:00:00
26 2037-11-04 23:15:00  NA 2037-11-04 23:00:00
27 2037-11-04 23:45:00  NA 2037-11-04 23:00:00
28 2037-11-05 00:00:00 102 2037-11-05 00:00:00
29 2037-11-05 00:28:00  NA 2037-11-05 00:00:00
30 2037-11-05 00:42:00  NA 2037-11-05 00:00:00
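
The year-shifting workaround mentioned above could look roughly like the sketch below. The sample data and the size of the offset are assumptions made for illustration; the idea is to shift by a whole number of days, so that hour and day boundaries stay aligned in UTC, thicken in the safe range, and then shift both columns back. Month and year boundaries are not preserved by such a shift.

```r
library(padr)

# Hypothetical data: timestamps deliberately placed beyond January 2038
df <- data.frame(
  charttime = as.POSIXct(
    c("2100-11-04 18:30:00", "2100-11-04 18:45:00", "2100-11-04 19:00:00"),
    tz = "UTC"
  ),
  sbp = c(NA, 62, 66)
)

# Assumption: 35000 days is an arbitrary offset, large enough to land
# well before 2038. Numeric literals in R are doubles, so this product
# does not overflow.
shift_secs <- 35000 * 24 * 3600

# Move the timestamps into the range that thicken() can handle
df$charttime <- df$charttime - shift_secs

# thicken() adds the column charttime_hour
thickened <- thicken(df, interval = "hour")

# Shift the original and the thickened column back to their real values
thickened$charttime <- thickened$charttime + shift_secs
thickened$charttime_hour <- thickened$charttime_hour + shift_secs
```

Shifting with a fixed number of seconds rather than `dyears()` makes it explicit that the offset is a whole number of days.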

EdwinTh (Owner) commented Oct 12, 2017

Thank you for informing me. It honestly never occurred to me to check so far into the future. I am not sure whether this is a padr issue or is due to the underlying R POSIX mechanism. I will dig into it as soon as my schedule allows.

Blundys commented Apr 16, 2019

I was also having this problem. I traced it to round_down_core.cpp or round_up_core.cpp, so it looks like a C++ problem. It works for datetimes before 19 Jan 2038 and not after, so it looks like it is related to the Year 2038 problem.
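
The cutoff can be checked from plain R. The sketch below uses only base R and shows that the largest 32-bit signed integer, read as seconds since 1970-01-01 UTC, is the last moment before the limit, and that coercing a later timestamp to integer yields NA. That the C++ rounding code performs such a coercion is an assumption, based on the "NAs introduced by coercion to integer range" warning in the original report.

```r
# Largest value a 32-bit signed integer can hold
.Machine$integer.max
# 2147483647

# Read as seconds since the epoch, this is the last representable moment
limit <- as.POSIXct(.Machine$integer.max, origin = "1970-01-01", tz = "UTC")
limit
# "2038-01-19 03:14:07 UTC"

# R stores POSIXct as a double, so a later timestamp is fine on the R side
later <- as.numeric(as.POSIXct("2038-06-01 00:00:00", tz = "UTC"))
later > .Machine$integer.max
# TRUE

# Coercing it to a 32-bit integer gives NA (with a warning, silenced here)
as_int <- suppressWarnings(as.integer(later))
is.na(as_int)
# TRUE
```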

EdwinTh (Owner) commented Apr 16, 2019

Thanks for digging. I hope to schedule some time for maintenance soon to look into it further.

EdwinTh added the bug label Apr 21, 2019

EdwinTh (Owner) commented May 23, 2019

I looked into it, and it is indeed the Year 2038 problem: when POSIXt values are handled as 32-bit integers, they overflow from 19 January 2038 onward. Alas, my research suggests there is no universal fix for 32-bit machines. Switching to int64 would result in the package only working on 64-bit machines, which I am reluctant to do. For the moment I lean towards an informative error and leaving the workaround to the user.
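
An informative error of this kind could be sketched as below. The helper name `check_before_2038` and the message wording are hypothetical and not part of padr; the check simply compares the numeric value of each datetime against the 32-bit integer limit.

```r
# Hypothetical helper, not part of padr: fail early with a clear message
# when any datetime lies beyond the 32-bit integer range.
check_before_2038 <- function(x) {
  stopifnot(inherits(x, "POSIXct"))
  too_late <- !is.na(x) & as.numeric(x) > .Machine$integer.max
  if (any(too_late)) {
    stop(
      sum(too_late), " datetime value(s) fall after 2038-01-19 03:14:07 UTC ",
      "and cannot be represented as a 32-bit integer. ",
      "Shift the datetimes to an earlier period before rounding.",
      call. = FALSE
    )
  }
  invisible(x)
}

ok  <- as.POSIXct("2037-11-04 18:30:00", tz = "UTC")
bad <- as.POSIXct("2038-11-04 18:30:00", tz = "UTC")

check_before_2038(ok)  # passes silently
result <- tryCatch(check_before_2038(bad), error = function(e) conditionMessage(e))
```

Here `result` holds the error message instead of the cryptic "missing value where TRUE/FALSE needed" from the original report.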

EdwinTh (Owner) commented May 23, 2019

These are the unit tests that should pass once the problem is resolved.

a <- as.numeric(ymd_h(c("20380601 00", "20390601 00")))
b <- as.numeric(ymd_h(c("20380101 00", "20390101 00", "20400101 00")))

test_that("round_down_core works after 2038 in posix", {
  expect_equal(round_down_core(a, b), b[1:2])
})

test_that("round_down_core_prev works after 2038 in posix", {
  expect_equal(round_down_core_prev(a, b), b[1:2])
})

test_that("round_up_core works after 2038 in posix", {
  expect_equal(round_up_core(a, b), b[2:3])
})

test_that("round_up_core_prev works after 2038 in posix", {
  expect_equal(round_up_core_prev(a, b), b[2:3])
})
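
For comparison, the expected values in these tests can be reproduced in plain R, where datetimes are doubles and no overflow occurs. The helpers `round_down_ref` and `round_up_ref` below are hypothetical reference functions, not padr's implementation. They assume `b` is sorted ascending and that every value of `a` lies strictly between two values of `b`, so they say nothing about how the `_prev` variants treat ties.

```r
# Same inputs as the unit tests, built with base R instead of lubridate
a <- as.numeric(as.POSIXct(c("2038-06-01", "2039-06-01"), tz = "UTC"))
b <- as.numeric(as.POSIXct(c("2038-01-01", "2039-01-01", "2040-01-01"), tz = "UTC"))

# Hypothetical reference helpers (not padr's implementation).
# findInterval() returns, for each a, the index of the largest b <= a.
round_down_ref <- function(a, b) b[findInterval(a, b)]
round_up_ref   <- function(a, b) b[findInterval(a, b) + 1]

down <- round_down_ref(a, b)
up   <- round_up_ref(a, b)

stopifnot(identical(down, b[1:2]))
stopifnot(identical(up, b[2:3]))
```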

EdwinTh added a commit that referenced this issue May 23, 2019
EdwinTh added a commit that referenced this issue May 23, 2019

Blundys commented May 28, 2019

Thanks for looking into it. Yes, an informative error would at least let people understand what has gone wrong, so it is a good idea, at least in the short term.

EdwinTh added a commit that referenced this issue Jul 3, 2021
EdwinTh closed this as completed Jul 3, 2021