Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDate storage mode not preserved in data frames #2008

Closed
rmcgehee opened this issue Jan 30, 2017 · 7 comments · Fixed by #3602
Closed

IDate storage mode not preserved in data frames #2008

rmcgehee opened this issue Jan 30, 2017 · 7 comments · Fixed by #3602
Milestone

Comments

@rmcgehee
Copy link

@rmcgehee rmcgehee commented Jan 30, 2017

Hi,
To save space, I've been using IDate within my data frames (not data tables) to save space. Maybe this is a user bug right here, if IDate isn't meant to be used outside of data.tables, but I'll continue.

Seems that rbind() does not preserve the integer storage mode of IDate inside of data.frames, which causes problems when using IDate arithmetic. Instead, rbind coerces IDate into storage.mode double. My code has worked perfectly for a while. I only started seeing errors after updating to the latest data.table 1.10.0 and R 3.3.2. However, I'm not sure if the coercion to double didn't happen in the earlier versions, or just that it wasn't checked before.

Here's an example:

x <- data.frame(date=as.IDate(Sys.Date()))
y <- rbind(x, x)
x$date -1
# [1] "2017-01-29"
y$date -1

Error in -.IDate(y$date, 1) :
Internal error: storage mode of IDate is somehow no longer integer

storage.mode(x$date)
# [1] "integer"
storage.mode(y$date)
# [1] "double"
@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Jan 30, 2017

Interesting. Somehow this line in base::rbind is converting to double:

#Approximately line 188
value[[jj]][ri] <- if (is.factor(xij)) 
          as.vector(xij)

xij is:

dput(xij)
# structure(17196L, class = c("IDate", "Date"))
is.factor(xij)
# [1] FALSE

So as expected. And as.vector didn't break anything:

dput(as.vector(xij))
# 17196L

Prior to running that line, value[[jj]][ri] is the proper storage mode:

dput(value[[jj]][ri])
# structure(17196L, class = c("IDate", "Date"))

However, after running that line, value[[jj]][ri] is silently converted to double:

dput(value[[jj]][ri])
# structure(17196, class = c("IDate", "Date"))

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Jan 30, 2017

Even stranger, on a fresh R session without data.table loaded:

x <- data.frame(date=structure(17196L, class = c("IDate", "Date")))
y <- rbind(x, x)
dput(y$date)
# structure(c(17196L, 17196L), class = c("IDate", "Date"))
library(data.table)
y2 <- rbind(x, x)
dput(y2$date)
# structure(c(17196, 17196), class = c("IDate", "Date"))

@lbilli
Copy link

@lbilli lbilli commented Mar 26, 2017

I have encountered an issue that may be related.
Say I want to append an element at the end of an IDate vector.
Concatenation works fine:

dd <- as.IDate("2017-03-26")
dd <- c(dd, dd[1] + 1L)
storage.mode(dd)
# [1] "integer"

However, this changes the storage mode:

dd <- as.IDate("2017-03-26")
dd[2] <- dd[1] + 1L
storage.mode(dd)
# [1] "double"

Numerically are the same but in the second case the storage mode changes to double.
This doesn't happen if dd is a regular integer vector:

dd <- 1L
dd[2] <- dd[1] + 1L
storage.mode(dd)
# [1] "integer"

@franknarf1
Copy link
Contributor

@franknarf1 franknarf1 commented Apr 13, 2018

I ran into @lbilli 's case today:

library(data.table)
d = structure(c(17473L, NA), class = c("IDate", "Date"))
new_d2 = structure(17623L, class = c("IDate", "Date"))

d[2] <- new_d2
dput(d)
# structure(c(17473, 17623), class = c("IDate", "Date"))

I guess this means IDate vectors can only be safely edited with :=? Surprised to see it breaking with something so common as [<-.

In case you're wondering why I'm doing this: I have a function that takes a length-2 idate vector representing a range. If the upper bound is missing, it is filled in by taking the max date observed in some other vectors.


Another related case:

d = structure(c(17473L, NA), class = c("IDate", "Date"))
is.na(d) <- c(FALSE, TRUE)
dput(d)
# structure(c(17473, NA), class = c("IDate", "Date"))

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented May 26, 2019

The whackiness continues:

i = 1L
class(i) = 'hey'
DF = data.frame(i)

Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class ‘"hey"’ to a data.frame

So you apparently can't put any old class into a data.frame...

But if you make it also inherit a base class:

i = 1L
class(i) = c('hey', 'Date')
DF = data.frame(i)

No error, and integer mode is respected under rbind (this was basically noted above by applying class = c('IDate', 'Date') without data.table loaded.

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented May 26, 2019

Potential lead in do_cbind:

     * The dispatch rule here is as follows:
     *
     * 1) For each argument we get the list of possible class
     *	  memberships from the class attribute.
     *
     * 2) We inspect each class in turn to see if there is an
     *	  applicable method.
     *
     * 3) If we find an applicable method we make sure that it is
     *	  identical to any method determined for prior arguments.
     *	  If it is identical, we proceed, otherwise we immediately
     *	  drop through to the default code.
     */

@MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented May 26, 2019

Adding rbind.IDate didn't solve anything.

However, in desperation I tried to catch any time any IDate-related method is called:

sapply(ls(envir = asNamespace('data.table'), pattern = 'IDate'), function(x) debugonce(get(x, envir = asNamespace('data.table'))))

Then ran rbind(x, x) and was taken to a call of as.Date.IDate.

Really have no idea why that's being called, but that would certainly run as.numeric. And then it still ends up as an IDate somehow? Peculiar.

Anyway, if we eliminate as.numeric in as.Date.IDate, this bug is fixed. Is there any reason we convert to numeric explicitly in this method? It would seem to be safe to leave it as integer but just change the class...

Would still like to know the root cause of why as.Date is being called, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

5 participants