[Bug] data.table object with auto indexes saved under data.table 1.9.4 is not usable by data.table 1.9.6. #1396

Closed
GRandom opened this issue Oct 15, 2015 · 7 comments
@GRandom

GRandom commented Oct 15, 2015

Apparently, 1.9.6 prefixes the names of auto indexes with a double underscore (__) and checks for that prefix whenever a data.table is accessed. 1.9.4 does neither. As a result, objects saved under 1.9.4 are not usable under 1.9.6 and throw an error in R. The bug affects anyone who created and saved (perhaps a lot of) objects under 1.9.4 and then tries to use them in later versions.

Minimal example:
Under R > 3.2.1 and data.table <= 1.9.4, run the following:

library(data.table)
abc.temp <- data.table(a = c(1, 2), b = c(3, 4))
abc.temp[a %in% c(1), b := 1]   # the %in% subset creates an auto index on column a
save(abc.temp, file = "WhateverFolderWhateverFile.RData")

str(abc.temp) output is:
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
 $ a: num 1 2
 $ b: num 1 4
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "index")= atomic
  ..- attr(*, "a")= int

Under R >= 3.2.1 and data.table 1.9.6, open the file and try to use the object:
load("WhateverFolderWhateverFile.RData", verbose = TRUE)
abc.temp[,c:=5]

R throws an error:
Error in `[.data.table`(abc.temp, , `:=`(c, 5)) :
Internal error: __ not found in index name

The same code under data.table 1.9.6 creates an auto index whose name has the double-underscore prefix:

abc.temp<-data.table(a=c(1,2),b=c(3,4))
abc.temp[a%in%c(1),b:=1]

str(abc.temp) output is
Classes ‘data.table’ and 'data.frame': 2 obs. of 2 variables:
 $ a: num 1 2
 $ b: num 1 4
 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "index")= atomic
  ..- attr(*, "__a")= int
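
The difference between the two str() outputs can also be checked programmatically. Below is a minimal sketch, assuming the index attribute is laid out exactly as printed above (entries named "a" under 1.9.4 and "__a" under 1.9.6). has_legacy_index is a hypothetical helper, not part of data.table, and it uses only base R.

```r
# Hypothetical helper (not part of data.table): TRUE when an object carries an
# "index" attribute with at least one entry that lacks the "__" prefix.
has_legacy_index <- function(x) {
  idx <- attr(x, "index", exact = TRUE)
  if (is.null(idx)) return(FALSE)      # no index stored at all
  nms <- names(attributes(idx))        # e.g. "a" (1.9.4) or "__a" (1.9.6)
  any(!grepl("^__", nms))              # any entry without the prefix?
}
```

Running this on every object right after load() would flag the ones that need attention before they are used with `:=`.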
@jangorecki
Member

I'm not sure if it should be addressed now. It is definitely worth remembering for future changes, so that they are handled somehow (for example, by renaming the index attribute), but now that 1.9.7/1.9.8 is in development it is already too late.
Assuming you want to keep your dependencies up to date, the best practice is to set up CI on your project that includes the dev versions of your dependencies.

@gsee

gsee commented Oct 22, 2015

Please clarify what causes the attribute to be created. I've got a lot of RData files created over the past several years with 1.9.4 or earlier and it would be costly to re-create them. I haven't updated data.table on the server that creates them yet, but when I copy an RData file to a computer that has 1.9.6 installed they load just fine. I set auto.index to FALSE all over the place because of some issues it had in the beginning. Does that prevent the attribute from being created? I need to be very careful not to do something that will make these files unreadable, as it would take at least several months to re-create them. Thanks in advance.

Also, @jangorecki, can you elaborate on how setting up CI would help? Would it just let me know that I have a problem, or would it somehow solve it?

@jangorecki
Member

@gsee it would only let you know about the problem.

Assuming you have auto.index set to FALSE, then AFAIK only set2key should create an index.

As for the migration workflow, it should be something like this, on 1.9.6:

  1. load RData
  2. look for the index attr
  3. copy index colnames to variable
  4. use set2keyv and your index-cols-var to get 1.9.6 index created
  5. save RData
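
The five steps above could be sketched roughly as follows. This is only an illustration under stated assumptions. migrate_index is a hypothetical helper, not a data.table function. It assumes single-column indexes whose stored names are plain column names, as in the 1.9.4 output in the report. It also clears the old index attribute before rebuilding, which goes slightly beyond the listed steps. set2keyv is the function named above; later data.table releases, as far as I know, provide setindexv for the same job, so the sketch uses whichever one the installed version has.

```r
# Hypothetical helper (not part of data.table): rebuild the indexes of a
# loaded data.table so their stored names follow the installed version's
# naming convention. Assumes single-column indexes only.
migrate_index <- function(dt) {
  idx <- attr(dt, "index", exact = TRUE)          # step 2: look for the index attr
  if (is.null(idx)) return(invisible(dt))         # nothing to migrate

  cols <- sub("^__", "", names(attributes(idx)))  # step 3: index names -> column names
  cols <- intersect(cols, names(dt))              # skip anything that is not a plain column

  # Pick the index-building function that the installed data.table provides.
  ns <- asNamespace("data.table")
  build <- if (exists("setindexv", envir = ns, inherits = FALSE)) {
    get("setindexv", envir = ns)
  } else {
    get("set2keyv", envir = ns)
  }

  data.table::setattr(dt, "index", NULL)          # drop the old-style index (by reference)
  for (col in cols) build(dt, col)                # step 4: recreate each index
  invisible(dt)
}

# Usage sketch, with the file and object names from the report (steps 1 and 5):
# load("WhateverFolderWhateverFile.RData")
# migrate_index(abc.temp)
# save(abc.temp, file = "WhateverFolderWhateverFile.RData")
```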

@gsee

gsee commented Oct 22, 2015

thanks @jangorecki

@djhurio

djhurio commented Oct 27, 2015

I am still getting the same error with data.table version 1.9.6. Another way to deal with it is to delete the index attribute. It seems to work, but I am not sure whether it causes other issues.

attr(dat, "index") <- NULL

@arunsrinivasan
Member

The bug was spotted on v1.9.6 which is already on CRAN. We can't change that. The fix will be available for next release, v1.9.8.

Yes, you can set the attribute to NULL and that wouldn't cause other issues. Although I'd do it using setattr().
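
For reference, a minimal sketch of the setattr() form of that workaround. drop_index is a hypothetical wrapper, and `dat` stands for a data.table loaded from an old .RData file, as in the comment above. setattr() changes the attribute by reference, so the table is not copied the way it may be with `attr<-`.

```r
# Hypothetical wrapper (not part of data.table): remove the stored index
# from a data.table in place.
drop_index <- function(dat) {
  data.table::setattr(dat, "index", NULL)  # by reference; the data is untouched
  invisible(dat)
}
```

After this call the table has no index, so a later subset can create a fresh one under the installed version's naming.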

@jangorecki
Member

@djhurio if you need it now you can always install 1.9.7 version, see Installation. But it may be in fact easier to just recreate the index and save new objects.
