load not working #75

Closed
timothyslau opened this issue Jun 24, 2020 · 10 comments

@timothyslau

In R:
titanic <- data.frame(Titanic); save(titanic, file = "titanic.rdata")

In Julia:
titanic = RData.load("titanic.rdata", convert=true)

This does not return a DataFrame, as I think convert=true suggests it would.

@alyst
Collaborator

alyst commented Jun 24, 2020

load() should return a dictionary whose keys are the names of the variables you passed to save().
What does titanic["titanic"] give you?
So far it looks like you just need to fix your script, but if it's a real bug, we would need the details: R/Julia/RData versions, a log of the Julia output, and the "titanic.rdata" file, if possible.

@timothyslau
Author

I understood from the README.md file that when convert = true it should "...automatically convert R objects into Julia equivalent."

But it returns a dictionary, not the DataFrame that the README.md suggests.

@alyst
Collaborator

alyst commented Jun 25, 2020

It does what it says, but, as you might know, an .rdata file can store multiple objects. So load() tries to convert all of them into appropriate Julia equivalents and returns the result as a dictionary.

What you probably want is an .rds file, which stores only a single object. But you have to use the saveRDS() function in R to generate it, and the filename should have the .rds extension.

@alyst alyst closed this as completed Jun 30, 2020
@alyst alyst added the invalid label Jun 30, 2020
@timothyslau
Author

timothyslau commented Jul 8, 2020

I tried the method you described of creating and loading .rds files.
If I run in R:
saveRDS(object = Orange, file = "C:\\Orange.rds")

and then in Julia:
RData.load("C:\\Orange.rds")

I get 2 warnings about elements in lists and "assuming it's the last element"

But it does create a dataframe now.

Is this warning because I've specified some argument incorrectly or is it a bug?

@alyst
Collaborator

alyst commented Jul 8, 2020

It's not a bug.
R 3.5 introduced an "alternative representation" for some objects.
I fixed RData.jl to work with most of the cases that I encountered.
But since I don't have time to reverse-engineer the updated R save()/load() code, and there is no specification of the format changes, RData.jl might still fail to read some .RData files.
So this warning is just a diagnostic in case something goes wrong.
However, in most cases, the new .RData files are read just fine.

I assume that in your case the Julia DataFrame matches the one in R.

@timothyslau
Author

Yes, the Julia DataFrame looks correct,
I just wasn't sure why I was getting warnings.
Thanks for the explanation.
If I understand correctly, essentially there are some changes in newer versions of R that the RData.jl package hasn't been updated to accommodate. If this isn't a bug, should it go in a backlog of issues to correct by someone at some point in the future?

@alyst
Collaborator

alyst commented Jul 8, 2020

If this isn't a bug, should it go in a backlog of issues to correct by someone at some point in the future?

There are some comments in the code. But since there's no specification of the format, nor a clear understanding of where unsupported features may appear in real life, it's difficult to write an issue that somebody can pick up and work on. We will fix the format issues as people discover them.

But if anyone volunteers to review the format changes and implement them, they are most welcome.

@reumle

reumle commented Nov 7, 2020

Hi,

  1. Is it ok if I open an issue with this format problem, so as to flag the help needed to review it?
  2. Is this the relevant part of the 3.5 release notes, as far as you can tell?
R has new serialization format (version 3) which supports custom serialization of ALTREP framework objects. These objects can still be serialized in format 2, but less efficiently. Serialization format 3 also records the current native encoding of unflagged strings and converts them when de-serialized in R running under different native encoding. Format 3 comes with new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by version = 3 in save(), serialize() and saveRDS(), but format 2 remains the default for all serialization and saving of the workspace. Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
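
To make the magic numbers in that note concrete, here is a minimal Python sketch (not part of RData.jl) that reports which serialization format a save() file declares. The function name sniff_save_format is hypothetical. The sketch assumes the file is either uncompressed or gzip-compressed, that the format 2 counterparts are RDA2, RDB2 and RDX2, and that the letters A, B and X stand for the ASCII, binary and XDR encodings. It only looks at save() files; as far as I can tell, files written by saveRDS() carry no such magic number.

```python
import gzip

# The letter after "RD" names the encoding; the digit is the format version.
ENCODINGS = {b"A": "ascii", b"B": "binary", b"X": "xdr"}


def sniff_save_format(raw: bytes):
    """Return (encoding, version) from a save() magic number, or None."""
    # Assumption: the data is uncompressed or gzip-compressed.
    # gzip streams start with the two bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)

    magic = raw[:4]
    prefix, letter, digit = magic[:2], magic[2:3], magic[3:4]
    if prefix != b"RD" or letter not in ENCODINGS or digit not in (b"2", b"3"):
        return None  # no recognised save() magic number

    return ENCODINGS[letter], int(digit.decode("ascii"))
```

For a file on disk, something like sniff_save_format(open("titanic.rdata", "rb").read()) would be the intended use. A result with version 3 would mean the file was written with version = 3 and may contain the ALTREP objects discussed above.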

On my side, I might be able to help analyze the issue, but I am mainly an R user and have never looked into its source code...

Cheers!

@alyst
Collaborator

alyst commented Nov 7, 2020

@reumle Sure. You can open the issue, and the PRs that improve the behaviour are welcome.
The v3 format is, in principle, supported, but there are special situations (like the ones that cause warnings or #74) that need to be fixed. It would be nice if the issue(s) are focused on those.

@reumle

reumle commented Nov 8, 2020 via email
