-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds support for .rds files with readRDS function #33
Conversation
…of data frames now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for your contribution! I think we would need to make one more iteration and change readRDS
into load
(+ necessary changes to FileIO
) to ensure API uniformity.
src/RData.jl
Outdated
@@ -87,4 +88,22 @@ end | |||
|
|||
load(s::Stream{format"RData"}; kwoptions...) = load(s, kwoptions) | |||
|
|||
function readRDS(f::AbstractString; kwoptions...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried to add RDS format to FileIO.jl? It should be straightforward and almost identical to what is there for RData (with the exception of "RDSX" header line). So readRDS()
would change into load(s::File{format"RDataSingle"}/Stream{format"RDataSingle"}, ...)
pair of methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've gotten things in the state you wished: factored out decompress, reduced the test set for the rds to just a data.frame but in binary compressed and decompressed and ascii formats, and duplicated the load interface and removed readRDS.
This all seems to work for me locally with local changes to FileIO to register the format. I couldn't figure out how to get the format to register in the RData package itself to make testing easier before the FileIO pull.
On that note...
If you know anything about the file formats, can you take a look at the detect_rdata_single function I'm proposing in my FileIO branch before requesting the pull? I don't know anything about the formats, and it seems that when the RDSX header line is removed, not much is left before the file format itself begins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think we're almost there!
I couldn't figure out how to get the format to register in the RData package itself to make testing easier before the FileIO pull.
AFAIR, you can do FileIO.add_format()
to add the format to the FileIO
registry externally, but I'm not sure it really works since it happens after the package initialization.
Anyway, adding RDS support directly to FileIO
is the right thing to do.
Your FileIO branch looks ok to me. Do you know whether ".rds" is the only extension or other alternatives (".RDS") also exist? If so, I would list them to add_format()
call too.
For the FileIO PR to be merged you also need to add the test (see test/query.jl
for how it was done for RData). As long as detect_rdata_single()
works for the simple test file, that function is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also mention this PR in NEWS.md
?
src/RData.jl
Outdated
function readRDS(f::AbstractString; kwoptions...) | ||
io = open(f, "r") | ||
try | ||
gzipped = read(io, UInt8) == 0x1F && read(io, UInt8) == 0x8B # check GZip magic number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a duplication of the code from load(format{"RData"})
. I think it's a good idea to extract it into a separate function, especially since we are going to extend the supported compression formats. E.g.
# check whether the stream is compressed (Gzip) and return the decompressed stream
function decompress(io::IO)
gzipped = read(io, UInt8) == 0x1F && read(io, UInt8) == 0x8B # check GZip magic number
seekstart(io)
if gzipped
return GzipDecompressorStream(io) # if compressed, transcode gzipped stream
else
return io
end
end
It should be called then within try
block: io = decompress(io)
test/generate_rda.R
Outdated
|
||
# for converting rda files to rds to test with readRDS | ||
rdafiles = list.files("data/", pattern="*.rda", full.names=T) | ||
for (rdafile in rdafiles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary to have the equivalents of RDA test files as RDS: the serialization format is the same and the code that reads these objects is also the same. So these tests would just test the same code with the same data.
I think it should be enough to test some moderately complex object (say, data.frame), but check that both binary (compressed and uncompressed) and ASCII versions of RDS format are supported.
* requires FileIO rdata_single branch to be pulled https://github.com/jsams/FileIO.jl/tree/rdata_single/
test/RDS.jl
Outdated
chr = ["ab", "c"], | ||
factor = pool(["ab", "c"]), | ||
cplx = Complex128[1.1+0.5im, 1.0im]) | ||
rdf = sexp2julia(load("$testdir/data/types.rds",convert=false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to test both convert=false
and convert=true
(the default) modes. convert=true
should return DataFrame
directly
Ok, I've made a PR in FileIO, updated the news (though it's not clear how you see the versions moving forward, please modify to your taste), and had the tests also test the default convert=true condition. |
test/RDS.jl
Outdated
factor = pool(["ab", "c"]), | ||
cplx = Complex128[1.1+0.5im, 1.0im]) | ||
rdf = sexp2julia(load("$testdir/data/types.rds",convert=false)) | ||
@test eltypes(rdf) == eltypes(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed that: I think before calling eltypes()
or comparing with df
it makes sense to do @test rdf isa DataFrame
(for the other load()
as well). If load
goes completely wrong, it would help to diagnose it properly, otherwise @test eltype...
will fail with MethodError
.
@jsams Thanks! Let's see what happens with FileIO PR. I will merge once that one is merged. Maybe by that time there would be some progress with POSIXct, then we can tag 0.3.0 with both these features. |
Your PR to FileIO is merged and is available in v0.6.0. |
I've brought this in line with the latest master. runtests throws some errors in the master, but my additions don't add any errors and all of my tests pass. Any reason to not pull? |
I think this would more-or-less close #22 though it doesn't integrate with the load functionality
Note that the tests are ever so slightly different. This is on purpose. rds files do not name the objects they contain. I converted all the rda test files into named lists, and saved them as rds files. So, while the list itself does not contain a name, it contains objects which do have names, which is what you are accessing after conversion into the julia type. With convert=false, a bare R type is returned.