Switch from GZip to CodecZlib #31

Merged
merged 1 commit into from
Nov 4, 2017

alyst
Collaborator

@alyst alyst commented Oct 30, 2017

CodecZlib.jl will eventually completely replace GZip.jl.
It also offers a more complete implementation of the IO interface, e.g. GzipDecompressorStream could be used by CSV.jl to read gzipped tables.
So by switching to CodecZlib we avoid having two different packages providing gzip support in different contexts.
See JuliaIO/CodecZlib.jl#7 for more details.

GZip.jl checks whether the input stream is actually compressed, and silently switches to a no-op wrapper if it's not.
CodecZlib throws an exception for uncompressed streams, so we have to check for compression ourselves before wrapping the stream.
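
For readers unfamiliar with this kind of check, here is a minimal sketch of the idea in Python rather than Julia. It is not the code from this PR: the function name open_maybe_gzipped is hypothetical, and Python's gzip module merely stands in for CodecZlib. It peeks at the first two bytes and wraps the stream in a decompressor only when they match the gzip magic number.

```python
import gzip
import io

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(raw: io.BufferedReader):
    """Return a readable stream of the uncompressed bytes.

    Hypothetical helper: peeks at the header without consuming it and
    wraps `raw` in a gzip decompressor only when the magic number matches.
    """
    head = raw.peek(2)[:2]  # peek() does not advance the stream position
    if head == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=raw, mode="rb")
    return raw  # not gzip: hand the stream back unchanged


if __name__ == "__main__":
    payload = b"RDX2\nexample"
    for blob in (payload, gzip.compress(payload)):
        stream = open_maybe_gzipped(io.BufferedReader(io.BytesIO(blob)))
        print(stream.read() == payload)
```

Peeking instead of reading keeps the header in the stream, so whichever reader is returned still sees the data from its first byte.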

@alyst
Collaborator Author

alyst commented Oct 30, 2017

CodecZlib requires Julia v0.6, so it will probably have to wait until we switch to DataFrames v0.11.

@dmbates
Contributor

dmbates commented Oct 30, 2017

I'm not sure I understand the comment that requiring Julia v0.6 will have to wait on DataFrames v0.11. Can you elaborate?

Also, if checking the stream to determine the compression type will be required, it would be worthwhile allowing for other compression types. An .RData file saved with xz compression can be considerably smaller than a corresponding file with gz compression.

@alyst
Collaborator Author

alyst commented Oct 30, 2017

> I'm not sure I understand the comment that requiring Julia v0.6 will have to wait on DataFrames v0.11. Can you elaborate?

There's PR #28 that both drops Julia v0.5 compatibility and updates to DataFrames v0.11. So we can merge this PR as soon as #28 lands.
I don't know what's the status of pre-0.11 DataFrames and its dependencies on v0.6. If it's working on v0.6, we can cherry-pick from #28 the minimal changes that drop v0.5 and enable v0.6 support.

> Also, if checking the stream to determine the compression type will be required, it would be worthwhile allowing for other compression types.

Yes, that would be nice. So far I just implemented the "native" R compression formats: uncompressed (default for ASCII) and gzip (default for binary).
If we want to extend the range of supported archivers, maybe there's a package that identifies the compression type automatically. I don't much like the current ad hoc magic number check.

@dmbates
Contributor

dmbates commented Oct 30, 2017

Thanks for the reply. I'm not sure if I was clear that the R save function has an optional argument compress that defaults to gzip but also allows bzip2 and xz. If you check the datasets saved with many of the larger R packages they use xz to save storage.
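
As an illustration of how the three formats named here could be told apart, the sketch below (Python, not part of this PR) matches the leading bytes of the data against each format's magic number and dispatches to the matching standard-library decompressor. The names MAGIC_NUMBERS, detect_compression and decompress_auto are hypothetical; the magic byte sequences themselves come from the gzip, bzip2 and xz file formats.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes that identify each container format.
MAGIC_NUMBERS = {
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}

# Standard-library decompressors, keyed by the same format names.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "xz": lzma.decompress,
}


def detect_compression(head: bytes) -> str:
    """Return 'gzip', 'bzip2', 'xz', or 'none' based on the leading bytes."""
    for name, magic in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return "none"


def decompress_auto(data: bytes) -> bytes:
    """Decompress `data` if a known magic number is found, else return it as-is."""
    kind = detect_compression(data)
    if kind == "none":
        return data
    return DECOMPRESSORS[kind](data)
```

A table of magic numbers keeps the check in one place, so supporting another format means adding one entry to each dictionary rather than another branch.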

@alyst
Collaborator Author

alyst commented Oct 30, 2017

> Thanks for the reply. I'm not sure if I was clear that the R save function has an optional argument compress that defaults to gzip but also allows bzip2 and xz. If you check the datasets saved with many of the larger R packages they use xz to save storage.

Something new to learn today: I never read past the compress = isTRUE(!ascii) line in the docs! :)
I think the support for CodecXz.jl and CodecBzip2.jl should be optional (via Require.jl).
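
To make the "optional support" idea concrete, here is a loose Python analogue. It does not show how Require.jl works, and the function name load_optional_codecs is hypothetical. The sketch tries to import each codec module and registers a decompressor only for the ones that are actually installed, so a missing codec disables one format instead of breaking the whole package.

```python
import importlib


def load_optional_codecs(module_names=("bz2", "lzma")):
    """Return {module_name: decompress_function} for importable codec modules.

    Hypothetical helper: modules that are not installed are skipped
    silently, so the formats they provide are simply unavailable.
    """
    available = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # codec not installed: leave this format unsupported
        available[name] = module.decompress
    return available
```

A caller would then consult the returned dictionary and report an unsupported compression format when the codec it needs is absent.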

@ararslan
Member

> I don't know what's the status of pre-0.11 DataFrames and its dependencies on v0.6

DataFrames currently works just fine on 0.6.

@alyst alyst mentioned this pull request Nov 1, 2017
@alyst alyst merged commit d5abbd2 into master Nov 4, 2017
@alyst alyst deleted the codec_zlib branch November 4, 2017 13:30
@alyst alyst mentioned this pull request Jul 25, 2018