ENH: Handle .mtx.gz #28

Open
HaoZeke opened this issue Oct 22, 2023 · 4 comments

Labels: enhancement (New feature or request)

Comments

HaoZeke (Member) commented Oct 22, 2023

Future suggestion: support reading .mtx.gz files, such as the ones available for download from the NIST Matrix Market (for example https://math.nist.gov/MatrixMarket/data/NEP/quebec/qh1484.html).

First noted here: ropensci/software-review#606 (comment).

HaoZeke added the enhancement (New feature or request) label Oct 22, 2023
HaoZeke added this to the v1.1.0 milestone Oct 22, 2023
HaoZeke added and removed the ropensci-review (Issues related to the ROpenSci review) label Oct 24, 2023

HaoZeke (Member, Author) commented Oct 24, 2023

This is a significant feature request, and for now it can be handled without much difficulty outside of R. It will be addressed in a later release.

alugowski (Contributor) commented

It's possible that this could be easy.

A big reason FMM uses iostreams is to make it possible to plug in existing libraries for specialized cases like this one. Here are two ideas.

Idea one: use a gzip iostream wrapper. There are a bunch of lightweight-ish ones on GitHub; just note that some depend on zlib. If your users largely install precompiled binaries, Boost has a good one: https://www.boost.org/doc/libs/1_83_0/libs/iostreams/doc/classes/gzip.html . This would be a four-line solution, like the example at the bottom of that page.
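Whichever wrapper is used, the reader still has to decide when to apply it, so that plain .mtx files keep the fast path. A minimal sketch, assuming detection by the two-byte gzip magic number (0x1f 0x8b, defined in RFC 1952) rather than by the .gz suffix; starts_with_gzip_magic and is_gzip_file are hypothetical helper names, not part of FMM or Boost:

```cpp
#include <fstream>
#include <istream>
#include <string>

// Hypothetical helper (not part of fast_matrix_market or Boost): returns true
// if the stream starts with the gzip magic number 0x1f 0x8b (RFC 1952).
// The read position is restored, so the same stream can then be handed to
// either the plain reader or a decompressing wrapper. Needs a seekable stream.
bool starts_with_gzip_magic(std::istream& in) {
    const std::istream::pos_type start = in.tellg();
    unsigned char magic[2] = {0, 0};
    in.read(reinterpret_cast<char*>(magic), 2);
    const bool is_gz = in.gcount() == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
    in.clear();       // clear eof/fail bits if the input was shorter than 2 bytes
    in.seekg(start);  // rewind to where we started
    return is_gz;
}

// Hypothetical convenience overload for paths such as "qh1484.mtx.gz".
// Checking the content instead of the suffix also catches misnamed files.
// A file that cannot be opened is reported as "not gzip".
bool is_gzip_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    return file && starts_with_gzip_magic(file);
}
```

When is_gzip_file returns true, the stream would go through the decompressing wrapper (for example a Boost.Iostreams filtering_istream with a gzip_decompressor pushed onto it); otherwise it can take the existing plain std::ifstream path. Because the check seeks back to the start, it suits regular files but not pipes.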

Idea two: do what the Python bindings do: provide an adapter between Python's stream types and iostreams, then use Python's gzip decompressor. Such an adapter may or may not exist for R; I was lucky to find one for Python. Beyond gzip, you can use it to adapt any stream type the language offers (Python users often use StringIO/BytesIO objects), though I'm not sure that is a common usage pattern in R. So the upside is the extra flexibility and not having to maintain gzip/bz2 or other compression dependencies. The downside is that the adapter is likely slower than native C++ file IO, so you'll want two code paths. Gzip decompression is slow anyway.
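For idea two, the core of such an adapter is a std::streambuf subclass that refills its buffer by calling back into the host language. A minimal, stdlib-only sketch, assuming the host side can be wrapped as a callback that copies up to n bytes into a buffer and returns the count (0 at end of stream); CallbackStreambuf and ReadFn are hypothetical names, and this is not the adapter the Python bindings actually use:

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <streambuf>
#include <utility>
#include <vector>

// Hypothetical callback: copy up to n bytes into dst and return how many were
// written; returning 0 signals end of stream.
using ReadFn = std::function<std::size_t(char* dst, std::size_t n)>;

// Read-only streambuf that pulls bytes from a foreign source (for example a
// wrapped Python file object or R connection) through a ReadFn callback.
class CallbackStreambuf : public std::streambuf {
public:
    explicit CallbackStreambuf(ReadFn read, std::size_t buf_size = 1 << 16)
        : read_(std::move(read)), buf_(buf_size > 0 ? buf_size : 1) {
        // Start with an empty get area so the first read triggers underflow().
        setg(buf_.data(), buf_.data(), buf_.data());
    }

protected:
    // Called by std::istream when the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        const std::size_t n = read_(buf_.data(), buf_.size());
        if (n == 0) {
            return traits_type::eof();
        }
        // Expose the freshly read bytes as the new get area.
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    ReadFn read_;
    std::vector<char> buf_;
};
```

A std::istream constructed over this buffer (std::istream in(&buf);) can be handed to any reader that takes std::istream&. Every refill goes through the callback, which is one reason the adapter path is likely slower than native file IO; a larger buf_size amortizes that cost.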

HaoZeke (Member, Author) commented Nov 3, 2023

Thanks a ton for looking into this.

I think idea two is feasible for R as well, since R supports gzip-compressed files transparently.

However, from a performance perspective idea one is much nicer, and if the dependencies are not too heavy (i.e. the resulting library with its dependencies is small enough for CRAN and builds quickly enough, within 30 minutes), then that would probably be the best option.

I will investigate both :)

alugowski (Contributor) commented

Curious what you decide on!

I bet the performance will be comparable, since it'll likely be zlib doing the work either way. It just comes down to who wraps it better :P
