Under certain circumstances it is advantageous to supply a `buffer` type to the compression and decompression routines. My use case is decompressing Numpy arrays from a string in Bloscpack. The input data is a string, and during decompression the individual chunks need to be taken from that string. To do this you could, for example, slice the string. However, AFAIU this necessitates a memory copy. The current implementation uses a cStringIO, since much of Bloscpack is implemented using file pointers. However, from what I understand, reading from a cStringIO object also requires a memory copy. The alternative is to use the `buffer` builtin to emulate a file: esc/bloscpack@dd77456

And then use that as a source for the compressed data.
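To illustrate the idea of emulating a file on top of a buffer, here is a minimal sketch. It is not the code from the commit referenced above: the class name `BufferReader` and its methods are hypothetical, and it uses `memoryview` rather than the `buffer` builtin so that it also runs on Python 3. The point is only that `read` hands out views onto the original data instead of copies.

```python
class BufferReader(object):
    """Hypothetical file-like wrapper that hands out zero-copy views."""

    def __init__(self, data):
        # memoryview exposes the underlying bytes without copying them
        self._view = memoryview(data)
        self._pos = 0

    def read(self, size=-1):
        # Return a view onto the next `size` bytes (or the rest if size < 0)
        total = len(self._view)
        if size is None or size < 0:
            end = total
        else:
            end = min(self._pos + size, total)
        chunk = self._view[self._pos:end]  # slicing a view does not copy
        self._pos = end
        return chunk

    def seek(self, offset):
        # Absolute seek only, clamped to the valid range
        self._pos = max(0, min(offset, len(self._view)))

    def tell(self):
        return self._pos


# Made-up payload standing in for a header followed by two chunks
payload = b"headerchunk-onechunk-two"
reader = BufferReader(payload)
header = reader.read(6)
first = reader.read(9)
```

Each value returned by `read` still refers to `payload`, so pulling out a chunk costs no extra memory until something explicitly converts it with `bytes(...)`.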
Initial benchmarks look promising:
Here, `fast_unpack_ndarray_str` uses the `buffer` under the hood. In fact, this solution is of the same order as the plain `compress_ptr` method:

Support was easy to implement since we use `s#` in the `PyArg_ParseTuple`, which can accept a `buffer` as input and will cast that into a c-string:

So the only thing left to do was to allow this from the `toplevel.py`.
A remaining issue is that `buffer` was removed in Python 3. The documentation seems to suggest using a `memoryview` instead, but I can't get it to work. The error is something about requiring a 'pinned' read-only buffer. I hope to look into a Py3 solution soon but wanted to float this idea already anyway.
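For reference, this is roughly the pattern a Python 3 solution would be aiming for, sketched with the standard library only. Since passing a `memoryview` to blosc is exactly the part that does not work yet, `zlib` is used here purely as a stand-in decompressor. The length-prefixed layout and the names `pack_chunks` and `unpack_chunks` are assumptions made for this example and are not the Bloscpack format.

```python
import struct
import zlib


def pack_chunks(chunks):
    """Concatenate compressed chunks, each prefixed by a 4-byte length."""
    out = bytearray()
    for chunk in chunks:
        compressed = zlib.compress(chunk)
        out += struct.pack("<I", len(compressed))
        out += compressed
    return bytes(out)


def unpack_chunks(packed):
    """Decompress every chunk, slicing a memoryview instead of the bytes."""
    view = memoryview(packed)
    pos = 0
    result = []
    while pos < len(view):
        # Read the length prefix straight from the view
        (size,) = struct.unpack_from("<I", view, pos)
        pos += 4
        # view[pos:pos + size] is a zero-copy window onto `packed`
        result.append(zlib.decompress(view[pos:pos + size]))
        pos += size
    return result


original = [b"a" * 1000, b"b" * 500]
packed = pack_chunks(original)
restored = unpack_chunks(packed)
```

The only copies made in `unpack_chunks` are the decompressed outputs themselves; the compressed input is never sliced into intermediate strings.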