forked from Arachnid/aetycoon
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add a property which stores its data in a compressed format
- Loading branch information
Showing
1 changed file
with
50 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ae5844c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Written as part of an answer for a StackOverflow question on the effectiveness of compression.
ae5844c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I'll integrate it. The only thing I would suggest is making it clear that this is for blobs, not text, since it doesn't specify encoding for unicode strings.
ae5844c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a really good point. It works as-is on "str" objects as well unicode strings which only contain ASCII characters. Any unicode encoding containing non-ASCII characters causes zlib to raise UnicodeDecodeError though.
It looks like app engine encodes all strings as UTF-8 before sending them to the datastore (regardless of how they are encoded initially). We could use that same strategy here -- just encode "value" as UTF-8 before passing it to zlib.compress() and the decode the value from UTF-8 after zlib.decompress().
This might add a little overhead for ASCII values which don't need to be UTF-8 encoded, but I suspect it wouldn't be a noticeable overhead (encoding could even be just a no-op for str objects at least).
Any thoughts?
ae5844c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct way to handle this, and the way App Engine does it, is to have separate properties for text (TextProperty, StringProperty) and binary data (BlobProperty). The former requires encoding and decoding, while the latter does not. I'd suggest either breaking out the CompressedProperty the same way (CompressedBlobProperty and CompressedTextProperty), or using a flag that specifies the nature of the data.
ae5844c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I've pushed this up in the latest commit.