Commit

add a property which stores its data in a compressed format
dound committed Apr 6, 2010
1 parent 9534888 commit ae5844c
Showing 1 changed file with 50 additions and 0 deletions.
50 changes: 50 additions & 0 deletions __init__.py
@@ -3,6 +3,7 @@
import logging
import os
import pickle
import zlib
from google.appengine.api import users
from google.appengine.ext import db

@@ -581,3 +582,52 @@ def make_value_from_datastore(self, value):
        if value is None:
            return None
        return self.index_to_choice[value]


class CompressedDataProperty(db.Property):
    """A property for storing compressed data or text.

    Example usage:

    >>> class CompressedDataModel(db.Model):
    ...     ct = CompressedDataProperty()

    You create a compressed data property, simply specifying the data or text:

    >>> model = CompressedDataModel(ct='example uses text too short to compress well')
    >>> model.ct
    'example uses text too short to compress well'
    >>> model.ct = 'green'
    >>> model.ct
    'green'
    >>> model.put() # doctest: +ELLIPSIS
    datastore_types.Key.from_path(u'CompressedDataModel', ...)
    >>> model2 = CompressedDataModel.all().get()
    >>> model2.ct
    'green'

    Compressed data is not indexed and therefore cannot be filtered on:

    >>> CompressedDataModel.gql("WHERE ct = :1", 'green').count()
    0
    """
    data_type = db.Blob

    def __init__(self, level=6, *args, **kwargs):
        """Constructor.

        Args:
          level: Controls the level of zlib's compression (between 1 and 9).
        """
        super(CompressedDataProperty, self).__init__(*args, **kwargs)
        self.level = level

    def get_value_for_datastore(self, model_instance):
        value = self.__get__(model_instance, model_instance.__class__)
        if value is not None:
            return db.Blob(zlib.compress(value, self.level))

    def make_value_from_datastore(self, value):
        if value is not None:
            return zlib.decompress(value)
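
The docstring's remark about text "too short to compress well" and the `level` argument can be tried out with plain `zlib`, outside App Engine. The following is a minimal sketch in Python 3 (the commit itself targets the Python 2 runtime); the helper name `compressed_size` is hypothetical and not part of the commit.

```python
import zlib


def compressed_size(data: bytes, level: int = 6) -> int:
    """Return the size in bytes of `data` after zlib compression.

    Hypothetical helper for illustration only; not part of the commit.
    """
    return len(zlib.compress(data, level))


short = b"green"
repetitive = b"green" * 1000

# Very short inputs grow, because zlib adds a header and a checksum.
print(len(short), compressed_size(short))

# Highly repetitive inputs shrink substantially.
print(len(repetitive), compressed_size(repetitive))

# Decompression restores the original bytes whatever level was used.
for level in (1, 6, 9):
    assert zlib.decompress(zlib.compress(repetitive, level)) == repetitive
```

Whether compression pays off therefore depends on the size and redundancy of the stored values, which is worth measuring on real data before adopting the property.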

5 comments on commit ae5844c

@dound
Owner Author

@dound dound commented on ae5844c Apr 6, 2010

Written as part of an answer for a StackOverflow question on the effectiveness of compression.

@Arachnid

Looks good - I'll integrate it. The only thing I would suggest is making it clear that this is for blobs, not text, since it doesn't specify encoding for unicode strings.

@dound
Owner Author

@dound dound commented on ae5844c Apr 9, 2010

Thanks, this is a really good point. It works as-is on "str" objects as well as on unicode strings that contain only ASCII characters. Any unicode string containing non-ASCII characters causes zlib.compress() to raise UnicodeEncodeError, though.

It looks like App Engine encodes all strings as UTF-8 before sending them to the datastore (regardless of how they are encoded initially). We could use that same strategy here: just encode "value" as UTF-8 before passing it to zlib.compress(), and then decode the result from UTF-8 after zlib.decompress().

This might add a little overhead for ASCII values which don't need to be UTF-8 encoded, but I suspect it wouldn't be noticeable (for str objects, at least, encoding could even be a no-op).
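
The encode-then-compress strategy described above could look roughly like the following. This is a sketch in Python 3, where `str` is always Unicode text and `zlib.compress()` accepts only bytes, so the explicit UTF-8 step is required rather than optional. The function names `compress_text` and `decompress_text` are hypothetical and are not part of the commit.

```python
import zlib


def compress_text(text: str, level: int = 6) -> bytes:
    """Encode text as UTF-8, then compress the resulting bytes."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_text(blob: bytes) -> str:
    """Decompress the bytes, then decode them from UTF-8."""
    return zlib.decompress(blob).decode("utf-8")


# A string with non-ASCII characters survives the round trip unchanged.
original = "caf\u00e9 na\u00efve \u2603"
assert decompress_text(compress_text(original)) == original
```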

Any thoughts?

@Arachnid

The correct way to handle this, and the way App Engine does it, is to have separate properties for text (TextProperty, StringProperty) and binary data (BlobProperty). The former requires encoding and decoding, while the latter does not. I'd suggest either breaking out the CompressedProperty the same way (CompressedBlobProperty and CompressedTextProperty), or using a flag that specifies the nature of the data.
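
One way to picture the suggested split is the framework-free sketch below, written in Python 3. It is not the code that was later pushed: the class names `CompressedBlobCodec` and `CompressedTextCodec` and the method names `to_datastore` and `from_datastore` are hypothetical stand-ins for what separate `db.Property` subclasses would do in their `get_value_for_datastore` and `make_value_from_datastore` methods.

```python
import zlib


class CompressedBlobCodec:
    """Compresses raw bytes; no encoding step is involved."""

    def __init__(self, level: int = 6):
        self.level = level

    def to_datastore(self, value):
        if value is None:
            return None
        return zlib.compress(value, self.level)

    def from_datastore(self, stored):
        if stored is None:
            return None
        return zlib.decompress(stored)


class CompressedTextCodec(CompressedBlobCodec):
    """Compresses text: UTF-8 encode before storing, decode after loading."""

    def to_datastore(self, value):
        if value is None:
            return None
        return super().to_datastore(value.encode("utf-8"))

    def from_datastore(self, stored):
        if stored is None:
            return None
        return super().from_datastore(stored).decode("utf-8")


blob_codec = CompressedBlobCodec()
text_codec = CompressedTextCodec(level=9)

# Binary data and text each round-trip through their own codec.
assert blob_codec.from_datastore(blob_codec.to_datastore(b"\x00\xff")) == b"\x00\xff"
assert text_codec.from_datastore(text_codec.to_datastore("gr\u00fcn")) == "gr\u00fcn"
```

Keeping two classes, rather than one class with a flag, makes the expected value type explicit at the point where a model declares the property.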

@dound
Owner Author

@dound dound commented on ae5844c Apr 10, 2010

Good idea. I've pushed this up in the latest commit.
