
add a property which stores its data in a compressed format

1 parent 9534888 commit ae5844cdd69ee3aa4c0f223b75930a3d6e6a5cb7 @dound committed Apr 6, 2010
Showing with 50 additions and 0 deletions.
  1. +50 −0 __init__.py
@@ -3,6 +3,7 @@
 import logging
 import os
 import pickle
+import zlib
 from google.appengine.api import users
 from google.appengine.ext import db
@@ -581,3 +582,52 @@ def make_value_from_datastore(self, value):
     if value is None:
       return None
     return self.index_to_choice[value]
+
+
+class CompressedDataProperty(db.Property):
+  """A property for storing compressed data or text.
+
+  Example usage:
+
+  >>> class CompressedDataModel(db.Model):
+  ...   ct = CompressedDataProperty()
+
+  To create a compressed data property, simply specify the data or text:
+
+  >>> model = CompressedDataModel(ct='example uses text too short to compress well')
+  >>> model.ct
+  'example uses text too short to compress well'
+  >>> model.ct = 'green'
+  >>> model.ct
+  'green'
+  >>> model.put() # doctest: +ELLIPSIS
+  datastore_types.Key.from_path(u'CompressedDataModel', ...)
+
+  >>> model2 = CompressedDataModel.all().get()
+  >>> model2.ct
+  'green'
+
+  Compressed data is not indexed and therefore cannot be filtered on:
+
+  >>> CompressedDataModel.gql("WHERE ct = :1", 'green').count()
+  0
+  """
+  data_type = db.Blob
+
+  def __init__(self, level=6, *args, **kwargs):
+    """Constructor.
+
+    Args:
+      level: Controls the level of zlib's compression (between 1 and 9).
+    """
+    super(CompressedDataProperty, self).__init__(*args, **kwargs)
+    self.level = level
+
+  def get_value_for_datastore(self, model_instance):
+    value = self.__get__(model_instance, model_instance.__class__)
+    if value is not None:
+      return db.Blob(zlib.compress(value, self.level))
+
+  def make_value_from_datastore(self, value):
+    if value is not None:
+      return zlib.decompress(value)

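Stripped of the App Engine plumbing, the two datastore hooks above amount to a None-preserving zlib round trip. The sketch below assumes Python 3 (so values are bytes rather than Python 2 `str`), and the helper names `to_datastore` and `from_datastore` are hypothetical stand-ins for `get_value_for_datastore` and `make_value_from_datastore`. It also illustrates the doctest's remark about short text: zlib's fixed header and checksum make tiny inputs grow, while repetitive ones shrink.

```python
import zlib

def to_datastore(value, level=6):
    """Hypothetical stand-in for get_value_for_datastore: compress non-None bytes."""
    if value is None:
        return None
    return zlib.compress(value, level)

def from_datastore(value):
    """Hypothetical stand-in for make_value_from_datastore: decompress non-None bytes."""
    if value is None:
        return None
    return zlib.decompress(value)

short = b'green'
long_text = b'green ' * 200

for sample in (short, long_text):
    stored = to_datastore(sample)
    assert from_datastore(stored) == sample  # lossless round trip
    print(len(sample), '->', len(stored))    # the short sample grows, the long one shrinks
```

Because of that fixed overhead, compression only pays off for values long enough (and redundant enough) to amortize it.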
5 comments on commit ae5844c

Owner

dound replied Apr 6, 2010

Written as part of an answer for a StackOverflow question on the effectiveness of compression.

Looks good - I'll integrate it. The only thing I would suggest is making it clear that this is for blobs, not text, since it doesn't specify an encoding for unicode strings.

Owner

dound replied Apr 9, 2010

Thanks, this is a really good point. It works as-is on "str" objects as well as on unicode strings that contain only ASCII characters. Any unicode string containing non-ASCII characters, though, causes zlib.compress() to raise UnicodeEncodeError, because Python 2 implicitly converts the unicode value to str with the ASCII codec.
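In Python 3 the same mismatch is reported differently: `zlib.compress()` accepts only bytes-like objects, so any `str` (ASCII or not) is rejected with a `TypeError` until it is explicitly encoded. A minimal check, where `try_compress` is a hypothetical helper:

```python
import zlib

def try_compress(value):
    """Hypothetical helper: return compressed bytes, or None if zlib refuses the input."""
    try:
        return zlib.compress(value)
    except TypeError:
        # Python 3's zlib only accepts bytes-like objects, never str
        return None

text = 'café'  # contains the non-ASCII character 'é'
print(try_compress(text))                               # None: str is not bytes-like
print(try_compress(text.encode('utf-8')) is not None)   # True: accepted once encoded
```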

It looks like App Engine encodes all strings as UTF-8 before sending them to the datastore (regardless of how they are encoded initially). We could use the same strategy here: encode "value" as UTF-8 before passing it to zlib.compress(), then decode the result from UTF-8 after zlib.decompress().

This might add a little overhead for ASCII values, which don't need to be UTF-8 encoded, but I suspect the overhead wouldn't be noticeable (encoding could even be a no-op, at least for str objects).

Any thoughts?
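The UTF-8 strategy proposed above could look roughly like this. It is a sketch assuming Python 3 text (`str`); `compress_text` and `decompress_text` are hypothetical names, not part of the commit. It also backs up the point about ASCII: UTF-8 encodes ASCII characters byte-for-byte, so the encoding step leaves such values unchanged.

```python
import zlib

def compress_text(value, level=6):
    """Hypothetical helper: UTF-8 encode, then compress (None passes through)."""
    if value is None:
        return None
    return zlib.compress(value.encode('utf-8'), level)

def decompress_text(value):
    """Hypothetical helper: decompress, then UTF-8 decode (None passes through)."""
    if value is None:
        return None
    return zlib.decompress(value).decode('utf-8')

# Non-ASCII text now survives the round trip
for sample in ('green', 'café', 'naïve ' * 50):
    assert decompress_text(compress_text(sample)) == sample

# For pure ASCII, UTF-8 encoding produces exactly the same bytes
assert 'green'.encode('utf-8') == b'green'
```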

The correct way to handle this, and the way App Engine does it, is to have separate properties for text (TextProperty, StringProperty) and binary data (BlobProperty). The former requires encoding and decoding, while the latter does not. I'd suggest either splitting CompressedDataProperty the same way (CompressedBlobProperty and CompressedTextProperty) or using a flag that specifies the nature of the data.
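One way to picture the suggested split, sketched without App Engine: a base codec that compresses raw bytes and a text subclass that wraps it with the UTF-8 encode/decode step. The class and method names below are hypothetical stand-ins for `db.Property` subclasses; they are not App Engine APIs, and the code assumes Python 3.

```python
import zlib

class CompressedBlobCodec:
    """Hypothetical stand-in for a CompressedBlobProperty: bytes in, bytes out."""

    def __init__(self, level=6):
        self.level = level  # zlib compression level

    def to_stored(self, value):
        if value is None:
            return None
        return zlib.compress(value, self.level)

    def from_stored(self, value):
        if value is None:
            return None
        return zlib.decompress(value)


class CompressedTextCodec(CompressedBlobCodec):
    """Hypothetical stand-in for a CompressedTextProperty: str in, str out."""

    def to_stored(self, value):
        if value is None:
            return None
        # Text is encoded to UTF-8 before the blob codec compresses it
        return super().to_stored(value.encode('utf-8'))

    def from_stored(self, value):
        raw = super().from_stored(value)
        # Decode back to text only after decompression
        return None if raw is None else raw.decode('utf-8')


blob = CompressedBlobCodec()
text = CompressedTextCodec(level=9)
assert blob.from_stored(blob.to_stored(b'\x00\xff')) == b'\x00\xff'
assert text.from_stored(text.to_stored('café')) == 'café'
```

Keeping the encoding step in the text subclass means binary data never passes through a text codec, which mirrors how App Engine keeps BlobProperty separate from TextProperty.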

Owner

dound replied Apr 10, 2010

Good idea. I've pushed this up in the latest commit.
