This project is now called Schematics and exists over here: https://github.com/j2labs/schematics
This code and documentation remain here to preserve the amusing history of this project.
Aside from being a cheeky excuse to make people say things that sound sorta dirty, DictShield is a database-agnostic modeling system. It provides a way to model, validate and reshape data easily. All without requiring any particular database.
A blog model might look like this:
from dictshield.document import Document
from dictshield.fields import StringField
class BlogPost(Document):
    title = StringField(max_length=40)
    body = StringField(max_length=4096)
DictShield objects serialize to JSON by default. Store them in Memcached, MongoDB, Riak, whatever you need.
>>> from dictshield.document import Document
>>> from dictshield.fields import StringField
>>> class Comment(Document):
...     name = StringField(max_length=10)
...     body = StringField(max_length=4000)
...
>>> data = {'name':'a hacker', 'body':'DictShield makes validation easy'}
>>> Comment(**data).validate()
True
Let's see what happens if we try using invalid data.
>>> data['name'] = 'a hacker with a name that is too long'
>>> Comment(**data).validate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/site-packages/dictshield/document.py", line 280, in validate
    field._validate(value)
  File "/path/to/site-packages/dictshield/fields/base.py", line 99, in _validate
    self.validate(value)
  File "/path/to/site-packages/dictshield/fields/base.py", line 224, in validate
    self.field_name, value)
dictshield.base.ShieldException: String value is too long - name:a hacker with a name that is too long
Combining DictShield with JSON coming from a web request is quite natural as well. Say we have some data coming in from an iPhone:
json_data = request.post.get('data')
data = json.loads(json_data)
Validating the data then looks like this: Comment(**data).validate().
Easy.
DictShield aims to provide helpers for a few common modeling needs. It has been useful on the server side so far, but I believe it could also serve for building an RPC.
- Creating Flexible Documents
- Easy To Use With Databases Or Caches
- A Type System
- Validation Of Types
- Input / Output Shaping
DictShield also allows object hierarchies to be mapped into dictionaries. This is useful primarily to those who use DictShield to instantiate classes representing their data instead of just filtering dictionaries through the class's static methods.
There are a few ways to use DictShield. A simple case is to create a class structure that has typed fields. DictShield offers multiple types in fields.py, like an EmailField or DecimalField.
Below is an example of a Media class with a single field, the title.
from dictshield.document import Document
from dictshield.fields import StringField
class Media(Document):
    """Simple document that has one StringField member
    """
    title = StringField(max_length=40)
You create the class just like you would any Python class. And we'll see how that class is represented when serialized to a Python dictionary.
m = Media()
m.title = 'Misc Media'
m.to_python()
The output from this looks like:
{
'_types': ['Media'],
'_cls': 'Media',
'title': u'Misc Media'
}
All the meta information is removed and we have just a barebones representation of our data. Notice that the class information is still there as two keys that come from Media's metaclass: _cls and _types.
_types stores the hierarchy of Document classes used to create the document. _cls stores the name of the specific class the instance was created from. This becomes more obvious when I subclass Media to create the Movie document below.
import datetime
from dictshield.fields import IntField
class Movie(Media):
    """Subclass of Media. Adds year and personal_thoughts and limits
    publicly shareable fields to 'title' and 'year'.
    """
    _public_fields = ['title', 'year']
    year = IntField(min_value=1950,
                    max_value=datetime.datetime.now().year)
    personal_thoughts = StringField(max_length=255)
Here's an instance of the Movie class:
mv = Movie()
mv.title = u'Total Recall'
mv.year = 1990
mv.personal_thoughts = u'I wish I had three hands...'
This is the document serialized to a Python dictionary:
{
'personal_thoughts': u'I wish I had three hands...',
'_types': ['Media', 'Media.Movie'],
'title': u'Total Recall',
'_cls': 'Media.Movie',
'year': 1990
}
Notice that _types has kept track of the relationship between Movie and Media.
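As a rough illustration of how such dotted class names could be derived, here is a minimal sketch that walks a class's method resolution order. The Document base class and the class_metadata helper below are hypothetical stand-ins written for this example; they are not DictShield's implementation.

```python
class Document(object):
    """Hypothetical base class for this sketch; not DictShield's Document."""

    @classmethod
    def class_metadata(cls):
        # Walk the MRO from the oldest ancestor down, skipping Document and object.
        lineage = [c.__name__ for c in reversed(cls.__mro__)
                   if c not in (Document, object)]
        # Build cumulative dotted names, e.g. 'Media' then 'Media.Movie'.
        types = ['.'.join(lineage[:i + 1]) for i in range(len(lineage))]
        return {'_types': types, '_cls': types[-1]}


class Media(Document):
    pass


class Movie(Media):
    pass


media_meta = Media.class_metadata()
movie_meta = Movie.class_metadata()
```

With these definitions, movie_meta carries both the full ancestry and the most specific class name, which is the same shape as the serialized Movie dictionary above.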
We could pass this directly to Mongo to save it.
>>> db.test_collection.save(m.to_python())
Or if we were using Riak.
>>> media = bucket.new('test_key', data=m.to_python())
>>> media.store()
Or maybe we're storing JSON in Memcached.
>>> mc["test_key"] = m.to_json()
DictShield has its own type system: every field within a Document is defined with a specific type. For example, a string is defined as a StringField. This "strong typing" makes serializing and deserializing semi-structured data to and from Python much more robust.
A complete list of the types supported by DictShield:
TYPE | DESCRIPTION |
---|---|
Text fields | |
StringField | A unicode string |
URLField | A valid URL |
EmailField | A valid email address |
ID fields | |
UUIDField | A valid UUID value, optionally auto-populates empty values with new UUIDs |
ObjectIDField | Wraps a MongoDB "BSON" ObjectId |
Numeric fields | |
NumberField | Any number (the parent of all the other numeric fields) |
IntField | An integer |
LongField | A long |
FloatField | A float |
DecimalField | A fixed-point decimal number |
Hashing fields | |
MD5Field | An MD5 hash |
SHA1Field | An SHA1 hash |
'Native type' fields | |
BooleanField | A boolean |
DateTimeField | A datetime |
GeoPointField | A geo-value of the form x, y (latitude, longitude) |
Containers | |
ListField | Wraps a standard field, so multiple instances of the field can be used |
SortedListField | A ListField which sorts the list before saving, so list is always sorted |
DictField | Wraps a standard Python dictionary |
MultiValueDictField | Django's implementation of a MultiValueDict. |
EmbeddedDocumentField | Stores a DictShield EmbeddedDocument |
Fields can also receive some arguments for customizing their behavior. The currently accepted arguments are:
ARGUMENT | DESCRIPTION |
---|---|
field_name=None | The name of the field in serialized form. |
required=False | This field must have a value or validation and serialization will fail. |
default=None | Either a default value or callable that produces a default. |
id_field=False | Set to True if this field should be used as the id field. |
validation=None | Supply an alternate function for validation for this field. |
choices=None | Limit the possible values for this field by passing a list. |
description=None | Set an alternate field description for serialization to jsonschema. |
minimized_field_name=None | Name of the field to use when serializing the document with short names. |
uniq_field=None | Legacy arg. Will be removed soon. |
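To make the default and choices arguments more concrete, here is a small sketch of the behavior the table describes: a default that is either a plain value or a callable, and a choices list that restricts allowed values. The helpers resolve_default and check_choices are hypothetical names invented for this example and say nothing about how DictShield implements these arguments internally.

```python
import uuid


def resolve_default(default):
    """Return the default itself, or call it if it is callable."""
    return default() if callable(default) else default


def check_choices(value, choices=None):
    """Return True when no choices are set or the value is one of them."""
    return choices is None or value in choices


static_default = resolve_default('draft')
# Passing the function itself means a fresh UUID is produced on each call.
generated_default = resolve_default(uuid.uuid4)
```

A callable default is handy for values that must differ per document, such as identifiers or timestamps.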
This is what the MD5Field looks like. Notice that it's basically just an implementation of a validate() function, which raises a ShieldException if validation fails.
class MD5Field(BaseField):
    """A field that validates input as resembling an MD5 hash.
    """
    hash_length = 32
    def validate(self, value):
        if len(value) != MD5Field.hash_length:
            raise ShieldException('MD5 value is wrong length',
                                  self.field_name, value)
        try:
            # Parsing as base 16 confirms the value contains only hex digits.
            int(value, 16)
        except (TypeError, ValueError):
            raise ShieldException('MD5 value is not hex',
                                  self.field_name, value)
You might notice that the field which failed is also reported. It's available on the exception as field_name and field_value.
The exception prints in this pattern: field_name(field_value): reason.
ShieldException caught: secret(whatevs): MD5 value is wrong length
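The same idea can be sketched without DictShield installed. The example below defines a hypothetical FieldError exception that records the field name and value and prints in the pattern described above, plus a hypothetical validate_sha1 function modeled on the MD5 check. Both names are assumptions made for illustration, not part of the library.

```python
class FieldError(Exception):
    """Hypothetical exception that records which field failed and why."""

    def __init__(self, reason, field_name, field_value):
        super(FieldError, self).__init__(reason)
        self.reason = reason
        self.field_name = field_name
        self.field_value = field_value

    def __str__(self):
        # Follows the pattern field_name(field_value): reason
        return '%s(%s): %s' % (self.field_name, self.field_value, self.reason)


def validate_sha1(value, field_name='digest'):
    """Check that value looks like a 40-character hex SHA1 digest."""
    if len(value) != 40:
        raise FieldError('SHA1 value is wrong length', field_name, value)
    try:
        int(value, 16)
    except ValueError:
        raise FieldError('SHA1 value is not hex', field_name, value)
    return True


try:
    validate_sha1('whatevs')
    message = None
except FieldError as error:
    message = str(error)
```

Keeping the field name and value on the exception lets calling code build its own error responses instead of parsing the message string.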
If you think the overhead of validation is unnecessary for some use cases, you can skip it by never calling validate().
As we saw above, we can validate Document instances by calling validate(). Let's generate a User instance with seed data and validate it.
First, here is the User model:
class User(Document):
    _public_fields = ['name']
    secret = MD5Field()
    name = StringField(required=True, max_length=50)
    bio = StringField(max_length=100)
    url = URLField()
Next, we seed the instance with some data and validate it.
user = User(**{'secret': 'whatevs', 'name': 'test hash'})
try:
    user.validate()
except ShieldException as se:
    print('ShieldException caught: %s' % (se))
Calling validate() on a model validates the instance by looping through its fields and calling field.validate() on each one.
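That loop can be sketched in a few lines of plain Python. The classes MaxLengthField and SimpleDocument below are hypothetical and deliberately simplified; they only illustrate the "visit each field and let it validate its own value" idea, not DictShield's real class machinery.

```python
class MaxLengthField(object):
    """Hypothetical field that only checks a maximum string length."""

    def __init__(self, max_length):
        self.max_length = max_length

    def validate(self, value):
        if value is not None and len(value) > self.max_length:
            raise ValueError('String value is too long')


class SimpleDocument(object):
    """Hypothetical document: a mapping of field names to field objects."""

    fields = {}

    def __init__(self, **values):
        self.values = values

    def validate(self):
        # Visit every declared field and let it check its own value.
        for name, field in self.fields.items():
            field.validate(self.values.get(name))
        return True


class Comment(SimpleDocument):
    fields = {'name': MaxLengthField(10), 'body': MaxLengthField(4000)}


result = Comment(name='a hacker', body='short and sweet').validate()
```

Because the document only delegates, adding a new kind of check means writing a new field class rather than touching the document's validate method.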
We can still be leaner. DictShield also allows validating input without instantiating any objects.
Let's say we get this JSON string from a user.
{"bio": "Python, Erlang and guitars!", "secret": "e8b5d682452313a6142c10b045a9a135", "name": "J2D2"}
We might write some server code that looks like this:
json_string = request.get_arg('data')
user_input = json.loads(json_string)
User(**user_input).validate()
This method builds a User instance out of the input, which also throws away keys that aren't in the User definition.
We then call validate() on that User instance to validate each field against what the dictionary contained. If the data doesn't pass validation, a ShieldException is raised and we handle the error.
If validation passed, we're done. We know the data looks good.
Input is coming from everyone online, so who knows what's in there. We do, however, know exactly what fields we want to be there. Same goes for output.
A web system typically has tiers involved with data access, depending on the user logged in. My most common need is to differentiate between internal system data (the raw document), data fields for the owner of the data (internal data removed) and the data fields that are shareable with the general public.
Unrecognized fields, in user input, are thrown away. This makes handling input fairly easy because you are generally working with a list of fields, what they look like and how to turn them into Python or JSON. Not much else.
So here's how you can reduce the user input into just the fields found on a User document.
Consider the following string:
{
"rogue_field": "MWAHAHA",
"bio": "Python, Erlang and guitars!",
"secret": "e8b5d682452313a6142c10b045a9a135",
"name": "J2D2"
}
Parse it just like before.
total_input = json.loads(json_string)
user_doc = User(**total_input).to_python()
The values in total_input are matched against fields found in the DictShield Document class and everything else is discarded.
user_doc now looks like below with rogue_field removed.
{
'_types': ['User'],
'bio': u'Python, Erlang and guitars!',
'secret': 'e8b5d682452313a6142c10b045a9a135',
'name': u'J2D2',
'_cls': 'User'
}
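The filtering step itself is simple enough to sketch with the standard library alone. In the example below, USER_FIELDS and keep_known_fields are hypothetical names chosen for illustration, with the field list copied from the User model above. DictShield does this matching for you when the Document is built.

```python
import json

# Assumed field names, taken from the User model shown earlier.
USER_FIELDS = ('secret', 'name', 'bio', 'url')


def keep_known_fields(raw_input, known_fields=USER_FIELDS):
    """Return a new dict holding only the keys named in known_fields."""
    return {key: value for key, value in raw_input.items()
            if key in known_fields}


json_string = '''{
    "rogue_field": "MWAHAHA",
    "bio": "Python, Erlang and guitars!",
    "secret": "e8b5d682452313a6142c10b045a9a135",
    "name": "J2D2"
}'''
cleaned = keep_known_fields(json.loads(json_string))
```

Filtering against a whitelist, rather than removing known-bad keys, means new unexpected keys are dropped by default.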
Here is our Movie document safe for transmitting to the owner of the document. We achieve this by calling Movie.make_json_ownersafe. This function is a classmethod available on the Document class. It knows to remove _cls and _types because they are in Document._internal_fields. You can add any fields that should be treated as internal to your system by adding a list named _private_fields to your Document and listing each field.
{
"personal_thoughts": "I wish I had three hands...",
"title": "Total Recall",
"year": 1990
}
This is a dictionary safe for transmitting to the public, not just the owner. Get this by calling make_json_publicsafe.
{
"title": "Total Recall",
"year": 1990
}
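The two tiers of output shaping can be approximated with plain dictionaries. This is a minimal sketch under stated assumptions: the functions make_ownersafe and make_publicsafe and the three field lists are hypothetical stand-ins that mirror the roles of _internal_fields, _private_fields and _public_fields described above, not the library's own code.

```python
INTERNAL_FIELDS = ['_cls', '_types']  # bookkeeping keys, hidden outside the system
PRIVATE_FIELDS = []                   # extra fields hidden even from the owner
PUBLIC_FIELDS = ['title', 'year']     # whitelist for the general public

movie = {
    'personal_thoughts': 'I wish I had three hands...',
    '_types': ['Media', 'Media.Movie'],
    'title': 'Total Recall',
    '_cls': 'Media.Movie',
    'year': 1990,
}


def make_ownersafe(doc, internal=INTERNAL_FIELDS, private=PRIVATE_FIELDS):
    """Drop internal and private keys; the owner sees everything else."""
    hidden = set(internal) | set(private)
    return {key: value for key, value in doc.items() if key not in hidden}


def make_publicsafe(doc, public=PUBLIC_FIELDS):
    """Keep only the keys that are explicitly listed as public."""
    return {key: value for key, value in doc.items() if key in public}


owner_view = make_ownersafe(movie)
public_view = make_publicsafe(movie)
```

Note the asymmetry: the owner view is a blacklist, while the public view is a whitelist, so a newly added field stays private until it is deliberately published.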
The structure of documents can also be serialized into JSON Schema. Again, with our Movie document.
>>> Movie.to_jsonschema()
'{
    "title": "Movie",
    "type": "object",
    "properties": {
        "year": {
            "minimum": 1950,
            "type": "number",
            "maximum": 2012,
            "title": "year"
        },
        "title": {
            "title": "title",
            "type": "string",
            "maxLength": 40
        }
    }
}'
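For readers curious how a schema like this might be assembled, here is a minimal sketch that builds the same structure from a plain description of the fields. MOVIE_FIELDS and this standalone to_jsonschema function are assumptions made for illustration. The real classmethod derives the constraints from the field objects themselves.

```python
import json

# Hypothetical field descriptions for the Movie document.
MOVIE_FIELDS = {
    'title': {'type': 'string', 'maxLength': 40},
    'year': {'type': 'number', 'minimum': 1950, 'maximum': 2012},
}


def to_jsonschema(name, fields):
    """Serialize a name and its field descriptions as a JSON Schema string."""
    properties = {}
    for field_name, spec in fields.items():
        prop = dict(spec)           # copy so the input is left untouched
        prop['title'] = field_name  # each property is titled after its field
        properties[field_name] = prop
    schema = {'title': name, 'type': 'object', 'properties': properties}
    return json.dumps(schema, sort_keys=True)


schema_json = to_jsonschema('Movie', MOVIE_FIELDS)
```

Sorting the keys keeps the output stable between runs, which helps when schemas are diffed or cached.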
Consider a user updating some of their settings. Rather than validate the entire document, you want to check validation for just the field the client is updating and tell your database to store just that field.
DictShield offers a few classmethods to facilitate this.
validate_class_fields gives us that by checking if some dictionary matches the pattern it needs, including required fields. Notice, it's also a classmethod. No need to instantiate anything.
user_input = {
    'url': 'http://j2labs.tumblr.com'
}
try:
    User.validate_class_fields(user_input)
except ShieldException as se:
    print('  Validation failure: %s\n' % (se))
This particular code would raise an exception because the name field is required, but not present.
validate_class_partial lets you validate only the fields present in the input. This is useful for updating one or two fields in a document at a time, like we attempted above.
...
User.validate_class_partial(user_input)
...
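The difference between full and partial validation comes down to how missing required fields are treated. The sketch below, written for Python 3, shows that distinction with a hypothetical USER_RULES table and validate_fields function. These are illustrative stand-ins, and the simple URL check is an assumption rather than URLField's actual rule.

```python
# Hypothetical rules: each field maps to (required, validator function).
USER_RULES = {
    'name': (True, lambda v: isinstance(v, str) and len(v) <= 50),
    'bio': (False, lambda v: isinstance(v, str) and len(v) <= 100),
    'url': (False, lambda v: isinstance(v, str)
            and v.startswith(('http://', 'https://'))),
}


def validate_fields(data, rules=USER_RULES, partial=False):
    """Raise ValueError on the first problem; return True otherwise."""
    for field_name, (required, is_valid) in rules.items():
        if field_name not in data:
            # Full validation insists on required fields; partial skips them.
            if required and not partial:
                raise ValueError('%s: required field is missing' % field_name)
            continue
        if not is_valid(data[field_name]):
            raise ValueError('%s: invalid value' % field_name)
    return True


user_input = {'url': 'http://j2labs.tumblr.com'}
partial_ok = validate_fields(user_input, partial=True)
```

With partial=False the same input fails because name is missing, which matches the behavior described for validate_class_fields above.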
DictShield's validation methods can also give you a list of which individual fields failed validation. Calling a Document's validate() method with validate_all=True will raise a ShieldDocException whose errors_list attribute is a list of 0 or more exceptions, and calling validate_class_fields with validate_all=True will return the same list.
exceptions = User.validate_class_fields(total_input, validate_all=True)
if exceptions:
    # Validation was not successful; report each failure
    for exception in exceptions:
        print(exception)
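Collecting every failure instead of stopping at the first one is a small change to the validation loop. Here is a sketch for Python 3, assuming hypothetical helpers collect_errors and require_text. It shows the accumulate-and-continue pattern only and is not how validate_all is implemented in DictShield.

```python
def collect_errors(data, validators):
    """Run every validator and return a list of (field, message) pairs."""
    errors = []
    for field_name, validator in validators.items():
        try:
            validator(data.get(field_name))
        except ValueError as error:
            # Keep going instead of stopping at the first failure.
            errors.append((field_name, str(error)))
    return errors


def require_text(max_length):
    """Build a validator that demands a string no longer than max_length."""
    def check(value):
        if not isinstance(value, str):
            raise ValueError('a string is required')
        if len(value) > max_length:
            raise ValueError('value is too long')
    return check


validators = {'name': require_text(50), 'bio': require_text(10)}
errors = collect_errors({'bio': 'Python, Erlang and guitars!'}, validators)
```

An empty list means the input passed, so the result can be used directly in an if statement, as in the snippet above.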
DictShield is on PyPI, so you can install it with easy_install or pip.
pip install dictshield
- James Dennis
- Andrew Gwozdziewycz
- Dion Paragas
- Tom Waits
- Chris McCulloh
- Sean O'Connor
- Alexander Dean
- Rob Spychala
- Ben Beecher
- John Krauss
- Titusz
- Nicola Iarocci
- Justin Lilly
- Jonathan Halcrow
BSD!