Colander serializes numbers & bools as strings. #80

Closed
AndreLouisCaron opened this Issue Feb 6, 2013 · 18 comments

When I use the Bool and Int types in my schemas like so:

class TestSchema(MappingSchema):
    interested = SchemaNode(Boolean(), missing=False, default=False)

And I use it to serialize some data like so:

schema = TestSchema()
data = schema.serialize(dict(interested=True))

Then data['interested'] is "true" (a string).

Now, if I json.dumps(data), I get strings in my JSON data instead of the boolean I requested. This is really annoying because there's no way to fix this reliably after the fact! At best, I can supply a custom JSONEncoder, but I can still accidentally convert strings that shouldn't be converted.

The same problem exists with numbers.
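To make the complaint concrete, here is a minimal sketch using only the standard library. The `cstruct` dict is a hand-written stand-in for what `schema.serialize()` returns (it is not produced by colander here), and `guess_types` is a hypothetical helper showing why repairing the types after the fact is unreliable.

```python
import json

# Hand-written stand-in for a serialized result: every leaf is a string.
cstruct = {"interested": "true", "count": "3", "nickname": "true"}

# What a JSON client would normally expect: native types where appropriate.
expected = {"interested": True, "count": 3, "nickname": "true"}

print(json.dumps(cstruct, sort_keys=True))
print(json.dumps(expected, sort_keys=True))


def guess_types(value):
    """Naive repair (hypothetical helper): guess the type from the string alone."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        return value


repaired = {key: guess_types(value) for key, value in cstruct.items()}

# The nickname "true" was a legitimate string, but guessing turned it
# into a boolean, so the repaired data no longer matches the intent.
print(repaired == expected)
```

Without the schema, nothing distinguishes the string "true" from a serialized boolean, which is why the conversion has to happen where the type information is still available.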

If anyone else has the same problem, you can temporarily work around the issue by defining the following overrides:

class Boolean(colander.Boolean):
    def serialize(self, node, appstruct):
        result = super(Boolean, self).serialize(node, appstruct)
        if result is not colander.null:
            # bool('false') is True, so compare against the serialized string.
            result = (result == 'true')
        return result

class Float(colander.Float):
    def serialize(self, node, appstruct):
        result = super(Float, self).serialize(node, appstruct)
        if result is not colander.null:
            result = float(result)
        return result

class Int(colander.Int):
    def serialize(self, node, appstruct):
        result = super(Int, self).serialize(node, appstruct)
        if result is not colander.null:
            result = int(result)
        return result

Keep in mind that you'll have to put these in some custom package/module if you intend on reusing them for schemas in several Python modules.

Owner

mcdonc commented Feb 6, 2013

All builtin colander types generate strings as cstruct values except mapping and sequence, by design. Deserialization converts them from strings to appropriate types.

mcdonc closed this Feb 6, 2013

My point is precisely that deserialization using colander converts them back to the appropriate types. The point of generating a standard format like JSON is precisely to inter-operate with other systems.

Can you at least consider introducing an optional argument to the decoding process defaulting to the current behavior?

Owner

mcdonc commented Feb 6, 2013

Apologies, no. Colander serializations are meant to be deserialized by colander. Colander deserializations are meant to be serialized by colander. Neither result is meant to be JSON.

First, if colander serialization is not meant to generate something that is interoperable with anything else, colander serialization is utterly useless because I can just pickle my Python objects.

Second, colander is really close to offering that kind of support. Proof is that I can write ~20 lines of monkey patching to fix the only two issues I've found that prevent it from being used this way.

Just to give you an idea: I'm working on a REST API using Pyramid and Cornice. Colander is by far the best choice to validate incoming JSON.

For basic/intermediate use cases, it's also the best choice to format outgoing JSON.

The only two things that are getting in the way of making it perfect for outgoing JSON are issues #60 and #80.

Note that by patching the result of the following 4 functions, I can generate proper JSON output with all my schemas:

  • Boolean.serialize()
  • Int.serialize()
  • Float.serialize()
  • MappingSchema.serialize() (see issue #60)

I just apply the following code:

def monkey_patch_colander():
    # Recover boolean values which were coerced into strings.
    serialize_boolean = getattr(colander.Boolean, 'serialize')
    def patched_boolean_serialization(*args, **kwds):
        result = serialize_boolean(*args, **kwds)
        if result is not colander.null:
            result = result == 'true'
        return result
    setattr(colander.Boolean, 'serialize', patched_boolean_serialization)

    # Recover float values which were coerced into strings.
    serialize_float = getattr(colander.Float, 'serialize')
    def patched_float_serialization(*args, **kwds):
        result = serialize_float(*args, **kwds)
        if result is not colander.null:
            result = float(result)
        return result
    setattr(colander.Float, 'serialize', patched_float_serialization)

    # Recover integer values which were coerced into strings.
    serialize_int = getattr(colander.Int, 'serialize')
    def patched_int_serialization(*args, **kwds):
        result = serialize_int(*args, **kwds)
        if result is not colander.null:
            result = int(result)
        return result
    setattr(colander.Int, 'serialize', patched_int_serialization)

    # Remove optional mapping keys which were associated with 'colander.null'.
    serialize_mapping = getattr(colander.MappingSchema, 'serialize')
    def patched_mapping_serialization(*args, **kwds):
        result = serialize_mapping(*args, **kwds)
        if result is not colander.null:
            result = {k: v for k, v in result.items() if v is not colander.null}
        return result
    setattr(colander.MappingSchema, 'serialize', patched_mapping_serialization)
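The four patches above share one shape: call the original `serialize`, leave the null sentinel alone, and convert anything else. As a rough sketch of that pattern only, the helper below factors it out. `patch_serialize`, `FakeBoolean` and `NULL` are hypothetical stand-ins so the snippet runs without colander installed; with the real library one would presumably pass `colander.Boolean` and `colander.null` instead.

```python
import functools

NULL = object()  # stand-in for colander.null (assumption for this sketch)


class FakeBoolean:
    """Stand-in for a schema type whose serialize() returns strings."""

    def serialize(self, node, appstruct):
        if appstruct is NULL:
            return NULL
        return "true" if appstruct else "false"


def patch_serialize(cls, convert, null=NULL):
    """Wrap cls.serialize so that non-null results pass through convert()."""
    original = cls.serialize

    @functools.wraps(original)
    def patched(*args, **kwds):
        result = original(*args, **kwds)
        if result is not null:
            result = convert(result)
        return result

    cls.serialize = patched


# Recover booleans that were coerced into the strings "true"/"false".
patch_serialize(FakeBoolean, lambda text: text == "true")

print(FakeBoolean().serialize(None, True))
print(FakeBoolean().serialize(None, NULL) is NULL)
```

Like any monkey patch, this changes the class for every caller in the process, so it should be applied once at startup.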

Although this works for me and I'll keep applying it for as long as necessary, I don't see the point of intentionally limiting serialization to producing Colander's proprietary serialization format.

I think supporting this would be a major feature and a good selling point for Colander. Please reopen this issue.

Contributor

abrookins commented Feb 26, 2013

I also use Colander to validate JSON, which it does very well. Being able to serialize objects into data structures whose boolean, integer and float values retain their data types would be a huge benefit.

I understand that this isn't the intended usage of serialize and that I probably don't fully appreciate the implications of making the behavior of Andre's monkey-patch the default. However, I do wish this was the case, for what it's worth.

tvrtkos commented Mar 13, 2013

I'm also working on a REST API using Pyramid and Cornice. I was unpleasantly surprised when all my values were converted to strings. I'm considering using the aforementioned monkey-patch or just the plain json library. I can't find any other similar library: there is jsonschema, but it reminds me of XML. Maybe formencode?

Contributor

abrookins commented Mar 15, 2013

While Colander is useful for the purpose that Cornice allows -- input validation for a web service view -- my experience is that you start wanting to use Schema objects to house serialization and deserialization logic. But that doesn't appear to be an intended use of the library.

Still, the awesomeness of having a Schema object that manages validation, serializing and deserializing data structures between JSON and Python is so great, I too am using a version of the patch @AndreLouisCaron posted. Colander is just too useful with it...

I also came across jsonschema, but it seems much less powerful and too focused on validating the (draft) JSON Schema spec. There doesn't seem to be a way to express that you have this custom Python type that needs to serialize/deserialize a certain way. OTOH it would be great to be able to pass off the schema to the client for client-side validation -- something Colander schemas can't do, but something like jsonschema dict-based schemas could.

> my experience is that you start wanting to use Schema objects to house serialization and deserialization logic. But that doesn't appear to be an intended use of the library.

This statement is kind of weird to me. Serialization/deserialization logic is the only thing the library allows you to do.

> OTOH it would be great to be able to pass off the schema to the client for client-side validation -- something Colander schemas can't do, but something like jsonschema dict-based schemas could.

You can't technically pass the schema as an object in itself, but I find it really convenient to expose the schema in a Python package so that Python-based clients get the schema definition for free.

Contributor

abrookins commented Mar 15, 2013

> This statement is kind of weird to me. Serialization/deserialization logic is the only thing the library allows you to do.

You're right. I should have been more specific. As the docs say, Colander is useful for validating and deserializing JSON (and other data). But since serialization is only meant to create data consumed by Colander, deserialization, as a separate step from validation, is less useful.

You can add deserialization logic to a schema, but you can't use the schema to serialize the Python data structure back to JSON. And when I'm talking about "serialization logic," I'm thinking of custom SchemaNode types that e.g. transform NumPy data structures into JSON and back again.

So without something like your patch, the serialization part of that logic ends up somewhere else, which I find less than ideal. When all that logic is contained in one place, I think it's easier to reason about and maintain.

I don't need Colander to support serializing objects to a JSON-friendly format, but it would be great if there were a mechanism other than monkey-patching to control the serialization output.

> You can't technically pass the schema as an object in itself, but I find it really convenient to expose the schema in a Python package so that Python-based clients get the schema definition for free.

Right. This is a great use case for Colander. Now that you can subclass schemas, it's even nicer. My desire to share the schema with a JavaScript client isn't something I expect Colander to provide, but I could see a third-party library consuming a Colander schema and outputting something like a JSON Schema document.

I've been starting to evaluate colander as a replacement for Django forms, specifically for validating and deserializing incoming JSON, which it has been really awesome for so far. I've also been looking at it for serializing objects into outgoing JSON structures as well. However this thread caught my attention that I (along with several other people in this thread) may be misusing the library. Particularly this statement from @mcdonc:

> Colander serializations are meant to be deserialized by colander. Colander deserializations are meant to be serialized by colander. Neither result is meant to be JSON.

Perhaps I'm misunderstanding you, but if this is true, you may want to update the docs, as it seems that this philosophy is at least somewhat in conflict with the first sentence on http://docs.pylonsproject.org/projects/colander/en/latest/:

> Colander is useful as a system for validating and deserializing data obtained via XML, JSON, an HTML form post or any other equally simple data serialization.

Or at least it could use some clarification around the fact that while colander will happily deserialize XML/JSON/form post data, it is not intended for direct two-way interoperation with these formats (i.e. deserializing some incoming JSON, manipulating it, and then serializing it back into JSON). The docs led me to believe that this was the general purpose of the library; I didn't realize that it was strictly for sending data between endpoints that all use colander to serialize and deserialize the data.

@mikeocool This comment also confused me. In retrospect however, I think the idea behind @mcdonc's comment is that Colander in itself was not meant to be directly tied to the serialization format (e.g. the output from Schema().serialize(...) will never result in a JSON-encoded payload that you can send/store).

However, the library seems to be tied to at least one specific serialization format that understands nothing but strings, lists and maps, which probably explains why Colander behaves the way it does.
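An HTML form post is one format with exactly that shape. As an illustration using only the standard library (the field names are made up for the example), parsing a URL-encoded form body yields nothing but strings and lists of strings, which matches the string-only model described above.

```python
from urllib.parse import parse_qs

# A URL-encoded form body, as a browser would submit it.
body = "interested=true&count=3&tags=a&tags=b"

fields = parse_qs(body)
print(fields)

# Every leaf value is a str; the wire format has no notion of bool or int.
all_strings = all(
    isinstance(value, str) for values in fields.values() for value in values
)
print(all_strings)
```

JSON, by contrast, carries booleans and numbers natively, which is where the mismatch discussed in this thread comes from.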

I wholeheartedly agree that if the library happily says that it's useful for processing data across multiple formats, it should either naturally support the formats it claims to support or be clear about its limitations.

What is the real reason why serialization can't be in JSON format? Is it deform? As far as I know, @iElectric is working on something like deform2.
As it stands, I can't simply change colander alchemy to use colander serializers. This would be really, really useful if implemented. What about a flag like json=True for the serialize function?

Member

hathawsh commented Aug 22, 2013

Having used colander extensively, I see both sides of the argument:

  1. colander is designed primarily for consuming and producing a simpler data model than JSON. The documentation states that fact clearly upfront. Introducing more data types to that model would cause subtle bugs and probably security holes for consumers of colander (such as deform).

  2. OTOH, colander is really close to being a great way to convert objects to/from JSON, and people who are already using colander for form validation probably want to use it for JSON validation and serialization as well.

I sort of like the suggestion by @offlinehacker, but the issue isn't really specific to JSON. Serialization to YAML or other formats would run into the same issue, so the parameter shouldn't be called "json=True".

I propose a resolution based on a current experiment I'm doing: let's add new schema types, 'colander.NativeInteger' and 'colander.NativeBoolean'. These are like the standard Integer and Boolean schema types, but they serialize to 'int' and 'bool' respectively. Then we should document that when people want to use colander for JSON (de)serialization, they should use NativeInteger and NativeBoolean instead of Integer and Boolean. This advice would apply equally to other serialization formats.
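A rough sketch of what such types might look like follows. It does not use colander's actual base classes: `NativeIntegerSketch`, `NativeBooleanSketch`, `Invalid` and `NULL` are hypothetical stand-ins that only mimic the `serialize(node, appstruct)` / `deserialize(node, cstruct)` shape seen in the overrides earlier in this thread, and the set of strings accepted as true is an assumption.

```python
import json

NULL = object()  # stand-in for colander.null (assumption for this sketch)


class Invalid(Exception):
    """Stand-in validation error; colander has its own Invalid class."""


class NativeIntegerSketch:
    """Serializes to a native int instead of a string."""

    def serialize(self, node, appstruct):
        if appstruct is NULL:
            return NULL
        try:
            return int(appstruct)
        except (TypeError, ValueError):
            raise Invalid("%r is not an integer" % (appstruct,))

    def deserialize(self, node, cstruct):
        # Accept both native ints (from JSON) and strings (from forms).
        return self.serialize(node, cstruct)


class NativeBooleanSketch:
    """Serializes to a native bool instead of 'true'/'false'."""

    def serialize(self, node, appstruct):
        if appstruct is NULL:
            return NULL
        return bool(appstruct)

    def deserialize(self, node, cstruct):
        if cstruct is NULL:
            return NULL
        if isinstance(cstruct, str):
            # Assumed set of truthy strings, chosen for illustration only.
            return cstruct.strip().lower() in ("true", "1")
        return bool(cstruct)


payload = {
    "count": NativeIntegerSketch().serialize(None, 3),
    "interested": NativeBooleanSketch().serialize(None, True),
}
print(json.dumps(payload, sort_keys=True))
```

Keeping these as separate, opt-in types leaves the string-based behavior of the standard types untouched for existing consumers.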

I was a bit disappointed when I found out that colander did not do that. I thought I had found the ultimate data serialization tool for Python, but it turned out that it doesn't do what I expected, i.e. convert data to and from JSON or XML. Now that I understand I have to write the logic to convert data to and from JSON/XML myself, I am not sure how colander will be useful to my app. I'm not criticizing the project in any way. All I am saying is that reading part of the documentation gave me expectations that were not fulfilled.

I too wish colander would provide JSON serialization and deserialization.

Just want to throw what little weight I may have in support of this issue as I have been watching it since the days I too was working in Pyramid and Cornice.
