Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow exportation of the Marshmallow Schema of a Document #34

Closed
1 of 3 tasks
lafrech opened this issue Jul 21, 2016 · 13 comments
Closed
1 of 3 tasks

Allow exportation of the Marshmallow Schema of a Document #34

lafrech opened this issue Jul 21, 2016 · 13 comments

Comments

@lafrech
Copy link
Collaborator

lafrech commented Jul 21, 2016

Currently, the Schema of a Document can be obtained from Document.Schema. However, this Schema is made to [|de]serialize between "DB / data_proxy" and "OO world", not between "OO world" and "client/JSON world". (See diagram in the docs).

In other words, uMongo is made to be used like this:

    document.dump(schema=schema)
    # which is equivalent to
    schema.dump(document._data._data)
    # but not equivalent to
    schema.dump(document)

The difference being that the data_proxy does not behave like the document:

  • It may have keys named differently is "attribute" is used
  • document returns None if a key is missing
  • ...

Therefore, using the Schema to serialize a Document may work but it currently has corner cases.

@touilleMan confirms that the ability to export a Marshmallow Schema without the uMongo specificities is in the scope of uMongo and is a feature we want to have.

The idea could be to add a method or attribute to the document to provide that "cleaned up" Schema.

I'm opening this issue to centralize the reflexions about that.

Currently, here are the issues I found:

#33 drafts a way of exporting the Marshmallow Schema from the document.

@touilleMan
Copy link
Member

I've worked a bit on this today.

Considering a simple User model:

class User(Document):
    # Say we use user's name as id
    id = fields.StrField(required=True, attribute='_id')
    age = fields.IntField()
    role = fields.StrField(missing='user')

If we want to provide a user API, the data flow will be the following:

# json data #
{'name': 'John', 'age': 42}
  /\
  ||  Marshmallow's Schema
  \/
# deserialized data #
{'id': 'John', 'age': 42}
  /\
  ||  Umongo's dump/load
  \/
# Umongo document instance #
<document User data={...}>
  /\
  ||  Umongo to_mongo_serialize/deserialize
  \/
# pymong/MongoDB world
{'_id': 'John', 'age': 42, 'role': 'user'}

The trouble is we have 3 places were serialization/deserialization can occur:

  • from mongodb to umongo's "OO world"
  • from umongo's "OO world" to client world
  • from client world to json world

Often json world and client world are the same, however when it's not the case (in the example the API provide name field which ends up as _id in database and is provided as id in OO world) a marshmallow Schema is really useful.

From my point of view, there should be a clean distinction between Umongo's fields and Marshmallow's fields. Mixing both will works 90% of the time, which means blowing up during a dependency update or a minor change...

Hence, the solution I see would be to provide a way to get pure marshmallow version of any umongo field.

Another benefit I see of doing this is to get rid of the schema parameter in Document.update given we can expect the user to do it own API de/serialization using marshmallow before passing to umongo (and doing my_schema.dump(my_umongo_document) for serialization).
This make the guarantees on the data stronger given you can no longer pass a broken schema to update an umongo document instance.
The drawback is the data would be validated two times (marshmallow + umongo), but it's easy to solve that by asking marshmallow to (de)serialize to the mongodb representation of the data, then calling build_from_mongo to create the umongo Document instance.

So here is the idea:

class BaseField(ma_fields.Field):
    ...
    def as_marshmallow_field(self, to_mongo=False, params=None):
        """
        Return a pure-marshmallow version of this field.

        :param to_mongo: If True, field will (de)serialize as mongodb world instead of as oo world
        :param params: Additional parameters passed to the mashmallow field
            class constructor.
        """
       ...

Depending of to_mongo parameter, field's attributes should be kept or not:

attribute oo world mongo world comment
default ?? ?? see #36
attribute no yes
validate yes yes umongo validators only add i18n support so far, no conversion needed (maybe in the future ?)
required yes yes
load_only yes yes
dump_only yes yes
missing ?? ?? see #36
allow_none yes yes
error_messages yes yes error_messages provide i18n support, should work ok but be careful

note: As discussed in #36, default and missing could be merge into a single default attribute un umongo given the two are misleading. So the marshmallow default&missing attributes should be both determined from umongo's default field

Another as_marshmallow_schema should be also needed for Document and all the BaseDataObject children in order to easily provide the marshmallow Class.

Most umongo fields are just rip-off of marshmallow ones and should not cause any trouble.
For the custom umongo fields (i.g. ObjectIdField) I think we should first provide them as regular marshmallow fields (in a marshmallow_bonus_fields.py file for example) in order to rip them off just like the others ones.

@lafrech
Copy link
Collaborator Author

lafrech commented Aug 23, 2016

I totally agree about providing pure Marshmallow schema, for all the reasons you pointed, plus another one:

I'm trying to use apispec to generate the API doc. To document the type of each attribute, it relies on the type of the field, so I need the field to be a true Marshmallow field. (This does not work in my current patched version as I return umongo fields.)

For custom field, I should extend the field -> type mapping but keep the logic, thus I like the idea of providing custom fields as regular Marshmallow fields first.

@touilleMan
Copy link
Member

I've done some big work on this feature, see branch https://github.com/Scille/umongo/tree/export_schema and 0a35ce841

Basically I provide as_marshmallow_schema and as_marshmallow_field to get regular marshmallow field/schema form umongo ones.

Two points:

An umongo.fields.EmbeddedDocumentField will generate a marshamallow.fields.Nested which de-serializes data into a dict (and not a umongo.EmbeddedDocument !). Then this dict can be passed to the umongo.Document field assignation (or during the construction of the document) to be turned into a real umongo.EmbeddedDocument.

It's the same thing with the tricky fields (like GenericReferenceField too).

A parameter mongo_world is provided to configure the generated schema to work with the mongo world. Thus advanced user can directly do the unserialization with the generated marshamallow schema, then build the umongo object with build_from_mongo (and then don't have to do two validations of the same data)

@lafrech Can you play a bit with this (see the tests/test_marshmallow.py for small examples) and give me your feedback ? I think I still have to update the documentation & examples before merging this on the master

@lafrech
Copy link
Collaborator Author

lafrech commented Aug 29, 2016

Great!

I've got a few things to do before, but I'll test this as soon as I can on my own use case, as a replacement for my export attempt, to see how it goes.

@touilleMan
Copy link
Member

Since I was pretty happy with the way I implemented this feature, I've corrected the examples and document and merged it into master 😃

Regarding the serialization of missing fields trouble, I've added a missing_accessor attribute to as_mashmallow_schema. If set to True (the default), a special get_attribute method will be added to the marshmallow schema that will take care of retrieving the real field value (instead of returning None in place of missing)

Same thing with the check_unknow_fields attribute which add a schema validation for unknown fields in the marshmallow schema.

@lafrech reopen if you think I've missed something 👍

@lafrech
Copy link
Collaborator Author

lafrech commented Sep 7, 2016

I rebased my patched branch to latest master and tested the new schema export feature.

First, I should say that I'm pretty happy with the result. My code is simpler and clearer.

I have two points to raise:

check_unknow_fields

I still face #18, for the same reason as before. webargs sets dump_only fields to missing and then check_unknown_fields detects value passed for a dump_only field, which we should ignore when this value is missing. Should I rebase #19?

missing_accessor

To be honest, I don't really understand the purpose of schema_from_umongo_get_attribute.

I have a problematic use case.

Consider a Document with a property:

@db_instance.register
class User(Document):

    @property
    def whatever(self):
        return random stuff

Let's say we export the Marshmallow Schema and add a field to dump this property:

user_schema = User.schema.as_marshmallow_schema()

class UserSchema(user_schema):

    class Meta:
        strict = True

    whatever = ma.fields.String(
        dump_only=True
    )

Then this line fails:

raw_ret = obj._data.get(attr)

but replacing it with

raw_ret = getattr(obj, attr)

would defeat the point of schema_from_umongo_get_attribute, I guess (the result wouldn't be "raw".

I'm not sure what to do about this.

I don't think adding properties to a Document and dumping them using the Schema is an invalid use case.

Currently, as a workaround, I disabled the feature:

user_schema = User.schema.as_marshmallow_schema(missing_accessor=False)

@lafrech
Copy link
Collaborator Author

lafrech commented Sep 7, 2016

(I'm afraid I can't reopen the issue since you're the one who closed it.)

@lafrech
Copy link
Collaborator Author

lafrech commented Sep 7, 2016

Regarding the first point (check_unknow_fields), I guess I should just set it to False when exporting the Schema. After all, that's what this flag is here for. I wouldn't get the check for "really unknown" fields being passed, but I don't really mind.

In this case, we may close #18 and #19.

@touilleMan
Copy link
Member

touilleMan commented Sep 8, 2016

To be honest, I don't really understand the purpose of schema_from_umongo_get_attribute.

To access attributes (see https://github.com/marshmallow-code/marshmallow/blob/dev/marshmallow/utils.py#L317), by default marshmallow will first try to get by item (doc['item']), then by attribute (doc.item, if the attribute is callable, it return the result of it call). Finally it will return default value.

Applied to a umongo document, you try first to get the field data, then try to get a function/property defined in the class.
However if the field data is missing, umongo will return None (that was your trouble in the first place).
To solve this, schema_from_umongo_get_attribute will check if the return None is really a None or a missing.

Back to your example:

@db_instance.register
class User(Document):
    a = fields.IntField()

    @property
    def whatever(self):
        return None

user_schema = User.schema.as_marshmallow_schema()

class UserSchema(user_schema):

    class Meta:
        strict = True
    a = ma.fields.Int()
    whatever = ma.fields.String(
        dump_only=True
    )


UserSchema().dump(User())
# Both 'whatever' and 'a' return 'None', however custom accessor can determine
# 'a' real value is 'missing'
# {'whatever': None}

So there should be no trouble whit this...

@touilleMan
Copy link
Member

In this case, we may close #18 and #19.

Commit 857d730 implement the right behavior: fields with missing value are considered as... missing 😄

@lafrech
Copy link
Collaborator Author

lafrech commented Sep 8, 2016

Thanks, it is clearer now.

However, it seems your example does not work. In the tests, the property returns "I'm a property". If we change this to let it return None, we get a crash:

    ret = MaSchema.get_attribute(self, attr, obj, default)
    # Here, ret is None because that's what the property returns
    if ret is None and ret is not default:
        # We're going in there, and we get a KeyError
        raw_ret = obj._data.get(attr)
        return default if raw_ret is missing else raw_ret
    else:
        return ret

Possible fix:

    ret = MaSchema.get_attribute(self, attr, obj, default)
    if ret is None and ret is not default:
        try:
            raw_ret = obj._data.get(attr, to_raise=KeyError)  # Ask for a KeyError explicitly in case the default changes?
        except KeyError:
            # We got None and attr is not in _fields, let's assume None was really a None
            return ret  # Or equivalently return None
        return default if raw_ret is missing else raw_ret
    else:
        return ret

I just realized that missing fields are not serialized (not in the json) instead of being written as None/null and I remember this is what I wanted. I guess that's what you're referring to when you write "that was your trouble in the first place" and indeed, I think it is nicer this way.

Thanks.

@touilleMan
Copy link
Member

fixed 6c406d6

@lafrech
Copy link
Collaborator Author

lafrech commented Sep 9, 2016

Great. Your fix is much better than my hack.

@lafrech lafrech mentioned this issue Sep 27, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants