Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DictField limitations #99

Closed
lafrech opened this issue Mar 12, 2017 · 9 comments · Fixed by #245
Closed

DictField limitations #99

lafrech opened this issue Mar 12, 2017 · 9 comments · Fixed by #245
Milestone

Comments

@lafrech
Copy link
Collaborator

lafrech commented Mar 12, 2017

DictField suffers from a few limitations.

  • Must be marked as modified manually
  • Does not deserialize values into objects (no Schema)
  • Can't use non-string keys

It might be useful to have a schema-less structure, but sometimes, I'd like to have DictField enforce a schema. Maybe we could add it extra features or create a TypedDictField (or StrcturedDictField, I don't know about the naming).

See this discussion in Marshmallow about nesting a Schema into a Dict.

This comment illustrates the limits of DictField.

The enhancement proposal is to add the possibility to do this:

    fields.Dict(keys=fields.Str(), values=fields.Nested(SomeSchema))

Benefits:

  • Serialization/deserialization of keys and values
  • Allows to use non-string keys (hashable and serializable)
  • Keep track of is_modified, no need to mark the field as modified manually

Note: uMongo does not complain if a Dict is passed non-string key, but pymongo will raise an Exception at commit time. This could be protected by a validator at assignment time in current DictField. Obviously, adding the feature described above would solve this by allowing non-string keys.

Not sure which approach would be best. Try to solve upstream marshmallow-code/marshmallow#483 and ultimately if rejected implement TypedDictField as a marshmallow-bonus field, or create TypedDictField here as a proof-of-concept and try to have it integrated upstream. The more we can provide upstream, the best, and I saw no objection to the idea, there.

@touilleMan
Copy link
Member

Must me marked as modified manually

Yeah DictField is just a quick hack so far waiting for somebody to rework it ;-)

Does not deserialize values into objects (no Schema)

Don't get it... it precisely the point of DictField to return a python dict instead of an object

Can't use non-string keys

That's a mongo limitation, a workaround would be to serialize all the dict as a single string, but this is doing magic behind the user so I prefer he does this by himself when needed (given it's pretty simple to do with json.loads/dumps)

Note: uMongo does not complain if a Dict is passed non-string key, but pymongo will raise an Exception at commit time. This could be protected by a validator at assignment time in current DictField. Obviously, adding the feature described above would solve this by allowing non-string keys.

👍

The more we can provide upstream, the best, and I saw no objection to the idea, there.

👍 we should try to stick as close as possible to the upstream marshmallow for the sake of simplicity

@lafrech
Copy link
Collaborator Author

lafrech commented Mar 12, 2017

Sorry if I was unclear. I was too focused on the use case I presented in marshmallow-code/marshmallow#483.

Let's put it this way, hopefully it will be clearer:

class Album(EmbeddedDocument):
    name = fields.StrField()
    duration = fields.FloatField()

class Artist(Document):
    name = fields.StrField()
    albums = ...

(Assume only one album per year for example sake...)

I want albums to be a dict of Album, so that Artist serializes as

{'name': 'David Bowie',
 'albums': {
     '1971': {'name': 'Hunky Dory', 'duration': 2364},
     '1970': {'name': 'The Man Who Sold the World', 'duration': 2456},
 }}

The only options we have right now for the use case I exposed are

  • use a DictField and live with said limitations (no schema, no is_modified)
  • add year in Album and use a ListField of EmbeddedFields (makes sense in this example but not so much in other use cases where year should not be in the object) and lose key indexing (you have to go through the list to find the year).
  • create an intermediate schema embedding value + key and lose key indexing:
class Album(Document):
    name = fields.StringField()
    duration = fields.Float()

class DatedAlbum(Document):
    year = fields.IntField()
    album = fields.EmbeddedField(Album)

class ArtistSchema(Schema):
    name = fields.Str()
    albums = fields.ListField(fields.EmbeddedField(DatedAlbum))

This is a relatively common use case in my application, since we record data history.

What I'm after is a field that would act like a dict (indexation by key) but also enforce a schema on the keys. And as suggested by a commenter in the Marshmallow issue I link to, why not also enforce a schema on the keys?

I'm aware of the MongoDB limitation on keys (strings only). But with schemas on the keys, you could use ints as keys and the schema would serialize them as strings.

The typed dict thing is not specific to uMongo, so it could belong in Marshmallow. We can wait for Marshmallow or better do the work there. Then add Mongo stuff here.

Meanwhile, we may also add that string validator on keys in DictField.

@touilleMan
Copy link
Member

Ok I get it ;-)

I'm aware of the MongoDB limitation on keys (strings only). But with schemas on the keys, you could use ints as keys and the schema would serialize them as strings.

I'm not sure allowing not string type for the key is a good idea: if you declare a float as key, you should expect all the (de)serialization stuff done on umongo side, but this is not possible (or would require hard work and less readable code...) for the querying (e.g. Artist.find({'$exists': {'albums.12.7'}}) would try to find a nested 7 field inside a 12 one instead of finding the 12.7 field)

Beside, this would open the door to more complex field serialization like EmbeddedDocument, which are totally crazy to try to request against.

I think we should only accept string field and validate it then (not $ or . in the field). Thus the user is forced to think about how to serialize his data and can reuse this knowledge to do the query.

The typed dict thing is not specific to uMongo, so it could belong in Marshmallow. We can wait for Marshmallow or better do the work there. Then add Mongo stuff here.

Yup

What I'm after is a field that would act like a dict (indexation by key) but also enforce a schema on the keys.

I would say your example needs dict field that does validation on value while allowing any key.
Maybe we should create a new type of field for this usecase (DynamicEmbeddedDocument ? EmbeddedDocumentsDict ?). This way we would avoid adding complexity into the Dict field.

@lafrech
Copy link
Collaborator Author

lafrech commented Mar 13, 2017

I'm not sure allowing not string type for the key is a good idea [...]

Hmmm, right. Let's stick to string keys.

I would say your example needs dict field that does validation on value while allowing any key.
Maybe we should create a new type of field for this usecase (DynamicEmbeddedDocument ? EmbeddedDocumentsDict ?). This way we would avoid adding complexity into the Dict field.

Yes. Or more like TypedDict. The value can be any field, not just an EmbeddedField:

class Example(Document):
    embd_f = TypedDict(EmbeddedField(EmbeddedExample))
    list_f = TypedDict(ListField(FloatField())
    int_f = TypedDict(IntField)

I think MongoEngine calls it MapField but MapField(StringField()) is just a shortcut to DictField(field=StringField()). Maybe it's not worth adding a new field, and adding an extra field attribute to to DictField would be just as good.

Ideally, we'd solve marshmallow-code/marshmallow#483 but adding a field to uMongo like described above might be an easier path.

@touilleMan
Copy link
Member

Ok then I'm trusting you on this issue, choose the path you feel the more appropriate (custom TypedDict vs marshmallow's , or maybe 1rst as a workaround, then marshmallow stuff when available)

@lafrech
Copy link
Collaborator Author

lafrech commented Mar 13, 2017

Yes, that's the alternative. I'm pretty busy right now. I opened this as a note for later, and to get some feedback.

If I get a little bit of time, I'll try to add a validation in DictField to check keys are strings. Should be easy.

Then if I have more time, I'd like to give Marshmallow a try and if it is too complicated, go with a custom field here.

@bfax
Copy link

bfax commented Mar 29, 2017

Could you do something like:

class StructuredDict(Dict):
"""
This field is used to validate both the keys and values of a dict

Example usage:
    birthdays = fields.StructuredDict(
        keys=fields.Date(),
        values=fields.Nested(PersonSchema)
        )
"""

def __init__(self, keys=None, values=None, *args, **kwargs):
    """
    Args:
        - keys: The Field type for the given dict's keys
        - values: The Field type for the given dict's values
    """
    super(StructuredDict, self).__init__(*args, **kwargs)
    self.key_schema = keys
    self.value_schema = values

def _serialize(self, value, attr, obj):
    serialized = {}
    for k, v in value.items():
        key = self.key_schema._serialize(k, attr, obj)
        val = self.value_schema._serialize(v, attr, obj)
        serialized[key] = val
    return serialized

@ribx
Copy link

ribx commented Jan 12, 2018

From marshmallow-code/marshmallow#483:

This is released in version 3.0.0b5. Thanks everyone for your feedback!

I could also use this feature, I will look into it, maybe I find some spare time.

@bfax: see the comments above: mongodb does not allow keys other than strings, and saving them in another structure would cause confusion for the users.

@lafrech
Copy link
Collaborator Author

lafrech commented Jan 12, 2018

I think our best move is to port umongo to Marshmallow 3 and use the new Dict field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants