Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #934 - Added option to disable field existence check. #953

Closed
wants to merge 5 commits into from

Conversation

gordol
Copy link

@gordol gordol commented Apr 11, 2015

I threw together this patch as an idea to fix #934.

I haven't dug too deep into Mongoengine to know what else this might affect, but I used _from_son in BaseDocument object as my injection point. The idea being that if one tries to define in code a field that does not exist, an error is still raised, but if one pulls in data from a PyMongo object, we don't raise an error.

Review on Reviewable

@landscape-bot
Copy link

Code Health
Repository health decreased by 0.00% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

6 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.0% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 73.02% when pulling 4738e63 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@thedrow
Copy link
Contributor

thedrow commented Apr 11, 2015

What do you guys think?
@rozza @DavidBord @yograterol

@yograterol
Copy link
Contributor

test for that?

@DavidBord
Copy link
Contributor

Pulling data from pymongo to mongoengine can be a bit problematic. For instance:

    class A(Document):
        a = StringField(db_field='b')
    A.objects.delete()
    A(a='a').save()
    doc = A._get_collection().find_one()
    a = A(**doc)

This won't work because of the mappings _id -> id & b -> a. So, I am having trouble in understanding the use case for it

In addition, when fetching data from pymongo, why not just bring the fields that are needed?

@gordol
Copy link
Author

gordol commented Apr 12, 2015

I'm not sure what you are talking about exactly, but perhaps it will help if I share some examples.

First, let's load some data into mongo, using the current release without my patch.

from mongoengine import *
connect('testing')

class Foobar(Document):
     foo = StringField()
     bar = StringField()

foobar = Foobar(foo='foo', bar='bar').save()

foobar.foo
#u'foo'

foobar.bar
#u'bar'

Now, define the same model, less one field. We want to connect to the same collection, but we only want to work with part of the data.

from mongoengine import *
connect('testing')

class Foobar(Document):
    foo = StringField()

foobar = Foobar.objects.get()
'''
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mongoengine/queryset/base.py", line 235, in get
    result = queryset.next()
  File "mongoengine/queryset/base.py", line 1359, in next
    _auto_dereference=self._auto_dereference, only_fields=self.only_fields)
  File "mongoengine/base/document.py", line 709, in _from_son
    obj = cls(__auto_convert=False, _created=created, __only_fields=only_fields, **data)
  File "mongoengine/base/document.py", line 81, in __init__
    raise FieldDoesNotExist(msg)
mongoengine.errors.FieldDoesNotExist: The field 'bar' does not exist on the document 'Foobar'
"""

Since bar exists in the mongo data, but not in our model, an error is raised.

Now, apply my patch, and try again.

This time, we can load the data in, but we are unable to reference the bar property since it was not defined in the model. This is expected.

from mongoengine import *
connect('testing')

class Foobar(Document):
    foo = StringField()

foobar = Foobar.objects.get()
foobar.foo
u'foo'

foobar.bar
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foobar' object has no attribute 'bar'
"""

Still using my patch, if we try to put data into an undefined field, an error is raised, as expected.

from mongoengine import *
connect('testing')
class Foobar(Document):
    foo = StringField()

foobar = Foobar()
foobar.foo = 'test'
foobar.bar = 'test'
"""
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foobar' object has no attribute 'bar'
"""

@gordol
Copy link
Author

gordol commented Apr 12, 2015

My patch allows one to load data from mongo collections in which all fields are not defined in the model. This was always the behavior of Mongoengine until recently. As @noirbizarre has said already in #934, #457 broke backwards compatibility. My patch restores expected functionality.

@gordol
Copy link
Author

gordol commented Apr 12, 2015

@noirbizarre

Preventing constructor bad usage, which is good and the purpose of #457, is different than preventing loading data from a schema-less database into a typed and structured model

@DavidBord
Copy link
Contributor

Thanks for providing a use case
But why change the framework and implicitly hide stuff from the rest of the developers of the project (wouldn't it be a surprise if someone else will add bar to the schema in the future and all all of a sudden data will appear?) when you can change your data?

@thedrow
Copy link
Contributor

thedrow commented Apr 12, 2015

@DavidBord This is exactly what I asked. I also asked why using dynamic documents is not good enough.

@gordol
Copy link
Author

gordol commented Apr 12, 2015

Mongo is supposed to be flexible. One shouldn't have to make copies of data
and delete fields to use mongoengine with a collection that contains legacy
data. What if other applications share the database and use the legacy data
and the new data? I'm not saying it's a great architectural decision, but
it happens often unfortunately and it's outside of my control. Mongoengine
is now forcing a rigid structure into and out of mongo when it's supposed
to be schemaless.

Dynamic documents do solve this, but not without other side effects. What
if one wants to enforce a certain structure on new data going into mongo
from one application without modifying the legacy data?

What if it's not even legacy data, but just data that is of no concern to
one specific application. Pipelines are a good example. Let's say one
application starts a document and passes it off to another application that
does more with the data and appends to it. Sure, one could argue that
downstream should only read the data from upstream and store output and
modifications in a new collection, but mongoengine shouldn't be enforcing
this.

There are other ways to fix this, like defining read only fields or
something so mongoengine can be used while still enforcing model structure,
but one shouldn't have to define every potential field and put additional
validation in front of models to prevent those fields from being written
just to access a collection that contains data they don't want to work with.

I'm not saying it has to be solved this way as in my patch, but this was
fastest and easiest to implement.

#457 was a major breaking change that was too wide in its scope. It's fine
to prevent undefined fields from being created, but it's absurd to force a
strict schema on a schemaless database by disallowing data to be read if it
contains undefined fields.

Mongokit has options to let the user decide whether they want to enforce a
strict schema or not, mongoengine should too.

I'm on mobile at the moment, apologies for brevity. If I need to elaborate
further later I will. Hopefully this helps to understand the conundrum
though.

Thomas Gordon Lowrey IV
Software Systems Engineer / Developer
205.409.4901
https://www.linkedin.com/in/gordol

@gordol
Copy link
Author

gordol commented Apr 12, 2015

(wouldn't it be a surprise if someone else will add bar to the schema in the future and all all of a sudden data will appear?)

Developers should be familiar with the data they are interfacing, so no.

@gordol
Copy link
Author

gordol commented Apr 12, 2015

Or the opposite could be argued, if y'all are gonna enforce strict schemas why not add an option to delete undefined fields?

@DavidBord
Copy link
Contributor

I get what you are saying, but it is still does not change my concern with this PR - it creates a surprising and implicit behaviour
I agree with @thedrow, the solution should be either DynamicDocument or changes in the data layer

@gordol
Copy link
Author

gordol commented Apr 12, 2015

Mongoengine isn't my project, and I respect your decisions to decline this pull request, but to say this is implicit and surprising is kind of silly when it has been the default behavior all along; it's an attempt to restore backwards compatibility.

The only surprising thing here is updating to 0.9 and finding that one can no longer load data where-as one could previously.

I'm not saying the way I patched to fix this behavior is great, all I'm looking for is acknowledgement that #457 broke backwards compatibility. Would this pull request be acceptable if it was a non-default option that can be toggled in Document meta?

@thedrow
Copy link
Contributor

thedrow commented Apr 12, 2015

It can be considered if it was an option in meta.
After that we can discuss if turning it on by default makes sense until we reach 1.0.

@gordol
Copy link
Author

gordol commented Apr 12, 2015

Great, I'll update that branch soon.

@landscape-bot
Copy link

Code Health
Code quality remained the same when pulling f7c85e8 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 73.02% when pulling f7c85e8 on gordol:943_ideas into 19dc312 on MongoEngine:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 73.02% when pulling f7c85e8 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 73.02% when pulling f7c85e8 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@landscape-bot
Copy link

Code Health
Code quality remained the same when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 72.99% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

3 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 72.99% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 72.99% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 72.99% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

6 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

7 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.02%) to 73.01% when pulling 1806668 on gordol:943_ideas into 19dc312 on MongoEngine:master.

@mmellison
Copy link
Contributor

TL;DR

In my opinion the current behavior is desired and can be worked around if needed (see below). I am opposed to adding in an option to override the behavior or explicitly allowing 0.8.x behavior, for either a 0.9.x release or in future releases.

Why? Because I think this is a symptom of a larger issues with MongoEngine's schema validation layer and the current Git Workflow. And these are discussion that should take place elsewhere.


Here is my opinion on this issue, sorry if it is long, I am a very verbose and up-front individual:

  • As a developer, I am always aware that going from major version A to major version B of a third-party dependency may break things due to non-backwards compatible behavior. Is the git workflow perfect with MongoEngine? No, but it should be expected that 0.9 brings breaking + non-backwards compatible changes.
  • Explicit > Implicit. I am still not clear why DynamicDocument does not solve the issue here. If there is something about using a DynamicDocument that is undesirable, why don't we look at fixing that issue instead? Using a DynamicDocument is an explicitly semantic way to say 'Hey, I'm going to be saving/loading a schema-less version of my document from the database'.

So looking at some of the comments made in this issue:

Dynamic documents do solve this, but not without other side effects. What
if one wants to enforce a certain structure on new data going into mongo
from one application without modifying the legacy data? -- @gordol

If I understand your problem correctly, its one that I have personally ran into in my work as well. (Internally my company has swapped out the validation mechanism of MongoEngine with Cerberus and that helps solve some of these issues but I digress).
Our solution before that was to use a tuple of a Document for inserting data into the collection and a DynamicDocument to retrieve it.

The discussion is not about wether or not this behavior should be restored implicitly but about "allowing explicitly" the previous behavior because, for those who were using this ability because they known it worked this way, it's not possible anymore to use mongoengine. -- @noirbizarre

I would argue that it is still completely possible to use MongoEngine with version 0.9, it may work differently now, but you can still achieve the same result. (Again maybe pointing towards this being a documentation issue, not necessarily a functional one).

I've already heard complaints from individuals on how "heavy" MongoEngine is and I have people at work constantly asking me for a more "light-weight" alternative. Adding knobs and buttons to adjust every feature of MongoEngine is going to lead to some serious feature bloat and eventually regression issues.

I agree with the other maintainers that DynamicDocument is the correct choice for this situation but I also believe that the current behavior (0.9) is ideal and should remain the default and only behavior. I believe the discussion here points to a larger issue of flexibility in the validation layer of MongoEngine.

Sorry if I come across argumentative -- I love this sort of discussion because it helps drive innovation!

@gordol
Copy link
Author

gordol commented Apr 24, 2015

Yeah, you came across argumentative, and I will raise a few points in retort, but I really don't want to argue about this, honestly.

Firstly, a simple toggle to restore backwards compatibility is not adding any sort of heaviness or bloat to MongoEngine. Adding a single option to a condition statement to prevent an exception from being raised is not going to slow down MongoEngine at in any remotely perceptible way. It's negligible. You realize that's all my pull request does, right? It just prevents the field validation exception from being raised when reading data from Mongo, and that's it! That exception is not preventing any sort of breakage, clearly, it's totally pointless when loading data into the models.

So, you say we don't want to add lots of knobs and toggles and things, because "heaviness" but then what are you proposing as an alternative? Do nothing? "It's a feature, not a bug"? If anything is to be done about it, whether in the validation layer or elsewhere, some code has to be written somewhere, and there will need to be either toggles or implicit default behavior. To avoid the latter, we have a meta toggle. What are you proposing, then? Instead of adding a meta toggle to control one conditional as I've done, I guess we could add some other toggle elsewhere and add a lot more code to the validation layer. That would be even heavier than my pull request. Either way, if someone has a problem with the "heaviness" of an ORM and they aren't capable of falling back to PyMongo where needed, for speed... well, that's their own problem, clearly, not MongoEngine's problem.

What, so we're going to cripple this software, by making it backwards incompatible, and then use some weird strawman hearsay argument about bloat and heaviness as the reason for not having an option to toggle backwards compatibility while still preventing writes to undefined fields, as originally intended? C'mon...

If you really think MongoEngine is too heavy, then we need to start stripping it down... Why don't we start with the half-baked validation that was added in 0.9 that is causing all these issues? I'm not saying the intent behind the validation isn't sound... it did fix a major bug, but it also introduced more.

Frankly, at this point I'm starting to care less and less whether this pull request is merged or not, it is a few very simple lines of code and I can easily maintain my own fork for existing projects. I don't have time to play politics or argue about stuff. It's a simple pull request, and I've said all I have to say.

@mmellison
Copy link
Contributor

I don't want you to misunderstand -- I completely understand where you're coming from. And more or less I used this issue as a way to get my thoughts out about changing behavior in MongoEngine, (especially since there seems to be a few different issues/PRs open about the FieldDoesNotExist exception) and the fact that there was a lack of communication / documentation surrounding the change.

Please don't feel as if I was singling you out and I'm sorry if it came out that way. That was never my intention. I have been dwelling on this change since back before 0.9 was officially released and this felt like a good opportunity to archive a few of my thoughts regardless of it gets merged. I am definitely not going to make that call for this one.

The "strawman hearsay" argument was my opinion to the team of the situation and how it relates to the larger picture of backward compatibility. In my opinion, this will eventually lead to code bloat and regression difficulties, but you are free to disagree and so are the other maintainers. I understand that your PR simply adds in a toggle, but so will the next person's and the person's after that, and eventually it all adds up. Again, this is simply my opinion -- not singling out specifically your changes.

As for the politics, I think debate is healthy for projects. It helps both users and maintainers to gain better perspectives about design and development challenges in front of them and possible solutions both seen and unseen. I am not trying to be 'argumentative' for argument's sake, because that serves no purpose for me or the project. No one is perfect, no project is perfect, but the beauty of open source is if you feel you would be better off maintaining your own fork, you are freely able to do so.

@noirbizarre
Copy link
Collaborator

@seglberg : you should look at #957 (same problem, different solution):

  • The modification is very very lightweight (one line)
  • The default 0.9 behavior is untouched
  • It's not possible to bypass the constructor validation
  • Data loading validation is only deactivatable explicitly by Document/EmbeddedDocument with a meta attribute
  • The behavior is documented (main point on this issue)
  • The behavior is tested (very important)

I agree with the other maintainers that DynamicDocument is the correct choice for this situation but I also believe that the current behavior (0.9) is ideal and should remain the default and only behavior. I believe the discussion here points to a larger issue of flexibility in the validation layer of MongoEngine. -- @seglberg

I explained why DynamicDocument can't be my solution without a refactoring (every extra attribute set at runtime is persisted, which is not the case with Document).
I can't refactor my (not very small) application just to upgrade to MongoEngine 0.9 but I need to upgrade (for many reason, at least new geospatial types support, for which I maintain a fork while a solution is provided).

I would agree with you if there was a transition between the 0.8.7 and the long awaited 0.9, with, as exemple, a warning raised instead of an exception and if it was documented

@noirbizarre
Copy link
Collaborator

As for the politics, I think debate is healthy for projects. It helps both users and maintainers to gain better perspectives about design and development challenges in front of them and possible solutions both seen and unseen. -- @seglberg

Agreed !

@gordol
Copy link
Author

gordol commented Apr 24, 2015

I agree, discourse among users and developers is important. However, this has all been discussed already in other threads like #934, and in this one.

I really don't care if @noirbizarre's patch is accepted, or mine, they are practically the same, it's just a matter of inverse connotation (ignore_undefined_fields: True vs strict: False). But I do find it rather funny that @seglberg rejects my pull request, because he disagree's with the behavior and addition of more options and flags, but he can get behind the other one... when they both do the same thing in the same way (prevent the exception with a meta flag). Am I the only one who sees the irony there? Haha.

If there is something about using a DynamicDocument that is undesirable, why don't we look at fixing that issue instead? Using a DynamicDocument is an explicitly semantic way to say 'Hey, I'm going to be saving/loading a schema-less version of my document from the database'.

Because I don't necessarily want my model to be schemaless. I want to define a rigid model against the schemaless data. That model may not touch all the data that is available, but only a subset of it. If I define a model, I don't want other fields to be readable or writeable... DynamicModel does not solve that problem.

Perhaps if DynamicDocuments allowed you to set a strict schema, but that seems silly, just to be able to read data. Basically it's the same either way... Some behavior flag would have to be set somewhere, whether it's on normal Documents, DynamicDocuments, or both.

@gordol
Copy link
Author

gordol commented Apr 24, 2015

I would agree with you if there was a transition between the 0.8.7 and the long awaited 0.9, with, as exemple, a warning raised instead of an exception and if it was documented

So much, this.

@gordol
Copy link
Author

gordol commented Apr 28, 2015

Closing this I guess. Hopefully #957 gets merged.

@gordol gordol closed this Apr 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#457 breaks legacy/undeclared fields
8 participants