Skip to content

Conversation

@jonathansp
Copy link

Using a cursos in an if statement:

cursor = Collection.objects
if cursor:
    (...)

Will open all documents, because there are not an __nonzero__ method.
This change check only one document (if present) and returns True or False.

Using a cursos in an if statement:

cursor = Collection.objects

	if cursor:
		(...)

Will open all documents, because there are not an __nonzero__ method.
This change check only one document (if present) and returns True or False.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. If the query set contains some data, it should be truthy regardless of the position of the cursor. Note that for obj in qs will always rewind the cursor and start from the beginning. It would be very weird to have something like produce output:

if not qs:
    for obj in qs:
        print obj

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, __nonzero__ can rewind the cursor. The main problem is performance. For example, if Test.objects is large (I tested a collection with 900k documents), a simple if cursor: will open all documents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work. An alternative would be to make use of objects.first() in __nonzero__. It would also be cool to clear the ordering on a query set (because you don't need the performance overhead of sorting if you're only interested in checking if a query set is non-empty).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, check first() appears to be better option than for loop with rewind. Changed.

@wojcikstefan
Copy link
Member

Would be cool to add a test case that shows the performance increase. E.g. you could use mongoengine.context_manager.query_counter to ensure the limit value is 1. I'm thinking:

    def test_bool_performance(self):
        class Person(Document):
            name = StringField()

        Person.drop_collection()
        for i in xrange(100):
            Person(name="No: %s" % i).save()

        with query_counter() as q:
            if Person.objects:
                pass

            self.assertEqual(q, 1)
            op = q.db.system.profile.find({"ns": {"$ne": "%s.system.indexes" % q.db.name}})[0]
            self.assertEqual(op['nreturned'], 1)

Indeed, right now it fails on the master branch.

How about some bonus points for clearing ordering? :) You'd have to check that calling if Person.objects.order_by('name') doesn't have a $orderby key in op['query'] and that the subsequent queries on the same query set preserve the ordering (i.e. ordering's only cleared temporarily for the conditional statements).

@jonathansp
Copy link
Author

I cloned queryset to clear _ordering list. So, test first() again

@wojcikstefan
Copy link
Member

Looks good - one last thing I'd check is if clearing $orderby works well with default ordering - basically add an extra test_bool_with_ordering, but using this doc:

class Person(Document):
    name = StringField()

    meta = {
        'ordering': ['name']
    }

and using if Person.objects (w/o explicit .order_by(...))

@jonathansp
Copy link
Author

Default ordering is treated on cursor_obj. Test added.

@tmpapageorgiou
Copy link

You could make _bool a public method and change the name to has_data and docstring to "Retrieves whether cursor has any data".

def has_data(self):
    "Retrieves whether cursor has any data"

    queryset = self.clone()
    queryset._ordering = []
    return False if queryset.first() is None else True

@wojcikstefan
Copy link
Member

@jonathansp @tmpapageorgiou quoting The Zen of Python (> import this):

There should be one-- and preferably only one --obvious way to do it.

That being said, I personally dislike the ability to choose between if Person.objects and if Person.objects.has_data(). The former seems much simpler and more elegant. If there's no need to expose an extra method, then let's not do it - I'd keep the _bool name or at least add an underscore prefix to has_data (i.e. _has_data), so it's obvious that's it's for the library's internal use.

As for the default ordering specified via meta, I'm 100% sure there's a way to clear it, too. Let me get back to you on this one after I play around with it.

@wojcikstefan
Copy link
Member

@jonathansp clearing the default ordering needed a fix indeed. See #657.

@jonathansp
Copy link
Author

@wojcikstefan cool! I will change to:

def _has_data(self):
    "Retrieves whether cursor has any data"

    queryset = self.clone()
    queryset.order_by()
    return False if queryset.first() is None else True

Do I need wait for your PR?

@wojcikstefan
Copy link
Member

@jonathansp you don't need to wait for my PR, although - combined with yours - it'll make things faster.

@jonathansp
Copy link
Author

Added PR #657

@jonathansp
Copy link
Author

What do you think about ?
@wojcikstefan @lig

@wojcikstefan
Copy link
Member

I think it's fine now, but I don't have the power to merge it :)

@yograterol yograterol added this to the 0.9 milestone Jun 25, 2014
@yograterol yograterol self-assigned this Jun 25, 2014
@yograterol
Copy link
Contributor

Good job @jonathansp and @wojcikstefan ! Merge!

yograterol added a commit that referenced this pull request Jun 25, 2014
Avoid to open all documents from cursors in an if stmt
@yograterol yograterol merged commit 2d5280f into MongoEngine:master Jun 25, 2014
rozza added a commit that referenced this pull request Jun 26, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants