
default limits of 101 objects? #7

Closed
hhuuggoo opened this issue Jun 8, 2011 · 13 comments

@hhuuggoo

hhuuggoo commented Jun 8, 2011

Using the limit kwarg to find can give you more results, but when it is not specified, asyncmongo seems to default to 101? The code does not say anything about this.

import asyncmongo
import pymongo
import threading
import time
import tornado.ioloop

def start_loop_timeout(timeout=0.05):
    # Crude helper: run the IOLoop, and stop it from another thread
    # after `timeout` seconds.
    def kill():
        time.sleep(timeout)
        tornado.ioloop.IOLoop.instance().stop()
    t = threading.Thread(target=kill)
    t.start()
    tornado.ioloop.IOLoop.instance().start()

class AsyncDBTester(object):
    def __init__(self):
        self.responses = []
        self.errors = []

    def async_callback(self, response, error):
        self.responses.append(response)
        self.errors.append(error)

db = asyncmongo.Client(pool_id='test', host='127.0.0.1', port=27017, dbname='test')
dbtester = AsyncDBTester()
db.dummy.remove(callback=dbtester.async_callback)
start_loop_timeout(0.1)

count = 0
for b in range(10):
    for c in range(100):
        db.dummy.insert({'a': b * c}, callback=dbtester.async_callback)
        count += 1
start_loop_timeout(0.1)

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback, limit=1000)
start_loop_timeout(0.2)
print 'inserted', count
print "can get by specifying limit", len(dbtester.responses[0])

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback)
start_loop_timeout(0.2)
print count
print "but why 101 here?", len(dbtester.responses[0])

blockingdb = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']
result = list(blockingdb.dummy.find())
print "pymongo gives me", len(result)

@energycsdx
Contributor

limit is not actually a limit: it is the number of objects that MongoDB will send in the first chunk, and asyncmongo does not request the next chunk. It looks like it never will, due to the async design.

@hhuuggoo
Author

Is there a way, using MongoDB, to know that more chunks are coming and to go get them?

@energycsdx
Contributor

About asyncmongo:

Features not supported: some features from pymongo are not currently implemented ... and retrieving results in batches instead of all at once.
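
For what it's worth, the wire protocol does expose this: every query reply carries a cursor id, and a non-zero id means more results are waiting on the server, which a driver can fetch by issuing a getmore with that id. Blocking pymongo does this transparently while you iterate, which is why the pymongo call at the end of the original script sees all 1000 documents. A minimal sketch of the contrast, reusing the connection from that script:

import pymongo

blockingdb = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']
cursor = blockingdb.dummy.find()     # no limit: the first reply holds ~101 docs
docs = list(cursor)                  # iterating issues getmore until the cursor is exhausted
print "pymongo gives me", len(docs)  # all 1000 documents, not 101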

@chaselee

Is there any way to override the default? I tried what I thought might work, like passing in limit=0, to no avail.

@energycsdx
Contributor

Setting limit=0 requires getting the answer in several chunks.

@chaselee

So effectively no for limit=0. Thanks! What's the override otherwise?

@ceymard

ceymard commented Dec 19, 2011

Is there any reliable way to iterate over a whole set of elements matching a query, without risking losing one or having duplicates?

@ajdavis
Contributor

ajdavis commented Dec 19, 2011

If no limit is set, the mongo server by default will return about 101 documents or 1 MB of data, whichever is less, in the first batch. As @energycsdx points out, asyncmongo doesn't support retrieving batches of data.

A hacky way to ensure that the first batch contains all the data is to do find(callback=mycallback, limit=1000000).

However, MongoDB enforces a hard limit of 4 MB per batch of data, and there is no override.
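
For concreteness, a sketch of that workaround using the AsyncDBTester harness from the original report (only safe as long as the whole result set fits in one such batch):

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback, limit=1000000)
start_loop_timeout(0.2)
print "whole collection in one batch:", len(dbtester.responses[0])  # 1000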

@ceymard

ceymard commented Dec 19, 2011

Right now, I am iterating over the collection using skip and a sort order on _id to fetch it all.

However, I am not entirely sure of the robustness of this method (it works well as long as the data doesn't change in between).
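
Roughly, a sketch of that skip-based approach (hypothetical names, and assuming asyncmongo's find() accepts skip like pymongo's does; the fragility is that concurrent inserts or deletes shift the offsets, so documents can be missed or seen twice):

BATCH = 100  # hypothetical page size

def fetch_page(offset):
    db.mycollection.find(sort=[('_id', 1)], skip=offset, limit=BATCH,
                         callback=lambda response, error: on_page(response, error, offset))

def on_page(response, error, offset):
    # process this batch of documents ...
    if response:  # an empty batch means we have seen everything
        fetch_page(offset + len(response))

fetch_page(0)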

@ceymard

ceymard commented Dec 20, 2011

This is the code I use to iterate over my collection. It should work pretty well even if the DB is being modified at the same time: in theory there are no duplicate elements, nor does it skip over elements when there are deletions.

You use it like this:

def mycallback(response, error):
    # do stuff with this batch
    next_batch()  # call again when done, to get the next batch

next_batch = mongo_find_all(db.mycollection, callback=mycallback)
next_batch()  # get the first batch

Please comment or spot bugs:

def mongo_find_all(collection, spec=None, fields=None, **kwargs):
    """ Batched iteration over a mongodb query.

        When querying mongodb with `find()`, we do not systematically
        get all the elements that matched.

        This returns a function which, called repeatedly, walks over
        **all** the documents that match the spec, making several calls
        to `find()` while doing so.

        The trick is that it orders the query on the `_id` element of
        the objects, making each subsequent query start from the last
        `_id` encountered. It also inspects the requested sort order to
        maintain the sorting in the resulting query.

        This can only work on collections whose documents have an `_id`.

        :param collection: The database.collection object
        :param spec: The specification of the query.
        :param fields: The fields to return; `_id` must be among them
            for the paging to work.
        :param kwargs: all the keyword arguments that will be given
            to the find() method.
    """
    found = False

    sort = kwargs.get("sort", [])
    for field, order in sort:
        if field == "_id":
            found = True

    spec = dict(spec or {})  # Take a copy since we're going to alter it.

    callback = kwargs.get("callback", None)
    if not callable(callback):
        raise Exception("Please provide a callback")

    # Make sure we are sorting on _id.
    if not found:
        sort = sort + [('_id', 1)]
    kwargs["sort"] = sort

    def _next():
        collection.find(spec, fields=fields, **kwargs)

    def _callback(response, error=None):
        if not response:
            callback([], error)
            return

        # Resume the next query right after the last document seen.
        # Note: this is only strictly correct when _id is the only sort
        # key; a compound sort would need an $or-based resume condition.
        last = response[-1]
        for field, order in sort:
            spec[field] = {"$gt" if order == 1 else "$lt": last[field]}

        callback(response, error)

    kwargs["callback"] = _callback
    return _next

@ajdavis
Contributor

ajdavis commented Dec 20, 2011

@ceymard that looks like it'll work; there will be performance penalties compared to iterating a synchronous pymongo cursor. You might also consider a development branch in my fork:

https://github.com/ajdavis/asyncmongo/commit/5d210aa25805f69843f1a8a5d3e74c0787d0fe81

See the unittest at the bottom for usage. Essentially, find() will now return a cursor, and you can do

if cursor.alive: cursor.getmore(callback)

It's very far from production-ready, but could be a basis for implementing getmore() and tailable cursors in asyncmongo.
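
Usage would look roughly like this (a sketch against the API described above, which exists only in that development branch):

def on_batch(response, error):
    # process this batch ...
    # by the time the callback fires, `cursor` is bound below
    if cursor.alive:               # a live cursor means the server has more results
        cursor.getmore(on_batch)   # fetch the next batch with the same callback

cursor = db.mycollection.find(callback=on_batch)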

@ajdavis
Contributor

ajdavis commented Feb 6, 2012

Patch:

#39

@jehiah
Member

jehiah commented Jan 12, 2013

closing in favor of the patch in #39 (or something close to it)

@jehiah jehiah closed this as completed Jan 12, 2013