
default limits of 101 objects? #7

Closed
hhuuggoo opened this issue Jun 8, 2011 · 13 comments

@hhuuggoo

hhuuggoo commented Jun 8, 2011

Using the limit kwarg to find can give you more results, but when it is not specified, asyncmongo seems to default to 101? The code does not say anything about this.

import asyncmongo
import pymongo
import threading
import time
import tornado.ioloop

def start_loop_timeout(timeout=0.05):
    # Crude helper: run the IOLoop, and stop it from another thread
    # after `timeout` seconds.
    def kill():
        time.sleep(timeout)
        tornado.ioloop.IOLoop.instance().stop()
    t = threading.Thread(target=kill)
    t.start()
    tornado.ioloop.IOLoop.instance().start()

class AsyncDBTester(object):
    def __init__(self):
        self.responses = []
        self.errors = []

    def async_callback(self, response, error):
        self.responses.append(response)
        self.errors.append(error)

db = asyncmongo.Client(pool_id='test', host='127.0.0.1', port=27017, dbname='test')
dbtester = AsyncDBTester()
db.dummy.remove(callback=dbtester.async_callback)
start_loop_timeout(0.1)

count = 0
for b in range(10):
    for c in range(100):
        db.dummy.insert({'a': b * c}, callback=dbtester.async_callback)
        count += 1
start_loop_timeout(0.1)

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback, limit=1000)
start_loop_timeout(0.2)
print 'inserted', count
print "can get by specifying limit", len(dbtester.responses[0])

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback)
start_loop_timeout(0.2)
print count
print "but why 101 here?", len(dbtester.responses[0])

blockingdb = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']
result = list(blockingdb.dummy.find())
print "pymongo gives me", len(result)

@energycsdx
Contributor

limit is not actually a limit: it is the number of objects that MongoDB will send in the first chunk, and asyncmongo does not request the next chunk. It looks like it never will, due to the async design.

@hhuuggoo
Author

Is there a way, using MongoDB, to know that more chunks are coming and to go get them?

@energycsdx
Contributor

About asyncmongo:

Features not supported: some features from pymongo are not currently implemented ... and retrieving results in batches instead of all at once.
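
For what it's worth, the wire protocol does expose this: every query reply carries a cursor id, and a non-zero id means more results are waiting on the server, which a driver can fetch by issuing a getmore with that id. Blocking pymongo does this transparently while you iterate, which is why the pymongo call at the end of the original script sees all 1000 documents. A minimal sketch of the contrast, reusing the connection from that script:

import pymongo

blockingdb = pymongo.connection.Connection(host='127.0.0.1', port=27017)['test']
cursor = blockingdb.dummy.find()     # no limit: the first reply holds ~101 docs
docs = list(cursor)                  # iterating issues getmore until the cursor is exhausted
print "pymongo gives me", len(docs)  # all 1000 documents, not 101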

@chaselee

Is there any way to override the default? I tried what I thought might work, like passing in limit=0, to no avail.

@energycsdx
Contributor

Setting limit=0 requires getting the answer in several chunks.

@chaselee

So effectively no for limit=0. Thanks! What's the override otherwise?

@ceymard

ceymard commented Dec 19, 2011

Is there any reliable way to iterate over a whole set of elements matching a query, without risking losing one or having duplicates?

@ajdavis
Contributor

ajdavis commented Dec 19, 2011

If no limit is set, the mongo server by default will return about 101 documents or 1 MB of data, whichever is less, in the first batch. As @energycsdx points out, asyncmongo doesn't support retrieving batches of data.

A hacky way to ensure that the first batch contains all the data is to do find(callback=mycallback, limit=1000000).

However, MongoDB enforces a hard limit of 4 MB per batch of data, and there is no override.
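
For concreteness, a sketch of that workaround using the AsyncDBTester harness from the original report (only safe as long as the whole result set fits in one such batch):

dbtester = AsyncDBTester()
db.dummy.find(callback=dbtester.async_callback, limit=1000000)
start_loop_timeout(0.2)
print "whole collection in one batch:", len(dbtester.responses[0])  # 1000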

@ceymard

ceymard commented Dec 19, 2011

Right now, I am iterating over the collection using skip and a sort order on _id to fetch it all.

However, I am not entirely sure of the robustness of this method (it works well as long as the data doesn't change in between).
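
Roughly, a sketch of that skip-based approach (hypothetical names, and assuming asyncmongo's find() accepts skip like pymongo's does; the fragility is that concurrent inserts or deletes shift the offsets, so documents can be missed or seen twice):

BATCH = 100  # hypothetical page size

def fetch_page(offset):
    db.mycollection.find(sort=[('_id', 1)], skip=offset, limit=BATCH,
                         callback=lambda response, error: on_page(response, error, offset))

def on_page(response, error, offset):
    # process this batch of documents ...
    if response:  # an empty batch means we have seen everything
        fetch_page(offset + len(response))

fetch_page(0)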

@ceymard

ceymard commented Dec 20, 2011

This is the code I use to iterate over my collection. It should work pretty well even if the DB is being modified at the same time: in theory there are no duplicate elements, nor does it skip over elements when there are deletions.

You use it like this:

def mycallback(response, error):
    # do stuff with this batch
    next_batch()  # call again when done, to get the next batch

next_batch = mongo_find_all(db.mycollection, callback=mycallback)
next_batch()  # get the first batch

Please comment or spot bugs:

def mongo_find_all(collection, spec=None, fields=None, **kwargs):
    """ Batched iteration over a mongodb query.

        When querying mongodb with `find()`, we do not systematically
        get all the elements that matched.

        This returns a function which, called repeatedly, walks over
        **all** the documents that match the spec, making several calls
        to `find()` while doing so.

        The trick is that it orders the query on the `_id` element of
        the objects, making each subsequent query start from the last
        `_id` encountered. It also inspects the requested sort order to
        maintain the sorting in the resulting query.

        This can only work on collections whose documents have an `_id`.

        :param collection: The database.collection object
        :param spec: The specification of the query.
        :param fields: The fields to return; `_id` must be among them
            for the paging to work.
        :param kwargs: all the keyword arguments that will be given
            to the find() method.
    """
    found = False

    sort = kwargs.get("sort", [])
    for field, order in sort:
        if field == "_id":
            found = True

    spec = dict(spec or {})  # Take a copy since we're going to alter it.

    callback = kwargs.get("callback", None)
    if not callable(callback):
        raise Exception("Please provide a callback")

    # Make sure we are sorting on _id.
    if not found:
        sort = sort + [('_id', 1)]
    kwargs["sort"] = sort

    def _next():
        collection.find(spec, fields=fields, **kwargs)

    def _callback(response, error=None):
        if not response:
            callback([], error)
            return

        # Resume the next query right after the last document seen.
        # Note: this is only strictly correct when _id is the only sort
        # key; a compound sort would need an $or-based resume condition.
        last = response[-1]
        for field, order in sort:
            spec[field] = {"$gt" if order == 1 else "$lt": last[field]}

        callback(response, error)

    kwargs["callback"] = _callback
    return _next

@ajdavis
Contributor

ajdavis commented Dec 20, 2011

@ceymard that looks like it'll work; there will be performance penalties compared to iterating a synchronous pymongo cursor. You might also consider a development branch in my fork:

https://github.com/ajdavis/asyncmongo/commit/5d210aa25805f69843f1a8a5d3e74c0787d0fe81

See the unittest at the bottom for usage. Essentially, find() will now return a cursor, and you can do

if cursor.alive: cursor.getmore(callback)

It's very far from production-ready, but could be a basis for implementing getmore() and tailable cursors in asyncmongo.
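
Usage would look roughly like this (a sketch against the API described above, which exists only in that development branch):

def on_batch(response, error):
    # process this batch ...
    # by the time the callback fires, `cursor` is bound below
    if cursor.alive:               # a live cursor means the server has more results
        cursor.getmore(on_batch)   # fetch the next batch with the same callback

cursor = db.mycollection.find(callback=on_batch)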

@ajdavis
Contributor

ajdavis commented Feb 6, 2012

Patch:

#39

@jehiah
Member

jehiah commented Jan 12, 2013

closing in favor of the patch in #39 (or something close to it)

@jehiah jehiah closed this as completed Jan 12, 2013