QueryStream ending early, after only 1/14 of the query result set #1956

Closed
rickhuizinga opened this Issue Mar 11, 2014 · 5 comments

@rickhuizinga

I have a collection with 700K+ records. It contains a legacy geo-location field, and I wrote a script that uses a QueryStream to convert the field into a GeoJSON-compatible format.

The problem is that the QueryStream ends after only 50635 documents.

The script code is as follows:

var stream = db.propertyModel.find()
  .where("address.location").exists(true)
  .where("address.location.type").exists(false)
  .select("address.location")
  .stream();

var count = 0;
stream.on("data", function(property) {
  stream.pause();

  var location = {
    type: "Point",
    coordinates: property.address.location
  };

  property.update({ $set: { "address.location": location }}).exec(function(err, numberAffected, rawResponse) {
    if (err) {
      console.log("\n" + err.message);
    }

    count += 1;
    util.print("\rUpdated property # " + count);

    stream.resume();
  });
}).on("error", function(err) {
  console.log(err);
}).on("close", function() {
  db.mongoose.disconnect();
  console.log("\nStream closed\n");
});

From the Mongo shell, I have confirmed that the same query has a count of 721,938 documents:

db.properties.count({ "address.location": { $exists: true }, "address.location.type": { $exists: false }})

Response: 721938

Why is the QueryStream only streaming 50635 documents? Is it an internal cursor limitation?

@farhanpatel

I'm also getting something very similar.

@vkarpov15
Collaborator

Hi,

Try getting rid of the disconnect() call in the on("close") handler. I'm not 100% sure, but I suspect that pause() only pauses "data" events, not "close" events.

@rickhuizinga

I moved the disconnect() call to the on("end") event handler in another, very similar project, and the stream still ends very early: it always stops after exactly 39300 records out of about 1M.

I also implemented the method recommended in issue #1673, but it made no difference.

@rickhuizinga

I've found the solution for long-running cursors: the timeout option needs to be set to false.

Calling setOptions({ timeout: false }) on the query prior to calling stream() will prevent long-running streams from terminating prematurely.
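
For anyone landing here later, the fix can be sketched roughly as below. This is only an illustration, assuming the Mongoose version used in this thread, where a query exposes both setOptions() and stream(). The helper name streamWithoutTimeout and the fakeQuery stand-in are hypothetical and exist only so the snippet runs without a database. The likely reason the option matters is that the MongoDB server closes cursors that sit idle for too long (10 minutes by default), and pausing the stream for every update can leave the cursor idle between batches.

```javascript
// Sketch only: streamWithoutTimeout and fakeQuery are hypothetical names.
// Assumes a Mongoose query object with setOptions() and stream(), as in this thread.
function streamWithoutTimeout(query) {
  // setOptions() returns the query itself, so stream() can be chained directly.
  return query.setOptions({ timeout: false }).stream();
}

// Intended usage with the query from the original post (needs a live connection):
// var stream = streamWithoutTimeout(
//   db.propertyModel.find()
//     .where("address.location").exists(true)
//     .where("address.location.type").exists(false)
//     .select("address.location")
// );

// Stand-in query object so the sketch can run without MongoDB.
var fakeQuery = {
  options: {},
  setOptions: function (opts) {
    for (var key in opts) {
      this.options[key] = opts[key];
    }
    return this;
  },
  stream: function () {
    // A real QueryStream emits "data", "error" and "close" events;
    // this stub only records the options the stream was created with.
    return { cursorOptions: this.options };
  }
};

var fakeStream = streamWithoutTimeout(fakeQuery);
console.log(fakeStream.cursorOptions);
```

The important part is the ordering: the option has to be set on the query before stream() is called, because that is when the underlying cursor is opened.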

@vkarpov15
Collaborator

@rickhuizinga thanks!
