QueryStream ending early, after only 1/14 of the query result set #1956

Closed
rickhuizinga opened this issue Mar 11, 2014 · 5 comments

Comments

@rickhuizinga

I have a collection with 700K+ records. It contains a legacy geo-location field, and I wrote a script that uses a QueryStream to convert the field to a GeoJSON-compatible format.

The problem is that the QueryStream ends after only 50,635 documents.

The script code is as follows:

var stream = db.propertyModel.find()
  .where("address.location").exists(true)
  .where("address.location.type").exists(false)
  .select("address.location")
  .stream();

var count = 0;
stream.on("data", function(property) {
  stream.pause();

  var location = {
    type: "Point",
    coordinates: property.address.location
  };

  property.update({ $set: { "address.location": location }}).exec(function(err, numberAffected, rawResponse) {
    if (err) {
      console.log("\n" + err.message);
    }

    count += 1;
    util.print("\rUpdated property # " + count);

    stream.resume();
  });
}).on("error", function(err) {
  console.log(err);
}).on("close", function() {
  db.mongoose.disconnect();
  console.log("\nStream closed\n");
});

From the Mongo shell, I have confirmed that the same query has a count of 721,938 documents:

db.properties.count({ "address.location": { $exists: true }, "address.location.type": { $exists: false }})

Response: 721938

Why is the QueryStream only streaming 50635 documents? Is it an internal cursor limitation?

@farhanpatel

I'm also getting something very similar.

@vkarpov15
Collaborator

Hi,

Try removing the disconnect() call from the on("close") handler. I'm not 100% sure, but I suspect that pause() only pauses "data" events, not "close" events.

@rickhuizinga
Author

I moved the disconnect() call to the on("end") event handler in another, very similar project, and the stream still ends very early: it always stops after exactly 39,300 records out of about 1M.

I also implemented the method recommended in issue #1673, but it made no difference.

@rickhuizinga
Author

I've found the solution for long-running cursors: the timeout option needs to be set to false.

Calling setOptions({ timeout: false }) on the query prior to calling stream() will prevent long-running streams from terminating prematurely.
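
For reference, here is a minimal sketch of that change applied to the query from the original script. The helper name openUntimedStream is hypothetical and exists only to keep the example self-contained; it assumes a Mongoose version that still provides Query#stream() and accepts the timeout option as described above. Other Mongoose or driver versions may name this option differently, so check the documentation for the version in use.

```javascript
// Hypothetical helper (not part of Mongoose): builds the same query as the
// original script, but disables the cursor timeout before opening the stream.
function openUntimedStream(Model) {
  return Model.find()
    .where("address.location").exists(true)
    .where("address.location.type").exists(false)
    .select("address.location")
    .setOptions({ timeout: false }) // must be called before stream()
    .stream();
}

// Usage, assuming db.propertyModel is the model from the original script:
// var stream = openUntimedStream(db.propertyModel);
```

The "data", "error", and "close" handlers from the original script can then be attached to the returned stream unchanged.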

@vkarpov15
Collaborator

@rickhuizinga thanks!
