Perform mapreduce without trying to find / retrieve results? #146
Comments
I dug into the code and discovered that the ->find() function is called after every mapreduce. What is the thought behind it? Read my comments inline in the code below. In _Doctrine/MongoDB/Collection.php_ - Line _135_ ( https://github.com/doctrine/mongodb/blob/master/lib/Doctrine/MongoDB/Collection.php#L375 )

    <?php
    protected function doMapReduce($map, $reduce, array $out, array $query, array $options)
    {
        // ..........
        if (isset($out['inline']) && $out['inline'] === true) {
            return new ArrayIterator($result['results']);
            // ^^^ this is as expected, since the results were requested inline
        }
        return $this->database->selectCollection($result['result'])->find();
        // ^^^ why are we doing this find(..)? The user has explicitly asked for the output to go to a particular collection.
    }

Don't you think it would be better to do away with the ->find(..) altogether?
I am not sure. What do you expect the behavior to be?
If I'm performing a MapReduce whose results are stored in a separate collection, there's no guarantee that I want to retrieve the results right away. Sometimes you run MapReduce jobs just to aggregate and keep the results for later use. If the MapReduce does not return the result inline, I believe it should either return nothing or return the statistics of the mapreduce (time taken, emits, etc).
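To make the suggestion above concrete, here is a minimal sketch of what "return the statistics" could look like. It is not Doctrine's code: `summarizeMapReduceResult()` is a hypothetical helper, the sample reply values are made up, and the only assumption about MongoDB is that the mapReduce command reply carries `result`, `timeMillis` and `counts` fields. It needs PHP 7+ for `??`.

```php
<?php
// Hypothetical helper, NOT part of Doctrine MongoDB: decides what to hand back
// from the raw reply array of MongoDB's mapReduce command.
function summarizeMapReduceResult(array $result, array $out): array
{
    // Inline output: the reduced documents are already in the reply.
    if (isset($out['inline']) && $out['inline'] === true) {
        return $result['results'] ?? [];
    }

    // Collection output: report where the data went plus the job statistics,
    // without issuing a follow-up find() on the output collection.
    return [
        'collection' => $result['result'] ?? null,
        'timeMillis' => $result['timeMillis'] ?? null,
        'counts'     => $result['counts'] ?? [],
    ];
}

// Sample reply shaped like a mapReduce command response (values are made up).
$reply = [
    'result'     => 'active_users_summary',
    'timeMillis' => 42,
    'counts'     => ['input' => 10, 'emit' => 10, 'reduce' => 3, 'output' => 4],
    'ok'         => 1,
];

$stats = summarizeMapReduceResult($reply, ['replace' => 'active_users_summary']);
echo $stats['collection'], ' took ', $stats['timeMillis'], 'ms', PHP_EOL;
```

With something along these lines, a caller who does want the documents can still run find() on the reported collection, but it becomes an explicit choice rather than a side effect of every mapreduce.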
Closing this, as we'll track it in doctrine/mongodb#95
I'm trying to perform a mapreduce in which the output is stored to another collection. I've set the configuration for the 'out' option. The mapreduce works fine when I run execute, but what seems to happen is that it also tries to _find / retrieve_ the results from the results collection after executing the map/reduce. This is an unnecessary overhead.
Ideally, it should simply run the mapreduce and do nothing or return the statistics of the mapreduce (time taken, emits, etc). Instead, it queries for all the results of the mapreduce.
If I log the commands / queries executed, there's a MapReduce command _followed by a "find" on the collection tmp.mr.ActiveUsers_foo_. The second query shouldn't happen. Is this intentional behavior? If so, how do I prevent it from happening?