Perform mapreduce without trying to find / retrieve results? #146

epicwhale · 2011-08-08T07:14:27Z

I'm trying to perform a mapreduce whose output is stored in another collection, so I've set the 'out' option accordingly. The mapreduce works fine when I call execute(), but it also tries to _find / retrieve_ the results from the output collection after running the map/reduce. This is unnecessary overhead.

Ideally, it should simply run the mapreduce and do nothing or return the statistics of the mapreduce (time taken, emits, etc). Instead, it queries for all the results of the mapreduce.

<?php 

// $qb is the query builder

$qb->map('function() {
        emit(this.from.userId, {
            sex: this.from.sex,
            circle: this.from.circle,
            age: this.from.age
        });
    }')
    ->reduce('function(k, values) {
        return values[values.length - 1];
    }')
    ->out(array('replace' => 'tmp.mr.ActiveUsers_foo'));

$query = $qb->getQuery();

var_dump($query->execute()); // is a LoggableCursor, why?

If I log the commands and queries executed, there is a MapReduce command _followed by a "find"_ on the collection tmp.mr.ActiveUsers_foo. The second query shouldn't happen. Is this intentional behavior? If so, how do I prevent it?

epicwhale · 2011-08-08T16:05:10Z

I dug into the code and found that ->find() is called after every mapreduce whose output is not inline. What is the reasoning behind it? See my inline comments in the code below.

In _Doctrine/MongoDB/Collection.php_ - Line _135_ ( https://github.com/doctrine/mongodb/blob/master/lib/Doctrine/MongoDB/Collection.php#L375 )

<?php
    protected function doMapReduce($map, $reduce, array $out, array $query, array $options)
    {
        // ..........

        if (isset($out['inline']) && $out['inline'] === true) {
            // This is as expected, since the results were requested inline.
            return new ArrayIterator($result['results']);
        }

        // Why do we call find() here? The user has explicitly asked for the
        // output to go to a particular collection.
        return $this->database->selectCollection($result['result'])->find();
    }

Don't you think it would be better to do away with the ->find(..) altogether?
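One possible middle ground, sketched here purely as an illustration: keep the output reachable but defer the find() until someone iterates. The class `LazyMapReduceOutput` and the `$finder` callback are hypothetical names, not part of Doctrine MongoDB, and the callback stands in for `selectCollection($name)->find()` so the sketch runs without a database.

```php
<?php

/**
 * Hypothetical wrapper (not part of Doctrine MongoDB): remembers the output
 * collection name and calls the finder only when someone iterates over it.
 */
class LazyMapReduceOutput implements IteratorAggregate
{
    private $collectionName;
    private $finder;

    public function __construct($collectionName, callable $finder)
    {
        $this->collectionName = $collectionName;
        $this->finder = $finder;
    }

    public function getCollectionName()
    {
        return $this->collectionName;
    }

    // Runs the deferred finder; it is not called before iteration starts.
    public function getIterator(): Traversable
    {
        return call_user_func($this->finder, $this->collectionName);
    }
}

// Stand-in for selectCollection($name)->find(); counts how often it runs.
$findCalls = 0;
$finder = function ($name) use (&$findCalls) {
    $findCalls++;
    return new ArrayIterator(array(array('_id' => 'user1')));
};

$output = new LazyMapReduceOutput('tmp.mr.ActiveUsers_foo', $finder);
echo $findCalls, "\n"; // the finder has not run yet

foreach ($output as $document) {
    echo $document['_id'], "\n";
}
echo $findCalls, "\n"; // the finder ran once, triggered by the foreach
```

With a wrapper like this, a caller who only wants the aggregation to land in the output collection never triggers the finder, while a caller who does iterate gets the same documents as before.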

jwage · 2012-02-07T04:43:07Z

I am not sure. What do you expect the behavior to be?

epicwhale · 2012-02-26T15:12:17Z

If I'm performing a MapReduce whose results are being stored in a separate collection, there's no guarantee that I want to retrieve the results later on. Sometimes you run MapReduce jobs just to aggregate and keep the results for later use.

If the MapReduce does not return the result inline, I believe it should either return nothing or return the statistics of the mapreduce (time taken, emits, etc).
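A minimal sketch of the "return the statistics" option, to make the expectation concrete. The helper `summarizeMapReduce` is a hypothetical name, not part of Doctrine MongoDB. It assumes the raw mapReduce command response carries `results` for inline output, and `result`, `timeMillis` and `counts` when the output goes to a collection; those field names are an assumption about the server response and may differ between MongoDB versions. The sample numbers are made up.

```php
<?php

/**
 * Hypothetical helper, not part of Doctrine MongoDB: turns a raw mapReduce
 * command response into either inline results or a statistics summary.
 *
 * Assumes the response carries 'results' for inline output, and 'result',
 * 'timeMillis' and 'counts' when the output goes to a collection.
 */
function summarizeMapReduce(array $response)
{
    // Inline output: the documents are already in the response.
    if (isset($response['results'])) {
        return new ArrayIterator($response['results']);
    }

    // Collection output: report statistics only; no find() is issued.
    return array(
        'collection' => isset($response['result']) ? $response['result'] : null,
        'timeMillis' => isset($response['timeMillis']) ? $response['timeMillis'] : null,
        'counts'     => isset($response['counts']) ? $response['counts'] : array(),
    );
}

// Sample response with made-up numbers, shaped like a collection-output run.
$stats = summarizeMapReduce(array(
    'result'     => 'tmp.mr.ActiveUsers_foo',
    'timeMillis' => 120,
    'counts'     => array('input' => 1000, 'emit' => 1000, 'output' => 250),
    'ok'         => 1.0,
));

echo $stats['collection'], ' took ', $stats['timeMillis'], " ms\n";
```

The caller still learns the output collection name from the summary, so it can query that collection later if it needs the documents.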

jmikola · 2013-02-27T17:30:38Z

Closing this, as we'll track it in doctrine/mongodb#95

jmikola mentioned this issue Feb 27, 2013

Allow map/reduce to run without querying results from the output collection doctrine/mongodb#95

Closed

jmikola closed this as completed Feb 27, 2013
