This repository has been archived by the owner on Nov 11, 2020. It is now read-only.

Allow map/reduce to run without querying results from the output collection #95

Closed
jmikola opened this issue Feb 27, 2013 · 2 comments

@jmikola
Member

jmikola commented Feb 27, 2013

Quoting @epicwhale from: doctrine/mongodb-odm#146


I'm trying to perform a map/reduce whose output is stored in another collection, and I've set the 'out' option accordingly. The map/reduce works fine when I call execute(), but it then also tries to _find / retrieve_ the results from the output collection. This is unnecessary overhead.

Ideally, it should simply run the map/reduce and either return nothing or return the statistics of the run (time taken, emits, etc.). Instead, it queries for all of the map/reduce results.

<?php 

// $qb is the query builder

$qb->map('function(){
            emit(this.from.userId, {
                sex: this.from.sex,
                circle: this.from.circle,
                age: this.from.age
            });
        }')
        ->reduce('function(k, values){
            return values[values.length - 1];
        }')
        ->out(array('replace' => 'tmp.mr.ActiveUsers_foo'));

$query = $qb->getQuery();

var_dump($query->execute()); // is a LoggableCursor, why?

If I log the commands / queries executed, there's a MapReduce command _followed by a "find" on the collection tmp.mr.ActiveUsers_foo_. The second query shouldn't happen. Is this behavior intentional? If so, how do I prevent it?


I dug into the code and found that ->find() is called after every map/reduce. What is the reasoning behind this? See my inline comments in the code below.

In _Doctrine/MongoDB/Collection.php_ - Line _135_ ( https://github.com/doctrine/mongodb/blob/master/lib/Doctrine/MongoDB/Collection.php#L375 )

<?php
    protected function doMapReduce($map, $reduce, array $out, array $query, array $options)
    {

       // ..........

        if (isset($out['inline']) && $out['inline'] === true) {
            return new ArrayIterator($result['results']);
            // ^^^ this is as expected, since inline results were requested
        }

        return $this->database->selectCollection($result['result'])->find();
        // ^^^ why do we call find() here? The user has explicitly asked for the output to go to a particular collection.
    }

Don't you think it would be better to do away with the ->find() altogether?
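
For illustration, here is a minimal sketch of what "return the statistics" could look like. It assumes the raw mapReduce command reply carries the usual `result`, `timeMillis` and `counts` fields; `mapReduceStats()` is a hypothetical helper, not part of doctrine/mongodb, and the sample values are made up.

```php
<?php

// Hypothetical helper (not part of doctrine/mongodb): reduce a raw mapReduce
// command reply to its statistics instead of opening a cursor on the output.
function mapReduceStats(array $result)
{
    $counts = isset($result['counts']) ? $result['counts'] : array();

    return array(
        'collection' => isset($result['result']) ? $result['result'] : null,
        'timeMillis' => isset($result['timeMillis']) ? $result['timeMillis'] : null,
        'emits'      => isset($counts['emit']) ? $counts['emit'] : null,
        'outputs'    => isset($counts['output']) ? $counts['output'] : null,
    );
}

// Sample data shaped like a mapReduce command reply (values are invented).
$result = array(
    'result'     => 'tmp.mr.ActiveUsers_foo',
    'timeMillis' => 42,
    'counts'     => array('input' => 10, 'emit' => 10, 'reduce' => 3, 'output' => 4),
    'ok'         => 1.0,
);

$stats = mapReduceStats($result);
```

The helper only reads fields from the reply array, so it never touches the output collection.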

@jmikola
Member Author

jmikola commented Aug 2, 2013

@epicwhale: Having researched this, I don't think we should change the behavior in 1.x, as it'd be a significant BC break for anyone relying on this. That said, although the method does create a cursor, there should be no real overhead unless you start iterating on it (cursors don't actually hit MongoDB with a query until you request the first result).
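
To make the laziness point concrete, here is a small simulation. `LazyCursor` is a hypothetical stand-in written for this example, not the driver's cursor class; it only counts how often a "query" would be sent, under the assumption that the query is deferred until iteration starts.

```php
<?php

// Hypothetical stand-in for a driver cursor; it only simulates lazy behavior.
class LazyCursor implements IteratorAggregate
{
    /** Number of simulated round trips to the server. */
    public $queriesSent = 0;

    private $documents;

    public function __construct(array $documents)
    {
        $this->documents = $documents;
    }

    public function getIterator(): Traversable
    {
        // The simulated query is only "sent" once iteration begins.
        $this->queriesSent++;

        return new ArrayIterator($this->documents);
    }
}

$cursor = new LazyCursor(array(array('_id' => 1), array('_id' => 2)));
$before = $cursor->queriesSent; // creating the cursor sends nothing

$seen = 0;
foreach ($cursor as $doc) {
    $seen++;
}
$after = $cursor->queriesSent; // iterating triggered the simulated query
```

If the returned cursor is never iterated, the counter stays at zero, which mirrors the claim that an unused cursor costs no query.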

@epicwhale
Contributor

Fair enough. It's maybe an over-optimization, then.

@jmikola jmikola closed this as completed Aug 2, 2013