Perform mapreduce without trying to find / retrieve results? #146

Closed
epicwhale opened this issue Aug 8, 2011 · 4 comments

Comments

@epicwhale

I'm trying to perform a map/reduce whose output is stored in another collection, so I've set the 'out' option. The map/reduce works fine when I call execute(), but afterwards it also tries to _find / retrieve_ the results from the output collection. This is unnecessary overhead.

Ideally, it should simply run the map/reduce and either return nothing or return the map/reduce statistics (time taken, number of emits, etc.). Instead, it queries for all of the map/reduce results.

```php
<?php

// $qb is the query builder

$qb->map('function(){
            emit(this.from.userId, {
                sex: this.from.sex,
                circle: this.from.circle,
                age: this.from.age
            });
        }')
        ->reduce('function(k, values){
            return values[values.length - 1];
        }')
        ->out(array('replace' => 'tmp.mr.ActiveUsers_foo'));

$query = $qb->getQuery();

var_dump($query->execute()); // is a LoggableCursor, why?
```

If I log the commands/queries that are executed, I see a mapReduce command followed by a find on the collection tmp.mr.ActiveUsers_foo. The second query shouldn't happen. Is this behavior intentional? If so, how do I prevent it?

@epicwhale
Author

I dug into the code and found that ->find() is called after every map/reduce. What is the reasoning behind it? See my inline comments in the code below.

In _Doctrine/MongoDB/Collection.php_, line _135_ (https://github.com/doctrine/mongodb/blob/master/lib/Doctrine/MongoDB/Collection.php#L375):

```php
<?php
    protected function doMapReduce($map, $reduce, array $out, array $query, array $options)
    {
        // ..........

        if (isset($out['inline']) && $out['inline'] === true) {
            return new ArrayIterator($result['results']);
            // ^^^ this is as expected, since the results were requested inline
        }

        return $this->database->selectCollection($result['result'])->find();
        // ^^^ why are we doing this find(..)? The user has explicitly asked for the output to go to a particular collection.
    }
```

Wouldn't it be better to do away with the ->find(..) altogether?

@jwage
Member

jwage commented Feb 7, 2012

I'm not sure. What do you expect the behavior to be?

@epicwhale
Author

If I'm performing a map/reduce whose results are stored in a separate collection, there's no guarantee that I want to retrieve those results right after it runs. Sometimes you run a map/reduce just to aggregate data and keep the results for later use.

If the map/reduce does not return its results inline, I believe it should either return nothing or return the map/reduce statistics (time taken, number of emits, etc.).
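A minimal sketch of the behavior proposed above, assuming the raw mapReduce command reply is already available as an array (as `$result` is inside `doMapReduce()`). `handleMapReduceResult()` is a hypothetical helper, not part of Doctrine's API; the reply fields it reads (`result`, `timeMillis`, `counts`) are the ones MongoDB's `mapReduce` command returns when output goes to a collection. The sample reply values are made up for illustration.

```php
<?php

// Hypothetical helper (not Doctrine API): decide what to hand back to the
// caller after a mapReduce command, without issuing a follow-up find().
function handleMapReduceResult(array $result, array $out)
{
    // Inline output: the documents are already in the reply, so wrap them.
    if (isset($out['inline']) && $out['inline'] === true) {
        return new ArrayIterator($result['results']);
    }

    // Output written to a collection: return the statistics from the reply
    // instead of querying the output collection.
    return array(
        'collection' => $result['result'],
        'timeMillis' => $result['timeMillis'],
        'counts'     => $result['counts'],
    );
}

// Illustrative reply shaped like a mapReduce response for
// 'out' => array('replace' => ...); the numbers are invented.
$reply = array(
    'result'     => 'tmp.mr.ActiveUsers_foo',
    'timeMillis' => 42,
    'counts'     => array('input' => 10, 'emit' => 10, 'reduce' => 3, 'output' => 4),
    'ok'         => 1.0,
);

$stats = handleMapReduceResult($reply, array('replace' => 'tmp.mr.ActiveUsers_foo'));
print_r($stats);
```

With this design, callers who do want the documents can still query the collection named in `collection` themselves, so the extra find() becomes opt-in rather than automatic.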

@jmikola
Member

jmikola commented Feb 27, 2013

Closing this, as we'll track it in doctrine/mongodb#95

@jmikola jmikola closed this as completed Feb 27, 2013