tag: 0.0.2

SimpleMapReduce

Simple MapReduce implementation, written in JavaScript.

Installation

Via npm on Node:

npm install simplemapreduce

Usage

Reference in your program:

var simplemapreduce = require('simplemapreduce');

Run

Synchronous run

simplemapreduce.runSync(items, mapfn, newfn, processfn);

where

  • items: the collection to be processed. In the current version, it must be an object that defines a forEach function (for example, an array).
  • mapfn(item): given an item to be processed, returns its associated key.
  • newfn(item, key): given a new key, returns the new object to be associated with that key.
  • processfn(item, result, [key, map]): processes an item, usually by modifying its associated result object. It can also receive and use the associated key and the map, the dictionary being built by the process.
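
To make the roles of the four arguments concrete, here is a minimal sketch of how a function with this contract could work. It is an illustration based only on the descriptions above, not the library's actual source; the name runSyncSketch and the word-length example are assumptions made for this sketch.

```javascript
// Hypothetical stand-in for runSync; not the library's actual source.
function runSyncSketch(items, mapfn, newfn, processfn) {
    var map = {};                       // the dictionary being built: key -> result

    items.forEach(function (item) {
        var key = mapfn(item);          // 1. map the item to its key

        // 2. first time this key is seen: create its result object
        if (!Object.prototype.hasOwnProperty.call(map, key)) {
            map[key] = newfn(item, key);
        }

        // 3. let the caller update the result; key and map are optional extras
        processfn(item, map[key], key, map);
    });

    return map;
}

// Usage: group words by length, using the optional key argument
var byLength = runSyncSketch(
    ["to", "be", "or", "not"],
    function (item) { return item.length; },
    function (item, key) { return { length: key, words: [] }; },
    function (item, result, key) { result.words.push(item); }
);
console.dir(byLength);
```

In this sketch newfn runs once per distinct key, while processfn runs once per item.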

Example

var result = simplemapreduce.runSync(
    ["A", "word", "is", "a", "word"], 
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; }
);
console.dir(result);

Output

{ a: { count: 2 }, word: { count: 2 }, is: { count: 1 } }

There is also a run that takes a callback:

simplemapreduce.run(items, mapfn, newfn, processfn, callback);

It is still under development: the current implementation uses runSync internally. Example:

simplemapreduce.run(
    ["A", "word", "is", "a", "word"], 
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; },
    function (result) {
        console.dir(result);
    }
);
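
Because the current run is described as a wrapper around runSync, the callback presumably fires before run returns. The sketch below illustrates that consequence. makeRun and standInRunSync are hypothetical names introduced here so the example runs without the library installed; they are not part of its API.

```javascript
// Hypothetical: builds a callback-style run on top of any synchronous run function.
function makeRun(runSync) {
    return function run(items, mapfn, newfn, processfn, callback) {
        var result = runSync(items, mapfn, newfn, processfn);
        callback(result);   // invoked before run() returns
    };
}

// Compact stand-in for runSync, used only to make the sketch runnable
function standInRunSync(items, mapfn, newfn, processfn) {
    var map = {};
    items.forEach(function (item) {
        var key = mapfn(item);
        if (!Object.prototype.hasOwnProperty.call(map, key)) {
            map[key] = newfn(item, key);
        }
        processfn(item, map[key], key, map);
    });
    return map;
}

var events = [];
var run = makeRun(standInRunSync);

run(
    ["A", "word", "is", "a", "word"],
    function (item) { return item.toLowerCase(); },
    function (item, key) { return { count: 0 }; },
    function (item, result) { result.count++; },
    function (result) { events.push("callback: " + Object.keys(result).join(",")); }
);
events.push("after run");

console.log(events);
```

Code that relies on this ordering may need to change once the asynchronous processing listed under "To do" is implemented.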

Run Task

Alternatively, you can define a task: an object with the following functions:

  • getItems(): returns the items to be processed.
  • getKey(item): maps an item to its associated key.
  • getResult(item, key): creates a new object/value to be associated with the key/item. It is usually used to accumulate results.
  • processItem(item, result, [key, map]): processes an item, usually updating the result object.

Example:

var task = {
    items: ["A", "word", "is", "a", "word"], 
    getItems: function () { return this.items; },
    getKey: function (item) { return item.toLowerCase(); },
    getResult: function (item, key) { return { count: 0 }; },
    processItem: function (item, result) { result.count++; }
};

simplemapreduce.runTask(task, function (result) { console.dir(result); });

Notice that in this case, getItems returns items defined in the same task. You can provide a more complex function, e.g. one that reads from a stream or a file.
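
As a sketch of such a task, the following getItems reads words from a text file. The file name, its contents, and the fileTask object are assumptions made for illustration; the file is created in the temporary directory only so the example is runnable.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical input file, created here only so the sketch is runnable
var wordsFile = path.join(os.tmpdir(), 'simplemapreduce-words.txt');
fs.writeFileSync(wordsFile, 'A word is\na word\n');

var fileTask = {
    filename: wordsFile,
    // Read the whole file and split it into words on whitespace
    getItems: function () {
        return fs.readFileSync(this.filename, 'utf8')
            .split(/\s+/)
            .filter(function (word) { return word.length > 0; });
    },
    getKey: function (item) { return item.toLowerCase(); },
    getResult: function (item, key) { return { count: 0 }; },
    processItem: function (item, result) { result.count++; }
};

console.dir(fileTask.getItems());
```

getItems returns an array here, which provides the forEach function the library expects, so fileTask should be usable with runTask in the same way as the task above.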

Development

git clone git://github.com/ajlopez/SimpleMapReduce.git
cd SimpleMapReduce
npm install
npm test

Samples

Words: Word Count sample with callback.

Words Sync: Synchronous Word Count sample.

Task: Run Task sample with callback.

Task Sync: Synchronous Run Task sample.

To do

  • Improve async processing
  • Distributed sample

Versions

  • 0.0.1 : Published
  • 0.0.2 : Under development

Contribution

Feel free to file issues and submit pull requests — contributions are welcome.

If you submit a pull request, please be sure to add or update corresponding test cases, and ensure that npm test continues to pass.
