jdstmporter/Haskell-MapReduce

HASKELL MAPREDUCE LIBRARY

This is an implementation of MapReduce for use on a single machine.  It will 
automatically parallelise into multiple threads running on all available 
processors.

The key idea is that map and reduce stages can just be treated as functions of 
type

[s] -> [(s',a)]

where s is the input data type, s' is the output data type and a is the output 
key type.  The package provides a wrapper function liftMR that converts a map 
/ reduce function into a monadic function, so building a processing chain is 
simply a matter of setting out

liftMR map1 >>= liftMR reduce1 >>= liftMR map2 >>= liftMR reduce2 >>= . . .

Then existing utility functions allow one to evaluate the processing chain on 
specified initial data.
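
As an illustration of the function shape described above, here is a minimal
sketch of a word-count mapper and reducer written as plain functions of type
[s] -> [(s',a)].  It does not use the library: the names Stage, runStage and
wordCount are hypothetical, and runStage is only a sequential stand-in for the
grouping-by-key that the wrapper and monad are described as doing, not the
library's liftMR.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- A stage has the shape described above: [s] -> [(s', a)],
-- with the output value first and the output key second.
type Stage s s' a = [s] -> [(s', a)]

-- Mapper: split each line into words and key every word by itself.
mapper :: Stage String String String
mapper = concatMap (\line -> [(w, w) | w <- words line])

-- Reducer: receives every value that shares one key and emits
-- a single (count, word) pair for that key.
reducer :: Stage String Int String
reducer []       = []
reducer ws@(w:_) = [(length ws, w)]

-- ASSUMPTION: a sequential stand-in for the wrapper, not the library's
-- liftMR.  It sorts the keyed values by key, groups equal keys together
-- and runs the stage once per group.
runStage :: Ord k => Stage s s' a -> [(s, k)] -> [(s', a)]
runStage f kvs = concatMap (f . map fst) groups
  where
    groups = groupBy ((==) `on` snd) (sortOn snd kvs)

-- Chain the mapper and the reducer.  Every input line starts with the
-- same key (), so the mapper sees all of the input as one group.
wordCount :: [String] -> [(Int, String)]
wordCount ls = runStage reducer (runStage mapper [(l, ()) | l <- ls])
```

Loaded into GHCi, wordCount ["a b a", "b c a"] gives one (count, word) pair per
distinct word, ordered by word.  Because both stages have the same shape,
nothing in runStage distinguishes a mapper from a reducer.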

Key points are:

(1) From the point of view of the applications programmer MapReduce is totally
    transparent; mappers and reducers can be written as simple functions; the
    wrapper function and the monad do all the work of distributing work across
    mappers / reducers, sorting the output based on key value, etc.
(2) Writing a MapReduce function reduces to writing the mappers / reducers and
    then providing fewer than ten lines of boilerplate code.
(3) The fact that the distinction between mappers and reducers is eliminated
    makes it possible to produce processing chains more general than those 
    permitted by more traditional MapReduce frameworks, such as Hadoop.

THE PACKAGE

The package contains:

(1) Full source code for the library, with Haddock documentation.
(2) A Cabal file to build the library and generate documentation.

To build the library and applications:

    mkdir MR
    cd MR
    git clone git://github.com/Julianporter/Haskell-MapReduce.git
    cd Haskell-MapReduce 
    cabal install

Cabal will then proceed to build the library, add it to your installation and 
create Haddock documentation.  As well as the library, it will install two
applications:

(1) 'wordcount' : this does a simple word count on a supplied file of space-
    separated words and prints the result.
(2) 'testwordcount' : this applies automated tests using the word count
    algorithm, using the QuickCheck testing framework.  The maximum number of
    words is kept to less than 10,000 because for larger values the tests can
    become extremely slow.
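
To give a flavour of property-based testing of a word count, here is a small
sketch of two properties of the kind QuickCheck could check.  It is not the
code of 'testwordcount': countWords is a hypothetical reference implementation
written with sort and group, and the property names are invented for this
example.

```haskell
import Data.List (group, nub, sort)

-- ASSUMPTION: a reference word count used only for this illustration;
-- it is not the package's MapReduce implementation.
countWords :: [String] -> [(String, Int)]
countWords ws = [(w, length g) | g@(w:_) <- group (sort ws)]

-- Property: the counts add up to the number of input words.
prop_totalPreserved :: [String] -> Bool
prop_totalPreserved ws = sum (map snd (countWords ws)) == length ws

-- Property: each distinct word appears exactly once in the output.
prop_keysUnique :: [String] -> Bool
prop_keysUnique ws = keys == nub keys
  where
    keys = map fst (countWords ws)
```

With the QuickCheck package installed, each property can be passed to
quickCheck from Test.QuickCheck, which generates random word lists and reports
any input for which the property returns False.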

FOR MORE INFORMATION

Project website: http://blog.jpembeddedsolutions.com/MapReduce

About

A project to develop a fully distributed MapReduce library for Haskell which makes using the MapReduce framework totally transparent for application programmers.
