A library to assist in writing Hadoop MapReduce jobs in Clojure.

Originally written by Stuart Sierra.
Extended by Roman Scherer, Christopher Miles, Ian Eslick,
Dave Lambert, Alex Ott, and others.

Stable releases are available via

For more information
on Clojure,
on Hadoop,

Also see Stuart's presentation about this library at

An introduction to working with the library is available at

Copyright (c) Stuart Sierra, 2009. All rights reserved.  The use and
distribution terms for this software are covered by the Eclipse Public
License 1.0, which can
be found in the file LICENSE.html at the root of this distribution.
By using this software in any fashion, you are agreeing to be bound by
the terms of this license.  You must not remove this notice, or any
other, from this software.


This library requires the Java 6 JDK.

Building from source requires Leiningen.


If you downloaded the library distribution as a .zip or .tar file,
everything is pre-built and there is nothing you need to do.

If you downloaded the sources from Git, then you need to run the build
with Leiningen. In the top-level directory of this project, run:

    lein jar

This compiles and builds the JAR file.


After building, copy the file from


to something short, like "examples.jar".  Each of the *.clj files in
the test/clojure_hadoop/examples directory contains instructions for
running that example.

The wordcount examples can also be run via the "lein test" command.


After building, include the "clojure-hadoop-${VERSION}.jar" file
in the lib/ directory of the JAR you submit as your Hadoop job.


You can depend on clojure-hadoop in your Maven 2 projects by adding
the following lines to your pom.xml:




        <url> </url>



This library provides different layers of abstraction away from the
raw Hadoop API.

Layer 1: clojure-hadoop.imports

    Provides convenience functions for importing the many classes and
    interfaces in the Hadoop API.

Layer 2: clojure-hadoop.gen

    Provides gen-class macros to generate the multiple classes needed
    for a MapReduce job.  See the example file "wordcount1.clj" for a
    demonstration of these macros.

Layer 3: clojure-hadoop.wrap

    Provides wrapper functions that automatically convert between
    Hadoop Text objects and Clojure data structures.  See the example
    file "wordcount2.clj" for a demonstration of these wrappers.

Layer 4: clojure-hadoop.job

    Provides a complete implementation of a Hadoop MapReduce job that
    can be dynamically configured to use any Clojure functions in the
    map and reduce phases.  See the example file "wordcount3.clj" for
    a demonstration of this usage.

Layer 5: clojure-hadoop.defjob

    A convenient macro to configure MapReduce jobs with Clojure code.
    See the example files "wordcount4.clj" and "wordcount5.clj" for
    demonstrations of this macro.

Layer 6: clojure-hadoop.defjob - Specifying JobConf parameters 

    It is often necessary to set parameters in the job's
    configuration in order to enable dynamic map/reduce jobs.
    Hadoop natively supports this through the -D<key>=<value>
    command-line option.
    Using the defjob macro, "wordcount6.clj" demonstrates how to set
    job configuration (JobConf) parameters either on the command
    line or as part of the defjob definition within the file.

Layer 7: clojure-hadoop.config - Adding files and archives to the DistributedCache

    Example file "wordcount7.clj" demonstrates how to specify files
    and archives for distribution across nodes via the
    DistributedCache, as well as how to access those files
    during the mapper-setup or reducer-setup phases.
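
To make the higher layers (4 and up) more concrete, here is a minimal
sketch of the kind of plain Clojure functions that can serve as the map
and reduce phases of a word count.  It assumes the convention used by
the wordcount examples: the map function takes a key and a value and
returns a sequence of [key value] pairs, and the reduce function takes
a key and a zero-argument function that returns the values for that
key.  The names my-map, my-reduce, and run-locally are illustrative;
run-locally in particular is a hypothetical in-memory driver and is not
part of clojure-hadoop.  Check the wordcount*.clj files for the exact
signatures the library expects.

```clojure
(require '[clojure.string :as str])

;; Map phase: one input record in, a sequence of [word 1] pairs out.
;; The key (e.g. a line offset) is ignored for word counting.
(defn my-map [_key line]
  (for [word (str/split (str/trim line) #"\s+")
        :when (seq word)]               ; skip the empty token from blank lines
    [word 1]))

;; Reduce phase: a key plus a zero-argument function returning the
;; values seen for that key (assumed convention); returns a sequence
;; of [key total] pairs.
(defn my-reduce [word values-fn]
  [[word (reduce + (values-fn))]])

;; Hypothetical local driver, NOT part of clojure-hadoop: imitates the
;; shuffle step in memory so the two functions can be tried at a REPL.
(defn run-locally [lines]
  (let [pairs   (mapcat (fn [[offset line]] (my-map offset line))
                        (map-indexed vector lines))
        grouped (group-by first pairs)]
    (into (sorted-map)
          (mapcat (fn [[word kvs]]
                    (my-reduce word (fn [] (map second kvs))))
                  grouped))))

(run-locally ["the quick fox" "the lazy dog"])
;; => {"dog" 1, "fox" 1, "lazy" 1, "quick" 1, "the" 2}
```

Because both functions are ordinary Clojure and do not touch Hadoop
classes, they can be unit-tested without a cluster before being wired
into a job.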


* README.txt changed to reflect the Leiningen build process (Roman Scherer).