a Map/Reduce framework for distributed computing

Disco - Massive data, Minimal code


Disco is a distributed map-reduce and big-data framework. Like the original framework, which was publicized by Google, Disco supports parallel computations over large data sets on an unreliable cluster of computers. This makes it a perfect tool for analyzing and processing large datasets without having to worry about the difficult technical questions of distributed computing, such as communication protocols, load balancing, locking, job scheduling, and fault tolerance, all of which are taken care of by Disco.

Writing a Disco job is very simple. For example, the following job counts how many times each word occurs in a document:

from disco.core import Job, result_iterator

def map(line, params):
    for word in line.split():
        yield word, 1

def reduce(iter, params):
    from disco.util import kvgroup
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    input = [""]
    job = Job().run(input=input, map=map, reduce=reduce)
    for word, count in result_iterator(job.wait()):
        print word, count
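
To see what the map and reduce functions above compute without setting up a cluster, the same word count can be sketched in a single process. This is only an illustration and does not use Disco: `map_words`, `group_by_key`, `reduce_counts` and `run_local` are hypothetical names, and `group_by_key` is an assumed stand-in for `disco.util.kvgroup` built on `itertools.groupby`.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word, 1


def group_by_key(pairs):
    """Assumed stand-in for disco.util.kvgroup: group the values of
    consecutive pairs that share the same key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_counts(pairs):
    """Reduce phase: sort so equal words are adjacent, then sum their counts."""
    for word, counts in group_by_key(sorted(pairs)):
        yield word, sum(counts)


def run_local(lines):
    """Run map, then reduce, in one process (no cluster, no Disco)."""
    mapped = [pair for line in lines for pair in map_words(line)]
    return list(reduce_counts(mapped))


if __name__ == "__main__":
    for word, count in run_local(["a rose is a rose", "is it"]):
        print(word, count)
```

Sorting before grouping matters here: the grouping step only merges adjacent pairs, so unsorted input would produce several partial counts for the same word.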

Note: To install Disco, you cannot use the zip or tar.gz packages generated by GitHub; clone this repository instead.

The develop branch contains the newest features and is not recommended for use in production. The master branch is the latest stable release and is tested in production. Important bug fixes will be first merged into the develop branch and then backported into the master branch.

Disco integrates with a lot of different tools. For example, you can write a Disco job in an IPython notebook and plot the results with matplotlib. (Screenshot: ipython example)

To learn more about the Disco ecosystem, see Disco Integrations. For other resources, check out the Talks on Disco. Visit for more information.

Build Status: Travis-CI
