DPark: a Python clone of Spark, a MapReduce-like framework

DPark is a Python clone of Spark, a MapReduce(R)-like computing framework that supports iterative computation.


## Due to the use of C extensions, some libraries need to be installed first.

$ sudo apt-get install libtool pkg-config build-essential autoconf automake
$ sudo apt-get install python-dev
$ sudo apt-get install libzmq-dev

## Then just pip install dpark (``sudo`` may be needed if you encounter a permission problem).

$ pip install dpark

Example for word counting:

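from dpark import DparkContext

ctx = DparkContext()
# Read the input file as a dataset of lines
file = ctx.textFile("/tmp/words.txt")
# Split each line into words and pair every word with a count of 1
words = file.flatMap(lambda x: x.split()).map(lambda x: (x, 1))
# Sum the counts per word and collect the result as a map
wc = words.reduceByKey(lambda x, y: x + y).collectAsMap()
print(wc)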
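
To see what the word-count pipeline computes without installing DPark, the same steps can be sketched in plain Python. This is only an illustration of the semantics of flatMap, map, reduceByKey and collectAsMap on a single machine, not DPark's implementation; the function name `word_count` is a hypothetical helper introduced here.

```python
from collections import defaultdict


def word_count(lines):
    """Plain-Python illustration of flatMap -> map -> reduceByKey -> collectAsMap.

    `word_count` is a hypothetical helper, not part of DPark.
    """
    # flatMap: split every line into words and flatten into one stream
    words = (word for line in lines for word in line.split())
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: combine the values of equal keys with x + y
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    # collectAsMap: hand the result back as an ordinary dict
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In DPark the same logic is expressed as transformations on a distributed dataset, so the reduce step can be spread over many workers instead of running in one loop.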

This script can run locally or on a Mesos cluster without any modification, just using different command-line arguments:

$ python
$ python -m process
$ python -m host[:port]

See examples/ for more use cases.

Some more docs (in Chinese):

DPark can run with Mesos 0.9 or higher.

If the $MESOS_MASTER environment variable is set, you can use a shortcut and run DPark with Mesos just by typing:

$ python -m mesos

$MESOS_MASTER can use any Mesos master address scheme, such as:

$ export MESOS_MASTER=zk://zk1:2181,zk2:2181,zk3:2181/mesos_master
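
For readers unfamiliar with the `zk://` form, the following sketch shows how such an address breaks down into a list of ZooKeeper endpoints and a path. The helper `split_zk_master` is hypothetical and written only for illustration; it is not part of DPark or Mesos, and neither of them is claimed to parse the value this way.

```python
def split_zk_master(url):
    """Split 'zk://host:port,host:port/path' into (endpoints, path).

    Hypothetical helper for illustration; not a DPark or Mesos API.
    """
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("not a zk:// address: %r" % url)
    # Everything before the first '/' is the comma-separated endpoint list
    hosts, _, path = url[len(prefix):].partition("/")
    return hosts.split(","), "/" + path


if __name__ == "__main__":
    endpoints, path = split_zk_master(
        "zk://zk1:2181,zk2:2181,zk3:2181/mesos_master"
    )
    print(endpoints)
    print(path)
```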

In order to speed up shuffling, you should deploy Nginx at port 5055 for accessing data in DPARK_WORK_DIR (default is /tmp/dpark), such as:

server {
        listen 5055;
        server_name localhost;
        root /tmp/dpark/;
}

Mailing list: (