ccl0326/dpark: Python clone of Spark, a MapReduce-like framework in Python

forked from douban/dpark

BSD-3-Clause license

dpark/
examples/
tests/
tools/
.gitignore
AUTHORS
LICENSE
README
TODO
setup.py

Dpark is a Python clone of Spark, a MapReduce-like computing
framework that supports iterative computation.

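A word-count example, wc.py:

 from dpark import DparkContext

 ctx = DparkContext()                    # run mode is taken from the command line
 lines = ctx.textFile("/tmp/words.txt")  # dataset of the file's lines
 words = lines.flatMap(lambda x: x.split()).map(lambda x: (x, 1))
 wc = words.reduceByKey(lambda x, y: x + y).collectAsMap()
 print(wc)                               # dict mapping each word to its count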

This script can run locally or on a Mesos cluster without
any modification; only the command-line arguments differ:

$ python wc.py
$ python wc.py -m process
$ python wc.py -m mesos
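To make the stages of wc.py concrete, here is a pure-Python sketch of the same dataflow that needs no dpark installation. It is only an illustration of how Spark-style engines typically evaluate reduceByKey (combine within each partition, shuffle by key hash, merge per reducer); it is not dpark's actual implementation. The input lines, the two-way partitioning and NUM_REDUCERS are hypothetical stand-ins for /tmp/words.txt being split across workers.

```python
from operator import add

# Hypothetical input: the lines of /tmp/words.txt split into two partitions
partitions = [
    ["the quick brown fox", "jumps over"],
    ["the lazy dog", "the end"],
]
NUM_REDUCERS = 2  # hypothetical number of reduce-side partitions


def combine(f, pairs):
    """Fold (key, value) pairs into a dict, merging equal keys with f."""
    acc = {}
    for k, v in pairs:
        acc[k] = f(acc[k], v) if k in acc else v
    return acc


def shuffle(partials, n):
    """Route each (key, value) to a reducer chosen by hashing the key."""
    buckets = [[] for _ in range(n)]
    for part in partials:
        for k, v in part.items():
            buckets[hash(k) % n].append((k, v))
    return buckets


# Map side: each partition is processed independently
partials = []
for lines in partitions:
    words = [w for line in lines for w in line.split()]  # flatMap(split)
    pairs = [(w, 1) for w in words]                      # map(lambda x: (x, 1))
    partials.append(combine(add, pairs))                 # per-partition combine

# Reduce side: each reducer merges the partial counts for the keys it owns
reduced = [combine(add, bucket) for bucket in shuffle(partials, NUM_REDUCERS)]

# collectAsMap(): reducers own disjoint keys, so a plain dict merge is safe
wc = {}
for part in reduced:
    wc.update(part)
print(wc)
```

Combining inside each partition before the shuffle means only one (word, partial count) pair per distinct word leaves a partition, which is why reduceByKey with an associative function such as addition is cheaper than grouping every raw pair.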

See examples/ for more examples.

Chinese documentation: https://github.com/jackfengji/test_pro/wiki