A Python implementation of Spark's Python API, but on a single machine.


A Python implementation of Spark's Python API that runs on a single machine.


Programming directly on Spark is hard:

  • Spark's behavior is hard to understand;
  • Its error messages are hard to decipher;
  • There are few debugging tools;
  • Testing is time-consuming.

With this package, you can get rid of all these problems. Use this package to prototype or experiment on a single machine.

  • Your code runs fast.
  • Error messages are standard Python error messages, and thus easy to understand.
  • All the debugging tools you are used to are available automatically.
  • Any further doubt? Just peek into this package's source code. There is no "magic" inside.

As soon as your program runs correctly, use Spark to deploy it on a cluster. Your code needs little or no modification.


import park as sc


If you use the mapPartitions() function, you will need to make a small modification when you move your code to Spark. Every function that you pass to Spark's mapPartitions() must wrap its returned value(s) in a list, that is, add [] around them.
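The reason for the [] is that Spark's mapPartitions() expects the function to return an iterable for each partition and then flattens those iterables into the resulting RDD. The sketch below illustrates this with plain Python only. `map_partitions`, `partition_sum`, and `partition_sum_for_spark` are hypothetical names made up for this illustration; they are not part of this package or of Spark, and the stand-in only mimics the flattening behavior.

```python
from itertools import chain


def map_partitions(partitions, func):
    """Hypothetical stand-in that mimics Spark's mapPartitions() semantics:
    call func once per partition with an iterator over that partition,
    then flatten the iterables that func returns into one list."""
    return list(chain.from_iterable(func(iter(p)) for p in partitions))


def partition_sum(iterator):
    # Returns a bare value. Flattening cannot iterate over an int,
    # so this form fails under Spark-style semantics.
    return sum(iterator)


def partition_sum_for_spark(iterator):
    # Same logic, with [] added around the returned value.
    return [sum(iterator)]


partitions = [[1, 2, 3], [4, 5]]  # two partitions of example data
result = map_partitions(partitions, partition_sum_for_spark)
print(result)  # [6, 9]
```

Passing `partition_sum` to this stand-in raises a TypeError, because the bare integer it returns cannot be flattened.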
