Park

A Python implementation of Spark's Python API that runs on a single machine.

Motivation

Programming directly on Spark is hard:

  • Spark's behavior is hard to understand;
  • Its error messages are hard to decipher;
  • There are few debugging tools;
  • Testing is time-consuming.

This package avoids these problems. Use it to prototype or experiment on a single machine.

  • Your code runs fast.
  • Error messages are standard Python error messages, and thus easy to understand.
  • All the debugging tools you are used to are available automatically.
  • Any further doubts? Just peek into this package's source code. There is no "magic" inside.

As soon as your program runs correctly, use Spark to deploy it on a cluster. Your code needs little or no modification.

Usage

import park as sc
sc.parallelize()

Note

If you use the mapPartitions() function, you will need to make a small modification when you move your code to Spark. For every function that you pass to Spark's mapPartitions(), wrap the returned value(s) in [].
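The following is a minimal sketch of that change, not code taken from Park or Spark. The helper `spark_like_map_partitions` is a hypothetical stand-in that only mimics the documented behavior of Spark's mapPartitions() (the function receives an iterator over one partition, and the iterables it returns are concatenated). The function names and the sample data are also assumptions made for illustration.

```python
from itertools import chain


def partition_sum_bare(iterator):
    # Returns a single bare value for the whole partition.
    return sum(iterator)


def partition_sum_wrapped(iterator):
    # Same computation, with [] added around the returned value,
    # as the note above asks for when moving to Spark.
    return [sum(iterator)]


def spark_like_map_partitions(partitions, func):
    # Hypothetical stand-in for Spark's mapPartitions(): call func once
    # per partition and concatenate the iterables it returns.
    return list(chain.from_iterable(func(iter(p)) for p in partitions))


# Two illustrative partitions.
partitions = [[1, 2, 3], [4, 5]]
result = spark_like_map_partitions(partitions, partition_sum_wrapped)
print(result)
```

Under this stand-in, passing `partition_sum_bare` instead fails with a TypeError, because a bare integer cannot be iterated over and concatenated. That is the reason for the added [].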
