
Pyspark Proxy

Under active development. Do not use in production.

Seamlessly execute pyspark code on remote clusters.

Features

  • 100% compatibility with the Pyspark API (just change the imports)
  • Structure code however you see fit
  • No need to copy files to the cluster
  • Resumable Sessions
  • Simple installation
  • Seamless integration with other tools such as Jupyter and Matplotlib (see the plotting sketch at the end of Getting Started)

How it works

Pyspark Proxy is made up of a client and a server. The client mimics the pyspark API, but when objects are created or methods are called, a request is sent to the API server. The server then invokes the actual pyspark APIs for each call it receives and returns the results.
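The mechanism can be sketched roughly as follows. This is a minimal illustration of the pattern, not Pyspark Proxy's actual classes or wire format; ProxyObject, CallHandler, and the transport object are hypothetical names:

import json

class ProxyObject(object):
    """Client side: a stand-in that forwards method calls to the server."""

    def __init__(self, transport, obj_id):
        self._transport = transport  # hypothetical object that POSTs to the server
        self._obj_id = obj_id        # identifies the real object held server-side

    def __getattr__(self, name):
        def remote_call(*args):
            # Serialize the call and ship it to the API server.
            payload = json.dumps({'id': self._obj_id, 'method': name, 'args': args})
            return self._transport.send(payload)
        return remote_call

class CallHandler(object):
    """Server side: looks up the live pyspark object and replays the call."""

    def __init__(self):
        self._objects = {}  # obj_id -> live pyspark object

    def handle(self, payload):
        call = json.loads(payload)
        target = self._objects[call['id']]
        # Invoke the actual pyspark API and hand the result back to the client.
        return getattr(target, call['method'])(*call['args'])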

Documentation

Wiki

Getting Started

Pyspark Proxy requires setting up a server where your Spark installation is located; then simply install the package locally where you want to execute code from.

On Server

Install pyspark proxy via pip:

pip install pysparkproxy

Start the server:

pyspark-proxy-server start

The server listens on localhost:8765 by default. Check the pyspark-proxy-server help output for additional options.

Locally

Install pyspark proxy via pip:

pip install pysparkproxy

Now you can start a SparkContext and do some dataframe operations.

from pyspark_proxy import SparkContext
from pyspark_proxy.sql import SQLContext

# Creating the context allocates the real SparkContext on the proxy server.
sc = SparkContext(appName='pyspark_proxy_app')

sc.setLogLevel('ERROR')

sqlContext = SQLContext(sc)

# The read executes on the server; only the count travels back to the client.
df = sqlContext.read.json('my.json')

print(df.count())

Then run the script with the normal Python binary: python my_app.py. The code works the same as if you had run it via spark-submit on the server.
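Because results come back to the local machine, the proxy also pairs with tools like Jupyter and Matplotlib. Below is a minimal sketch, assuming the API compatibility above extends to toPandas() and that my.json contains at least one numeric column:

import matplotlib.pyplot as plt

from pyspark_proxy import SparkContext
from pyspark_proxy.sql import SQLContext

sc = SparkContext(appName='plot_example')
sqlContext = SQLContext(sc)

# The dataframe lives on the remote cluster; toPandas() pulls the
# result set back through the proxy as a local pandas DataFrame.
df = sqlContext.read.json('my.json')
pdf = df.toPandas()

# Plot with plain matplotlib, just as you would in a local Jupyter notebook.
pdf.plot(kind='bar')
plt.show()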
