A simple wrapper around Apache Spark to submit spark jobs

dsmiff/sparklight

A simple wrapper around the Apache Spark spark-submit command.

Description

sparklight is a library for submitting Spark jobs, either locally or to a cluster. Note that cluster submission is still listed under TODO.

It is designed to make setting up and submitting Spark jobs easy by hiding the complexity of the spark-submit command-line interface.
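To illustrate what wrapping spark-submit can look like, here is a minimal sketch that assembles the command line in Python. It is not sparklight's actual API: the names `build_spark_submit_command` and `submit` are hypothetical, and the sketch only relies on the standard `--master`, `--name`, and `--conf` options of spark-submit.

```python
import shlex
import subprocess


def build_spark_submit_command(script, master="local[*]", app_name=None,
                               conf=None, script_args=()):
    """Build a spark-submit argument list (hypothetical helper, not sparklight's API)."""
    cmd = ["spark-submit", "--master", master]
    if app_name:
        cmd += ["--name", app_name]
    # Each Spark property is passed as its own --conf key=value pair.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    cmd.extend(script_args)
    return cmd


def submit(cmd):
    """Run the command; assumes spark-submit is available on PATH."""
    return subprocess.run(cmd, check=True)


cmd = build_spark_submit_command(
    "examples/cars_submit.py",
    app_name="cars-groupby",
    conf={"spark.executor.memory": "2g"},
)
# Print the shell-quoted command instead of running it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when it is passed to `subprocess.run`.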

Requirements

  • Apache Spark
  • pyspark

Installation

Clone this repository, change into it, and run setup.sh.

git clone git@github.com:dsmiff/sparklight.git
cd sparklight
./setup.sh

Examples

  • examples/cars_submit.py: submits a simple Spark job that performs a groupBy on the cars.csv dataset
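For readers unfamiliar with this kind of job, the following is a rough sketch of a PySpark script that groups a CSV file by one column and counts the rows per group. It is not the content of examples/cars_submit.py, which is not shown here. The function names are hypothetical, and the column to group by is passed as an argument because the schema of cars.csv is not described in this README.

```python
import sys


def group_counts(df, column):
    """Count rows per distinct value of `column` in a Spark DataFrame."""
    return df.groupBy(column).count()


def main(csv_path, column):
    # Imported here so the module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cars-groupby").getOrCreate()
    try:
        # header=True treats the first row as column names;
        # inferSchema=True lets Spark guess the column types.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        group_counts(df, column).show()
    finally:
        spark.stop()


# Expects exactly two arguments: the CSV path and the column name.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A script like this would be passed to spark-submit followed by the CSV path and the column name.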

TODO

  • Submit to cluster functionality
  • HDFS interface
  • DAG jobs
