Julia interface to Apache Spark.
See Roadmap for current status.
Spark.jl requires at least Java 7 and Maven to be installed and available on the PATH.
Pkg.clone("https://github.com/dfdx/Spark.jl")
Pkg.build("Spark")
# we also need the latest master of JavaCall.jl
Pkg.checkout("JavaCall")
This will download and build all Julia and Java dependencies. To use Spark.jl, type:
using Spark
Spark.init()
All examples below are runnable from the Julia REPL.
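To quickly check that the JVM bridge works, you can distribute a plain Julia collection; this is a minimal sketch, assuming parallelize is exported for turning local collections into RDDs:

sc = SparkContext(master="local")
rdd = parallelize(sc, 1:5)      # distribute a local collection as an RDD
doubled = map(rdd, x -> 2 * x)  # anonymous function; see the note on serialization below
collect(doubled)                # should return [2, 4, 6, 8, 10]
close(sc)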
Count lines in a text file, running Spark in local mode:
sc = SparkContext(master="local")
path = "file:///var/log/syslog"
txt = text_file(sc, path)
count(txt)
close(sc)
Map and reduce over a file on a standalone Spark cluster:
sc = SparkContext(master="spark://spark-standalone:7077", appname="Say 'Hello!'")
path = "file:///var/log/syslog"
txt = text_file(sc, path)
rdd = map(txt, line -> length(split(line)))
reduce(rdd, +)
close(sc)
NOTE: currently, named Julia functions cannot be fully serialized, so functions passed to executors should either already be defined there (e.g. in a preinstalled library) or be anonymous functions.
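A minimal sketch of the difference (the named-function variant fails for the serialization reason above):

sc = SparkContext(master="local")
txt = text_file(sc, "file:///var/log/syslog")

# works: the anonymous function is serialized and shipped to executors
rdd = map(txt, line -> length(line))

# does NOT work unless the same definition is preinstalled on the executors:
# named functions cannot be fully serialized
linelen(line) = length(line)
rdd = map(txt, linelen)

close(sc)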
Map over partitions, reading a file from HDFS on a Mesos cluster:
sc = SparkContext(master="mesos://mesos-master:5050")
path = "hdfs://namenode:8020/user/hdfs/test.log"
txt = text_file(sc, path)
rdd = map_partitions(txt, it -> filter(line -> contains(line, "a"), it))
collect(rdd)
close(sc)
For the full supported API, see the list of exported functions.
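As a slightly larger sketch of chaining transformations, here is a word count; it assumes flat_map, map_pair and reduce_by_key are among those exported functions:

sc = SparkContext(master="local")
txt = text_file(sc, "file:///var/log/syslog")
words = flat_map(txt, line -> split(line))   # one record per word
pairs = map_pair(words, w -> (w, 1))         # (word, 1) pairs
counts = reduce_by_key(pairs, +)             # sum the ones per word
collect(counts)
close(sc)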
Spark.jl also wraps part of the Spark SQL / DataFrame API. All examples below assume that you have a file people.json with content like this:
{"name": "Alice", "age": 27}
{"name": "Bob", "age": 32}
Read a dataframe from JSON and collect it to the driver:
spark = SparkSession()
df = read_json(spark, "/path/to/people.json")
collect(df)
Read JSON and write Parquet:
spark = SparkSession()
df = read_json(spark, "/path/to/people.json")
write_parquet(df, "/path/to/people.parquet")
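And a hypothetical round trip back from Parquet; read_parquet is an assumption here, chosen as the natural counterpart of write_parquet, so check the list of exported functions before relying on it:

spark = SparkSession()
df = read_parquet(spark, "/path/to/people.parquet")  # read_parquet is assumed, not confirmed
collect(df)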