basho/spark-riak-connector

Spark-Riak Connector

The Spark-Riak connector lets you connect Spark applications to Riak KV and Riak TS through the Spark RDD and Spark DataFrames APIs. You can write your app in Scala, Python, or Java. The connector makes it easy to partition the data you read from Riak so that multiple Spark workers can process it in parallel, and it supports failover if a Riak node goes down while your Spark job is running.

Features

  • Construct a Spark RDD from a Riak KV bucket with a set of keys
  • Construct a Spark RDD from a Riak KV bucket by using a 2i string index or a set of indexes
  • Construct a Spark RDD from a Riak KV bucket by using a 2i range query or a set of ranges
  • Map JSON formatted data from Riak KV to user defined types
  • Save a Spark RDD into a Riak KV bucket and apply 2i indexes to the contents
  • Construct a Spark DataFrame from a Riak TS table using range queries and schema discovery
  • Save a Spark DataFrame into a Riak TS table
  • Construct a Spark RDD using a Riak KV bucket's enhanced 2i query (a.k.a. full bucket read)
  • Perform parallel full bucket reads from a Riak KV bucket into multiple partitions
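
To make the idea of reading a 2i range in parallel more concrete, here is a minimal, self-contained sketch of how an inclusive integer index range could be divided into contiguous sub-ranges, one per Spark partition. It is an illustration only: `split_range` is a hypothetical helper, not part of the connector's API, and the connector's actual partitioning strategy may differ.

```python
def split_range(start, end, num_partitions):
    """Split the inclusive integer range [start, end] into at most
    num_partitions contiguous, non-overlapping sub-ranges.

    Hypothetical helper for illustration; not part of the connector.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if end < start:
        raise ValueError("end must not be less than start")

    total = end - start + 1
    # Never create more sub-ranges than there are index values.
    parts = min(num_partitions, total)
    # Spread any remainder over the first `extra` sub-ranges.
    base, extra = divmod(total, parts)

    ranges = []
    lo = start
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

For example, `split_range(1, 10, 3)` returns `[(1, 4), (5, 7), (8, 10)]`. Each tuple could then be issued as an independent 2i range query, so that separate workers read disjoint slices of the bucket.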

Compatibility

  • Riak TS 1.3.1+
  • Apache Spark 1.6+
  • Scala 2.10 and 2.11
  • Java 8

Coming Soon

  • Support for Riak KV 2.3 and later

Prerequisites

In order to use the Spark-Riak connector, you must have the following installed:

Spark-Riak Connector

Mailing List

The Riak Users Mailing List is highly trafficked and a great resource for technical discussions, Riak issues and questions, and community events and announcements.

We pride ourselves on answering every email that comes over the Riak Users Mailing List. Sign up and send away. If you prefer points for your questions, you can always tag Riak on Stack Overflow.

IRC

The #riak IRC room on irc.freenode.net is a great place for real-time help with your Riak issues and questions.

Reporting Bugs

To report a bug or issue, please open a new issue against this repository.

You can read the full guidelines for bug reporting in the Riak Docs.

Contributing

Basho encourages contributions to the Spark-Riak Connector from the community. Here’s how to get started.

  • Fork the appropriate project that is affected by your change.
  • Make your changes and run the test suite.
  • Commit your changes and push them to your fork.
  • Open a pull request against the appropriate project.
  • Basho engineers will review your pull request, offer feedback or suggest changes, and merge it when it’s ready.

License

Copyright © 2016 Basho Technologies

Licensed under the Apache License, Version 2.0