`Netezza Connector for Apache Spark`

A data source library to load data into Apache Spark using SQL DataFrames from IBM® Netezza® database. This data source library is implemented using Netezza external table mechanism to transfer data from the Netezza host system to the Apache Spark system optimally.

Binary download:

You can download the Netezza Connector for Apache Spark assembly jars from here:

Apache Spark Version	Release #	Binary Location
1.5.2+	v0.1.1	[spark-netezza-assembly-0.1.1.jar] (https://github.com/SparkTC/spark-netezza/releases/download/v0.1.1/spark-netezza-assembly-0.1.1.jar)

Usage

Specify Package

Spark-netezza connector is a registered package on spark-packages site, so it supports command line option --packages to resolve maven dependencies. You can use spark-shell, spark-sql, pyspark or spark-submit based on your scenario to invoke spark-netezza connector. Dependency on netezza jdbc driver needs to be provided.

For example, for spark-shell:

spark-shell --packages com.ibm.SparkTC:spark-netezza_2.10:0.1.1 --driver-class-path /path/to/nzjdbc.jar

Data Sources API

You can use spark-netezza connector via the Apache Spark Data Sources API in Scala, Java, Python or SQL, as follows:

Scala

import org.apache.spark.sql._

val sc = // existing SparkContext
val sqlContext = new SQLContext(sc)

val opts = Map("url" -> "jdbc:netezza://netezzahost:5480/database",
        "user" -> "username",
        "password" -> "password",
        "dbtable" -> "tablename",
        "numPartitions" -> "4")
val df = sqlContext.read.format("com.ibm.spark.netezza")
  .options(opts).load()

Java

import org.apache.spark.sql.*;

JavaSparkContext sc = // existing SparkContext
SQLContext sqlContext = new SQLContext(sc);

Map<String, String> opts = new HashMap<>();
opts.put("url", "jdbc:netezza://netezzahost:5480/database");
opts.put("user", "username");
opts.put("password", "password");
opts.put("dbtable", "tablename");
opts.put("numPartitions", "4");
DataFrame df = sqlContext.read()
             .format("com.ibm.spark.netezza")
             .options(opts)
             .load();

Python

from pyspark.sql import SQLContext

sc = # existing SparkContext
sqlContext = SQLContext(sc)

df = sqlContext.read \
  .format('com.ibm.spark.netezza') \
  .option('url','jdbc:netezza://netezzahost:5480/database') \
  .option('user','username') \
  .option('password','password') \
  .option('dbtable','tablename') \
  .option('numPartitions','4') \
  .load()

SQL

CREATE TABLE my_table
USING com.ibm.spark.netezza
OPTIONS (
  url 'jdbc:netezza://netezzahost:5480/database',
  user 'username',
  password 'password',
  dbtable 'tablename'
);

Configuration Overview

Configuration can be passed on DataFrame using option:

Name	Description
url	The url to connect with JDBC driver.
user	Username to connect to the database.
password	Password to connect to the database.
dbtable	The table in the database to load.
numPartitions	Number of partitions to specify to parallely execute data movement.

Building From Source

Scala 2.10

spark-netezza build supports Scala 2.10 by default, if Scala 2.11 artifact is needed, please refer to Scala 2.11 or Version Cross Build

Building General Artifacts

To generate regular binary, in the root directory run:

build/sbt package

To generate assembly jar, in the root directory run:

build/sbt assembly

The artifacts will be generated to:

spark-netezza/target/scala-{binary.version}/

To run the tests, in the root directory run:

build/sbt test

Scala 2.11

To build against scala 2.11, use '++' option with desired version number, for example:

build/sbt ++2.11.7 assembly
build/sbt ++2.11.7 test

Version Cross Build

To produce artifacts for both scala 2.10 and 2.11:

Start SBT:

 build/sbt

Run in the SBT shell:

 + package
 + test

Using sbt-spark-package Plugin

spark-netezza connector supports sbt-spark-package plugin, to publish to local ivy repository, run:

build/sbt spPublishLocal

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
.github		.github
build		build
dev		dev
project		project
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
build.sbt		build.sbt
scalastyle-config.xml		scalastyle-config.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`Netezza Connector for Apache Spark`

Binary download:

Usage

Specify Package

Data Sources API

Scala

Java

Python

SQL

Configuration Overview

Building From Source

Scala 2.10

Building General Artifacts

Scala 2.11

Version Cross Build

Using sbt-spark-package Plugin

About

Releases

Packages

Contributors 3

Languages

License

CODAIT/spark-netezza

Folders and files

Latest commit

History

Repository files navigation

Netezza Connector for Apache Spark

Binary download:

Usage

Specify Package

Data Sources API

Scala

Java

Python

SQL

Configuration Overview

Building From Source

Scala 2.10

Building General Artifacts

Scala 2.11

Version Cross Build

Using sbt-spark-package Plugin

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

`Netezza Connector for Apache Spark`

Packages