
Spark submit client plugin for Pipeline CI/CD

This repo contains a plugin that can be used to set up a spark-submit step in the Banzai Cloud Pipeline CI/CD workflow.

For a better understanding of the Banzai Cloud Pipeline CI/CD workflow and PaaS, please check this documentation.

This plugin executes a fully configurable spark-submit step, as described here.

The plugin supports all the configuration options available for the spark-submit command.

Please note that the plugin is primarily intended for use on Kubernetes clusters, so some configuration is automatically taken from values provided by the k8s cluster (e.g. --master).
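To illustrate how a value such as --master could be derived from the cluster, here is a minimal sketch. It is not taken from the plugin's source: the helper name `MasterURL` is hypothetical, and it assumes the step runs inside a pod, where Kubernetes sets the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables. The `k8s://https://host:port` form is the master URL format used by Spark on Kubernetes.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// MasterURL is a hypothetical helper (an assumption, not the plugin's API).
// It builds a Spark master URL from the API server address that Kubernetes
// exposes to every pod through environment variables.
// The getenv parameter makes the function easy to test without a cluster.
func MasterURL(getenv func(string) string) (string, error) {
	host := getenv("KUBERNETES_SERVICE_HOST")
	port := getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return "", errors.New("KUBERNETES_SERVICE_HOST or KUBERNETES_SERVICE_PORT is not set")
	}
	// JoinHostPort also brackets IPv6 addresses correctly.
	return "k8s://https://" + net.JoinHostPort(host, port), nil
}

func main() {
	master, err := MasterURL(os.Getenv)
	if err != nil {
		fmt.Println("cannot derive --master:", err)
		return
	}
	fmt.Println("--master", master)
}
```

Outside a cluster the two variables are usually absent, so the sketch reports an error instead of guessing an address.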

In the Banzai Cloud Pipeline CI/CD flow definition the spark-submit step description may have three configuration sections (reflected by the names of the step elements):

  • spark_submit_options - the configuration entries that are passed in the form:
--[option] value
  • spark_submit_configs - items passed to the command in the form:
--conf [key]=[value]
  • spark_submit_app_args - a collection (list) that is passed to the command "as is", as a space-delimited set of entries

The first two groups are represented as YAML maps, the last as a YAML list. All sections are built dynamically, so custom configuration options, Spark configuration entries, and application arguments can be passed in by following the conventions described above.

Usage

To use the plugin, configure the .pipeline.yml properly and let the magic happen.

If you need help configuring the YAML file, please read the README of the related plugin, which handles the cluster-related operations.

Examples

Spark-Pi

run:
   image: banzaicloud/plugin-k8s-proxy:latest
   pull: true
   service_account: spark

   original_image: banzaicloud/plugin-spark-submit-k8s:latest
   spark_submit_options:
     class: banzaicloud.SparkPi
     kubernetes-namespace: default
   spark_submit_configs:
     spark.app.name: sparkpi
     spark.local.dir: /tmp/spark-locals
     spark.kubernetes.driver.docker.image: banzaicloud/spark-driver:v2.2.0-k8s-1.0.197
     spark.kubernetes.executor.docker.image: banzaicloud/spark-executor:v2.2.0-k8s-1.0.197
     spark.kubernetes.initcontainer.docker.image: banzaicloud/spark-init:v2.2.0-k8s-1.0.197
     spark.dynamicAllocation.enabled: "true"
     spark.kubernetes.resourceStagingServer.uri: http://spark-rss:10000
     spark.kubernetes.resourceStagingServer.internal.uri: http://spark-rss:10000
     spark.shuffle.service.enabled: "true"
     spark.kubernetes.shuffle.namespace: default
     spark.kubernetes.shuffle.labels: app=spark-shuffle-service,spark-version=2.2.0
     spark.kubernetes.authenticate.driver.serviceAccountName: spark
     spark.metrics.conf: /opt/spark/conf/metrics.properties
   spark_submit_app_args:
     - target/spark-pi-1.0-SNAPSHOT.jar
     - 1000

For the full configuration file, please click here.
