Dockerfile Set-up to add dependencies into `spark-custom` images

dsaidgovsg/spark-custom-addons

spark-custom-addons

Experimental set-up that adds dependencies to spark-custom Docker images. Images are built for both Debian and Alpine.

This adds the following:

  • AWS Hadoop SDK JAR
    • Appends the line `spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem` to spark-defaults.conf
  • Google Cloud Storage SDK JAR
  • MariaDB JDBC Connector JAR
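
The spark-defaults.conf change listed above can be sketched in shell as follows. This is a minimal illustration, not necessarily how this repository's Dockerfile does it: the helper name `append_once` and the temporary directory standing in for the Spark home are assumptions made for the example.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
append_once() {
  file="$1"
  line="$2"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a literal string, -q suppresses output.
  grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# A temporary directory stands in for the image's Spark home (assumption).
demo_home="$(mktemp -d)"
conf="${demo_home}/conf/spark-defaults.conf"
s3a_line="spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem"

append_once "$conf" "$s3a_line"
append_once "$conf" "$s3a_line"   # the second call leaves the file unchanged
cat "$conf"
```

Guarding the append with `grep` keeps the step idempotent, which matters if the same layer logic is ever rerun against an image that already carries the setting.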

Additionally, all Alpine builds install gcompat and libc6-compat to avoid issues with glibc shared libraries.

AWS Java SDK Version Derivation

The AWS Java SDK version depends on the Hadoop version. For example, the version that matches Hadoop 3.1.0 can be read from the Hadoop project's pom.xml at the following line:

https://github.com/apache/hadoop/blob/release-3.1.0-RC0/hadoop-project/pom.xml#L137
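
As a rough sketch of that lookup, the snippet below pulls the version out of pom.xml content with `sed`. It assumes the value is held in a property element named `aws-java-sdk.version`, and it runs against a small inline sample rather than the real file; the function name `aws_sdk_version` and the sample value are illustrative only, so confirm the actual number at the linked line.

```shell
# Inline sample standing in for hadoop-project/pom.xml (illustrative content only).
pom_fragment='<properties>
  <aws-java-sdk.version>1.11.271</aws-java-sdk.version>
</properties>'

# Hypothetical helper: read pom.xml content on stdin and print the text
# inside the first <aws-java-sdk.version> element.
aws_sdk_version() {
  sed -n 's:.*<aws-java-sdk\.version>\(.*\)</aws-java-sdk\.version>.*:\1:p' | head -n 1
}

printf '%s\n' "$pom_fragment" | aws_sdk_version
```

To check a different Hadoop release, the same function could be fed the pom.xml from the matching release tag instead of the inline sample.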

How to Apply the Template for CI Builds

Linux users can download Tera CLI v0.4 from https://github.com/guangie88/tera-cli/releases and place the binary in a directory on PATH.

Otherwise, you will need cargo, which can be installed via rustup.

Once cargo is installed, simply run `cargo install tera-cli --version=^0.4.0`.
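
The two install routes above can be summarized with a small check that reports which one applies on the current machine. This is only a sketch: it assumes the installed binary is named `tera`, the function name `tera_install_hint` is made up for the example, and it prints a suggestion without installing anything.

```shell
# Hypothetical helper: print which install route applies; installs nothing.
tera_install_hint() {
  if command -v tera >/dev/null 2>&1; then
    # Assumes the Tera CLI binary is called "tera".
    echo "tera is already on PATH"
  elif command -v cargo >/dev/null 2>&1; then
    # The version requirement is quoted so the shell never interprets the caret.
    echo "run: cargo install tera-cli --version='^0.4.0'"
  else
    echo "install cargo via rustup, or download a release binary and place it on PATH"
  fi
}

tera_install_hint
```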
