Spark-ETL-Pipeline

This project is an example of reading data from Kafka as a Spark DataFrame, performing transformations, and writing the results into a Hive database.
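A minimal sketch of that flow is shown below; the broker address, topic name, and Hive table are assumptions for illustration, not values taken from this project:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

object KafkaToHiveSketch {

  def main(args: Array[String]): Unit = {
    // Hive support lets saveAsTable write through the Hive metastore.
    val spark = SparkSession.builder()
      .appName("SPARK_ETL")
      .enableHiveSupport()
      .getOrCreate()

    // Batch-read a Kafka topic as a DataFrame ("events" and the broker are assumptions).
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast them before transforming.
    val transformed = raw
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .withColumn("ingested_at", current_timestamp())

    // Append into a hypothetical Hive table.
    transformed.write.mode("append").saveAsTable("etl_db.events_raw")

    spark.stop()
  }
}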

Version Compatibility

Scala     Spark    sbt
2.11.12   2.4.0    1.3.13
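
A build.sbt matching those versions could look like the sketch below; the exact dependency list is an assumption based on the Kafka/Spark/Hive stack this README describes:

// build.sbt (sketch)
name := "Spark-ETL-Pipeline"
scalaVersion := "2.11.12"

val sparkVersion = "2.4.0"

libraryDependencies ++= Seq(
  // "provided": the cluster supplies Spark at runtime, so keep it out of the fat jar.
  "org.apache.spark" %% "spark-sql"            % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive"           % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)

The sbt launcher version is pinned separately in project/build.properties as sbt.version=1.3.13.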

Build from Source

$ sbt assembly
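
The assembly task comes from the sbt-assembly plugin; if it is not already registered, a typical project/plugins.sbt entry looks like this (the plugin version is an assumption appropriate for sbt 1.3.x):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

The fat jar it produces (under target/scala-2.11/) is the jar passed to spark-submit below.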

Run

To run this application in a Kerberos-enabled environment, use the following command. You must create your own jaas.conf file based on your production configuration.

spark-submit \
 --deploy-mode cluster \
 --name SPARK_ETL \
 --driver-memory 16G \
 --files "config.ini,log4j.properties,spark_jaas.conf,your_keytabfile.keytab" \
 --driver-java-options "-Djava.security.auth.login.config=./spark_jaas.conf" \
 --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./spark_jaas.conf" \
 --conf spark.yarn.submit.waitAppCompletion=false \
 path_to_your_jar_file.jar path_to_log4j.properties path_to_config.ini
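
The spark_jaas.conf file itself is not included here. A minimal sketch for a Kerberos-authenticated Kafka client follows; the principal, realm, and keytab name are assumptions and must match your production setup:

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="./your_keytabfile.keytab"
  principal="etl_user@YOUR.REALM"
  serviceName="kafka";
};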
