GitHub - devang93/Spark_Oracle_Data_Transfer: Sample project

A Spark application that loads data from Oracle databases into Azure Data Lake Storage (ADLS). It reads through Apache Spark's JDBC data source using the Oracle JDBC driver, and writes the result to ADLS as Parquet files compressed with the Snappy codec.
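
The read-then-write flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the object name, the JDBC URL form, the subquery alias, the credentials, and the output folder are all assumptions, and ADLS authentication is assumed to be configured separately.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical sketch: read an Oracle query over JDBC, write Snappy-compressed Parquet.
object OracleToAdlsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-to-adls-sketch")
      .getOrCreate()

    // Assumed values; in the real application these come from the command line.
    val jdbcUrl = "jdbc:oracle:thin:@//127.0.0.1:8000/MYDB"
    // Spark's JDBC source expects a table name or a parenthesised subquery with an alias.
    val dbtable = "(SELECT * FROM some_table) q"

    val df = spark.read
      .format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", dbtable)
      .option("user", "oracle_user")         // placeholder
      .option("password", "oracle_password") // placeholder
      .option("driver", "oracle.jdbc.OracleDriver")
      .load()

    // Write Parquet with the Snappy codec; the output folder is a made-up example.
    df.write
      .mode(SaveMode.Overwrite)
      .option("compression", "snappy")
      .parquet("adl://mydatalakestore.azuredatalakestore.net/some/output/folder")

    spark.stop()
  }
}
```

Note that Spark's JDBC source only splits a read across tasks when `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` are supplied together; how this project maps `--numPartitions` onto those options is not stated here.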

Dependencies:

"org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
"org.apache.spark" %% "spark-sql" % "2.1.0" % "provided"
"com.github.scopt" %% "scopt" % "3.3.0" (included)
"ojdbc7.jar" (included)
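
For orientation, the dependency list might look like this in a build.sbt. This is a sketch under assumptions: the `scalaVersion` is a guess (Spark 2.1.0 is published for Scala 2.11), and the `name` and `version` are inferred from the assembly jar name in the run command rather than taken from the project's build file.

```scala
// build.sbt (hypothetical sketch, not the project's actual file)
name := "Spark_Oracle_Data_Transfer"
version := "1.0.0"
scalaVersion := "2.11.8" // assumption

libraryDependencies ++= Seq(
  // "provided": supplied by the Spark runtime, so not bundled into the assembly jar
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.0" % "provided",
  // bundled into the assembly jar
  "com.github.scopt" %% "scopt"      % "3.3.0"
)

// ojdbc7.jar needs no entry here: sbt treats jars placed in lib/ as unmanaged
// dependencies and adds them to the classpath automatically.
```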

Run Command:

spark-submit --master <spark-master> --class OracleSparkTransfer.Main Spark_Oracle_Data_Transfer-assembly-1.0.0.jar
--user <oracle Username>
--password <oracle Password>
--dbHostPort <oracle host:port (e.g. 127.0.0.1:8000)>
--dbName <oracle Database Name>
--query <oracle sql query to pull data>
--adls <Azure Data Lake Store URI (e.g. adl://mydatalakestore.azuredatalakestore.net)>
--tenantId <Azure Tenant ID>
--spnClientId <Azure App Service Principal ApplicationID>
--spnClientSecret <Azure App Service Principal Secret/Password>
--outputPath <ADLS storage folder path>
--writeMode <append | overwrite (default)> (optional)
--numPartitions <spark parallel reads for oracle>
--job_run_id <if the job runs daily, a run_id to identify individual runs> (optional)
--exec_date <if the job runs daily, the execution_date of the current run> (optional)
--prev_exec_date <if the job runs daily, the execution_date of the previous run> (optional)
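
As a rough illustration of how these arguments could translate into connection settings, the helper below builds a JDBC URL, an output URI, and a set of ADLS OAuth properties. Everything in it is hypothetical: the object and function names are made up, `--dbName` is assumed to be an Oracle service name, and the `dfs.adls.oauth2.*` property names are the ones used by older Hadoop ADLS connectors (newer Hadoop releases use `fs.adl.oauth2.*`), so check them against the Hadoop version on your cluster.

```scala
// Hypothetical helper, not part of the project: derives connection settings
// from the command-line values listed above.
object TransferConfigSketch {

  // Oracle thin-driver URL in the "//host:port/service" form.
  // Assumes dbName is a service name, not a SID.
  def oracleJdbcUrl(dbHostPort: String, dbName: String): String =
    s"jdbc:oracle:thin:@//$dbHostPort/$dbName"

  // Join the ADLS store URI and the output folder with exactly one slash.
  def outputUri(adls: String, outputPath: String): String =
    adls.stripSuffix("/") + "/" + outputPath.stripPrefix("/")

  // Service-principal (client credential) settings for the ADLS connector.
  // Property names are an assumption and vary by Hadoop version.
  def adlsOAuthConf(tenantId: String,
                    spnClientId: String,
                    spnClientSecret: String): Map[String, String] =
    Map(
      "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
      "dfs.adls.oauth2.client.id"                  -> spnClientId,
      "dfs.adls.oauth2.credential"                 -> spnClientSecret,
      "dfs.adls.oauth2.refresh.url"                ->
        s"https://login.microsoftonline.com/$tenantId/oauth2/token"
    )
}
```

In a Spark job, such a map would typically be applied with `spark.sparkContext.hadoopConfiguration.set(key, value)` for each entry before writing to the `adl://` path.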

Repository files: lib, project, src/main, .gitignore, README.MD, build.sbt