Sharp ETL

Sharp ETL is an ETL framework that simplifies writing and executing ETL jobs: you describe each job in a SQL workflow file. The workflow file format combines your favorite SQL dialect with a small amount of configuration.

Getting started

First, start a Sharp ETL system database:

docker run --name sharp_etl_db -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=root -e MYSQL_DATABASE=sharp_etl mysql:5.7

Build from source (or download the jar from releases):

./gradlew buildJars -PscalaVersion=2.12 -PsparkVersion=3.3.0 -PscalaCompt=2.12.15

Take a look at hello_world.sql

cat spark/src/main/resources/tasks/hello_world.sql

You will see the following contents:

-- workflow=hello_world
--  loadType=incremental
--  logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables

SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console

SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
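To make the file format concrete, here is a minimal Python sketch of how such a workflow file could be read: `-- key=value` comment lines carry the configuration, and the SQL below each `-- step=...` header is that step's query. This is an illustration only, not Sharp ETL's actual parser; `parse_workflow` and the `variables` dictionary are hypothetical names. The value `RESULT` for `OUTPUT_COL` is taken from the first step's query, on the assumption that a step with `target=variables` exposes its column aliases as variables for later steps.

```python
import re
from string import Template

WORKFLOW = """\
-- workflow=hello_world
--  loadType=incremental
--  logDrivenType=timewindow

-- step=define variable
-- source=temp
-- target=variables

SELECT 'RESULT' AS `OUTPUT_COL`;

-- step=print SUCCESS to console
-- source=temp
-- target=console

SELECT 'SUCCESS' AS `${OUTPUT_COL}`;
"""

# A configuration line looks like "-- key=value" (any amount of space after "--").
HEADER = re.compile(r"^--\s*(\w+)=(.*)$")


def parse_workflow(text):
    """Split a workflow file into workflow-level options and a list of steps."""
    workflow, steps, current = {}, [], None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key == "step":
                # A "step" header opens a new step.
                current = {"step": value, "sql": []}
                steps.append(current)
            elif current is None:
                # Options before the first step belong to the workflow.
                workflow[key] = value
            else:
                current[key] = value
        elif current is not None and line.strip():
            # Any other non-blank line inside a step is part of its SQL.
            current["sql"].append(line)
    for step in steps:
        step["sql"] = "\n".join(step["sql"])
    return workflow, steps


workflow, steps = parse_workflow(WORKFLOW)

# Assumption: the first step makes OUTPUT_COL = 'RESULT' available to later steps.
variables = {"OUTPUT_COL": "RESULT"}
rendered = Template(steps[1]["sql"]).substitute(variables)
print(rendered)
```

Under that assumption the second step's query is rendered with `RESULT` as its column alias, which matches the column name in the console output of the run.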

Run and check the console output

spark-submit --master local \
  --class com.github.sharpdata.sharpetl.spark.Entrypoint \
  spark/build/libs/sharp-etl-spark-standalone-3.3.0_2.12-0.1.0.jar \
  single-job --name=hello_world --period=1440 \
  --default-start-time="2022-07-01 00:00:00" --once --local

You will see output like:

== Physical Plan ==
*(1) Project [SUCCESS AS RESULT#17167]
+- Scan OneRowRelation[]
root
 |-- RESULT: string (nullable = false)

+-------+
|RESULT |
+-------+
|SUCCESS|
+-------+

Versions and dependencies

The compatible versions of Spark are as follows:

Spark    Scala
2.3.x    2.11
2.4.x    2.11 / 2.12
3.0.x    2.12
3.1.x    2.12
3.2.x    2.12 / 2.13
3.3.x    2.12 / 2.13
3.4.x    2.12 / 2.13
3.5.x    2.13
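The table can be checked before choosing values for `-PsparkVersion` and `-PscalaVersion`. The following is a small sketch that encodes the table as a lookup; the `COMPATIBLE` dictionary and the `is_supported` helper are hypothetical names for illustration and are not part of Sharp ETL.

```python
# Spark minor version -> Scala versions listed as compatible in the table above.
COMPATIBLE = {
    "2.3": {"2.11"},
    "2.4": {"2.11", "2.12"},
    "3.0": {"2.12"},
    "3.1": {"2.12"},
    "3.2": {"2.12", "2.13"},
    "3.3": {"2.12", "2.13"},
    "3.4": {"2.12", "2.13"},
    "3.5": {"2.13"},
}


def is_supported(spark_version, scala_version):
    """Return True if the table lists this Spark/Scala pair as compatible."""
    # Reduce a full version such as "3.3.0" to its minor line, "3.3".
    minor = ".".join(spark_version.split(".")[:2])
    return scala_version in COMPATIBLE.get(minor, set())


print(is_supported("3.3.0", "2.12"))
print(is_supported("2.3.4", "2.12"))
```

The first call uses the pair from the build command in the getting-started section; the second uses a pair the table does not list.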

License

FOSSA Status