BlueSky

BlueSky uses pentaho-kettle metadata to describe ETL jobs and Spark as the low-level ETL platform, so tasks created in pentaho-kettle can be run on Spark. In effect, it acts as a big data version of pentaho-kettle, packaged as a plug-in. It also supports Spark Streaming data fusion scenarios, and currently supports Kafka as the data pipeline for Spark Streaming. The core idea of the platform is to use the pentaho-kettle metadata as the run parameters for Spark. The core architecture is inspired by the pentaho-kettle architecture, and a DataFrame is used as the data reference that flows between components. In the stream processing architecture, jobs are submitted serially, while each job runs in a distributed fashion. The platform has to translate each pentaho-kettle component before it can be used. The components translated so far are: switchcase, tableoutput, filterrows, update, delete, kafkaconsumer, deleterow, input, output, multiwaymergejoin, selectvalues, convergence, kafkaproducer, TableInput, groupbydata, mergejoin.
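The idea of treating transformation metadata as run parameters can be illustrated with a minimal sketch. This is not BlueSky's actual code: the XML below is a hypothetical, heavily simplified fragment loosely shaped like a Kettle transformation file, the `field`, `min` and `fields` tags are invented for illustration, and plain lists of dicts stand in for Spark DataFrames so the example runs without a cluster. Only single-input steps are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata: named steps plus hops that connect them.
KTR = """
<transformation>
  <step><name>read</name><type>TableInput</type></step>
  <step><name>adults</name><type>FilterRows</type><field>age</field><min>18</min></step>
  <step><name>pick</name><type>SelectValues</type><fields>name,age</fields></step>
  <order>
    <hop><from>read</from><to>adults</to></hop>
    <hop><from>adults</from><to>pick</to></hop>
  </order>
</transformation>
"""

# Stand-in for a source table (a DataFrame in a real Spark job).
SOURCE_ROWS = [
    {"name": "ann", "age": 31, "city": "Oslo"},
    {"name": "bob", "age": 12, "city": "Rome"},
    {"name": "cat", "age": 18, "city": "Lima"},
]


def table_input(step, rows):
    # Ignores upstream rows and produces the source data.
    return list(SOURCE_ROWS)


def filter_rows(step, rows):
    # Keeps rows whose configured field is at least the configured minimum.
    field = step.findtext("field")
    minimum = int(step.findtext("min"))
    return [r for r in rows if r[field] >= minimum]


def select_values(step, rows):
    # Projects each row onto the configured list of columns.
    keep = step.findtext("fields").split(",")
    return [{k: r[k] for k in keep} for r in rows]


# One translator per supported step type; untranslated types raise KeyError.
TRANSLATORS = {
    "TableInput": table_input,
    "FilterRows": filter_rows,
    "SelectValues": select_values,
}


def run(ktr_xml):
    """Evaluate every step and return a dict of step name -> output rows."""
    root = ET.fromstring(ktr_xml)
    steps = {s.findtext("name"): s for s in root.findall("step")}
    upstream = {name: [] for name in steps}
    for hop in root.findall("order/hop"):
        upstream[hop.findtext("to")].append(hop.findtext("from"))

    outputs = {}

    def evaluate(name):
        # Compute a step once, after its (single) upstream step.
        if name not in outputs:
            parents = upstream[name]
            rows = evaluate(parents[0]) if parents else []
            step = steps[name]
            outputs[name] = TRANSLATORS[step.findtext("type")](step, rows)
        return outputs[name]

    for name in steps:
        evaluate(name)
    return outputs


if __name__ == "__main__":
    print(run(KTR)["pick"])
```

The dispatch table mirrors the point that each component type needs its own translation: a step type without an entry in `TRANSLATORS` cannot be run until a translator is written for it.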

The flow chart is as follows:

