
Build a pipeline flow


Think about the dependencies between the images and/or manual actions in your pilot case. Which containers depend on other containers? Which (manual) actions must be executed before a container can start? Example dependencies include (see the compose sketch after this list):

  • The Spark master needs to be started before the Spark worker, so that the worker can register itself with the master.
  • The input data needs to be loaded into HDFS before the MapReduce job starts computing.
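
For the container-level ordering, docker-compose's depends_on expresses the first kind of dependency directly. Below is a minimal sketch assuming the bde2020 Spark images (version tags omitted); the SPARK_MASTER variable is an assumption about how the worker image locates the master. Note that depends_on only controls start order, not readiness, which is why the init daemon support from the previous step of this series is still needed.

```yaml
version: "2"
services:
  spark-master:
    image: bde2020/spark-master          # must be up before the worker can register
  spark-worker:
    image: bde2020/spark-worker
    depends_on:
      - spark-master                     # orders container startup only, not readiness
    environment:
      - SPARK_MASTER=spark://spark-master:7077   # assumed variable for locating the master
```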

Based on these dependencies, construct a pipeline flow. The flow determines the order in which the services need to be started and the actions need to be executed. For example, in the demo application the flow is (a sketch of how these steps map onto services follows the list):

  1. Start HDFS
  2. Start Spark
  3. Put input file on HDFS
  4. Compute aggregations
  5. Get output from HDFS
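
One way to wire such a flow into your docker-compose file is to attach each step to the service that implements it. Below is a minimal sketch of the first steps, assuming the INIT_DAEMON_STEP convention from the init-daemon page of this series; the step names and the my-hdfs-ingest image are hypothetical:

```yaml
version: "2"
services:
  namenode:
    image: bde2020/hadoop-namenode       # step 1: Start HDFS
    environment:
      - INIT_DAEMON_STEP=start_hdfs      # hypothetical step name from the workflow
  spark-master:
    image: bde2020/spark-master          # step 2: Start Spark
    environment:
      - INIT_DAEMON_STEP=start_spark
  ingest:
    image: my-hdfs-ingest                # hypothetical image; step 3: Put input file on HDFS
    environment:
      - INIT_DAEMON_STEP=put_input_file
```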

You can configure your pipeline flow using the Pipeline Builder engine packaged with the integrator-ui. Browse to http://integrator-ui.big-data-europe.aksw.org (or your own integrator-ui instance) and click "Workflow builder". There you can create your workflow and the steps it needs. The init-daemon will then be able to manage the workflow, as long as your services check and report their statuses (see the sketch below).
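
What "check and report their statuses" amounts to is sketched below: before doing its work, a service asks the init daemon whether its step may start, and afterwards it reports completion so the next step can proceed. The endpoint paths (/canStart, /execute), the init-daemon host name, and the my-spark-app image are assumptions for illustration; consult the init-daemon documentation for the actual API.

```yaml
version: "2"
services:
  compute-aggregations:
    image: my-spark-app                  # hypothetical application image
    command:
      - bash
      - -c
      - |
        # The endpoint paths below are assumptions, not the documented
        # init-daemon API; check the init-daemon README for the real routes.
        until curl -fs "http://init-daemon/canStart?step=compute_aggregations" | grep -q true; do
          sleep 2                        # poll until our step is allowed to start
        done
        spark-submit /app/aggregations.py
        curl -fs -X PUT "http://init-daemon/execute?step=compute_aggregations"
```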
