
Build a pipeline flow


Think about the dependencies between the images and/or manual actions in your pilot case. Which containers depend on other containers? Which (manual) actions must be executed before a container can start? Example dependencies include (see the compose sketch after this list):

  • The Spark master needs to be started before the Spark worker, so that the worker can register itself with the master.
  • The input data needs to be loaded into HDFS before the MapReduce job starts computing.
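
For the container-level ordering, docker-compose's depends_on expresses the first kind of dependency directly. Below is a minimal sketch assuming the bde2020 Spark images (version tags omitted); the SPARK_MASTER variable is an assumption about how the worker image locates the master. Note that depends_on only controls start order, not readiness, which is why the init daemon support from the previous step of this series is still needed.

```yaml
version: "2"
services:
  spark-master:
    image: bde2020/spark-master          # must be up before the worker can register
  spark-worker:
    image: bde2020/spark-worker
    depends_on:
      - spark-master                     # orders container startup only, not readiness
    environment:
      - SPARK_MASTER=spark://spark-master:7077   # assumed variable for locating the master
```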

Based on these dependencies, construct a pipeline flow. The flow determines the order in which the services need to be started and the actions need to be executed. For example, in the demo application the flow is (a sketch of how these steps map onto services follows the list):

  1. Start HDFS
  2. Start Spark
  3. Put input file on HDFS
  4. Compute aggregations
  5. Get output from HDFS
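
One way to wire such a flow into your docker-compose file is to attach each step to the service that implements it. Below is a minimal sketch of the first steps, assuming the INIT_DAEMON_STEP convention from the init-daemon page of this series; the step names and the my-hdfs-ingest image are hypothetical:

```yaml
version: "2"
services:
  namenode:
    image: bde2020/hadoop-namenode       # step 1: Start HDFS
    environment:
      - INIT_DAEMON_STEP=start_hdfs      # hypothetical step name from the workflow
  spark-master:
    image: bde2020/spark-master          # step 2: Start Spark
    environment:
      - INIT_DAEMON_STEP=start_spark
  ingest:
    image: my-hdfs-ingest                # hypothetical image; step 3: Put input file on HDFS
    environment:
      - INIT_DAEMON_STEP=put_input_file
```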

You can configure your pipeline flow using the Pipeline Builder engine packaged with the integrator-ui. Browse to http://integrator-ui.big-data-europe.aksw.org (or your own integrator-ui instance) and click "Workflow builder". There you can create your workflow and the steps it needs. The init-daemon will then be able to manage the workflow, as long as your services check and report their statuses (see the sketch below).
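
What "check and report their statuses" amounts to is sketched below: before doing its work, a service asks the init daemon whether its step may start, and afterwards it reports completion so the next step can proceed. The endpoint paths (/canStart, /execute), the init-daemon host name, and the my-spark-app image are assumptions for illustration; consult the init-daemon documentation for the actual API.

```yaml
version: "2"
services:
  compute-aggregations:
    image: my-spark-app                  # hypothetical application image
    command:
      - bash
      - -c
      - |
        # The endpoint paths below are assumptions, not the documented
        # init-daemon API; check the init-daemon README for the real routes.
        until curl -fs "http://init-daemon/canStart?step=compute_aggregations" | grep -q true; do
          sleep 2                        # poll until our step is allowed to start
        done
        spark-submit /app/aggregations.py
        curl -fs -X PUT "http://init-daemon/execute?step=compute_aggregations"
```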
