
Construct a docker-compose.yml file


In this last step we combine all the parts configured in the previous steps into a single Docker Compose file. Before starting, make sure you know the basics of the YAML format and Docker Compose. Then take a copy of the docker-compose.yml that is available in the BDE Pipeline repository as the starting point for your pilot case's YAML file.
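
The exact contents of that file may have evolved, but as a rough sketch it has the usual Docker Compose shape: a top-level 'services' key with the platform services already filled in, next to which your own services will be added.

services:
  # Services already configured in the BDE Pipeline repository
  # (init daemon, integrator UI, database, ...) stay as they are.
  # The services of your pilot case are added here in the next steps.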

First, add all the services your pilot case needs under 'services', based on the image list created in step 1. Don't remove the services that are already configured in the YAML file. For each image you will find a Docker Compose snippet in the image's README which you can copy-paste into your docker-compose.yml. If a snippet is missing, contact the person responsible for the component.
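
For example, if the image list from step 1 contains the Spark master image that is also used in the full example further below, its snippet would be pasted under 'services' roughly like this (check the image's README for the authoritative version):

services:
  spark-master:
    image: bde2020/spark-master:2.2.0-hadoop2.7
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"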

Next, configure an INIT_DAEMON_STEP environment variable for each service that needs to communicate with the init daemon service. The value of the variable must be the code of the corresponding step as configured in your flow in step 3. A service can have only one INIT_DAEMON_STEP configured. This is done in your Docker Compose file as follows:

services:
  demo:
    image: bde2020/demo-spark-sensor-data:2.0.0
    environment:
      INIT_DAEMON_STEP: compute_aggregations

Some steps in your flow may not match any service in the docker-compose.yml; these steps will have to be finished manually by the pipeline executor when running the pipeline. Conversely, some services in the docker-compose.yml may not have an INIT_DAEMON_STEP configured; these services can start immediately, without depending on another service or action, as the sketch below illustrates.
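
For example, combining two services from this page: the demo service waits until its step is reached in the flow, while the database service has no INIT_DAEMON_STEP and starts as soon as the pipeline is brought up.

services:
  demo:
    image: bde2020/demo-spark-sensor-data:2.0.0
    environment:
      INIT_DAEMON_STEP: compute_aggregations
  database:
    image: tenforce/virtuoso:1.3.0-virtuoso7.2.2
    # environment and volumes omitted for brevity, see the full example below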

Finally, perform the necessary steps to add the desired features to the environment, such as logging and CPU stats. The snippets below show how these features are configured:

# Triplestore database that acts as the single source of truth.
database:
  image: tenforce/virtuoso:1.3.0-virtuoso7.2.2
  environment:
    SPARQL_UPDATE: "true"
    DEFAULT_GRAPH: "http://mu.semte.ch/application"
  volumes:
    - ./data/db:/data
    - ./config/toLoad:/data/toLoad

# Logs CPU stats and docker container events.
swarm-logger:
  image: bde2020/mu-swarm-logger-service:latest
  links:
    - database:database
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock

# event-query, docker-watcher, har-transformation and elasticsearch
# are all required for HTTP logging.
event-query:
  image: bde2020/mu-event-query-service
  links:
    - database:database
  volumes:
    - ./containers:/usr/src/app/containers/

docker-watcher:
  image: bde2020/mu-docker-watcher-service
  volumes:
    - ./config/supervisord/supervisord.conf:/etc/supervisord.conf
    - ./containers:/app/containers
    - ./pcap:/app/pcap/
  network_mode: host
  environment:
    PCAP_READ_DIR: '/pcap'

har-transformation:
  image: bde2020/mu-har-transformation-service
  volumes:
    - ./pcap:/app/pcap
    - ./har:/app/har
    - ./containers:/app/containers
    - ./backups:/app/backups
  links:
    - elasticsearch:elasticsearch
  environment:
    BU_DIR: "/app/backups"

elasticsearch:
  image: elasticsearch:2.4.6
  command: elasticsearch -Des.network.host=0.0.0.0

spark-master:
  image: bde2020/spark-master:2.2.0-hadoop2.7
  container_name: spark-master
  ports:
    - "8080:8080"
    - "7077:7077"
  environment:
    VIRTUAL_HOST: "spark-master.big-data-europe.aksw.org"
    VIRTUAL_PORT: "8080"
    INIT_DAEMON_STEP: "setup_spark"
    constraint: "node==<yourmasternode>"
    LOG: "true" # Log container's docker events into the database.
    logging: "true" # Log container's HTTP traffic.


(... Spark Worker1, Spark Worker2, etc...)
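
As a sketch, a single worker service could look roughly as follows; host names and ports here are illustrative, and the way the worker is pointed at the master (the SPARK_MASTER variable below) is an assumption, so take the authoritative snippet from the image's README:

spark-worker-1:
  image: bde2020/spark-worker:2.2.0-hadoop2.7
  container_name: spark-worker-1
  environment:
    SPARK_MASTER: "spark://spark-master:7077" # assumption, check the image's README
    VIRTUAL_HOST: "spark-worker-1.big-data-europe.aksw.org"
    VIRTUAL_PORT: "8081"
    LOG: "true" # Log container's docker events into the database.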

That's it. You now have a Docker Compose pipeline that can run on the BDE platform!


Implementing Pilot on BDE Stack

  1. List the required images and dependencies
  2. Add support for the init daemon
  3. Build a pipeline flow
  4. Create a configuration for the Integrator UI
  5. Construct a docker-compose.yml file