This is an experimental Kylo deployment, not officially supported. Tested on ingesting userdata2.csv via standard ingest. Hadoop namenode, hive server and spark master are in separate container now. Nifi and Kylo are in separate containers now. Both kylo and nifi images are based on common "hadoopclient" image. Version 3.5 is updated and build from github sources on 20170713 (ver. 0.8.2. official release day)
docker volume rm kstack_nifilib
The work is based on Keven Wang's Kylo in Docker: https://github.com/keven4ever/kylo_docker
This project aims to dockerize Kylo deployment from source so that the adjacent services are in separate containers and Kylo container is built from NiFi container. As a goal, Kylo image docker build from sources should be matter of seconds rather than minutes or more...
Each layer should contain only the necessary minimums of settings needed to run with Kylo.
Containers schema:
Current status is as per slide above (separate containers for hadoop services and for NiFi)
- Change "vm.max_map_count" kernel varialble in the VM running docker daemon: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#docker-cli-run-prod-mode. So if you are using macbook with Docker for Mac installed (note, docker for mac https://docs.docker.com/docker-for-mac/install/ is different from previous generation of docker on mac which is Docker machine https://docs.docker.com/machine/), then you can follow steps below
# Launch a terminal in your macbook and start a screen session to connect to the VM which is the host of docker containers
screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
# Click "enter" once to get shell prompt, then type in shell command "login"
/ # login
# input login credential, root/<empty password>
moby login: root
# Input empty pwd by click enter
# Set the parameter
moby:~#sysctl -w vm.max_map_count=262144
# Then you can check if this parameter is set:
moby:~# sysctl vm.max_map_count
# You should see sth like below
vm.max_map_count = 262144
# At the end exit the screen session by type in Ctrl-A + D to exist the screen session, Hold Ctrl and A together then click D.
- Tune Docker instance
Increase memory dedicated for Docker (Preferences -> Advanced, currently 9G)
- Login to Docker Hub
docker login -u dockerhub_username -p dockerhub_passwd
-
- download docker-compose_3_5.yml from danmalczyk/kdoc GitHub repo
- in the directory where docker-compose_3_5.yml is, create shared mountpoint for Kylo container:
wget https://github.com/danmalczyk/kdoc/blob/master/docker-compose_3_5.yml
mkdir -p ./kylo-stack-mountpoints/kyloshare
-
- download Makefile from danmalczyk/kdoc GitHub repo and run a task to download all the images needed:
wget https://github.com/danmalczyk/kdoc/blob/master/Makefile
make fetch-stable
- OR pull all the Kylo Stack images manually
docker pull docker.elastic.co/elasticsearch/elasticsearch:5.4.1
docker pull rmohr/activemq:5.13.3
docker pull mariadb:10.0
docker pull dmalczyk/kstack-hadoopservice:3.5
docker pull dmalczyk/kstack-nifi:3.5
docker pull dmalczyk/kstack-kylo:3.5
- Init docker swarm
docker swarm init #first-time init, no need to reissue
- Deploy Kylo stack
docker stack deploy -c docker-compose_3_5.yml kstack
-
Open Kylo from browser at localhost:8400 ("docker ps" must show 6 running containers, Kylo takes up to 15mins to start)
-
For testing, use kylo template ./kstack-nifi/sample_data/data_ingest.zip , as the URLs must be changed according to the new hostnames, see below. Keep that in mind when creating/modifying further templates.
-
If anything goes wrong, remove stack, find the cause and start again from point 7.
docker stack ps --no-trunc kstack
docker stack rm kstack
remember to use service names instead of localhost when calling different services:
kylo - for Kylo UI, Kylo services and Kylo Spark shell
nifi - for NiFi
mariadb - for the DB
elasticsearch - for ElasticSearch
hadoopservice - for HDFS, Hive and Spark master.
docker swarm init
Builds a kylo image with the latest kylo src. Check kylo dev readme
cd kylo-dev
make
cd -
Don't forget to fetch and build the images before starting.
make start
# wait up to 15min and open http://localhost:8400
Runs a stack with the latest kylo src. Check kylo dev readme
make start-dev
# wait up to 15min and open http://localhost:8400
make stop
# alternatively docker stack rm kstack
- in some image build situations, security jwt key is not copied from kylo-ui to kylo-services, in that case use --no-cache when building kstack-kylo image
- docker images should be ran with unpriviledged user (kylo/nifi)
- add feed templates to the kylo prod image
- run spark in yarn-cluster mode
- docker-compose.yml:
- change/parametrize MYSQL_ROOT_PASSWORD
- externalize mariadb data directory volume? externalize kylo and nifi volumes with user data? (maybe elasticsearch too?)
- tune Elasticsearch
- tune hadoop cluster
- automate builds from Kylo GIT repo