Introduction - hadoop-docker

This is a release candidate of a Docker-based Hadoop setup that provides the following services:

  • namenode
  • datanode
  • nodemanager
  • resourcemanager
  • historyserver

The docker based setup can be used in two scenarios:

  • docker-compose: for debugging only. All containers run on the same host, but there is only one copy of the datanode service and hence no data replication, no multiprocessing, etc.
  • swarm: services are deployed on a swarm of Docker nodes. Each node runs at least one datanode and nodemanager, while one of the nodes runs the namenode and resourcemanager.

Building

To build the whole project (all Docker images) from scratch, perform the following steps:

  1. Go to the base folder and issue:
docker build [-t repo:tag] . 
cd ..

Tagging the image with the optional -t repo:tag lets you push the resulting image directly to a Docker image repository (such as Docker Hub).

  2. Build the rest of the images by issuing:
docker compose build 

This will build all images according to the definitions in docker-compose.yml.
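
The two build steps above can be combined into a small helper. The following is a minimal sketch and not part of the project: the function name build_all and the BASE_DIR variable are hypothetical, and it assumes the Dockerfile of the base image lives in a folder named base next to docker-compose.yml. Passing that folder as the build context is equivalent to changing into it and running docker build with "." as the context.

```shell
#!/bin/sh
# Sketch: build the base image first, then the remaining images.
# Assumptions (not from the project): the name build_all, the BASE_DIR
# variable, and that the base image's Dockerfile lives in ./base.

BASE_DIR="${BASE_DIR:-base}"

build_all() {
    tag="$1"   # optional repo:tag for the base image

    if [ -n "$tag" ]; then
        # Tagged build, so the image can be pushed to a registry later.
        docker build -t "$tag" "$BASE_DIR" || return 1
    else
        docker build "$BASE_DIR" || return 1
    fi

    # Build the remaining images defined in docker-compose.yml.
    docker compose build
}
```

Calling build_all myrepo/hadoop-base:latest (a hypothetical tag) from the project root would run both steps and stop early if the base image fails to build.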

Running the setup

  1. To run in compose mode (for debugging), issue:
docker compose up 

This launches one instance of each container on the local machine. (The datanode can be scaled using "docker-compose scale node=x", but all datanodes will then use the same storage volume, which makes scaling useless and error-prone.)

To shut down the setup, issue:

docker compose down 

  2. Running in swarm mode requires an existing swarm with at least 2-3 nodes. Once the container images have been downloaded or built, issue the following command sequence on a manager node to start the stack:
docker network create -d overlay hadoop
docker stack deploy -c docker-compose-v3-swarm.yml hadoop

# list the started services
docker stack services hadoop

# shut down the hadoop swarm setup
docker stack rm hadoop
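
The swarm start-up sequence above can be wrapped so that it is safe to run more than once. This is a hedged sketch rather than the project's own tooling: the function name start_hadoop_swarm is hypothetical, and it assumes the stack is deployed with docker stack deploy under the name hadoop, which is the name the services and rm commands above refer to. Because docker network create fails when the network already exists, the sketch checks for it with docker network inspect first.

```shell
#!/bin/sh
# Sketch: start the Hadoop stack on a swarm manager node.
# Assumptions (not from the project): the function name, and that the
# stack and the overlay network are both called "hadoop".

start_hadoop_swarm() {
    network="hadoop"
    stack="hadoop"
    compose_file="${1:-docker-compose-v3-swarm.yml}"

    # Create the overlay network only if it does not exist yet.
    if ! docker network inspect "$network" >/dev/null 2>&1; then
        docker network create -d overlay "$network" || return 1
    fi

    # Deploy (or update) the stack from the swarm compose file.
    docker stack deploy -c "$compose_file" "$stack" || return 1

    # List the started services.
    docker stack services "$stack"
}
```

Run start_hadoop_swarm on a manager node; docker stack rm hadoop still shuts the stack down as described above.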

This work was inspired by the https://github.com/big-data-europe/docker-hadoop project.