Build the image

This image contains ONBUILD instructions, so you should build your own image on top of it.

The ONBUILD instructions copy the contents of the .ssh directory, relative to the current directory, to /root/.ssh in the image. You can use your own SSH RSA keys or generate a new pair with the following snippet.

mkdir .ssh
ssh-keygen -P '' -f .ssh/id_rsa
cat .ssh/ > .ssh/authorized_keys
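As a rough sketch of the build step: a derived image needs little more than a FROM line, because the ONBUILD instructions run when the derived image is built. The base image name below (your-namespace/hadoop-base) is a placeholder and not the real name of this image, and write_dockerfile is a hypothetical helper. The tag hadoop is chosen only to match the image name used in the docker run command in the next section.

```shell
#!/bin/sh
# Hypothetical helper: writes a one-line Dockerfile into the given build directory.
# The base image argument is a placeholder; substitute the actual name of this image.
write_dockerfile() {
  base_image=$1
  build_dir=$2
  mkdir -p "$build_dir"
  printf 'FROM %s\n' "$base_image" > "$build_dir/Dockerfile"
}

# Example usage (run from the directory that contains .ssh):
#   write_dockerfile your-namespace/hadoop-base .
#   docker build -t hadoop .
```

Keeping the Dockerfile next to the .ssh directory means the ONBUILD copy step finds the keys in the build context.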

Create the container

docker run -d \
  -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 \
  -p 50100-50200:50100-50200 \
  -p 8032:8032 -p 8042:8042 -p 8088:8088 -p 9000:9000 -p 19888:19888 \
  --volume /your/hadoop/root:/data \
  --name hadoop --hostname hadoop \
  hadoop --format-namenode

If you map /data to a directory on your Docker host, changes made to this Hadoop instance persist after the container is removed. --format-namenode runs hdfs namenode -format, which erases any existing HDFS metadata. To keep the persisted data, do not pass it when you run the container a second time.
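One way to avoid reformatting by accident is to pass the flag only when the host directory has no data yet. The following is a minimal sketch under the assumption that an empty or missing host directory means the NameNode has never been formatted; format_flag is a hypothetical helper and is not part of this image.

```shell
#!/bin/sh
# Hypothetical helper: prints --format-namenode only when the host data
# directory is missing or empty (assumed here to mean "never formatted").
format_flag() {
  data_dir=$1
  if [ -d "$data_dir" ] && [ -n "$(ls -A "$data_dir" 2>/dev/null)" ]; then
    return 0  # data already present: print nothing
  fi
  printf '%s\n' '--format-namenode'
}

# Example usage (port options omitted for brevity, same as in the command above):
#   DATA_DIR=/your/hadoop/root
#   docker run -d --volume "$DATA_DIR":/data --name hadoop --hostname hadoop \
#     hadoop $(format_flag "$DATA_DIR")
```

The command substitution is left unquoted on purpose so that an empty result adds no argument at all.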

Change /user directory permission

hdfs dfs -chmod 1773 /user
hdfs dfs -chmod 1773 /user/history

Configure the client



Replace hadoop-host with the actual host name of your Docker host.



Configure the hosts file on your OS so that you can reach the Hadoop host by its host name instead of its IP address. This is necessary if your Docker engine runs in a virtual machine, e.g. Docker Toolbox on Windows/Mac.
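On Linux or macOS, a hosts entry could be added with a sketch like the one below. The IP address 192.168.99.100 is only a placeholder for the address of your Docker host, add_host_entry is a hypothetical helper, and the host name hadoop is assumed to match the --hostname used when the container was created. On Windows the hosts file lives elsewhere and is usually edited by hand.

```shell
#!/bin/sh
# Hypothetical helper: appends "IP<TAB>NAME" to a hosts file unless some
# line already lists NAME as a host name (any field after the first).
add_host_entry() {
  hosts_file=$1
  ip=$2
  name=$3
  if awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$hosts_file" 2>/dev/null; then
    return 0  # entry already present: nothing to do
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}

# Example usage (needs root privileges to modify /etc/hosts):
#   sudo sh -c '. ./add_host_entry.sh; add_host_entry /etc/hosts 192.168.99.100 hadoop'
```

Checking for an existing entry first keeps repeated runs from piling up duplicate lines.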




Verify that the Hadoop instance is working

  1. Create a directory for your user.
hdfs dfs -mkdir /user/hadoop-user
  2. Upload some text files as input.
hdfs dfs -put /path/to/text/ input
  3. Run a MapReduce job as a client.
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs.+'
  4. Get the output.
hdfs dfs -get output output
ls output
  5. Check the job log from YARN.
yarn logs -applicationId application_id_number
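The application ID passed to yarn logs has the form application_<cluster timestamp>_<sequence number>. Below is a small sketch for pulling it out of saved job output, assuming the client log mentions the ID somewhere. The helper name extract_app_id and the file job.log are hypothetical, and grep -o is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin and prints the first YARN
# application ID it finds (pattern: application_<digits>_<digits>).
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Example usage:
#   hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
#     grep input output 'dfs.+' 2>&1 | tee job.log
#   yarn logs -applicationId "$(extract_app_id < job.log)"
```

If no ID is found the helper prints nothing, so it is worth checking for an empty result before calling yarn logs.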