davidshen84/docker-hadoop

Build the image

This image contains ONBUILD instructions, so you should build your own image based on it.

The ONBUILD instructions copy the contents of the .ssh directory, relative to the build context, to /root/.ssh in the image. You can use your own SSH RSA keys, or generate a pair with the following snippet.

mkdir .ssh
# generate a passphrase-less RSA key pair for the container's root user
ssh-keygen -P '' -f .ssh/id_rsa
# authorize that key for SSH logins into the container
cat .ssh/id_rsa.pub > .ssh/authorized_keys
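
For example, a minimal downstream Dockerfile might look like the sketch below. The base image name davidshen84/docker-hadoop is assumed from this repository, and the build must run from the directory containing .ssh so the ONBUILD COPY can find it.

FROM davidshen84/docker-hadoop
# The ONBUILD instructions in the base image copy ./.ssh to /root/.ssh here.
# Add your own customizations below if needed.

Build it with the image name used in the next section:

docker build -t hadoop .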

Create the container

docker run -d \
  -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 \
  -p 50100-50200:50100-50200 \
  -p 8032:8032 -p 8042:8042 -p 8088:8088 -p 9000:9000 -p 19888:19888 \
  --volume /your/hadoop/root:/data \
  --name hadoop --hostname hadoop \
  hadoop --format-namenode

If you map /data to a directory on your Docker host, the changes you make to this Hadoop instance are persisted. The --format-namenode flag runs hdfs namenode -format. If you want persistent storage, do not pass it when you run the container a second time, because formatting would wipe the existing HDFS data.
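
For example, after the first run has formatted the namenode, a later run against the same data directory simply drops the flag; a sketch reusing the options from above:

# remove the old container; the data survives in /your/hadoop/root
docker rm -f hadoop
# run again WITHOUT --format-namenode so the existing HDFS data is kept
docker run -d \
  -p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 \
  -p 50100-50200:50100-50200 \
  -p 8032:8032 -p 8042:8042 -p 8088:8088 -p 9000:9000 -p 19888:19888 \
  --volume /your/hadoop/root:/data \
  --name hadoop --hostname hadoop \
  hadoop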

Change the /user directory permissions

hdfs dfs -chmod 1773 /user
hdfs dfs -chmod 1773 /user/history
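
These commands are presumably run inside the container, since they need HDFS superuser rights; a sketch using docker exec with the container name hadoop from above:

docker exec hadoop hdfs dfs -chmod 1773 /user
docker exec hadoop hdfs dfs -chmod 1773 /user/history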

Configure the client

core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-host:9000</value>
</property>

Replace hadoop-host with the real host name of your Docker host.

hdfs-site.xml

<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>

Configure the hosts file on your OS so you can reach the Hadoop host by its host name instead of its IP. This is necessary if your Docker engine runs in a virtual machine, e.g. Docker Toolbox on Windows or macOS.
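
For example, on Linux or macOS this is an entry in /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts); the IP below is an assumption (the default Docker Toolbox VM address) and should be replaced with your Docker host's real address:

192.168.99.100  hadoop-host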

yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-host</value>
</property>

<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/user/logs</value>
</property>
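
With the ports mapped as in the docker run command above, you can sanity-check that the ResourceManager is reachable through its REST API; a minimal sketch, assuming hadoop-host resolves as configured:

curl http://hadoop-host:8088/ws/v1/cluster/info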

Verify that the Hadoop instance is working

1. Create a directory for your user.

   hdfs dfs -mkdir /user/hadoop-user

2. Upload some text files as input.

   hdfs dfs -put /path/to/text/ input

3. Run a map/reduce job as a client.

   hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs.+'

4. Get the output.

   hdfs dfs -get output output
   ls output

5. Check the job log from YARN (see below for how to find the application ID).

   yarn logs -applicationId application_id_number
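
If you don't have the application ID at hand, list recent applications first; for example:

yarn application -list -appStates ALL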

About

A Pseudo-Distributed Hadoop image.
