This image contains ONBUILD instructions, so you should build your own image based on it. The ONBUILD instructions copy the contents of the .ssh directory, relative to the current location, into /root/.ssh in the image. You can use your own SSH RSA keys, or generate a pair with the following snippet.
mkdir .ssh
ssh-keygen -P '' -f .ssh/id_rsa
cat .ssh/id_rsa.pub > .ssh/authorized_keys
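With the keys in place, a derived image can be built from a minimal Dockerfile. The base image tag `hadoop` below is an assumption; use whatever tag you built or pulled this image under.

```dockerfile
# Dockerfile for your derived image. The ONBUILD instructions in the
# base image fire at this point and copy ./.ssh into /root/.ssh,
# so no extra COPY step is needed here.
FROM hadoop
```

Build it in the directory that contains .ssh, e.g. `docker build -t my-hadoop .` (`my-hadoop` is a placeholder tag).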
docker run -d \
-p 50010:50010 -p 50020:50020 -p 50070:50070 -p 50075:50075 \
-p 50100-50200:50100-50200 \
-p 8032:8032 -p 8042:8042 -p 8088:8088 -p 9000:9000 -p 19888:19888 \
--volume /your/hadoop/root:/data \
--name hadoop --hostname hadoop \
hadoop --format-namenode
If you map /data
to a directory on your Docker host, the changes you make to
this Hadoop instance are persisted. --format-namenode
runs
hdfs namenode -format
. If you want persistent storage, do not pass this flag when you
run the container a second time, or the NameNode will be re-formatted and the existing data lost.
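For instance, to reuse the formatted data after the first run (a sketch following the docker run example above; some port mappings are omitted for brevity):

```shell
# Remove the old container; the data survives in /your/hadoop/root
# because it lives on the host side of the volume mapping.
docker rm -f hadoop

# Start a fresh container WITHOUT --format-namenode so the existing
# NameNode metadata under /data is reused instead of wiped.
docker run -d \
  -p 50070:50070 -p 8088:8088 -p 9000:9000 \
  --volume /your/hadoop/root:/data \
  --name hadoop --hostname hadoop \
  hadoop
```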
Set permissive, sticky-bit modes on the shared HDFS directories (mode 1773: full access for owner and group, write and execute for others, with the sticky bit so users can create files but only delete their own):
hdfs dfs -chmod 1773 /user
hdfs dfs -chmod 1773 /user/history
On the client side, point fs.defaultFS at the container in core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop-host:9000</value>
</property>
Replace hadoop-host
with the real host name of your Hadoop container (the --hostname value passed to docker run, hadoop in the example above).
To make HDFS clients connect to DataNodes by host name rather than by their container-internal IP addresses, set this in hdfs-site.xml:
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
Configure the hosts file on your OS so that you can reach the Hadoop host by its host name instead of its IP address. This is necessary if your Docker engine runs inside a virtual machine, e.g. with Docker Toolbox on Windows/Mac.
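For example, assuming the Docker VM is reachable at 192.168.99.100 (a placeholder; under Docker Toolbox, `docker-machine ip` prints the real address), add a line mapping the container's host name (hadoop in the docker run example) to /etc/hosts, or C:\Windows\System32\drivers\etc\hosts on Windows:

```
192.168.99.100  hadoop
```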
In yarn-site.xml, point YARN clients at the ResourceManager and set the remote application log directory (used later by yarn logs):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop-host</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/user/logs</value>
</property>
- Create a directory for your user.
hdfs dfs -mkdir /user/hadoop-user
- Upload some text files as input
hdfs dfs -put /path/to/text/ input
- Run a map/reduce job as a client
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs.+'
- Get the output
hdfs dfs -get output output
ls output
- Check the job logs from YARN (the application ID is printed in the job's console output)
yarn logs -applicationId application_id_number
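A sketch of inspecting the results, assuming the grep job above finished (requires the client configuration described earlier):

```shell
# The grep example writes one part file per reducer; each line is
# "<count><TAB><matched string>".
cat output/part-r-00000

# Recover an application ID if you lost the job's console output.
yarn application -list -appStates FINISHED
```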