Skip to content

Latest commit

 

History

History
83 lines (51 loc) · 3.24 KB

README-tez.md

File metadata and controls

83 lines (51 loc) · 3.24 KB

This cluster uses hadoop 2.4.1 and tez 0.5.0.

To bring up the cluster and install tez do this:

boot the cluster:

vagrant up

start hadoop (history server is a work in progress):

vagrant ssh master sudo prepare-cluster.sh

sudo start-dfs.sh --config $HADOOP_CONF_DIR
sudo start-yarn.sh --config $HADOOP_CONF_DIR

sudo hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done
sudo hadoop fs -chmod -R 777 /tmp/
sudo hadoop fs -mkdir -p /user/
sudo hadoop fs -chmod -R 777 /user/
sudo hadoop fs -mkdir -p /apps/
sudo hadoop fs -chmod -R 777 /apps/

sudo mr-jobhistory-daemon.sh  --config $HADOOP_CONF_DIR start historyserver
sudo yarn-daemon.sh --config $HADOOP_CONF_DIR start historyserver

compile and deploy tez:

tez deploy

export HADOOP_CLASSPATH=/vagrant/tez/tez-dist/target/tez-0.5.3/*:/vagrant/tez/tez-dist/target/tez-0.5.3/lib/*:$HADOOP_CLASSPATH

tez is a script in /opt/tools/bin which will checkout tez 0.5.0 from git into /vagrant/tez, compile it with maven (installed in /opt/tools/apache-maven-3.2.3/bin/mvn) and copy it onto HDFS in /apps/tez-0.5.0. Note that /vagrant is the checkout directory of the cluster, so you can also compile tez on the host machine and simply run tez upload on the cluster, if that is faster for you.

Now you should be read to go.

The cluster is already configured to use tez. You can cofirm that it works like so:

hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.4.1-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1

Web Interfaces:

stop/restart hadoop:

sudo mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
sudo yarn-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
sudo stop-yarn.sh --config $HADOOP_CONF_DIR
sudo stop-dfs.sh --config $HADOOP_CONF_DIR
for host in master hadoop1 hadoop2 hadoop3; do vagrant ssh $host --command  'sudo rm -rf /srv/hadoop' ; done
vagrant provision

Notes:

This deployment is by default reliant on VMWare, VirtualBox settings are still retained, just comment out the relevant bits.

VM slave memory settings are 2G.

VMWare Fusion wants you to edit the /etc/hosts file and add the .local machines its hosting (sudo dscacheutil -flushcache)

Current vms have no password for root.

To get access to the application logs (from sftp as user vagrant), after a run

for host in master hadoop1 hadoop2 hadoop3; do vagrant ssh $host --command  'sudo chmod -R 777 /tmp/hadoop-root/ /opt/hadoop-2.4.1/logs/' ; done

Or to have them copied to the local shared folder

for host in master hadoop1 hadoop2 hadoop3; do vagrant ssh $host --command  "sudo mkdir -p /vagrant/machines/$host/; sudo cp -Ru /tmp/hadoop-root/ /opt/hadoop-2.4.1/logs/ /vagrant/machines/$host/" ; done

Or on master, call (log aggregation is currently enabled by default)

yarn logs -applicationId <application ID>

To grab the stack traces for all running TezChild vms

for host in master hadoop1 hadoop2 hadoop3; do vagrant ssh $host --command  "sudo mkdir -p /vagrant/stacks/$host; sudo jps | grep TezChild | cut -d\" \" -f1 | xargs -n 1 -I @ sh -c \"sudo jstack @ > /vagrant/stacks/$host/@.txt\" " ; done