I. Building BFS
Note that references to svn on declarativity.org need to be changed to git in this repo!
1. Fetch and build Stasis, the transactional storage library used by JOL.
    svn co http://stasis.googlecode.com/svn/trunk stasis
    cd stasis
    mkdir build
    cd build
    cmake ..
    make
    cd ../..
2. Fetch and build JOL, the Java Overlog Library.
    svn co https://svn.declarativity.net/lincoln/java/trunk jol
    export STASIS_DIR=[location of stasis directory above]
    export JAVA_DIR=[java home]
    cd jol
    make
    ant
    cd ..
3. Fetch and build BFS, the BOOM distributed filesystem.
    svn co https://svn.declarativity.net/bfs/bfs_eurosys bfs
    cd bfs
    export JOL_DIR=../jol
    cp $JOL_DIR/ant-dist/jol.jar lib
    export CLASSPATH=$CLASSPATH:lib/jol.jar
    make test-stasis
    cd ..
The make test-stasis command builds BFS and then runs a series of unit tests. Some of these take a long time to run.
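The builds above depend on STASIS_DIR, JAVA_DIR and JOL_DIR being set in the current shell. As a minimal sketch (the missing_vars helper is hypothetical, not part of BFS or JOL), a check like this can catch a forgotten export before a long build starts:

```shell
# Hypothetical helper: print the name of every listed variable that is
# unset or empty. Returns non-zero if at least one is missing.
missing_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup via eval keeps this compatible with plain sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Example use before running the build steps:
# missing_vars STASIS_DIR JAVA_DIR JOL_DIR || echo "set the variables listed above first"
```

The helper only checks that the variables are non-empty; it does not verify that they point at valid directories.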
II. Setting up Hadoop to run with BFS.
In principle, BFS can (with the appropriate bindings) replace HDFS for any version of Hadoop. In practice, we have it working with version 0.19.1 because this is the version against which BOOM-MR was built. A typical user will want to use BFS with BOOM-MR, so the instructions below begin by downloading this Hadoop distribution.
1. Get the BOOM-MR version of Hadoop
    svn co https://svn.declarativity.net/hadoop-0.19.1-bfs boom-mr
    cd boom-mr
    export CLASSPATH=$CLASSPATH:../bfs/dist/bfs.jar
    ant
    ant examples
2. Edit conf/hadoop-env.sh
Make sure that JAVA_HOME is uncommented and set correctly, and that HADOOP_CLASSPATH includes the jars for JOL and BFS. I have:
    # The java implementation to use.  Required.
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export JAVA=`which java`

    # Extra Java CLASSPATH elements.  Optional.
    export HADOOP_CLASSPATH=/root/tests/bfs/dist/bfs.jar:/root/tests/jol/ant-dist/jol.jar
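The jar paths above are specific to my machine. A rough sketch of building the same colon-separated value from your own jar locations follows; join_classpath is a hypothetical helper, and BFS_JAR and JOL_JAR in the usage comment are assumed variables that you would set yourself:

```shell
# Hypothetical helper: join the given paths with ':' to form a classpath.
# Warns on stderr about any path that does not exist, but still includes it.
join_classpath() {
    result=""
    for jar in "$@"; do
        [ -e "$jar" ] || echo "warning: $jar not found" >&2
        if [ -z "$result" ]; then
            result="$jar"
        else
            result="$result:$jar"
        fi
    done
    echo "$result"
}

# Example use (BFS_JAR and JOL_JAR are assumed to hold your jar locations):
# export HADOOP_CLASSPATH=$(join_classpath "$BFS_JAR" "$JOL_JAR")
```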
3. Edit conf/hadoop-site.xml
Add any appropriate local configuration here, making sure to have entries like those below.
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
To use BFS in place of HDFS, replace the first entry above with:
    <property>
      <name>fs.default.name</name>
      <value>bfs://localhost:9000</value>
    </property>
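If you prefer not to edit the file by hand, the switch from hdfs:// to bfs:// can be sketched as a sed filter. This is only an illustration: use_bfs_scheme is a hypothetical helper, and it assumes fs.default.name is the only property whose value starts with hdfs://, since it rewrites every such value it sees.

```shell
# Hypothetical helper: read a site configuration on stdin and rewrite any
# <value> that starts with hdfs:// so that it starts with bfs:// instead.
use_bfs_scheme() {
    sed 's|<value>hdfs://|<value>bfs://|'
}

# Example use (SITE_CONF is an assumed variable naming the file edited in this step):
# use_bfs_scheme < "$SITE_CONF" > "$SITE_CONF.new" && mv "$SITE_CONF.new" "$SITE_CONF"
```

Values without the hdfs:// prefix, such as the mapred.job.tracker entry, pass through unchanged.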
4. Edit conf/masters and conf/slaves
Set these as appropriate to represent the cluster, but at the very least change 'localhost' to your hostname or IP address.
5. FIX ME. Edit bin/hadoop-daemon.sh
See these lines?
    export JOL_DIR=/root/jol
    export STASIS_DIR=/root/stasis
    export JAVA_DIR=/usr/lib/jvm/java-6-sun
    export LD_LIBRARY_PATH=/root/stasis/build/src/stasis
Get rid of them!
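One cautious way to do this, sketched below, is to comment the lines out instead of deleting them, so the change is easy to undo. Note that this is a variation on the instruction above, not the original procedure, and disable_hardcoded_exports is a hypothetical helper. It matches any export of these four variables, whatever the value.

```shell
# Hypothetical helper: read a script on stdin and comment out every line
# that exports JOL_DIR, STASIS_DIR, JAVA_DIR or LD_LIBRARY_PATH.
disable_hardcoded_exports() {
    sed -E 's/^([[:space:]]*export (JOL_DIR|STASIS_DIR|JAVA_DIR|LD_LIBRARY_PATH)=)/# \1/'
}

# Example use (DAEMON_SCRIPT is an assumed variable naming the script edited in this step):
# disable_hardcoded_exports < "$DAEMON_SCRIPT" > "$DAEMON_SCRIPT.new" && mv "$DAEMON_SCRIPT.new" "$DAEMON_SCRIPT"
```

Writing to a new file and renaming it drops the executable bit, so restore it with chmod +x afterwards.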
Now you should be able to run BOOM-MR on top of BFS.
    start-bfs-all.sh
    hadoop fs -put /usr/share/dict/words /words
    hadoop jar build/hadoop-*-examples.jar wordcount /words /words.out
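As a rough sanity check on the result, the counts in the wordcount output should add up to the number of whitespace-separated words in the input. The sketch below assumes the usual wordcount output format of one word, a tab, and a count per line, and that the output files are named part-*; sum_counts is a hypothetical helper.

```shell
# Hypothetical helper: sum the count column of wordcount output
# (word<TAB>count per line) read from stdin.
sum_counts() {
    awk -F '\t' '{ total += $2 } END { print total + 0 }'
}

# Example use once the job has finished (output file naming is an assumption):
# hadoop fs -cat '/words.out/part-*' | sum_counts
# wc -w < /usr/share/dict/words
```

If the two numbers differ widely, the job probably did not read the whole input.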