BFS Instructions

Joe Hellerstein edited this page Dec 8, 2015 · 2 revisions

I. Building BFS

Note that references to svn on declarativity.org need to be changed to git in this repo!

1. Fetch and build Stasis, the transactional storage library used by JOL.

svn co http://stasis.googlecode.com/svn/trunk stasis
cd stasis
mkdir build
cd build
cmake ..
make
cd ../..

2. Fetch and build JOL, the Java Overlog Library.

svn co https://svn.declarativity.net/lincoln/java/trunk jol
export STASIS_DIR=[location of stasis directory above]
export JAVA_DIR=[java home]
cd jol
make
ant
cd ..

3. Fetch and build BFS, the BOOM distributed filesystem.

svn co https://svn.declarativity.net/bfs/bfs_eurosys bfs
cd bfs
export JOL_DIR=../jol
cp $JOL_DIR/ant-dist/jol.jar lib
export CLASSPATH=$CLASSPATH:lib/jol.jar

make test-stasis
cd ..

The make test-stasis command builds BFS and then runs a series of unit tests. Some of these take a long time to run.
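The builds above depend on STASIS_DIR, JAVA_DIR, and JOL_DIR pointing at real directories, and a typo in any of them tends to surface only as a confusing build failure. The following is a minimal sketch of a pre-flight check; the function name check_build_env is hypothetical and not part of BFS or JOL. Because JOL_DIR is set to a relative path above, the check is meant to be run from the bfs directory.

```shell
# check_build_env (hypothetical helper): report any build variable that is
# unset or does not point at an existing directory.
# Returns non-zero if at least one variable fails the check.
check_build_env() {
    status=0
    for name in STASIS_DIR JAVA_DIR JOL_DIR; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

Running check_build_env before make test-stasis prints one line per problem to standard error and leaves the environment untouched.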

II. Setting up Hadoop to run with BFS

In principle, BFS can (with the appropriate bindings) replace HDFS for any version of Hadoop. In practice, we have it working with version 0.19.1, because this is the version against which BOOM-MR was built. A typical user will want to use BFS with BOOM-MR, so the instructions below begin by downloading this Hadoop distribution.

1. Get the BOOM-MR version of Hadoop

svn co https://svn.declarativity.net/hadoop-0.19.1-bfs boom-mr
cd boom-mr
export CLASSPATH=$CLASSPATH:../bfs/dist/bfs.jar
ant
ant examples

2. Edit conf/hadoop-env.sh

Make sure that JAVA_HOME is uncommented and set correctly, and that HADOOP_CLASSPATH includes the jars for JOL and BFS. I have:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export JAVA=`which java`


# Extra Java CLASSPATH elements.  Optional.
export HADOOP_CLASSPATH=/root/tests/bfs/dist/bfs.jar:/root/tests/jol/ant-dist/jol.jar
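A wrong jar path in HADOOP_CLASSPATH usually shows up only later, as a class-loading error when the daemons start. As a rough sketch, the hypothetical helper below (check_classpath is not part of Hadoop) walks a colon-separated classpath and reports entries that do not exist on disk. It assumes the entries contain no whitespace or glob characters.

```shell
# check_classpath (hypothetical helper): print each entry of a
# colon-separated classpath that does not exist on disk.
# Returns non-zero if any entry is missing.
check_classpath() {
    missing=0
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the shell splits the argument on ':'.
    for entry in $1; do
        if [ -n "$entry" ] && [ ! -e "$entry" ]; then
            echo "missing: $entry" >&2
            missing=1
        fi
    done
    IFS=$old_ifs
    return $missing
}
```

After sourcing the edited environment file, check_classpath "$HADOOP_CLASSPATH" should return success if both bfs.jar and jol.jar are where the setting says they are.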

3. Edit conf/hadoop-site.xml

Add any appropriate local configuration here, making sure to have entries like those below.

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

To use BFS in place of HDFS, replace the first entry above with:

  <property>
    <name>fs.default.name</name>
    <value>bfs://localhost:9000</value>
  </property>
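The switch from HDFS to BFS amounts to changing the URI scheme in one value. A minimal sketch of doing that from the command line follows; use_bfs is a hypothetical helper, and it assumes fs.default.name is the only entry in the file whose value starts with hdfs://. It keeps a .bak copy so the change can be undone.

```shell
# use_bfs (hypothetical helper): rewrite hdfs:// to bfs:// at the start of
# any <value> element in the given site file, keeping a backup alongside it.
use_bfs() {
    site_file=$1
    cp -p "$site_file" "$site_file.bak" || return 1
    # Read from the backup and write over the original in place.
    sed 's|<value>hdfs://|<value>bfs://|' "$site_file.bak" > "$site_file"
}
```

To go back to HDFS, copy the .bak file over the edited one.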

4. Edit conf/masters and conf/slaves

Set these as appropriate to represent the cluster, but at the very least change 'localhost' to your hostname or IP address.

5. FIX ME. Edit bin/hadoop-daemon.sh

See these lines?

        export JOL_DIR=/root/jol
        export STASIS_DIR=/root/stasis
        export JAVA_DIR=/usr/lib/jvm/java-6-sun
        export LD_LIBRARY_PATH=/root/stasis/build/src/stasis

Get rid of them!
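If you would rather not edit the script by hand, the removal can be sketched as a filter. The helper name strip_hardcoded_exports is hypothetical, and the pattern assumes the four lines appear exactly as export statements for the variables shown above. The original is kept as a .bak file, and the script is overwritten in place so that its executable bit is preserved.

```shell
# strip_hardcoded_exports (hypothetical helper): drop the export lines for
# JOL_DIR, STASIS_DIR, JAVA_DIR and LD_LIBRARY_PATH from a script.
strip_hardcoded_exports() {
    script=$1
    cp -p "$script" "$script.bak" || return 1
    # grep -v exits with 1 when it outputs nothing; treat that as success
    # and only fail on a real error (exit status 2).
    grep -v -E '^[[:space:]]*export (JOL_DIR|STASIS_DIR|JAVA_DIR|LD_LIBRARY_PATH)=' \
        "$script.bak" > "$script" || [ $? -eq 1 ]
}
```

Other export lines in the script are left alone, since the pattern matches only the four variable names listed.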

Now you should be able to run BOOM-MR on top of BFS.

start-bfs-all.sh
hadoop fs -put /usr/share/dict/words /words
hadoop jar build/hadoop-*-examples.jar wordcount /words /words.out
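One way to sanity-check the job is to compute the same counts locally and compare. The sketch below is only an approximation of the wordcount example: it assumes words are split on whitespace and that each output line is a word, a tab, and a count, sorted in byte order. The function name local_wordcount is hypothetical.

```shell
# local_wordcount (hypothetical helper): split a file on whitespace and
# print "word<TAB>count" lines, sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort |
        LC_ALL=C uniq -c |
        awk '{ print $2 "\t" $1 }'
}
```

Assuming the job writes its results to part-* files under /words.out, as stock Hadoop does, saving the output of hadoop fs -cat '/words.out/part-*' to a file and diffing it against local_wordcount /usr/share/dict/words should show no differences for a single-reducer run.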