Building Shark from Source Code

This guide describes the steps needed to compile and run Shark from scratch.

(1) Getting Java + Scala

The only prerequisite for this guide is that you have Java version 6 or 7 and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, you can download it by running:

$ wget http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
$ tar xvfz scala-2.9.3.tgz
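
Before continuing, it can help to confirm that the installed Java is one of the supported versions. The sketch below is only an illustration: is_supported_java and installed_java_version are hypothetical helper names, not part of Shark, and the parsing assumes that `java -version` prints a line of the form java version "1.7.0_80".

```shell
# Hypothetical helper: succeeds only for Java 6 or 7 version strings
# such as "1.6.0_45" or "1.7.0_80".
is_supported_java() {
  case "$1" in
    1.6.*|1.7.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: extracts the quoted version string from
# `java -version`, which writes its output to stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

With both functions defined in the current shell, a quick check might look like this; the second command asks the unpacked Scala distribution to report its own version:

$ is_supported_java "$(installed_java_version)" && echo "Java OK"
$ scala-2.9.3/bin/scala -version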

(2) Getting the Patched Hive

Then download our patched version of Hive and untar it:

$ wget http://spark-project.org/download/hive-0.9.0-bin.tar.gz
$ tar xvfz hive-0.9.0-bin.tar.gz

(3) Compiling Spark + Shark

Clone the branch-0.7 branch of Spark from GitHub, then compile Spark and publish it to your local Ivy repository:

$ git clone https://github.com/mesos/spark.git -b branch-0.7 spark-0.7
$ cd spark-0.7
$ sbt/sbt publish-local
$ cd ..

Clone the branch-0.7 branch of Shark from GitHub:

$ git clone https://github.com/amplab/shark.git -b branch-0.7 shark-0.7

Copy conf/shark-env.sh.template to conf/shark-env.sh inside shark-0.7, then edit the copy so that SCALA_HOME and HIVE_HOME point to the right locations:

$ cd shark-0.7
$ cp conf/shark-env.sh.template conf/shark-env.sh
$ vim conf/shark-env.sh
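
As a rough illustration, the two settings in conf/shark-env.sh might end up looking like the lines below. Both paths are assumptions: they presume the Scala and Hive archives were unpacked into your home directory and that the Hive archive extracts to a directory named hive-0.9.0-bin. Adjust them to match your machine.

```shell
# Example values only: point these at the directories where you
# actually unpacked Scala and the patched Hive.
export SCALA_HOME="$HOME/scala-2.9.3"
export HIVE_HOME="$HOME/hive-0.9.0-bin"
```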

Hive requires that /tmp and /user/hive/warehouse/src exist on your computer. Create them if they don't already exist.
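
One possible way to create the directories is sketched below. The function name ensure_dirs is hypothetical, and creating a top-level directory such as /user normally requires root privileges, so you may need to run the equivalent mkdir -p command with sudo.

```shell
# Hypothetical helper: create every directory passed as an argument
# (including missing parents) unless it already exists.
ensure_dirs() {
  for dir in "$@"; do
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
  done
}
```

With the function defined in the current shell, the call for the two paths above would be:

$ ensure_dirs /tmp /user/hive/warehouse/src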

Compile Shark:

$ sbt/sbt package

The build system uses Maven/Ivy to fetch its dependencies. If this is the first time you are building the project, it can take a while to download all the dependencies. Subsequent builds, however, will be much faster.

(4) Running Shark

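Once Shark is built, you can start the Shark CLI. The path below is relative to the directory that contains shark-0.7; if you are still inside shark-0.7, run bin/shark-withinfo instead: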

$ shark-0.7/bin/shark-withinfo

bin/shark-withinfo is useful for development, since it outputs logging information to the Shark console. bin/shark provides a less verbose version of the CLI.

To verify that Shark is running, you can try the following example, which creates a table with sample data:

shark> CREATE TABLE src(key INT, value STRING);
shark> LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
shark> SELECT COUNT(1) FROM src;
shark> CREATE TABLE src_cached AS SELECT * FROM src;
shark> SELECT COUNT(1) FROM src_cached;
