Building Shark from Source Code

Harvey Feng edited this page Sep 3, 2013 · 9 revisions

This guide walks through the steps needed to compile and run Shark from source.

(1) Getting Java + Scala

The only prerequisite for this guide is Java 6 or 7 and Scala 2.9.3 installed on your machine. If you don't have Scala 2.9.3, download and unpack it by running:

$ wget
$ tar xvfz scala-2.9.3.tgz
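
Before continuing, it can help to confirm that both tools are on your PATH. The sketch below is one way to do that; `check_tool` is a hypothetical helper name, not part of Shark, Scala, or Java.

```shell
# Report whether a command is available on the PATH.
# Returns non-zero when the command cannot be found.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# "|| true" keeps the script going so both tools are always reported.
check_tool java || true
check_tool scala || true
```

If both are found, `java -version` and `scala -version` print the installed versions, which you can compare against the versions this guide expects.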

(2) Getting the Patched Hive

Then download our patched version of Hive and untar it:

$ wget
$ tar xvfz hive-0.9.0-bin.tar.gz

(3) Compiling Spark + Shark

Clone the branch-0.7 branch of Spark from GitHub, then compile Spark and publish it to your local repository:

$ git clone -b branch-0.7 spark-0.7
$ cd spark-0.7
$ sbt/sbt publish-local
$ cd ..

Clone the branch-0.7 branch of Shark from GitHub:

$ git clone -b branch-0.7 shark-0.7

Edit the shark-0.7/conf/ file and change SCALA_HOME and HIVE_HOME to point to the right locations:

$ cd shark-0.7
$ cp conf/ conf/
$ vim conf/
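
As an illustration, the two settings might look like the sketch below. The paths are assumptions: they suppose that both archives from steps (1) and (2) were unpacked in your home directory and that the Hive archive unpacks to a directory named hive-0.9.0-bin. Adjust them to wherever the directories actually live.

```shell
# Assumed layout: both archives were unpacked in the home directory.
export SCALA_HOME="$HOME/scala-2.9.3"
export HIVE_HOME="$HOME/hive-0.9.0-bin"

# Warn early if either location does not exist.
for dir in "$SCALA_HOME" "$HIVE_HOME"; do
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```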

Hive requires that /tmp and /user/hive/warehouse/src exist on your computer. Create them if they don't already exist.
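
One way to create the directories is sketched below. `ensure_dirs` is a hypothetical helper, and the use of sudo for the warehouse path is an assumption about your system's permissions.

```shell
# Create every directory given as an argument, including missing parents.
# Directories that already exist are left untouched.
ensure_dirs() {
    for dir in "$@"; do
        [ -d "$dir" ] || mkdir -p "$dir" || return 1
    done
}

# /tmp normally exists already, so this call is a no-op on most systems.
ensure_dirs /tmp

# Creating a top-level /user directory usually needs elevated privileges.
# One option (an assumption about your setup) is to run mkdir through sudo:
#   sudo mkdir -p /user/hive/warehouse/src
```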

Compile Shark:

$ sbt/sbt package

The build system uses Maven/Ivy to fetch its dependencies. If this is the first time you are building the project, it can take a while to download all the dependencies. Subsequent builds, however, will be much faster.

(4) Running Shark

Once Shark is built, you can start the Shark CLI from inside the shark-0.7 directory:

$ bin/shark-withinfo

bin/shark-withinfo is useful for development, since it outputs logging information to the Shark console. bin/shark provides a less verbose version of the CLI.

To verify that Shark is running, you can try the following example, which creates a table with sample data:

shark> CREATE TABLE src(key INT, value STRING);
shark> LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;
shark> SELECT COUNT(1) FROM src;
shark> CREATE TABLE src_cached AS SELECT * FROM src;
shark> SELECT COUNT(1) FROM src_cached;
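
The LOAD DATA statement above reads a sample file from under HIVE_HOME, so it fails if that variable is unset in the environment that launched Shark or if the file is missing. A quick check from a regular shell, as a sketch (`sample_data_present` is a hypothetical helper name):

```shell
# Succeed if the Hive sample file used by LOAD DATA exists under the
# given Hive directory.
sample_data_present() {
    [ -f "$1/examples/files/kv1.txt" ]
}

if sample_data_present "${HIVE_HOME:-}"; then
    echo "sample data found"
else
    echo "sample data missing under HIVE_HOME='${HIVE_HOME:-}'"
fi
```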