customize for inmobi hive trunk
Sharad Agarwal authored and committed Dec 3, 2014
1 parent 055da60 commit 35585e4
Showing 14 changed files with 43 additions and 2,488 deletions.
33 changes: 28 additions & 5 deletions Dockerfile
@@ -1,5 +1,12 @@
FROM prasanthj/docker-tez:tez-0.5.2
MAINTAINER Prasanth Jayachandran
FROM prasanthj/docker-hadoop

MAINTAINER Sharad Agarwal
# Based on InMobi Hive
# Builds InMobi Hive from trunk
# Configures the PostgreSQL metastore database
# Starts the Hive metastore server
# Starts HiveServer2


# to configure postgres as hive metastore backend
RUN apt-get update
@@ -15,10 +22,21 @@ RUN /etc/init.d/postgresql start &&\

# revert back to default user
USER root

# dev tools to build
RUN apt-get update
RUN apt-get install -y git libprotobuf-dev protobuf-compiler

# install maven
RUN curl -s http://mirror.olnevhost.net/pub/apache/maven/binaries/apache-maven-3.2.1-bin.tar.gz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s apache-maven-3.2.1 maven
ENV MAVEN_HOME /usr/local/maven
ENV PATH $MAVEN_HOME/bin:$PATH


# clone and compile hive
ENV HIVE_VERSION 0.15.0-SNAPSHOT
RUN cd /usr/local && git clone https://github.com/apache/hive.git
ENV HIVE_VERSION 0.13.4-inm-SNAPSHOT
RUN cd /usr/local && git clone https://github.com/InMobi/hive.git
RUN cd /usr/local/hive && /usr/local/maven/bin/mvn clean install -DskipTests -Phadoop-2,dist
RUN mkdir /usr/local/hive-dist && tar -xf /usr/local/hive/packaging/target/apache-hive-${HIVE_VERSION}-bin.tar.gz -C /usr/local/hive-dist

@@ -36,7 +54,7 @@ ENV PGPASSWORD hive
# initialize hive metastore db
RUN /etc/init.d/postgresql start &&\
cd $HIVE_HOME/scripts/metastore/upgrade/postgres/ &&\
psql -h localhost -U hive -d metastore -f hive-schema-0.15.0.postgres.sql
psql -h localhost -U hive -d metastore -f hive-schema-0.13.0.postgres.sql

# copy config, sql, data files to /opt/files
RUN mkdir /opt/files
@@ -61,3 +79,8 @@ ENV POSTGRESQL_CONFIG_FILE $POSTGRESQL_MAIN/postgresql.conf
ENV POSTGRESQL_BIN /usr/lib/postgresql/9.3/bin/postgres
ADD postgresql.conf $POSTGRESQL_MAIN
RUN chown postgres:postgres $POSTGRESQL_CONFIG_FILE





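This commit updates `HIVE_VERSION` and the metastore schema script (`hive-schema-0.13.0.postgres.sql`) in separate places, so the two can drift apart. The following is a minimal sketch of deriving both names from a single version string. It assumes the packaging path used in the Dockerfile and assumes that schema scripts follow the `hive-schema-<major>.<minor>.0.postgres.sql` pattern. The function names are hypothetical and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helpers (not in the commit): derive the artifact names the
# Dockerfile hard-codes from one HIVE_VERSION value.

# Path of the binary tarball produced by `mvn clean install -Phadoop-2,dist`.
hive_dist_tarball() {
  printf '/usr/local/hive/packaging/target/apache-hive-%s-bin.tar.gz\n' "$1"
}

# Metastore schema script for the version's release line.
# ASSUMPTION: scripts are named hive-schema-<major>.<minor>.0.postgres.sql,
# e.g. 0.13.4-inm-SNAPSHOT -> hive-schema-0.13.0.postgres.sql
hive_schema_file() {
  major_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  printf 'hive-schema-%s.0.postgres.sql\n' "$major_minor"
}
```

For example, a build script run from `$HIVE_HOME/scripts/metastore/upgrade/postgres/` could call `psql -h localhost -U hive -d metastore -f "$(hive_schema_file "$HIVE_VERSION")"`.
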
116 changes: 3 additions & 113 deletions README.md
@@ -1,117 +1,7 @@
Docker image to run Apache Hive on Tez
======================================

This repository contains a Dockerfile to build a Docker image that runs Apache Hive on Tez. The Dockerfile depends on my other repositories containing the [docker-tez] and [docker-hadoop] base images.
Docker image to run InMobi Hive
===============================

## Current Version
* Apache Hive (trunk version)
* Apache Tez 0.5.2
* InMobi Hive (trunk version)
* Apache Hadoop 2.5.0
* PostgreSQL 9.3 (Hive metastore backend)

## Running on Mac OS X

This step is required only on Mac OS X, because Docker does not run natively on Mac OS X. To run Docker on Mac OS X, we need Boot2Docker. Boot2Docker installs a headless VirtualBox VM, runs a lightweight Linux distribution inside it, and sets that distribution up to run the Docker daemon.

### Setting up docker

* Install Boot2Docker from [here].
* After installing, run `boot2docker init` from a terminal to initialize Boot2Docker.
* Run `boot2docker start` to start Boot2Docker, then export `DOCKER_HOST` and `DOCKER_CERT_PATH` as printed at the end of the command's output.
* Once `DOCKER_HOST` and `DOCKER_CERT_PATH` are exported, docker commands can be run.

*NOTE:* Docker 1.3.0 requires `--tls` to be passed to every docker command.
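
Under Boot2Docker with Docker 1.3.0, every command needs both exported variables and `--tls`. A small shell wrapper can check the environment before calling docker. This is a sketch: the `dockertls` name is hypothetical, and the variable values are assumed to come from `boot2docker start`.

```shell
# Hypothetical wrapper: refuse to run if the Boot2Docker variables are
# missing, otherwise forward all arguments to `docker --tls`.
dockertls() {
  if [ -z "$DOCKER_HOST" ] || [ -z "$DOCKER_CERT_PATH" ]; then
    echo "DOCKER_HOST and DOCKER_CERT_PATH must be exported (see boot2docker start)" >&2
    return 1
  fi
  docker --tls "$@"
}
```

With the wrapper defined, `dockertls ps` behaves like `docker --tls ps`.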

## Pull the image
You can either pull the pre-built image from Docker Hub or build the image locally (see the next section).

> docker --tls pull prasanthj/hive-on-tez

## Building the image

If you do not want to pull the image from Docker Hub, you can build it locally using the following steps:
* To build the hive-on-tez Docker image locally from the Dockerfile, first check out the source using
`git clone https://github.com/prasanthj/docker-hive-on-tez.git`
* Change to the docker-hive-on-tez directory: `cd docker-hive-on-tez`

> docker --tls build -t prasanthj/hive-on-tez .
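
The pull and build paths can be combined into one helper. It pulls the pre-built image and falls back to a local build if the pull fails. This is a minimal sketch: `ensure_image` is a hypothetical name, and the clone is created in the current directory.

```shell
# Hypothetical helper: try Docker Hub first, build from source on failure.
ensure_image() {
  image="prasanthj/hive-on-tez"
  if docker --tls pull "$image"; then
    return 0
  fi
  echo "pull failed; building $image locally" >&2
  # Subshell so the `cd` does not change the caller's working directory.
  (
    git clone https://github.com/prasanthj/docker-hive-on-tez.git &&
    cd docker-hive-on-tez &&
    docker --tls build -t "$image" .
  )
}
```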

## Running the image

> docker --tls run -i -t -P prasanthj/hive-on-tez /etc/hive-bootstrap.sh -bash

## Testing Hive on Tez
After you launch the container with the command from the "Running the image" section, a bash shell starts. At the bash prompt, type the following to run a sample Hive query:

> hive -f /opt/files/store_sales.sql
On successful execution, the command above should produce output like the following:

Status: Running (Executing on YARN cluster with App id application_1415171696020_0001)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 1.65 s
--------------------------------------------------------------------------------
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 2.566 seconds, Fetched: 10 row(s)

## Testing Hive on MapReduce v2 (YARN)
Run the same example with the following additional Hive configuration:
> hive -f /opt/files/store_sales.sql -hiveconf hive.execution.engine=mr -hiveconf mapreduce.framework.name=yarn -hiveconf yarn.resourcemanager.address=localhost:8032
On successful execution, the command above should produce output like the following:

MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.17 sec HDFS Read: 36073 HDFS Write: 1830 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 33.47 sec HDFS Read: 2234 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 36 seconds 640 msec
OK
2452143 30
2451524 25
2452274 25
2452187 20
2450952 16
2451942 16
2451083 15
2451390 15
2451415 15
2452181 15
Time taken: 53.967 seconds, Fetched: 10 row(s)
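
The Tez and MapReduce runs above return the same ten rows. The following sketch automates that comparison. It assumes that Hive writes result rows to stdout as two whitespace-separated numbers, as in the sample output, and that logs and progress go to stderr. Both function names are hypothetical.

```shell
# Hypothetical helper: run the sample query on one engine and keep only the
# result rows (two numeric columns), dropping status and log lines.
# "mr" selects MapReduce; anything else uses the image's default (Tez).
query_rows() {
  if [ "$1" = "mr" ]; then
    hive -f /opt/files/store_sales.sql \
      -hiveconf hive.execution.engine=mr \
      -hiveconf mapreduce.framework.name=yarn \
      -hiveconf yarn.resourcemanager.address=localhost:8032
  else
    hive -f /opt/files/store_sales.sql
  fi 2>/dev/null | grep -E '^[0-9]+[[:space:]]+[0-9]+$'
}

# Hypothetical helper: run on both engines and compare the result rows.
compare_engines() {
  tez_rows=$(query_rows tez) || return 1
  mr_rows=$(query_rows mr) || return 1
  if [ "$tez_rows" = "$mr_rows" ]; then
    echo "results match"
  else
    echo "results differ" >&2
    return 1
  fi
}
```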

## Viewing Web UI
If you are running Docker through Boot2Docker, follow these steps:

* Set up routing on the host machine (Mac OS X) using the following command:
`sudo route add -net 172.17.0.0/16 192.168.59.103`
_NOTE_: 172.17.0.X is usually the IP address of the Docker container, and 192.168.59.103 is the IP address exported in `DOCKER_HOST`.

* Get the container's IP address
* To get the container's IP address, we need its CONTAINER_ID. The following command lists all running containers and their IDs:
`docker --tls ps`
* Use the following command to get the container's IP address (where CONTAINER_ID is the ID of the prasanthj/hive-on-tez container):
`docker --tls inspect -f '{{.NetworkSettings.IPAddress}}' CONTAINER_ID`

* Launch a web browser and open `http://<container-ip-address>:8088` to view the Hadoop cluster web UI.
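
The Web UI steps can be scripted with two small helpers. This is a sketch that assumes `DOCKER_HOST` has the form `tcp://<vm-ip>:<port>` (for example `tcp://192.168.59.103:2376`). The function names are hypothetical.

```shell
# Hypothetical helper: Boot2Docker VM address taken from DOCKER_HOST.
# ASSUMPTION: DOCKER_HOST looks like tcp://192.168.59.103:2376; the scheme
# and the port are stripped.
boot2docker_vm_ip() {
  host="${DOCKER_HOST#*://}"
  printf '%s\n' "${host%%:*}"
}

# Hypothetical helper: Hadoop web UI URL (port 8088) for a running container.
hadoop_ui_url() {
  ip=$(docker --tls inspect -f '{{.NetworkSettings.IPAddress}}' "$1") || return 1
  printf 'http://%s:8088\n' "$ip"
}
```

For example, `sudo route add -net 172.17.0.0/16 "$(boot2docker_vm_ip)"` sets up the route, and `hadoop_ui_url CONTAINER_ID` prints the URL to open in a browser.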

[here]:https://github.com/boot2docker/osx-installer/releases
[docker-tez]:https://github.com/prasanthj/docker-tez.git
[docker-hadoop]:https://github.com/prasanthj/docker-hadoop.git
60 changes: 0 additions & 60 deletions hive-0.14/Dockerfile

This file was deleted.
