| Current responsible(s) | Yiannis Mouchakis @ NCSR-D -- email@example.com |
|------------------------|-------------------------------------------------|
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
The Docker container for Apache Hive is based on https://github.com/big-data-europe/docker-hadoop, so check there for the Hadoop configuration options. This container deploys Hive and starts hiveserver2 on port 10000. By default the metastore_db is located at /hive-metastore. All Hive configuration files are located in the conf directory.
First, clone the repository from https://github.com/big-data-europe/docker-hive.git.
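A minimal sketch of the clone step (standard `git clone` usage):

```shell
# Clone the docker-hive repository and enter its directory
git clone https://github.com/big-data-europe/docker-hive.git
cd docker-hive
```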
To build docker-hive, go into the docker-hive directory and run
docker build -t hive .
To run it, first deploy Hadoop (see https://github.com/big-data-europe/docker-hadoop). Then start hiveserver2 by running
docker run --name hive --net=hadoop -p 10000:10000 -p 10002:10002 -v <path/to/metastore_db/location>:/hive-metastore --env-file=./hadoop.env hive
You can then access hiveserver2 at localhost:10000 and the hiveserver2 web UI at localhost:10002.
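Once hiveserver2 is up, a quick way to check the connection is with Hive's Beeline client (assuming `beeline` is available on the host; the JDBC URL follows the standard `hive2` scheme):

```shell
# Open a JDBC connection to hiveserver2 and run a test query
beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"
```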
Deploy with docker compose
You can also deploy Hive together with Hadoop using docker compose. This sets up a Hadoop cluster with 3 datanodes and Hive with hiveserver2 running. All data are stored in ./data.
To do so, first create the hadoop network
docker network create hadoop
Then deploy the cluster with
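The deploy command is not shown above; with Docker Compose the cluster is typically brought up as follows (assuming a docker-compose.yml in the current directory):

```shell
# Start all services defined in docker-compose.yml in detached mode
docker-compose up -d
```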
To scale up Hive you must add more Hadoop nodes. For more information on how to add nodes, see https://github.com/big-data-europe/docker-hadoop. You can also edit the docker-compose.yml file and add more nodes there.
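If the compose file defines the datanodes as a single scalable service (the service name `datanode` here is an assumption; check your docker-compose.yml), Compose can also scale it directly:

```shell
# Scale the (assumed) datanode service to 5 replicas
docker-compose up -d --scale datanode=5
```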