| Current responsible(s) | Yiannis Mouchakis @ NCSR-D -- email@example.com |
|------------------------|-------------------------------------------------|
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
The Docker container for Apache Hive is based on https://github.com/big-data-europe/docker-hadoop, so check there for the Hadoop configuration options. This container deploys Hive and starts hiveserver2 on port 10000. By default the metastore_db is located at /hive-metastore. All Hive configuration files are located in the conf directory.
First, clone the repository from https://github.com/big-data-europe/docker-hive.git.
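A minimal sketch of the clone step (standard `git clone` usage):

```shell
# Clone the docker-hive repository and enter its directory
git clone https://github.com/big-data-europe/docker-hive.git
cd docker-hive
```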
To build docker-hive, go into the docker-hive directory and run
docker build -t hive .
To run it, first deploy Hadoop (see https://github.com/big-data-europe/docker-hadoop). Then start hiveserver2 by running
docker run --name hive --net=hadoop -p 10000:10000 -p 10002:10002 -v <path/to/metastore_db/location>:/hive-metastore --env-file=./hadoop.env hive
You can then access hiveserver2 at localhost:10000 and the hiveserver2 web UI at localhost:10002.
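Once hiveserver2 is up, a quick way to check the connection is with Hive's Beeline client (assuming `beeline` is available on the host; the JDBC URL follows the standard `hive2` scheme):

```shell
# Open a JDBC connection to hiveserver2 and run a test query
beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"
```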
Deploy with docker compose
You can also deploy Hive together with Hadoop using docker compose. This sets up a Hadoop cluster with 3 datanodes and Hive with hiveserver2 running. All data are stored in ./data.
To do so, first create the hadoop network
docker network create hadoop
Then deploy the cluster with
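The deploy command is not shown above; with Docker Compose the cluster is typically brought up as follows (assuming a docker-compose.yml in the current directory):

```shell
# Start all services defined in docker-compose.yml in detached mode
docker-compose up -d
```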
To scale up Hive you must add more Hadoop nodes. For more information on how to add nodes, see https://github.com/big-data-europe/docker-hadoop. You can also edit the docker-compose.yml file and add more nodes there.
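If the compose file defines the datanodes as a single scalable service (the service name `datanode` here is an assumption; check your docker-compose.yml), Compose can also scale it directly:

```shell
# Scale the (assumed) datanode service to 5 replicas
docker-compose up -d --scale datanode=5
```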