# HBase Overview and Hive Getting Started

As part of this topic, we will briefly cover HBase and Hive.
* Overview of HBase
    * DDL and DML
* Overview of Hive
* Hive architecture
* Hive properties and log files

### HBase
HBase is a data model similar to Google's Bigtable, designed to provide random access to high volumes of structured or unstructured data. It is an important component of the Hadoop ecosystem: it leverages the fault tolerance of HDFS while providing real-time read and write access to data stored in HDFS. HBase can be referred to as a data store rather than a database, as it lacks some important features of a traditional RDBMS, such as typed columns, triggers, advanced query languages, and secondary indexes. HBase is not a direct replacement for a classic SQL database.

HBase is a NoSQL, column-oriented database built on top of Hadoop. It overcomes a limitation of HDFS, which is designed for large sequential reads and writes, by allowing fast random reads and writes. With exponentially growing data, relational databases also struggle to handle the variety of data while maintaining good performance. In HBase, region servers serve reads and writes in real time. In the HBase data model, columns are grouped into column families, which must be defined up front during table creation. The columns of a column family are stored together on disk, which is why HBase is referred to as a column-oriented data store.
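The column-family layout can be illustrated with a small toy in Python. This is a hypothetical sketch, not the HBase API and not the real on-disk file format: `group_by_family` is an illustrative name, and the input is assumed to be logical rows shaped as `{row_key: {"family:qualifier": value}}`. It groups each cell under its column family and orders the cells by row key and then by column, mirroring the idea that a family's columns are stored together and rows are kept sorted.

```python
def group_by_family(rows, families):
    """Toy illustration of column-family grouping (hypothetical, not the HBase API).

    rows:     {row_key: {"family:qualifier": value}}
    families: the column families declared when the table was created
    Returns   {family: [(row_key, qualifier, value), ...]}
    """
    stores = {family: [] for family in families}
    for row_key, columns in rows.items():
        for column, value in columns.items():
            family, qualifier = column.split(":", 1)
            if family not in stores:
                # Column families must be defined up front, so unknown ones are rejected.
                raise KeyError(f"unknown column family: {family}")
            stores[family].append((row_key, qualifier, value))
    for cells in stores.values():
        # Keep each family's cells ordered by row key, then by qualifier.
        cells.sort(key=lambda cell: (cell[0], cell[1]))
    return stores


if __name__ == "__main__":
    # Hypothetical sample rows; row "2" is listed first on purpose.
    rows = {
        "2": {"cf1:name": "Bob", "cf2:city": "Austin"},
        "1": {"cf1:name": "Ann", "cf1:age": "31"},
    }
    for family, cells in group_by_family(rows, ["cf1", "cf2"]).items():
        print(family, cells)
    # cf1 [('1', 'age', '31'), ('1', 'name', 'Ann'), ('2', 'name', 'Bob')]
    # cf2 [('2', 'city', 'Austin')]
```

Note that the cells of `cf1` and `cf2` end up in separate groups even though they belong to the same logical row "2".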

### *[HBase Application – Development Life Cycle](https://kaizen.itversity.com/courses/)*
* To launch HBase, run the `hbase shell` command.
* To get help inside the HBase shell, type `help`; it lists the available commands along with their usage.

* The commands are categorised into
    * DDL – Creating, altering, and dropping tables.
    * DML – Inserting, updating, and deleting data, etc.
* To list all the tables, run `list`.

* A namespace is a logical grouping of tables.
* To create a namespace in HBase, use `create_namespace`, for example `create_namespace 'bootcampdemo'`.

* To list the tables under a namespace, use `list_namespace_tables`, for example `list_namespace_tables 'bootcampdemo'`.

* To create a table under the namespace with a column family `cf1`, run:

```
create 'bootcampdemo:demo','cf1'
```

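* When we scan a table with `scan`, for example `scan 'bootcampdemo:demo'`, the output is automatically sorted by row key.
* The `put` command writes a value for a given row key and column; if a value already exists for that column, `put` updates it.
* To retrieve the values of a particular row, use `get` with the row key; it can also be restricted to specific columns.

* To delete one column of a row, use `delete` with the row key and the column name.
* To delete all the columns of a row at once, use `deleteall` with the row key.

* Use `truncate` to delete all the rows from the table.
* To list the available filters, use the `show_filters` command.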
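The effect of these shell commands on the data can be mimicked with a toy in-memory table in Python. This is a hypothetical sketch, not the HBase client API: `ToyTable` and its methods are illustrative names that only borrow the shell command names. It assumes each column family keeps a single version per cell (`VERSIONS => 1`, the default in recent HBase releases), so a `put` to an existing row key and column simply replaces the value.

```python
class ToyTable:
    """Hypothetical in-memory stand-in for one HBase table (not the HBase API).

    Assumes one version per cell, so a put to an existing row key and
    column replaces the previous value.
    """

    def __init__(self, families):
        self.families = set(families)  # fixed when the table is created
        self.rows = {}                 # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        # All columns of one row, ordered by column name.
        return dict(sorted(self.rows.get(row, {}).items()))

    def scan(self):
        # Rows come back ordered by row key. HBase compares keys as bytes;
        # plain string order gives the same result for ASCII keys.
        return [(row, self.get(row)) for row in sorted(self.rows)]

    def delete(self, row, column):
        columns = self.rows.get(row, {})
        columns.pop(column, None)
        if not columns:
            # A row with no remaining cells no longer exists.
            self.rows.pop(row, None)

    def deleteall(self, row):
        self.rows.pop(row, None)

    def truncate(self):
        self.rows.clear()


if __name__ == "__main__":
    t = ToyTable(["cf1"])            # like: create 'bootcampdemo:demo','cf1'
    t.put("2", "cf1:name", "Bob")
    t.put("10", "cf1:name", "Cid")
    t.put("1", "cf1:name", "Ann")
    t.put("1", "cf1:name", "Anna")   # same row and column: the value is replaced
    print(t.scan())                  # rows ordered "1", "10", "2"
    t.delete("1", "cf1:name")        # removes one column of row "1"
    t.deleteall("2")                 # removes every column of row "2"
    t.truncate()                     # removes all rows
```

The ordering "1", "10", "2" is deliberate: row keys are compared lexicographically, not numerically, which is worth keeping in mind when designing row keys.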

### Overview of Hive
Hive is a flexible environment for creating databases and tables on top of HDFS, which it uses as its data storage file system. Apache Hive is a data warehouse software project built on top of Apache Hadoop for data summarization, query, and analysis. It provides a SQL-like interface to query data stored in the various databases and file systems that integrate with Hadoop. The main components of the Hive architecture are:
* Metastore
    * Stores the metadata for each table, such as its schema and location, which is crucial for running queries.
    * The metadata is stored in a traditional RDBMS, and the driver uses it to keep track of the tables it works with.
* Driver
    * The driver acts as a controller that receives the HiveQL statements.
    * It also acts as the collection point for the data or query results obtained after the reduce operation.
* Compiler
    * The compiler parses the HiveQL query into an abstract syntax tree (AST).
    * It then converts the AST into an execution plan.
* Optimizer
    * The optimizer performs various transformations on the execution plan to get an optimized DAG.
    * It can also split the tasks, such as applying a transformation to data before a reduce operation, to provide better performance and scalability.
* Executor
    * After compilation and optimization, the executor executes the tasks. It interacts with Hadoop's job tracker (the YARN ResourceManager in Hadoop 2 and later) to schedule the tasks to be run.
* CLI, UI, and Thrift Server
    * The command-line interface (CLI) lets a user interact with Hive by submitting queries and instructions and by monitoring the process status.
    * The Thrift server allows external clients to interact with Hive over a network, for example through JDBC or ODBC.
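The location that the metastore records for a managed table created without an explicit `LOCATION` follows a simple convention under the warehouse directory (the `hive.metastore.warehouse.dir` property). The Python sketch below reproduces that convention; `default_table_location` is a hypothetical helper, and `/user/hive/warehouse` (the Apache Hive default) along with the `retail_db` and `orders` names are example values only, since distributions such as HDP may configure a different warehouse directory.

```python
import posixpath


def default_table_location(warehouse_dir, database, table):
    """Sketch of Hive's default directory for a managed table (hypothetical helper).

    Tables in the `default` database sit directly under the warehouse directory;
    tables in any other database sit under a `<database>.db` subdirectory.
    Hive treats these names case-insensitively and stores them in lower case.
    """
    database, table = database.lower(), table.lower()
    if database == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, f"{database}.db", table)


if __name__ == "__main__":
    print(default_table_location("/user/hive/warehouse", "retail_db", "orders"))
    # /user/hive/warehouse/retail_db.db/orders
    print(default_table_location("/user/hive/warehouse", "default", "Orders"))
    # /user/hive/warehouse/orders
```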
    
### Hive CLI
We can launch Hive from any location on the gateway node by running the **hive** command. Let us see how we can write queries.
* Hive launches its console, where we can perform all the DDL and DML operations.
* To list the databases, run `show databases;`.

* To use a database and list its tables, run `use <database_name>;` followed by `show tables;`.

### Overriding Hive Properties
We can alter the runtime behavior of the Hive CLI by overriding its properties.
* To view the properties, we can go to Ambari.
* To check the properties from the command line, run `set -v;` to print all Hadoop and Hive properties, or `set <property_name>;` to print a single one.

* The system-level log file location is controlled by the `hive.log.dir` property.

* The Hive CLI saves the command history in your home directory, in a file named `.hivehistory`.
* To locate the Hive warehouse directory, run `set hive.metastore.warehouse.dir;`.

* To set the Hive execution engine to Tez, run `set hive.execution.engine=tez;`.

* We can update the HDFS replication factor with the command `set dfs.replication=3;`.

* To control Hive's behavior when it launches, we can write startup commands in a file named `.hiverc` in the home directory.

* We can also set properties from the command line by passing `--hiveconf <property>=<value>` when launching Hive.

* To set our own log location, we can change it at launch time:

```
hive --hiveconf hive.log.dir=/home/`whoami`
```
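The override layers described in this section can be modelled with Python's `collections.ChainMap`, where the first mapping that contains a key wins. This is a sketch of our understanding of the Hive CLI's order, not Hive code: values from `hive-site.xml` (what Ambari manages) are overridden by `--hiveconf` options given at launch, which are in turn overridden by `set` statements, including the ones run from `.hiverc` at startup. The function name and the sample values are hypothetical.

```python
from collections import ChainMap


def effective_properties(site_xml, hiveconf_args, set_statements):
    """Resolve Hive properties across override layers (hypothetical sketch).

    Precedence, lowest to highest:
    hive-site.xml < --hiveconf at launch < `set` statements (.hiverc, then the session).
    """
    # ChainMap searches its maps left to right, so the highest-priority layer comes first.
    return dict(ChainMap(set_statements, hiveconf_args, site_xml))


if __name__ == "__main__":
    site = {"hive.execution.engine": "mr", "dfs.replication": "3"}
    launch = {"hive.execution.engine": "tez"}  # hive --hiveconf hive.execution.engine=tez
    session = {"dfs.replication": "2"}         # set dfs.replication=2;
    print(effective_properties(site, launch, session))
```

A property that no higher layer mentions simply falls through to the value from `hive-site.xml`.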

### Running Hive in Spark Context
Let us see how we can run Hive queries using the Spark context.
* We can access Hive tables from the `spark-sql` console as well.
* We can also run Hive queries from `spark-shell`.

* Hive also has a UI console in Ambari, called Hive View.