# Hive Data Definition Language (Contd…)

As part of this topic, we will cover:
* Comparison with RDBMS
* Hive Delimiters
* Hive Schema on Read
* Creating Retail Database in Hive

### Comparison with RDBMS
Let us understand the differences between Hive databases and traditional RDBMS databases.
* A traditional RDBMS is used for mission-critical, transaction-based applications.
* There are some differences in data types. For example, string is not available as a column data type in databases like Oracle.
* Data models are typically normalized into parent tables and child tables.
* Data is loaded into tables using DML statements like insert or using tools like SQLLDR.
* We can access data in tables only through queries. We cannot directly read the underlying files and make sense of them.

### Hive Delimiters
As a Hive table's metadata is decoupled from the actual data, we typically need to specify delimiters when creating tables that use the Text File format. Let us understand more about delimiters in Hive tables stored as text files.
* When we create tables in Hive, the data resides in HDFS and the metadata resides in the Metastore.
* Using the Hive Metastore, we can query the metadata of our tables.
* **describe formatted** gives all the details about the metadata of a table.
* LOCATION tells us which HDFS directory the data resides in.
* Unlike in an RDBMS, we can access the files in HDFS directly.
* Even though we can run statements like insert, update, and delete on Hive tables, it is not very common. There is a lot of overhead whenever we perform traditional DML statements on Hive tables.
* We can use the **show create table** command to get the statement that can be used to create a table with a similar structure elsewhere.
* We can specify delimiters for text data.
* We use the LOAD command to copy files into Hive tables directly (files that are already in HDFS are moved rather than copied), or the INSERT command to save query results into a Hive table.
* We will get into the details of LOAD and INSERT at a later point in time.
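
As a rough sketch of the points above, the statements below create a text-format table with explicit delimiters and then inspect its metadata. The table name `delimiter_demo` and its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table used only to illustrate the delimiter clauses.
CREATE TABLE delimiter_demo (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- separator between columns
  LINES TERMINATED BY '\n'   -- separator between records
STORED AS TEXTFILE;

-- Shows the columns plus details such as the HDFS location and delimiters.
DESCRIBE FORMATTED delimiter_demo;

-- Prints a CREATE TABLE statement that reproduces this table.
SHOW CREATE TABLE delimiter_demo;
```

The Location value in the `DESCRIBE FORMATTED` output is the HDFS directory whose files back the table.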

### Hive – Schema on Read
Let us understand the concept of **Schema on Read** when it comes to Hive.
* When we create a Hive table without specifying a delimiter, Hive uses Ctrl-A (ASCII `\001`) as the field delimiter by default.
* We can use describe formatted to find the table's HDFS location.


```DESCRIBE FORMATTED demo;```

* To copy the data from HDFS to the local file system, we can use the **hadoop fs -get** command

```hadoop fs -get hdfs://nn01.itversity.com/apps/hive/warehouse/bootcampdemo_retail_db_txt.db/demo```

* To drop the table, use the drop table command

```DROP TABLE demo;```

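* If an inserted value is longer than the defined size (for example, more than 3 characters for VARCHAR(3)), Hive silently truncates it and stores only the defined length. It does not raise a warning or an error the way a traditional RDBMS would.
* To see the complete create table statement, including the defaults Hive applied, use show create table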

```show create table demo;```

* To use ',' as the field delimiter, use the syntax below

```
CREATE TABLE demo(
i INT,
s STRING,
v VARCHAR(3))
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

* We can load data from external sources, either from the local file system or from HDFS.
* Loading data from a file is much faster than inserting rows one at a time with the insert command

```LOAD DATA LOCAL INPATH '/home/training/demo.txt' INTO TABLE demo;```

* Data that does not match the table's schema is loaded without any errors or warnings.
* When we query the table, Hive returns NULL wherever a field does not satisfy the schema. The data in HDFS remains exactly as it was in the original file.
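
The behaviour described above can be sketched end to end. This assumes the `demo` table was created with ',' as the field delimiter, and that `/home/training/demo.txt` contains the three hypothetical lines shown in the comments; the actual file contents are not given in this topic.

```sql
-- Assumed (hypothetical) contents of /home/training/demo.txt:
--   1,hello,abc
--   two,world,abcdef
--   3,extra

-- The load succeeds without validating any of the rows.
LOAD DATA LOCAL INPATH '/home/training/demo.txt' INTO TABLE demo;

-- The schema is applied only now, when the data is read.
SELECT * FROM demo;
-- Under the assumptions above, the rows should come back as:
--   1     hello  abc
--   NULL  world  abc    ('two' is not an INT; 'abcdef' is cut to 3 characters)
--   3     extra  NULL   (third field is missing)
```

The file stored under the table's HDFS location still holds the original three lines. Only the query output reflects the schema.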

### Create Retail Database in Hive
Let us create a database for the retail tables in Hive.
* We have the data in the lab under the /data directory

```ls -ltr /data```
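
Before creating any tables, we need a database to hold them. A minimal sketch, assuming the database name `retail_db_txt` (hypothetical; in a shared lab the name usually needs a user-specific prefix):

```sql
-- Create the database only if it does not already exist.
CREATE DATABASE IF NOT EXISTS retail_db_txt;

-- Make it the current database for the statements that follow.
USE retail_db_txt;

-- Confirm which database the session is using.
SELECT current_database();
```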

* We can invoke functions such as current_date and current_timestamp to get the current system date and time, and to understand the default formats.

```select current_date;```

```select current_timestamp;```
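
The default format matters because Hive can read a text field as a timestamp only when it follows that layout. The sketch below illustrates this with two hypothetical sample strings, which are assumptions and not taken from the lab data.

```sql
-- A string in Hive's default timestamp layout converts cleanly.
SELECT CAST('2013-07-25 00:00:00.0' AS TIMESTAMP);

-- A string in a different layout cannot be parsed and yields NULL.
SELECT CAST('25/07/2013' AS TIMESTAMP);
```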

* As order_date is in timestamp format, let's specify the data type for order_date as timestamp and create the orders table.

```
CREATE TABLE orders(
order_id INT,
order_date TIMESTAMP,
order_customer_id INT,
order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

* Loading orders data into table orders

```LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders;```

* To validate the loaded data, we can preview a limited number of rows with the command below

```select * from orders limit 10;```
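
Because Hive applies the schema only on read, a few aggregate checks can complement the preview. The queries below are a sketch of such checks, not a required step. They use only the `orders` columns defined above.

```sql
-- Total number of rows loaded.
SELECT count(1) FROM orders;

-- Rows whose order_date could not be read as a timestamp.
SELECT count(1) FROM orders WHERE order_date IS NULL;

-- Distribution of order statuses, a quick sanity check on the last column.
SELECT order_status, count(1) AS order_count
FROM orders
GROUP BY order_status;
```

A non-zero count from the second query would suggest that the delimiter or the column data types do not match the file.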