# Hive Data Manipulation Language (DML)

In this topic we cover some DDL and DML commands. We also look at CTAS (CREATE TABLE AS SELECT), the different ways of inserting data into a table, and how to create a partitioned table and insert data into it.

### Hive File Formats
**DDL** is an abbreviation for Data Definition Language. It is used to create and modify the structure of database objects in the database.

**DML** is an abbreviation for Data Manipulation Language. It is used to retrieve, store, modify, delete, insert and update data in the database.

* To create a table with CTAS (CREATE TABLE AS SELECT), run:

```
use bootcampdemo_retail_db_txt;
create table orders01 as select * from orders;
```

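* This creates a new table with the same columns as the source table and copies the source table's data into it. However, the new table's metadata can differ from the original's.
* For example, the row format of orders01 differs from that of orders. CTAS does not copy the source table's storage properties, so orders01 uses Hive's default row format unless the CTAS statement specifies one. Running `describe formatted orders01;` shows the SerDe and its properties.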
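The following small Python sketch (not Hive code) makes the row-format difference concrete. It shows how one orders row would be written as text under the source table's `,` delimiter and under Ctrl-A (`\x01`), Hive's default field delimiter for text tables. It assumes orders01 was created with the default TextFile format. The row values are illustrative only.

```python
# Sketch only: models how a delimited text row format turns column values into a line.
# Assumes orders01 uses Hive's default text row format (field delimiter Ctrl-A, '\x01').

SOURCE_DELIM = ","            # orders: ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
HIVE_DEFAULT_DELIM = "\x01"   # Hive's default field delimiter when no row format is given


def serialize(row, delim):
    """Join column values into one text line using the given field delimiter."""
    return delim.join(str(value) for value in row)


# Illustrative row: (order_id, order_date, order_customer_id, order_status)
row = (1, "2013-07-25 00:00:00.0", 11599, "CLOSED")

print(repr(serialize(row, SOURCE_DELIM)))        # line as stored in orders
print(repr(serialize(row, HIVE_DEFAULT_DELIM)))  # line as stored in orders01
```

The two lines hold the same values, but they are not interchangeable. A table declared with `,` as its delimiter would see the whole Ctrl-A line as a single field.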

***Creating a table in ORC format***

```
create table orders_orc
stored as orc
as
select * from orders;
```

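* `stored as orc` sets the table's SerDe to the ORC SerDe (`org.apache.hadoop.hive.ql.io.orc.OrcSerde`), together with the ORC input and output formats. The SerDe tells Hive how to read and write the data, so an ORC table does not use the delimited text row format of the source table.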
* There are three ways to load data into Hive tables:
    * Insert individual records
    * Select data from other tables
    * Load the data from files, either by:
        * Running the LOAD DATA command in the Hive console
        * Copying the files directly into the table's HDFS location
* The TRUNCATE command deletes the data in a table but preserves the table's structure.

### Hive Load Command
There are two ways to load data from the local file system:
* Copy the files into the table's directory with the `hadoop fs -put` command.
* Use Hive's LOAD DATA command.
* First, create the table with a row format whose fields are delimited by `,`:

```
CREATE TABLE `orders`(
`order_id` int,
`order_date` string,
`order_customer_id` int,
`order_status` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```
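Hive applies this schema when the data is read, not when files are copied in. The Python sketch below is a simplified model of that read path, not Hive's actual SerDe. It follows these rules:
* Each line is split on `,` and each field is converted to its column type.
* A value that is not a valid INT becomes NULL (`None`).
* Missing trailing fields become NULL.
* Extra fields are ignored.

The helper names `to_int` and `read_line` are hypothetical.

```python
import re

# Simplified model of reading one line of the orders table (not Hive's implementation).
_INT_PATTERN = re.compile(r"[+-]?[0-9]+")


def to_int(text):
    """Parse a 32-bit INT; return None (NULL) if the text is not a valid integer."""
    if not _INT_PATTERN.fullmatch(text):
        return None
    value = int(text)
    return value if -2**31 <= value < 2**31 else None


ORDERS_SCHEMA = [
    ("order_id", to_int),
    ("order_date", str),
    ("order_customer_id", to_int),
    ("order_status", str),
]


def read_line(line, schema=ORDERS_SCHEMA, delim=","):
    """Split a text line on the field delimiter and convert each field to its column type."""
    fields = line.rstrip("\n").split(delim)
    row = {}
    for i, (name, convert) in enumerate(schema):
        # Missing trailing fields become NULL; fields beyond the schema are ignored.
        row[name] = convert(fields[i]) if i < len(fields) else None
    return row


print(read_line("1,2013-07-25 00:00:00.0,11599,CLOSED"))
print(read_line("1\t2013-07-25 00:00:00.0\t11599\tCLOSED"))  # tab-separated: wrong delimiter
```

The second call shows why the delimiter in the DDL must match the files. With the wrong delimiter, the whole line lands in the first field, which fails to parse as an INT, so every column comes back NULL.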

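* Now copy the data files from the local file system directly into the table's HDFS location:

```
hadoop fs -put /data/retail_db/orders/* /apps/hive/warehouse/bootcampdemo_retail_db_txt.db/orders
```

* The table metadata still reports 0 files (numFiles). We copied the data into HDFS directly, so Hive did not update the table's statistics in the metastore. A plain `select *` still returns the rows, because Hive reads whatever files are in the table's directory.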
* Now let us load a local file into the table with LOAD DATA:

```
load data local inpath '/data/retail_db/orders' into table orders;
```

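* To append the same data again with `hadoop fs -put`, we have to give the copy a different file name, because `put` fails when the target file already exists. Any unused name works; `part-00000_2` below is just an example:

```
hadoop fs -put /data/retail_db/orders/part-00000 /apps/hive/warehouse/bootcampdemo_retail_db_txt.db/orders/part-00000_2
```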

* If we load the same data again with LOAD DATA, Hive does not fail. Instead, it gives the incoming file a new name so that it does not replace the existing one:

```
load data local inpath '/data/retail_db/orders' into table orders;
```

For example:

```
Original file: part-00000
New file:      part-00000_copy_1
```

* To replace the existing data instead of appending to it, use the OVERWRITE keyword:

```
load data local inpath '/data/retail_db/orders' overwrite into table orders;
```
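A few lines of Python can model the append and overwrite behaviour of LOAD DATA. This is a simplified sketch, not Hive's code:
* The table directory is represented as a set of file names.
* An appending load renames a clashing file by adding `_copy_1`, `_copy_2`, and so on.
* An overwriting load empties the directory first.

The helpers `copy_name` and `load` are hypothetical. The sketch only handles names without a file extension, such as `part-00000`.

```python
# Simplified model of LOAD DATA on a table directory (not Hive's implementation).
# The directory is modelled as a set of file names; file extensions are not handled.


def copy_name(name, existing):
    """Return `name`, or `name_copy_N` with the smallest N that does not clash."""
    if name not in existing:
        return name
    n = 1
    while f"{name}_copy_{n}" in existing:
        n += 1
    return f"{name}_copy_{n}"


def load(table_files, incoming, overwrite=False):
    """Return the table's file names after loading the `incoming` files.

    overwrite=False appends (LOAD DATA ... INTO TABLE);
    overwrite=True replaces existing files (LOAD DATA ... OVERWRITE INTO TABLE).
    """
    files = set() if overwrite else set(table_files)
    for name in incoming:
        files.add(copy_name(name, files))
    return sorted(files)


files = load([], ["part-00000"])
print(files)  # ['part-00000']
files = load(files, ["part-00000"])
print(files)  # ['part-00000', 'part-00000_copy_1']
files = load(files, ["part-00000"], overwrite=True)
print(files)  # ['part-00000']
```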

### Hive Insert Command
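Now let us see the second approach to getting data into a table: the INSERT statement.
* First, clear out the existing data. TRUNCATE removes the rows but keeps the table definition, whereas DROP TABLE would remove the table itself:

```
truncate table orders;
```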

* Also truncate orders_orc. It already holds the rows inserted by the CTAS statement, and we want to load it again with INSERT:

```
truncate table orders_orc;
```

* Insert data into the existing table, overwriting any data it already contains:

```
insert overwrite table orders_orc
select * from orders_stage;
```

* Create a partitioned table. The partition column `order_date` is declared in the PARTITIONED BY clause rather than in the regular column list:

```
CREATE TABLE orders_partitioned(
order_id INT,
order_customer_id INT,
order_status STRING
) PARTITIONED BY (order_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```

* Insert data into the partitioned table using dynamic partitioning. The partition column must come last in the SELECT list:

```
INSERT INTO TABLE orders_partitioned PARTITION (order_date)
SELECT order_id, order_customer_id, order_status, order_date
FROM orders_stage;
```

Note: by default this statement fails. `hive.exec.dynamic.partition.mode` is set to `strict`, which requires at least one static partition column. Set it to `nonstrict` so that all partition columns can be dynamic:

```
set hive.exec.dynamic.partition.mode=nonstrict;
```
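The rule that strict mode enforces can be written down as a small check. The Python sketch below only models that rule; it is not Hive's code. `partition_spec` (a hypothetical name) maps each partition column to its static value, or to `None` when the column is dynamic. The check rejects the insert in strict mode when no column has a static value.

```python
# Model of the check behind hive.exec.dynamic.partition.mode (not Hive's implementation).


def check_partition_spec(partition_spec, mode="strict"):
    """Raise ValueError if strict mode would reject this partition spec.

    partition_spec maps partition column -> static value, or None if the column is dynamic.
    """
    static_columns = [col for col, value in partition_spec.items() if value is not None]
    if mode == "strict" and not static_columns:
        raise ValueError("strict mode requires at least one static partition column")


# PARTITION (order_date): fully dynamic, so strict mode rejects it.
try:
    check_partition_spec({"order_date": None})
except ValueError as err:
    print("rejected:", err)

# Accepted: nonstrict mode, or a static value such as PARTITION (order_date='2014-05-20').
check_partition_spec({"order_date": None}, mode="nonstrict")
check_partition_spec({"order_date": "2014-05-20"})
```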

Now run the INSERT statement again to load the partitioned table.
* Checking HDFS shows that Hive created one subdirectory per distinct `order_date` value. For example, to list the files of one partition:

```
dfs -ls /apps/hive/warehouse/bootcampdemo_retail_db_txt.db/orders_partitioned/order_date=2014-05-20;
```

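* If we read these files, each row has only three fields. The `order_date` value is not stored in the data files, because it is already encoded in the partition directory name (`order_date=2014-05-20`).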
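A short Python sketch (not Hive's code) can model the directory layout produced by a dynamic-partition insert. Each row is routed to a directory named `order_date=<value>`. The `order_date` value is dropped from the line written inside that directory, which is why the partition files contain only three fields. The function `partition_rows` and the sample rows are hypothetical. The sketch assumes partition values contain no characters that Hive would need to escape in a directory name.

```python
from collections import defaultdict

# Model of how a dynamic-partition insert lays out rows (not Hive's implementation).


def partition_rows(rows, partition_col="order_date", delim=","):
    """Group rows into {partition directory: [data lines]}.

    The partition value is encoded in the directory name, so it is left
    out of the data lines written inside that directory.
    """
    layout = defaultdict(list)
    for row in rows:
        directory = f"{partition_col}={row[partition_col]}"
        values = [str(v) for col, v in row.items() if col != partition_col]
        layout[directory].append(delim.join(values))
    return dict(layout)


# Hypothetical rows, in the column order of the INSERT ... SELECT.
rows = [
    {"order_id": 1, "order_customer_id": 100, "order_status": "CLOSED", "order_date": "2014-05-20"},
    {"order_id": 2, "order_customer_id": 200, "order_status": "COMPLETE", "order_date": "2014-05-20"},
    {"order_id": 3, "order_customer_id": 300, "order_status": "PENDING", "order_date": "2014-05-21"},
]

for directory, lines in sorted(partition_rows(rows).items()):
    print(directory, lines)
```

Reading the table reverses this. Hive takes the `order_date` value from the directory name and returns it as the last column of each row.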