## Creating Partitioned Tables

Let us understand how to create a partitioned table and get data into it.

In [1]:
val username = System.getProperty("user.name")

username = itv002461


itv002461

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Managing Tables - DML and Partitioning").
    master("yarn").
    getOrCreate

username = itv002461
spark = org.apache.spark.sql.SparkSession@b49b6f


org.apache.spark.sql.SparkSession@b49b6f

If you are going to use CLIs, you can launch Spark SQL using one of the following 3 approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

* Earlier, we created the orders table. We will use it as a reference to create a partitioned table.
* We can use the `PARTITIONED BY` clause to define the **partition column along with its data type**. In our case, we will use **order_month as the partition column**.
* We will not be able to load our original orders data directly into the partitioned table, as the data does not match the table structure (the original data has no order_month field).
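To make the mismatch in the last point concrete, here is a minimal sketch in plain Scala (no Spark required). It assumes that `order_date` begins with a `yyyy-MM` prefix and that `order_month` is meant to hold a value such as `201307`; the helper `orderMonth` and the sample line are hypothetical and only serve as illustration.

```scala
// Hypothetical helper: derive the partition value from an order_date string.
// Assumes order_date starts with "yyyy-MM" (for example "2013-07-25 00:00:00.0").
def orderMonth(orderDate: String): Int =
  orderDate.substring(0, 7).replace("-", "").toInt

// One record in the original orders layout: 4 comma-separated fields, no order_month.
// This sample line is illustrative only.
val line = "1,2013-07-25 00:00:00.0,11599,CLOSED"
val fields = line.split(",")

// The partitioned table expects the 4 data columns plus order_month,
// so the partition value has to be computed from order_date.
val month = orderMonth(fields(1))
println(s"order_month = $month")
```

The record carries only four fields, so the value for the partition column has to be derived (or supplied explicitly) before the data can land in `orders_part`.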

Here is an example of creating a partitioned table in Spark Metastore.

In [3]:
%%sql

USE itv002461_retail

++
||
++
++



In [4]:
%%sql

SHOW tables

+----------------+-----------------+-----------+
|        database|        tableName|isTemporary|
+----------------+-----------------+-----------+
|itv002461_retail|       categories|      false|
|itv002461_retail|        customers|      false|
|itv002461_retail|      departments|      false|
|itv002461_retail|      order_items|      false|
|itv002461_retail|order_items_stage|      false|
|itv002461_retail|           orders|      false|
|itv002461_retail|         products|      false|
+----------------+-----------------+-----------+



* Drop orders_part if it already exists

In [5]:
%%sql

DROP TABLE IF EXISTS orders_part

++
||
++
++



In [6]:
%%sql

CREATE TABLE orders_part (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) PARTITIONED BY (order_month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [7]:
%%sql

DESCRIBE orders_part

+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
|            order_id|      int|   null|
|          order_date|   string|   null|
|   order_customer_id|      int|   null|
|        order_status|   string|   null|
|         order_month|      int|   null|
|# Partition Infor...|         |       |
|          # col_name|data_type|comment|
|         order_month|      int|   null|
+--------------------+---------+-------+



In [8]:
spark.sql("DESCRIBE FORMATTED orders_part").show(200, false)

+----------------------------+--------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                             |comment|
+----------------------------+--------------------------------------------------------------------------------------+-------+
|order_id                    |int                                                                                   |null   |
|order_date                  |string                                                                                |null   |
|order_customer_id           |int                                                                                   |null   |
|order_status                |string                                                                                |null   |
|order_month                 |int                                                                                   |n

In [9]:
import sys.process._

// List the table directory; `.!` runs the command and returns its exit code
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/orders_part".!



0
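The listing shows no entries and exits with code 0, presumably because the newly created table has no partitions yet. Partitioned tables in the metastore conventionally keep each partition in a subdirectory named `<column>=<value>` under the table directory. The sketch below only builds such a path as a string; the helper `partitionPath` and the value `201307` are hypothetical, and the base path is taken from the listing command above.

```scala
// Hypothetical helper: build the directory a partition would occupy,
// following the Hive-style <column>=<value> naming convention.
def partitionPath(username: String, orderMonth: Int): String =
  s"/user/$username/warehouse/${username}_retail.db/orders_part/order_month=$orderMonth"

// 201307 is an assumed example value for order_month.
val examplePath = partitionPath("itv002461", 201307)
println(examplePath)
```

Once partitions are added, running the same `hdfs dfs -ls` command should show one such subdirectory per partition.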