## Exercises - Basic SQL Queries

Here are some exercises for which you can write SQL queries to evaluate yourself.

Let us start the Spark session for this notebook so that we can execute the code provided. You can sign up for our [10-node state-of-the-art cluster/labs](https://labs.itversity.com/plans) to learn Spark SQL using our unique integrated LMS.

In [1]:
val username = System.getProperty("user.name")

username = itv002461


itv002461

In [2]:
import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")
val spark = SparkSession.
    builder.
    config("spark.ui.port", "0").
    config("spark.sql.warehouse.dir", s"/user/${username}/warehouse").
    enableHiveSupport.
    appName(s"${username} | Spark SQL - Basic Transformations").
    master("yarn").
    getOrCreate

username = itv002461
spark = org.apache.spark.sql.SparkSession@59574de9


org.apache.spark.sql.SparkSession@59574de9

If you prefer a CLI, you can run Spark SQL using one of the following three approaches.

**Using Spark SQL**

```
spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Scala**

```
spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

**Using Pyspark**

```
pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse
```

In [3]:
%%sql
USE itv002461_retail


++
||
++
++
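
The `USE` statement above hardcodes the user name and assumes the database already exists. A minimal sketch of a user-independent variant follows, assuming the notebook's Spark session is already running and that the `${username}_retail` naming seen in this notebook is the intended convention. The `CREATE DATABASE IF NOT EXISTS` step is an assumption for the case where the database was not created in an earlier section.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session already exists in the notebook, so getOrCreate returns it.
val spark = SparkSession.builder.getOrCreate()
val username = System.getProperty("user.name")

// Create the per-user database only if it is missing, then switch to it.
spark.sql(s"CREATE DATABASE IF NOT EXISTS ${username}_retail")
spark.sql(s"USE ${username}_retail")

// Show which database is now current.
spark.sql("SELECT current_database()").show(false)
```

Deriving the database name from `username` keeps the notebook runnable for other lab users without editing the cell.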



In [4]:
%%sql

DROP TABLE IF EXISTS orders

++
||
++
++



In [5]:
%%sql
DROP TABLE IF EXISTS order_items

++
||
++
++



In [6]:
%%sql
DROP TABLE IF EXISTS categories

++
||
++
++



In [7]:
%%sql
DROP TABLE IF EXISTS customers

++
||
++
++



In [8]:
%%sql
DROP TABLE IF EXISTS products

++
||
++
++



In [9]:
%%sql
DROP TABLE IF EXISTS departments

++
||
++
++



In [10]:
%%sql

CREATE TABLE orders (
    order_id INT,
    order_date STRING,
    order_customer_id INT,
    order_status STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [11]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/orders' INTO TABLE orders

++
||
++
++



In [12]:
%%sql 

CREATE TABLE order_items (
    order_item_id INT,
    order_item_order_id INT,
    order_item_product_id INT,
    order_item_quantity INT,
    order_item_subtotal FLOAT,
    order_item_product_price FLOAT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [13]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/order_items' INTO TABLE order_items

++
||
++
++



In [14]:
%%sql

CREATE TABLE customers (
customer_id INT,
customer_first_name STRING,
customer_last_name STRING,
customer_email STRING,
customer_password STRING,
customer_street STRING,
customer_city STRING,
customer_state STRING,
customer_zipcode INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [15]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/customers' INTO TABLE customers

++
||
++
++



In [16]:
import sys.process._
val username = System.getProperty("user.name")
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/customers"!

Found 1 items
-rwxr-xr-x   3 itv002461 supergroup     953719 2022-06-03 02:12 /user/itv002461/warehouse/itv002461_retail.db/customers/part-00000


username = itv002461




0

In [17]:
%%sql

CREATE TABLE products (
product_id INT,
product_category_id INT,
product_name STRING,
product_description STRING,
product_price FLOAT,
product_image STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [18]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/products' INTO TABLE products

++
||
++
++



In [19]:
%%sql
select * from products


+----------+-------------------+--------------------+-------------------+-------------+--------------------+
|product_id|product_category_id|        product_name|product_description|product_price|       product_image|
+----------+-------------------+--------------------+-------------------+-------------+--------------------+
|         1|                  2|Quest Q64 10 FT. ...|                   |        59.98|http://images.acm...|
|         2|                  2|Under Armour Men'...|                   |       129.99|http://images.acm...|
|         3|                  2|Under Armour Men'...|                   |        89.99|http://images.acm...|
|         4|                  2|Under Armour Men'...|                   |        89.99|http://images.acm...|
|         5|                  2|Riddell Youth Rev...|                   |       199.99|http://images.acm...|
|         6|                  2|Jordan Men's VI R...|                   |       134.99|http://images.acm...|
|         7|       

In [20]:
import sys.process._
val username = System.getProperty("user.name")
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/products"!

Found 1 items
-rwxr-xr-x   3 itv002461 supergroup     174155 2022-06-03 02:12 /user/itv002461/warehouse/itv002461_retail.db/products/part-00000


username = itv002461




0

In [21]:
%%sql

CREATE TABLE categories (
category_id INT,
category_department_id INT,
category_name STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [22]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/categories' INTO TABLE categories

++
||
++
++



In [23]:
import sys.process._
val username = System.getProperty("user.name")
s"hdfs dfs -ls /user/${username}/warehouse/${username}_retail.db/categories"!

Found 1 items
-rwxr-xr-x   3 itv002461 supergroup       1029 2022-06-03 02:12 /user/itv002461/warehouse/itv002461_retail.db/categories/part-00000


username = itv002461




0

In [24]:
%%sql

CREATE TABLE departments (
department_id INT,
department_name STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

++
||
++
++



In [25]:
%%sql

LOAD DATA LOCAL INPATH '/data/retail_db/departments' INTO TABLE departments

++
||
++
++
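
Only some of the tables are checked with `hdfs dfs -ls` above. A minimal sketch of a quick sanity check over all six tables follows, assuming the notebook's Spark session is active and the retail database is the current one. It prints row counts only and makes no claim about what the counts should be.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session already exists in the notebook, so getOrCreate returns it.
val spark = SparkSession.builder.getOrCreate()

// The six tables created and loaded in the cells above.
val tables = Seq("orders", "order_items", "customers", "products", "categories", "departments")

// Count the rows in each table; a count of 0 would suggest a failed or skipped load.
tables.foreach { t =>
  val n = spark.table(t).count()
  println(s"$t: $n rows")
}
```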



### Exercise 1 - Customer order count

Get the order count per customer for January 2014.
* Tables - orders and customers
* Data should be sorted in descending order by count and ascending order by customer id.
* Output should contain customer_id, customer_first_name, customer_last_name and customer_order_count.

In [26]:
%%sql

SELECT c.customer_id,
c.customer_first_name,
c.customer_last_name,
count(o.order_id) AS customer_order_count
FROM orders o JOIN customers c
ON o.order_customer_id = c.customer_id
WHERE date_format(o.order_date, 'yyyy-MM') = '2014-01'
GROUP BY c.customer_id,c.customer_first_name,c.customer_last_name
ORDER BY customer_order_count DESC, c.customer_id
LIMIT 10


+-----------+-------------------+------------------+--------------------+
|customer_id|customer_first_name|customer_last_name|customer_order_count|
+-----------+-------------------+------------------+--------------------+
|       8622|            Shirley|             Smith|                   5|
|       9676|            Theresa|             Smith|                   5|
|          7|            Melissa|            Wilcox|                   4|
|        222|              Frank|              Ruiz|                   4|
|       2444|            Kenneth|             Smith|                   4|
|       2485|               Mary|         Hernandez|                   4|
|       2555|               Mary|              Long|                   4|
|       3128|              Karen|            Turner|                   4|
|       3199|             Ashley|         Hernandez|                   4|
|       3610|             Jordan|             Smith|                   4|
+-----------+-------------------+------------------+--------------------+

### Exercise 2 - Dormant Customers

Get the details of customers who have not placed any orders in January 2014.
* Tables - orders and customers
* Data should be sorted in ascending order by customer_id
* Output should contain all the fields from customers

In [31]:
%%sql

SELECT count(*)
FROM customers c LEFT OUTER JOIN
(SELECT DISTINCT order_customer_id
FROM orders WHERE
date_format(order_date, 'yyyy-MM') = '2014-01'
) o
ON c.customer_id = o.order_customer_id
WHERE o.order_customer_id IS NULL
LIMIT 10

+--------+
|count(1)|
+--------+
|    7739|
+--------+
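
The query above reports only how many dormant customers there are. A minimal sketch of a query that returns the customer details sorted by `customer_id`, as the exercise asks, follows. It reuses the same join and filter, assumes the notebook's Spark session is active, and uses `dormantCustomers` as an illustrative name.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session already exists in the notebook, so getOrCreate returns it.
val spark = SparkSession.builder.getOrCreate()

// Keep customers with no matching order in 2014-01 (the outer join leaves NULL on the right side).
val dormantCustomers = spark.sql("""
  SELECT c.*
  FROM customers c
    LEFT OUTER JOIN (
      SELECT DISTINCT order_customer_id
      FROM orders
      WHERE date_format(order_date, 'yyyy-MM') = '2014-01'
    ) o
      ON c.customer_id = o.order_customer_id
  WHERE o.order_customer_id IS NULL
  ORDER BY c.customer_id
""")

// Preview the first 10 rows.
dormantCustomers.show(10)
```

Calling `dormantCustomers.count()` gives a number that can be compared with the count reported above, since the join and filter are the same.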



### Exercise 3 - Revenue Per Customer

Get the revenue generated by each customer for January 2014.
* Tables - orders, order_items and customers
* Data should be sorted in descending order by revenue and then ascending order by customer_id
* Output should contain customer_id, customer_first_name, customer_last_name, customer_revenue.
* If a customer has placed no orders, the revenue for that customer should be 0.
* Consider only COMPLETE and CLOSED orders

In [35]:
%%sql

SELECT c.customer_id, c.customer_first_name, c.customer_last_name,
CASE
WHEN (round(sum(oi.order_item_subtotal),2)) IS NULL THEN 0
ELSE (round(sum(oi.order_item_subtotal),2)) END AS customer_revenue
FROM customers c
LEFT OUTER JOIN orders o
ON c.customer_id = o.order_customer_id
LEFT OUTER JOIN order_items oi
ON o.order_id = oi.order_item_order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
AND date_format(o.order_date, 'yyyy-MM') = '2014-01'
GROUP BY c.customer_id, c.customer_first_name, c.customer_last_name
ORDER BY customer_revenue DESC, c.customer_id


+--------+
|count(1)|
+--------+
|    2300|
+--------+
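
In the query above, the `WHERE` clause filters on columns of `o`. For customers without a matching order those columns are NULL, so the rows are discarded and the left outer joins behave like inner joins. Customers with no qualifying orders therefore never reach the `CASE` expression. A minimal sketch of one way to keep them follows: the order conditions move into the `ON` clause and `coalesce` supplies the 0. It assumes the notebook's Spark session is active, and `customerRevenue` is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session already exists in the notebook, so getOrCreate returns it.
val spark = SparkSession.builder.getOrCreate()

// Filtering inside ON keeps every customer; unmatched customers get NULL order columns.
val customerRevenue = spark.sql("""
  SELECT c.customer_id, c.customer_first_name, c.customer_last_name,
         coalesce(round(sum(oi.order_item_subtotal), 2), 0) AS customer_revenue
  FROM customers c
    LEFT OUTER JOIN orders o
      ON c.customer_id = o.order_customer_id
     AND o.order_status IN ('COMPLETE', 'CLOSED')
     AND date_format(o.order_date, 'yyyy-MM') = '2014-01'
    LEFT OUTER JOIN order_items oi
      ON o.order_id = oi.order_item_order_id
  GROUP BY c.customer_id, c.customer_first_name, c.customer_last_name
  ORDER BY customer_revenue DESC, c.customer_id
""")

// Preview the top 10 customers by revenue.
customerRevenue.show(10)
```

With this form the result should contain one row per customer, so `customerRevenue.count()` can be checked against the row count of `customers`.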



### Exercise 4 - Revenue Per Category

Get the revenue generated for each category for January 2014.
* Tables - orders, order_items, products and categories
* Data should be sorted in ascending order by category_id.
* Output should contain all the fields from categories along with the revenue as category_revenue.
* Consider only COMPLETE and CLOSED orders

In [29]:
%%sql
SELECT c.*,
round(sum(oi.order_item_subtotal), 2) AS category_revenue
FROM categories c JOIN products p
ON c.category_id = p.product_category_id
JOIN order_items oi
ON p.product_id = oi.order_item_product_id
JOIN orders o
ON oi.order_item_order_id = o.order_id
WHERE date_format(o.order_date, 'yyyy-MM') = '2014-01'
AND o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY c.category_id,c.category_department_id,c.category_name
ORDER BY c.category_id
LIMIT 10


+-----------+----------------------+-------------------+----------------+
|category_id|category_department_id|      category_name|category_revenue|
+-----------+----------------------+-------------------+----------------+
|          2|                     2|             Soccer|         1094.88|
|          3|                     2|Baseball & Softball|         3214.41|
|          4|                     2|         Basketball|         1299.98|
|          5|                     2|           Lacrosse|         1299.69|
|          6|                     2|   Tennis & Racquet|         1124.75|
|          7|                     2|             Hockey|          1433.0|
|          9|                     3|   Cardio Equipment|       133156.77|
|         10|                     3|  Strength Training|         3388.96|
|         11|                     3|Fitness Accessories|         1509.73|
|         12|                     3|       Boxing & MMA|         3998.46|
+-----------+----------------------+-------------------+----------------+

### Exercise 5 - Product Count Per Department

Get the product count for each department.
* Tables - departments, categories, products
* Data should be sorted in ascending order by department_id
* Output should contain all the fields from departments and the product count as product_count

In [30]:
%%sql
SELECT d.*,
count(p.product_id) AS product_count
FROM departments d
JOIN categories c
ON d.department_id = c.category_department_id
JOIN products p
ON c.category_id = p.product_category_id
GROUP BY d.department_id,d.department_name
ORDER BY d.department_id

+-------------+---------------+-------------+
|department_id|department_name|product_count|
+-------------+---------------+-------------+
|            2|        Fitness|          168|
|            3|       Footwear|          168|
|            4|        Apparel|          140|
|            5|           Golf|          120|
|            6|       Outdoors|          336|
|            7|       Fan Shop|          149|
+-------------+---------------+-------------+

