## Joining in Snowflake

In [None]:
JOINS 
INNER JOIN 

OUTER JOINS
LEFT OUTER JOIN or LEFT JOIN
RIGHT OUTER JOIN or RIGHT JOIN
FULL OUTER JOIN or FULL JOIN

CROSS JOINS
SELF JOINS
NATURAL JOIN
LATERAL JOIN

#### NATURAL JOIN
NATURAL JOIN automatically matches columns that share the same name in both tables and returns each matched column only once
Syntax:

In [None]:
Without NATURAL JOIN 
SELECT * FROM pizzas AS p 
JOIN  pizza_type AS t 
ON t.pizza_type_id = p.pizza_type_id

With NATURAL JOIN 
SELECT * FROM pizzas AS p 
NATURAL JOIN pizza_type AS t

In [None]:
SELECT * FROM pizzas AS p 
NATURAL JOIN pizza_type AS t
WHERE pizza_type_id ='bbq_ckn'
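
To try NATURAL JOIN without access to the course tables, here is a minimal sketch. The rows in the two CTEs are made-up sample data (the names and prices are assumptions, not values from the real dataset). Because `pizza_type_id` is the only column name the two inputs share, it becomes the join key and appears once in the `SELECT *` output.

```sql
-- Hypothetical sample rows standing in for the pizzas and pizza_type tables
WITH pizzas AS (
    SELECT * FROM (VALUES
        ('bbq_ckn_s',  'bbq_ckn',  12.75),
        ('hawaiian_s', 'hawaiian', 10.50)
    ) AS v (pizza_id, pizza_type_id, price)
),
pizza_type AS (
    SELECT * FROM (VALUES
        ('bbq_ckn',  'The Barbecue Chicken Pizza', 'Chicken'),
        ('hawaiian', 'The Hawaiian Pizza',         'Classic')
    ) AS v (pizza_type_id, name, category)
)
SELECT *
FROM pizzas AS p
-- No ON clause: the join key is every column name the two inputs have in common
NATURAL JOIN pizza_type AS t
-- The shared column is referenced without a table alias
WHERE pizza_type_id = 'bbq_ckn';
```

One thing to keep in mind: NATURAL JOIN uses all identically named columns, so an unrelated column that happens to share a name in both tables silently becomes part of the join condition.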

#### LATERAL JOIN 
LATERAL JOIN: lets a subquery in FROM reference columns from preceding tables or views.

In [None]:
SELECT ... 
FROM <left_hand_expression>,
LATERAL (<right_hand_expression>) 

left_hand_expression - Table, view, or subquery
right_hand_expression - Inline view or subquery

In [None]:
SELECT p.pizza_id, lat.name, lat.category 
FROM pizzas AS p, 
LATERAL -- Keyword LATERAL
( SELECT * FROM pizza_type AS t 
-- Referencing outer query column: p.pizza_type_id
WHERE p.pizza_type_id = t.pizza_type_id 
) AS lat

#### Why LATERAL JOIN?

In [None]:
SELECT * 
FROM orders AS o, 
LATERAL (-- Subquery calculating total_spent 
         SELECT 
             SUM(p.price * od.quantity) AS total_spent
         FROM order_details AS od 
         JOIN pizzas AS p 
             ON od.pizza_id = p.pizza_id
         WHERE o.order_id = od.order_id
) AS t
ORDER BY o.order_id
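
An ordinary subquery in FROM cannot see `o.order_id`; LATERAL is what allows the subquery to be evaluated against each row of `orders`. The sketch below repeats the pattern on a few made-up rows so it can be run on its own. All values in the CTEs are hypothetical sample data.

```sql
-- Hypothetical sample rows standing in for orders, order_details and pizzas
WITH orders AS (
    SELECT * FROM (VALUES (1), (2)) AS v (order_id)
),
order_details AS (
    SELECT * FROM (VALUES
        (1, 1, 'hawaiian_m', 1),
        (2, 2, 'bbq_ckn_s',  2),
        (3, 2, 'hawaiian_m', 1)
    ) AS v (order_details_id, order_id, pizza_id, quantity)
),
pizzas AS (
    SELECT * FROM (VALUES
        ('hawaiian_m', 13.25),
        ('bbq_ckn_s',  12.75)
    ) AS v (pizza_id, price)
)
SELECT o.order_id, t.total_spent
FROM orders AS o,
LATERAL (
    -- Evaluated for each row of orders because it references o.order_id
    SELECT SUM(p.price * od.quantity) AS total_spent
    FROM order_details AS od
    JOIN pizzas AS p
        ON od.pizza_id = p.pizza_id
    WHERE od.order_id = o.order_id
) AS t
ORDER BY o.order_id;
```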

### NATURAL JOIN
Pissa, the ever-expanding pizza delivery enterprise, has a new challenge for you. They're interested in discovering which type of pizza generates the most revenue.

Here is the pizza schema for reference:

In [None]:
SELECT
    -- Get the pizza category
    category,
    SUM(p.price * od.quantity) AS total_revenue
FROM order_details AS od
NATURAL JOIN pizzas AS p
-- NATURAL JOIN the pizza_type table
NATURAL JOIN pizza_type AS pt
-- GROUP the records by category
GROUP BY category
-- ORDER by total_revenue and limit the records
ORDER BY total_revenue DESC
LIMIT 1;

### The world of JOINS
Previously, you generated insights for Pissa around their sales and revenue by month. Now, you'll look into their most popular pizzas.

Apply your knowledge of joins to get the desired result.

In [None]:
SELECT COUNT(o.order_id) AS total_orders,
        AVG(p.price) AS average_price,
        -- Calculate total revenue
        SUM(p.price * od.quantity) AS total_revenue	
FROM orders AS o
LEFT JOIN order_details AS od
ON o.order_id = od.order_id
-- Use an appropriate JOIN with the pizzas table
RIGHT JOIN pizzas AS p
ON od.pizza_id = p.pizza_id

In [None]:
SELECT COUNT(o.order_id) AS total_orders,
        AVG(p.price) AS average_price,
        -- Calculate total revenue
        SUM(p.price * od.quantity) AS total_revenue,
        -- Get the name from the pizza_type table
        pt.name AS pizza_name
FROM orders AS o
LEFT JOIN order_details AS od
ON o.order_id = od.order_id
-- Use an appropriate JOIN with the pizzas table
RIGHT JOIN pizzas AS p
ON od.pizza_id = p.pizza_id
-- NATURAL JOIN the pizza_type table
NATURAL JOIN pizza_type AS pt
GROUP BY pt.name, pt.category
ORDER BY total_revenue DESC, total_orders DESC

## Subquerying and Common Table Expressions

In [None]:
Subquerying 
Nested queries 
Used in FROM, WHERE, HAVING or SELECT clauses

SELECT column1 
FROM table1 
WHERE column1 = (SELECT column2 FROM table2 WHERE condition)

In [None]:
Uncorrelated subquery
-- Main query returns pizzas priced at the maximum value found in the subquery

SELECT pizza_id 
FROM pizzas
-- Uncorrelated subquery that identifies the highest pizza price
WHERE price = (SELECT MAX(price) FROM pizzas)

In [None]:
Correlated subquery 
Subquery references columns from the main query

SELECT pt.name, pz.price, pt.category
FROM pizzas AS pz 
JOIN pizza_type AS pt ON pz.pizza_type_id = pt.pizza_type_id
WHERE pz.price = (
    -- Identifies the max price for each pizza type
    SELECT MAX(p2.price)
    FROM pizzas AS p2
    -- Correlated: uses the outer query column pz.pizza_type_id
    WHERE p2.pizza_type_id = pz.pizza_type_id
)

In [None]:
Common Table Expressions

WITH max_price AS ( -- CTE called max_price
    SELECT pizza_type_id, MAX(price) AS max_price
    FROM pizzas
    GROUP BY pizza_type_id
)
-- Main query 
SELECT pt.name, pz.price, pt.category
FROM pizzas AS pz
JOIN pizza_type AS pt ON pz.pizza_type_id = pt.pizza_type_id
JOIN max_price AS mp  -- Joining with CTE max_price
ON pt.pizza_type_id = mp.pizza_type_id
WHERE pz.price < mp.max_price -- Compare the price with max_price CTE column
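
Note that the CTE query above filters with `<`, so it returns the pizzas priced below the maximum for their type. A CTE can also stand in for a correlated subquery that keeps only the top-priced pizza of each type: compute the per-type maximum once, join to it, and compare with `=`. The sketch below shows that variant on made-up rows (all values are hypothetical).

```sql
-- Hypothetical sample rows standing in for the pizzas and pizza_type tables
WITH pizzas AS (
    SELECT * FROM (VALUES
        ('hawaiian_s', 'hawaiian', 10.50),
        ('hawaiian_l', 'hawaiian', 16.50),
        ('bbq_ckn_s',  'bbq_ckn',  12.75),
        ('bbq_ckn_l',  'bbq_ckn',  20.75)
    ) AS v (pizza_id, pizza_type_id, price)
),
pizza_type AS (
    SELECT * FROM (VALUES
        ('hawaiian', 'The Hawaiian Pizza',         'Classic'),
        ('bbq_ckn',  'The Barbecue Chicken Pizza', 'Chicken')
    ) AS v (pizza_type_id, name, category)
),
-- Per-type maximum, computed once instead of once per outer row
max_price AS (
    SELECT pizza_type_id, MAX(price) AS max_price
    FROM pizzas
    GROUP BY pizza_type_id
)
SELECT pt.name, pz.price, pt.category
FROM pizzas AS pz
JOIN pizza_type AS pt ON pz.pizza_type_id = pt.pizza_type_id
JOIN max_price AS mp ON pz.pizza_type_id = mp.pizza_type_id
-- Keep only the pizzas priced at the maximum for their type
WHERE pz.price = mp.max_price;
```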


### Subqueries
Pissa, the expanding pizza delivery enterprise, is now using your expertise to identify some trends.

They want to streamline their pizza offerings by identifying under-performing pizzas. Your task is to find the pizza types ordered less frequently than the average for all types.

In [None]:
SELECT pt.name,
    pt.category,
    SUM(od.quantity) AS total_orders
FROM pizza_type pt
JOIN pizzas p
    ON pt.pizza_type_id = p.pizza_type_id
JOIN order_details od
    ON p.pizza_id = od.pizza_id
GROUP BY ALL
HAVING SUM(od.quantity) < (
  -- Calculate AVG of total_quantity
  SELECT AVG(total_quantity)
  FROM (
    -- Calculate total_quantity
    SELECT SUM(od.quantity) AS total_quantity
    FROM pizzas p
    JOIN order_details od
        ON p.pizza_id = od.pizza_id
    GROUP BY p.pizza_id
    -- Alias as subquery
  ) AS subquery
)

### Common Table Expressions
Pissa, the company you're consulting for, is planning a promotional campaign and needs your expertise.

The campaign aims to spotlight their most popular pizza based on total orders.

They're also considering introducing a value meal featuring their least expensive pizza.

Your task as a consulting data engineer is to identify both these pizzas.

In [None]:
-- Create a CTE named most_ordered and limit the results 
WITH most_ordered AS (
    SELECT pizza_id, SUM(quantity) AS total_qty 
    FROM order_details
    GROUP BY pizza_id
    ORDER BY total_qty DESC
    LIMIT 1
)
-- Create CTE cheapest_pizza where price is equal to min price from pizzas table
, cheapest_pizza AS (
    SELECT pizza_id, price
    FROM pizzas 
    WHERE price = (SELECT MIN(price) FROM pizzas)
    LIMIT 1
)

SELECT pizza_id, 'Most Ordered' AS Description, total_qty AS metric
-- Select from the most_ordered CTE
FROM most_ordered
UNION ALL
SELECT pizza_id, 'Cheapest' AS Description, price AS metric
-- Select from the cheapest_pizza CTE
FROM cheapest_pizza

## Snowflake Query Optimization

In [None]:
Common query problems
UNION or UNION ALL: Know the difference 
UNION removes duplicate rows, which adds work and slows down the query
UNION ALL skips de-duplication, so it is faster; use it when the inputs contain no duplicates

Handling big data 
Use filters to narrow down data
Apply limits for quicker results
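
The difference between the two set operators is easy to see by counting rows. In this minimal sketch, `january` and `february` are hypothetical CTEs with one overlapping value.

```sql
-- Hypothetical sample data: 'hawaiian_m' appears in both inputs
WITH january AS (
    SELECT * FROM (VALUES ('hawaiian_m'), ('bbq_ckn_s')) AS v (pizza_id)
),
february AS (
    SELECT * FROM (VALUES ('hawaiian_m'), ('veggie_l')) AS v (pizza_id)
)
-- UNION removes the duplicate 'hawaiian_m' row: 3 rows
SELECT 'UNION' AS set_operator, COUNT(*) AS row_count
FROM (
    SELECT pizza_id FROM january
    UNION
    SELECT pizza_id FROM february
) AS u
UNION ALL
-- UNION ALL keeps every input row: 4 rows
SELECT 'UNION ALL', COUNT(*)
FROM (
    SELECT pizza_id FROM january
    UNION ALL
    SELECT pizza_id FROM february
) AS ua;
```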


In [None]:
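How to optimize queries? 
-- Use LIMIT to return only a few rows while exploring
SELECT * 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS
LIMIT 10;

-- Select only the columns you need instead of SELECT *
SELECT o_orderdate, o_orderstatus 
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS;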

In [None]:
With early filtering

WITH filtered_orders AS (
    SELECT *
    FROM orders
    WHERE order_date = '2015-01-01' -- Filtering in CTE before JOIN
)
SELECT filtered_orders.order_id, filtered_orders.order_date, pizza_type.name, pizzas.pizza_size
FROM filtered_orders -- Joining with CTE
JOIN order_details ON filtered_orders.order_id = order_details.order_id
JOIN pizzas ON order_details.pizza_id = pizzas.pizza_id
-- pizza_type must be joined because the SELECT list uses pizza_type.name
JOIN pizza_type ON pizzas.pizza_type_id = pizza_type.pizza_type_id

In [None]:
Query History 
snowflake.account_usage.query_history
Query History provides metrics for each query, such as execution time (in milliseconds)
ILIKE: Case-insensitive string-matching

-- Queries whose text mentions order_details
SELECT query_text, start_time, end_time, execution_time
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE '%order_details%';

-- Queries that executed for more than 1 second (1000 ms)
SELECT query_text,
start_time,
end_time,
execution_time
FROM snowflake.account_usage.query_history
WHERE execution_time > 1000;
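
A common next step is to rank recent queries by how long they ran. The sketch below assumes your role can read the shared SNOWFLAKE database. The seven-day window and the limit of 10 are arbitrary choices for illustration.

```sql
-- Ten longest-running queries of the last 7 days (window and limit are illustrative)
SELECT query_id,
       query_text,
       start_time,
       execution_time / 1000 AS execution_seconds -- execution_time is stored in milliseconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY execution_time DESC
LIMIT 10;
```

Filtering on `start_time` first follows the same early-filtering idea as above: the view can hold a long history, so narrowing the time range keeps the scan small.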

### Early filtering
Pissa has now asked for your expertise to optimize the performance of their database queries. They suspect that their existing queries are not efficient enough and take too long to run.

The goal is to retrieve the orders made after November 01, 2015, and only the pizzas in the 'Veggie' category.

Complete the given SQL query by implementing early filtering techniques.

In [None]:
WITH filtered_orders AS (
  SELECT order_id, order_date 
  FROM orders 
  -- Filter records where order_date is greater than November 1, 2015
  WHERE order_date > '2015-11-01'
)

, filtered_pizza_type AS (
  SELECT name, pizza_type_id 
  FROM pizza_type 
  -- Filter the pizzas which are in the Veggie category
  WHERE category = 'Veggie'
)

SELECT fo.order_id, fo.order_date, fpt.name, od.quantity
-- Get the details from filtered_orders CTE
FROM filtered_orders AS fo
JOIN order_details AS od ON fo.order_id = od.order_id
JOIN pizzas AS p ON od.pizza_id = p.pizza_id
-- JOIN the filtered_pizza_type CTE on pizza_type_id
JOIN filtered_pizza_type AS fpt ON p.pizza_type_id = fpt.pizza_type_id

## Handling semi-structured data

In [None]:
Introducing JSON 
JavaScript Object Notation
Common use cases: Web APIs and Config files
JSON data structure:
Key-Value Pairs, e.g., cust_id: 1

In [None]:
Comparisons:
Postgres: Uses JSONB
Snowflake: Uses VARIANT

In [None]:
How Snowflake stores JSON data 
VARIANT supports OBJECT and ARRAY data types
OBJECT: { "key": "value"}
ARRAY: ["list", "of", "values"]

Creating a Snowflake Table to handle JSON data
CREATE TABLE cust_info_json_data (
    customer_id INT,
    customer_info VARIANT -- VARIANT data type
);

In [None]:
Semi-structured data functions 
PARSE_JSON 
Syntax: PARSE_JSON( <expr> )
expr: JSON data in string format
Returns: VARIANT type, valid JSON object

SELECT PARSE_JSON( -- The JSON document is passed as a single-quoted string
    '{"cust_id": 1,
      "cust_name": "cust1",
      "cust_age": 40,
      "cust_email": "cust1***@gmail.com"
    }'
) AS customer_info_json

In [None]:
OBJECT_CONSTRUCT 
Syntax: OBJECT_CONSTRUCT( [<key1>, <value1> [, <keyN>, <valueN> ...]] )
Returns: JSON object

SELECT OBJECT_CONSTRUCT(-- Comma separated values rather than : notation
'cust_id', 1,
'cust_name', 'cust1',
'cust_age', 40,
'cust_email', 'cust1***@gmail.com'  
)
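
Both functions can be used to load rows into a VARIANT column. The sketch below uses a hypothetical temporary table named `cust_info_demo` and made-up customers. It loads with `INSERT ... SELECT` because these function calls are, as far as I know, not accepted inside a plain `VALUES` clause.

```sql
-- Hypothetical temporary table so the sketch does not touch existing objects
CREATE OR REPLACE TEMPORARY TABLE cust_info_demo (
    customer_id INT,
    customer_info VARIANT
);

-- Row built from a JSON string
INSERT INTO cust_info_demo
SELECT 1, PARSE_JSON('{"cust_id": 1, "cust_name": "cust1", "cust_age": 40}');

-- Row built from key/value arguments; the explicit cast to VARIANT is optional
INSERT INTO cust_info_demo
SELECT 2, OBJECT_CONSTRUCT('cust_id', 2, 'cust_name', 'cust2', 'cust_age', 35)::VARIANT;

-- Both rows can be queried the same way
SELECT customer_id,
       customer_info:cust_name::STRING AS cust_name,
       customer_info:cust_age::INT AS cust_age
FROM cust_info_demo
ORDER BY customer_id;
```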

In [None]:
Querying JSON data in Snowflake 
Simple JSON: access a top-level key with colon (:) notation

SELECT  
customer_info:cust_age, -- Use colon to access cust_age from column 
customer_info:cust_name, 
customer_info:cust_email
FROM cust_info_json_data;

In [None]:
Querying nested JSON using colon/dot notations
Accessing values using colon notation
<column>:<level1_element>:<level2_element>:<level3_element> 

SELECT
    customer_info:address:street AS street_name 
FROM cust_info_json_data


Accessing values using dot notation (the first level after the column still uses a colon)
<column>:<level1_element>.<level2_element>.<level3_element>

SELECT
    customer_info:address.street AS street_name 
FROM cust_info_json_data
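
The two notations can be compared side by side on a single made-up document. In this sketch the CTE reuses the name `cust_info_json_data` purely as a stand-in for the table, and the address values are hypothetical. Path expressions return VARIANT values (strings are shown with double quotes), so the last column adds a `::STRING` cast to get plain text.

```sql
-- Hypothetical one-row stand-in for the cust_info_json_data table
WITH cust_info_json_data AS (
    SELECT 1 AS customer_id,
           PARSE_JSON('{"cust_name": "cust1",
                        "address": {"street": "1 Hypothetical Street",
                                    "city": "Philadelphia"}}') AS customer_info
)
SELECT
    customer_info:address:street         AS street_colon, -- colon at every level
    customer_info:address.street         AS street_dot,   -- colon first, then dots
    customer_info:address.street::STRING AS street_text   -- cast VARIANT to STRING
FROM cust_info_json_data;
```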

### Querying JSON data
Yelpto, a leading platform for discovering local businesses, seeks your expertise as a consulting Data Engineer.

They aim to explore the restaurant industry, focusing on popular 5-star-rated restaurants that are open on weekends in Philadelphia.

You'll work with the yelp_business_data table, particularly the name, categories, attributes, and hours columns.

You can explore the yelp_business_data table in the SQL console.

In [None]:
SELECT name,
    review_count,
    -- Retrieve the Saturday hours
    hours:Saturday,
    -- Retrieve the Sunday hours
    hours:Sunday
FROM yelp_business_data
-- Filter for Restaurants
WHERE categories ILIKE '%Restaurant%'
    AND (hours:Saturday IS NOT NULL AND hours:Sunday IS NOT NULL)
    AND city = 'Philadelphia'
    AND stars = 5
ORDER BY review_count DESC

### JSONified
Semi-structured data can be challenging to query, so let's practice interacting with Yelpto's data once again.

Here, you will filter the contents of a VARIANT column.

In [None]:
SELECT business_id, name
FROM yelp_business_data
WHERE categories ILIKE '%Restaurant%'
    -- Filter where DogsAllowed is '%True%'
    AND attributes:DogsAllowed ILIKE '%True%'
    -- Filter where BusinessAcceptsCreditCards is '%True%'
    AND attributes:BusinessAcceptsCreditCards ILIKE '%True%'
    AND city ILIKE '%Philadelphia%'
    AND stars = 5