# Setup

Set up the DuckDB database and tables by running the command below. It creates the tables and loads the data, which can take a few minutes to complete.

**NOTE**: If you have already run `setup.py`, you can skip this step.


In [None]:
! pwd

In [None]:
! python ../../setup.py

Next, confirm that you can run SQL queries against the database:

In [1]:
# Load the extension
%load_ext sql

In [2]:
# Connect to DuckDB
%sql duckdb:///../../tpch.db

In [3]:
%config SqlMagic.displaylimit = None

In [4]:
%%sql
-- List the tables in the main schema
SELECT
  table_name
FROM
  information_schema.tables
WHERE
  table_schema = 'main'

table_name
customer
lineitem
nation
orders
part
partsupp
region
supplier


## Data Model

Throughout this workshop, we will use the TPC-H data to run queries.

The TPC-H dataset is commonly used to benchmark database performance. It models a car parts seller’s data warehouse, which records orders, the items that make up each order (lineitem), suppliers (supplier), customers (customer), the parts sold (part), regions (region), nations (nation), and which supplier provides which part (partsupp).

**Note**: Keep a copy of the data model (the table schemas and how they relate to each other) at hand as you follow along; it will help you understand the examples and answer the exercise questions.


![](../../images/tpch_erd.png)


# CTEs (Common Table Expressions) can improve readability and reduce code repetition

## [WHY] CTEs make testing complex queries simpler

* A CTE is a named `SELECT` statement, defined with `WITH`, that can be referenced one or more times within a single query.

* Complex SQL queries often involve multiple sub-queries, and nesting several sub-queries makes the code hard to read.

* Use a Common Table Expression (CTE) to give each intermediate result a name and keep your queries readable.
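To make the "reused in a single query" point concrete, here is a minimal sketch that references one CTE twice. It is not part of the workshop setup: it uses Python's built-in `sqlite3` module as a stand-in for DuckDB so it runs without `tpch.db`, and the three-row `orders` table is made-up illustration data. The `WITH` syntax shown is the same in both engines.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for DuckDB,
# and this tiny orders table is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_totalprice REAL);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 10, 50.0), (3, 20, 30.0);
    """
)

# customer_totals is defined once and referenced twice:
# once as the main source, once to compute the average to compare against.
cte_sql = """
WITH customer_totals AS (
    SELECT o_custkey, SUM(o_totalprice) AS total_spent
    FROM orders
    GROUP BY o_custkey
)
SELECT o_custkey, total_spent
FROM customer_totals
WHERE total_spent > (SELECT AVG(total_spent) FROM customer_totals)
ORDER BY o_custkey
"""

# The same logic with sub-queries has to repeat the aggregation.
subquery_sql = """
SELECT o_custkey, total_spent
FROM (
    SELECT o_custkey, SUM(o_totalprice) AS total_spent
    FROM orders
    GROUP BY o_custkey
) AS customer_totals
WHERE total_spent > (
    SELECT AVG(total_spent)
    FROM (
        SELECT SUM(o_totalprice) AS total_spent
        FROM orders
        GROUP BY o_custkey
    ) AS customer_totals_again
)
ORDER BY o_custkey
"""

cte_rows = conn.execute(cte_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(cte_rows == subquery_rows)  # both queries return the same rows
```

Both queries return the customers whose total spend is above the average, but the sub-query version repeats the `GROUP BY` aggregation, while the CTE version states it once under a name.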


## [HOW] to define a CTE


In [6]:
%%sql
-- CTE definition
WITH
  supplier_nation_metrics AS ( -- CTE 1: the first CTE is introduced by the WITH keyword
    SELECT
      n.n_nationkey,
      SUM(l.l_quantity) AS num_supplied_parts
    FROM
      lineitem l
      JOIN supplier s ON l.l_suppkey = s.s_suppkey
      JOIN nation n ON s.s_nationkey = n.n_nationkey
    GROUP BY
      n.n_nationkey
  ),
  buyer_nation_metrics AS ( -- CTE 2: follows a comma and needs only a name; WITH is not repeated
    SELECT
      n.n_nationkey,
      SUM(l.l_quantity) AS num_purchased_parts
    FROM
      lineitem l
      JOIN orders o ON l.l_orderkey = o.o_orderkey
      JOIN customer c ON o.o_custkey = c.c_custkey
      JOIN nation n ON c.c_nationkey = n.n_nationkey
    GROUP BY
      n.n_nationkey
  )
SELECT -- No comma between the last CTE and the final SELECT
  n.n_name AS nation_name,
  s.num_supplied_parts,
  b.num_purchased_parts
FROM
  nation n
  LEFT JOIN supplier_nation_metrics s ON n.n_nationkey = s.n_nationkey
  LEFT JOIN buyer_nation_metrics b ON n.n_nationkey = b.n_nationkey
LIMIT 10;

| nation_name | num_supplied_parts | num_purchased_parts |
| --- | --- | --- |
| ALGERIA | 6454691.0 | 6117618.0 |
| ARGENTINA | 6339724.0 | 6087566.0 |
| BRAZIL | 6085551.0 | 6149174.0 |
| CANADA | 6296547.0 | 6168913.0 |
| EGYPT | 6385468.0 | 6024134.0 |
| ETHIOPIA | 5817697.0 | 6095241.0 |
| FRANCE | 6141618.0 | 6289987.0 |
| GERMANY | 6076474.0 | 6098776.0 |
| INDIA | 6347392.0 | 6102406.0 |
| INDONESIA | 6204759.0 | 6276420.0 |
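
Because each CTE is a named `SELECT`, you can check it on its own before the final query depends on it. The sketch below shows one way to do that: keep the `WITH` clause as is and swap the final `SELECT` for `SELECT * FROM <cte_name>`. It is an illustration under assumptions, not the workshop's setup: Python's built-in `sqlite3` stands in for DuckDB, and the tiny `nation`, `supplier`, and `lineitem` tables hold made-up rows that only mimic the TPC-H column names.

```python
import sqlite3

# Assumption: in-memory SQLite replaces DuckDB, with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nation (n_nationkey INTEGER, n_name TEXT);
    CREATE TABLE supplier (s_suppkey INTEGER, s_nationkey INTEGER);
    CREATE TABLE lineitem (l_orderkey INTEGER, l_suppkey INTEGER, l_quantity REAL);
    INSERT INTO nation VALUES (0, 'ALGERIA'), (1, 'ARGENTINA');
    INSERT INTO supplier VALUES (100, 0), (101, 1);
    INSERT INTO lineitem VALUES (1, 100, 5), (2, 100, 7), (3, 101, 4);
    """
)

# The WITH clause is kept in one place so both queries below share it.
cte_definitions = """
WITH supplier_nation_metrics AS (
    SELECT n.n_nationkey, SUM(l.l_quantity) AS num_supplied_parts
    FROM lineitem l
    JOIN supplier s ON l.l_suppkey = s.s_suppkey
    JOIN nation n ON s.s_nationkey = n.n_nationkey
    GROUP BY n.n_nationkey
)
"""

# Step 1: inspect the CTE by itself.
cte_rows = conn.execute(
    cte_definitions
    + "SELECT * FROM supplier_nation_metrics ORDER BY n_nationkey"
).fetchall()

# Step 2: once the CTE looks right, run the final query on top of it.
final_rows = conn.execute(
    cte_definitions
    + """
    SELECT n.n_name, s.num_supplied_parts
    FROM nation n
    LEFT JOIN supplier_nation_metrics s ON n.n_nationkey = s.n_nationkey
    ORDER BY n.n_name
    """
).fetchall()

print(cte_rows)
print(final_rows)
```

In the notebook itself, the same check amounts to temporarily replacing the final `SELECT` of the cell above with `SELECT * FROM supplier_nation_metrics` (or `buyer_nation_metrics`) and re-running it.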
