# Training: SQL (Intermediate users)
Welcome to the training notebook on using SQL.

This notebook is aimed at intermediate users who write more advanced queries to retrieve heavily wrangled data from SQL databases.

# What will this session cover?
This session will show you how to do the following things in SQL:

1. Data matrix/tidy data principles
1. Advanced filtering of data
1. Differences between tables and views
1. Three types of temporary tables
     - Local temporary table
     - Common table expression (CTE)
     - Global temporary table
1. Subquerying the data
1. Ordering records within their groups using a counter
1. Pivoting data from long to wide shape
1. Unpivoting data from wide to long shape


In [0]:
-- Set database to use
USE [HEFE-AN-DEV];


# 1. Data matrix/tidy data principles
From an analyst's perspective, it is best practice to store tables in data matrix (tidy data) format. A table is in this format when it satisfies two rules:
- Each variable is a column
- Each observation is a row

This is best practice because it standardises how data is organised, which makes the data cleaning process easier and faster.
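
As a rough illustration of why this helps, the sketch below summarises a small tidy table with a single `GROUP BY`. The CTE name `tidy_example` and the output column names are assumptions made for this sketch; the values mirror the treatment example used in this section rather than any table in the training database.

```sql
-- Illustrative sketch: tidy_example is a hypothetical inline table
WITH tidy_example AS
(
    SELECT *
    FROM
    (
        VALUES
            ('Jane Smith', 'a', NULL)
            ,('Jane Smith', 'b', 18)
            ,('Xi Tang', 'a', 4)
            ,('Xi Tang', 'b', 1)
            ,('Park Min Woo', 'a', 6)
            ,('Park Min Woo', 'b', 6)
    ) AS table_sub ([PersonName], [TreatmentType], [TreatmentValue])
)
SELECT
    [TreatmentType]
    ,COUNT([TreatmentValue]) AS [RecordedValues] -- COUNT(column) skips NULLs
    ,SUM([TreatmentValue]) AS [TotalValue]       -- SUM also ignores NULLs
FROM tidy_example
GROUP BY [TreatmentType];
```

With a wide layout, the same summary would need one expression per treatment column, and the query would have to change whenever a treatment is added.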

Messy datasets, by contrast, can each be messy in their own way:
> *Happy families are all alike; every unhappy family is unhappy in its own way* - Leo Tolstoy

From this perspective, a messy dataset carries an upfront cost: you have to work out how it is structured before you can clean it.

For a more thorough, example-laden discussion of tidy data principles, see the [tidy data vignette](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html) from the tidyr package.

In [4]:
-- Messy format A: one row per person, one column per treatment
WITH table_messy_a AS 
(
    SELECT * 
    FROM 
    (
        VALUES
            ('Jane Smith', NULL, 18)
            ,('Xi Tang', 4, 1)
            ,('Park Min Woo', 6, 6)
    ) AS table_sub ([PersonName], [Treatment_a], [Treatment_b])
)
SELECT * 
FROM table_messy_a;

-- Messy format B: one row per treatment, one column per person
WITH table_messy_b AS
(
    SELECT *
    FROM 
    (
        VALUES
            ('a', NULL, 4, 6)
            ,('b', 18, 1, 6)
    ) AS table_sub ([Treatment], [JaneSmith], [XiTang], [ParkMinWoo])
)
SELECT * 
FROM table_messy_b;

In [5]:
-- Same data in tidy format: one row per person-treatment observation
WITH table_tidy AS
(
    SELECT *
    FROM
    (
        VALUES
            ('Jane Smith', 'a', NULL)
            ,('Jane Smith', 'b', 18)
            ,('Xi Tang', 'a', 4)
            ,('Xi Tang', 'b', 1)
            ,('Park Min Woo', 'a', 6)
            ,('Park Min Woo', 'b', 6)
    ) AS table_sub ([PersonName], [TreatmentType], [TreatmentValue])
)
SELECT *
FROM table_tidy;

PersonName,TreatmentType,TreatmentValue
Jane Smith,a,
Jane Smith,b,18.0
Xi Tang,a,4.0
Xi Tang,b,1.0
Park Min Woo,a,6.0
Park Min Woo,b,6.0
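
One possible way to get from the first messy layout to the tidy one is sketched below, using `CROSS APPLY` with a `VALUES` list to turn each treatment column into its own row. This is only one approach, shown under the assumption that the treatment columns are known in advance; the aliases `m` and `t` are chosen for this sketch.

```sql
-- Rebuild the first messy table so the example is self-contained
WITH table_messy_a AS
(
    SELECT *
    FROM
    (
        VALUES
            ('Jane Smith', NULL, 18)
            ,('Xi Tang', 4, 1)
            ,('Park Min Woo', 6, 6)
    ) AS table_sub ([PersonName], [Treatment_a], [Treatment_b])
)
SELECT
    m.[PersonName]
    ,t.[TreatmentType]
    ,t.[TreatmentValue]
FROM table_messy_a AS m
-- For every person row, emit one row per treatment column
CROSS APPLY
(
    VALUES
        ('a', m.[Treatment_a])
        ,('b', m.[Treatment_b])
) AS t ([TreatmentType], [TreatmentValue]);
```

Unlike the `UNPIVOT` operator, this form keeps rows whose value is `NULL`, so the missing treatment `a` result for Jane Smith still appears as an observation.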


## EXERCISE: Tidy data principles
**QUESTION:** Is the [Sales].[SpecialOffer] table in tidy data format? If it is not, how could you reshape the dataset so that it is? Please write your answer below.

In [0]:
-- Please write your answer below