In [2]:
%load_ext sql
DB_ENDPOINT = '127.0.0.1'
DB = 'sakila'
DB_USER = 'postgres'
DB_PASSWORD = '1234'
DB_PORT = '5432'

conn_string = 'postgresql://{}:{}@{}:{}/{}'\
                .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)
print(conn_string)
%sql $conn_string

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
postgresql://postgres:1234@127.0.0.1:5432/sakila


### Grouping Sets
With three dimensions, you often want to aggregate a fact at every level of detail:

- by nothing (total)
- then by the 1st dimension
- then by the 2nd
- then by the 3rd
- then by the 1st and 2nd
- then by the 2nd and 3rd
- then by the 1st and 3rd
- then by the 1st and 2nd and 3rd

Because this pattern is so common, and every one of these aggregations scans the whole fact table anyway, SQL offers a smarter way to compute them all in a single statement: the GROUPING SETS clause of GROUP BY.
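As a quick illustration of the eight aggregation levels listed above, the sketch below enumerates every subset of three dimensions and renders them as a GROUPING SETS clause. The names dim1, dim2, dim3 and both helper functions are placeholders invented for this example, not part of the sakila schema.

```python
from itertools import combinations


def all_grouping_sets(dimensions):
    """Return every subset of `dimensions`, from the empty set (grand total) to the full set."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def grouping_sets_clause(dimensions):
    """Render the subsets as a SQL GROUP BY GROUPING SETS clause."""
    rendered = ", ".join(
        "(" + ", ".join(combo) + ")" for combo in all_grouping_sets(dimensions)
    )
    return "GROUP BY GROUPING SETS (" + rendered + ")"


dims = ["dim1", "dim2", "dim3"]  # hypothetical dimension names
sets = all_grouping_sets(dims)
clause = grouping_sets_clause(dims)
print(len(sets))  # 2**3 = 8 grouping sets
print(clause)
```

The empty tuple corresponds to the grand total, and the count doubles with each extra dimension, which is why writing one query per level quickly becomes tedious.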

### Total Revenue
Write a query that calculates total revenue (sales_amount).

In [3]:
%%sql
SELECT sum(sales_amount) as revenue
FROM factSales

 * postgresql://postgres:***@127.0.0.1:5432/sakila
1 rows affected.


revenue
134833.02


### Revenue by Country

Write a query that calculates total revenue (sales_amount) by country.

In [4]:
%%sql
SELECT dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY dimStore.country
ORDER BY dimStore.country, revenue DESC

 * postgresql://postgres:***@127.0.0.1:5432/sakila
2 rows affected.


country,revenue
Australia,67453.54
Canada,67379.48


### Revenue by Month
Write a query that calculates total revenue (sales_amount) by month.

In [5]:
%%sql
SELECT dimDate.month, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
GROUP BY dimDate.month
ORDER BY dimDate.month, revenue DESC

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.


month,revenue
2,1028.36
5,9648.86
6,19263.76
7,56747.78
8,48144.26


### Revenue by Month & Country
Write a query that calculates total revenue (sales_amount) by month and country. Sort the data by month, country, and revenue in descending order. The first few rows of your output should match the table below.

In [6]:
%%sql
SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY dimDate.month, dimStore.country
ORDER BY dimDate.month, dimStore.country, revenue DESC
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.


month,country,revenue
2,Australia,542.16
2,Canada,486.2
5,Australia,4728.38
5,Canada,4920.48
6,Australia,9790.2


### Revenue Total, by Month, by Country, by Month & Country All in One Shot
Write a query that calculates total revenue at each of the grouping levels used above (total, by month, by country, by month & country) in a single statement using GROUPING SETS. Your output should match the table below.

In [7]:
%%sql
SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY GROUPING SETS ((), dimDate.month, dimStore.country, (dimDate.month, dimStore.country))
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.


month,country,revenue
,,134833.02
5.0,Canada,4920.48
7.0,Australia,28120.5
2.0,Australia,542.16
7.0,Canada,28627.28


### CUBE
- GROUP BY CUBE (dim1, dim2, ...) produces every combination of the listed dimensions, of every length, in one go.
- The result could be materialized as a view and queried later, which would save many repetitive aggregations.
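To make "all combinations in one go" concrete, here is a minimal pure-Python sketch of what CUBE computes, using a single pass over the rows. The three toy rows and the `cube` helper are invented for illustration; they are not taken from the sakila data and say nothing about how PostgreSQL implements CUBE internally.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rows: (month, country, sales_amount).
rows = [
    (5, "Canada", 10.0),
    (5, "Australia", 20.0),
    (6, "Canada", 30.0),
]


def cube(rows, n_dims):
    """Sum the last column for every subset of the first n_dims columns.

    A dimension that is rolled up is represented by None, much like the
    NULLs that appear in the SQL output.
    """
    totals = defaultdict(float)
    for row in rows:  # one pass over the "fact table"
        dims, amount = row[:n_dims], row[n_dims]
        for size in range(n_dims + 1):
            for kept in combinations(range(n_dims), size):
                key = tuple(dims[i] if i in kept else None for i in range(n_dims))
                totals[key] += amount
    return dict(totals)


result = cube(rows, 2)
for key, revenue in result.items():
    print(key, revenue)
```

Each input row contributes to four keys here (total, month only, country only, month and country), so two dimensions yield the same four levels as the grouping sets exercise.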

Write a query that calculates the same levels of aggregation as the grouping sets exercise (total, by month, by country, by month & country), this time using CUBE. Your output should match the table below.

In [8]:
%%sql
SELECT dimDate.month, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY CUBE (dimDate.month, dimStore.country)
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.


month,country,revenue
,,134833.02
5.0,Canada,4920.48
7.0,Australia,28120.5
2.0,Australia,542.16
7.0,Canada,28627.28


### Revenue Total, by Month, by Country, by Month & Country All in one shot, NAIVE way
The naive way to create the same table as above is to write several queries and UNION them together. Grouping sets and cubes produce queries that are shorter to write, easier to read, and usually faster. Run the naive query below and compare its run time with that of the CUBE query.
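One possible way to make that comparison repeatable is a small timing helper like the sketch below. The `best_time` function is a hypothetical helper written for this example; `run_query` stands for any zero-argument callable that executes one of the queries, and the stand-in workload exists only so the sketch runs without a database.

```python
import time


def best_time(run_query, repeats=3):
    """Call run_query() `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload (assumption): replace with a callable that runs the SQL.
elapsed = best_time(lambda: sum(range(100_000)))
print(f"{elapsed:.6f} s")
```

Taking the minimum of several runs reduces noise from caching and other activity on the machine, so the naive and CUBE queries can be compared on a fairer footing.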

In [9]:
%%sql
SELECT NULL as month, NULL as country, sum(sales_amount) as revenue
FROM factSales
    UNION ALL
SELECT NULL, dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY dimStore.country
    UNION ALL
SELECT cast(dimDate.month as text), NULL, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
GROUP BY dimDate.month
    UNION ALL
SELECT cast(dimDate.month as text), dimStore.country, sum(sales_amount) as revenue
FROM factSales
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimStore ON (dimStore.store_key = factSales.store_key)
GROUP BY dimDate.month, dimStore.country

 * postgresql://postgres:***@127.0.0.1:5432/sakila
18 rows affected.


month,country,revenue
,,134833.02
5.0,,9648.86
2.0,,1028.36
8.0,,48144.26
7.0,,56747.78
6.0,,19263.76
,Canada,67379.48
,Australia,67453.54
5.0,Canada,4920.48
7.0,Australia,28120.5
