# Exercise 02 - OLAP Cubes - Grouping Sets

All the database tables in this demo are based on public database samples and transformations:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

Start by connecting to the database by running the cells below. If you are coming back to this exercise, uncomment and run the `createdb`/`psql` cell to recreate the database. If you recently completed the slicing and dicing exercise, skip that cell and go straight to the connection cell.

In [1]:
import os
import sql

In [None]:
# !PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
# !PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql

### Connect to the local database where Pagila is loaded

In [3]:
%load_ext sql

# Connection parameters come from environment variables
host = os.environ["PGHOST"]
dbname = 'pagila'
user = os.environ["PGUSER"]
password = os.environ["PGPASSWORD"]
port = os.environ["PGPORT"]

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}".format(user, password, host, port, dbname)

# print(conn_string)
%sql $conn_string


'Connected: postgres@pagila'
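The `format` call above pastes the password into the URL verbatim, so a password containing `@`, `:` or `/` would break the connection string. A minimal sketch of one way to guard against that, assuming you keep building the URL by hand: percent-encode the user name and password with the standard library's `urllib.parse.quote_plus`. The helper name `build_pg_url` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_pg_url(user: str, password: str, host: str, port: str, dbname: str) -> str:
    """Build a postgresql:// URL with the user name and password percent-encoded."""
    # quote_plus encodes '@', ':' and '/' so they cannot be mistaken
    # for the delimiters of the URL itself.
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, dbname
    )


# Hypothetical credentials containing URL delimiter characters.
url = build_pg_url("student", "p@ss:word", "127.0.0.1", "5432", "pagila")
print(url)  # postgresql://student:p%40ss%3Aword@127.0.0.1:5432/pagila
```

In the notebook you could then write `conn_string = build_pg_url(user, password, host, port, dbname)` before `%sql $conn_string`.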

### Star Schema

<img src="pagila-star.png" width="50%"/>

# Grouping Sets
- For 3 dimensions, you often want to aggregate a fact:
    - by nothing (total)
    - then by the 1st dimension
    - then by the 2nd 
    - then by the 3rd 
    - then by the 1st and 2nd
    - then by the 2nd and 3rd
    - then by the 1st and 3rd
    - then by the 1st and 2nd and 3rd
    
- Since this is so common, and every one of these aggregations has to scan the whole fact table anyway, SQL offers a smarter way to compute them all in a single query: the `GROUPING SETS` option of `GROUP BY`.
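To make the semantics concrete, here is a small pure-Python sketch of what `GROUP BY GROUPING SETS` returns. It is not how PostgreSQL executes the clause; it only mirrors the result: one output row per group of each grouping set, with `None` (SQL `NULL`) in every column that the set does not group by. The fact rows and the helper `grouping_sets` are hypothetical, and only two dimensions are used to keep the output short. Passing every subset of the dimensions, as `ALL_SETS` does, gives the same kind of groupings as the list above (eight subsets for three dimensions); PostgreSQL's `GROUP BY CUBE (...)` is shorthand for exactly that collection of grouping sets.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (month, country, amount). Made-up values, not Pagila data.
FACTS = [
    (1, "Australia", 10.0),
    (1, "Canada", 5.0),
    (2, "Australia", 7.0),
    (2, "Canada", 3.0),
]
DIMS = ("month", "country")


def grouping_sets(rows, dims, sets):
    """Return one (dims..., total) row per group of every grouping set.

    Columns that a grouping set does not group by are filled with None,
    the Python stand-in for the NULL that SQL puts in rolled-up columns.
    """
    totals = [defaultdict(float) for _ in sets]  # one accumulator per grouping set
    for *values, amount in rows:  # a single pass over the fact rows
        for grouping, acc in zip(sets, totals):
            key = tuple(v if d in grouping else None for d, v in zip(dims, values))
            acc[key] += amount
    return [key + (total,) for acc in totals for key, total in acc.items()]


# Every subset of the dimensions: (), (month,), (country,), (month, country).
ALL_SETS = [combo for r in range(len(DIMS) + 1) for combo in combinations(DIMS, r)]

result = grouping_sets(FACTS, DIMS, ALL_SETS)
for row in result:
    print(row)
```

The loop visits each fact row once and updates every grouping set from it, which is the intuition behind computing all the aggregates in one query rather than running several separate `GROUP BY` queries.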

## Total Revenue

TODO: Write a query that calculates total revenue (sales_amount)

In [7]:
%%sql

SELECT
    SUM(sales_amount) AS total_revenue
FROM
    star.fact_sales

 * postgresql://postgres:***@localhost:5432/pagila
1 rows affected.


| total_revenue |
|---|
| 67416.50999999193 |
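The long tail in `67416.50999999193` is typical of summing binary floating-point values, which suggests (an assumption; we have not checked the column type in `star.fact_sales`) that `sales_amount` is not stored as `numeric`. The sketch below, with made-up amounts, contrasts float and exact decimal accumulation. If you want two-decimal output from the query itself, `ROUND(SUM(sales_amount)::numeric, 2)` is one option in PostgreSQL, because the two-argument `round` is defined for `numeric` rather than `double precision`.

```python
from decimal import Decimal

# Hypothetical rental amounts, written as strings so Decimal can read them exactly.
AMOUNTS = ["0.99", "2.99", "4.99", "0.99", "2.99"]

float_total = sum(float(a) for a in AMOUNTS)    # binary floating point, may carry a tail
exact_total = sum(Decimal(a) for a in AMOUNTS)  # exact base-10 arithmetic

print(repr(float_total))       # close to 12.95, possibly with extra digits
print(exact_total)             # 12.95
print(round(float_total, 2))   # 12.95
```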


## Revenue by Country
TODO: Write a query that calculates total revenue (sales_amount) by country

In [12]:
%%sql

SELECT
     dim_customer.country
    ,SUM(fact_sales.sales_amount) AS total_revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_customer
ON
    fact_sales.customer_key = dim_customer.customer_key
GROUP BY
    dim_customer.country
ORDER BY
    total_revenue DESC
LIMIT
    5

 * postgresql://postgres:***@localhost:5432/pagila
5 rows affected.


| country | total_revenue |
|---|---|
| India | 6630.2699999997685 |
| China | 5802.729999999799 |
| United States | 4110.319999999901 |
| Japan | 3471.73999999993 |
| Mexico | 3307.03999999994 |
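The country query joins the fact table to a dimension, aggregates, and keeps the top five. A minimal pure-Python sketch of those logical steps on hypothetical data (the dictionaries below are made up, not Pagila rows): look up each fact's `customer_key` in the dimension, drop facts with no match as an inner join would, sum per country, then take the largest totals. PostgreSQL's planner chooses its own join strategy; this only illustrates the result. The month query has the same shape with `dim_date` and `month`.

```python
import heapq
from collections import defaultdict

# Hypothetical miniature star schema (made-up values, not Pagila data).
DIM_CUSTOMER = {1: "India", 2: "China", 3: "India", 4: "Mexico"}  # customer_key -> country
FACT_SALES = [  # (customer_key, sales_amount); key 9 has no dimension row
    (1, 4.99), (2, 2.99), (3, 0.99), (4, 5.99), (1, 1.99), (9, 3.99),
]


def revenue_by_country(facts, dim, limit):
    """Inner join on customer_key, group by country, order by revenue DESC, limit."""
    totals = defaultdict(float)
    for customer_key, amount in facts:
        if customer_key not in dim:  # inner join: facts without a dimension row are dropped
            continue
        totals[dim[customer_key]] += amount
    # nlargest returns the `limit` biggest totals, largest first.
    return heapq.nlargest(limit, totals.items(), key=lambda item: item[1])


top = revenue_by_country(FACT_SALES, DIM_CUSTOMER, 2)
for country, revenue in top:
    print(country, round(revenue, 2))
```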


## Revenue by Month
TODO: Write a query that calculates total revenue (sales_amount) by month

In [13]:
%%sql

SELECT
     dim_date.month
    ,SUM(fact_sales.sales_amount) AS total_revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_date
ON
    fact_sales.date_key = dim_date.date_key
GROUP BY
    dim_date.month
ORDER BY
    total_revenue DESC
LIMIT
    5

 * postgresql://postgres:***@localhost:5432/pagila
5 rows affected.


| month | total_revenue |
|---|---|
| 4 | 28559.460000003823 |
| 3 | 23886.56000000212 |
| 2 | 9631.879999999612 |
| 1 | 4824.4299999998575 |
| 5 | 514.180000000001 |


## Revenue by Month & Country
TODO: Write a query that calculates total revenue (sales_amount) by month and country. Sort the data by month and country in ascending order, then by revenue in descending order. The first few rows of your output should match the table below.

In [20]:
%%sql

SELECT
     dim_date.month
    ,dim_store.country
    ,SUM(fact_sales.sales_amount) AS total_revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_date
ON
    fact_sales.date_key = dim_date.date_key
INNER JOIN
    star.dim_store
ON
    fact_sales.store_key = dim_store.store_key
    
GROUP BY
     dim_date.month
    ,dim_store.country
    
ORDER BY
     dim_date.month
    ,dim_store.country
    ,total_revenue DESC
    
LIMIT
    5

 * postgresql://postgres:***@localhost:5432/pagila
5 rows affected.


| month | country | total_revenue |
|---|---|---|
| 1 | Australia | 2364.189999999988 |
| 1 | Canada | 2460.239999999982 |
| 2 | Australia | 4895.099999999856 |
| 2 | Canada | 4736.77999999987 |
| 3 | Australia | 12060.329999999483 |


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>month</th>
        <th>country</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>1</td>
        <td>Australia</td>
        <td>2364.19</td>
    </tr>
    <tr>
        <td>1</td>
        <td>Canada</td>
        <td>2460.24</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Australia</td>
        <td>4895.10</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Canada</td>
        <td>4736.78</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Australia</td>
        <td>12060.33</td>
    </tr>
</tbody></table></div>
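The ordering in this exercise mixes directions: month and country ascending, revenue descending. In Python the same ordering can be sketched with a tuple sort key, negating the numeric column to reverse it. The rows are a shuffled copy of the expected output above. Negation only works for numbers, so a descending text column would need another approach, for example several stable `sorted` passes.

```python
# A shuffled copy of the first rows of the expected output: (month, country, revenue).
rows = [
    (2, "Canada", 4736.78),
    (1, "Canada", 2460.24),
    (2, "Australia", 4895.10),
    (1, "Australia", 2364.19),
]

# ORDER BY month, country, revenue DESC:
# ascending columns are used as-is, and negating revenue reverses its direction.
ordered = sorted(rows, key=lambda r: (r[0], r[1], -r[2]))

for row in ordered:
    print(row)
```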

## Revenue Total, by Month, by Country, and by Month & Country, All in One Shot

TODO: Write a query that calculates total revenue at the various grouping levels done above (total, by month, by country, by month & country) all at once using `GROUPING SETS`. Your output should match the table below.

In [26]:
%%sql

SELECT
     dim_date.month
    ,dim_store.country
    ,SUM(fact_sales.sales_amount) AS total_revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_date
ON
    fact_sales.date_key = dim_date.date_key
INNER JOIN
    star.dim_store
ON
    fact_sales.store_key = dim_store.store_key
    
GROUP BY GROUPING SETS (
     ()
    ,dim_date.month
    ,dim_store.country
    ,(dim_date.month, dim_store.country)
)

ORDER BY
     dim_date.month
    ,dim_store.country
    ,total_revenue DESC


 * postgresql://postgres:***@localhost:5432/pagila
18 rows affected.


| month | country | total_revenue |
|---|---|---|
| 1.0 | Australia | 2364.189999999988 |
| 1.0 | Canada | 2460.239999999982 |
| 1.0 |  | 4824.4299999998575 |
| 2.0 | Australia | 4895.099999999856 |
| 2.0 | Canada | 4736.77999999987 |
| 2.0 |  | 9631.879999999612 |
| 3.0 | Australia | 12060.329999999483 |
| 3.0 | Canada | 11826.229999999505 |
| 3.0 |  | 23886.56000000212 |
| 4.0 | Australia | 14136.06999999937 |


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>month</th>
        <th>country</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>1</td>
        <td>Australia</td>
        <td>2364.19</td>
    </tr>
    <tr>
        <td>1</td>
        <td>Canada</td>
        <td>2460.24</td>
    </tr>
    <tr>
        <td>1</td>
        <td>None</td>
        <td>4824.43</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Australia</td>
        <td>4895.10</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Canada</td>
        <td>4736.78</td>
    </tr>
    <tr>
        <td>2</td>
        <td>None</td>
        <td>9631.88</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Australia</td>
        <td>12060.33</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Canada</td>
        <td>11826.23</td>
    </tr>
    <tr>
        <td>3</td>
        <td>None</td>
        <td>23886.56</td>
    </tr>
    <tr>
        <td>4</td>
        <td>Australia</td>
        <td>14136.07</td>
    </tr>
    <tr>
        <td>4</td>
        <td>Canada</td>
        <td>14423.39</td>
    </tr>
    <tr>
        <td>4</td>
        <td>None</td>
        <td>28559.46</td>
    </tr>
    <tr>
        <td>5</td>
        <td>Australia</td>
        <td>271.08</td>
    </tr>
    <tr>
        <td>5</td>
        <td>Canada</td>
        <td>243.10</td>
    </tr>
    <tr>
        <td>5</td>
        <td>None</td>
        <td>514.18</td>
    </tr>
    <tr>
        <td>None</td>
        <td>None</td>
        <td>67416.51</td>
    </tr>
    <tr>
        <td>None</td>
        <td>Australia</td>
        <td>33726.77</td>
    </tr>
    <tr>
        <td>None</td>
        <td>Canada</td>
        <td>33689.74</td>
    </tr>
</tbody></table></div>
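In the grouping-sets output, a `NULL` (shown as `None`) in a column means that row is aggregated over all values of that column. Two details are worth knowing. First, PostgreSQL sorts `NULL`s after non-null values in ascending order by default, which is why each month's subtotal follows its per-country rows. Second, if a dimension column could itself contain real `NULL`s, PostgreSQL's `GROUPING(...)` function tells you which columns were rolled up in a given row. The sketch below mimics the default null placement in Python and labels each row's grouping level; the helper names are hypothetical, and the labelling assumes neither column contains real `NULL`s.

```python
# Rows in the shape of the GROUPING SETS result above; None plays the role of NULL.
rows = [
    (None, "Canada", 33689.74),
    (1, None, 4824.43),
    (None, None, 67416.51),
    (1, "Canada", 2460.24),
    (None, "Australia", 33726.77),
    (1, "Australia", 2364.19),
]


def nulls_last_key(row):
    """Sort key for ORDER BY month, country with PostgreSQL's default NULLS LAST."""
    month, country, _ = row
    # False sorts before True, so non-null values come first. When both flags are
    # True, both values are None and compare equal, so None is never ordered with '<'.
    return (month is None, month, country is None, country)


def grouping_level(month, country):
    """Label which grouping set produced a row (assumes no real NULLs in the data)."""
    if month is None and country is None:
        return "grand total"
    if month is None:
        return "by country"
    if country is None:
        return "by month"
    return "by month & country"


ordered = sorted(rows, key=nulls_last_key)
for month, country, revenue in ordered:
    print(month, country, revenue, grouping_level(month, country))
```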