# Exercise 02 - OLAP Cubes - CUBE

All the database tables in this demo are based on public sample databases and transformations of them:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

Start by connecting to the database by running the cells below. If you are coming back to this exercise, then uncomment and run the first cell to recreate the database. If you recently completed the slicing and dicing exercise, then skip to the second cell.

In [1]:
import os
import sql

In [None]:
# !PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
# !PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql

### Connect to the local database where Pagila is loaded

In [2]:
%load_ext sql

host = os.environ["PGHOST"]
dbname = 'pagila'
user = os.environ["PGUSER"]
password = os.environ["PGPASSWORD"]
port = os.environ["PGPORT"]


# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(user, password, host, port, dbname)

%sql $conn_string

'Connected: postgres@pagila'
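The connection string above is built with plain string formatting, which is fine for simple credentials. As a minimal sketch, assuming the password may contain characters that are reserved in URLs (such as `@`, `:` or `/`), the credentials can be percent-encoded before they are placed in the URL. The helper name `build_conn_string` and the sample credentials are hypothetical and not part of the exercise.

```python
from urllib.parse import quote


def build_conn_string(user, password, host, port, dbname):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),      # safe="" also encodes "/" in the value
        quote(password, safe=""),
        host,
        port,
        dbname,
    )


# Hypothetical credentials, for illustration only.
print(build_conn_string("student", "p@ss:word/1", "127.0.0.1", "5432", "pagila"))
# prints: postgresql://student:p%40ss%3Aword%2F1@127.0.0.1:5432/pagila
```

With the environment variables used in the cell above, the call would be `build_conn_string(user, password, host, port, dbname)`.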

### Star Schema

<img src="pagila-star.png" width="50%"/>

# CUBE 
- `GROUP BY CUBE (dim1, dim2, ...)` produces the aggregates for every combination of the listed dimensions, at every length, in one go.
- The result could be stored in a materialized view and queried from there, which would save a lot of repetitive aggregation.
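To see which combinations that covers, here is a small Python sketch (not part of the exercise) that lists every subset of a list of dimensions. For n dimensions there are 2^n grouping sets, so `CUBE (month, country)` covers four levels of aggregation. The helper name `cube_grouping_sets` is made up for illustration, and the order shown is not necessarily the order in which PostgreSQL returns rows.

```python
from itertools import combinations


def cube_grouping_sets(dims):
    """Return every subset of dims, from the full set down to the empty set."""
    return [
        combo
        for size in range(len(dims), -1, -1)
        for combo in combinations(dims, size)
    ]


for grouping_set in cube_grouping_sets(["month", "country"]):
    print(grouping_set)
# ('month', 'country')   -> by month & country
# ('month',)             -> by month
# ('country',)           -> by country
# ()                     -> grand total
```

A dimension that is missing from a grouping set shows up as NULL (`None` in the notebook output) in the corresponding result rows.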

TODO: Write a query that calculates the various levels of aggregation done in the grouping sets exercise (total, by month, by country, by month & country) using the CUBE function. Your output should match the table below.


In [4]:
%%time
%%sql

SELECT
     dim_date.month
    ,dim_store.country
    ,SUM(sales_amount) AS total_revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_store
ON
    dim_store.store_key = fact_sales.store_key
INNER JOIN
    star.dim_date
ON
    dim_date.date_key = fact_sales.date_key
GROUP BY CUBE (
     dim_date.month
    ,dim_store.country
)
ORDER BY
     dim_date.month
    ,dim_store.country


 * postgresql://postgres:***@localhost:5432/pagila
18 rows affected.
Wall time: 35 ms


month,country,total_revenue
1.0,Australia,2364.189999999988
1.0,Canada,2460.239999999982
1.0,,4824.4299999998575
2.0,Australia,4895.099999999856
2.0,Canada,4736.77999999987
2.0,,9631.879999999612
3.0,Australia,12060.329999999483
3.0,Canada,11826.229999999505
3.0,,23886.56000000212
4.0,Australia,14136.06999999937


<div><table>
    <tbody><tr>
        <th>month</th>
        <th>country</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>1</td>
        <td>Australia</td>
        <td>2364.19</td>
    </tr>
    <tr>
        <td>1</td>
        <td>Canada</td>
        <td>2460.24</td>
    </tr>
    <tr>
        <td>1</td>
        <td>None</td>
        <td>4824.43</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Australia</td>
        <td>4895.10</td>
    </tr>
    <tr>
        <td>2</td>
        <td>Canada</td>
        <td>4736.78</td>
    </tr>
    <tr>
        <td>2</td>
        <td>None</td>
        <td>9631.88</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Australia</td>
        <td>12060.33</td>
    </tr>
    <tr>
        <td>3</td>
        <td>Canada</td>
        <td>11826.23</td>
    </tr>
    <tr>
        <td>3</td>
        <td>None</td>
        <td>23886.56</td>
    </tr>
    <tr>
        <td>4</td>
        <td>Australia</td>
        <td>14136.07</td>
    </tr>
    <tr>
        <td>4</td>
        <td>Canada</td>
        <td>14423.39</td>
    </tr>
    <tr>
        <td>4</td>
        <td>None</td>
        <td>28559.46</td>
    </tr>
    <tr>
        <td>5</td>
        <td>Australia</td>
        <td>271.08</td>
    </tr>
    <tr>
        <td>5</td>
        <td>Canada</td>
        <td>243.10</td>
    </tr>
    <tr>
        <td>5</td>
        <td>None</td>
        <td>514.18</td>
    </tr>
    <tr>
        <td>None</td>
        <td>None</td>
        <td>67416.51</td>
    </tr>
    <tr>
        <td>None</td>
        <td>Australia</td>
        <td>33726.77</td>
    </tr>
    <tr>
        <td>None</td>
        <td>Canada</td>
        <td>33689.74</td>
    </tr>
</tbody></table></div>

## Revenue Total, by Month, by Country, by Month & Country, All in One Shot: the Naive Way
The naive way to create the same table as above is to write one query per aggregation level and `UNION ALL` them together. Grouping sets and cubes give queries that are shorter to write, easier to read, and usually faster. Run the naive query below and compare its wall time with that of the cube query.
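To make the equivalence concrete outside the database, the following Python sketch computes the same four aggregation levels two ways on a handful of made-up rows: once in a single pass that updates every grouping set, and once with a separate scan per level, merged together like the `UNION ALL` query. The rows, `DIMS`, and both function names are hypothetical, and the sketch does not model how PostgreSQL plans or executes either query.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact rows: (month, country, sales_amount). Not Pagila data.
ROWS = [
    (1, "Australia", 10.0),
    (1, "Canada", 20.0),
    (2, "Australia", 30.0),
    (2, "Canada", 40.0),
]
DIMS = ("month", "country")


def group_key(month, country, kept):
    """Build a result key; dimensions not in `kept` become None (SQL NULL)."""
    values = {"month": month, "country": country}
    return tuple(values[d] if d in kept else None for d in DIMS)


def cube(rows):
    """One pass over the rows, updating the total of every grouping set."""
    totals = defaultdict(float)
    for month, country, amount in rows:
        for size in range(len(DIMS) + 1):
            for kept in combinations(DIMS, size):
                totals[group_key(month, country, kept)] += amount
    return dict(totals)


def naive_union(rows):
    """One separate scan per aggregation level, merged like UNION ALL."""
    result = {}
    for size in range(len(DIMS) + 1):
        for kept in combinations(DIMS, size):
            level = defaultdict(float)
            for month, country, amount in rows:
                level[group_key(month, country, kept)] += amount
            result.update(level)
    return result


# Both approaches yield the same 9 result rows for the toy data.
assert cube(ROWS) == naive_union(ROWS)
print(cube(ROWS)[(None, None)])      # grand total: 100.0
print(cube(ROWS)[(None, "Canada")])  # total for Canada: 60.0
```

The naive version reads the rows once per aggregation level, which mirrors how the `UNION ALL` query repeats the scan and joins in each branch.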

In [8]:
%%time
%%sql

SELECT
     NULL AS month
    ,NULL AS country
    ,SUM(sales_amount) AS revenue
FROM
    star.fact_sales

UNION ALL

SELECT
     NULL
    ,dim_store.country
    ,SUM(sales_amount) AS revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_store
ON
    dim_store.store_key = fact_sales.store_key
GROUP BY
    dim_store.country

UNION ALL

SELECT
     CAST(dim_date.month AS text)
    ,NULL
    ,SUM(sales_amount) AS revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_date
ON
    dim_date.date_key = fact_sales.date_key
GROUP BY
    dim_date.month

UNION ALL

SELECT
     CAST(dim_date.month AS text)
    ,dim_store.country
    ,SUM(sales_amount) AS revenue
FROM
    star.fact_sales
INNER JOIN
    star.dim_date
ON
    dim_date.date_key = fact_sales.date_key
INNER JOIN
    star.dim_store
ON
    dim_store.store_key = fact_sales.store_key
GROUP BY
     dim_date.month
    ,dim_store.country

 * postgresql://postgres:***@localhost:5432/pagila
18 rows affected.
Wall time: 136 ms


month,country,revenue
,,67416.50999999193
,Canada,33689.74000000495
,Australia,33726.77000000514
3.0,,23886.56000000212
5.0,,514.180000000001
4.0,,28559.460000003823
2.0,,9631.879999999612
1.0,,4824.4299999998575
5.0,Australia,271.08000000000027
1.0,Canada,2460.239999999982
