## Exercise 02 - OLAP Cubes - Slicing and Dicing


All the database tables in this demo are based on public sample databases and transformations of them:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

Start by connecting to the database by running the cell below.

#### Connect to the local database where pagila is loaded

In [2]:
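# Load the ipython-sql extension, which provides the %sql / %%sql magics
%load_ext sql

# Connection settings for the local PostgreSQL server that hosts pagila
DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'Met/14/7472'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}".format(
    DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
# Open the connection; the %%sql cells below reuse it
%sql $conn_string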

postgresql://postgres:Met/14/7472@127.0.0.1:5432/pagila
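The connection above worked, but the password contains `/`, which also separates path segments in a URL. SQLAlchemy (which ipython-sql builds on) documents that special characters in the password should be URL-encoded, using `urllib.parse.quote_plus` from the standard library. A minimal sketch of that, reusing the same settings; it only builds and prints the string and has not been run against this notebook's ipython-sql setup:

```python
from urllib.parse import quote_plus

# Same settings as the cell above
DB_USER = "postgres"
DB_PASSWORD = "Met/14/7472"

# quote_plus percent-encodes "/" (and other reserved characters) so the
# password cannot be confused with the URL's path separators
safe_password = quote_plus(DB_PASSWORD)
conn_string = "postgresql://{}:{}@{}:{}/{}".format(
    DB_USER, safe_password, "127.0.0.1", "5432", "pagila")
print(conn_string)  # postgresql://postgres:Met%2F14%2F7472@127.0.0.1:5432/pagila
```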


## Start with a simple cube
TODO: Write a query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.

In [3]:
%%time
%%sql

SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimcustomer c ON (f.customer_key = c.customer_key)
JOIN dimmovie m ON (f.movie_key = m.movie_key)
JOIN dimdate d ON (f.date_key = d.date_key)
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 6.31 ms, sys: 209 µs, total: 6.52 ms
Wall time: 963 ms


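| day | rating | city | revenue |
|-----|--------|------|---------|
| 19 | PG-13 | Lengshuijiang | 379.4 |
| 19 | PG | Lengshuijiang | 341.46 |
| 19 | NC-17 | Lengshuijiang | 341.46 |
| 30 | PG-13 | Mannheim | 318.56 |
| 20 | PG-13 | Johannesburg | 311.48 |
| 22 | R | Saint-Denis | 298.48 |
| 21 | G | Tabriz | 296.23 |
| 21 | R | Belm | 289.5 |
| 30 | PG-13 | Omdurman | 287.46 |
| 30 | PG-13 | Tiefa | 285.56 |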
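To make the mechanics of the cube query concrete, here is a minimal pure-Python sketch of the same roll-up on a tiny, made-up star schema. The dictionaries `dim_date`, `dim_movie`, `dim_customer`, the list `fact_sales`, and the helper `build_cube` are hypothetical stand-ins, not the pagila tables. Dictionary lookups play the role of the JOINs (a missing key would raise `KeyError` here, whereas an inner JOIN would silently drop the row), and summing into a dictionary keyed by `(day, rating, city)` plays the role of `GROUP BY`.

```python
from collections import defaultdict

# Hypothetical miniature star schema (made-up data, not the pagila tables).
# Each dimension maps a surrogate key to the label reported in the cube.
dim_date = {1: 1, 2: 15, 3: 30}                  # date_key -> day
dim_movie = {1: "PG", 2: "PG-13", 3: "R"}        # movie_key -> rating
dim_customer = {1: "Bellevue", 2: "Lancaster"}   # customer_key -> city

# Fact rows: (date_key, movie_key, customer_key, sales_amount)
fact_sales = [
    (1, 1, 1, 2.0),
    (1, 1, 1, 1.0),
    (1, 2, 2, 5.0),
    (2, 2, 1, 1.0),
    (3, 3, 2, 4.0),
]


def build_cube(facts):
    """Swap keys for labels (the JOINs) and sum sales per cell (the GROUP BY)."""
    cube = defaultdict(float)
    for date_key, movie_key, customer_key, amount in facts:
        cell = (dim_date[date_key], dim_movie[movie_key], dim_customer[customer_key])
        cube[cell] += amount
    return dict(cube)


cube = build_cube(fact_sales)

# ORDER BY revenue DESC LIMIT 20
top = sorted(cube.items(), key=lambda item: item[1], reverse=True)[:20]
for (day, rating, city), revenue in top:
    print(day, rating, city, revenue)
```

Each cell of the resulting dictionary is one row of the SQL output: the two fact rows sharing the key `(1, "PG", "Bellevue")` collapse into a single cell with revenue `3.0`.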


## Slicing

Slicing reduces the dimensionality of a cube by 1 (e.g. from 3 dimensions to 2) by fixing one of the dimensions to a single value. In the example above, we have a 3-dimensional cube on day, rating, and city.

TODO: Write a query that reduces the dimensionality of the above example by limiting the results to only include movies with a `rating` of "PG-13". Again, sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below. 

In [4]:
%%time
%%sql

SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimcustomer c ON (f.customer_key = c.customer_key)
JOIN dimmovie m ON (f.movie_key = m.movie_key)
JOIN dimdate d ON (f.date_key = d.date_key)
WHERE m.rating = 'PG-13'
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
LIMIT 20;


 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 6.45 ms, sys: 0 ns, total: 6.45 ms
Wall time: 258 ms


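| day | rating | city | revenue |
|-----|--------|------|---------|
| 19 | PG-13 | Lengshuijiang | 379.4 |
| 30 | PG-13 | Mannheim | 318.56 |
| 20 | PG-13 | Johannesburg | 311.48 |
| 30 | PG-13 | Omdurman | 287.46 |
| 30 | PG-13 | Tiefa | 285.56 |
| 30 | PG-13 | Shanwei | 285.45 |
| 30 | PG-13 | Apeldoorn | 259.5 |
| 30 | PG-13 | Kolpino | 259.5 |
| 21 | PG-13 | Tanza | 251.52 |
| 18 | PG-13 | Uluberia | 233.55 |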
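The SQL result still shows a `rating` column, but after the slice every row holds `PG-13`, so that column no longer carries information: the result is effectively a 2-dimensional cube on day and city. A minimal sketch of the idea without a database, assuming a hypothetical, already-aggregated cube stored as a Python dictionary keyed by `(day, rating, city)` with made-up revenues; `slice_cube` is an illustrative helper, not part of any library:

```python
# Hypothetical, already-aggregated 3-D cube: (day, rating, city) -> revenue.
# The numbers are made up for illustration; they are not pagila results.
cube = {
    (1, "PG", "Bellevue"): 3.0,
    (1, "PG-13", "Lancaster"): 5.0,
    (15, "PG-13", "Bellevue"): 1.0,
    (30, "PG-13", "Lancaster"): 2.5,
    (30, "R", "Lancaster"): 4.0,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the cell keys,
    turning an n-dimensional cube into an (n-1)-dimensional one."""
    return {
        key[:axis] + key[axis + 1:]: revenue
        for key, revenue in cube.items()
        if key[axis] == value
    }


# Same idea as WHERE m.rating = 'PG-13' (rating is axis 1 of the key)
pg13 = slice_cube(cube, axis=1, value="PG-13")
print(pg13)
```

Dropping the fixed coordinate never merges two cells, because the remaining `(day, city)` pairs were already unique for a single rating.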


## Dicing
Dicing is creating a subcube with the same dimensionality but fewer values for two or more dimensions.

TODO: Write a query to create a subcube of the initial cube that includes movies with:
* a rating of PG or PG-13
* customers in the city of Bellevue or Lancaster
* a day equal to 1, 15, or 30

The first few rows of your output should match the table below. 

In [5]:
%%time
%%sql

SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimcustomer c ON (f.customer_key = c.customer_key)
JOIN dimmovie m ON (f.movie_key = m.movie_key)
JOIN dimdate d ON (f.date_key = d.date_key)
WHERE m.rating IN ('PG-13', 'PG')
  AND c.city IN ('Bellevue', 'Lancaster')
  AND d.day IN (1, 15, 30)  -- day is numeric, so compare with numbers, not strings
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.
CPU times: user 9.18 ms, sys: 0 ns, total: 9.18 ms
Wall time: 63.3 ms


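| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | PG-13 | Lancaster | 199.6 |
| 30 | PG | Lancaster | 124.75 |
| 1 | PG-13 | Lancaster | 47.92 |
| 1 | PG-13 | Bellevue | 35.88 |
| 1 | PG | Lancaster | 29.95 |
| 1 | PG | Bellevue | 29.9 |
| 30 | PG-13 | Bellevue | 23.94 |
| 30 | PG | Bellevue | 19.95 |
| 15 | PG-13 | Bellevue | 5.94 |
| 15 | PG | Bellevue | 4.95 |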
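Unlike slicing, dicing keeps every dimension and only restricts the values allowed along some of them, which is why each output row above still varies in day, rating, and city. A minimal sketch of the same idea, assuming a hypothetical cube stored as a Python dictionary keyed by `(day, rating, city)` with made-up revenues; `dice_cube` is an illustrative helper, not part of any library:

```python
# Hypothetical, already-aggregated cube: (day, rating, city) -> revenue.
# The numbers are made up for illustration; they are not pagila results.
cube = {
    (1, "PG", "Bellevue"): 3.0,
    (1, "PG-13", "Lancaster"): 5.0,
    (2, "PG", "Bellevue"): 0.5,
    (15, "G", "Bellevue"): 1.5,
    (15, "PG-13", "Bellevue"): 1.0,
    (30, "R", "Lancaster"): 4.0,
    (30, "PG", "Mannheim"): 2.0,
}


def dice_cube(cube, allowed):
    """Keep only cells whose coordinates fall in the allowed value sets.

    `allowed` holds one set per dimension (None keeps every value), so the
    result has the same dimensionality but fewer values along some axes.
    """
    return {
        key: revenue
        for key, revenue in cube.items()
        if all(values is None or coord in values
               for coord, values in zip(key, allowed))
    }


# Mirrors the three IN (...) filters of the dicing query
sub = dice_cube(cube, [{1, 15, 30}, {"PG", "PG-13"}, {"Bellevue", "Lancaster"}])
print(sub)
```

Each `IN (...)` list corresponds to one allowed-value set, and the keys in `sub` are still `(day, rating, city)` triples.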
