# Exercise 6: OLAP Cubes - Slicing and Dicing

### Connect to the local database where Pagila is loaded

```python
# Load the ipython-sql extension
%load_ext sql

# Set up the database connection parameters
DB_ENDPOINT = 'localhost'
DB_NAME = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'postgres'
DB_PORT = '5432'

conn_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_ENDPOINT}:{DB_PORT}/{DB_NAME}"

# Connect
%sql $conn_string
```

### Star Schema

<img src="../../../images/cloud_data_warehouse_star_schema_pagila.png" width="50%"/>

### Revenue by day, rating and city


```sql
%%time
%%sql
SELECT
    d.day,
    m.rating,
    c.city,
    SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d
USING (date_key)
JOIN dimmovie m
USING (movie_key)
JOIN dimcustomer c
USING (customer_key)
GROUP BY 1,2,3
ORDER BY revenue DESC
LIMIT 5;
```

```text
* postgresql://postgres:***@localhost:5432/pagila
5 rows affected.
Wall time: 71 ms
```

| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | G | San Bernardino | 24.97 |
| 30 | NC-17 | Apeldoorn | 23.95 |
| 21 | NC-17 | Belm | 22.97 |
| 30 | PG-13 | Zanzibar | 21.97 |
| 21 | G | Citt del Vaticano | 21.97 |


### Slicing

Slicing reduces the dimensionality of a cube by one (e.g., from 3 dimensions to 2) by fixing one of the dimensions to a single value. The query above builds a 3-dimensional cube on day, rating, and city; the query below slices it by fixing the rating dimension to `'PG-13'`.
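As a rough illustration of slicing outside SQL, the following Python sketch aggregates a handful of fact rows into a cube keyed by `(day, rating, city)` and then fixes `rating` to a single value. The helper names `build_cube` and `slice_cube` are hypothetical, and the sales amounts are made up; they are not Pagila data and not part of ipython-sql.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical fact rows: (day, rating, city, sales_amount).
# Amounts are made up for illustration only.
FACTS = [
    (30, "G", "San Bernardino", Decimal("4.99")),
    (30, "PG-13", "Zanzibar", Decimal("2.99")),
    (28, "PG-13", "Dhaka", Decimal("0.99")),
    (28, "PG-13", "Dhaka", Decimal("1.99")),
    (21, "NC-17", "Belm", Decimal("3.99")),
]


def build_cube(facts):
    """Aggregate revenue into a 3-D cube keyed by (day, rating, city)."""
    cube = defaultdict(Decimal)  # Decimal() starts each cell at 0
    for day, rating, city, amount in facts:
        cube[(day, rating, city)] += amount
    return dict(cube)


def slice_cube(cube, rating):
    """Fix the rating dimension to one value, leaving a 2-D cube keyed by (day, city)."""
    return {
        (day, city): revenue
        for (day, r, city), revenue in cube.items()
        if r == rating
    }


cube = build_cube(FACTS)
pg13 = slice_cube(cube, "PG-13")

# Print the slice ordered by revenue, highest first (like ORDER BY revenue DESC)
for (day, city), revenue in sorted(pg13.items(), key=lambda kv: kv[1], reverse=True):
    print(day, city, revenue)
```

The SQL slice still returns the `rating` column, but every row holds the same value; this sketch drops the fixed dimension from the key so that the reduction from 3 to 2 dimensions is explicit.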

```sql
%%time
%%sql
SELECT
    d.day,
    m.rating,
    c.city,
    SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d
USING (date_key)
JOIN dimmovie m
USING (movie_key)
JOIN dimcustomer c
USING (customer_key)
WHERE m.rating = 'PG-13'
GROUP BY 1,2,3
ORDER BY revenue DESC
LIMIT 5;
```

```text
* postgresql://postgres:***@localhost:5432/pagila
5 rows affected.
Wall time: 26 ms
```

| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | PG-13 | Zanzibar | 21.97 |
| 28 | PG-13 | Dhaka | 19.97 |
| 29 | PG-13 | Shimoga | 18.97 |
| 30 | PG-13 | Osmaniye | 18.97 |
| 21 | PG-13 | Asuncin | 18.95 |


### Dicing
Dicing creates a subcube with the same dimensionality but fewer values for two or more dimensions.
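To see dicing end to end without a Postgres instance, the sketch below builds a tiny in-memory SQLite star schema whose table and column names mirror the ones used in this exercise, then restricts three dimensions at once. The rows are made up, and SQLite is only a stand-in for the Pagila Postgres database; it is not part of the exercise setup.

```python
import sqlite3

# Tiny in-memory star schema mimicking the tables used above.
# Table/column names follow the exercise; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimdate (date_key INTEGER PRIMARY KEY, day INTEGER);
CREATE TABLE dimmovie (movie_key INTEGER PRIMARY KEY, rating TEXT);
CREATE TABLE dimcustomer (customer_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE factsales (
    date_key INTEGER, movie_key INTEGER, customer_key INTEGER, sales_amount NUMERIC
);
INSERT INTO dimdate VALUES (1, 1), (2, 15), (3, 30), (4, 2);
INSERT INTO dimmovie VALUES (1, 'PG'), (2, 'PG-13'), (3, 'R');
INSERT INTO dimcustomer VALUES (1, 'Bellevue'), (2, 'Lancaster'), (3, 'Dhaka');
INSERT INTO factsales VALUES
    (3, 1, 2, 4.99),  -- day 30, PG, Lancaster: kept
    (3, 1, 2, 2.99),  -- day 30, PG, Lancaster: kept, summed with the row above
    (1, 2, 1, 1.99),  -- day 1, PG-13, Bellevue: kept
    (4, 1, 2, 9.99),  -- day 2: removed by the day filter
    (3, 3, 1, 3.99),  -- rating R: removed by the rating filter
    (2, 2, 3, 5.99);  -- city Dhaka: removed by the city filter
""")

# The dice: restrict rating, city, and day at the same time.
# Grouping by the three named columns is equivalent to GROUP BY 1, 2, 3 here.
DICE_SQL = """
SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d USING (date_key)
JOIN dimmovie m USING (movie_key)
JOIN dimcustomer c USING (customer_key)
WHERE m.rating IN ('PG', 'PG-13')
  AND c.city IN ('Bellevue', 'Lancaster')
  AND d.day IN (1, 15, 30)
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC;
"""

rows = conn.execute(DICE_SQL).fetchall()
conn.close()

for row in rows:
    print(row)
```

The result still has all three dimensions (day, rating, city); only the set of values on each one shrinks, which is what distinguishes a dice from a slice.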

TODO: Write a query to create a subcube of the initial cube that includes movies with:
* ratings of PG or PG-13
* in the city of Bellevue or Lancaster
* day equal to 1, 15, or 30

The first few rows of your output should match the table below. 

```sql
%%time
%%sql
SELECT
    d.day,
    m.rating,
    c.city,
    SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d
USING (date_key)
JOIN dimmovie m
USING (movie_key)
JOIN dimcustomer c
USING (customer_key)
WHERE
    m.rating IN ('PG-13', 'PG')
    AND c.city IN ('Bellevue', 'Lancaster')
    AND d.day IN (1, 15, 30)
GROUP BY 1,2,3
ORDER BY revenue DESC
LIMIT 5;
```

```text
* postgresql://postgres:***@localhost:5432/pagila
5 rows affected.
Wall time: 12 ms
```

| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | PG | Lancaster | 12.98 |
| 1 | PG-13 | Lancaster | 5.99 |
| 30 | PG-13 | Bellevue | 3.99 |
| 30 | PG-13 | Lancaster | 2.99 |
| 15 | PG-13 | Bellevue | 1.98 |
