# Exercise 02 - OLAP Cubes - Slicing and Dicing

All the database tables in this demo are based on public database samples and transformations:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

Start by creating and connecting to the database by running the cells below.

In [1]:
!PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
!PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql


### Connect to the local database where Pagila is loaded

In [2]:
import sql
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila_star'
DB_USER = 'student'
DB_PASSWORD = 'student'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
%sql $conn_string

postgresql://student:student@127.0.0.1:5432/pagila_star


'Connected: student@pagila_star'
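The connection string above is assembled with plain string formatting, which works for these credentials. If a password ever contains reserved characters such as `@`, `:` or `/`, SQLAlchemy-style URLs (which `ipython-sql` relies on) expect them to be percent-encoded. A minimal sketch using the standard library's `urllib.parse.quote_plus`; the `pg_url` helper is hypothetical and not part of `ipython-sql`:

```python
from urllib.parse import quote_plus


def pg_url(user: str, password: str, host: str, port: str, database: str) -> str:
    """Build postgresql://username:password@host:port/database, percent-encoding
    the username and password so reserved characters cannot break the URL."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Same credentials as the cell above: nothing needs escaping.
print(pg_url("student", "student", "127.0.0.1", "5432", "pagila_star"))
# postgresql://student:student@127.0.0.1:5432/pagila_star

# A hypothetical password containing '@' and ':' gets escaped.
print(pg_url("student", "p@ss:word", "127.0.0.1", "5432", "pagila_star"))
# postgresql://student:p%40ss%3Aword@127.0.0.1:5432/pagila_star
```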

### Star Schema

<img src="pagila-star.png" width="50%"/>

## Start with a simple cube
TODO: Write a query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.

In [33]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(factSales.sales_amount) AS revenue
FROM factSales
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
GROUP BY dimDate.day, dimMovie.rating, dimCustomer.city
ORDER BY revenue DESC
LIMIT 20


 * postgresql://student:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 1.16 ms, sys: 4.04 ms, total: 5.2 ms
Wall time: 37.2 ms


day,rating,city,revenue
30,G,San Bernardino,24.97
30,NC-17,Apeldoorn,23.95
21,NC-17,Belém,22.97
28,R,Mwanza,21.97
21,G,Città del Vaticano,21.97
30,PG-13,Zanzibar,21.97
1,R,Qomsheh,19.97
22,R,Yangor,19.97
28,PG-13,Dhaka,19.97
17,G,Rajkot,19.97


| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | G | San Bernardino | 24.97 |
| 30 | NC-17 | Apeldoorn | 23.95 |
| 21 | NC-17 | Belém | 22.97 |
| 30 | PG-13 | Zanzibar | 21.97 |
| 28 | R | Mwanza | 21.97 |
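To try the cube query end to end without the PostgreSQL server, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names mirror the notebook's star schema, but the toy rows are invented for illustration and are not Pagila data. The query is the same join-then-`GROUP BY` pattern as the cell above, without the `LIMIT`.

```python
import sqlite3


def build_toy_star(conn: sqlite3.Connection) -> None:
    """Create a tiny, hypothetical star schema (one fact table, three dimensions)."""
    conn.executescript("""
        CREATE TABLE dimDate     (date_key INTEGER PRIMARY KEY, day INTEGER);
        CREATE TABLE dimMovie    (movie_key INTEGER PRIMARY KEY, rating TEXT);
        CREATE TABLE dimCustomer (customer_key INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE factSales (
            date_key INTEGER, movie_key INTEGER, customer_key INTEGER,
            sales_amount REAL
        );
        INSERT INTO dimDate VALUES (1, 1), (2, 30);
        INSERT INTO dimMovie VALUES (1, 'G'), (2, 'PG-13');
        INSERT INTO dimCustomer VALUES (1, 'Lancaster'), (2, 'Bellevue');
        INSERT INTO factSales VALUES
            (2, 1, 1, 4.99), (2, 1, 1, 2.99),  -- day 30, G, Lancaster
            (1, 2, 2, 0.99),                   -- day 1, PG-13, Bellevue
            (2, 2, 2, 3.99);                   -- day 30, PG-13, Bellevue
    """)


# Replace surrogate keys with dimension labels, then aggregate one cell per
# (day, rating, city) combination.
CUBE_SQL = """
    SELECT dimDate.day, dimMovie.rating, dimCustomer.city,
           SUM(factSales.sales_amount) AS revenue
    FROM factSales
    JOIN dimMovie    ON factSales.movie_key    = dimMovie.movie_key
    JOIN dimDate     ON factSales.date_key     = dimDate.date_key
    JOIN dimCustomer ON factSales.customer_key = dimCustomer.customer_key
    GROUP BY dimDate.day, dimMovie.rating, dimCustomer.city
    ORDER BY revenue DESC
"""

conn = sqlite3.connect(":memory:")
build_toy_star(conn)
# Round revenue to cents to hide floating-point noise from SUM over REAL values.
rows = [(day, rating, city, round(rev, 2)) for day, rating, city, rev in conn.execute(CUBE_SQL)]
for row in rows:
    print(row)
# (30, 'G', 'Lancaster', 7.98)
# (30, 'PG-13', 'Bellevue', 3.99)
# (1, 'PG-13', 'Bellevue', 0.99)
```

The two day-30 G sales in Lancaster collapse into a single cube cell, which is exactly what the `GROUP BY` over the three dimension labels does in the PostgreSQL query.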

## Slicing

Slicing reduces the dimensionality of a cube by one (e.g., from 3 dimensions to 2) by fixing one of the dimensions to a single value. In the example above, we have a 3-dimensional cube on day, rating, and city.
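A cube can also be pictured as a mapping from coordinates to a measure. The sketch below is a hypothetical, pure-Python illustration (the `slice_rating` helper and the cube cells are invented, not Pagila data): fixing `rating` to one value and dropping it from the key turns a 3-D cube over (day, rating, city) into a 2-D cube over (day, city). The SQL version in this exercise keeps `rating` as a constant column in its output, which carries the same information.

```python
from typing import Dict, Tuple

# Hypothetical cube representations: coordinates -> revenue.
Cube3 = Dict[Tuple[int, str, str], float]  # (day, rating, city)
Cube2 = Dict[Tuple[int, str], float]       # (day, city)


def slice_rating(cube: Cube3, rating: str) -> Cube2:
    """Fix the rating dimension to a single value and drop it from the key."""
    return {
        (day, city): revenue
        for (day, r, city), revenue in cube.items()
        if r == rating
    }


cube: Cube3 = {
    (30, "G", "Lancaster"): 7.98,
    (30, "PG-13", "Bellevue"): 3.99,
    (1, "PG-13", "Bellevue"): 0.99,
    (15, "PG-13", "Lancaster"): 2.99,
}

pg13 = slice_rating(cube, "PG-13")
print(pg13)
# {(30, 'Bellevue'): 3.99, (1, 'Bellevue'): 0.99, (15, 'Lancaster'): 2.99}
```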

TODO: Write a query that reduces the dimensionality of the above example by limiting the results to only include movies with a `rating` of "PG-13". Again, sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below. 

In [39]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(factSales.sales_amount) AS revenue
FROM factSales
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
WHERE dimMovie.rating = 'PG-13'
GROUP BY dimDate.day, dimMovie.rating, dimCustomer.city
ORDER BY revenue DESC
LIMIT 20


 * postgresql://student:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 4.97 ms, sys: 95 µs, total: 5.07 ms
Wall time: 16.2 ms


day,rating,city,revenue
30,PG-13,Zanzibar,21.97
28,PG-13,Dhaka,19.97
30,PG-13,Osmaniye,18.97
29,PG-13,Shimoga,18.97
21,PG-13,Asunción,18.95
21,PG-13,Parbhani,17.98
20,PG-13,Bahía Blanca,17.98
30,PG-13,Nagareyama,17.98
30,PG-13,Tanauan,17.96
17,PG-13,Ikerre,17.95


| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | PG-13 | Zanzibar | 21.97 |
| 28 | PG-13 | Dhaka | 19.97 |
| 29 | PG-13 | Shimoga | 18.97 |
| 30 | PG-13 | Osmaniye | 18.97 |
| 21 | PG-13 | Asunción | 18.95 |

## Dicing
Dicing creates a subcube with the same dimensionality but fewer values for two or more dimensions.
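In the same hypothetical dictionary representation used for slicing, dicing keeps all three coordinates in the key but restricts each dimension to a set of allowed values, much like the `IN (...)` filters in the dicing query of this exercise. The `dice` helper and the cube cells are invented for illustration:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical cube: (day, rating, city) -> revenue.
Cube3 = Dict[Tuple[int, str, str], float]


def dice(cube: Cube3, days: Iterable[int], ratings: Iterable[str],
         cities: Iterable[str]) -> Cube3:
    """Keep every dimension, but only cells whose coordinates fall in the
    allowed value sets for day, rating, and city."""
    day_set, rating_set, city_set = set(days), set(ratings), set(cities)
    return {
        (day, rating, city): revenue
        for (day, rating, city), revenue in cube.items()
        if day in day_set and rating in rating_set and city in city_set
    }


cube: Cube3 = {
    (30, "PG", "Lancaster"): 6.98,
    (1, "PG-13", "Bellevue"): 2.99,
    (30, "G", "Lancaster"): 4.99,     # dropped: rating not allowed
    (2, "PG", "Bellevue"): 1.99,      # dropped: day not allowed
    (15, "PG-13", "Zanzibar"): 3.99,  # dropped: city not allowed
}

sub = dice(cube, days=(1, 15, 30), ratings=("PG", "PG-13"),
           cities=("Bellevue", "Lancaster"))
print(sub)
# {(30, 'PG', 'Lancaster'): 6.98, (1, 'PG-13', 'Bellevue'): 2.99}
```

Unlike the slice, every surviving key still has three coordinates, so the subcube has the same dimensionality as the original.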

TODO: Write a query to create a subcube of the initial cube that includes movies with:
* ratings of PG or PG-13
* customers in the city of Bellevue or Lancaster
* day equal to 1, 15, or 30

The first few rows of your output should match the table below. 

In [None]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(factSales.sales_amount) AS revenue
FROM factSales
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
WHERE dimMovie.rating IN ('PG-13', 'PG')
  AND dimCustomer.city IN ('Bellevue', 'Lancaster')
  AND dimDate.day IN (1, 15, 30)
GROUP BY dimDate.day, dimMovie.rating, dimCustomer.city
ORDER BY revenue DESC
LIMIT 5


 * postgresql://student:***@127.0.0.1:5432/pagila_star
5 rows affected.
CPU times: user 6.72 ms, sys: 0 ns, total: 6.72 ms
Wall time: 10.5 ms


day,rating,city,revenue
30,PG,Lancaster,12.98
1,PG-13,Lancaster,5.99
30,PG-13,Bellevue,3.99
30,PG-13,Lancaster,2.99
15,PG-13,Bellevue,1.98


| day | rating | city | revenue |
|-----|--------|------|---------|
| 30 | PG | Lancaster | 12.98 |
| 1 | PG-13 | Lancaster | 5.99 |
| 30 | PG-13 | Bellevue | 3.99 |
| 30 | PG-13 | Lancaster | 2.99 |
| 15 | PG-13 | Bellevue | 1.98 |