# Exercise 02 - OLAP Cubes - Slicing and Dicing

All the database tables in this demo are based on public database samples and transformations:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

Start by creating and connecting to the database by running the cells below.

In [None]:
!PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
!PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql

### Connect to the local database where Pagila is loaded

In [2]:
import sql
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila_star'
DB_USER = 'student'
DB_PASSWORD = 'student'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
%sql $conn_string

postgresql://student:student@127.0.0.1:5432/pagila_star


'Connected: student@pagila_star'
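
The URL format `postgresql://username:password@host:port/database` breaks if a credential contains reserved characters such as `@` or `/`. The sketch below is one way to guard against that, assuming the string is handed to a URL parser that expects percent-encoding (as SQLAlchemy does). The helper name `build_conn_string` and the second password are hypothetical and not part of the exercise.

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# The exercise's own credentials need no escaping, so the URL is unchanged.
print(build_conn_string("student", "student", "127.0.0.1", "5432", "pagila_star"))

# Hypothetical password with reserved characters (not from the exercise).
print(build_conn_string("student", "p@ss/word", "127.0.0.1", "5432", "pagila_star"))
```

For the `student`/`student` credentials used here, the plain `format` call in the cell above produces the same string.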

### Star Schema

<img src="pagila-star.png" width="50%"/>

# Start with a simple cube
TODO: Write a query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.
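
To make the shape of such a query concrete, here is a minimal sketch on a toy star schema held in an in-memory SQLite database. The table and column names mirror the exercise, but every row is invented for illustration, and SQLite stands in for PostgreSQL only so the example runs without a server.

```python
import sqlite3

# Hypothetical miniature star schema: names follow the exercise, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimdate     (date_key INTEGER PRIMARY KEY, day INTEGER);
CREATE TABLE dimmovie    (movie_key INTEGER PRIMARY KEY, rating TEXT);
CREATE TABLE dimcustomer (customer_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE factsales   (date_key INTEGER, movie_key INTEGER,
                          customer_key INTEGER, sales_amount REAL);
INSERT INTO dimdate     VALUES (1, 1), (2, 30);
INSERT INTO dimmovie    VALUES (1, 'PG'), (2, 'PG-13');
INSERT INTO dimcustomer VALUES (1, 'Bellevue'), (2, 'Lancaster');
INSERT INTO factsales VALUES
    (1, 1, 1, 2.0), (1, 1, 1, 3.0),  -- two sales: day 1, PG, Bellevue
    (2, 2, 2, 4.0),                  -- day 30, PG-13, Lancaster
    (2, 1, 2, 1.0);                  -- day 30, PG, Lancaster
""")

# Join each fact row to its dimensions, then aggregate per (day, rating, city) cell.
cube = conn.execute("""
SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d     ON f.date_key = d.date_key
JOIN dimmovie m    ON f.movie_key = m.movie_key
JOIN dimcustomer c ON f.customer_key = c.customer_key
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
""").fetchall()

for row in cube:
    print(row)
```

Each output row is one cell of the cube: the joins swap surrogate keys for readable labels, and `GROUP BY` collapses all fact rows that share the same three labels.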

In [4]:
%%time
%%sql

SELECT dimdate.day, dimmovie.rating, dimcustomer.city, SUM(factsales.sales_amount) AS revenue
FROM factsales
JOIN dimdate ON factsales.date_key = dimdate.date_key
JOIN dimmovie ON factsales.movie_key = dimmovie.movie_key
JOIN dimcustomer ON factsales.customer_key = dimcustomer.customer_key
GROUP BY dimdate.day, dimmovie.rating, dimcustomer.city
ORDER BY revenue DESC
LIMIT 20

 * postgresql://student:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 4.44 ms, sys: 0 ns, total: 4.44 ms
Wall time: 23.8 ms


day,rating,city,revenue
30,G,San Bernardino,24.97
30,NC-17,Apeldoorn,23.95
21,NC-17,Belm,22.97
30,PG-13,Zanzibar,21.97
28,R,Mwanza,21.97


| day | rating | city | revenue |
| --- | --- | --- | --- |
| 30 | G | San Bernardino | 24.97 |
| 30 | NC-17 | Apeldoorn | 23.95 |
| 21 | NC-17 | Belm | 22.97 |
| 30 | PG-13 | Zanzibar | 21.97 |
| 28 | R | Mwanza | 21.97 |

## Slicing

Slicing reduces the dimensionality of a cube by 1 (e.g. from 3 dimensions to 2) by fixing one of the dimensions to a single value. In the example above, we have a 3-dimensional cube on day, rating, and city.
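
As a rough illustration of that definition, the sketch below models a cube as a plain Python dictionary keyed by `(day, rating, city)` and slices it on `rating`. The function name `slice_cube` and all values are hypothetical; they are not taken from the Pagila data.

```python
# Hypothetical cube cells: (day, rating, city) -> revenue. Values are invented.
cube = {
    (1, "PG", "Bellevue"): 5.0,
    (30, "PG-13", "Lancaster"): 4.0,
    (30, "PG", "Lancaster"): 1.0,
    (15, "PG-13", "Bellevue"): 2.0,
}


def slice_cube(cube, rating):
    """Fix the rating dimension to one value and drop it from the key."""
    return {
        (day, city): revenue
        for (day, cell_rating, city), revenue in cube.items()
        if cell_rating == rating
    }


pg13_slice = slice_cube(cube, "PG-13")
print(pg13_slice)
```

The resulting keys have two components instead of three, which is the drop in dimensionality the definition describes. In SQL the same effect comes from an equality filter on one dimension.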

TODO: Write a query that reduces the dimensionality of the above example by limiting the results to only include movies with a `rating` of "PG-13". Again, sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below. 

In [8]:
%%time
%%sql

SELECT dimdate.day, dimmovie.rating, dimcustomer.city, SUM(factsales.sales_amount) AS revenue
FROM factsales
JOIN dimdate ON factsales.date_key = dimdate.date_key
JOIN dimmovie ON factsales.movie_key = dimmovie.movie_key
JOIN dimcustomer ON factsales.customer_key = dimcustomer.customer_key
WHERE dimmovie.rating = 'PG-13'
GROUP BY dimdate.day, dimmovie.rating, dimcustomer.city
ORDER BY revenue DESC
LIMIT 20

 * postgresql://student:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 4.52 ms, sys: 0 ns, total: 4.52 ms
Wall time: 12.5 ms


day,rating,city,revenue
30,PG-13,Zanzibar,21.97
28,PG-13,Dhaka,19.97
29,PG-13,Shimoga,18.97
30,PG-13,Osmaniye,18.97
21,PG-13,Asuncin,18.95


| day | rating | city | revenue |
| --- | --- | --- | --- |
| 30 | PG-13 | Zanzibar | 21.97 |
| 28 | PG-13 | Dhaka | 19.97 |
| 29 | PG-13 | Shimoga | 18.97 |
| 30 | PG-13 | Osmaniye | 18.97 |
| 21 | PG-13 | Asuncin | 18.95 |

## Dicing
Dicing is creating a subcube with the same dimensionality but fewer values for two or more dimensions.
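
A minimal sketch of that idea, again using a dictionary keyed by `(day, rating, city)` as a stand-in for the cube. The function name `dice_cube` and all values are hypothetical and only illustrate the definition.

```python
# Hypothetical cube cells: (day, rating, city) -> revenue. Values are invented.
cube = {
    (1, "PG", "Bellevue"): 5.0,
    (30, "PG-13", "Lancaster"): 4.0,
    (30, "R", "Lancaster"): 1.0,
    (15, "PG-13", "Zanzibar"): 2.0,
    (7, "PG", "Bellevue"): 3.0,
}


def dice_cube(cube, days, ratings, cities):
    """Keep only cells whose value on every dimension is in the allowed set."""
    return {
        (day, rating, city): revenue
        for (day, rating, city), revenue in cube.items()
        if day in days and rating in ratings and city in cities
    }


subcube = dice_cube(
    cube,
    days={1, 15, 30},
    ratings={"PG", "PG-13"},
    cities={"Bellevue", "Lancaster"},
)
print(subcube)
```

Unlike a slice, the keys keep all three components: the subcube has the same dimensions, just fewer values along each. In SQL this corresponds to one `IN (...)` filter per restricted dimension.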

TODO: Write a query to create a subcube of the initial cube that includes movies with:
* ratings of PG or PG-13
* in the city of Bellevue or Lancaster
* day equal to 1, 15, or 30

The first few rows of your output should match the table below. 

In [16]:
%%time
%%sql

SELECT dimdate.day, dimmovie.rating, dimcustomer.city, SUM(factsales.sales_amount) AS revenue
FROM factsales
JOIN dimdate ON factsales.date_key = dimdate.date_key
JOIN dimmovie ON factsales.movie_key = dimmovie.movie_key
JOIN dimcustomer ON factsales.customer_key = dimcustomer.customer_key
WHERE dimmovie.rating IN ('PG', 'PG-13')
      AND dimcustomer.city IN ('Bellevue', 'Lancaster')
      AND dimdate.day IN (1, 15, 30)
GROUP BY dimdate.day, dimmovie.rating, dimcustomer.city
ORDER BY revenue DESC
LIMIT 20

 * postgresql://student:***@127.0.0.1:5432/pagila_star
6 rows affected.
CPU times: user 2.38 ms, sys: 3.68 ms, total: 6.06 ms
Wall time: 12.2 ms


day,rating,city,revenue
30,PG,Lancaster,12.98
1,PG-13,Lancaster,5.99
30,PG-13,Bellevue,3.99
30,PG-13,Lancaster,2.99
15,PG-13,Bellevue,1.98
1,PG,Bellevue,0.99


| day | rating | city | revenue |
| --- | --- | --- | --- |
| 30 | PG | Lancaster | 12.98 |
| 1 | PG-13 | Lancaster | 5.99 |
| 30 | PG-13 | Bellevue | 3.99 |
| 30 | PG-13 | Lancaster | 2.99 |
| 15 | PG-13 | Bellevue | 1.98 |