# Exercise 02 - OLAP Cubes - Slicing and Dicing

All the database tables in this demo are based on public sample databases and transformations of them:
- `Sakila` is a sample database created by `MySQL` [Link](https://video.udacity-data.com/topher/2021/August/61120e06_pagila-3nf/pagila-3nf.png)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](https://video.udacity-data.com/topher/2021/August/61120d38_pagila-star/pagila-star.png)

Start by creating and connecting to the database by running the cells below.

In [None]:
# !PGPASSWORD=root createdb -h 127.0.0.1 -U abbas pagila_star
# !PGPASSWORD=root psql -q -h 127.0.0.1 -U abbas -d pagila_star -f Data/pagila/pagila-star.sql

### Connect to the local database where Pagila is loaded

In [1]:
import sql
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila_star'
DB_USER = 'abbas'
DB_PASSWORD = 'root'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
%sql $conn_string

postgresql://abbas:root@127.0.0.1:5432/pagila_star


'Connected: abbas@pagila_star'
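
The connection string above works because the credentials contain no characters that are special in a URL. As a minimal sketch, assuming the `%sql` magic parses the string as a standard `postgresql://` URL, a password containing `@`, `:` or `/` would need percent-encoding first. The helper `build_conn_string` and the password used here are hypothetical and only illustrate the idea.

```python
from urllib.parse import quote


def build_conn_string(user, password, host, port, database):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    # safe="" makes quote() encode every reserved character, including "/".
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


# Hypothetical password with URL-special characters (not the one used in this notebook).
conn_string = build_conn_string("abbas", "p@ss:word/1", "127.0.0.1", "5432", "pagila_star")
print(conn_string)
```

With a plain password such as `root`, the helper produces the same string as the cell above.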

### Star Schema

<img src="pagila-star.png" width="50%"/>

# Start with a simple cube
TODO: Write a query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.

In [22]:
%%time
%%sql

SELECT dd.day, dm.rating, dc.city, sum(fs.sales_amount) as revenue
FROM dimdate dd 
JOIN factsales fs ON (dd.date_key = fs.date_key)
JOIN dimmovie dm ON (fs.movie_key = dm.movie_key)
JOIN dimcustomer dc ON (fs.customer_key = dc.customer_key)
GROUP BY (dd.day, dm.rating, dc.city)
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://abbas:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 1.95 ms, sys: 1.06 ms, total: 3.01 ms
Wall time: 235 ms


day,rating,city,revenue
21,G,Citt del Vaticano,32.94
30,PG-13,Osmaniye,28.96
28,R,Mwanza,28.96
30,R,Fengshan,28.94
18,R,Sumy,25.97
28,R,Usolje-Sibirskoje,25.96
23,G,San Juan Bautista Tuxtepec,25.96
30,G,San Bernardino,24.97
19,PG,Najafabad,24.95
6,G,Szkesfehrvr,23.98


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>day</th>
        <th>rating</th>
        <th>city</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>30</td>
        <td>G</td>
        <td>San Bernardino</td>
        <td>24.97</td>
    </tr>
    <tr>
        <td>30</td>
        <td>NC-17</td>
        <td>Apeldoorn</td>
        <td>23.95</td>
    </tr>
    <tr>
        <td>21</td>
        <td>NC-17</td>
        <td>Belm</td>
        <td>22.97</td>
    </tr>
    <tr>
        <td>30</td>
        <td>PG-13</td>
        <td>Zanzibar</td>
        <td>21.97</td>
    </tr>
    <tr>
        <td>28</td>
        <td>R</td>
        <td>Mwanza</td>
        <td>21.97</td>
    </tr>
</tbody></table></div>
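
To see the mechanics of the cube query in isolation, the sketch below rebuilds a tiny star schema in an in-memory SQLite database and runs the same kind of join and `GROUP BY`. The table names mirror the ones used above, but the rows are hypothetical toy data (not taken from Pagila), and SQLite stands in for PostgreSQL only so that the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dimdate     (date_key INTEGER PRIMARY KEY, day INTEGER);
    CREATE TABLE dimmovie    (movie_key INTEGER PRIMARY KEY, rating TEXT);
    CREATE TABLE dimcustomer (customer_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE factsales   (date_key INTEGER, movie_key INTEGER,
                              customer_key INTEGER, sales_amount REAL);

    -- Hypothetical toy rows, not taken from Pagila.
    INSERT INTO dimdate     VALUES (20070501, 1), (20070515, 15);
    INSERT INTO dimmovie    VALUES (1, 'PG'), (2, 'PG-13');
    INSERT INTO dimcustomer VALUES (1, 'Lancaster'), (2, 'Bellevue');
    INSERT INTO factsales   VALUES
        (20070501, 1, 1, 2.99),
        (20070501, 1, 1, 4.99),
        (20070515, 2, 2, 0.99),
        (20070515, 2, 1, 3.99);
""")

# Join the fact table to each dimension to swap keys for labels,
# then aggregate revenue per (day, rating, city) cell of the cube.
cube = conn.execute("""
    SELECT dd.day, dm.rating, dc.city, ROUND(SUM(fs.sales_amount), 2) AS revenue
    FROM factsales fs
    JOIN dimdate dd     ON dd.date_key = fs.date_key
    JOIN dimmovie dm    ON dm.movie_key = fs.movie_key
    JOIN dimcustomer dc ON dc.customer_key = fs.customer_key
    GROUP BY dd.day, dm.rating, dc.city
    ORDER BY revenue DESC
""").fetchall()

for row in cube:
    print(row)
```

The two day-1 PG sales in Lancaster collapse into a single cell, so four fact rows yield three cube cells.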

## Slicing

Slicing reduces the dimensionality of a cube by 1 (e.g. from 3 dimensions to 2) by fixing one of the dimensions to a single value. In the example above, we have a 3-dimensional cube on day, rating, and city.
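
As a minimal sketch of the idea, the example below slices a hypothetical toy table held in an in-memory SQLite database. The `sales` table and its rows are assumptions made for illustration, not Pagila data. Fixing `rating` to one value leaves a 2-dimensional result over `day` and `city`.

```python
import sqlite3

# Hypothetical toy cube: one row per sale, keyed by (day, rating, city).
rows = [
    (1, "PG", "Lancaster", 2.99),
    (1, "PG-13", "Lancaster", 4.99),
    (1, "PG-13", "Bellevue", 0.99),
    (15, "PG-13", "Bellevue", 1.99),
    (15, "R", "Lancaster", 3.99),
    (15, "PG-13", "Bellevue", 2.99),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, rating TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Slice: pin the rating dimension to 'PG-13', keep day and city free.
slice_rows = conn.execute("""
    SELECT day, city, ROUND(SUM(amount), 2) AS revenue
    FROM sales
    WHERE rating = 'PG-13'
    GROUP BY day, city
    ORDER BY revenue DESC
""").fetchall()

for row in slice_rows:
    print(row)
```

Because `rating` is constant in every surviving row, it can be dropped from the `SELECT` list without losing information, which is what "one dimension fewer" means in practice.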

TODO: Write a query that reduces the dimensionality of the above example by limiting the results to only include movies with a `rating` of "PG-13". Again, sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below. 

In [26]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, sum(sales_amount) as revenue
FROM factSales
JOIN dimMovie ON (dimMovie.movie_key = factSales.movie_key)
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimCustomer ON (dimCustomer.customer_key = factSales.customer_key)
WHERE dimMovie.rating = 'PG-13'
GROUP BY (dimDate.day, dimCustomer.city, dimMovie.rating)
ORDER BY revenue DESC 
LIMIT 20;

 * postgresql://abbas:***@127.0.0.1:5432/pagila_star
20 rows affected.
CPU times: user 1.92 ms, sys: 2.65 ms, total: 4.57 ms
Wall time: 32.2 ms


day,rating,city,revenue
30,PG-13,Osmaniye,28.96
17,PG-13,Yantai,23.97
15,PG-13,Jhansi,23.96
5,PG-13,Karnal,23.96
11,PG-13,Boa Vista,22.97
29,PG-13,Kirovo-Tepetsk,22.97
20,PG-13,Ciparay,22.96
30,PG-13,La Romana,22.95
22,PG-13,Rampur,21.98
30,PG-13,Zanzibar,21.97


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>day</th>
        <th>rating</th>
        <th>city</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>30</td>
        <td>PG-13</td>
        <td>Zanzibar</td>
        <td>21.97</td>
    </tr>
    <tr>
        <td>28</td>
        <td>PG-13</td>
        <td>Dhaka</td>
        <td>19.97</td>
    </tr>
    <tr>
        <td>29</td>
        <td>PG-13</td>
        <td>Shimoga</td>
        <td>18.97</td>
    </tr>
    <tr>
        <td>30</td>
        <td>PG-13</td>
        <td>Osmaniye</td>
        <td>18.97</td>
    </tr>
    <tr>
        <td>21</td>
        <td>PG-13</td>
        <td>Asuncin</td>
        <td>18.95</td>
    </tr>
</tbody></table></div>

## Dicing
Dicing creates a subcube with the same dimensionality but fewer values for two or more dimensions.
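
A minimal sketch of dicing, again on a hypothetical toy `sales` table in an in-memory SQLite database (the table and its rows are illustrative assumptions, not Pagila data): each dimension is restricted to a subset of its values with `IN`, and all three dimensions remain in the result.

```python
import sqlite3

# Hypothetical toy cube: one row per sale, keyed by (day, rating, city).
rows = [
    (1, "PG", "Lancaster", 2.99),
    (1, "PG-13", "Bellevue", 0.99),
    (7, "PG", "Lancaster", 4.99),     # dropped: day not in the subset
    (15, "R", "Bellevue", 3.99),      # dropped: rating not in the subset
    (15, "PG-13", "Zanzibar", 1.99),  # dropped: city not in the subset
    (15, "PG-13", "Bellevue", 2.99),
    (15, "PG-13", "Bellevue", 1.99),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, rating TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Dice: restrict every dimension to a subset of values, keep all three dimensions.
dice_rows = conn.execute("""
    SELECT day, rating, city, ROUND(SUM(amount), 2) AS revenue
    FROM sales
    WHERE rating IN ('PG', 'PG-13')
      AND city IN ('Bellevue', 'Lancaster')
      AND day IN (1, 15)
    GROUP BY day, rating, city
    ORDER BY revenue DESC
""").fetchall()

for row in dice_rows:
    print(row)
```

Unlike a slice, the result still has three dimension columns; only the set of values along each axis has shrunk.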

TODO: Write a query to create a subcube of the initial cube that includes movies with:
* a rating of PG or PG-13
* a city of Bellevue or Lancaster
* a day equal to 1, 15, or 30

The first few rows of your output should match the table below. 

In [27]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, sum(sales_amount) as revenue
FROM factSales
JOIN dimMovie ON (dimMovie.movie_key = factSales.movie_key)
JOIN dimDate ON (dimDate.date_key = factSales.date_key)
JOIN dimCustomer ON (dimCustomer.customer_key = factSales.customer_key)
WHERE dimMovie.rating in ('PG-13', 'PG')
AND dimCustomer.city in ('Bellevue', 'Lancaster')
AND dimDate.day in ('1', '15', '30')
GROUP BY (dimDate.day, dimCustomer.city, dimMovie.rating)
ORDER BY revenue DESC 
LIMIT 20;

 * postgresql://abbas:***@127.0.0.1:5432/pagila_star
9 rows affected.
CPU times: user 2.08 ms, sys: 8.32 ms, total: 10.4 ms
Wall time: 27.5 ms


day,rating,city,revenue
30,PG,Lancaster,12.98
15,PG-13,Lancaster,8.99
1,PG,Lancaster,6.99
1,PG-13,Lancaster,5.99
30,PG-13,Bellevue,3.99
30,PG-13,Lancaster,2.99
15,PG-13,Bellevue,1.98
30,PG,Bellevue,0.99
1,PG,Bellevue,0.99


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>day</th>
        <th>rating</th>
        <th>city</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>30</td>
        <td>PG</td>
        <td>Lancaster</td>
        <td>12.98</td>
    </tr>
    <tr>
        <td>1</td>
        <td>PG-13</td>
        <td>Lancaster</td>
        <td>5.99</td>
    </tr>
    <tr>
        <td>30</td>
        <td>PG-13</td>
        <td>Bellevue</td>
        <td>3.99</td>
    </tr>
    <tr>
        <td>30</td>
        <td>PG-13</td>
        <td>Lancaster</td>
        <td>2.99</td>
    </tr>
    <tr>
        <td>15</td>
        <td>PG-13</td>
        <td>Bellevue</td>
        <td>1.98</td>
    </tr>
</tbody></table></div>