In [2]:
%load_ext sql
DB_ENDPOINT = '127.0.0.1'
DB = 'sakila'
DB_USER = 'postgres'
DB_PASSWORD = '1234'
DB_PORT = '5432'

conn_string = f'postgresql://{DB_USER}:{DB_PASSWORD}@{DB_ENDPOINT}:{DB_PORT}/{DB}'
print(conn_string)
%sql $conn_string

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
postgresql://postgres:1234@127.0.0.1:5432/sakila


### Starting with a simple cube
TODO: Write a query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.

In [3]:
%%time
%%sql
SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(sales_amount) AS revenue
FROM factSales
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
GROUP BY (dimDate.day, dimMovie.rating, dimCustomer.city)
ORDER BY revenue DESC
LIMIT 20

 * postgresql://postgres:***@127.0.0.1:5432/sakila
20 rows affected.
CPU times: user 8.36 ms, sys: 3.41 ms, total: 11.8 ms
Wall time: 209 ms


day,rating,city,revenue
30,G,San Bernardino,49.94
27,NC-17,Funafuti,45.92
21,G,Citt del Vaticano,43.94
1,R,Qomsheh,39.94
17,G,Rajkot,39.94
22,R,Yangor,39.94
28,PG-13,Dhaka,39.94
19,PG,Najafabad,39.92
21,G,Wroclaw,37.96
30,PG-13,Zanzibar,37.96


### Slicing
Slicing reduces the dimensionality of a cube by one (for example, from 3 dimensions to 2) by fixing one of the dimensions to a single value. The example above is a 3-dimensional cube over day, rating, and city.
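
As a rough illustration of the idea, the sketch below slices a tiny cube held in an in-memory SQLite database. The `sales` table and its rows are invented for this example and stand in for the joined star schema; they are not taken from the sakila data, and SQLite is used here only so the snippet runs without a PostgreSQL server.

```python
import sqlite3

# Hypothetical, denormalized stand-in for factSales joined to its dimensions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (day INTEGER, rating TEXT, city TEXT, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "PG-13", "Dhaka", 4.0),
        (1, "PG-13", "Dhaka", 2.0),
        (1, "G", "Dhaka", 3.0),
        (2, "PG-13", "Zanzibar", 5.0),
        (2, "R", "Zanzibar", 1.0),
    ],
)

# Slice: fix rating to a single value, leaving a 2-D cube over (day, city).
sliced = conn.execute(
    """
    SELECT day, city, SUM(sales_amount) AS revenue
    FROM sales
    WHERE rating = 'PG-13'
    GROUP BY day, city
    ORDER BY revenue DESC
    """
).fetchall()

for row in sliced:
    print(row)
```

Because rating is fixed to one value, it no longer needs to appear in the `SELECT` or `GROUP BY` list; keeping it there, as the exercise query does, simply repeats the same value on every row.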

Write a query that reduces the dimensionality of the above example by limiting the results to movies with a rating of "PG-13". Again, sort by revenue in descending order and limit to the first 5 rows. The first few rows of your output should match the table below.

In [4]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(sales_amount) AS revenue
FROM factSales
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
WHERE dimMovie.rating = 'PG-13'
GROUP BY (dimDate.day, dimMovie.rating, dimCustomer.city)
ORDER BY revenue DESC
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.
CPU times: user 5.15 ms, sys: 2.47 ms, total: 7.63 ms
Wall time: 105 ms


day,rating,city,revenue
28,PG-13,Dhaka,39.94
30,PG-13,Zanzibar,37.96
2,PG-13,Antofagasta,37.94
21,PG-13,Asuncin,37.9
21,PG-13,Parbhani,35.96


### Dicing
Dicing is creating a subcube with the same dimensionality but fewer values for two or more dimensions.
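
A minimal sketch of dicing, again on an invented in-memory SQLite table rather than the sakila database: each dimension is restricted to a subset of its values with `IN`, and the result still has all three dimensions. The table name `sales` and every row in it are assumptions made for illustration.

```python
import sqlite3

# Hypothetical, denormalized stand-in for factSales joined to its dimensions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (day INTEGER, rating TEXT, city TEXT, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "PG", "Bellevue", 2.0),
        (1, "PG-13", "Lancaster", 6.0),
        (15, "PG-13", "Bellevue", 4.0),
        (15, "R", "Bellevue", 9.0),    # dropped: rating not in the subcube
        (30, "PG", "Lancaster", 7.0),
        (30, "PG", "Lancaster", 1.0),
        (2, "PG", "Lancaster", 5.0),   # dropped: day not in the subcube
        (30, "PG", "Dhaka", 3.0),      # dropped: city not in the subcube
    ],
)

# Dice: keep all three dimensions, but restrict each to a subset of values.
diced = conn.execute(
    """
    SELECT day, rating, city, SUM(sales_amount) AS revenue
    FROM sales
    WHERE rating IN ('PG', 'PG-13')
      AND city IN ('Bellevue', 'Lancaster')
      AND day IN (1, 15, 30)
    GROUP BY day, rating, city
    ORDER BY revenue DESC
    """
).fetchall()

for row in diced:
    print(row)
```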

Write a query to create a subcube of the initial cube that includes movies with:

- a rating of PG or PG-13
- a city of Bellevue or Lancaster
- a day equal to 1, 15, or 30

The first few rows of your output should match the table below.

In [5]:
%%time
%%sql

SELECT dimDate.day, dimMovie.rating, dimCustomer.city, SUM(sales_amount) AS revenue
FROM factSales
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
WHERE dimMovie.rating IN ('PG', 'PG-13') 
    AND dimCustomer.city IN ('Bellevue', 'Lancaster')
    AND dimDate.day IN (1, 15, 30)
GROUP BY (dimDate.day, dimMovie.rating, dimCustomer.city)
ORDER BY revenue DESC
LIMIT 20

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.
CPU times: user 7.92 ms, sys: 2.88 ms, total: 10.8 ms
Wall time: 78.5 ms


day,rating,city,revenue
30,PG,Lancaster,13.98
1,PG-13,Lancaster,11.98
30,PG-13,Bellevue,7.98
15,PG-13,Bellevue,3.96
1,PG,Bellevue,1.98


### Roll-up
- Stepping up the level of aggregation to a larger grouping.
- e.g. city is summed up to country.
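
To make the roll-up concrete, the sketch below aggregates the same invented rows first by city and then by country, using an in-memory SQLite table. The `sales` table, its columns, and its rows are assumptions for illustration only; SQLite stands in for PostgreSQL so the snippet runs on its own.

```python
import sqlite3

# Hypothetical, denormalized stand-in for factSales joined to its dimensions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(day INTEGER, rating TEXT, city TEXT, country TEXT, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (18, "PG", "Rajkot", "India", 3.0),
        (18, "PG", "Parbhani", "India", 4.0),
        (18, "PG", "Shanghai", "China", 5.0),
        (21, "R", "Rajkot", "India", 2.0),
    ],
)

def revenue_by(level):
    # level is a trusted column name chosen by this script, not user input.
    query = (
        f"SELECT day, rating, {level}, SUM(sales_amount) AS revenue "
        f"FROM sales GROUP BY day, rating, {level} ORDER BY revenue DESC"
    )
    return conn.execute(query).fetchall()

by_city = revenue_by("city")        # finer level: one row per city
by_country = revenue_by("country")  # rolled up: cities merged into countries

print(len(by_city), "city-level rows")
for row in by_country:
    print(row)
```

The rolled-up result has fewer rows than the city-level one, while the total revenue across all rows stays the same.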

Write a query that calculates revenue (sales_amount) by day, rating, and country. Sort the data by revenue in descending order, and limit the data to the top 5 results. The first few rows of your output should match the table below.

In [6]:
%%time
%%sql
SELECT dimDate.day, dimMovie.rating, dimCustomer.country, SUM(sales_amount) AS revenue
FROM factSales
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
GROUP BY (dimDate.day, dimMovie.rating, dimCustomer.country)
ORDER BY revenue DESC
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.
CPU times: user 8.25 ms, sys: 3.66 ms, total: 11.9 ms
Wall time: 55.4 ms


day,rating,country,revenue
18,NC-17,India,261.52
21,PG-13,India,261.46
17,PG-13,China,239.48
18,PG,India,237.54
21,R,India,233.5


### Drill-down
- Breaking up one of the dimensions into a lower, more detailed level.
- e.g. city is broken up into districts.
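
The drill-down direction can be sketched the same way: start from a city-level aggregate and regroup by the finer district level. The in-memory SQLite table, the city name `Metropolis`, and the districts `North` and `South` are all invented for this example and do not come from the sakila data.

```python
import sqlite3

# Hypothetical, denormalized stand-in for factSales joined to its dimensions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(day INTEGER, rating TEXT, city TEXT, district TEXT, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (18, "PG", "Metropolis", "North", 3.0),
        (18, "PG", "Metropolis", "South", 4.0),
        (18, "PG", "Metropolis", "South", 1.0),
        (21, "R", "Metropolis", "North", 2.0),
    ],
)

# Coarser level: one row per (day, rating, city).
by_city = conn.execute(
    """
    SELECT day, rating, city, SUM(sales_amount) AS revenue
    FROM sales
    GROUP BY day, rating, city
    ORDER BY revenue DESC
    """
).fetchall()

# Drill down: the same measure, split by district instead of city.
by_district = conn.execute(
    """
    SELECT day, rating, district, SUM(sales_amount) AS revenue
    FROM sales
    GROUP BY day, rating, district
    ORDER BY revenue DESC
    """
).fetchall()

print(by_city)
print(by_district)
```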

Write a query that calculates revenue (sales_amount) by day, rating, and district. Sort the data by revenue in descending order, and limit the data to the top 5 results. The first few rows of your output should match the table below.

In [7]:
%%time
%%sql
SELECT dimDate.day, dimMovie.rating, dimCustomer.district, SUM(sales_amount) AS revenue
FROM factSales
JOIN dimDate ON (factSales.date_key = dimDate.date_key)
JOIN dimMovie ON (factSales.movie_key = dimMovie.movie_key)
JOIN dimCustomer ON (factSales.customer_key = dimCustomer.customer_key)
GROUP BY (dimDate.day, dimMovie.rating, dimCustomer.district)
ORDER BY revenue DESC
LIMIT 5

 * postgresql://postgres:***@127.0.0.1:5432/sakila
5 rows affected.
CPU times: user 9.88 ms, sys: 3.22 ms, total: 13.1 ms
Wall time: 158 ms


day,rating,district,revenue
21,PG-13,,1881.72
20,PG-13,,1837.74
18,PG-13,,1805.68
18,R,,1791.84
19,NC-17,,1759.88
