In [1]:
import psycopg2
%load_ext sql

### Connect to the local database

In [2]:
DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'password'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)

postgresql://postgres:password@127.0.0.1:5432/pagila
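
The `format` call above works for the simple password used here. If a password contains characters that are reserved in URLs (such as `@`, `:` or `/`), the resulting string would likely be parsed incorrectly. A minimal sketch, using only the standard library, that percent-encodes the credentials first; `build_conn_string` and the sample password are hypothetical and not part of the notebook:

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, database):
    """Return a postgresql:// URL with percent-encoded user and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical password containing reserved characters, for illustration only.
conn_string_safe = build_conn_string(
    "postgres", "p@ss:word", "127.0.0.1", "5432", "pagila"
)
print(conn_string_safe)
```

For credentials without reserved characters the helper returns the same string as the original `format` call.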


In [3]:
%sql $conn_string

### ERD for the Star Schema

![Dimensional Model](Dimension-Model-Schema.jpg)

### Roll-up

- Stepping up the level of aggregation to a larger grouping
- e.g. city is summed up into country

**A query that calculates revenue (sales_amount) by day, rating, and country. Sort the data by revenue in descending order, and limit the data to the top 20 results.**

In [4]:
%%time
%%sql
SELECT t.day, m.rating, s.country, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate t ON (f.date_key = t.date_key)
JOIN dimmovie m ON (f.movie_key = m.movie_key)
JOIN dimstore s ON (f.store_key = s.store_key)
GROUP BY t.day, m.rating, s.country
ORDER BY revenue DESC
LIMIT 20

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
Wall time: 36.9 ms


day,rating,country,revenue
30,PG-13,Canada,784.21
30,G,Canada,730.48
30,R,Canada,683.46
30,NC-17,Australia,667.49
30,NC-17,Canada,646.51
30,PG-13,Australia,635.48
30,PG,Australia,593.57
30,G,Australia,587.58
20,PG-13,Canada,538.93
30,PG,Canada,521.78
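
To see the roll-up idea in isolation from Pagila, here is a small sketch that uses an in-memory SQLite database. The table `toy_sales` and its rows are made up for illustration; only the pattern (the same `SUM` grouped at a coarser level) mirrors the query above.

```python
import sqlite3

# Made-up toy data: one row per sale, tagged with city and country.
sales = [
    ("Lethbridge", "Canada", 10.0),
    ("Calgary", "Canada", 5.0),
    ("Woodridge", "Australia", 7.5),
    ("Brisbane", "Australia", 2.5),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE toy_sales (city TEXT, country TEXT, sales_amount REAL)")
con.executemany("INSERT INTO toy_sales VALUES (?, ?, ?)", sales)

# Fine-grained view: one group per city.
by_city = con.execute(
    "SELECT city, SUM(sales_amount) FROM toy_sales GROUP BY city ORDER BY city"
).fetchall()

# Roll-up: the same measure, grouped at the coarser country level.
by_country = con.execute(
    "SELECT country, SUM(sales_amount) FROM toy_sales "
    "GROUP BY country ORDER BY country"
).fetchall()

print(by_city)
print(by_country)
```

The country-level result has fewer rows than the city-level one, and each country total is the sum of its cities' totals.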


### Drill-down

- Breaking one of the dimensions down to a lower (finer) level
- e.g. city is broken up into districts

**A query that calculates revenue (sales_amount) by day, rating, and district. Sort the data by revenue in descending order, and limit the data to the top 20 results.**

In [5]:
%%time
%%sql
SELECT t.day, m.rating, s.district, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate t ON (f.date_key = t.date_key)
JOIN dimmovie m ON (f.movie_key = m.movie_key)
JOIN dimstore s ON (f.store_key = s.store_key)
GROUP BY t.day, m.rating, s.district
ORDER BY revenue DESC
LIMIT 20

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
Wall time: 32.9 ms


day,rating,district,revenue
30,PG-13,Alberta,784.21
30,G,Alberta,730.48
30,R,Alberta,683.46
30,NC-17,QLD,667.49
30,NC-17,Alberta,646.51
30,PG-13,QLD,635.48
30,PG,QLD,593.57
30,G,QLD,587.58
20,PG-13,Alberta,538.93
30,PG,Alberta,521.78
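
The revenue figures in the displayed drill-down rows are identical to those in the roll-up rows. This suggests that, at least for the rows shown, each district maps to exactly one country. The sketch below checks this using the rows copied from the two outputs above. The `district_to_country` mapping is an assumption inferred from those rows only, not read from `dimstore`.

```python
# Rows copied from the roll-up output above: (day, rating, country, revenue).
rollup_rows = [
    (30, "PG-13", "Canada", 784.21),
    (30, "G", "Canada", 730.48),
    (30, "R", "Canada", 683.46),
    (30, "NC-17", "Australia", 667.49),
    (30, "NC-17", "Canada", 646.51),
    (30, "PG-13", "Australia", 635.48),
    (30, "PG", "Australia", 593.57),
    (30, "G", "Australia", 587.58),
    (20, "PG-13", "Canada", 538.93),
    (30, "PG", "Canada", 521.78),
]

# Rows copied from the drill-down output above: (day, rating, district, revenue).
drilldown_rows = [
    (30, "PG-13", "Alberta", 784.21),
    (30, "G", "Alberta", 730.48),
    (30, "R", "Alberta", 683.46),
    (30, "NC-17", "QLD", 667.49),
    (30, "NC-17", "Alberta", 646.51),
    (30, "PG-13", "QLD", 635.48),
    (30, "PG", "QLD", 593.57),
    (30, "G", "QLD", 587.58),
    (20, "PG-13", "Alberta", 538.93),
    (30, "PG", "Alberta", 521.78),
]

# Assumed mapping, inferred from the displayed rows only.
district_to_country = {"Alberta": "Canada", "QLD": "Australia"}

# Replace each district with its country and compare with the roll-up rows.
rolled_back_up = [
    (day, rating, district_to_country[district], revenue)
    for day, rating, district, revenue in drilldown_rows
]
print(rolled_back_up == rollup_rows)
```

When a country contains several districts, the drill-down would instead return more rows with smaller per-row revenue than the roll-up.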
