In [1]:
import psycopg2
%load_ext sql

### Connect to the local database

In [3]:
DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'password'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)

postgresql://postgres:password@127.0.0.1:5432/pagila
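
The format string above works because the password contains no reserved URL characters. As a hedged sketch, not part of the original notebook, the credentials can be percent-encoded before being placed in the URL. The helper name `build_conn_string` and the sample password are hypothetical.

```python
from urllib.parse import quote


def build_conn_string(user, password, host, port, database):
    """Build a postgresql:// URL, percent-encoding the user and password.

    Hypothetical helper for illustration; safe="" makes quote() also
    encode "/" so characters like "@", ":" and "/" cannot break the URL.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


# Made-up password containing characters that are reserved in URLs.
conn_example = build_conn_string(
    "postgres", "p@ss:word/1", "127.0.0.1", "5432", "pagila"
)
print(conn_example)
```

With the plain password used in this notebook, the helper would produce the same string as the original `format` call.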


In [4]:
%sql $conn_string

### ERD for the Star Schema

![Dimensional Model](Dimension-Model-Schema.jpg)

**A query that calculates the revenue (sales_amount) by day, rating, and city. Remember to join with the appropriate dimension tables to replace the keys with the dimension labels. Sort by revenue in descending order and limit to the first 20 rows. The first few rows of your output should match the table below.**

In [6]:
%%time
%%sql

SELECT 
    d.day, 
    m.rating,
    s.city,
    sum(f.sales_amount) AS revenue
FROM factSales f
JOIN dimDate d  ON f.date_key  = d.date_key
JOIN dimMovie m ON m.movie_key = f.movie_key
JOIN dimStore s ON s.store_key = f.store_key
GROUP BY (d.day, m.rating, s.city)
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
Wall time: 36.9 ms


day,rating,city,revenue
30,PG-13,Lethbridge,784.21
30,G,Lethbridge,730.48
30,R,Lethbridge,683.46
30,NC-17,Woodridge,667.49
30,NC-17,Lethbridge,646.51
30,PG-13,Woodridge,635.48
30,PG,Woodridge,593.57
30,G,Woodridge,587.58
20,PG-13,Lethbridge,538.93
30,PG,Lethbridge,521.78


#### Slicing

Slicing reduces the dimensionality of a cube by 1 (e.g. from 3 dimensions to 2) by fixing one of the dimensions to a single value. The example above is a 3-dimensional cube on day, rating, and city.
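
To make the idea concrete outside the pagila database, here is a minimal sketch of a slice using Python's built-in `sqlite3` module. The table `toy_sales` and its rows are made-up illustration data (a single denormalized table rather than the star schema), and SQLite stands in for PostgreSQL. Because `rating` is fixed to one value, it is dropped from the grouping, leaving a 2-dimensional result on day and city.

```python
import sqlite3

# Hypothetical toy data, not taken from pagila:
# (day, rating, city, sales_amount)
toy_rows = [
    (1, "PG-13", "Lethbridge", 10.0),
    (1, "PG-13", "Woodridge", 4.0),
    (1, "G", "Lethbridge", 7.0),
    (2, "PG-13", "Lethbridge", 5.0),
    (2, "PG-13", "Lethbridge", 3.0),
    (2, "R", "Woodridge", 9.0),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE toy_sales "
    "(day INTEGER, rating TEXT, city TEXT, sales_amount REAL)"
)
db.executemany("INSERT INTO toy_sales VALUES (?, ?, ?, ?)", toy_rows)

# Slice: fix the rating dimension to a single value, then aggregate
# over the two remaining dimensions (day, city).
slice_rows = db.execute(
    """
    SELECT day, city, SUM(sales_amount) AS revenue
    FROM toy_sales
    WHERE rating = ?
    GROUP BY day, city
    ORDER BY revenue DESC, day, city
    """,
    ("PG-13",),
).fetchall()

for row in slice_rows:
    print(row)
```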

**A query that reduces the dimensionality of the above example by limiting the results to only include movies with a rating of "PG-13". Again, sort by revenue in descending order and limit to the first 20 rows.**

In [7]:
%%time
%%sql

SELECT 
    d.day, 
    m.rating,
    s.city,
    sum(f.sales_amount) AS revenue
FROM factSales f
JOIN dimDate d  ON f.date_key  = d.date_key
JOIN dimMovie m ON m.movie_key = f.movie_key
JOIN dimStore s ON s.store_key = f.store_key
WHERE m.rating = 'PG-13'
GROUP BY (d.day, m.rating, s.city)
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
Wall time: 21.9 ms


day,rating,city,revenue
30,PG-13,Lethbridge,784.21
30,PG-13,Woodridge,635.48
20,PG-13,Lethbridge,538.93
21,PG-13,Lethbridge,499.92
17,PG-13,Woodridge,488.83
18,PG-13,Lethbridge,466.92
19,PG-13,Lethbridge,465.87
28,PG-13,Lethbridge,455.97
27,PG-13,Woodridge,444.9
19,PG-13,Woodridge,430.01


#### Dicing

Dicing is creating a subcube with the same dimensionality but fewer values for two or more dimensions.
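
As a minimal sketch of the same idea on made-up data, the snippet below dices a toy table with Python's built-in `sqlite3` module. The table `toy_sales` and its rows are hypothetical, and SQLite stands in for PostgreSQL. All three dimensions stay in the `GROUP BY`, so the dimensionality is unchanged; only the allowed values per dimension shrink.

```python
import sqlite3

# Hypothetical toy data, not taken from pagila:
# (day, rating, city, sales_amount)
toy_rows = [
    (1, "PG", "Lethbridge", 6.0),
    (1, "PG-13", "Woodridge", 4.0),
    (1, "G", "Lethbridge", 7.0),       # dropped by the rating filter
    (15, "PG", "Woodridge", 2.5),
    (15, "PG", "Woodridge", 1.5),
    (16, "PG-13", "Lethbridge", 9.0),  # dropped by the day filter
    (30, "PG-13", "Lethbridge", 5.0),
    (30, "R", "Woodridge", 8.0),       # dropped by the rating filter
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE toy_sales "
    "(day INTEGER, rating TEXT, city TEXT, sales_amount REAL)"
)
db.executemany("INSERT INTO toy_sales VALUES (?, ?, ?, ?)", toy_rows)

# Dice: restrict several dimensions to a subset of their values while
# still grouping by all three dimensions.
dice_rows = db.execute(
    """
    SELECT day, rating, city, SUM(sales_amount) AS revenue
    FROM toy_sales
    WHERE rating IN ('PG', 'PG-13')
      AND city IN ('Lethbridge', 'Woodridge')
      AND day IN (1, 15, 30)
    GROUP BY day, rating, city
    ORDER BY revenue DESC, day, rating, city
    """
).fetchall()

for row in dice_rows:
    print(row)
```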

A query to create a subcube of the initial cube that includes movies with:

- ratings of PG or PG-13
- in the city of Lethbridge or Woodridge
- day equal to 1, 15, or 30

In [9]:
%%time
%%sql

SELECT 
    t.day, 
    m.rating,
    s.city,
    sum(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate t  ON f.date_key  = t.date_key
JOIN dimmovie m ON f.movie_key = m.movie_key
JOIN dimstore s ON f.store_key = s.store_key
WHERE m.rating IN ('PG-13', 'PG')
  AND s.city IN ('Lethbridge', 'Woodridge')
  AND t.day IN ('1', '15', '30')
GROUP BY t.day, m.rating, s.city
ORDER BY revenue DESC
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
12 rows affected.
Wall time: 13 ms


day,rating,city,revenue
30,PG-13,Lethbridge,784.21
30,PG-13,Woodridge,635.48
30,PG,Woodridge,593.57
30,PG,Lethbridge,521.78
1,PG,Woodridge,316.16
1,PG-13,Lethbridge,310.3
1,PG-13,Woodridge,306.3
1,PG,Lethbridge,296.3
15,PG-13,Woodridge,195.54
15,PG-13,Lethbridge,151.61
