# 2.2_cloud_data_warehouse_postgres_olap_operations
<img src="https://upload.wikimedia.org/wikipedia/commons/2/29/Postgresql_elephant.svg" width="100" height="100">

# 1. Setup and connection

In [None]:
!PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
!PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f data/2.2_cloud_data_warehouse_postgres_olap_pagila_star.sql

In [None]:
#!pip install ipython-sql
%load_ext sql
from dotenv import load_dotenv
import os

# Load environment variables from .env file
dotenv_path = "../.env"
load_dotenv(dotenv_path)

# Retrieve credentials securely from environment, or opt for the default ones
DB_ENDPOINT = os.getenv("POSTGRES_HOST", "127.0.0.1")
DB = os.getenv("POSTGRES_DB", "pagila_star")
DB_USER = os.getenv("postgres_username", "student")
DB_PASSWORD = os.getenv("postgres_password", "student")
DB_PORT = os.getenv("POSTGRES_PORT", "5432")

# Connection string postgresql://username:password@host:port/database
conn_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_ENDPOINT}:{DB_PORT}/{DB}"

# Connect to database
%sql $conn_string

# Set the SqlMagic table style explicitly to avoid KeyError: 'DEFAULT' when rendering results
%config SqlMagic.style = "_DEPRECATED_DEFAULT"
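
The f-string above yields a URL that cannot be parsed correctly if the user name or password contains reserved characters such as `@`, `/`, or `:`. The following is a minimal sketch of one way to guard against that, assuming the URL is consumed by SQLAlchemy (which ipython-sql uses under the hood). `build_conn_string` is a hypothetical helper introduced here for illustration, and the password shown is made up.

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, database):
    """Build a PostgreSQL URL, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical password containing reserved URL characters
print(build_conn_string("student", "p@ss/word", "127.0.0.1", "5432", "pagila_star"))
```

With the default `student` credentials the encoded and unencoded URLs are identical, so this only matters once real credentials are loaded from `.env`.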

## 1.1. Star Schema
This is the star schema created and connected to in the previous step. We'll use it to exercise OLAP operations.

<img src="images/2.1_cloud_data_warehouse_postgres_pagila_star.png" width="50%">

# 2. OLAP operations

## 2.1. Roll-up
- Purpose: summarize data by aggregating up a hierarchy (also called drill-up).
- Combines values, which reduces the number of rows or columns.
- e.g. `city` values are summed up to the `country` level.

Demo: revenue (`sales_amount`) by day, rating, and country (of a customer). 

In [None]:
%%sql
SELECT d.day, m.rating, c.country, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimmovie m ON f.movie_key = m.movie_key
JOIN dimcustomer c ON f.customer_key = c.customer_key
GROUP BY ROLLUP (d.day, m.rating, c.country)
ORDER BY revenue DESC
LIMIT 5;
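
`ROLLUP (a, b, c)` is shorthand for the grouping sets `(a, b, c)`, `(a, b)`, `(a)`, and `()`: every prefix of the column list plus the grand total. The sketch below only enumerates those sets in Python to make the hierarchy visible; it does not touch the database, and `rollup_sets` is a hypothetical helper name.

```python
def rollup_sets(columns):
    """Return the grouping sets that ROLLUP(columns) expands to.

    ROLLUP keeps every prefix of the column list, from the full list
    down to the empty set (the grand total).
    """
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


for grouping in rollup_sets(["day", "rating", "country"]):
    print(grouping)
```

Rows produced by the shorter sets carry `NULL` in the rolled-up columns, so with `ORDER BY revenue DESC` the grand-total row (all three columns `NULL`) is expected to come first in the query above.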

## 2.2. Drill-down
- Purpose: break an aggregate down to a finer level of detail, the reverse of roll-up. The demo uses `CUBE`, which computes every combination of groupings (a full cross-tab) at that finer level.
- Decomposes values, which increases the number of rows or columns.
- e.g. `city` is broken up into `district`s.

Demo: revenue (`sales_amount`) by day, rating, and district (of a customer). 

In [None]:
%%sql
SELECT d.day, m.rating, c.district, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimmovie m ON f.movie_key = m.movie_key
JOIN dimcustomer c ON f.customer_key = c.customer_key
GROUP BY CUBE (d.day, m.rating, c.district)
ORDER BY revenue DESC;
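
`CUBE (a, b, c)` expands to every subset of the column list, which gives 2^n grouping sets for n columns. As a rough illustration (pure Python, no database access; `cube_sets` is a hypothetical helper name), the sets can be enumerated like this:

```python
from itertools import combinations


def cube_sets(columns):
    """Return the grouping sets that CUBE(columns) expands to.

    CUBE keeps every subset of the column list: 2 ** len(columns) sets.
    """
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


sets = cube_sets(["day", "rating", "district"])
print(len(sets))  # 8 grouping sets for three columns
for grouping in sets:
    print(grouping)
```

Because the number of sets doubles with each added column, `CUBE` over many columns can become expensive; `ROLLUP` or explicit `GROUPING SETS` keep only the combinations that are needed.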

## 2.3. Slicing

Purpose: reduce dimensionality by fixing one dimension to a single value with a `WHERE` filter.

Demo: Reduce the `rating` dimension to only include movies rated 'PG-13'.

In [None]:
%%sql
SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimmovie m ON f.movie_key = m.movie_key
JOIN dimcustomer c ON f.customer_key = c.customer_key
WHERE m.rating = 'PG-13'
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
LIMIT 5;
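
To see what the slice does independently of SQL, here is a small in-memory sketch. The rows are made up for illustration (they are not taken from `factsales`), amounts are stored in cents to keep the arithmetic exact, and `ROWS` and `slice_by_rating` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, rating, city, amount_in_cents)
ROWS = [
    (1, "PG-13", "Bellevue", 499),
    (1, "PG", "Bellevue", 299),
    (2, "PG-13", "Lancaster", 99),
    (1, "PG-13", "Bellevue", 199),
]


def slice_by_rating(rows, rating):
    """Fix the rating dimension to one value, then sum revenue per (day, city)."""
    revenue = defaultdict(int)
    for day, row_rating, city, amount in rows:
        if row_rating == rating:            # plays the role of the WHERE filter
            revenue[(day, city)] += amount  # plays the role of GROUP BY + SUM
    return dict(revenue)


print(slice_by_rating(ROWS, "PG-13"))
```

After the slice, `rating` is constant in every remaining row, so the result is effectively a two-dimensional cube over day and city.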

## 2.4. Dicing
Purpose: like slicing, but restricts two or more dimensions, each to a set of values.

Demo: a subcube of the initial cube that includes only movies with:
- a rating of PG or PG-13
- a customer city of Bellevue or Lancaster
- a day equal to 1, 15, or 30

In [None]:
%%sql
SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimmovie m ON f.movie_key = m.movie_key
JOIN dimcustomer c ON f.customer_key = c.customer_key
WHERE m.rating IN ('PG', 'PG-13') AND c.city IN ('Bellevue', 'Lancaster') AND d.day IN (1,15,30)
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
LIMIT 5;
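
The same filter can be expressed in memory as one membership test per dimension. The sketch below uses made-up rows (not data from `factsales`) with amounts in cents; `ROWS` and `dice` are hypothetical names introduced for illustration.

```python
# Hypothetical fact rows; amounts are in cents
ROWS = [
    {"day": 1, "rating": "PG", "city": "Bellevue", "amount": 299},
    {"day": 15, "rating": "PG-13", "city": "Lancaster", "amount": 499},
    {"day": 2, "rating": "PG", "city": "Bellevue", "amount": 99},
    {"day": 30, "rating": "R", "city": "Lancaster", "amount": 199},
]


def dice(rows, **allowed):
    """Keep rows whose value is in the allowed set for every named dimension."""
    return [
        row for row in rows
        if all(row[dim] in values for dim, values in allowed.items())
    ]


subcube = dice(
    ROWS,
    rating={"PG", "PG-13"},
    city={"Bellevue", "Lancaster"},
    day={1, 15, 30},
)
print(subcube)
```

Each keyword argument corresponds to one `IN (...)` condition in the `WHERE` clause; a row survives only if it passes all of them.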

# 3. Grouping Sets
We can roll up and drill down a fact by:
- 0 dimensions (total revenue),
- 1 dimension (month or country),
- 2 dimensions (month and country).

With `GROUP BY GROUPING SETS`, we can pick exactly those aggregations and compute them in a single query.

In [None]:
%%sql -- TOTAL REVENUE (0 DIMENSIONS)
SELECT SUM(sales_amount) AS revenue FROM factsales;

In [None]:
%%sql -- BY MONTH
SELECT month, SUM(sales_amount) AS revenue 
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
GROUP BY month;

In [None]:
%%sql -- BY COUNTRY
SELECT country, SUM(sales_amount) AS revenue 
FROM factsales f
JOIN dimstore s ON f.store_key = s.store_key
GROUP BY country;

In [None]:
%%sql -- BY MONTH AND COUNTRY, WITH SUBTOTALS (CUBE)
SELECT month, country, SUM(sales_amount) AS revenue 
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimstore s ON f.store_key = s.store_key
GROUP BY CUBE (month, country)
ORDER BY month, country, revenue;

In [None]:
%%sql
SELECT d.month, s.country, SUM(f.sales_amount) AS revenue
FROM factsales f
JOIN dimdate d ON f.date_key = d.date_key
JOIN dimstore s ON f.store_key = s.store_key
GROUP BY GROUPING SETS ((), (month), (country), (month, country))
ORDER BY month, country;

Do `GROUP BY CUBE` and `GROUP BY GROUPING SETS` produce the same output? Yes, as long as `GROUPING SETS` lists every combination that `CUBE` would generate, as it does here.  
**Use CUBE when you want all combinations; use GROUPING SETS when you want control over which ones are computed.**
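
The "one go" idea behind the `GROUPING SETS` query can be mimicked in memory: each input row contributes to one bucket per grouping set, and columns that are not part of a set are reported as `None` (the counterpart of SQL `NULL`). This is only a sketch with made-up rows and amounts in cents; `ROWS`, `COLUMNS`, and `grouping_sets` are hypothetical names, not part of any library.

```python
from collections import defaultdict

# Hypothetical (month, country, amount_in_cents) rows
ROWS = [
    (1, "Australia", 500),
    (1, "Canada", 300),
    (2, "Australia", 200),
]
COLUMNS = ("month", "country")


def grouping_sets(rows, sets):
    """Aggregate revenue once per grouping set, using None for rolled-up columns."""
    totals = defaultdict(int)
    for *dims, amount in rows:
        record = dict(zip(COLUMNS, dims))
        for grouping in sets:
            # Columns outside the current grouping set are replaced by None
            key = tuple(record[c] if c in grouping else None for c in COLUMNS)
            totals[key] += amount
    return dict(totals)


result = grouping_sets(ROWS, [(), ("month",), ("country",), ("month", "country")])
for key, revenue in result.items():
    print(key, revenue)
```

Passing only some of the sets, for example `[(), ("month",)]`, gives the kind of control that `CUBE` does not offer.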