# Exercise 10: Columnar vs. Row Storage

- The columnar storage extension used here is `cstore_fdw` by Citus Data: [https://github.com/citusdata/cstore_fdw](https://github.com/citusdata/cstore_fdw)
- The data tables are the ones Citus Data uses to demonstrate the extension.


## Row storage

### Connect to the database

In [2]:
# Load ipython-sql
%load_ext sql

# Setup database connection
# Define parameters
DB_ENDPOINT = 'localhost'
DB_NAME = 'reviews'
DB_USER = 'postgres'
DB_PASSWORD = 'postgres'
DB_PORT = '5432'

conn_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_ENDPOINT}:{DB_PORT}/{DB_NAME}"

# Connect
%sql $conn_string

### Create a table with normal (row) storage and load data

In [10]:
%%sql
DROP TABLE IF EXISTS customer_reviews_row;
CREATE TABLE customer_reviews_row (
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)

 * postgresql://postgres:***@localhost:5432/reviews
Done.
Done.


[]

### Insert the data using the command line
1. Copy the 1998 data to the Docker container: `docker cp cloud_data_warehouses/introduction/data/customer_reviews_1998.csv nano_de_degree_postgres:customer_reviews_1998.csv`
2. Copy the 1999 data to the Docker container: `docker cp cloud_data_warehouses/introduction/data/customer_reviews_1999.csv nano_de_degree_postgres:customer_reviews_1999.csv`
3. In the psql command line, connect to the database: `\c reviews`
4. Load the 1998 rows from the CSV file: `\copy customer_reviews_row FROM 'customer_reviews_1998.csv' WITH CSV;`
5. Load the 1999 rows from the CSV file: `\copy customer_reviews_row FROM 'customer_reviews_1999.csv' WITH CSV;`

## Columnar storage

1. Set up a new Docker container running a PostgreSQL server with the columnar extension. Host port 5433 is mapped to the container's PostgreSQL port, assumed here to be the default 5432: `docker run -d --name nano_de_degree_postgres_columnar -e POSTGRES_PASSWORD=postgres -e PGDATA=/var/lib/postgres/data/udacity -p 5433:5432 abuckenhofer/columnarpostgresql`
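Before the foreign table can be created from the notebook, the SQL magic has to point at the new container rather than the row-storage one. A minimal sketch of building that connection string, assuming the same credentials and database name as in the row-storage section and that the columnar server is reachable on host port 5433 (the helper name `build_conn_string` is hypothetical):

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, db_name):
    """Return a SQLAlchemy-style PostgreSQL URL, escaping special characters in the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db_name}"


# Assumed values: same credentials as the row-storage setup, host port 5433
COL_CONN_STRING = build_conn_string("postgres", "postgres", "localhost", 5433, "reviews")
print(COL_CONN_STRING)
```

In the notebook, `%sql $COL_CONN_STRING` would then switch the following `%%sql` cells to the columnar server.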

In [None]:
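%%sql
-- Assumes the cstore_fdw extension is loaded and a foreign server named cstore_server exists
CREATE FOREIGN TABLE customer_reviews_col (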
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
)
SERVER cstore_server
OPTIONS(compression 'pglz');

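2. After copying the two CSV files into the new container as in the row-storage section, connect to the database in the psql command line: `\c reviews`
3. Load the 1998 rows from the CSV file: `\copy customer_reviews_col FROM 'customer_reviews_1998.csv' WITH CSV;`
4. Load the 1999 rows from the CSV file: `\copy customer_reviews_col FROM 'customer_reviews_1999.csv' WITH CSV;`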

## Compare performance

First run the query on `customer_reviews_row`:

In [None]:
%%time
%%sql
SELECT product_title, avg(review_rating)
FROM customer_reviews_row
WHERE review_date >= '1995-01-01' 
    AND review_date <= '1998-12-31'
GROUP BY product_title
ORDER BY product_title
LIMIT 20;

Then run the same query on `customer_reviews_col`:

In [None]:
%%time
%%sql
SELECT product_title, avg(review_rating)
FROM customer_reviews_col
WHERE review_date >= '1995-01-01' 
    AND review_date <= '1998-12-31'
GROUP BY product_title
ORDER BY product_title
LIMIT 20;
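A single `%%time` measurement can be skewed by caching and other one-off effects, so it may help to repeat each query a few times and compare medians. The sketch below is one way to do that. The names `time_query` and `QUERY_TEMPLATE` are hypothetical, and `run` stands for any callable that executes a SQL string against the right server.

```python
import statistics
import time

# Same query as in the cells above, with the table name left as a placeholder
QUERY_TEMPLATE = """
SELECT product_title, avg(review_rating)
FROM {table}
WHERE review_date >= '1995-01-01'
    AND review_date <= '1998-12-31'
GROUP BY product_title
ORDER BY product_title
LIMIT 20;
"""


def time_query(run, sql, repeats=5):
    """Execute `sql` via the callable `run` `repeats` times; return the median wall time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(sql)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

Because the two tables live on different servers, each needs its own `run` callable, for example a small function that executes the string on a connection to port 5432 for `customer_reviews_row` and on one to port 5433 for `customer_reviews_col`.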

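## Conclusion

Comparing the wall times reported by `%%time`, the query on the columnar table is faster. The query reads only three of the twelve columns (`product_title`, `review_rating` and `review_date`), so the columnar layout can skip the other nine, whereas the row layout has to read every full row.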
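To make the intuition behind this result concrete, here is a toy model in plain Python. It is not how `cstore_fdw` is implemented; it only counts how many stored values each layout has to touch to answer the query. The rows are made-up illustrative data and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

COLUMNS = ["customer_id", "review_date", "review_rating", "review_votes",
           "review_helpful_votes", "product_id", "product_title",
           "product_sales_rank", "product_group", "product_category",
           "product_subcategory", "similar_product_ids"]
NEEDED = ("product_title", "review_rating", "review_date")


def make_rows(n):
    """Build n made-up review rows (illustrative values only)."""
    return [(f"c{i}", date(1995 + i % 5, 1, 1), 1 + i % 5, 0, 0, f"p{i % 3}",
             f"title {i % 3}", i, "Book", "cat", "sub", []) for i in range(n)]


def averages(triples):
    """Average rating per title for reviews dated 1995-01-01 through 1998-12-31."""
    sums = defaultdict(lambda: [0, 0])
    for title, rating, day in triples:
        if date(1995, 1, 1) <= day <= date(1998, 12, 31):
            sums[title][0] += rating
            sums[title][1] += 1
    return {title: total / count for title, (total, count) in sorted(sums.items())}


def query_row_store(rows):
    """Row layout: each row is fetched whole, so all of its values count as read."""
    values_read = sum(len(row) for row in rows)
    idx = [COLUMNS.index(name) for name in NEEDED]
    return averages(tuple(row[i] for i in idx) for row in rows), values_read


def query_col_store(columns):
    """Column layout: only the columns the query names are read."""
    needed = [columns[name] for name in NEEDED]
    values_read = sum(len(col) for col in needed)
    return averages(zip(*needed)), values_read


rows = make_rows(1000)
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}
row_result, row_reads = query_row_store(rows)
col_result, col_reads = query_col_store(columns)
print(row_reads, col_reads)  # 12000 3000
```

Both layouts return the same averages, but the columnar one touches a quarter of the values. Real columnar stores add compression on top of this, which this sketch does not model.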