# Greenplum Database Concepts Explained (Part 4)

This is Part 4 of Greenplum Database Concepts Explained: ***Table Storage Models***.

- If you missed Part 1 (*Setup, Describe Input Dataset & Data Loading*) or want to revisit it, click [here](AWS-GP-demo-1.ipynb).
- If you missed Part 2 (*Basic Table Functions*) or want to revisit it, click [here](AWS-GP-demo-2.ipynb).
- If you missed Part 3 (*MPP Fundamentals and Partitioning*) or want to revisit it, click [here](AWS-GP-demo-3.ipynb).

In [1]:
import os, re
from IPython.display import display_html

import pygments.lexers
from pygments import highlight
from pygments.formatters import HtmlFormatter

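# Connection string, expected as postgresql://user:password@host:port/dbname
CONNECTION_STRING = os.getenv('AWSGPDBCONN')

# Raw string avoids invalid-escape warnings for \S; '/' needs no escaping in Python
cs = re.match(r'^postgresql://(\S+):(\S+)@(\S+):(\S+)/(\S+)$', CONNECTION_STRING)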

DB_USER   = cs.group(1)
DB_PWD    = cs.group(2)
DB_SERVER = cs.group(3)
DB_PORT   = cs.group(4)
DB_NAME   = cs.group(5)

%reload_ext sql
%sql $CONNECTION_STRING

'Connected: gpadmin@gpadmin'
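
The regular expression above works when the connection string has exactly the expected shape. As an alternative sketch (not what this notebook uses), the standard library's `urllib.parse.urlsplit` can split the same URL and fail with a clearer message when the string is malformed. The function name `parse_connection_string` and the sample URL are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


def parse_connection_string(conn):
    """Split a postgresql:// URL into user, password, host, port and dbname."""
    parts = urlsplit(conn)
    if parts.scheme != "postgresql" or not parts.hostname:
        raise ValueError("expected postgresql://user:password@host:port/dbname")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,                  # int, or None if omitted
        "dbname": parts.path.lstrip("/"),
    }


# Hypothetical connection string, for illustration only.
example = parse_connection_string("postgresql://gpadmin:secret@10.0.0.5:5432/gpadmin")
print(example["user"], example["host"], example["port"], example["dbname"])
```

Unlike the regex groups, `port` comes back as an integer here, so convert it with `str()` before interpolating it into a shell command.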

In [None]:
%%sql $DB_USER@$DB_SERVER
SHOW gp_autostats_mode;
ALTER DATABASE gpadmin SET gp_autostats_mode TO 'NONE';
SHOW gp_autostats_mode;

In [2]:
query = !cat script/7-db-maintenance.sql
%sql $DB_USER@$DB_SERVER {''.join(query)}

Done.
Done.
Done.
1 rows affected.
Done.


[]

## 7. Table Storage Models

### 7.1. Comparing Greenplum Table Storage Models: Loading

Re-create the Amazon Reviews table using three different table storage models, as shown below:

- a heap table;
- an append-optimized (AO), row-oriented table with zlib (level 3) compression;
- an append-optimized (AO), column-oriented table with zlib (level 3) compression.

In [3]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-heap.sql
sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-ao-ro-zlib3.sql
sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-ao-co-zlib3.sql

display_html('\n'.join(sqlfilecode1), raw=True)
display_html('\n'.join(sqlfilecode2), raw=True)
display_html('\n'.join(sqlfilecode3), raw=True)

query1 = !cat script/7-1-amzn-reviews-heap.sql
query2 = !cat script/7-1-amzn-reviews-ao-ro-zlib3.sql
query3 = !cat script/7-1-amzn-reviews-ao-co-zlib3.sql

%sql $DB_USER@$DB_SERVER {''.join(query1)}
%sql $DB_USER@$DB_SERVER {''.join(query2)}
%sql $DB_USER@$DB_SERVER {''.join(query3)}

Done.
Done.
Done.
Done.
Done.
Done.


[]
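
As a rough sketch of how the three storage models differ at the DDL level, the snippet below builds a `CREATE TABLE` statement for each one. The storage parameters (`appendonly`, `orientation`, `compresstype`, `compresslevel`) are Greenplum table storage options, but whether the `script/7-1-*.sql` files use exactly these settings is an assumption. The column list and distribution key are hypothetical placeholders, not the real Amazon Reviews schema.

```python
# Storage options per model. Which exact options the 7-1-*.sql scripts
# set is an assumption; a heap table needs no WITH clause at all.
STORAGE_OPTIONS = {
    "heap": None,
    "ao_ro_zlib3": "appendonly=true, orientation=row, compresstype=zlib, compresslevel=3",
    "ao_co_zlib3": "appendonly=true, orientation=column, compresstype=zlib, compresslevel=3",
}


def create_table_sql(model, columns="review_id text, star_rating int", distkey="review_id"):
    """Return a CREATE TABLE statement for one storage model.

    `columns` and `distkey` are hypothetical placeholders.
    """
    options = STORAGE_OPTIONS[model]
    with_clause = f"\nWITH ({options})" if options else ""
    return (
        f"CREATE TABLE demo.amzn_reviews_{model} ({columns})"
        f"{with_clause}\nDISTRIBUTED BY ({distkey});"
    )


for model in STORAGE_OPTIONS:
    print(create_table_sql(model), end="\n\n")
```

Only the `WITH (...)` clause changes between the three statements; the column definitions and distribution key stay the same, which is what makes the later comparisons like-for-like.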

Load the input dataset into each table using the `gpload` utility, and compare the loading times.

In [5]:
!scp -i ~/.ssh/aws-gp.pem script/7-1-gpload-amzn-reviews-heap.yaml $DB_USER@$DB_SERVER:gpload-amzn-reviews-heap.yaml
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER gpload -d $DB_USER -f ./gpload-amzn-reviews-heap.yaml 2>&1 \
    | tee ./gpload-amzn-reviews-heap.log

7-1-gpload-amzn-reviews-heap.yaml             100%  377    59.4KB/s   00:00    
2019-09-23 16:15:13|INFO|gpload session started 2019-09-23 16:15:13
2019-09-23 16:15:13|INFO|no host supplied, defaulting to localhost
2019-09-23 16:15:13|INFO|started gpfdist -p 8000 -P 9000 -f "/var/tmp_s3_data/amazon_reviews_us*.tsv.gz" -t 30 -m 1000000
2019-09-23 16:15:13|INFO|did not find an external table to reuse. creating ext_gpload_reusable_f0f53be6_de14_11e9_a24a_067c34561bce
2019-09-23 16:21:51|WARN|3714 bad rows
2019-09-23 16:21:51|WARN|Please use following query to access the detailed error
2019-09-23 16:21:51|WARN|select * from gp_read_error_log('ext_gpload_reusable_f0f53be6_de14_11e9_a24a_067c34561bce') where cmdtime > to_timestamp('1569251713.48')
2019-09-23 16:21:51|INFO|running time: 398.14 seconds
2019-09-23 16:21:51|INFO|rows Inserted          = 103145273
2019-09-23 16:21:51|INFO|rows Updated           = 0
2019-09-23 16:21:51|INFO|data formatting errors = 3714


In [6]:
!scp -i ~/.ssh/aws-gp.pem script/7-1-gpload-amzn-reviews-ao-ro-zlib3.yaml $DB_USER@$DB_SERVER:gpload-amzn-reviews-ao-ro-zlib3.yaml
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER gpload -d $DB_USER -f ./gpload-amzn-reviews-ao-ro-zlib3.yaml 2>&1 \
    | tee ./gpload-amzn-reviews-ao-ro-zlib3.log

7-1-gpload-amzn-reviews-ao-ro-zlib3.yaml      100%  383    83.3KB/s   00:00    
2019-09-23 16:21:52|INFO|gpload session started 2019-09-23 16:21:52
2019-09-23 16:21:52|INFO|no host supplied, defaulting to localhost
2019-09-23 16:21:52|INFO|started gpfdist -p 8000 -P 9000 -f "/var/tmp_s3_data/amazon_reviews_us*.tsv.gz" -t 30 -m 1000000
2019-09-23 16:21:52|INFO|did not find an external table to reuse. creating ext_gpload_reusable_dee8f694_de15_11e9_8ccb_067c34561bce
2019-09-23 16:27:58|WARN|3714 bad rows
2019-09-23 16:27:58|WARN|Please use following query to access the detailed error
2019-09-23 16:27:58|WARN|select * from gp_read_error_log('ext_gpload_reusable_dee8f694_de15_11e9_8ccb_067c34561bce') where cmdtime > to_timestamp('1569252112.7')
2019-09-23 16:27:58|INFO|running time: 365.44 seconds
2019-09-23 16:27:58|INFO|rows Inserted          = 103145273
2019-09-23 16:27:58|INFO|rows Updated           = 0
2019-09-23 16:27:58|INFO|data formatting errors = 3714


In [7]:
!scp -i ~/.ssh/aws-gp.pem script/7-1-gpload-amzn-reviews-ao-co-zlib3.yaml $DB_USER@$DB_SERVER:gpload-amzn-reviews-ao-co-zlib3.yaml
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER gpload -d $DB_USER -f ./gpload-amzn-reviews-ao-co-zlib3.yaml 2>&1 \
    | tee ./gpload-amzn-reviews-ao-co-zlib3.log

7-1-gpload-amzn-reviews-ao-co-zlib3.yaml      100%  384    75.8KB/s   00:00    
2019-09-23 16:27:59|INFO|gpload session started 2019-09-23 16:27:59
2019-09-23 16:27:59|INFO|no host supplied, defaulting to localhost
2019-09-23 16:27:59|INFO|started gpfdist -p 8000 -P 9000 -f "/var/tmp_s3_data/amazon_reviews_us*.tsv.gz" -t 30 -m 1000000
2019-09-23 16:27:59|INFO|did not find an external table to reuse. creating ext_gpload_reusable_b96b82e6_de16_11e9_a845_067c34561bce
2019-09-23 16:34:06|WARN|3714 bad rows
2019-09-23 16:34:06|WARN|Please use following query to access the detailed error
2019-09-23 16:34:06|WARN|select * from gp_read_error_log('ext_gpload_reusable_b96b82e6_de16_11e9_a845_067c34561bce') where cmdtime > to_timestamp('1569252479.3')
2019-09-23 16:34:06|INFO|running time: 367.38 seconds
2019-09-23 16:34:06|INFO|rows Inserted          = 103145273
2019-09-23 16:34:06|INFO|rows Updated           = 0
2019-09-23 16:34:06|INFO|data formatting errors = 3714


In [8]:
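# Remote command: pull the "running time" line from each gpload log and
# print "<file> finished in <seconds>".
cmd = (
    "grep -e 'running' /home/gpadmin/gpload-amzn-reviews* "
    "| awk 'BEGIN{FS=\":\"} {print $1, \"finished in\", $5}'"
)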
grep_output = !ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER $cmd | pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(grep_output), raw=True)
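
The same comparison can be done in Python directly from the `running time` lines that `gpload` printed above. This is a minimal sketch: the three log lines are copied from the output of the previous cells, while the helper name `running_time_seconds` and the short model labels are made up for this example.

```python
import re

# Matches the summary line gpload prints at the end of a load.
RUNNING_TIME = re.compile(r"running time: ([\d.]+) seconds")


def running_time_seconds(log_text):
    """Return the gpload running time in seconds, or None if not found."""
    match = RUNNING_TIME.search(log_text)
    return float(match.group(1)) if match else None


# Log lines copied from the three gpload runs above.
logs = {
    "heap": "2019-09-23 16:21:51|INFO|running time: 398.14 seconds",
    "ao_ro_zlib3": "2019-09-23 16:27:58|INFO|running time: 365.44 seconds",
    "ao_co_zlib3": "2019-09-23 16:34:06|INFO|running time: 367.38 seconds",
}

timings = {name: running_time_seconds(line) for name, line in logs.items()}
baseline = timings["heap"]

# Fastest load first, with each time also shown relative to the heap load.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {seconds:8.2f} s  ({seconds / baseline:.2f}x heap)")
```

In this single run, both append-optimized tables loaded roughly 8% faster than the heap table. One run per table is not a benchmark, so treat the difference as indicative only.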

### 7.2. Comparing Greenplum Table Storage Models: Table Size and Disk Space Usage

In [9]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/7-2-table-size-comparison.sql
display_html('\n'.join(sqlfilecode), raw=True)
query = !cat script/7-2-table-size-comparison.sql
%sql $DB_USER@$DB_SERVER {''.join(query)}

5 rows affected.


| schema | relation | tablesize | toastsize | othersize | tabledisksize | indexsize | uncompressedsize | compressionpercentage |
|---|---|---|---|---|---|---|---|---|
| demo | amzn_reviews_ao_co_zlib3 | 24 GB | 800 kB | 1600 kB | 24 GB | 0 bytes | 55 GB | 56.14 |
| demo | amzn_reviews_ao_ro_zlib3 | 28 GB | 195 MB | 1600 kB | 28 GB | 0 bytes | 57 GB | 50.57 |
| demo | amzn_reviews_by_marketplace | 59 GB | 196 MB | 0 bytes | 59 GB | 0 bytes | 59 GB | 0.0 |
| demo | amzn_reviews_heap | 59 GB | 195 MB | 0 bytes | 60 GB | 0 bytes | 60 GB | 0.0 |
| demo | calendar | 768 kB | 0 bytes | 0 bytes | 768 kB | 0 bytes | 768 kB | 0.0 |
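
The `compressionpercentage` column can be sanity-checked against the two size columns next to it. The sketch below assumes the percentage is `100 * (1 - tabledisksize / uncompressedsize)`; the query in `script/7-2-table-size-comparison.sql` may compute it differently, and it works on exact byte counts rather than the rounded GB figures used here, so only approximate agreement is expected.

```python
def compression_percentage(disk_size, uncompressed_size):
    """Percentage of space saved; assumed formula, not taken from the script."""
    if uncompressed_size <= 0:
        raise ValueError("uncompressed_size must be positive")
    return round(100.0 * (1.0 - disk_size / uncompressed_size), 2)


# (relation, tabledisksize in GB, uncompressedsize in GB, reported percentage),
# all copied from the query result above.
rows = [
    ("amzn_reviews_ao_co_zlib3", 24, 55, 56.14),
    ("amzn_reviews_ao_ro_zlib3", 28, 57, 50.57),
    ("amzn_reviews_heap", 60, 60, 0.0),
]

for relation, disk_gb, raw_gb, reported in rows:
    estimate = compression_percentage(disk_gb, raw_gb)
    print(f"{relation:26s} estimated {estimate:6.2f}%   reported {reported:6.2f}%")
```

The estimates land within half a percentage point of the reported values, which is consistent with the assumed formula given the rounding of the displayed sizes.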


### 7.3. Comparing Greenplum Table Storage Models: Query Performance

#### 7.3.1. Narrow (*Few columns of the table*) `SELECT`

In [10]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-narrow-select-heap.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-narrow-select-heap.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-narrow-select-ao-ro.sql
display_html('\n'.join(sqlfilecode2), raw=True)
cmd2 = !echo $(cat script/7-3-narrow-select-ao-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd2), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-narrow-select-ao-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-narrow-select-ao-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)
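
Each timing cell in this section pipes the `psql` output through `grep -e 'Total runtime'`. If the number is needed in Python, for example to compare the three storage models side by side, a small parser is enough. This is a sketch under the assumption that the output contains a line of the form `Total runtime: <number> ms`; the sample text and the helper name `total_runtime_ms` are made up for illustration and are not measured results.

```python
import re

# Assumed shape of the summary line selected by the grep above.
TOTAL_RUNTIME = re.compile(r"Total runtime:\s*([\d.]+)\s*ms")


def total_runtime_ms(psql_output):
    """Return the reported total runtime in milliseconds, or None if absent."""
    match = TOTAL_RUNTIME.search(psql_output)
    return float(match.group(1)) if match else None


# Made-up sample output, for illustration only (not a measured result).
sample = """
 Slice statistics:
 Total runtime: 1234.567 ms
"""

# The three script variants used throughout section 7.3.
for model in ("heap", "ao-ro", "ao-co"):
    print(f"script/7-3-narrow-select-{model}.sql")

print(total_runtime_ms(sample))
```

Returning `None` when no match is found keeps a missing or differently worded summary line from being mistaken for a zero runtime.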

#### 7.3.2. Super Narrow (*1 column of the table*) `SELECT`

In [11]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-heap.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-super-narrow-select-heap.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-ao-ro.sql
display_html('\n'.join(sqlfilecode2), raw=True)
cmd2 = !echo $(cat script/7-3-super-narrow-select-ao-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd2), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-ao-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-super-narrow-select-ao-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.3. Wide (*Most/Many columns of the table*) `SELECT`

In [12]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-wide-select-heap.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-wide-select-heap.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-wide-select-ao-ro.sql
display_html('\n'.join(sqlfilecode2), raw=True)
cmd2 = !echo $(cat script/7-3-wide-select-ao-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd2), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-wide-select-ao-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-wide-select-ao-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.4. Aggregate/Window Functions over a limited number of columns

In [13]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-heap.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-aggr-select-heap.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-ao-ro.sql
display_html('\n'.join(sqlfilecode2), raw=True)
cmd2 = !echo $(cat script/7-3-aggr-select-ao-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd2), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-ao-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-aggr-select-ao-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Total runtime') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)