# Greenplum Database Concepts Explained (Part 4)

This is Part 4 of Greenplum Database Concepts Explained, ***Table Storage Models***.

- If you missed Part 1 (*Setup, Describe Input Dataset & Data Loading*) or want to revisit it, click [here](AWS-GP-demo-1.ipynb).
- If you missed Part 2 (*Basic Table Functions*) or want to revisit it, click [here](AWS-GP-demo-2.ipynb).
- If you missed Part 3 (*MPP Fundamentals and Partitioning*) or want to revisit it, click [here](AWS-GP-demo-3.ipynb).

In [1]:
import os, re
from IPython.display import display_html

import pygments.lexers
from pygments import highlight
from pygments.formatters import HtmlFormatter

CONNECTION_STRING = os.getenv('AWSGPDBCONN')

# Raw string avoids invalid-escape warnings; '/' needs no escaping in Python regexes.
cs = re.match(r'^postgresql://(\S+):(\S+)@(\S+):(\S+)/(\S+)$', CONNECTION_STRING)

DB_USER   = cs.group(1)
DB_PWD    = cs.group(2)
DB_SERVER = cs.group(3)
DB_PORT   = cs.group(4)
DB_NAME   = cs.group(5)

%reload_ext sql
%sql $CONNECTION_STRING
%sql $DB_USER@$DB_NAME {"SELECT version();"}

1 rows affected.


version
"PostgreSQL 9.4.24 (Greenplum Database 6.12.0 build commit:4c176763c7619fb678ce38095e6b3e8fb9548186) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Oct 28 2020 19:42:15"


In [2]:
query = "SHOW gp_autostats_mode; \
ALTER DATABASE {} SET gp_autostats_mode TO 'NONE'; \
SHOW gp_autostats_mode;".format(DB_NAME)

%sql $DB_USER@$DB_NAME {''.join(query)}

1 rows affected.
Done.
1 rows affected.


gp_autostats_mode
none


In [3]:
query = !cat script/7-db-maintenance.sql
%sql $DB_USER@$DB_NAME {''.join(query)}

Done.
Done.


[]

## 7. Comparing Table Storage Models

Re-create the Amazon Reviews table using two different table storage models, row-oriented and column-oriented, as shown below:

In [4]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-ro.sql
sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-co.sql

display_html('\n'.join(sqlfilecode1), raw=True)
display_html('\n'.join(sqlfilecode3), raw=True)

query1 = !cat script/7-1-amzn-reviews-ro.sql
query3 = !cat script/7-1-amzn-reviews-co.sql

%sql $DB_USER@$DB_SERVER {''.join(query1)}
%sql $DB_USER@$DB_SERVER {''.join(query3)}

Done.
Done.
Done.
Done.


[]
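
The two scripts rendered by the cell above are not reproduced in this export. As a rough sketch of what such definitions could look like, the snippet below builds one plain (row-oriented) table and one append-optimized, column-oriented table from the layout of `demo.amzn_reviews`. The table name `demo.amzn_reviews_ro`, the helper `storage_ddl`, and the specific storage options are assumptions made for illustration; the actual scripts may differ.

```python
# Sketch only: the notebook's actual DDL lives in script/7-1-amzn-reviews-ro.sql
# and script/7-1-amzn-reviews-co.sql, which are not shown in this export.
SOURCE_TABLE = "demo.amzn_reviews"  # source table named in this notebook


def storage_ddl(table, storage_options=""):
    """Return DROP + CREATE statements that copy SOURCE_TABLE's column layout.

    storage_options is the body of the WITH (...) clause; an empty string
    leaves the clause out, so the table uses the server's default storage
    settings (a heap table unless those defaults were changed).
    """
    with_clause = " WITH ({})".format(storage_options) if storage_options else ""
    return (
        "DROP TABLE IF EXISTS {table};\n"
        "CREATE TABLE {table} (LIKE {source}){with_clause};"
    ).format(table=table, source=SOURCE_TABLE, with_clause=with_clause)


# Assumed table names and options (hypothetical, chosen for illustration).
ro_ddl = storage_ddl("demo.amzn_reviews_ro")
co_ddl = storage_ddl(
    "demo.amzn_reviews_co",
    "appendonly=true, orientation=column, compresstype=zlib, compresslevel=5",
)

print(ro_ddl)
print(co_ddl)
```

No `DISTRIBUTED` clause is given here, so the distribution policy is left to Greenplum; a real script would normally state it explicitly.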

### 7.1 Loading

#### 7.1.1 Loading from another source table

Load both tables from the source table `demo.amzn_reviews` (see [Notebook 1](AWS-GP-demo-1.ipynb)) and compare their load times.

In [5]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-load-amzn-reviews-ro.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-1-load-amzn-reviews-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode2 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-load-amzn-reviews-co.sql
display_html('\n'.join(sqlfilecode2), raw=True)
cmd2 = !echo $(cat script/7-1-load-amzn-reviews-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd2), raw=True)
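
The cells in this notebook time each statement by piping the script through `psql` and keeping only the `Execution time` line. A minimal Python sketch of the same idea follows, in case the numbers are needed for further arithmetic rather than only for display. The helper names `execution_time_ms` and `speedup` are hypothetical, and the sample text is made up for illustration, not taken from a real run.

```python
import re

# Matches lines such as "Execution time: 1234.567 ms", the text the cells grep for.
_EXEC_TIME = re.compile(r"Execution time:\s*([0-9]+(?:\.[0-9]+)?)\s*ms")


def execution_time_ms(psql_output):
    """Return the last 'Execution time' value in the text as a float, or None."""
    matches = _EXEC_TIME.findall(psql_output)
    return float(matches[-1]) if matches else None


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate ran compared with the baseline."""
    return baseline_ms / candidate_ms


# Illustrative input only; these timings are invented, not measured.
sample_ro = " Planning time: 1.000 ms\n Execution time: 3000.000 ms\n"
sample_co = " Planning time: 1.000 ms\n Execution time: 1500.000 ms\n"

ratio = speedup(execution_time_ms(sample_ro), execution_time_ms(sample_co))
print("column-oriented speedup (illustrative):", ratio)
```

In the notebook, the input text would be the captured output of the `psql` call instead of the sample strings.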

#### 7.1.2 Loading from a source file (Bulk Loading)

Drop and re-create both tables, bulk-load the input dataset into each with the `gpload` utility, and compare their load times.

In [6]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-ro.sql
sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-1-amzn-reviews-co.sql

display_html('\n'.join(sqlfilecode1), raw=True)
display_html('\n'.join(sqlfilecode3), raw=True)

query1 = !cat script/7-1-amzn-reviews-ro.sql
query3 = !cat script/7-1-amzn-reviews-co.sql

%sql $DB_USER@$DB_SERVER {''.join(query1)}
%sql $DB_USER@$DB_SERVER {''.join(query3)}

Done.
Done.
Done.
Done.


[]

In [7]:
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER 'if [ -f ./gpload-amzn-reviews-ro.log ]; then rm ./gpload-amzn-reviews-ro.log; fi'
!scp -i ~/.ssh/aws-gp.pem script/7-1-gpload-amzn-reviews-ro.yaml $DB_USER@$DB_SERVER:gpload-amzn-reviews-ro.yaml
cmd = "gpload -d {0} -f ./gpload-amzn-reviews-ro.yaml -l ./gpload-amzn-reviews-ro.log 2>&1".format(DB_NAME) 
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER $cmd

7-1-gpload-amzn-reviews-ro.yaml               100%  377    44.1KB/s   00:00    
2020-12-22 17:26:21|INFO|gpload session started 2020-12-22 17:26:21
2020-12-22 17:26:21|INFO|no host supplied, defaulting to localhost
2020-12-22 17:26:21|INFO|started gpfdist -p 8000 -P 9000 -f "/data1/tmp_s3_data/amazon_reviews_us*.tsv.gz" -t 30 -m 1000000
2020-12-22 17:26:21|INFO|did not find an external table to reuse. creating ext_gpload_reusable_cf0d228c_447a_11eb_9163_064760077968
2020-12-22 17:37:53|WARN|7622 bad rows
2020-12-22 17:37:53|WARN|Please use following query to access the detailed error
2020-12-22 17:37:53|WARN|select * from gp_read_error_log('ext_gpload_reusable_cf0d228c_447a_11eb_9163_064760077968') where cmdtime > to_timestamp('1608657981.54')
2020-12-22 17:37:53|INFO|running time: 691.68 seconds
2020-12-22 17:37:53|INFO|rows Inserted          = 150955707
2020-12-22 17:37:53|INFO|rows Updated           = 0
2020-12-22 17:37:53|INFO|data formatting errors = 7622


In [8]:
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER 'if [ -f ./gpload-amzn-reviews-co.log ]; then rm ./gpload-amzn-reviews-co.log; fi'
!scp -i ~/.ssh/aws-gp.pem script/7-1-gpload-amzn-reviews-co.yaml $DB_USER@$DB_SERVER:gpload-amzn-reviews-co.yaml
cmd = "gpload -d {0} -f ./gpload-amzn-reviews-co.yaml -l ./gpload-amzn-reviews-co.log 2>&1".format(DB_NAME) 
!ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER $cmd

7-1-gpload-amzn-reviews-co.yaml               100%  376    44.5KB/s   00:00    
2020-12-22 17:37:54|INFO|gpload session started 2020-12-22 17:37:54
2020-12-22 17:37:54|INFO|no host supplied, defaulting to localhost
2020-12-22 17:37:54|INFO|started gpfdist -p 8000 -P 9000 -f "/data1/tmp_s3_data/amazon_reviews_us*.tsv.gz" -t 30 -m 1000000
2020-12-22 17:37:54|INFO|did not find an external table to reuse. creating ext_gpload_reusable_6c3efade_447c_11eb_a46a_064760077968
2020-12-22 17:48:48|WARN|7622 bad rows
2020-12-22 17:48:48|WARN|Please use following query to access the detailed error
2020-12-22 17:48:48|WARN|select * from gp_read_error_log('ext_gpload_reusable_6c3efade_447c_11eb_a46a_064760077968') where cmdtime > to_timestamp('1608658674.76')
2020-12-22 17:48:48|INFO|running time: 653.57 seconds
2020-12-22 17:48:48|INFO|rows Inserted          = 150955707
2020-12-22 17:48:48|INFO|rows Updated           = 0
2020-12-22 17:48:48|INFO|data formatting errors = 7622


In [9]:
cmd = 'grep -e '"'"'running'"'"' /home/gpadmin/gpload-amzn-reviews*\
    | awk '"'"'BEGIN{FS=":"} {print $1, "finished in", $5}'"'"'' 
grep_output = !ssh -i ~/.ssh/aws-gp.pem $DB_USER@$DB_SERVER $cmd | pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(grep_output), raw=True)
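
The `grep`/`awk` pipeline above extracts only the running time from each log. As a small sketch, the same log lines can be parsed in Python to also pick up the row and error counts. The function name `summarize_gpload_log` is hypothetical; the sample lines are copied from the row-oriented load output shown earlier.

```python
import re

# gpload log lines have the shape "<timestamp>|<LEVEL>|<message>".
_LOG_LINE = re.compile(r"^(?P<ts>[^|]+)\|(?P<level>[A-Z]+)\|(?P<msg>.*)$")

# Message patterns of interest, keyed by the name used in the summary dict.
_FIELDS = {
    "seconds": (re.compile(r"running time:\s*([0-9.]+)\s*seconds"), float),
    "rows_inserted": (re.compile(r"rows Inserted\s*=\s*(\d+)"), int),
    "format_errors": (re.compile(r"data formatting errors\s*=\s*(\d+)"), int),
}


def summarize_gpload_log(text):
    """Collect running time, inserted rows and formatting errors from a gpload log."""
    summary = {}
    for line in text.splitlines():
        parsed = _LOG_LINE.match(line.strip())
        if not parsed:
            continue
        for name, (pattern, cast) in _FIELDS.items():
            found = pattern.search(parsed.group("msg"))
            if found:
                summary[name] = cast(found.group(1))
    return summary


sample_log = """\
2020-12-22 17:37:53|INFO|running time: 691.68 seconds
2020-12-22 17:37:53|INFO|rows Inserted          = 150955707
2020-12-22 17:37:53|INFO|rows Updated           = 0
2020-12-22 17:37:53|INFO|data formatting errors = 7622
"""

ro_summary = summarize_gpload_log(sample_log)
print(ro_summary)
print("rows per second:", ro_summary["rows_inserted"] / ro_summary["seconds"])
```

Running this on both log files gives a like-for-like comparison of the two loads, which in the runs above took 691.68 and 653.57 seconds for the same number of rows.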

### 7.2 Table Size and Disk Space Usage

In [10]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/7-2-table-size-comparison.sql
display_html('\n'.join(sqlfilecode), raw=True)
query = !cat script/7-2-table-size-comparison.sql
%sql $DB_USER@$DB_SERVER {''.join(query)}

27 rows affected.


| schema | relation | tablesize | toastsize | othersize | tabledisksize | indexsize | uncompressedsize | compressionpercentage |
|---|---|---|---|---|---|---|---|---|
| demo | amzn_reviews | 79 GB | 232 MB | 0 bytes | 80 GB | 0 bytes | 80 GB | 0.0 |
| demo | amzn_reviews_by_marketplace | 79 GB | 225 MB | 0 bytes | 80 GB | 0 bytes | 80 GB | 0.0 |
| demo | amzn_reviews_co | 32 GB | 1568 kB | 3136 kB | 32 GB | 0 bytes | 74 GB | 56.33 |
| demo | amzn_reviews_partitioned | 0 bytes | 1568 kB | 0 bytes | 1568 kB | 0 bytes | 1568 kB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year1995 | 1504 kB | 1568 kB | 0 bytes | 3072 kB | 0 bytes | 3072 kB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year1996 | 4800 kB | 1568 kB | 0 bytes | 6368 kB | 0 bytes | 6368 kB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year1997 | 31 MB | 1568 kB | 0 bytes | 33 MB | 0 bytes | 33 MB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year1998 | 119 MB | 1568 kB | 0 bytes | 121 MB | 0 bytes | 121 MB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year1999 | 306 MB | 1568 kB | 0 bytes | 308 MB | 0 bytes | 308 MB | 0.0 |
| demo | amzn_reviews_partitioned_1_prt_year2000 | 858 MB | 1888 kB | 0 bytes | 860 MB | 0 bytes | 860 MB | 0.0 |
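
The `compressionpercentage` column above reads as the share of space saved relative to the uncompressed size. The script that computes it is not shown here, so the formula below is an assumption, and the helper name `compression_percentage` is hypothetical. The sketch uses the rounded sizes printed in the output, which is why its result for `amzn_reviews_co` lands near, but not exactly on, the reported 56.33.

```python
def compression_percentage(stored_size, uncompressed_size):
    """Percentage of space saved by compression (assumed formula).

    Both sizes must be in the same unit. Returns 0.0 when the uncompressed
    size is not positive, to avoid dividing by zero.
    """
    if uncompressed_size <= 0:
        return 0.0
    return round((1.0 - stored_size / uncompressed_size) * 100.0, 2)


# Rounded sizes in GB taken from the output above.
co_saved = compression_percentage(32, 74)    # amzn_reviews_co
heap_saved = compression_percentage(80, 80)  # amzn_reviews (uncompressed)

print("amzn_reviews_co:", co_saved)
print("amzn_reviews:", heap_saved)
```

An exact match with the reported figure would need the byte-level sizes rather than the rounded values.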


### 7.3 Query Performance

#### 7.3.0 `ANALYZE` tables

In [11]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-analyze.sql
display_html('\n'.join(sqlfilecode), raw=True)
query = !cat script/7-3-analyze.sql
%sql $DB_USER@$DB_SERVER {''.join(query)}

Done.
Done.


[]

#### 7.3.1 Narrow (*Few columns of the table*) `SELECT`

In [12]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-narrow-select-ro.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-narrow-select-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-narrow-select-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-narrow-select-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)
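
To give some intuition for why the storage model can matter for narrow queries, here is a deliberately simplified toy model of how much data a full scan has to read. It ignores compression, block overhead, and caching, and the column names and average widths are hypothetical values picked for illustration, not measurements from this dataset.

```python
def bytes_scanned(row_count, column_widths, selected, orientation):
    """Estimate bytes read by a full scan under a simplified storage model.

    row-oriented:    every row is read whole, so all columns count.
    column-oriented: only the selected columns are read.
    """
    if orientation == "row":
        per_row = sum(column_widths.values())
    elif orientation == "column":
        per_row = sum(column_widths[name] for name in selected)
    else:
        raise ValueError("orientation must be 'row' or 'column'")
    return row_count * per_row


# Hypothetical average column widths in bytes (illustrative only).
widths = {"star_rating": 4, "review_date": 4, "review_body": 500}

rows = 1000
row_cost = bytes_scanned(rows, widths, ["star_rating"], "row")
col_cost = bytes_scanned(rows, widths, ["star_rating"], "column")

print("row-oriented bytes:", row_cost)
print("column-oriented bytes:", col_cost)
```

Under this model the gap narrows as more columns, or wider ones, are selected, which is the effect the wide `SELECT` and "long" data field cases in this section are meant to probe.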

#### 7.3.2.1 Super Narrow (*1 column of the table*) `SELECT`: "Short" Data Field example

In [13]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-ro.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-super-narrow-select-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-super-narrow-select-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.2.2 Super Narrow (*1 column of the table*) `SELECT`: "Long" Data Field example

In [14]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-ro-2.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-super-narrow-select-ro-2.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-super-narrow-select-co-2.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-super-narrow-select-co-2.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.3 Wide (*Most/Many columns of the table*) `SELECT`

In [15]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-wide-select-ro.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-wide-select-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-wide-select-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-wide-select-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.4.1 Aggregate/Window Functions over a limited number of columns

In [16]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-ro.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-aggr-select-ro.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-co.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-aggr-select-co.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)

#### 7.3.4.2 Aggregate/Window Functions over more columns

In [17]:
sqlfilecode1 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-ro-2.sql
display_html('\n'.join(sqlfilecode1), raw=True)
cmd1 = !echo $(cat script/7-3-aggr-select-ro-2.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd1), raw=True)

sqlfilecode3 = !pygmentize -f html -O full,style=colorful -l postgres script/7-3-aggr-select-co-2.sql
display_html('\n'.join(sqlfilecode3), raw=True)
cmd3 = !echo $(cat script/7-3-aggr-select-co-2.sql | \
               psql $CONNECTION_STRING | \
               grep -e 'Execution time') | \
    pygmentize -f html -O full,style=colorful -l postgres
display_html('\n'.join(cmd3), raw=True)