# Greenplum Database Concepts Explained - Part 2

This is Part 2 of Greenplum Database Concepts Explained, ***Basic Table Functions***.
- If you missed Part 1 (*Setup, Describe Input Dataset & Data Loading*) or want to revisit it, click [here](AWS-GP-demo-1.ipynb).

In [1]:
import os, re
from IPython.display import display_html

import pygments.lexers
from pygments import highlight
from pygments.formatters import HtmlFormatter

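# Connection string comes from the environment: postgresql://user:password@host:port/dbname
CONNECTION_STRING = os.getenv('AWSGPDBCONN')

# Raw string avoids invalid-escape warnings; '/' needs no escaping in a Python regex
cs = re.match(r'^postgresql://(\S+):(\S+)@(\S+):(\S+)/(\S+)$', CONNECTION_STRING)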

DB_USER   = cs.group(1)
DB_PWD    = cs.group(2)
DB_SERVER = cs.group(3)
DB_PORT   = cs.group(4)
DB_NAME   = cs.group(5)

%reload_ext sql
%sql $CONNECTION_STRING

'Connected: gpadmin@gpadmin'
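
As an alternative to the regular expression in the cell above, the same five components can be extracted with the standard library's `urllib.parse.urlsplit`. This is only a sketch: the helper name `parse_conn` and the example URL are hypothetical, and it assumes the connection string has the `postgresql://user:password@host:port/dbname` shape that the regex expects.

```python
from urllib.parse import urlsplit

def parse_conn(conn_str):
    """Split a postgresql:// URL into user, password, host, port and database name."""
    parts = urlsplit(conn_str)
    if parts.scheme != "postgresql":
        raise ValueError("expected a postgresql:// connection string")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,               # int, or None when no port is given
        "dbname": parts.path.lstrip("/"),
    }

# Hypothetical example value, not the notebook's real connection string
info = parse_conn("postgresql://gpadmin:secret@gp.example.com:5432/gpadmin")
```

Unlike the regex groups, which are all strings, `port` comes back as an integer here, and a malformed scheme raises an error instead of producing a `None` match.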

## 4. Basic Table Functions

### 4.1. DESCRIBE *demo.amzn_reviews* table using psql utility (`\d <table name>`)

In [2]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l psql script/4-1-psql-describe-amzn-reviews.sql

display_html('\n'.join(sqlfilecode), raw=True)

psql_out = !cat script/4-1-psql-describe-amzn-reviews.sql | psql -H $CONNECTION_STRING

display_html(''.join(psql_out), raw=True)

Column,Type,Collation,Nullable,Default
marketplace,text,,,
customer_id,bigint,,,
review_id,text,,,
product_id,text,,,
product_parent,bigint,,,
product_title,text,,,
product_category,text,,,
star_rating,integer,,,
helpful_votes,integer,,,
total_votes,integer,,,


### 4.2. DESCRIBE *demo.amzn_reviews* table using `information_schema` catalog table.

In [3]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-2-gp-describe-amzn-reviews.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-2-gp-describe-amzn-reviews.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

15 rows affected.


table_catalog,table_schema,table_name,column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length,character_octet_length,numeric_precision,numeric_precision_radix,numeric_scale,datetime_precision,interval_type,interval_precision,character_set_catalog,character_set_schema,character_set_name,collation_catalog,collation_schema,collation_name,domain_catalog,domain_schema,domain_name,udt_catalog,udt_schema,udt_name,scope_catalog,scope_schema,scope_name,maximum_cardinality,dtd_identifier,is_self_referencing,is_identity,identity_generation,identity_start,identity_increment,identity_maximum,identity_minimum,identity_cycle,is_generated,generation_expression,is_updatable
gpadmin,demo,amzn_reviews,product_parent,5,,YES,bigint,,,64.0,2.0,0.0,,,,,,,,,,,,,gpadmin,pg_catalog,int8,,,,,5,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,customer_id,2,,YES,bigint,,,64.0,2.0,0.0,,,,,,,,,,,,,gpadmin,pg_catalog,int8,,,,,2,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,total_votes,10,,YES,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,gpadmin,pg_catalog,int4,,,,,10,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,helpful_votes,9,,YES,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,gpadmin,pg_catalog,int4,,,,,9,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,star_rating,8,,YES,integer,,,32.0,2.0,0.0,,,,,,,,,,,,,gpadmin,pg_catalog,int4,,,,,8,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,review_body,14,,YES,text,,1073741824.0,,,,,,,,,,,,,,,,gpadmin,pg_catalog,text,,,,,14,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,review_headline,13,,YES,text,,1073741824.0,,,,,,,,,,,,,,,,gpadmin,pg_catalog,text,,,,,13,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,verified_purchase,12,,YES,text,,1073741824.0,,,,,,,,,,,,,,,,gpadmin,pg_catalog,text,,,,,12,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,vine,11,,YES,text,,1073741824.0,,,,,,,,,,,,,,,,gpadmin,pg_catalog,text,,,,,11,NO,NO,,,,,,,NEVER,,YES
gpadmin,demo,amzn_reviews,product_category,7,,YES,text,,1073741824.0,,,,,,,,,,,,,,,,gpadmin,pg_catalog,text,,,,,7,NO,NO,,,,,,,NEVER,,YES


### 4.3. Retrieve a sample of the _demo.amzn_reviews_ table data (10 rows).

In [4]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-3-select-sample-amzn-reviews.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-3-select-sample-amzn-reviews.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

10 rows affected.


marketplace,customer_id,review_id,product_id,product_parent,product_title,product_category,star_rating,helpful_votes,total_votes,vine,verified_purchase,review_headline,review_body,review_date
US,52988732,R25SXQ2VFIZQTQ,0935280006,683647337,When Hell Was in Session,Books,5,5,5,N,Y,"Duty, Honor, Country","With the passing of time, it is too easy to forget the sacrifices that members of the US military make for the residents of the United States and the free world. Of all the men most mistreated in the history of the US, the Prisoners of War in North Vietnam suffered most grievously, and for the longest period of time. Here Jeremiah Denton, a navy pilot at the time, and future US Senator, chronicles his captivity in North Vietnam. This is one of the best books written on the subject, as he never enhances the story, and reveals his faults and fears candidly. Rarely has a more modest and well grounded man been treated so brutally for so long with so much mental and emotional strength.<br /><br />Denton was launched in his A-6 from the USS Independence on his fateful flight while being observed by Robert McNamara who was there on a fact-finding junket. Needless to say most military members and especially pilots loathed and disrespected McNamara and his insane ideas about warfare, but did their best in spite of obstacles erected by both North Vietnam and Washington. For his trouble that day, Denton was singled out by his captors for extra torture as he was \""sent by McNamara personally\"", and was, like all captured American fliers, a \""war criminal.\""<br /><br />Throughout the book, Denton returns to the theme of self-discipline and the Code of Conduct. Never have men endured more torture and been more selfless and noble: this book gives a glimpse into what makes these men so great. Men like Denton, Leo Thorsness, Bud Day, Robbie Risner, Jim Stockdale, Everett Alvarez, and, yes, John McCain kept the faith under conditions far worse than anyone who wasn't there can imagine for the good of the United States and the American way of life. 
Free people everywhere should be eternally grateful to all these men.<br /><br />This book is harrowing and sad, but is also unexpectedly uplifting as it reveals the power of the human spirit to endure even when things appear darkest. Denton mentions several quotations that inspired him throughout his life, but my favorite is early in the book when he quotes an anonymous man who said \""The greatest heroes known are those that are afraid to go; but go.\"" Never have truer words been spoken. The book gives insight into how to survive physical, but more importantly, mental and psychological torture, and emphasizes the spiritual thinking required in a time of such duress. Sadly, after coming back to the US, Denton was confronted with changes in the fabric of society that saddened and disappointed him: the sixties ravaged our society while he was a POW, with an especially strong toll on families and youth. For this reason Denton has devoted himself to the cause of the American family with a greater vigor than anyone I can recall. I thought that his endurance as a POW would be the thing I admired most about Denton, but after reading this book it is clear that there is so much more to him.<br /><br />I cannot recommend this book more highly. Though it grimly reveals the inhumanity that human beings can show each other, it presents a calling for all of us to be better people every day. If men like Jeremiah Denton can make it through a Vietnamese POW camp, surely the trials most of us face on a daily basis will seem trivial indeed.<br /><br />Thank you, Senator Denton.",2008-11-23
US,45673339,R17835HIRR2872,B00TWHZ1AG,623396140,Ladies Denim Jean Hat Cap with Cream Black Flower,Apparel,4,0,0,N,Y,Runs large,"The flower is glued on so if you take it off you will have to add a pin or something to cover the marks left by the removal.Also, be prepared to take some tucks in the back because it is large. I have always worn off the rack hats but this needed adjustment. Otherwise it is a good buy: material is actually denim, the visor is shaped properlyand it looks nice on,",2015-03-25
US,34773214,RNL50PY0FAZRX,B004P8JXGA,584967539,30 Minute Plan,Digital_Ebook_Purchase,3,0,0,N,Y,"A real ""Zigg""uraut","This isn't a bd tale. It is set in a period after \""Zigs\"" or zombies as we know them have cut humanity back to a few small scientific outposts that are determined to figure out a way to deal with the undead menace. There is a nice twist to the tale as a few new elements are introduced. This would make a nice prolouge for a novel or novella. I really like the useof lingo, simplistic though it is, as it adds a nice touch. Give it a try.",2012-02-18
US,38201697,R2NROQASXNX1HT,B00EMXBDMA,803172158,The Martian: A Novel,Digital_Ebook_Purchase,5,0,0,N,Y,Awesome Adventure!,I was drawn in from the first line of the book. One of the best books I've read in awhile..,2015-01-26
US,31404730,R3HI7JIN2Y7CP6,B0049MY6B4,84068287,Anker® New Laptop Battery for Toshiba Satellite PA3534U-1BRS PA3534U-1BAS PA3727U-1BRS PA3535U-1BRS A200 A203 A205 A210 A300 A300D A305 A305D Series [Li-ion 6-cell 4400mAh/48WH],PC,5,1,1,N,Y,As good as the original!,"My computer is going to be 5 years old this year, and the original battery never lasted more than 1.5-2 hours, so I didn't expect this one to last any longer, which it doesn't. I get a solid 1.5 hours out of it each time. With this being $70 cheaper than buying name brand, it's definitely worth it (especially when the original was totally dead). Like others have said, it does fit a lot more snug into a Toshiba Satellite. A little too snug for mine. So snug that it pops out a little bit. That hasn't really been an issue...yet. I've only had the battery now for a few days, so if I have any issues I will update!",2013-06-30
US,42934173,R2EWDR7GZZBPG8,B00DZQE2Y6,477231720,Real Happy Family: A Novel,Digital_Ebook_Purchase,1,2,2,N,N,Real Unhappy Reader,"Real Happy Family marks my second venture into fiction trying to capitalize on the reality TV idea (the first one being the movie “EdTV”). I find the idea interesting and quite a twist to the otherwise insipid reality TV. EdTV wasn’t a hit for me, but I didn’t let that stop me from reading this book…<br /><br />Unfortunately, this book also wasn’t a hit for me, making it two failed tries in my attempt to find a good reality-TV-driven concept. The synopsis sounded interesting: a behind-the-scenes look involving a family who more or less have their feet in the door in terms of the entertainment industry. There are three main stories at play here: Lorelei’s story as the daughter who almost made it but became the laughingstock instead because of her mother, Colleen’s story as the mother who dotes on Lorelei a little too much and ends up being an embarrassment instead, and Robin’s story as the wife/sister-in-law/daughter-in-law who owns a talent agency, is technically Lorelei’s agent and sister-in-law, and is plagued with conception problems. It really sounds as if their stories should be connected together, but to me it felt like Robin’s story was the odd one out. Colleen and Lorelei’s stories work side by side because of their relationship, but Robin’s conundrums aren’t really worth the hassle other than the fact that she’s Lorelei’s sister-in-law and agent.<br /><br />The beginning of the book was also rather discombobulating; it jumped from one time period to another (quite unnecessarily, in my opinion) and it took a long while to establish our main characters. As the chapters jumped from one character focus to another, it wasn’t long before the whole task of keeping up with people became tedious.<br /><br />The ending was simply ho-hum to me, not really something that I was expecting. 
Then again, I wasn’t really expecting anything, considering how I got no impression of a common goal to be resolved among the characters. Just like reality TV, the whole thing droned on, spewed useless drama, and killed any attempt of establishing interest.",2014-05-01
US,21612288,R2TD5FE58CCI7S,B00CCHXF2E,575068519,Kissing Fire (Edge Series Book 3),Digital_Ebook_Purchase,5,0,0,N,Y,Fantastic,Preston and Avery were a great couple! Once again A.M. Hargrove wrote a fantastic book. I also enjoy the fact that you can read any books in her series as stand alone books or even better read the whole series. I can't wait for her next Romance book.,2014-02-08
US,14071923,RPIQMG02QUR6A,B0007PALCU,443257110,Lifehouse [Enhanced CD],Music,5,0,0,N,N,Great New Album!,"Lifehouse's new self titled album is great! After their first hit, \""Hanging By A Moment\"", Lifhuose hasn't stopped writing great music. This new CD is great! You'll be jamming and singing along to these awesome new tunes in no time!",2005-03-29
US,32624,RTSEM73GNW6WF,B004YDSNMQ,433528565,The Kennedys,Video DVD,5,1,1,N,Y,Five Stars,Great deal,2014-12-13
US,26236554,RAKPIDCF4MRBK,B00S9VX6DA,986545715,Romance: Quickies (Encounter 4) (Billionaire Romance),Digital_Ebook_Purchase,5,1,1,N,N,A painful past in Shae’s life has left her heartbroken,"A painful past in Shae’s life has left her heartbroken. Can you afford to let love in, no, or can she? There is the seductive millionaire Jack. Jack sets her on fire with the passion and hot sex. Could there more for them? You will just have to wait and read more. I am hooked on this series! I cannot wait to see what happens.",2015-01-22


### 4.4. Show *demo.amzn_reviews* table data distribution across segments:

In [5]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-4-data-distrib-amzn-reviews.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-4-data-distrib-amzn-reviews.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

24 rows affected.


gp_segment_id,count
0,4298868
1,4296740
2,4291976
3,4297200
4,4300819
5,4297083
6,4301096
7,4298972
8,4297364
9,4296533


### 4.5. *demo.amzn_reviews* Table Size and Disk Space Usage

#### 4.5.1. Using PostgreSQL System Administration Functions (PG 8.4)

In [6]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-5-1-object-size-and-disk-space.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-5-1-object-size-and-disk-space.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

2 rows affected.


schemaname,tablename,size,level
demo,amzn_reviews,59 GB,Disk space used by the table or index.
demo,amzn_reviews,60 GB,"Total disk space used by the table, including indexes and toasted data."


#### 4.5.2. Using the `gp_toolkit` Administrative Schema (Greenplum 5.x)

In [7]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-5-2-object-size-and-disk-space.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-5-2-object-size-and-disk-space.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

1 rows affected.


schema,relation,tablesize,toastsize,othersize,tabledisksize,indexsize
demo,amzn_reviews,59 GB,195 MB,0 bytes,60 GB,0 bytes


### 4.6. Check table for Data Skew
Data skew may be caused by uneven data distribution due to a poor choice of distribution key, or by single-tuple table insert or copy operations. Present at the table level, data skew is often the root cause of poor query performance and out-of-memory conditions. Skewed data affects not only scan (read) performance but also all other query execution operations, such as joins and GROUP BY operations.

It is very important to *validate* distributions to ensure that data is evenly distributed after the initial load. It is equally important to *continue* to validate distributions after incremental loads.

The following query shows the maximum and minimum number of rows per segment, along with the percentage difference between them:

In [8]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-6-1-data-skew.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-6-1-data-skew.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

1 rows affected.


Table Name,Max Seg Rows,Min Seg Rows,Percentage Difference Between Max & Min
demo.amzn_reviews,4301096,4291976,0.2120389779721261
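
The percentage in the last column can be reproduced by hand. The sketch below assumes the difference is taken relative to the largest segment, which is inferred from the numbers in the output rather than from the query text; the function name `pct_diff_max_min` is hypothetical.

```python
def pct_diff_max_min(rows_per_segment):
    """Percentage by which the smallest segment trails the largest one."""
    largest = max(rows_per_segment)
    smallest = min(rows_per_segment)
    return 100.0 * (largest - smallest) / largest

# Max and min segment row counts taken from the query output above
result = pct_diff_max_min([4301096, 4291976])
print(result)  # roughly 0.212, i.e. about a fifth of one percent
```

A spread this small means the busiest and least busy segments hold almost the same number of rows.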


The `gp_toolkit` schema has two views that you can use to check for skew.
- The `gp_toolkit.gp_skew_coefficients` view shows data distribution skew by calculating the coefficient of variation (CV) for the data stored on each segment. The `skccoeff` column shows the coefficient of variation (CV), which is calculated as the standard deviation divided by the average. It takes into account both the average and variability around the average of a data series. The lower the value, the better. Higher values indicate greater data skew.

In [9]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-6-2-data-skew.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-6-2-data-skew.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

1 rows affected.


schemaname,tablename,coefficient
demo,amzn_reviews,0.0529644564132474
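
To make the coefficient of variation concrete, here is a minimal sketch that computes standard deviation divided by mean over a list of per-segment row counts. It uses the population standard deviation for simplicity, and it only has the ten segment counts visible in section 4.4 to work with, so its result should not be expected to match the view's value: the view covers all 24 segments, and its exact definition (for example sample versus population standard deviation, or scaling to a percentage) is not shown in this notebook.

```python
from statistics import mean, pstdev

def coefficient_of_variation(rows_per_segment):
    """Standard deviation divided by the mean; 0 means perfectly even distribution."""
    return pstdev(rows_per_segment) / mean(rows_per_segment)

# The ten segment counts visible in section 4.4 (the full output has 24 rows)
visible_counts = [4298868, 4296740, 4291976, 4297200, 4300819,
                  4297083, 4301096, 4298972, 4297364, 4296533]
cv = coefficient_of_variation(visible_counts)
print(cv)
```

Because the standard deviation is divided by the mean, the result is unitless, so tables of very different sizes can be compared on the same scale.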


- The `gp_toolkit.gp_skew_idle_fractions` view shows data distribution skew by calculating the fraction of the system that is idle during a table scan, which is an indicator of uneven data distribution or query processing skew. The `siffraction` column holds this fraction. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.

In [10]:
sqlfilecode = !pygmentize -f html -O full,style=colorful -l postgres script/4-6-3-data-skew.sql

display_html('\n'.join(sqlfilecode), raw=True)

query = !cat script/4-6-3-data-skew.sql

%sql $DB_USER@$DB_SERVER {''.join(query)}

1 rows affected.


schemaname,tablename,fraction
demo,amzn_reviews,0.0007849840288769
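
One way to picture the idle fraction: during a scan, every segment has to wait for the one holding the most rows, so the unused share of capacity is the gap between the largest segment and the average, relative to the largest. The sketch below implements that reading. It is an assumption made for illustration, not the view's actual definition, and the name `idle_fraction` is hypothetical.

```python
from statistics import mean

def idle_fraction(rows_per_segment):
    """Share of scan capacity left idle while waiting for the largest segment."""
    busiest = max(rows_per_segment)
    return (busiest - mean(rows_per_segment)) / busiest

balanced = idle_fraction([100, 100, 100, 100])  # every segment finishes together
skewed = idle_fraction([50, 100])               # one segment holds twice the rows
print(balanced, skewed)
```

Under this reading, a perfectly balanced table yields 0, and the value grows toward 1 as more of the data piles up on a single segment.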


## Continue to Part 3 of Greenplum Database Concepts Explained: **[MPP Fundamentals and Partitioning](AWS-GP-demo-3.ipynb)**.