

# aq_udb

## Overview
The `aq_udb` command lets users interact with UDB (User Database) to efficiently perform more complex analyses and queries than `aq_pp` alone allows. UDB is a distributed, hash-based, in-memory database designed to process vast amounts of data in a computing cluster, where each node is responsible for a unique set of primary keys in the database. 

Once the database is created, `aq_udb` is used to clean and transform the data held in UDB. 
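
To make the idea of "each node is responsible for a unique set of primary keys" concrete, here is a minimal Python sketch. It is a conceptual model only, not Essentia code: the hash function (`zlib.crc32`), the node count and the helper name `node_for_key` are all assumptions chosen for illustration, and UDB's actual hashing scheme may differ.

```python
import zlib

def node_for_key(key: str, num_nodes: int) -> int:
    """Map a primary key to a node index with a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

# Hypothetical keys; "Music" appears twice and must land on the same node both times.
keys = ["Music", "Watches", "Toys", "Books", "Music"]
assignment = {key: node_for_key(key, 3) for key in keys}

for key, node in assignment.items():
    print(f"{key!r} -> node {node}")
```

Because the node is derived from the key alone, every record sharing a key ends up on the same node, which is what lets per-key processing run without moving data between nodes.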

**What is a primary key in UDB?**<br>
A primary key in UDB differs slightly from one in a relational database in the following ways:
* It does not have to be unique within the table.
* It is composed of only ONE column.

In UDB, the primary key is used to **associate multiple records that belong to one entity (for example a person, a period of time, or a product category)**. <br>
E.g. when analyzing customer data from an e-commerce website's access log, we can set customer_id as the primary key and analyze the records that belong to each customer.
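
The grouping role of a non-unique primary key can be sketched in a few lines of Python. The records and field names below are hypothetical, and the dictionary is only a stand-in for how UDB associates records with a key.

```python
from collections import defaultdict

# Hypothetical access-log records: (customer_id, page)
records = [
    (101, "/home"),
    (102, "/cart"),
    (101, "/checkout"),
]

by_customer = defaultdict(list)
for customer_id, page in records:
    # The key repeats across records; each record is filed under its key.
    by_customer[customer_id].append(page)

print(dict(by_customer))
```

Customer 101 appears twice in the input, and both of its records end up under the same key.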
### Components of UDB
* **Database:** the user database (database server), which contains one or more of the following components.
* **Table:** similar to a TABLE in MySQL. Its schema is composed of column names with their corresponding data types and attributes, similar to the column spec of the aq_commands.
* **Vector:** a single record of data. You can specify [attributes](http://auriq.com/documentation/source/reference/tables/index.html#attributes) on each column and stream data into it to calculate values.
* **Variable:** a variable that stores a single value.

**Note:** Keep in mind that the definitions of these components are immutable once created. Users can only change the data / values inside them with commands. 

## Data and Database setup

We'll be using the international marketplace data from the [amazon customer review dataset](https://s3.amazonaws.com/amazon-reviews-pds/readme.html), specifically customer reviews from Japan and the UK.

**Database**<br>
Here is some important information about the database we'll create.
* Database: `amazon`
* Table: `reviews`
> Column spec for table: `S:marketplace I:customer_id S:review_id S:product_id I:product_parent S:product_title S,pkey:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date`
* Primary key (column): `product_category`
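
To show how the column spec above reads, here is a small Python sketch that splits each entry into its type, attributes and column name. It is a deliberately simplified parser written for this notebook, and `parse_column_spec` is a hypothetical helper; the real spec grammar accepted by the aq tools may allow more than this handles.

```python
def parse_column_spec(spec: str):
    """Split entries such as 'S,pkey:product_category' into (type, attributes, name)."""
    columns = []
    for token in spec.split():
        type_and_attrs, name = token.split(":", 1)
        col_type, *attrs = type_and_attrs.split(",")
        columns.append((col_type, attrs, name))
    return columns

# A shortened version of the spec used in this notebook.
spec = "S:marketplace I:customer_id S,pkey:product_category I:star_rating"
columns = parse_column_spec(spec)

# The primary key is the column carrying the 'pkey' attribute.
pkey_columns = [name for _, attrs, name in columns if "pkey" in attrs]
print(pkey_columns)
```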

## Contents
In this notebook, we'll first go through the general steps of managing a database, then go over examples for each option of the `aq_udb` command.

### Manage UDB

#### [Prepare Database](#prep_db)


#### [Check Database State](#check_db)

#### [Clean Up Database](#clean_db)


### [aq_udb](#aq_udb_option)<br>

As an example notebook, this covers some of the options but not all. For the full list of available options, please refer to `man aq_udb`.

* [`-exp`](#exp) - export data from UDB.
* [`-top [Start:]Num`](#top) - limit the output to `Num` records from the top of the DB
* [`-last [Start:]Num`](#last) - same as above, but from the bottom of the DB
* [`-lim_rec Num`](#lim_rec) - output `Num` records
* [`-lim_key Num`](#lim_key) - output only `Num` unique keys and their associated records
* [`-key_rec Num`](#key_rec) - output `Num` records per unique key
* [`-sort`](#sort) - sort the data as it is exported from UDB, not the data within the database.
* [`-ord`](#ord) - sort keys in the DB, or records in a table, internally
* [`-shf`](#shf) - shuffle keys or records in the DB internally
* [`-cnt`](#cnt) - count unique primary keys in the DB
* [`-eval`](#eval) - same as `aq_pp`'s option
* [`-filt`](#filt) - same as `aq_pp`'s option

#### Advanced Options (Under Construction)

* [`-pp`](#pp) - allows users to define processing group(s) in which complex ETL steps can be defined for each table.
* [`-var`](#var) - assign value to predefined variable (global variable)
* [`-bvar`](#bvar) - same as `-var` but for local variable


<a id='prep_db'></a>
### Preparing Database
In this section we'll cover the steps to prepare UDB for first use.
These are:
1. selecting datastore and creating data category
2. taking a look at data, and getting column spec
3. creating database schema, based on the column spec
4. starting database server

Let's start with preparing the data we'll use.

**Prepare Data**

In [2]:
# select the datastore, which is an S3 bucket
ess select essentia-playground

# display the directory structures and files stored in the bucket
ess ls /tsv/ | head -n 10

# create a data category that picks only the Japan and UK reviews
ess category add amazon \
 '/tsv/amazon_reviews_multilingual_UK_v1_00.tsv.gz /tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz'
 
# get info about the category, print out only top 10 lines
ess summary amazon | head -n 10

 230M Nov 12 23:31    /tsv/amazon_reviews_multilingual_DE_v1_00.tsv.gz
  67M Nov 12 23:31    /tsv/amazon_reviews_multilingual_FR_v1_00.tsv.gz
  90M Nov 12 23:31    /tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz
 333M Nov 12 23:31    /tsv/amazon_reviews_multilingual_UK_v1_00.tsv.gz
 1.4G Nov 12 23:31    /tsv/amazon_reviews_multilingual_US_v1_00.tsv.gz
 618M Nov 12 23:31    /tsv/amazon_reviews_us_Apparel_v1_00.tsv.gz
 555M Nov 12 23:31    /tsv/amazon_reviews_us_Automotive_v1_00.tsv.gz
 340M Nov 12 23:31    /tsv/amazon_reviews_us_Baby_v1_00.tsv.gz
 871M Nov 12 23:31    /tsv/amazon_reviews_us_Beauty_v1_00.tsv.gz
 2.6G Nov 12 23:31    /tsv/amazon_reviews_us_Books_v1_00.tsv.gz
Name:        amazon
Pattern:     tsv/amazon_reviews_multilingual_JP_v1_00.tsv.gz /tsv/amazon_reviews_multilingual_UK_v1_00.tsv.gz
Exclude:     None
Date Format: auto
Date Regex:  
Archive:     
Delimiter:   Tab
# of files:  2
Total size:  423.5MB
File range:  1970-01-01 - 1970-01-01


Now we have the information we need about the data in order to define the database schema: the column spec. 
Let's go ahead and create the database and table.

<a id='db_creation'></a>
**Database Creation**<br>
We'll create 
* a database named `amazon` 
* a table `reviews` with 
    * schema - the column spec of the data from the data category.
    > When creating the schema, we need to specify the primary key column, much like in a SQL database. Here is the schema.
    `S:marketplace I:customer_id S:review_id S:product_id I:product_parent S:product_title S,pkey:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date`
    * Note the `S,pkey:product_category`: `pkey` specifies that this column is the primary key of the table.
    
We can use `ess create {}`, where `{}` is the kind of entity to create (database, table, vector or variable), followed by the name of the entity.

After that, we'll start the database server.

In [3]:
# delete database, schema and data if they already exist
ess server reset 
# creating database named amazon
ess create database amazon
# creating table named reviews, with the schema
ess create table reviews S:marketplace I:customer_id S:review_id S:product_id I:product_parent S:product_title S,pkey:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date
# creating vector for experiment
ess create vector users S,pkey:product_category I,+add:total_votes 
# create variable
ess create variable I:star_sum 
# start the db server
ess udbd start

ip-10-10-1-118: Starting udbd-10010.
ip-10-10-1-118: udbd-10010 (15578) started.


<a id='db_population'></a>
**Populate it with Data**<br>

Now we'll fill the database with the review dataset, using Essentia's `ess stream` and the `aq_pp` command. Note that the **`-imp` option is used to direct the output into the database**.

In [4]:
ess stream amazon "*" "*" 'aq_pp -f,+1,tsv,eok,qui - -d %cols -imp amazon:reviews' 

Using `aq_udb` with the [`-exp`](#exp) option (which exports data from the database), we can confirm that the data is in the database. 

In [4]:
ess exec "aq_udb -exp amazon:reviews -lim_rec 40"

"marketplace","customer_id","review_id","product_id","product_parent","product_title","product_category","star_rating","helpful_votes","total_votes","vine","verified_purchase","review_headline","review_body","review_date"
"UK",10349,"R2YVNBBMXD8KVJ","B00MWK7BWG",307651059,"My Favourite Faded Fantasy","Music",5,0,0,"N","Y","Five Stars","The best album ever!","2014-12-29"
"UK",13070,"R1P16QCZR7RHM","B00004WMYB",530484605,"The Marshall Mathers LP","Music",1,1,7,"N","Y","scratches n a crack","im very disappointed in amazon, theyre startin 2 sell used albums in the new section, i ordered this new and it came with scratches and a crack, track 19 couldnt even play all the way, smh, amazon has got 2 do better","2013-07-30"
"UK",17139,"R75U5MUIZ9T0D","B009O36EO0",269758980,"Heal","Music",5,3,4,"N","Y","MAGIC!!!","Euphoria is one of the reason why I bought this album, since the victory @ Eurovision Song Contest 2012. I'm Indonesian, but I watched the show, and love her performance.<br />My Fave 

"UK",26110,"REZH011WSY07T","B00JDB4PEY",162064352,"Xscape","Music",5,0,1,"N","Y","Five Stars","very nice","2014-12-06"
"UK",26466,"R3MB8B8WF83RRX","B006M4RN3U",239925844,"Some Nights","Music",5,0,0,"N","Y","I love it","This is a great cd for all Fun fans,I bought this for myself.I have played this 100 times are more,its greatttttttt","2013-10-25"
"UK",26531,"R3T3MP46QP1UFI","B00O3UBB1U",384373789,"Rock Or Bust","Music",2,1,11,"N","Y","Two Stars","let not as good as rest of cds","2014-12-07"
"UK",26531,"R4TZRC8XV28LH","B00NPZI1ZS",260957349,"The Endless River","Music",3,0,0,"N","Y","Three Stars","could be better not as good as i expected","2014-12-07"
"UK",26531,"RXN0JWT8P1UDT","B00BRBJSHC",968604149,"Wrote A Song For Everyone","Music",4,0,0,"N","Y","Four Stars","nice cd","2015-04-30"
"UK",26810,"RE41TPZJV43SZ","B000069RDQ",347274879,"THE RISING","Music",4,0,0,"N","Y","Very good album","This is amazing stuff.  While initially it took a while to fully enjoy and appreciate this, we've had

<a id='check_db'></a>
### Checking the database state

Here, we'll go over 3 useful commands for checking the state of the database server. For details and the syntax of each command, refer to its man page. 

`ess server summary` provides overall information about the database server and its status, such as the databases, tables and vectors and their schemas.

In [5]:
ess server summary

DATABASE : amazon (active)
   TABLE :reviews	S:marketplace I:customer_id S:review_id S:product_id I:product_parent S:product_title S,pkey:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date
  VECTOR : (none)
     VAR : (none)

ip-10-10-1-118: (+) udbd-10010 (15473) running.


`udbd status` reports whether the server is running.

In [6]:
ess exec "udbd status"

ip-10-10-1-118: (+) udbd-10010 (15473) running.


`aq_udb`'s `-inf` option provides information about a specific database running on the server.

In [7]:
aq_udb -inf amazon

"memx","strx","pkey","var","reviews"
1277118528,4914141,33,0,1969910
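
The `-inf` output is a two-line CSV, so it is easy to turn into named values for a quick check. The Python sketch below parses the exact output shown above. Reading `pkey` as the number of unique keys and `reviews` as the number of records in that table is an assumption on my part; see `man aq_udb` for the authoritative field meanings.

```python
import csv
import io

# Output copied from the `aq_udb -inf amazon` cell in this notebook.
inf_output = (
    '"memx","strx","pkey","var","reviews"\n'
    "1277118528,4914141,33,0,1969910\n"
)

# Pair each header name with the value in the same column.
row = next(csv.DictReader(io.StringIO(inf_output)))
info = {name: int(value) for name, value in row.items()}

for name, value in info.items():
    print(f"{name}: {value}")
```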


<a id='clean_db'></a>
### Clean Up Database
Now that we know how to create our UDB and server and get information about them, let's learn how to clean up afterwards, depending on the use case.


**1. Stop database server and delete schema**<br>
* `ess server reset`: use this when you're not planning to use the database and its schema again.

**2. Stop the database server, but preserve schema**<br>
* `(ess) udbd stop`: useful when you'd like to shut down your instance but come back and use the database again later. 
* To use it again, start the server with `ess udbd start` and repopulate it with data.

**3. Clear the data inside the database**<br>
* `aq_udb -clr`: empties the data from a table or a whole database. Use this before repopulating tables or the whole database. 
    * with `DatabaseName:TableName`, it empties the table's data but preserves the database and table schema
    * with a database name only, it empties the whole database; you can load data again without recreating the schema


In the following 3 cells, we'll demonstrate each command and check its result (with `ess server summary` for the first two).

In [39]:
# case 1, delete everything
ess server reset
ess server summary

ip-10-10-1-118: No running server detected.


In [43]:
# stop the database server, preserve schema
ess udbd stop
ess server summary

ip-10-10-1-118: Stopping udbd-10010 (5182).
ip-10-10-1-118: udbd-10010 stopped.
DATABASE : amazon (active)
   TABLE :reviews	S:marketplace I,pkey:customer_id S:review_id S:product_id I:product_parent S:product_title S:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date
  VECTOR : (none)
     VAR : (none)

2019-11-15 01:40:54 ip-10-10-1-118 ess[5291]: ***Error*** ip-10-10-1-118: (-) udbd-10010 not running.



: 1

In [28]:
# clean up the data only
aq_udb -clr amazon:reviews
# confirm the table is now empty: only the header row is exported
aq_udb -exp amazon:reviews 

"marketplace","customer_id","review_id","product_id","product_parent","product_title","product_category","star_rating","helpful_votes","total_votes","vine","verified_purchase","review_headline","review_body","review_date"


<a id='aq_udb_option'></a>
## aq_udb options

Now we'll take a look at each option of the `aq_udb` command. Before going through this section, go over the [preparing database](#prep_db) section and make sure the database is running and populated with data.

<a id='exp'></a>
### -exp

This option exports the data from the given `DatabaseName:TableName`. If only `DatabaseName` is given, it exports the primary keys of the database. <br>

This option is combined with many other options in order to process the data.
Let's take a look.

In [12]:
# exporting the data from table (-top 30 limits the output to 30 records)
aq_udb -exp amazon:reviews -top 30

"marketplace","customer_id","review_id","product_id","product_parent","product_title","product_category","star_rating","helpful_votes","total_votes","vine","verified_purchase","review_headline","review_body","review_date"
"UK",10349,"R2YVNBBMXD8KVJ","B00MWK7BWG",307651059,"My Favourite Faded Fantasy","Music",5,0,0,"N","Y","Five Stars","The best album ever!","2014-12-29"
"UK",10629,"R2K4BOL8MN1TTY","B006CHML4I",835010224,"Seiko 5 Men's Automatic Watch with Black Dial Analogue Display and Blue Fabric Strap SNK807K2","Watches",4,0,0,"N","Y","Great watch from casio.","What a great watch. Both watches and strap is in a great quality, and the prize is low. Especially compared to the price here in Denmark.","2013-10-24"
"UK",12136,"R3P40IEALROVCH","B00IIFCJX0",271687675,"Dexter Season 8","Digital_Video_Download",5,0,0,"N","Y","fantastic","love watching all the episodes of Dexter, when i first heard about this series i wasnt too sure about watching it. it took me a very long time to start and 

"UK",20583,"R28IKPKZMZZV52","B00A6HL704",438645704,"The Twilight Saga: The Complete Collection [DVD]","Video DVD",4,0,3,"N","Y","all good apart from postage","didn't turn up on time and paid extra for day of release. DVD fine just the timing was awful which ruined the experience.","2013-03-15"
"UK",20725,"R21DHG6AOGXIZ6","B00IABBXIO",777928797,"RATHER BE - CLEAN BANDIT","Music",5,0,1,"N","Y","Top tune","Bought this single as it was number 1 the day my daughter was born. Didn't like it at 1st but it's grown on me and now I love it. Great catchy tune.","2014-04-12"
"UK",20849,"R2Z32STUPPU8O4","B00FDPK2JQ",650376702,"Salute","Music",5,0,0,"N","Y","music","bought for son for christmas he loves  liitle mix just started listening to music after turning 12 yrs of age great for teenages","2014-01-07"
"UK",20849,"R2K985KNJXCYX0","B004B8NBQW",702526844,"Bright Lights","Music",5,0,0,"N","Y","amazing","bought for hubby for christmas great fan of ellie goulding love all of her music fast delivery i

In [12]:
# with only the DB name, -exp works on primary keys, so table columns such as star_rating are not available and this -eval fails
aq_udb -exp amazon -eval star_rating 'star_rating * 2'

Server(127.0.0.1:10010) error: processing rule: Column "star_rating" not found

aq_udb: Udb request invalid


: 34

In [11]:
# no limiting option is given, so every record in the table is exported; only the first rows of the output are kept here
aq_udb -exp amazon:reviews -c product_category star_rating

"product_category","star_rating"
"Music",5
"Music",1
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",4
"Music",5

In [39]:
# now outputting primary keys only by just DBName
aq_udb -exp amazon -c product_category -top 10

"product_category"
"Music"
"Watches"
"Digital_Video_Download"
"Toys"
"Digital_Ebook_Purchase"
"Books"
"Video DVD"
"Mobile_Apps"
"Wireless"
"Electronics"


In [47]:
# populate the users vector: export product_category and total_votes from the table, then import them into amazon:users
aq_udb -exp amazon:reviews -o,notitle - -c product_category total_votes | \
aq_pp -f - -d s:product_category i:total_votes -imp amazon:users

<a id='top'></a>
### -top

This option limits the output to the given number of records. With `Start:Num`, it outputs `Num` records starting from record number `Start`.
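
The `[Start:]Num` argument can be modelled as a list slice. The Python sketch below is an illustration only: `top` is a hypothetical helper, and the review IDs are the first seven records as they appear in the two outputs of this section. It assumes `Start` is 1-based, which is what those outputs suggest.

```python
def top(records, spec: str):
    """Emulate '-top [Start:]Num' on a list; Start is 1-based and defaults to 1."""
    start, _, num = spec.rpartition(":")
    first = int(start) - 1 if start else 0
    return records[first:first + int(num)]

# review_id values in table order, taken from the outputs in this section
review_ids = [
    "R2YVNBBMXD8KVJ", "R1P16QCZR7RHM", "R75U5MUIZ9T0D",
    "R3GVFV7NPBFAFJ", "R21DHG6AOGXIZ6", "R2Z32STUPPU8O4",
    "R2K985KNJXCYX0",
]

print(top(review_ids, "5"))    # first 5 records
print(top(review_ids, "3:5"))  # 5 records starting at record 3
```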

In [20]:
# output only the top 5 records
aq_udb -exp amazon:reviews -c marketplace review_id star_rating -top 5

"marketplace","review_id","star_rating"
"UK","R2YVNBBMXD8KVJ",5
"UK","R1P16QCZR7RHM",1
"UK","R75U5MUIZ9T0D",5
"UK","R3GVFV7NPBFAFJ",5
"UK","R21DHG6AOGXIZ6",5


In [94]:
# output 5 records starting from record 3 (records 3 through 7)
aq_udb -exp amazon:reviews -c marketplace review_id -top 3:5

"marketplace","review_id"
"UK","R75U5MUIZ9T0D"
"UK","R3GVFV7NPBFAFJ"
"UK","R21DHG6AOGXIZ6"
"UK","R2Z32STUPPU8O4"
"UK","R2K985KNJXCYX0"


<a id='last'></a>
### -last

Same as `-top`, but retrieves records from the bottom of the database.
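
Counting from the bottom can be modelled the same way. In this Python sketch, `last` is a hypothetical helper and `tail` holds the last seven review IDs as they appear in the two outputs of this section. The behaviour modelled here (`Start` counts from the bottom, and the records are still printed in table order) is inferred from those outputs rather than taken from the manual.

```python
def last(records, spec: str):
    """Emulate '-last [Start:]Num': Num records ending at the Start-th record from the bottom."""
    start, _, num = spec.rpartition(":")
    skip = int(start) - 1 if start else 0
    end = max(0, len(records) - skip)
    return records[max(0, end - int(num)):end]

# the last seven review_id values in table order, taken from the outputs in this section
tail = [
    "R2O80C6DILMYCN", "R30YISPMTGGOGF", "R1MNP9B6P9NZJX",
    "RLYB814UIM74F", "R27OUW6Q4KD4M", "R1EP2I6ARAO8V7",
    "RDB0HV9RADHQM",
]

print(last(tail, "5"))    # the final 5 records
print(last(tail, "3:5"))  # 5 records ending at the 3rd record from the bottom
```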

In [95]:
aq_udb -exp amazon:reviews -c marketplace review_id -last 5

"marketplace","review_id"
"UK","R1MNP9B6P9NZJX"
"UK","RLYB814UIM74F"
"JP","R27OUW6Q4KD4M"
"JP","R1EP2I6ARAO8V7"
"JP","RDB0HV9RADHQM"


In [97]:
# a range can also be specified: 5 records, ending at the 3rd record from the bottom
aq_udb -exp amazon:reviews -c marketplace review_id -last 3:5

"marketplace","review_id"
"UK","R2O80C6DILMYCN"
"UK","R30YISPMTGGOGF"
"UK","R1MNP9B6P9NZJX"
"UK","RLYB814UIM74F"
"JP","R27OUW6Q4KD4M"


<a id='lim_rec'></a>
### -lim_rec

Limits the export to approximately the given number of records.

In [8]:
aq_udb -exp amazon:reviews -c marketplace review_id -lim_rec 2

"marketplace","review_id"
"UK","R2YVNBBMXD8KVJ"
"UK","R1P16QCZR7RHM"


<a id='lim_key'></a>
### -lim_key

Limits the output to approximately the given number of unique keys, together with their associated records.
Let's limit the output to 3 product categories.
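
As a rough mental model, limiting by key keeps every record of the first few distinct keys and drops the rest. The Python sketch below uses hypothetical records and a hypothetical helper, `limit_keys`; it ignores the "approximately" caveat, so the real option may return a slightly different set.

```python
def limit_keys(records, num_keys: int):
    """Keep only records whose key is among the first `num_keys` distinct keys seen."""
    kept_keys = set()
    result = []
    for key, value in records:
        if key not in kept_keys:
            if len(kept_keys) == num_keys:
                continue  # a new key past the limit: skip all of its records
            kept_keys.add(key)
        result.append((key, value))
    return result

# Hypothetical (product_category, review_id) pairs
records = [
    ("Music", "r1"), ("Watches", "r2"), ("Music", "r3"),
    ("Toys", "r4"), ("Books", "r5"), ("Toys", "r6"),
]

limited = limit_keys(records, 3)
print(limited)
```

Here "Books" is the fourth distinct key, so its record is dropped while all records of the first three keys are kept.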

In [10]:
# output the records of 3 product categories only, then pipe them to aq_cnt to show which product categories are present.
aq_udb -exp amazon:reviews -c product_category review_id -lim_key 3 | \
aq_cnt -f,+1 - -d S:product_category S:review_id -kx - categories product_category

"product_category"
"Digital_Video_Download"
"Watches"
"Music"


<a id='key_rec'></a>
### -key_rec

Limits the output of `-exp` to a given number of records per unique key value.

This example outputs 5 records per product category.
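
A per-key limit can be pictured as a counter kept for each key. The Python sketch below uses hypothetical records and a hypothetical helper, `limit_records_per_key`; it models the idea only and says nothing about how UDB implements it.

```python
from collections import Counter

def limit_records_per_key(records, num: int):
    """Keep at most `num` records for each key, preserving the input order."""
    seen = Counter()
    result = []
    for key, value in records:
        if seen[key] < num:
            seen[key] += 1
            result.append((key, value))
    return result

# Hypothetical (product_category, review_id) pairs
records = [
    ("Music", "a"), ("Music", "b"), ("Music", "c"),
    ("Watches", "d"),
]

print(limit_records_per_key(records, 2))
```

The third "Music" record is dropped, while "Watches" keeps its single record because it never reaches the limit.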

In [100]:
aq_udb -exp amazon:reviews -c product_category review_id -key_rec 5

"product_category","review_id"
"Music","R2YVNBBMXD8KVJ"
"Music","R1P16QCZR7RHM"
"Music","R75U5MUIZ9T0D"
"Music","R3GVFV7NPBFAFJ"
"Music","R21DHG6AOGXIZ6"
"Watches","R2K4BOL8MN1TTY"
"Watches","R1ONBLBRBW5IH8"
"Watches","R3C0KKBKGI8DX2"
"Watches","R2FEAD5Z8YWCDA"
"Watches","R2ERSR2PE6L4H8"
"Digital_Video_Download","R3P40IEALROVCH"
"Digital_Video_Download","RCLXN2CQ9AZFQ"
"Digital_Video_Download","RFRQX0KZKLIVG"
"Digital_Video_Download","RZ4X6LS13SLD6"
"Digital_Video_Download","R1YBG92V65P5CV"
"Toys","R25XL1WWYRDLA9"
"Toys","RN6TFJLG0LK08"
"Toys","R253TMIYVMIAIR"
"Toys","R27DCVPHY22QWG"
"Toys","R3GW05VQC6HV3Y"
"Digital_Ebook_Purchase","RVTVB9YDXSFYH"
"Digital_Ebook_Purchase","R3DHVC6SGQS5JU"
"Digital_Ebook_Purchase","R335F97BGBKXY7"
"Digital_Ebook_Purchase","R2N8VYUJNFZ5VV"
"Digital_Ebook_Purchase","R1PCZWZL16D206"
"Books","R2F2I7T03D42TE"
"Books","R39A5MVUEY58YJ"
"Books","R309H0D9CVW4OW"
"Books","R3TAKVKZETW923"
"Books","R1FU9OH1KFV7YF"
"Video DVD","R3G5WIW7NNA1CS"
"Video DVD","R2KXGMVFS

<a id='sort'></a>
### -sort

You can sort the data by a given column as it is being exported. Note that the records in the database are not sorted.


In [12]:
# sorting based on star_rating column. feel free to try other columns as well.
# -c option is used to output the given columns only
aq_udb -exp amazon:reviews -sort star_rating -top 10 -c customer_id star_rating

"customer_id","star_rating"
13070,1
21420,1
30733,1
53266,1
64588,1
82379,1
89960,1
89960,1
1307923,1
1311067,1


In [13]:
# note that -sort does not change the order of data within the table, as this unsorted export shows
aq_udb -exp amazon:reviews -top 10 -c star_rating

"star_rating"
5
1
5
5
5
5
5
5
4
5


<a id='ord'></a>
### -ord

This option differs from the usual sorting you'd encounter in 2 ways.
* It sorts the data internally (changes the order of records within the database) and does not output the result.
* It sorts the records within each primary key group (group by the primary key column, then sort). 

Note that the **primary key is `product_category`**.
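
The difference between sorting on export and ordering in place can be sketched in Python. The table below is a hypothetical in-memory stand-in (primary key mapped to a list of `star_rating` values), and `order_within_keys` is an illustrative helper, not part of Essentia.

```python
# Hypothetical in-memory table: primary key -> star_rating values in stored order
table = {
    "Music": [5, 1, 4],
    "Watches": [3, 5, 2],
}

def order_within_keys(table, descending=False):
    """Like -ord: sort each key's records in place; nothing is returned or printed."""
    for ratings in table.values():
        ratings.sort(reverse=descending)

# Like -sort: produce a sorted copy for output and leave the stored order alone.
exported = sorted(rating for ratings in table.values() for rating in ratings)
print(exported)
print(table)  # stored order is unchanged at this point

order_within_keys(table)
print(table)  # each key's records are now sorted, and keys stay grouped
```

The sorted export mixes records from all keys, whereas the in-place ordering never moves a record out of its key group.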

In [51]:
# sorting the records based on star_rating column (within same product category)
aq_udb -ord amazon:reviews star_rating

# display the result, 20 records per category only, for clarity
aq_udb -exp amazon:reviews -c product_category star_rating -key_rec 20

"product_category","star_rating"
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video

"Software",1
"Software",1
"Software",1
"Software",1
"Software",1
"Software",1
"Software",2
"Software",2
"Software",2
"Software",2
"Software",2
"Software",2
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Pet Products",1
"Pet Products",1
"Pet Produ

In [52]:
# sort in descending order as well
aq_udb -ord,dec amazon:reviews star_rating

aq_udb -exp amazon:reviews -c product_category star_rating -key_rec 20

"product_category","star_rating"
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video

"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Pet Products",5
"Pet Products",5
"Pet Produ

<a id='shf'></a>
### -shf

This option shuffles the data internally, key by key, so it does not produce any output.
Let's check it out.


In [55]:
# First we will sort the data based on star_rating
aq_udb -ord amazon:reviews star_rating
# Output the first 20 records per category of sorted data
aq_udb -exp amazon:reviews -c product_category star_rating -key_rec 20

"product_category","star_rating"
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Music",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Watches",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video_Download",1
"Digital_Video

"Software",1
"Software",1
"Software",1
"Software",1
"Software",1
"Software",1
"Software",2
"Software",2
"Software",2
"Software",2
"Software",2
"Software",2
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Office Products",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",1
"Pet Products",1
"Pet Products",1
"Pet Produ

**Notice that the records are shuffled within each primary key group (product category)**.
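The per-key shuffle can be modelled in a few lines of plain Python. This is a hedged sketch of the semantics only: `shuffle_within_key`, the `seed` parameter and the sample rows are hypothetical and are not part of `aq_udb`.

```python
import random
from collections import defaultdict

def shuffle_within_key(records, key_col, seed=None):
    """Shuffle records inside each key group; groups themselves stay together."""
    rng = random.Random(seed)  # seed is only here to make the sketch repeatable
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_col]].append(rec)

    shuffled = []
    for key in groups:
        group = list(groups[key])  # copy so the input list is left untouched
        rng.shuffle(group)
        shuffled.extend(group)
    return shuffled

# Hypothetical sample rows, already sorted by star_rating within each key
rows = [
    {"product_category": "Music", "star_rating": 1},
    {"product_category": "Music", "star_rating": 3},
    {"product_category": "Music", "star_rating": 5},
    {"product_category": "Watches", "star_rating": 2},
    {"product_category": "Watches", "star_rating": 4},
]

out = shuffle_within_key(rows, "product_category", seed=0)
print([r["product_category"] for r in out])
# ['Music', 'Music', 'Music', 'Watches', 'Watches']
```

The category column comes out in the same grouped order, while the ratings inside each category may appear in any order.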

In [61]:
# Now apply the shuffle
aq_udb -shf amazon:reviews

# Output the first 20 records per category of the shuffled data
aq_udb -exp amazon:reviews -c product_category star_rating -key_rec 20

"product_category","star_rating"
"Music",5
"Music",4
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",4
"Music",3
"Music",5
"Music",4
"Music",3
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Music",5
"Watches",5
"Watches",2
"Watches",5
"Watches",4
"Watches",5
"Watches",4
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",4
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",5
"Watches",4
"Watches",5
"Watches",5
"Watches",5
"Digital_Video_Download",3
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",1
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",4
"Digital_Video_Download",2
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",4
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",2
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video_Download",5
"Digital_Video

"Software",5
"Software",4
"Software",4
"Software",2
"Software",5
"Software",1
"Software",5
"Software",5
"Software",5
"Software",5
"Software",5
"Software",4
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",4
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",3
"Office Products",5
"Office Products",5
"Office Products",4
"Office Products",5
"Office Products",5
"Office Products",1
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Office Products",5
"Lawn and Garden",4
"Lawn and Garden",4
"Lawn and Garden",5
"Lawn and Garden",1
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",3
"Lawn and Garden",4
"Lawn and Garden",4
"Lawn and Garden",2
"Lawn and Garden",5
"Lawn and Garden",5
"Lawn and Garden",1
"Lawn and Garden",2
"Lawn and Garden",5
"Lawn and Garden",1
"Lawn and Garden",1
"Lawn and Garden",2
"Lawn and Garden",3
"Pet Products",3
"Pet Products",4
"Pet Produ

<a id='cnt'></a>
### -cnt

Counts the number of primary keys in the database. If a table/vector name is also given, it outputs the number of rows as well.

In this example, the primary key is set to `product_category`, so the output will be the number of unique product categories in the dataset.
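As a rough mental model, the two counts correspond to the number of distinct key values and the total number of records. The sketch below is plain Python with hypothetical names and sample rows; it only mirrors the `"pkey"` / `"row"` fields of the output and says nothing about how UDB computes them.

```python
def count_keys_and_rows(records, key_col):
    """Return the number of distinct keys and the total number of rows."""
    distinct_keys = {rec[key_col] for rec in records}
    return {"pkey": len(distinct_keys), "row": len(records)}

# Hypothetical sample rows, not taken from the real dataset
rows = [
    {"product_category": "Music"},
    {"product_category": "Watches"},
    {"product_category": "Music"},
    {"product_category": "Software"},
]

print(count_keys_and_rows(rows, "product_category"))
# {'pkey': 3, 'row': 4}
```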

In [14]:
# Count the number of primary keys only
aq_udb -cnt amazon

"field","count"
"pkey",33


In [15]:
# Count primary keys and rows by also providing the table name
aq_udb -cnt amazon:reviews

"field","count"
"pkey",33
"row",1969910


<a id='eval'></a>
### -eval

Same as the `-eval` option of the `aq_pp` command, except that the destination must be either an existing column or a variable. <br>
The following are also available in `Expr`:
* `aq_emod` functions
* Builtin Variables

**Note**<br>
Remember that the change made to the data by this option is permanent, as this modifies the data within the database.
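The difference between modifying stored data and merely transforming output can be pictured with a small plain-Python model. The helper names `eval_column` and `export` and the sample table are hypothetical; the sketch assumes the expression is applied to every stored row.

```python
def eval_column(table, dest_col, expr):
    """Overwrite dest_col in every stored row with expr(row); the table is modified."""
    for row in table:
        row[dest_col] = expr(row)

def export(table, col, top):
    """Read the first `top` values of a column without changing the table."""
    return [row[col] for row in table[:top]]

# Hypothetical in-memory "table"
reviews = [{"star_rating": 5}, {"star_rating": 1}, {"star_rating": 4}]

eval_column(reviews, "star_rating", lambda row: row["star_rating"] * 2)
print(export(reviews, "star_rating", top=3))  # [10, 2, 8]
print(export(reviews, "star_rating", top=3))  # [10, 2, 8] (the change persisted)
```

Every later read sees the doubled values, which is why the original data has to be reloaded to undo the change.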

In the example below, we'll provide `'star_rating * 2'` as expression and store the result in `star_rating` column.

In [15]:
# Modify the data and output the result
aq_udb -exp amazon:reviews -eval star_rating 'star_rating * 2' -top 20 -c star_rating

"star_rating"
10
2
10
10
10
10
10
10
8
10
2
10
6
10
10
10
10
10
10
10


In [16]:
# Change made by -eval option for udb is persistent, unlike aq_pp (it changes the data within the database)
aq_udb -exp amazon:reviews -top 20 -c star_rating

"star_rating"
10
2
10
10
10
10
10
10
8
10
2
10
6
10
10
10
10
10
10
10


In [35]:
# now resetting the change made to the data, by emptying the data and refilling it with original dataset
aq_udb -clr amazon:reviews
ess stream amazon "*" "*" 'aq_pp -f,+1,tsv,eok,qui - -d %cols -imp amazon:reviews' 

<a id='filt'></a>
### -filt
Same as the `-filt` option of the `aq_pp` command. Note that this option does not modify the data inside the database, unlike `-eval`.
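In other words, a filter only decides which records are passed on to the output. A minimal plain-Python sketch of that idea follows; `export_filtered` and the sample rows are hypothetical and are not part of `aq_udb`.

```python
from itertools import islice

def export_filtered(table, predicate, cols, top):
    """Return up to `top` rows that satisfy `predicate`, projected onto `cols`."""
    matches = (row for row in table if predicate(row))
    return [tuple(row[c] for c in cols) for row in islice(matches, top)]

# Hypothetical sample rows, not taken from the real dataset
reviews = [
    {"marketplace": "JP", "star_rating": 5},
    {"marketplace": "UK", "star_rating": 4},
    {"marketplace": "JP", "star_rating": 1},
]

jp_only = export_filtered(
    reviews, lambda r: r["marketplace"] == "JP", ["marketplace", "star_rating"], top=20
)
print(jp_only)       # [('JP', 5), ('JP', 1)]
print(len(reviews))  # 3 (the stored table is unchanged)
```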


In [37]:
# Keep only the records whose marketplace is Japan ("JP")
aq_udb -exp amazon:reviews -filt 'marketplace == "JP"' -c marketplace review_headline -top 20

"marketplace","review_headline"
"JP","残念ながら…"
"JP","残念ながら…"
"JP","ドリームキャスト"
"JP","やっぱりマスト"
"JP","デビューからずっと凄い人"
"JP","Norma 海外版"
"JP","さいこー"
"JP","なんか聞いたことあるような"
"JP","ギターアルバム"
"JP","新生OPETH第２弾"
"JP","これとKIDAは本当に神アルバム"
"JP","映像をみるととても礼儀正しいです"
"JP","My mind is going, I can feel it"
"JP","映像は衝撃的でした"
"JP","時々、発作的に聴きたくなる"
"JP","早くも次作が楽しみ！"
"JP","キュアーを知らない人にもオススメです"
"JP","これはやったらアカンかったんちゃう？"
"JP","マニア向けのアイテム"
"JP","たしか本国では復活作として持て囃されてた記憶がある"


## Advanced Options (Under Construction)
<a id='pp'></a>
### -pp

Grouping option that specifies separate processing steps for each table in a database, which lets you define complex data processing steps.

In [31]:
aq_udb -scn amazon:. -pp reviews -eval star_rating 'star_rating * 2' -endpp

In [32]:
aq_udb -exp amazon:reviews -top 20 -c star_rating

"star_rating"
10
2
10
10
10
10
10
10
8
10
2
10
6
10
10
10
10
10
10
10


### -pp possible contents

Using the `-pp` option, we can specify a table/vector or primary key set to scan and process, in addition to the main data source specified earlier by `-exp` or `-scn`. Options within the `-pp` group are executed before the main data source is scanned or exported.

This makes it possible to perform processing that involves data from multiple tables / vectors / primary keys (for example, comparing data from 2 different tables and storing the larger value in a third table).
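The "compare two tables and keep the larger value" case can be pictured with the plain-Python sketch below. It only illustrates the per-key logic; the function `store_larger`, both sample tables and their values are hypothetical, and it does not show `-pp` syntax.

```python
def store_larger(table_a, table_b, col):
    """Return {key: larger value of `col`} for keys present in both tables."""
    shared_keys = sorted(table_a.keys() & table_b.keys())
    return {key: max(table_a[key][col], table_b[key][col]) for key in shared_keys}

# Hypothetical per-key data; neither table exists in the notebook's database
jp_votes = {"Music": {"total_votes": 12}, "Watches": {"total_votes": 3}}
uk_votes = {
    "Music": {"total_votes": 7},
    "Watches": {"total_votes": 9},
    "Software": {"total_votes": 1},
}

# The "third table" holding the larger value per primary key
larger = store_larger(jp_votes, uk_votes, "total_votes")
print(larger)  # {'Music': 12, 'Watches': 9}
```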

Examples we would like to add here:
* multiple databases?
* multiple tables and vectors within / across database(s)
* `-scn dbName:. -pp tableName_within_the_DB ...`


<a id='var'></a>
### -var

Sets the value of a predefined variable (one defined at the table creation stage with the `ess create` command).

We defined the `star_sum` variable, with 0 as its initial value, at the beginning of the notebook ([database creation](#db_creation)). With this variable, we will
1. store the cumulative sum of star ratings in `star_sum`.
2. divide it by `rowNum` to get the running average star rating at that row.
3. store that value in the `star_rating` column with the `-eval` option.
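The arithmetic behind these three steps is a running average. The plain-Python sketch below shows only that calculation; `running_average` is a hypothetical helper, the local `star_sum` stands in for the UDB variable, `row_num` stands in for `rowNum`, and the sample ratings are made up.

```python
def running_average(ratings):
    """Return the average of all ratings seen so far, one value per row."""
    star_sum = 0  # plays the role of the `star_sum` variable, initialised to 0
    averages = []
    for row_num, rating in enumerate(ratings, start=1):
        star_sum += rating                    # step 1: cumulative sum
        averages.append(star_sum / row_num)   # step 2: divide by the row number
    return averages                           # step 3 would store these per row

print(running_average([5, 1, 3, 3]))
# [5.0, 3.0, 3.0, 3.0]
```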

**Note: this example is not working yet and needs to be fixed. `-var` has to be used within the `-pp` option.**

In [5]:
ess server summary

DATABASE : amazon (active)
   TABLE :reviews	S:marketplace I:customer_id S:review_id S:product_id I:product_parent S:product_title S,pkey:product_category I:star_rating I:helpful_votes I:total_votes S:vine S:verified_purchase S:review_headline S:review_body S:review_date
  VECTOR : users	S,pkey:product_category I,+add:total_votes
     VAR : I:star_sum

ip-10-10-1-118: (+) udbd-10010 (15578) running.
