# aq_rst

## Overview
This command converts flattened, column-keyed data into a pivot table. The full syntax and all available options are described in `man aq_rst`; the basic syntax is as follows.

```aq_rst [input spec] -key colName1 -lab colName2 -val colName3```

- `-key` - column name(s) that form the key (the unique row names of the result table); can be any data type. (This is close to `index` in pandas' `df.pivot_table()`.)
- `-lab` - column name whose values become the columns of the result table. (Close to `columns` in pandas' `df.pivot_table()`.)
- `-val` - column name whose values fill the cells of the result table. (Close to `values` in pandas' `df.pivot_table()`.)

## Basics
Let's take a look at a very simple example.

**Input Table**<br>

Sex|Class|Count
---|---|---
female|First|91
female|Second|70
female|Third|72
male|First|45
male|Second|17
male|Third|47

The table above shows survivor counts for the RMS Titanic, taken from the [Titanic dataset](https://www.kaggle.com/c/titanic); it is the input to the `aq_rst` command. <br>
The role of each column changes between the input table and the result of `aq_rst`, as shown below.

**Input Table** ->    **Result Table**
- `-key` - Sex -> Row Names
- `-lab` - Class -> Columns' Names
- `-val` - Count -> values in the cells

**Result Table**<br>

Sex|First|Second|Third
---|---|---|---
male|45|17|47
female|91|70|72


In practice, it is common to use the `aq_cnt` command to generate a summary of the data as the input table and pass its output to `aq_rst`. <br>

In this tutorial, however, the input data have already been created for you.

In [30]:
#taking a look at the input dataset
class_sex="data/aq_rst/survivor_by_class_sex.csv"
class_sex_col="S:Sex S:Class I:Count"
aq_pp -f,+1 $class_sex -d $class_sex_col

"Sex","Class","Count"
"female","First",91
"female","Second",70
"female","Third",72
"male","First",45
"male","Second",17
"male","Third",47


In [31]:
aq_rst -f,+1 $class_sex -d $class_sex_col -key Sex -lab Class -val Count

"Sex","First","Second","Third"
"male",45,17,47
"female",91,70,72

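The Input Table to Result Table mapping above can also be sketched in plain Python. This is only an illustration of the pivot idea, not how `aq_rst` is implemented; the `pivot` helper and the inline `INPUT_CSV` string are hypothetical names introduced here, and the rows are copied from the table above.

```python
import csv
import io

# Same rows as the Input Table above (survivor counts by Sex and Class).
INPUT_CSV = """Sex,Class,Count
female,First,91
female,Second,70
female,Third,72
male,First,45
male,Second,17
male,Third,47
"""


def pivot(rows, key, lab, val):
    """Return (labels, table); table maps each key to {label: value}."""
    labels = []  # column names of the result, in first-seen order
    table = {}   # key value -> {label value -> cell value}
    for row in rows:
        if row[lab] not in labels:
            labels.append(row[lab])
        # Count is declared as an integer (I:) in the column spec.
        table.setdefault(row[key], {})[row[lab]] = int(row[val])
    return labels, table


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
labels, table = pivot(rows, key="Sex", lab="Class", val="Count")

# Print a header row followed by one row per key.
print(["Sex"] + labels)
for sex, cells in table.items():
    print([sex] + [cells[label] for label in labels])
```

The sketch keeps rows in first-seen order, whereas the `aq_rst` output above lists `male` before `female`, so row order should not be relied on unless the output is sorted.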

## Advanced

**Using multiple columns as `-key` (row index)**<br>
You can pass multiple columns to the `-key` option to create a table with a multi-level row index.<br>
The input data is again a summary of the Titanic dataset, this time with the columns Sex, Alone (whether a passenger was on board alone), Class, and Count.

In [44]:
# Taking a look at the data with multiple key columns
mul_col_data="data/aq_rst/multiple_cols.csv"
mul_cols="S:Sex S:Alone S:Class I:Count"

aq_pp -f,+1 $mul_col_data -d $mul_cols

"Sex","Alone","Class","Count"
"female","False","First",60
"female","False","Second",44
"female","False","Third",84
"female","True","First",34
"female","True","Second",32
"female","True","Third",60
"male","False","First",47
"male","False","Second",36
"male","False","Third",83
"male","True","First",75
"male","True","Second",72
"male","True","Third",264


In [45]:
# Creating pivot table, Sex and Alone as key (row index)

aq_rst -f,+1 $mul_col_data -d $mul_cols -key Sex Alone -lab Class -val Count

"Sex","Alone","First","Second","Third"
"male","False",47,36,83
"female","True",34,32,60
"female","False",60,44,84
"male","True",75,72,264


Now the data is easier to analyze. Within each sex, men who were on board alone (411) outnumber men who were not alone (166), while women who were not alone (188) outnumber women who were alone (126).
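
A multi-column key behaves like a composite row index. The sketch below illustrates that idea in plain Python, using a tuple of (Sex, Alone) as the dictionary key; `pivot_multi_key` is a hypothetical helper, not part of `aq_rst`, and the rows are copied from the `aq_pp` output above. The row totals at the end make the alone versus not-alone comparison explicit.

```python
# (Sex, Alone, Class, Count) rows copied from the aq_pp output above.
ROWS = [
    ("female", "False", "First", 60),
    ("female", "False", "Second", 44),
    ("female", "False", "Third", 84),
    ("female", "True", "First", 34),
    ("female", "True", "Second", 32),
    ("female", "True", "Third", 60),
    ("male", "False", "First", 47),
    ("male", "False", "Second", 36),
    ("male", "False", "Third", 83),
    ("male", "True", "First", 75),
    ("male", "True", "Second", 72),
    ("male", "True", "Third", 264),
]


def pivot_multi_key(rows, n_keys):
    """Use the first n_keys fields as the row index, then label, then value."""
    table = {}
    for row in rows:
        key = tuple(row[:n_keys])  # composite key, e.g. ("male", "True")
        label, value = row[n_keys], row[n_keys + 1]
        table.setdefault(key, {})[label] = value
    return table


table = pivot_multi_key(ROWS, n_keys=2)
for key, cells in table.items():
    print(key, cells)

# Sum each row across the Class columns.
totals = {key: sum(cells.values()) for key, cells in table.items()}
print(totals)
```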

### Options

**`-def`**<br>
This option sets the value used to fill cells that have no data. Without it, blank cells are filled with 0 or an empty string, depending on the data type of the value column.<br>
Let's consider the sample dataset below, which relates financial status to employment and dating status.

**Input Table**<br>

status|job|finance
---|---|---
single|yes|great
single|no|bad
taken|yes|okay


**Result Table**<br>

status|yes|no
------|---|---
taken |okay|
single|great|bad


The input table has every combination of key and label except status: taken, job: no. By default, this cell is filled with an empty string, as in the result table above.

Let's see it in action.

In [50]:
finance="data/aq_rst/def.csv"
fin_col="S:Status S:Job S:Finance"
aq_rst -f,+1 $finance -d $fin_col -key status -lab job -val finance 

"Status","no","yes"
"taken",,"okay"
"single","bad","great"


We can fill this empty cell with any default value we like using the `-def` option.

In [48]:
aq_rst -f,+1 $finance -d $fin_col -key status -lab job -val finance -def "negative"

"Status","no","yes"
"taken","negative","okay"
"single","bad","great"

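The fill-in behaviour can be sketched in plain Python as a lookup with a fallback value. This is an illustration only: `pivot_with_default` is a hypothetical helper, and the labels are sorted here simply to make the output deterministic, not because `aq_rst` is known to order them that way.

```python
def pivot_with_default(rows, default=""):
    """Pivot (key, label, value) triples, filling absent cells with `default`."""
    # Sorted only so that the column order is deterministic in this sketch.
    labels = sorted({label for _, label, _ in rows})
    cells = {}
    for key, label, value in rows:
        cells.setdefault(key, {})[label] = value
    table = {
        key: [found.get(label, default) for label in labels]
        for key, found in cells.items()
    }
    return labels, table


# Rows from the Input Table above; there is no ("taken", "no", ...) row.
ROWS = [
    ("single", "yes", "great"),
    ("single", "no", "bad"),
    ("taken", "yes", "okay"),
]

labels, blank = pivot_with_default(ROWS)
_, filled = pivot_with_default(ROWS, default="negative")

print(labels)
print("without a default:", blank["taken"])
print("with a default:   ", filled["taken"])
```

Only the missing cell changes; cells that already have a value are left untouched.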

**`-olab`**<br>

_Labels and the result table's column names_<br>
When using `aq_rst`, you will sometimes want to use a label column that contains long strings (such as a person's name or the title of a book). In these cases, the column names in the resulting table might not be valid.

This option lets you output a lookup table that maps each label value (column name) to a column id. Let's take a look using the Titanic dataset with selected columns (Name, Sex, Survived), shown below.

**Input Table**<br>

Survived|Sex|Name
--------|---|----
0|male|Mr. Owen Harris Braund
1|female|Mrs. John Bradley (Florence Briggs Thayer) Cumings
1|female|Miss. Laina Heikkinen

**Output Table**<br>

Sex|Miss. Laina Heikkinen|Mr. Owen Harris Braund|Mrs. John Bradley (Florence Briggs Thayer) Cumings
---|---|---|---
male| |0| 
female|1| |1

In [55]:
# This outputs the label values as column names, which might be invalid
passengers="data/aq_rst/titanic_psg_names_.csv"
psg_col="I:Survived S:Sex S:Name"

# setting -def as 999 to distinguish between empty cell vs cell with 0 inside it
aq_rst -f,+1 $passengers -d $psg_col -key Sex -lab Name -val Survived -def "999"

"Sex","Miss. Laina Heikkinen","Mr. Owen Harris Braund","Mrs. John Bradley (Florence Briggs Thayer) Cumings"
"male",999,0,999
"female",1,999,1


This is messy and likely to contain invalid column names. With the `-olab` option, you can label the columns `_1`, `_2`, `_3`, ... and output a lookup table for the column names. Let's take a look.

In [57]:
# output the lookup table on stdout by passing - to the -olab option
aq_rst -f,+1 $passengers -d $psg_col -key Sex -lab Name -val Survived -def "999" -olab -

"id","label"
"_1","Miss. Laina Heikkinen"
"_2","Mr. Owen Harris Braund"
"_3","Mrs. John Bradley (Florence Briggs Thayer) Cumings"
"Sex","_1","_2","_3"
"male",999,0,999
"female",1,999,1


The top 4 lines above are the lookup table, while the bottom 3 lines are the pivot table.<br>
In practice, you can save the lookup table in a file separate from the pivot table, which helps when the pivot table has hundreds of long-string columns.
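
The relationship between column ids and labels can be sketched in plain Python as a small dictionary. This is only an illustration: `label_lookup` is a hypothetical helper, and numbering the labels in sorted order is an assumption chosen because it reproduces the ids shown in the output above.

```python
def label_lookup(labels):
    """Map synthetic column ids (_1, _2, ...) to the original label strings."""
    # Assumption: ids follow the sorted order of the distinct labels.
    unique = sorted(set(labels))
    return {f"_{i}": label for i, label in enumerate(unique, start=1)}


# Passenger names from the Input Table above.
names = [
    "Mr. Owen Harris Braund",
    "Mrs. John Bradley (Florence Briggs Thayer) Cumings",
    "Miss. Laina Heikkinen",
]

lookup = label_lookup(names)
for col_id, label in lookup.items():
    print(col_id, "->", label)

# Translate a pivot-table header back to the original labels.
header = ["Sex", "_1", "_2", "_3"]
restored = [lookup.get(col, col) for col in header]
print(restored)
```

Keeping the pivot table and the lookup apart means downstream tools only see short, safe column names, and the long labels can be restored when needed.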

**`-ord`**<br>

This option is intended for merge operations; it sorts the output by the keys (row index).
Using the Titanic dataset (Sex, Class, and survivor Count),

Sex|Class|Count
---|---|---
female|First|91
female|Second|70
female|Third|72
male|First|45
male|Second|17
male|Third|47

let's take a look.

In [61]:
# sorting by the class, in alphabetical order
aq_rst -f,+1 $class_sex -d $class_sex_col -key class -lab sex -val count -ord

"Class","female","male"
"First",91,45
"Second",70,17
"Third",72,47
