# aq_cat

## Overview

The `aq_cat` command is an input multiplexer: it combines multiple data streams or inputs into one and forwards them as a single output stream. The output can then be stored in a file or piped into another command. <br>
Input to `aq_cat` can be in a delimiter-separated-values format (CSV, TSV, etc.) or in aq_tool's internal binary format.

This command has **two modes of operation and accepts two types of input.** <br>
**Modes of Operation**<br>
* with column spec
* without column spec

**Inputs**<br>
* named pipes
* bash's process substitution

Due to the functional constraints of the notebook, we can only experiment with passing data to `aq_cat` through bash's process substitution here, but feel free to try the named pipe method on your own CLI.
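
For readers who want to try named pipes outside the notebook, here is a minimal sketch of the general mechanism. It uses plain `cat` as the reader rather than `aq_cat`, and the pipe names and sample rows are hypothetical. Presumably the pipe paths could be passed to `aq_cat -f` in place of `cat`, but that is an assumption and is not tested here.

```bash
# Hypothetical demo: plain `cat` stands in for the reader; no aq_cat involved.
workdir=$(mktemp -d)
mkfifo "$workdir/pipe_a" "$workdir/pipe_b"

# Writers run in the background; each blocks until a reader opens its pipe.
printf 'JP,PC,5\n' > "$workdir/pipe_a" &
printf 'FR,Toys,5\n' > "$workdir/pipe_b" &

# The reader consumes both pipes in order, just like two regular files.
combined=$(cat "$workdir/pipe_a" "$workdir/pipe_b")
echo "$combined"

# Wait for the background writers to finish, then clean up.
wait
rm -r "$workdir"
```

Unlike process substitution, a named pipe has a path that stays on disk until it is removed, so the writer and the reader can be started from separate terminals.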

Before we get to the examples, keep the official documentation [here](http://auriq.com/documentation/source/reference/manpages/aq_cat.html) open, or run `man aq_cat` in one of the cells below, for reference while experimenting in the notebook.

With that, let's get started!

### Basic Syntax with Process Substitution

Note that this notebook assumes that you're using the bash shell.
* [What is process substitution?](https://www.linuxjournal.com/content/shell-process-redirection)

```aq_cat ... -f <( Command_1 ) <( Command_2 ) | Command_3```<br>
where
* the output from `Command_1` and `Command_2` is combined by `aq_cat` and directed to `Command_3`.

`Command_1` and `Command_2` can be any commands, as long as they produce an output data stream.
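
To see the same pattern without `aq_cat`, here is a small sketch in which plain `cat` plays the role of the combining command and `tr` plays the role of `Command_3`. The sample rows are made up for illustration.

```bash
# Each <( ... ) is replaced by a path (for example /dev/fd/63) that the
# outer command can open and read like a regular file.
combined=$(cat <(printf 'a,1\n') <(printf 'b,2\n') | tr 'a-z' 'A-Z')
echo "$combined"
```

Here `cat` reads the two substituted streams one after the other, and the combined stream is piped into `tr`, which upper-cases it.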

Let's take a look at this with super simple files and commands.

## Data
We'll be using online customer review data from Japan and France, taken from the [Amazon customer reviews dataset](https://s3.amazonaws.com/amazon-reviews-pds/readme.html). The data looks like this:

In [20]:
# set file path as variable
jp='data/aq_cat/jp_amazon.csv'
fr='data/aq_cat/fr_amazon.csv'
ramen='data/aq_cat/ramen-ratings.csv'

cat $jp
cat $fr

MarketPlace,product_category,star_rating
JP,Mobile_Apps,1
JP,Mobile_Apps,5
JP,Mobile_Apps,3
JP,PC,5
JP,PC,5
MarketPlace,product_category,star_rating
FR,Shoes,5
FR,Toys,5
FR,Mobile_Apps,4
FR,Digital_Music_Purchase,3
FR,Mobile_Apps,5


Each dataset includes 5 data points and contains only reviews from its own marketplace, Japan or France. <br>

### With Column Spec

**Advantages of providing a column spec:**<br>
* input and output can be in delimiter-separated format or binary format
* output columns can be selected by name, not by index number

**Disadvantages:**<br>
* you need to know the column spec in advance
* you can only concatenate data streams with the same schema / columns
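
Because this mode needs every input to share one schema, it can help to compare the header lines before concatenating. The following is a rough sketch using only standard tools. The file names and rows are hypothetical stand-ins for the two marketplace files, and matching headers do not guarantee that the column types match as well.

```bash
# Hypothetical sample files standing in for the two marketplace datasets.
workdir=$(mktemp -d)
printf 'MarketPlace,product_category,star_rating\nJP,PC,5\n' > "$workdir/a.csv"
printf 'MarketPlace,product_category,star_rating\nFR,Toys,5\n' > "$workdir/b.csv"

# Compare only the first (header) line of each file.
if [ "$(head -n 1 "$workdir/a.csv")" = "$(head -n 1 "$workdir/b.csv")" ]; then
  schema_check="same header"
else
  schema_check="different header"
fi
echo "$schema_check"

rm -r "$workdir"
```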

Using the same Amazon dataset, we'll define the column spec, store it in a bash variable, and then execute the command.


In [13]:
# input with column spec, without +1
cols="S:marketplace S:product_category I:star_rating"
aq_cat -f <(cat $jp) <(cat $fr) -d $cols

"marketplace","product_category","star_rating"
/dev/fd/62: Bad field value: byte=1+41 rec=1 field=star_rating
/dev/fd/63: Bad field value: byte=1+41 rec=1 field=star_rating
aq_cat: Input processing error


: 13

Oops, what happened here? The error says `Bad field value: ...`. The reason is that the header line of each file is being read as data, so the column spec is applied to it and the text `star_rating` cannot be parsed as a value for the numeric `star_rating` column. Let's add `+1` to the input spec to skip the header.

In [14]:
aq_cat -f,+1 <(cat $jp) <(cat $fr) -d $cols

"marketplace","product_category","star_rating"
"FR","Shoes",5
"FR","Toys",5
"FR","Mobile_Apps",4
"FR","Digital_Music_Purchase",3
"FR","Mobile_Apps",5
"JP","Mobile_Apps",1
"JP","Mobile_Apps",5
"JP","Mobile_Apps",3
"JP","PC",5
"JP","PC",5


**Specifying Output Column**<br>
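Let's output an arbitrary subset of columns. We'll try both column names and index numbers. As the second cell below shows, once a column spec is given, `-c` looks columns up by name, so index numbers are rejected.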

In [16]:
# output column by column names
aq_cat -f+1 <(cat $jp) <(cat $fr) -d $cols -c marketplace star_rating

"marketplace","star_rating"
"FR",5
"FR",5
"FR",4
"FR",3
"FR",5
"JP",1
"JP",5
"JP",3
"JP",5
"JP",5


In [17]:
# output column by column index
aq_cat -f,+1 <(cat $jp) <(cat $fr) -d $cols -c 1 3

Column "1" not found
Invalid parameter "1": ... -c 1 3
aq_cat: Option spec error


: 2



### Without Column Spec
**Advantages of omitting the column spec:**<br>
* simple to use
* allows you to concatenate data streams with different schemas / columns

**Disadvantages of omitting the column spec:**<br>
* only supports output in delimiter-separated format, nothing else (e.g. no binary format)
* output columns can only be selected by number, not by column name

We'll use the `cat` command to stream the data from these files and pass it to `aq_cat` through process substitution.

In [3]:
# Process substitution
aq_cat -f <(cat $jp) <(cat $fr)

MarketPlace,product_category,star_rating
JP,Mobile_Apps,1
JP,Mobile_Apps,5
JP,Mobile_Apps,3
JP,PC,5
JP,PC,5
MarketPlace,product_category,star_rating
FR,Shoes,5
FR,Toys,5
FR,Mobile_Apps,4
FR,Digital_Music_Purchase,3
FR,Mobile_Apps,5


As you can see, the input data are combined, including the headers, because we did not specify `+1` on the input spec for `aq_cat`. Let's add it to skip the headers.

In [6]:
aq_cat -f,+1 <(cat $jp) <(cat $fr)

FR,Shoes,5
FR,Toys,5
FR,Mobile_Apps,4
FR,Digital_Music_Purchase,3
FR,Mobile_Apps,5
JP,Mobile_Apps,1
JP,Mobile_Apps,5
JP,Mobile_Apps,3
JP,PC,5
JP,PC,5
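
For comparison, the same header-skipping effect can be sketched with standard tools by applying `tail -n +2` inside each process substitution. This is only an illustration of the idea, not a description of how `aq_cat` implements `+1`, and the sample files are hypothetical.

```bash
# Hypothetical sample files, each with a header line and one data row.
workdir=$(mktemp -d)
printf 'MarketPlace,product_category,star_rating\nJP,PC,5\n' > "$workdir/jp.csv"
printf 'MarketPlace,product_category,star_rating\nFR,Toys,5\n' > "$workdir/fr.csv"

# tail -n +2 starts printing at line 2, so each header is dropped
# before the two streams are concatenated.
no_header=$(cat <(tail -n +2 "$workdir/jp.csv") <(tail -n +2 "$workdir/fr.csv"))
echo "$no_header"

rm -r "$workdir"
```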


**Specifying Output Column**<br>

Let's output only the `marketplace` and `star_rating` columns using the `-c` option. Because we are not providing a column spec, we will use column index numbers to specify the output columns **(starting with 1, not 0)**.

In [11]:
# index starts at 1!
aq_cat -f,+1 <(cat $jp) <(cat $fr) -c 1 3

FR,5
FR,5
FR,4
FR,3
FR,5
JP,1
JP,5
JP,3
JP,5
JP,5


In [None]:
# does this work??
aq_cat -f,+1 <(cat $jp) <(cat $fr) -c marketplace star_rating

**Concatenating Data Streams That Have Different Data Schemas / Columns**<br>
Sometimes you might want to concatenate data streams that have different columns, or schemas.

Can we do that in this simple mode? Let's see. <br> We will use the `ramen-ratings.csv` data (stored in `$ramen` above), which contains ratings of various instant noodles from around the globe. It has the following columns:<br>
* review: reviewID
* brand: brand's name
* country: country's name
* stars: star rating value

We'll try to concatenate this dataset with the Amazon customer reviews from the Japanese marketplace.

In [21]:
# taking a look at the ramen dataset 
cat $ramen

"review","brand","country","stars"
2580,"New Touch","Japan",3.75
2579,"Just Way","Taiwan",1
2578,"Nissin","USA",2.25
2577,"Wei Lih","Taiwan",2.75
2576,"Ching's Secret","India",3.75
2575,"Samyang Foods","South Korea",4.75
2574,"Acecook","Japan",4
2573,"Ikeda Shoku","Japan",3.75
2572,"Ripe'n'Dry","Japan",0.25


In [26]:
# Once again take a look at Japanese Marketplace Reviews
cat $jp

MarketPlace,product_category,star_rating
JP,Mobile_Apps,1
JP,Mobile_Apps,5
JP,Mobile_Apps,3
JP,PC,5
JP,PC,5


In [27]:
# Let's try concatenating them!
aq_cat -f,+1 <(cat $jp) <(cat $ramen) 

JP,Mobile_Apps,1
JP,Mobile_Apps,5
JP,Mobile_Apps,3
JP,PC,5
JP,PC,5
2580,"New Touch","Japan",3.75
2579,"Just Way","Taiwan",1
2578,"Nissin","USA",2.25
2577,"Wei Lih","Taiwan",2.75
2576,"Ching's Secret","India",3.75
2575,"Samyang Foods","South Korea",4.75
2574,"Acecook","Japan",4
2573,"Ikeda Shoku","Japan",3.75
2572,"Ripe'n'Dry","Japan",0.25


In [28]:
# now output arbitrary columns
aq_cat -f,+1 <(cat $jp) <(cat $ramen) -c 1 4

2580,3.75
2579,1
2578,2.25
2577,2.75
2576,3.75
2575,4.75
2574,4
2573,3.75
2572,0.25
JP,
JP,
JP,
JP,
JP,
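
The empty second value on the `JP` rows follows from the schemas: the Amazon rows have only three columns, so there is nothing to print for column 4. A small sketch with `awk` as a stand-in (an assumption for illustration, not `aq_cat`'s own logic) reproduces the same shape on two sample rows taken from the data above.

```bash
# Select fields 1 and 4 from one three-column row and one four-column row.
# The naive comma split is fine here because no quoted value contains a comma.
mixed=$(printf 'JP,PC,5\n2580,"New Touch","Japan",3.75\n' \
  | awk -F, -v OFS=, '{ print $1, $4 }')
echo "$mixed"
```

The three-column row comes out as `JP,` with an empty fourth field, while the four-column row keeps its `stars` value.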
