# aq_pp -cmb

## Intuition
`-cmb` applies a lookup table file (the right table) to the original dataset (the left table), based on a join key column that is common to both tables. For each record in the left table, it uses the join key value to look up a matching record in the right table, and joins that record horizontally to the left record.<br>
This option assumes that key values are unique within each table. When there are duplicates, it behaves as follows:
- **On the left table** - keeps all of the table's records regardless of duplication or match, unless an attribute specifies otherwise.
- **On the right table** - keeps only the first matching occurrence when several records share the same key.
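
The rule above can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics, not how `aq_pp` is implemented: the function name `cmb_default` is hypothetical, the rows are the sample data introduced in the Data section below, and `None` stands in for the empty field that `aq_pp` prints for unmatched records.

```python
# Illustrative model of the default -cmb lookup; not aq_pp's implementation.
users = [(1, "Patrik"), (1, "Smith"), (2, "Albert"), (3, "Maria"),
         (4, "Darwin"), (5, "Elizabeth"), (5, "Taylor")]
likes = [(4, "Apples"), (5, "Debate"), (5, "Dancing"),
         (6, "History"), (7, "Hiking"), (8, "Swimming")]


def cmb_default(left, right):
    """Join each left row to the first right row that shares its key."""
    first_match = {}
    for key, value in right:
        # setdefault keeps the first value seen for a key and ignores later ones.
        first_match.setdefault(key, value)
    # Every left row is kept; keys without a match get None.
    return [(key, name, first_match.get(key)) for key, name in left]


for row in cmb_default(users, likes):
    print(row)
```

Building the lookup dictionary once and then scanning the left table mirrors the "lookup table" wording: the right table is consulted, never iterated per left record.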

It takes a while to wrap your head around this, but don't worry: we'll cover each case in this notebook.

### Prerequisite
Readers are expected to be familiar with the following:
- bash and its commands
- aq_tool's [input, column](aq_input.ipynb) and [output](aq_output.ipynb) specs
- basic knowledge of SQL joins

## Syntax and Attributes

Details are available at [aq_pp -cmb](http://auriq.com/documentation/source/reference/manpages/aq_pp.html#cmb), so keep it open and refer to it as you go through this notebook.

## Data
We will use 2 tables for this example: one with user IDs and names, the other with user IDs and their likes (or hobbies). They look like this.


<h4><center>Users.csv (Left Table)</center></h4>

id|name                   
--|----
1|Patrik
1|Smith
2|Albert
3|Maria
4|Darwin
5|Elizabeth
5|Taylor

<br>

<h4><center>Likes.csv (Right Table)</center></h4>

id|likes
--|-----
4|Apples
5|Debate
5|Dancing
6|History
7|Hiking
8|Swimming

Note that there are duplicate IDs in both tables:
* `users.csv`
    * User ID 1: Patrik also entered his last name, "Smith", by accident.
    * User ID 5: Elizabeth also entered her last name, "Taylor".
* `likes.csv`
    * User ID 5 has two hobbies, "Debate" and "Dancing".
    
We'll set the join key column as `id`.

## Samples

### Default Behavior

Let's start with the default behavior of the option. We'll join the 2 tables, with `users.csv` as the left table (given as the input spec to `aq_pp`) and `likes.csv` as the right table (given as the input to the `-cmb` option).


In [9]:
## Set up the file's path 
users="data/aq_pp/cmb/users.csv"
likes="data/aq_pp/cmb/likes.csv"

aq_pp -f,+1 $users -d i:id s:name -cmb,+1 $likes i:id s:likes

"id","name","likes"
1,"Patrik",
1,"Smith",
2,"Albert",
3,"Maria",
4,"Darwin","Apples"
5,"Elizabeth","Debate"
5,"Taylor","Debate"


Several important things to note about the default behavior of the option:
* All records from the left table are retained, regardless of duplicates or matches.
* Of the 2 matching records in the right table for ID 5 ("Debate" and "Dancing"), only the first one ("Debate") is joined to the left table's ID 5 records ("Elizabeth" and "Taylor").


### Attributes

Let's go over 2 attributes that change the lookup behavior.

**`req`**<br>

In the example above, the unmatched records from the left table ("Patrik", "Smith", "Albert" and "Maria") were still present in the result. 
We can use the `req` attribute to keep only the records that have a match in both tables.

In [10]:
aq_pp -f,+1 $users -d i:id s:name -cmb,+1,req $likes i:id s:likes

"id","name","likes"
4,"Darwin","Apples"
5,"Elizabeth","Debate"
5,"Taylor","Debate"
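
As a rough model of what `req` changes, the Python sketch below drops left records whose key has no match, while still taking only the first right-table match per key. It is an illustration of the behavior shown above rather than `aq_pp`'s implementation, and `cmb_req` is a hypothetical name.

```python
# Illustrative model of -cmb with the req attribute; not aq_pp's implementation.
users = [(1, "Patrik"), (1, "Smith"), (2, "Albert"), (3, "Maria"),
         (4, "Darwin"), (5, "Elizabeth"), (5, "Taylor")]
likes = [(4, "Apples"), (5, "Debate"), (5, "Dancing"),
         (6, "History"), (7, "Hiking"), (8, "Swimming")]


def cmb_req(left, right):
    """Keep only left rows whose key exists on the right (first match only)."""
    first_match = {}
    for key, value in right:
        first_match.setdefault(key, value)  # the first occurrence wins
    return [(key, name, first_match[key])
            for key, name in left
            if key in first_match]  # unmatched left rows are dropped


for row in cmb_req(users, likes):
    print(row)
```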


**`all`**<br>

This attribute makes the option retain all of the duplicate matching records from the right table. 

In [17]:
aq_pp -f,+1 $users -d i:id s:name -cmb,+1,all $likes i:id s:likes

"id","name","likes"
1,"Patrik",
1,"Smith",
2,"Albert",
3,"Maria",
4,"Darwin","Apples"
5,"Elizabeth","Debate"
5,"Elizabeth","Dancing"
5,"Taylor","Debate"
5,"Taylor","Dancing"


You can see that both "Debate" and "Dancing" from the right table for ID 5 are now present in the result.
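
One way to picture `all` is that the lookup now holds a list of values per key instead of a single value, and each left record is repeated once per matching value. The Python sketch below illustrates that idea under the same assumptions as before: `cmb_all` is a hypothetical name, and `None` represents the empty field printed for unmatched records.

```python
from collections import defaultdict

# Illustrative model of -cmb with the all attribute; not aq_pp's implementation.
users = [(1, "Patrik"), (1, "Smith"), (2, "Albert"), (3, "Maria"),
         (4, "Darwin"), (5, "Elizabeth"), (5, "Taylor")]
likes = [(4, "Apples"), (5, "Debate"), (5, "Dancing"),
         (6, "History"), (7, "Hiking"), (8, "Swimming")]


def cmb_all(left, right):
    """Join each left row to every right row that shares its key."""
    matches = defaultdict(list)
    for key, value in right:
        matches[key].append(value)  # keep every occurrence, in file order
    joined = []
    for key, name in left:
        # .get avoids inserting missing keys; unmatched rows get a single None.
        for value in matches.get(key, [None]):
            joined.append((key, name, value))
    return joined


for row in cmb_all(users, likes):
    print(row)
```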

### SQL `-cmb` version

With the `-cmb` option, you can achieve the SQL equivalents of left/right outer joins and inner joins.

**Left/Right Outer Join**<br>

A left join retains all records from the left table (`users.csv`), and retains records from the right table (`likes.csv`) only when there is a match.<br>
Note that this demonstrates a left join **with duplicate keys**, which is why the `all` attribute is needed.

In [12]:
aq_pp -f,+1 $users -d i:id s:name -cmb,+1,all $likes i:id s:likes

"id","name","likes"
1,"Patrik",
1,"Smith",
2,"Albert",
3,"Maria",
4,"Darwin","Apples"
5,"Elizabeth","Debate"
5,"Elizabeth","Dancing"
5,"Taylor","Debate"
5,"Taylor","Dancing"
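
For comparison, here is the corresponding SQL statement run against the same rows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the table and column names are chosen here to mirror the CSV files and are not part of `aq_pp`. The `ORDER BY` on `rowid` is only there to reproduce the file order, since SQL itself does not guarantee row order.

```python
import sqlite3

# In-memory database loaded with the same rows as users.csv and likes.csv.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE likes (id INTEGER, likes TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "Patrik"), (1, "Smith"), (2, "Albert"), (3, "Maria"),
    (4, "Darwin"), (5, "Elizabeth"), (5, "Taylor")])
conn.executemany("INSERT INTO likes VALUES (?, ?)", [
    (4, "Apples"), (5, "Debate"), (5, "Dancing"),
    (6, "History"), (7, "Hiking"), (8, "Swimming")])

# LEFT JOIN keeps every users row; unmatched rows get NULL (None in Python).
left_join = conn.execute("""
    SELECT u.id, u.name, l.likes
    FROM users AS u
    LEFT JOIN likes AS l ON l.id = u.id
    ORDER BY u.rowid, l.rowid
""").fetchall()

for row in left_join:
    print(row)
```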


To perform a right outer join, swap the datasets passed to the command, as in the example below.

In [15]:
aq_pp -f,+1 $likes -d i:id s:Likes -cmb,+1,all $users i:id s:Name

"id","Likes","Name"
4,"Apples","Darwin"
5,"Debate","Elizabeth"
5,"Debate","Taylor"
5,"Dancing","Elizabeth"
5,"Dancing","Taylor"
6,"History",
7,"Hiking",
8,"Swimming",


It's a bit confusing since the left and right columns are displayed in reverse order, but you can see that all records from `likes.csv` (the right table of the original join) are present, while only the matching records from `users.csv` (the original left table) are.

**Inner Join**<br>

To perform a SQL inner join (keep only matching records from the left and right tables, including duplicates), we use both the `all` and `req` attributes.

In [16]:
aq_pp -f,+1 $users -d i:id s:name -cmb,+1,req,all $likes i:id s:likes

"id","name","likes"
4,"Darwin","Apples"
5,"Elizabeth","Debate"
5,"Elizabeth","Dancing"
5,"Taylor","Debate"
5,"Taylor","Dancing"
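
The same result can be cross-checked with a SQL `INNER JOIN`. As before, this is a sketch using Python's built-in `sqlite3` module; the table and column names are assumptions that mirror the CSV files, and the `ORDER BY` on `rowid` only reproduces the file order.

```python
import sqlite3

# In-memory database loaded with the same rows as users.csv and likes.csv.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE likes (id INTEGER, likes TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "Patrik"), (1, "Smith"), (2, "Albert"), (3, "Maria"),
    (4, "Darwin"), (5, "Elizabeth"), (5, "Taylor")])
conn.executemany("INSERT INTO likes VALUES (?, ?)", [
    (4, "Apples"), (5, "Debate"), (5, "Dancing"),
    (6, "History"), (7, "Hiking"), (8, "Swimming")])

# INNER JOIN keeps only ids present in both tables, once per matching pair.
inner_join = conn.execute("""
    SELECT u.id, u.name, l.likes
    FROM users AS u
    INNER JOIN likes AS l ON l.id = u.id
    ORDER BY u.rowid, l.rowid
""").fetchall()

for row in inner_join:
    print(row)
```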
