# aq_cnt tips and samples

This notebook goes over aq_cnt's options and some sample usages.
Based on AQ Tools version: 2.0.1-1.

For now, I'll be adding examples specific to the `-g` option.

The examples use the [titanic](https://www.kaggle.com/c/titanic) dataset, and this notebook will be updated with more samples.

```bash
# setting filename and column spec, and brief look at the dataset
file="data/titanic.csv"
colSpec=$(loginf -f,auto $file -o_pp_col -)
head $file
```

```
Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,3,Mr. Owen Harris Braund,male,22,1,0,7.25
1,1,Mrs. John Bradley (Florence Briggs Thayer) Cumings,female,38,1,0,71.2833
1,3,Miss. Laina Heikkinen,female,26,0,0,7.925
1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35,1,0,53.1
0,3,Mr. William Henry Allen,male,35,0,0,8.05
0,3,Mr. James Moran,male,27,0,0,8.4583
0,1,Mr. Timothy J McCarthy,male,54,0,0,51.8625
0,3,Master. Gosta Leonard Palsson,male,2,3,1,21.075
1,3,Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson,female,27,0,2,11.1333
```


A little about the data: here is some information about the columns.
### columns
- i:Survived: 1 indicates the passenger survived, 0 indicates they did not make it.
- i:Pclass: passenger class; the smaller the number, the more luxurious.
- s:Name: full name of the passenger.
- s:Sex: "male" or "female" string.
- i:Siblings/Spouses Aboard: number of siblings/spouses also on board.
- i:Parents/Children Aboard: same as above, but for parents/children.

Now let's take a look at the options.

## options

### -g 

In short, this option lets users specify the columns to group by; the `-k*` options are then applied within each group.

### Using Groupby with -k

**Little Refresher for `-k` option**

Counts the number of unique values in the specified column. <br>
Close to executing 
```python
len(df[colName].unique())
```
in the Python/pandas stack.
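For readers without pandas at hand, here is a minimal standard-library sketch of the same count. It only illustrates what `-k` reports and says nothing about how aq_cnt is implemented; the `sexes` and `names` lists are simply the Sex and Name values of the nine rows shown by `head` above.

```python
def count_unique(values):
    """Number of distinct values, i.e. what len(df[col].unique()) returns."""
    return len(set(values))


# Sex and Name columns of the nine sample rows shown by `head data/titanic.csv`
sexes = ["male", "female", "female", "female", "male", "male", "male", "male", "female"]
names = [
    "Mr. Owen Harris Braund",
    "Mrs. John Bradley (Florence Briggs Thayer) Cumings",
    "Miss. Laina Heikkinen",
    "Mrs. Jacques Heath (Lily May Peel) Futrelle",
    "Mr. William Henry Allen",
    "Mr. James Moran",
    "Mr. Timothy J McCarthy",
    "Master. Gosta Leonard Palsson",
    "Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson",
]

print(count_unique(sexes))  # 2
print(count_unique(names))  # 9
```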

#### Passengers per passenger class

We'll use the `-g` option to specify the Pclass column as the group; within each group, `-k` counts the number of unique names.

```bash
aq_cnt -f,+1 $file -d $colSpec -g Pclass -k head_counts Name | \
aq_ord -f,+1 - -d i:Pclass i:row i:head_counts -sort Pclass
```

```
"Pclass","row","head_counts"
1,216,216
2,184,184
3,487,487
```


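The output is in the format <br>
`GroupbyCol(Pclass), row, count`

Here `row` and `head_counts` are equal in every class, which suggests that no passenger name appears twice within a class.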

#### Passengers per passenger class and Sex

This time we use Pclass and Sex as the group and count the names that belong to each category.

```bash
aq_cnt -f,+1 $file -d $colSpec -g Pclass Sex -k head_counts Name | \
aq_ord -f,+1 - -d i:Pclass s:Sex i:row i:head_counts -sort Pclass Sex
```

```
"Pclass","Sex","row","head_counts"
1,"female",94,94
1,"male",122,122
2,"female",76,76
2,"male",108,108
3,"female",144,144
3,"male",343,343
```
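Conceptually, `-g` splits the rows into groups keyed by the group-by columns and applies the `-k` count inside each group. The sketch below mimics both examples above with the standard library. It assumes that `row` is the number of records in each group (my reading of the output, not something stated by the tool), and it runs only on the nine `head` rows, so the numbers are much smaller than the real output. `group_count_unique` is a hypothetical helper name.

```python
from collections import defaultdict

# (Pclass, Sex, Name) of the nine sample rows shown by `head data/titanic.csv`
COLUMNS = ("Pclass", "Sex", "Name")
DATA = [
    ("3", "male", "Mr. Owen Harris Braund"),
    ("1", "female", "Mrs. John Bradley (Florence Briggs Thayer) Cumings"),
    ("3", "female", "Miss. Laina Heikkinen"),
    ("1", "female", "Mrs. Jacques Heath (Lily May Peel) Futrelle"),
    ("3", "male", "Mr. William Henry Allen"),
    ("3", "male", "Mr. James Moran"),
    ("1", "male", "Mr. Timothy J McCarthy"),
    ("3", "male", "Master. Gosta Leonard Palsson"),
    ("3", "female", "Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]


def group_count_unique(rows, group_cols, count_col):
    """For each group key: (rows in the group, distinct values of count_col)."""
    row_counts = defaultdict(int)
    distinct = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in group_cols)  # e.g. ("1",) or ("1", "female")
        row_counts[key] += 1
        distinct[key].add(row[count_col])
    return {key: (row_counts[key], len(distinct[key])) for key in sorted(row_counts)}


print(group_count_unique(ROWS, ["Pclass"], "Name"))
# {('1',): (3, 3), ('3',): (6, 6)}
print(group_count_unique(ROWS, ["Pclass", "Sex"], "Name"))
# {('1', 'female'): (2, 2), ('1', 'male'): (1, 1), ('3', 'female'): (2, 2), ('3', 'male'): (4, 4)}
```

Passing a list of group columns keeps one function for both the single-column and the two-column case, much like `-g` accepts one or more column names.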


### Using Groupby with -kx

**Little Refresher for `-kx` option**

Displays/outputs the actual unique values of colName to stdout or a file.
Close to 

```python
df[column].unique()
```
in the Python/pandas stack.
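`df[column].unique()` returns the distinct values in order of first appearance. A standard-library sketch of that behavior follows; it illustrates the pandas side of the comparison only, and makes no claim about the order in which `-kx` emits values. The lists are the Pclass and Sex values of the nine `head` rows.

```python
def unique_in_order(values):
    """Distinct values in order of first appearance, like df[col].unique()."""
    return list(dict.fromkeys(values))  # dicts keep insertion order (Python 3.7+)


# Pclass and Sex columns of the nine sample rows shown by `head data/titanic.csv`
pclasses = [3, 1, 3, 1, 3, 3, 1, 3, 3]
sexes = ["male", "female", "female", "female", "male", "male", "male", "male", "female"]

print(unique_in_order(pclasses))  # [3, 1]
print(unique_in_order(sexes))     # ['male', 'female']
```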

#### Titles per passenger class
Let's look at each person's title (Mr., Miss., Master., etc.) and display the titles within each passenger class.
To do that, we'll first extract the title from the Name column into a new column named `title`, using `aq_pp`.
Feel free to skip ahead to the counting part.

```bash
# 1. aq_pp:  extract the title from the Name column into a new "title" column
# 2. aq_cnt: display the unique titles in each Pclass group
# 3. aq_ord: sort the result by Pclass
aq_pp -f,+1 $file -d $colSpec -mapf,pcre name "(M(rs?|is{2}|a(s|j).{1,2}r))" -mapc s:title "%%1%%" -c Pclass title | \
aq_cnt -f,+1 - -d i:Pclass s:title -g Pclass -kx - title_by_class title | \
aq_ord -f,+1 - -d i:Pclass s:title -sort Pclass
```

```
"Pclass","title"
1,"Major"
1,"Master"
1,
1,"Miss"
1,"Mr"
1,"Mrs"
2,
2,"Master"
2,"Miss"
2,"Mr"
2,"Mrs"
3,"Mrs"
3,"Master"
3,"Miss"
3,"Mr"
```


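Classes 2 and 3 share the same four titles (Master, Miss, Mr, Mrs), while class 1 also has Major. The empty `title` values in classes 1 and 2 are presumably passengers whose title the regex does not cover.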
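The `aq_pp` step uses a PCRE pattern, and the same pattern also works with Python's `re` module. The sketch below applies it to the nine `head` rows and then collects the distinct titles per class, roughly what the `-g Pclass -kx` part of the pipeline does. It is an illustration under assumptions: returning an empty string when nothing matches mirrors the empty `title` values in the output, but I have not checked how `aq_pp` handles non-matching rows internally. `extract_title` and `titles_by_class` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Same pattern as the aq_pp -mapf,pcre step; it matches Mr, Mrs, Miss, Master and Major.
TITLE_RE = re.compile(r"(M(rs?|is{2}|a(s|j).{1,2}r))")


def extract_title(name):
    """First title found in name, or "" when nothing matches (assumed behavior)."""
    match = TITLE_RE.search(name)
    return match.group(1) if match else ""


def titles_by_class(rows):
    """Distinct titles per Pclass, in order of first appearance."""
    groups = defaultdict(dict)
    for pclass, name in rows:
        groups[pclass][extract_title(name)] = None  # dict keys used as an ordered set
    return {pclass: list(groups[pclass]) for pclass in sorted(groups)}


# (Pclass, Name) of the nine sample rows shown by `head data/titanic.csv`
ROWS = [
    (3, "Mr. Owen Harris Braund"),
    (1, "Mrs. John Bradley (Florence Briggs Thayer) Cumings"),
    (3, "Miss. Laina Heikkinen"),
    (1, "Mrs. Jacques Heath (Lily May Peel) Futrelle"),
    (3, "Mr. William Henry Allen"),
    (3, "Mr. James Moran"),
    (1, "Mr. Timothy J McCarthy"),
    (3, "Master. Gosta Leonard Palsson"),
    (3, "Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson"),
]

print(titles_by_class(ROWS))
# {1: ['Mrs', 'Mr'], 3: ['Mr', 'Miss', 'Master', 'Mrs']}
```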

### Using Groupby with -kX
For instance, let's count the number of survivors within each passenger class (`Pclass`): `-g` groups the rows by Pclass, then `-kX` displays the frequency of each unique value in the Survived column (0s and 1s).

```bash
aq_cnt -f,+1 $file -d $colSpec -g Pclass -kX - survivor_by_class Survived
```

```
"Pclass","Survived","count"
2,0,97
2,1,87
1,0,80
1,1,136
3,1,119
3,0,368
```


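The output format is

`GroupByCol(Pclass), KeyCol(Survived), Count`

Note that the rows come out unsorted; pipe the result through `aq_ord`, as in the earlier examples, if you need them in order.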
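In pandas terms this is close to a per-group `value_counts()`. Here is a standard-library sketch of the same idea on the nine `head` rows, printing rows in the `GroupByCol, KeyCol, Count` layout (sorted here, unlike the aq_cnt output). `value_counts_by_group` is a hypothetical helper, not part of AQ Tools.

```python
from collections import Counter, defaultdict

# (Pclass, Survived) of the nine sample rows shown by `head data/titanic.csv`
ROWS = [(3, 0), (1, 1), (3, 1), (1, 1), (3, 0), (3, 0), (1, 0), (3, 0), (3, 1)]


def value_counts_by_group(rows):
    """For each group, count how often each key value occurs."""
    groups = defaultdict(Counter)
    for pclass, survived in rows:
        groups[pclass][survived] += 1
    return groups


counts = value_counts_by_group(ROWS)
for pclass in sorted(counts):
    for survived, n in sorted(counts[pclass].items()):
        print(f"{pclass},{survived},{n}")
# 1,0,1
# 1,1,2
# 3,0,4
# 3,1,2
```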

### Multiple Groupby
We can also specify multiple columns as groups.

Let's look at survivor counts grouped by both Pclass and Sex, so the group-by columns will be Pclass and Sex.

```bash
# group by Pclass and Sex, count each Survived value, then sort the results for readability
aq_cnt -f,+1 $file -d $colSpec -g Pclass Sex -kX - survivor_by_class_sex survived | \
aq_ord -f,+1 - -d i:Pclass s:Sex i:survived i:count -sort Pclass Sex Survived
```

```
"Pclass","Sex","survived","count"
1,"female",0,3
1,"female",1,91
1,"male",0,77
1,"male",1,45
2,"female",0,6
2,"female",1,70
2,"male",0,91
2,"male",1,17
3,"female",0,72
3,"female",1,72
3,"male",0,296
3,"male",1,47
```


You can see the grouping structure Pclass > Sex > Survived. <br>
What happens if we'd like to categorize by Sex first, then by Pclass?

```bash
# group by Sex first, then Pclass, count each Survived value, then sort the results for readability
aq_cnt -f,+1 $file -d $colSpec -g Sex Pclass -kX - survivor_by_class_sex survived | \
aq_ord -f,+1 - -d s:Sex i:Pclass i:survived i:count -sort Sex Pclass Survived
```

```
"Sex","Pclass","survived","count"
"female",1,0,3
"female",1,1,91
"female",2,0,6
"female",2,1,70
"female",3,0,72
"female",3,1,72
"male",1,0,77
"male",1,1,45
"male",2,0,91
"male",2,1,17
"male",3,0,296
"male",3,1,47
```


Reversing the column names passed to the `-g` option did the job. However, a closer comparison of the two outputs shows that they contain the same data, only with the rows and columns in a different order.
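A quick way to convince yourself that the two orderings carry identical counts: count by `(Pclass, Sex, Survived)` and by `(Sex, Pclass, Survived)`, then swap the first two key fields back and compare. The sketch uses the nine `head` rows; it illustrates the observation above rather than anything aq_cnt-specific.

```python
from collections import Counter

# (Pclass, Sex, Survived) of the nine sample rows shown by `head data/titanic.csv`
ROWS = [
    (3, "male", 0), (1, "female", 1), (3, "female", 1),
    (1, "female", 1), (3, "male", 0), (3, "male", 0),
    (1, "male", 0), (3, "male", 0), (3, "female", 1),
]

by_class_sex = Counter(ROWS)  # keys: (Pclass, Sex, Survived)
by_sex_class = Counter((sex, pclass, survived) for pclass, sex, survived in ROWS)

# Swapping the first two key fields back shows both tables hold identical counts.
swapped = {(pclass, sex, survived): n for (sex, pclass, survived), n in by_sex_class.items()}
print(swapped == dict(by_class_sex))  # True

for key, n in sorted(by_sex_class.items()):
    print(*key, n, sep=",")
# female,1,1,2
# female,3,1,2
# male,1,0,1
# male,3,0,4
```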


### Wait, can't we do this just by passing multiple column names to `-kX`?

Let's see if we can get the same result as the last example by passing Sex, Pclass and Survived directly to `-kX`, without `-g`.

```bash
# no -g: count each (Sex, Pclass, survived) combination directly, then sort the results
aq_cnt -f,+1 $file -d $colSpec -kX - survivor_by_class_sex Sex Pclass survived | \
aq_ord -f,+1 - -d s:Sex i:Pclass i:survived i:count -sort Sex Pclass Survived
```

```
"Sex","Pclass","survived","count"
"female",1,0,3
"female",1,1,91
"female",2,0,6
"female",2,1,70
"female",3,0,72
"female",3,1,72
"male",1,0,77
"male",1,1,45
"male",2,0,91
"male",2,1,17
"male",3,0,296
"male",3,1,47
```
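Why both forms can agree, in plain Python terms: counting `Survived` inside each `(Sex, Pclass)` group yields the same table as counting every full `(Sex, Pclass, Survived)` combination at once. This sketch on the nine `head` rows shows that equivalence only; whether aq_cnt computes either form this way internally is not something this notebook shows. Both helper names are hypothetical.

```python
from collections import Counter, defaultdict

# (Sex, Pclass, Survived) of the nine sample rows shown by `head data/titanic.csv`
ROWS = [
    ("male", 3, 0), ("female", 1, 1), ("female", 3, 1),
    ("female", 1, 1), ("male", 3, 0), ("male", 3, 0),
    ("male", 1, 0), ("male", 3, 0), ("female", 3, 1),
]


def grouped_then_counted(rows):
    """Like -g Sex Pclass -kX ... Survived: count Survived within each group."""
    groups = defaultdict(Counter)
    for sex, pclass, survived in rows:
        groups[(sex, pclass)][survived] += 1
    # flatten back to (Sex, Pclass, Survived) -> count
    return {
        (sex, pclass, survived): n
        for (sex, pclass), counter in groups.items()
        for survived, n in counter.items()
    }


def counted_by_composite_key(rows):
    """Like -kX ... Sex Pclass Survived: count each full combination."""
    return dict(Counter(rows))


print(grouped_then_counted(ROWS) == counted_by_composite_key(ROWS))  # True
```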


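The output is identical to the previous example, so as far as I know, you can. There may still be things that can only be done through the `-g` option, though.

### Todo
Look into cases that require the `-g` option; this section will be updated in the future.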