# aq_pp command -eval

In this notebook, we'll go over common usage examples of the data preprocessing command `aq_pp`, focusing primarily on the `-eval` option.

## Objective

The objective of this notebook is to familiarize new users of `aq_tool` with the `-eval` option. By the end, you should understand its basic syntax and usage, and be able to perform basic operations comfortably. 
Advanced applications, such as combining `-eval` with other options, will be covered in the future. 

Before going through this notebook, make sure you're familiar with the following.

* Bash commands
* Regular Expression
* aq_input / input-spec 

We won't go over input, column and output specs in this notebook. They are covered in 
- [aq_input notebook](aq_input.ipynb)
- [aq_output notebook](aq_output.ipynb)
 

Also keep the [aq_pp documentation](http://auriq.com/documentation/source/reference/manpages/aq_pp.html) at hand, so you can refer to the details of each option as needed.

**Contents**

**Basic Options**

- Arithmetic
  - Numerical Operation
  - String Operation
  - Bitwise Operation
- Builtin Variables
  - RowNum
  - Random
- Data Conversion (only simple examples are covered for now)

**Advanced**

- Builtin Functions
- Multiple `-eval` options



## Overview

The `-eval` option of the `aq_pp` command is responsible for data manipulation and column creation. Given an expression and a destination column name, it _evaluates_ the expression and stores the result in the column. More details on the option are available in the [eval section of the aq_pp documentation](http://auriq.com/documentation/source/reference/manpages/aq_pp.html#eval).

### Syntax

```bash
aq_pp ... -eval ColSpec|ColName Expr
```
where
- `ColSpec`: new column's column spec to assign the result
- `ColName`: existing column name to assign the result
- `Expr`: expression to be evaluated.

## Data

The [Ramen Ratings Dataset](https://www.kaggle.com/residentmario/ramen-ratings) from Kaggle will be used in this sample. It contains ratings of 2500 ramen products. 

Review|Brand|Variety|Style|Country|Stars
---|---|---|---|---|---|
2580|New Touch|T's Restaurant Tantanmen|Cup|Japan|3.75
2579|Just Way|Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles|Pack|Taiwan|1
2578|Nissin|Cup Noodles Chicken Vegetable|Cup|USA|2.25
2577|Wei Lih|GGE Ramen Snack Tomato Flavor|Pack|Taiwan|2.75
2576|Ching's Secret|Singapore Curry|Pack|India|3.75
2575|Samyang Foods|Kimchi song Song Ramen|Pack|South Korea|4.75
2574|Acecook|Spice Deli Tantan Men With Cilantro|Cup|Japan|4
2573|Ikeda Shoku|Nabeyaki Kitsune Udon|Tray|Japan|3.75
2572|Ripe'n'Dry|Hokkaido Soy Sauce Ramen|Pack|Japan|0.25
2571|KOKA|The Original Spicy Stir-Fried Noodles|Pack|Singapore|2.5

The columns and corresponding data types of the dataset are as follows.
- `int: Review #`: review ID number; the more recent the review, the larger the number
- `str: Brand`: brand / manufacturer of the product
- `str: Variety`: title of the product
- `str: Style`: style category of the product: cup, pack or tray
- `str: Country`: country of origin
- `float: Stars`: star rating of the product

## Input and Column Specification

Here are the corresponding column specs for the data:<br>
`i:reviewID s:brand s:variety s:style s:country f:stars`

**Note**<br>
When reading in files with `aq_pp`, we'll use bash's [variable substitution](http://www.compciv.org/topics/bash/variables-and-substitution/) to keep the commands short and clean. For instance, 
```bash
# assign file name & path to variable 'file'
file='data/aq_pp/fileName.csv'
```
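
Variables also work for multi-word values such as the column spec, because the shell splits an unquoted variable into separate arguments. The sketch below illustrates that behavior in plain shell; the helper `count_args` is hypothetical and only counts what a command would receive, it is not part of `aq_pp`.

```bash
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

cols="i:reviewID s:brand s:variety s:style s:country f:stars"

# Unquoted: the shell splits the value on whitespace into six arguments,
# one per column spec.
count_args $cols      # prints 6

# Quoted: the whole value is passed as a single argument.
count_args "$cols"    # prints 1
```

This is why the commands in this notebook write `$cols` without quotes: each column spec should reach the command as its own argument.
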

Now we are all set. Let's get started with numerical operations.

## Arithmetic

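The expression given to `-eval` can combine column names and constants with the arithmetic, string and bitwise operators covered below.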

### Numerical Operation

The operators supported for numerical operations are:<br>

_Arithmetic_
- `*`: multiplication
- `/`: division
- `%`: modulus
- `+`: addition
- `-`: subtraction

_Bitwise_
- `&`: AND
- `|`: OR
- `^`: XOR

First, we will double the value of star rating column, and assign it to a new column named `double_rating`. 

In [1]:
# First store filename and column spec in variable to simplify commands
file="data/aq_pp/ramen-ratings-part.csv"
cols="i:reviewID s:brand s:variety s:style s:country f:stars"
# now create a column called double_rating, and assign the value of 2 * stars
aq_pp -f,+1 $file -d $cols -eval f:double_rating '2*stars'

"reviewID","brand","variety","style","country","stars","double_rating"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,7.5
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,2
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,4.5
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,5.5
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,7.5
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,9.5
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,8
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,7.5
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,0.5


Now the new column `double_rating` contains the value twice as large as the `stars` value.

**A couple of things to note**<br>
- **Column datatype:** the destination column's datatype has to be the same as the datatype of the result of `Expr`. In the example above the result is a float, so we declared `double_rating` as a float.
- **Quotations:** `ColName|ColSpec` does not need quoting, while `Expr` should be quoted so that the shell passes it to `aq_pp` as a single argument and leaves characters such as `*`, `|` and `$` alone. Single quotes are recommended, because string constants inside `Expr` are written in double quotes.
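
The quoting advice is a property of the shell rather than of `aq_pp`. The sketch below shows, in plain shell only, how single and double quotes treat an expression that contains `$` (such as the builtin variables used later in this notebook) or a double-quoted string constant. The variable names `single`, `double` and `expr` are chosen for illustration, and the shell variable `RowNum` is set to an empty value here on purpose.

```bash
# An empty shell variable, standing in for "the shell knows no value for it".
RowNum=""

# Single quotes: the text is kept exactly as written.
single='$RowNum + 1'
expr='brand+" - "+country'

# Double quotes: the shell substitutes $RowNum before the command sees it.
double="$RowNum + 1"

printf '[%s]\n' "$single"   # prints [$RowNum + 1]
printf '[%s]\n' "$expr"     # prints [brand+" - "+country]
printf '[%s]\n' "$double"   # prints [ + 1]
```

With double quotes the variable reference is already gone by the time the command receives its arguments, which is why the examples here use single quotes around `Expr`.
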

In the example below, we'll perform the same operation, but assign the result to the existing column `stars` instead of creating a new column.

In [2]:
aq_pp -f,+1 $file -d $cols -eval stars '2*stars'

"reviewID","brand","variety","style","country","stars"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",7.5
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",2
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",4.5
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",5.5
2576,"Ching's Secret","Singapore Curry","Pack","India",7.5
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",9.5
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",8
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",7.5
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.5


You can apply any of the other arithmetic operators in the same way. 

In the above example, `Expr` contained only an existing column and a constant. We can also use multiple column names in `Expr` and calculate with them.

We'll divide `reviewID` (int) by `stars` (float), and store the result in a new column `div` (float).

In [3]:
aq_pp -f,+1 $file -d $cols -eval f:div 'reviewID/stars'

"reviewID","brand","variety","style","country","stars","div"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,688
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,2579
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,1145.7777777777778
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,937.09090909090912
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,686.93333333333328
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,542.10526315789468
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,643.5
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,686.13333333333333
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,10288


### String Operation 
**+ operator with string**<br>
Besides numeric addition, the `+` operator can also be used to concatenate string values. As an example, we'll create a string column `s:info` and store the combined strings of `brand` and `country`, separated by ` - `. 

Note that `+` is the only operator that supports string manipulation.

In [4]:
aq_pp -f,+1 $file -d $cols -eval s:info 'brand+" - "+country'

"reviewID","brand","variety","style","country","stars","info"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,"New Touch - Japan"
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,"Just Way - Taiwan"
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,"Nissin - USA"
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,"Wei Lih - Taiwan"
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,"Ching's Secret - India"
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,"Samyang Foods - South Korea"
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,"Acecook - Japan"
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,"Ikeda Shoku - Japan"
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,"Ripe'n'Dry - Japan"


`brand` and `country` are column names while ` - ` is a string constant, which is why it is double quoted. 

More complex string manipulations are possible with `aq_pp` by using the `-map` option and/or [`builtin functions / aq-emod`](http://auriq.com/documentation/source/reference/manpages/aq-emod.html), which will be covered in another notebook.

### Bitwise Operation

Let's take a look at the bitwise operators, which perform [bitwise logical operations](https://en.wikipedia.org/wiki/Bitwise_operation) on numbers given in decimal notation.

`aq_pp` supports the operators below.


- `&`: AND
- `|`: OR
- `^`: XOR

To demonstrate the result of bitwise operations clearly, we'll use a different dataset containing decimal numbers, which looks like this.

number|mask
---|---
1|981
290|90
31|12
79|56
10|874

Let's apply the `|` (bitwise OR) operator to the `number` column with the constant 32. The result will be stored in a new column, `i:result`.

In [5]:
aq_pp -f,+1 data/aq_pp/bitwise.csv -d i:number i:mask -eval i:result 'number | 32'

"number","mask","result"
1,981,33
290,90,290
31,12,63
79,56,111
10,874,42


**Note**: <br>
`aq_pp` interprets numbers as decimal by default, so the operands are read as decimal numbers and the result is output in decimal as well. 
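
The results above can be cross-checked with ordinary shell arithmetic, which uses the same `|` operator for bitwise OR. This is only a sketch in plain shell, not an `aq_pp` feature, and the helper name `or_with_32` is hypothetical.

```bash
# Hypothetical helper: bitwise OR of an integer with the constant 32.
or_with_32() {
    echo $(( $1 | 32 ))
}

# The values of the `number` column from the sample data.
for n in 1 290 31 79 10; do
    printf '%s | 32 = %s\n' "$n" "$(or_with_32 "$n")"
done
```

32 is `100000` in binary, so the operation sets that single bit. 290 already has the bit set and stays unchanged, while every other value grows by 32.
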

## Builtin Variables
`aq_pp` provides [builtin variables](http://auriq.com/documentation/source/reference/manpages/aq_pp.html#eval) that can be used in place of values in an expression. There are several of them; here we'll take a look at `$RowNum` and `$Random`.

**`RowNum`**<br>
Represents the row number of the record, starting at 1.

In the example below, we'll create a new integer column `row` and store the row number in it.

In [6]:
aq_pp -f,+1 $file -d $cols -eval i:row '$RowNum'

"reviewID","brand","variety","style","country","stars","row"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,1
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,2
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,3
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,4
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,5
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,6
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,7
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,8
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,9


Since we are skipping the header row with the `-f,+1` option, we'll correct the row numbers by adding 1 to each of them (counting the header as row 1).

In [7]:
aq_pp -f,+1 $file -d $cols -eval i:row '$RowNum +1'

"reviewID","brand","variety","style","country","stars","row"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,2
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,3
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,4
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,5
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,6
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,7
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,8
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,9
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,10


**`Random`**<br>
Represents a positive random number; the value changes every time the variable is referenced.

In this example, we will use `$Random` to generate a random integer for every row, and store it in an integer column named `random`.

In [8]:
aq_pp -f,+1 $file -d $cols -eval i:random '$Random'

"reviewID","brand","variety","style","country","stars","random"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,476707713
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,1186278907
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,505671508
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,2137716191
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,936145377
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,1215825599
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,589265238
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,924859463
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,1182112391


This outputs very large positive integers. Sometimes we need random numbers within a certain range, say between 0 and 9. The modulus operator does this: `$Random % 10` keeps only the remainder after division by 10.

In [9]:
aq_pp -f,+1 $file -d $cols -eval i:row '$Random%10'

"reviewID","brand","variety","style","country","stars","row"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,3
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,7
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,8
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,1
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,7
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,9
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,8
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,3
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,1


Besides modulus, you can build more complex numerical expressions with builtin variables. 
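
The modulus approach generalizes to any inclusive range: take the remainder by the size of the range and add the lower bound. The following is a sketch in plain shell arithmetic, assuming a non-negative integer input; the helper `to_range` is hypothetical and is not an `aq_pp` function.

```bash
# Hypothetical helper: map a non-negative integer n into lo..hi (inclusive).
to_range() {
    n=$1 lo=$2 hi=$3
    echo $(( lo + n % (hi - lo + 1) ))
}

# 476707713 is the first random value from the earlier output.
to_range 476707713 0 9     # prints 3  (476707713 % 10)
to_range 476707713 1 10    # prints 4  (1 + 476707713 % 10)
to_range 476707713 0 10    # prints 9  (476707713 % 11)
```

Note that `% 10` yields values from 0 to 9; covering 0 to 10 inclusive requires `% 11`.
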

## Data Conversion

You can take advantage of the powerful [builtin functions / aq-emod](http://auriq.com/documentation/source/reference/manpages/aq-emod.html) for more complex data processing than combinations of plain `-eval` operators allow. 

While a variety of functions is available, in this section we'll take a look at the ones for data type conversion, specifically `ToI()` and `ToF()`. 

As a first step, we'll set every column's data type to string in the column spec.

In [10]:
cols="s:reviewID s:brand s:variety s:style s:country s:stars"
# read every column as a string
aq_pp -f,+1 $file -d $cols 

"reviewID","brand","variety","style","country","stars"
"2580","New Touch","T's Restaurant Tantanmen ","Cup","Japan","3.75"
"2579","Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan","1"
"2578","Nissin","Cup Noodles Chicken Vegetable","Cup","USA","2.25"
"2577","Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan","2.75"
"2576","Ching's Secret","Singapore Curry","Pack","India","3.75"
"2575","Samyang Foods","Kimchi song Song Ramen","Pack","South Korea","4.75"
"2574","Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan","4"
"2573","Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan","3.75"
"2572","Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan","0.25"


Notice that the `reviewID` and `stars` values are quoted, showing that `aq_pp` is interpreting them as strings. Let's convert them into appropriate data types with the builtin functions 
- `ToF(Val)`: convert `Val` to float
- `ToI(Val)`: convert `Val` to integer

where `Val` can be a constant or a column name of string / numeric data type.

In [11]:
aq_pp -f,+1 $file -d $cols -eval i:int_reviewID 'ToI(reviewID)'

"reviewID","brand","variety","style","country","stars","int_reviewID"
"2580","New Touch","T's Restaurant Tantanmen ","Cup","Japan","3.75",2580
"2579","Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan","1",2579
"2578","Nissin","Cup Noodles Chicken Vegetable","Cup","USA","2.25",2578
"2577","Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan","2.75",2577
"2576","Ching's Secret","Singapore Curry","Pack","India","3.75",2576
"2575","Samyang Foods","Kimchi song Song Ramen","Pack","South Korea","4.75",2575
"2574","Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan","4",2574
"2573","Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan","3.75",2573
"2572","Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan","0.25",2572


We provided a column name as `Val` in the example above, but we can also provide a string constant. Note that you should always quote string values in the `Expr` of the `-eval` option. 

In [12]:
aq_pp -f,+1 $file -d $cols -eval i:int_reviewID 'ToI("13")'

"reviewID","brand","variety","style","country","stars","int_reviewID"
"2580","New Touch","T's Restaurant Tantanmen ","Cup","Japan","3.75",13
"2579","Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan","1",13
"2578","Nissin","Cup Noodles Chicken Vegetable","Cup","USA","2.25",13
"2577","Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan","2.75",13
"2576","Ching's Secret","Singapore Curry","Pack","India","3.75",13
"2575","Samyang Foods","Kimchi song Song Ramen","Pack","South Korea","4.75",13
"2574","Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan","4",13
"2573","Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan","3.75",13
"2572","Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan","0.25",13


Builtin functions can also be combined with arithmetic expressions. Let's convert `"13"` (a string constant) and `reviewID` into integers, add them together, and store the result in the `i:result` column this time.

In [13]:
aq_pp -f,+1 $file -d $cols -eval i:result 'ToI(reviewID) + ToI("13")'

"reviewID","brand","variety","style","country","stars","result"
"2580","New Touch","T's Restaurant Tantanmen ","Cup","Japan","3.75",2593
"2579","Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan","1",2592
"2578","Nissin","Cup Noodles Chicken Vegetable","Cup","USA","2.25",2591
"2577","Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan","2.75",2590
"2576","Ching's Secret","Singapore Curry","Pack","India","3.75",2589
"2575","Samyang Foods","Kimchi song Song Ramen","Pack","South Korea","4.75",2588
"2574","Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan","4",2587
"2573","Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan","3.75",2586
"2572","Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan","0.25",2585


In the `result` column, 13 has been added to the original `reviewID` value. 

We can also combine numeric strings with `+`, convert the result to a numeric data type, and store it in a numeric column. Let's take a look.
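
The distinction matters because adding two converted numbers and joining two numeric strings give very different values. The sketch below shows the same contrast in plain shell, as an analogy only; the variable names are illustrative and nothing here calls `aq_pp`.

```bash
id="2580"
suffix="13"

# Numeric addition: both values are treated as integers.
sum=$(( id + suffix ))
echo "$sum"        # prints 2593

# String concatenation: the characters are joined into one longer number.
joined="${id}${suffix}"
echo "$joined"     # prints 258013
```

The first form corresponds to converting each operand before applying `+`, the second to applying `+` to the strings and converting afterwards.
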



In [14]:
aq_pp -f,+1 $file -d $cols -eval i:result 'ToI(reviewID + "13")'

"reviewID","brand","variety","style","country","stars","result"
"2580","New Touch","T's Restaurant Tantanmen ","Cup","Japan","3.75",258013
"2579","Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan","1",257913
"2578","Nissin","Cup Noodles Chicken Vegetable","Cup","USA","2.25",257813
"2577","Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan","2.75",257713
"2576","Ching's Secret","Singapore Curry","Pack","India","3.75",257613
"2575","Samyang Foods","Kimchi song Song Ramen","Pack","South Korea","4.75",257513
"2574","Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan","4",257413
"2573","Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan","3.75",257313
"2572","Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan","0.25",257213


Phew! That was a lot of examples, but now we know how to perform fundamental operations and manipulate data with the `-eval` option. 
Let's take a look at some advanced examples.

## Advanced Examples

### Builtin Functions
Builtin functions (a.k.a. aq-emod) can perform more complicated processing than the operators of `aq_pp` alone. 
The following types of functions are available:

- [String property functions](#string_property)
- [Math functions](#math)
- [Comparison functions](#comparison)
- [Data extraction and encode/decode functions](#extract_code)
- [General data conversion functions](#conversion)
- [Date/Time conversion functions](#date_time)
- [Character set encoding conversion functions](#character_encoding)
- [Key hashing functions](#key_hashing)
- [Speciality functions](#speciality)
- [RTmetrics functions](#rtmetrics)
- [Udb specific functions](#udb)

Before we get started on the functions, we will redefine each column's data type by modifying the column spec.
`reviewID` will be an integer and `stars` a float, while the rest remain strings.

In [15]:
cols="i:reviewID s:brand s:variety s:style s:country f:stars"
aq_pp -f,+1 $file -d $cols

"reviewID","brand","variety","style","country","stars"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25


<a id='string_property'></a>
#### String Property Functions


- `SHash(Val)`: Returns the numeric hash value of a string.<br>
Val can be a string column’s name, a string constant, or an expression that evaluates to a string.

    In this example, we'll hash the `style` column, whose values are Cup, Pack or Tray, and store the result in the `style_hash` column.

In [16]:
aq_pp -f,+1 $file -d $cols -eval 'i:style_hash' 'SHash(style)'

"reviewID","brand","variety","style","country","stars","style_hash"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,193488781
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,2090607556
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,193488781
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,2090607556
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,2090607556
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,2090607556
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,193488781
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,2090769765
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,2090607556


You can see that identical string values produce identical hash values.

- `SLeng(Val)`: Returns the length of a string.<br>
Val can be a string column’s name, a string constant, or an expression that evaluates to a string.
    
    In this example we'll again provide the `style` column, and the result will be stored in the `style_len` column.

In [17]:
aq_pp -f,+1 $file -d $cols -eval 'i:style_len' 'SLeng(style)'

"reviewID","brand","variety","style","country","stars","style_len"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,3
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,4
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,3
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,4
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,4
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,4
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,3
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,4
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,4


<a id='math'></a>
#### Math Functions

**Basics for math functions**

With a few exceptions, a math function takes a single argument `Val`, which can be a numeric column, a constant, or an expression that evaluates to a number. 
We will go over just a few of them here, plus the functions with irregular syntax. For the full list of math functions, refer to the [aq-emod documentation](http://auriq.com/documentation/source/reference/manpages/aq-emod.html#math-functions).

- `Ceil(Val)`: Rounds Val up to the nearest integral value and returns the result.

    Val can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

The `stars` column, which contains the star rating, will be provided, and the result will be stored in the `ceiling` column.

In [18]:
aq_pp -f,+1 $file -d $cols -eval 'i:ceiling' 'Ceil(stars)'

"reviewID","brand","variety","style","country","stars","ceiling"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,4
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,1
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,3
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,3
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,4
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,5
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,4
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,4
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,1



- `Floor(Val)`: Rounds Val down to the nearest integral value and returns the result.

    Val can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

Similarly to `Ceil()`, we'll use the `stars` column again. Notice the difference in the result compared to the `Ceil()` function.


In [19]:
aq_pp -f,+1 $file -d $cols -eval 'i:floor' 'Floor(stars)'

"reviewID","brand","variety","style","country","stars","floor"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,3
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,1
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,2
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,2
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,3
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,4
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,4
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,3
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,0



- `Round(Val)`: Rounds Val up/down to the nearest integral value and returns the result. Half way cases are rounded away from zero.

    Val can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

Given the `stars` column, the result will be rounded to the nearest integer.

In [20]:
aq_pp -f,+1 $file -d $cols -eval 'i:round' 'Round(stars)' 

"reviewID","brand","variety","style","country","stars","round"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,4
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,1
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,2
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,3
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,4
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,5
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,4
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,4
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,0
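
The rule "halfway cases are rounded away from zero" can be tried out independently of `aq_pp`. The sketch below re-implements that rule with `awk`, assuming `awk` is available on the system; the helper `round_half_away` is hypothetical and is not how `Round()` itself is implemented.

```bash
# Hypothetical helper: round to the nearest integer, halves away from zero.
round_half_away() {
    awk -v x="$1" 'BEGIN {
        # int() truncates toward zero, so shift by 0.5 in the direction of x.
        if (x >= 0) r = int(x + 0.5); else r = 0 - int(-x + 0.5)
        printf "%d\n", r
    }'
}

# Star ratings from the sample data, plus two halfway cases.
for v in 3.75 2.25 2.75 0.25 2.5 -2.5; do
    printf '%s -> %s\n' "$v" "$(round_half_away "$v")"
done
```

The first four values reproduce the `round` column above (4, 2, 3, 0), and the halfway cases 2.5 and -2.5 become 3 and -3 under this rule.
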


- `Sqrt(Val)`: Computes the square root of Val.

    Val can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

In this example, we will provide a constant as the argument for clarity.

In [21]:
aq_pp -f,+1 $file -d $cols -eval 'i:squared' 'Sqrt(9)' 

"reviewID","brand","variety","style","country","stars","squared"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,3
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,3
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,3
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,3
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,3
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,3
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,3
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,3
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,3


**Math functions with irregular syntax**

These functions require multiple values as their arguments to return the result.

- `Min(Val1, Val2 [, Val3 ...])`: Returns the smallest value among Val1, Val2 and so on.

    Each Val can be a numeric column’s name, a number, or an expression that evaluates to a number.
    If all values are integers, the result will also be an integer.
    If any value is a floating point number, the result will be a floating point number.

We'll provide a constant as well as the `stars` column to compare, and store the result in a column called `smaller`.

In [22]:
aq_pp -f,+1 $file -d $cols -eval 'f:smaller' 'Min(3, stars)' 

"reviewID","brand","variety","style","country","stars","smaller"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,3
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,1
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,2.25
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,2.75
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,3
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,3
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,3
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,3
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,0.25


- `Pow(Val, Power)`: Computes Val raised to the power of Power.

    Val and Power can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

In this example, we'll calculate the 8th power of 2, meaning `Val = 2` and `Power = 8`, and store the result in an integer column called `byte`. 

In [23]:
aq_pp -f,+1 $file -d $cols -eval 'i:byte' 'Pow(2, 8)'

"reviewID","brand","variety","style","country","stars","byte"
2580,"New Touch","T's Restaurant Tantanmen ","Cup","Japan",3.75,256
2579,"Just Way","Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles","Pack","Taiwan",1,256
2578,"Nissin","Cup Noodles Chicken Vegetable","Cup","USA",2.25,256
2577,"Wei Lih","GGE Ramen Snack Tomato Flavor","Pack","Taiwan",2.75,256
2576,"Ching's Secret","Singapore Curry","Pack","India",3.75,256
2575,"Samyang Foods","Kimchi song Song Ramen","Pack","South Korea",4.75,256
2574,"Acecook","Spice Deli Tantan Men With Cilantro","Cup","Japan",4,256
2573,"Ikeda Shoku","Nabeyaki Kitsune Udon","Tray","Japan",3.75,256
2572,"Ripe'n'Dry","Hokkaido Soy Sauce Ramen","Pack","Japan",0.25,256


- `IsInf(Val)`: Tests if Val is infinite.

    Returns 1, -1 or 0 if the value is positive infinity, negative infinity or finite respectively.
    Val can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.

This one is a little more interesting. To produce negative infinity, we'll provide the expression `-1.0/0` and check that the function returns -1.

**Note:** 
* To get positive / negative infinity, the expression needs to be evaluated as a float (e.g. `1.0/0` instead of `1/0`).
* The column receiving the result needs to be a signed integer type, either `is` or `ls`, so that negative values are displayed correctly.

In [24]:
aq_pp -f,+1 $file -d $cols -eval 'is:IsInf' 'IsInf(-1.0/0)' -c IsInf

"IsInf"
-1
-1
-1
-1
-1
-1
-1
-1
-1


<a id='comparison'></a>
### Comparison Functions

Most of the comparison functions compare one or more string constants, patterns, or regular expressions against all or part of a given string or string column. They return 1 if there is a match and 0 otherwise. 

Let's start with the functions that compare the beginning and the end of a string with given strings.



- `BegCmp(Val, BegStr [, BegStr ...])`: Checks whether the string `Val` starts exactly with `BegStr`. 

    * Returns 1 if there is a match, 0 otherwise.
    * `Val` can be a string column’s name, a string constant, or an expression that evaluates to a string.
    * Each `BegStr` is a string constant that specifies the starting string to match.

Let's use the `style` column again to demonstrate this function. I will give "P" as the string to match, so this should return 1 whenever the content of the `style` column starts with "P".

In [25]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'BegCmp(style, "P")' -c style beginWith

"style","beginWith"
"Cup",0
"Pack",1
"Cup",0
"Pack",1
"Pack",1
"Pack",1
"Cup",0
"Tray",0
"Pack",1


We can also provide multiple `BegStr` values to match strings that start with any of them. This time it returns 1 for strings that start with either "P" or "Tr". 

In [26]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'BegCmp(style, "P", "Tr")' -c style beginWith

"style","beginWith"
"Cup",0
"Pack",1
"Cup",0
"Pack",1
"Pack",1
"Pack",1
"Cup",0
"Tray",1
"Pack",1


- `EndCmp(Val, EndStr [, EndStr ...])`: Compares one or more ending string EndStr with the tail of Val. All the comparisons are case sensitive.

    * Returns 1 if there is a match, 0 otherwise.
    * Val can be a string column’s name, a string constant, or an expression that evaluates to a string.
    * Each EndStr is a string constant that specifies the ending string to match.

This function is the same as `BegCmp` above, except that it compares the ending of `Val`. 
Let's see it in action with the `style` column. I will provide 2 `EndStr` values this time as well.

In [27]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'EndCmp(style, "ck", "up")' -c style beginWith

"style","beginWith"
"Cup",1
"Pack",1
"Cup",1
"Pack",1
"Pack",1
"Pack",1
"Cup",1
"Tray",0
"Pack",1


1 is returned for `Cup` and `Pack`, each of which ends with one of the given strings.
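
The two functions can be modelled in a few lines of Python. This is only a sketch of the semantics described above, not how `aq_pp` implements them; `beg_cmp` and `end_cmp` are hypothetical names.

```python
def beg_cmp(val: str, *beg_strs: str) -> int:
    """Return 1 if val starts with any of beg_strs (case sensitive), else 0."""
    return int(val.startswith(beg_strs))


def end_cmp(val: str, *end_strs: str) -> int:
    """Return 1 if val ends with any of end_strs (case sensitive), else 0."""
    return int(val.endswith(end_strs))


styles = ["Cup", "Pack", "Tray"]
print([beg_cmp(s, "P", "Tr") for s in styles])   # [0, 1, 1]
print([end_cmp(s, "ck", "up") for s in styles])  # [1, 1, 0]
```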

- `SubCmp(Val, SubStr [, SubStr ...])`: Compares one or more substrings `SubStr` with any part of `Val`. All the comparisons are case sensitive.

    * Returns 1 if there is a match, 0 otherwise.
    * Val can be a string column’s name, a string constant, or an expression that evaluates to a string.
    * Each SubStr is a string constant that specifies the substring to match.

I will provide "Noodle" as `SubStr` for the `variety` column to detect the ramen names that contain "Noodle" (case sensitive).

In [28]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'SubCmp(variety, "Noodle")' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",1
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",0
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0


Next, I will provide 2 strings, "Noodle" and "Spic" (to match both "Spicy" and "Spice"), to find the names that contain **EITHER** of them.



In [29]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'SubCmp(variety, "Noodle", "Spic")' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",1
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",1
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0


- `SubCmpAll(Val, SubStr [, SubStr ...])`: Same as `SubCmp()`, except that when multiple `SubStr` are provided, it returns 1 only if `Val` contains every one of them.

Similarly to the `SubCmp()` example above, we'll provide "Noodle" and "Spic" to be compared with the `variety` column. This time, though, it returns 1 only if `variety` contains **BOTH** "Noodle" and "Spic".

In [30]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'SubCmpAll(variety, "Noodle", "Spic")' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",0
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",0
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0

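The difference between the two functions is the difference between "any" and "all". The Python sketch below models that, using a few `variety` values from the output above; `sub_cmp` and `sub_cmp_all` are hypothetical names, not `aq_pp` functions.

```python
def sub_cmp(val: str, *sub_strs: str) -> int:
    """Return 1 if val contains at least one of sub_strs, else 0."""
    return int(any(s in val for s in sub_strs))


def sub_cmp_all(val: str, *sub_strs: str) -> int:
    """Return 1 only if val contains every one of sub_strs, else 0."""
    return int(all(s in val for s in sub_strs))


varieties = [
    "Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",
    "Cup Noodles Chicken Vegetable",
    "Spice Deli Tantan Men With Cilantro",
    "Singapore Curry",
]
print([sub_cmp(v, "Noodle", "Spic") for v in varieties])      # [1, 1, 1, 0]
print([sub_cmp_all(v, "Noodle", "Spic") for v in varieties])  # [1, 0, 0, 0]
```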

- `MixedCmp(Val, SubStr, Typ [, SubStr, Typ ...])`: You can think of this function as a more versatile version of `BegCmp`, `EndCmp`, and `SubCmp`. Given `Val`, you provide pairs of `SubStr` and `Typ`, where `Typ` is one of:
* `BEG` - Match with the head of Val.
* `END` - Match with the tail of Val.
* `SUB` - Match with any part of Val.

Note that when more than one `SubStr` is provided, this function returns 1 if **ANY** of the provided `SubStr` matches. This will be demonstrated in the **`SUB`** section later.

**`BEG`**<br>
Let's start with `BEG`. We will use the `style` column and give "C" to find the `style` values that begin with "C".

In [31]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'MixedCmp(style, "C", BEG)' -c style beginWith

"style","beginWith"
"Cup",1
"Pack",0
"Cup",1
"Pack",0
"Pack",0
"Pack",0
"Cup",1
"Tray",0
"Pack",0


**`END`**<br>

Next, I will provide "ck" as `SubStr` and `END` as `Typ` to find the records whose style ends with "ck" (the Pack type).

In [32]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'MixedCmp(style, "ck", END)' -c style beginWith

"style","beginWith"
"Cup",0
"Pack",1
"Cup",0
"Pack",1
"Pack",1
"Pack",1
"Cup",0
"Tray",0
"Pack",1


**`SUB`**<br>
Lastly, I will provide "Noodle" and "Spic" to match against the `variety` column as `Val`, to find the records that contain EITHER of these strings.
Since we're looking for a substring match at any position in `variety`, `SUB` will be the `Typ`.

In [33]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'MixedCmp(variety, "Noodle", SUB, "Spic", SUB)' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",1
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",1
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0


- `MixedCmpAll(Val, SubStr, Typ [, SubStr, Typ ...])`: The "match all" counterpart of `MixedCmp()`. Given `Val`, you provide pairs of `SubStr` and `Typ`, where `Typ` is one of:
* `BEG` - Match with the head of Val.
* `END` - Match with the tail of Val.
* `SUB` - Match with any part of Val.

Same as `MixedCmp()` above, except that when more than one `SubStr` is provided, it returns 1 only if **ALL** of them match. Here, I will demonstrate it using **`SUB`** with the `variety` column.
This should return 1 only for the records that contain both "Noodle" and "Spic" in the `variety` column.

In [34]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'MixedCmpAll(variety, "Noodle", SUB, "Spic", SUB)' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",0
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",0
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0

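Because each `SubStr` carries its own `Typ`, the match types can be mixed within one call. The Python sketch below models the two functions under that reading; the names `mixed_cmp`, `mixed_cmp_all` and `_matches` are hypothetical, and `Typ` is passed as a plain string here purely for illustration.

```python
def _matches(val: str, sub_str: str, typ: str) -> bool:
    """Apply one (SubStr, Typ) pair to val."""
    if typ == "BEG":
        return val.startswith(sub_str)
    if typ == "END":
        return val.endswith(sub_str)
    if typ == "SUB":
        return sub_str in val
    raise ValueError(f"unknown Typ: {typ}")


def mixed_cmp(val: str, *pairs: str) -> int:
    """Return 1 if ANY (SubStr, Typ) pair matches val."""
    return int(any(_matches(val, s, t) for s, t in zip(pairs[0::2], pairs[1::2])))


def mixed_cmp_all(val: str, *pairs: str) -> int:
    """Return 1 only if ALL (SubStr, Typ) pairs match val."""
    return int(all(_matches(val, s, t) for s, t in zip(pairs[0::2], pairs[1::2])))


# "Pack" ends with "ck" but does not begin with "C".
print(mixed_cmp("Pack", "C", "BEG", "ck", "END"))      # 1
print(mixed_cmp_all("Pack", "C", "BEG", "ck", "END"))  # 0
# "Cup" begins with "C" and ends with "up".
print(mixed_cmp_all("Cup", "C", "BEG", "up", "END"))   # 1
```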

- `Contain(Val, SubStrs)`: Compares the substrings in SubStrs with any part of Val. All the comparisons are case sensitive.

    * Returns 1 if there is a match, 0 otherwise.
    * Val can be a string column’s name, a string constant, or an expression that evaluates to a string.
    * SubStrs is a string constant that specifies what substrings to match. It is a comma-newline separated list of literal substrings of the form “`SubStr1,[\r]\nSubStr2...`”.

Let's test this function by providing "Noodle" and "Spic" as `SubStrs`, and `variety` as `Val`.

In [35]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'Contain(variety, "Noodle,\nSpic")' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",1
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",1
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0


**`ContainAll(Val, SubStrs)`**: Compares the substrings in SubStrs with any part of Val. All the comparisons are case sensitive.

* Returns 1 if all the substrings match, 0 otherwise.
* Val can be a string column’s name, a string constant, or an expression that evaluates to a string.
* SubStrs is a string constant that specifies what substrings to match. It is a comma-newline separated list of literal substrings of the form “`SubStr1,[\r]\nSubStr2...`”.

Same as `Contain()`, except that when `SubStrs` lists multiple substrings, it returns 1 only if all of them are present in `Val`. 
Using the `variety` column with "Noodle" and "Spic" again, we'll see that only the records with **BOTH** words present in their `variety` column have 1 as the return value.

In [36]:
aq_pp -f,+1 $file -d $cols -eval 'is:beginWith' 'ContainAll(variety, "Noodle,\nSpic")' -c variety beginWith

"variety","beginWith"
"T's Restaurant Tantanmen ",0
"Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles",1
"Cup Noodles Chicken Vegetable",0
"GGE Ramen Snack Tomato Flavor",0
"Singapore Curry",0
"Kimchi song Song Ramen",0
"Spice Deli Tantan Men With Cilantro",0
"Nabeyaki Kitsune Udon",0
"Hokkaido Soy Sauce Ramen",0



**`PatCmp(Val, Pattern [, AtrLst])`**: Compares a generic wildcard pattern with Val.

* Returns 1 if it matches, 0 otherwise. `Pattern` must match the _entire_ `Val` to be successful.
* `Val` can be a string column’s name, a string constant, or an expression that evaluates to a string.
* `Pattern` is a string constant that specifies the pattern to match. It is a simple wildcard pattern containing only `*` (matches any number of bytes) and `?` (matches any 1 byte); literal `*`, `?` and `\` in the pattern must be `\` escaped.
* Optional `AtrLst` is a list of `|` separated attributes containing:
    * `ncas` - Perform a case insensitive match (default is case sensitive). For ASCII data only.
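
Until a real dataset is available for this function, the rules above can be illustrated with a small Python model. This is a sketch under assumptions: `pat_cmp` is a hypothetical name, the wildcard pattern is translated to a Python regular expression, and it works on characters where the `aq_pp` description speaks of bytes.

```python
import re


def pat_cmp(val: str, pattern: str, ncas: bool = False) -> int:
    """Return 1 if the wildcard pattern matches the ENTIRE val, else 0."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any number of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    flags = re.DOTALL | (re.IGNORECASE if ncas else 0)
    return int(re.fullmatch("".join(parts), val, flags) is not None)


print(pat_cmp("Pack", "P*"))               # 1
print(pat_cmp("Pack", "P?"))               # 0, the pattern covers only 2 characters
print(pat_cmp("Pack", "pa??", ncas=True))  # 1
print(pat_cmp("a*b", r"a\*b"))             # 1, escaped * is literal
```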


**`RxCmp(Val, Pattern [, AtrLst])`**: Compares a string with a regular expression.

* Returns 1 if they match, 0 otherwise. `Pattern` only needs to match a subpart of `Val` to be successful.
* `Val` can be a string column’s name, a string constant, or an expression that evaluates to a string.
* `Pattern` is a string constant that specifies the regular expression to match.
* Optional `AtrLst` is a list of `|` separated regular expression attributes.
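
The key difference from `PatCmp()` is that a partial match is enough. The Python sketch below models this with `re.search`; `rx_cmp` is a hypothetical name, `AtrLst` is not modelled, and Python's regular expression dialect may differ from the one `aq_pp` uses.

```python
import re


def rx_cmp(val: str, pattern: str) -> int:
    """Return 1 if the regular expression matches any part of val, else 0."""
    return int(re.search(pattern, val) is not None)


print(rx_cmp("Cup Noodles Chicken Vegetable", r"Noodles?"))  # 1, matches a subpart
print(rx_cmp("Pack", r"ack$"))                               # 1
print(rx_cmp("Pack", r"^ack"))                               # 0, anchored at the start
```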


**`NumCmp(Val1, Val2, Delta)`**: Tests if Val1 and Val2 are within Delta of each other, i.e., whether `Abs(Val1 - Val2) <= Delta`.

* Returns 1 if true, 0 otherwise.
* `Val1`, `Val2` and `Delta` can be a numeric column’s name, a numeric constant, or an expression that evaluates to a number.
* `Delta` should be greater than or equal to zero.
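
The test is a direct translation of the formula above. In the Python sketch below, `num_cmp` is a hypothetical name and the sample values are illustrative ratings chosen so that the floating point arithmetic is exact.

```python
def num_cmp(val1: float, val2: float, delta: float) -> int:
    """Return 1 if val1 and val2 are within delta of each other, else 0."""
    return int(abs(val1 - val2) <= delta)


print(num_cmp(3.75, 4, 0.25))  # 1, the difference is exactly 0.25
print(num_cmp(2.75, 4, 0.5))   # 0, the difference is 1.25
```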



<a id='extract_code'></a>
### Data extraction and encode / decode Functions



**`SubStr(Val, Start [, Length])`**: Returns a substring of a string.

* `Val` can be a string column’s name, a string constant, or an expression that evaluates to a string.
* `Start` is the starting position (zero-based) of the substring in `Val`. It can be a numeric column’s name, a number, or an expression that evaluates to a number.
     * If `Start` is negative, the length of `Val` will be added to it. If it is still negative, 0 will be used. (Think of it as Python-style indexing from the end of the string.)
* Optional `Length` specifies the length of the substring in `Val`. It can be a numeric column’s name, a number, or an expression that evaluates to a number.
    * Max length is length of `Val` minus `Start`.
    * If `Length` is not specified, max length is assumed.
    * If `Length` is negative, max length will be added to it. If it is still negative, 0 will be used.

To keep things simple, this example uses a file containing 2 rows: one with a numeric string starting from zero, and the other with the good old "Hello World". It looks like this:

| simple_str |
|---|
| 0123456789 |
| Hello World |

We'll use this column as `Val` and extract the substring from index 3 through the last index. We can do this as follows.

In [45]:
file="data/aq_pp/substr.csv"
aq_pp -f,+1 $file -d s:val_str -eval 's:subStr' 'SubStr(val_str, 3)' -c  val_str SubStr

"val_str","subStr"
"0123456789","3456789"
"Hello World","lo World"


As you can see, the characters from index 3 (counting from zero) onward are extracted as the substring. <br>

**`Length`**<br>
This argument specifies the length of the **extracted substring**. Note that it is NOT the ending index of the substring. 

For example, to extract the substring at index 3 ~ 7 of the original `Val` string, we need to provide `3` as `Start` and `5` as `Length`, since indices 3 through 7 cover 5 characters.

In [48]:
aq_pp -f,+1 $file -d s:val_str -eval 's:subStr' 'SubStr(val_str, 3, 5)' -c  val_str SubStr

"val_str","subStr"
"0123456789","34567"
"Hello World","lo Wo"


**Negative Index**<br>
Users can specify an index counted from the right side of the `Val` string by using negative indexing (similar to Python strings).

For example, say we'd like to extract the word "World" using a negative index. The letter "W" is the 5th character from the right side of the string, so we'll provide `-5` as `Start`.

aq_pp -f,+1 $file -d s:val_str -eval 's:subStr' 'SubStr(val_str, -5)' -c  val_str SubStr

The `Length` argument can also be a negative number. Let's keep `-5` as `Start` and provide `-2` as the `Length` parameter.

In [53]:
aq_pp -f,+1 $file -d s:val_str -eval 's:subStr' 'SubStr(val_str, -5, -2)' -c  val_str SubStr

"val_str","subStr"
"0123456789","567"
"Hello World","Wor"


A negative `Length` works like a reverse ending index: the extracted substring stops 2 characters before the right end of the original string.
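
The `Start` and `Length` rules listed at the top of this section can be written out as a short Python function. This is a sketch of the documented rules, not the `aq_pp` implementation; `sub_str` is a hypothetical name, and clamping an over-long `Length` to the max length is an assumption.

```python
from typing import Optional


def sub_str(val: str, start: int, length: Optional[int] = None) -> str:
    """Model of the SubStr rules: zero-based start, optional length."""
    if start < 0:
        # A negative start has len(val) added to it; if still negative, use 0.
        start = max(start + len(val), 0)
    max_len = max(len(val) - start, 0)
    if length is None:
        length = max_len
    elif length < 0:
        # A negative length has the max length added to it; if still negative, use 0.
        length = max(length + max_len, 0)
    length = min(length, max_len)  # assumption: longer requests are clamped
    return val[start:start + length]


print(sub_str("Hello World", 3))       # lo World
print(sub_str("Hello World", 3, 5))    # lo Wo
print(sub_str("Hello World", -5))      # World
print(sub_str("Hello World", -5, -2))  # Wor
```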


**`ClipStr(Val, ClipSpec)`**: Returns a substring of a string, based on `ClipSpec`.

* `Val` can be a string column’s name, a string constant, or an expression that evaluates to a string.

* `ClipSpec` is a string constant that specifies how to clip the substring from the source. It is a sequence of individual clip elements separated by “;”.


Each clip element extracts either the starting or the trailing portion of the source. The first element clips the input `Val`, the second element clips the result of the first, and so on. The components of a clip element are:

* `!` - The negation operator inverts the result of the clip. In other words, if the original clipped result is the starting portion of the source, negating it gives the trailing portion.
* `Num` - The number of bytes or separators (see `Sep` below) to clip.
* `-` (a dash) - Do not include the last separator (see `Sep` below) in the result.
* `Dir` - The clip direction. Specify a `>` to clip from the beginning to the end. Specify a `<` to clip backward from the end to the beginning.
* `Sep` - Optional single byte clip separator. If given, a substring containing up to (and including, unless a `-` is given) `Num` separators will be clipped in the `Dir` direction. If no separator is given, `Num` bytes will be clipped in the same way.

Do not put a `;` at the end of `ClipSpec`, because it could be misinterpreted as the `Sep` of the last clip element.


We'll use a list of web URLs as an example here to demonstrate how `ClipSpec` can be useful. It is a single column file with a web URL string in each row.

First, let's say you'd like to extract the first 5 characters of the URL. With no separator given, `5>` clips 5 bytes starting from the beginning.

In [58]:
urls="data/aq_pp/clipstr.csv"
aq_pp -f,+1 $urls -d s:val_str -eval 's:subStr' 'ClipStr(val_str, "5>")' -c  val_str SubStr

"val_str","subStr"
"http://auriq.com/documentation/source/reference/manpages/aq-emod.html","http:"
"https://auriq.com/boosting-trading-models-with-sagemaker-and-essentia/","https"


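When a separator is given, `Num` counts separators instead of bytes. Here `3>/` clips from the beginning up to and including the third `/`, which keeps the scheme and the host name.

In [63]: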
aq_pp -f,+1 $urls -d s:val_str -eval 's:subStr' 'ClipStr(val_str, "3>/")' -c  val_str SubStr

"val_str","subStr"
"http://auriq.com/documentation/source/reference/manpages/aq-emod.html","http://auriq.com/"
"https://auriq.com/boosting-trading-models-with-sagemaker-and-essentia/","https://auriq.com/"


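Reversing the direction, `1</` clips backward from the end up to and including the last `/`, which keeps the final path segment.

In [64]: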
aq_pp -f,+1 $urls -d s:val_str -eval 's:subStr' 'ClipStr(val_str, "1</")' -c  val_str SubStr

"val_str","subStr"
"http://auriq.com/documentation/source/reference/manpages/aq-emod.html","/aq-emod.html"
"https://auriq.com/boosting-trading-models-with-sagemaker-and-essentia","/boosting-trading-models-with-sagemaker-and-essentia"

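To see how a single clip element behaves, and how elements chain, here is a Python model of the rules described above. It is a sketch under assumptions: `clip` is a hypothetical helper that takes the components (`Num`, `Dir`, `Sep`, and whether to keep the last separator) as separate arguments instead of parsing a `ClipSpec` string, it does not model `!`, and returning the whole string when fewer than `Num` separators exist is my reading of "up to".

```python
from typing import Optional


def clip(val: str, num: int, direction: str,
         sep: Optional[str] = None, keep_sep: bool = True) -> str:
    """Model of one clip element: Num, Dir, optional Sep, optional dash (keep_sep=False)."""
    if direction not in (">", "<"):
        raise ValueError("direction must be '>' or '<'")
    if num <= 0:
        return ""
    if sep is None:
        # No separator: clip num characters from the chosen end.
        if direction == ">":
            return val[:num]
        return val[max(len(val) - num, 0):]
    if direction == ">":
        pos = -1
        for _ in range(num):
            pos = val.find(sep, pos + 1)
            if pos < 0:
                return val  # assumption: fewer than num separators keeps everything
        return val[:pos + 1] if keep_sep else val[:pos]
    pos = len(val)
    for _ in range(num):
        pos = val.rfind(sep, 0, pos)
        if pos < 0:
            return val  # assumption, as above
    return val[pos:] if keep_sep else val[pos + 1:]


url = "http://auriq.com/documentation/source/reference/manpages/aq-emod.html"
print(clip(url, 5, ">"))       # http:
print(clip(url, 3, ">", "/"))  # http://auriq.com/
print(clip(url, 1, "<", "/"))  # /aq-emod.html

# Chaining two elements: the second clips the result of the first.
head = clip(url, 3, ">", "/", keep_sep=False)     # http://auriq.com
print(clip(head, 1, "<", "/", keep_sep=False))    # auriq.com
```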

<a id='conversion'></a>
### General Data Conversion Functions

<a id='date_time'></a>
### Date/Time conversion Functions

<a id='character_encoding'></a>
### Character set encoding conversion Functions





## multiple -eval options
