# Apply functions or lambdas to columns

Sample table and function:

In [1]:
from lemuras import Table

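def process_phone(x, cut=10):
    # Keep the last `cut` digits of a phone number; return '' for missing or non-numeric values
    try:
        return str(int(x))[-cut:]
    except (TypeError, ValueError):
        return ''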

cols = ['type', 'size', 'weight', 'tel']
rows = [
    ['A', 1, 12, '+79360360193'],
    ['B', 4, 12, 84505505151],
    ['A', 3, 10, '+31415926535'],
    ['B', 6, 14, ''],
    ['A', 4, 15, '23816326412'],
    ['A', 2, 11, None],
]

df1 = Table(cols, rows, 'Sample')
df2 = df1.copy()
df2

type,size,weight,tel
A,1,12,79360360193.0
B,4,12,84505505151.0
A,3,10,31415926535.0
B,6,14,
A,4,15,23816326412.0
A,2,11,


The method `.apply(task, *args, **kwargs)` takes a function or lambda that receives a single column value, and returns a new column (similar to Python's built-in `map`):

In [2]:
df2['tel'].apply(process_phone)
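
For intuition, `.apply` can be compared with mapping the function over a plain list of the column's values. The sketch below does not use lemuras. `tel_values` is an illustrative list copied from the sample rows. `process_phone` is repeated (catching only `TypeError`/`ValueError`) so the block runs on its own.

```python
def process_phone(x, cut=10):
    """Keep the last `cut` digits of a phone number, or '' if it is not numeric."""
    try:
        return str(int(x))[-cut:]
    except (TypeError, ValueError):
        return ''


# A plain list standing in for the 'tel' column of the sample table
tel_values = ['+79360360193', 84505505151, '+31415926535', '', '23816326412', None]

# Roughly what df2['tel'].apply(process_phone) computes: one result per value
new_tel = list(map(process_phone, tel_values))
print(new_tel)  # ['9360360193', '4505505151', '1415926535', '', '3816326412', '']
```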

To overwrite existing values, either assign the new column back to the table or pass `separate=False` (the preferred way):

In [3]:
df2['tel'] = df2['tel'].apply(process_phone)
df2['size'].apply(lambda x: x*2, separate=False)
df2

type,size,weight,tel
A,2,12,9360360193.0
B,8,12,4505505151.0
A,6,10,1415926535.0
B,12,14,
A,8,15,3816326412.0
A,4,11,
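
The effect of `separate` can be sketched with a tiny stand-in class. `SketchColumn` is hypothetical and only models the behaviour described above: a new column by default, and an in-place overwrite with `separate=False`. It is not how lemuras implements columns, and returning `None` in the in-place case is an assumption.

```python
class SketchColumn:
    """Hypothetical column stand-in, used only to illustrate `separate`."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, task, *args, separate=True, **kwargs):
        results = [task(v, *args, **kwargs) for v in self.values]
        if separate:
            # Default: leave this column untouched and return a new one
            return SketchColumn(results)
        # separate=False: overwrite this column's values in place
        self.values = results
        return None


size = SketchColumn([1, 4, 3, 6, 4, 2])
doubled = size.apply(lambda x: x * 2)        # new column; `size` is unchanged
print(size.values)     # [1, 4, 3, 6, 4, 2]
size.apply(lambda x: x * 2, separate=False)  # `size` itself now holds doubled values
print(size.values)     # [2, 8, 6, 12, 8, 4]
print(doubled.values)  # [2, 8, 6, 12, 8, 4]
```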


Any additional positional and keyword arguments are passed through to the function:

In [4]:
df2 = df1.copy()
df2['tel'].apply(process_phone, 3, separate=False)
# ...or, equivalently, pass `cut` as a keyword argument:
df2['tel'].apply(process_phone, cut=3, separate=False)
df2

type,size,weight,tel
A,1,12,193.0
B,4,12,151.0
A,3,10,535.0
B,6,14,
A,4,15,412.0
A,2,11,
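
The two calls above amount to the following plain-Python loops. This assumes extra arguments are forwarded after the cell value, which is what the `process_phone(x, cut=10)` signature requires. `tel_values` is an illustrative list, not a lemuras object, and `functools.partial` stands in for the keyword form.

```python
from functools import partial


def process_phone(x, cut=10):
    """Keep the last `cut` digits of a phone number, or '' if it is not numeric."""
    try:
        return str(int(x))[-cut:]
    except (TypeError, ValueError):
        return ''


tel_values = ['+79360360193', 84505505151, '+31415926535', '', '23816326412', None]

# Extra arguments reach the function after the cell value, so these are equivalent:
positional = [process_phone(v, 3) for v in tel_values]
keyword = list(map(partial(process_phone, cut=3), tel_values))
print(positional)  # ['193', '151', '535', '', '412', '']
print(keyword)     # ['193', '151', '535', '', '412', '']
```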


## Default apply-functions

The following functions are built in and can be passed to `.apply` by name, as strings:

- **`'isnull'`** - `True` if value is `None`, `False` otherwise.

- **`'lengths'`** - returns the length of the value converted to a string (so `None` gives 4, the length of `'None'`).

- **`'isin'`** - takes a container as an additional argument and checks whether the value is in it.

- **`'istype'`**, **`'isinstance'`** - takes a type as an additional argument and checks whether the value is an instance of it.

Usage example:

In [5]:
df2 = df1.copy()
df2['weight_10or15'] = df2['weight'].apply('isin', (10,15))
df2['tel_is_str'] = df2['tel'].apply('istype', str)
df2['tel_len'] = df2['tel'].apply('lengths')
df2['tel_null'] = df2['tel'].apply('isnull')
df2

type,size,weight,tel,weight_10or15,tel_is_str,tel_len,tel_null
A,1,12,79360360193.0,False,True,12,False
B,4,12,84505505151.0,False,False,11,False
A,3,10,31415926535.0,True,True,12,False
B,6,14,,False,True,0,False
A,4,15,23816326412.0,True,True,11,False
A,2,11,,False,False,4,True
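
Conceptually, each name is just a lookup into a table of ordinary functions. The `BUILTIN_APPLY` dictionary and `apply_values` helper below are hypothetical reconstructions from the descriptions and the output above, not lemuras source code. On the sample data they reproduce the four new columns.

```python
# Hypothetical name -> function table approximating the built-in apply functions
BUILTIN_APPLY = {
    'isnull': lambda v: v is None,
    'lengths': lambda v: len(str(v)),
    'isin': lambda v, container: v in container,
    'istype': lambda v, cls: isinstance(v, cls),
}
BUILTIN_APPLY['isinstance'] = BUILTIN_APPLY['istype']  # alias


def apply_values(values, task, *args, **kwargs):
    """Resolve a string task via the table above, then map it over the values."""
    func = BUILTIN_APPLY[task] if isinstance(task, str) else task
    return [func(v, *args, **kwargs) for v in values]


weight = [12, 12, 10, 14, 15, 11]
tel = ['+79360360193', 84505505151, '+31415926535', '', '23816326412', None]

print(apply_values(weight, 'isin', (10, 15)))  # [False, False, True, False, True, False]
print(apply_values(tel, 'istype', str))        # [True, False, True, True, True, False]
print(apply_values(tel, 'lengths'))            # [12, 11, 12, 0, 11, 4]
print(apply_values(tel, 'isnull'))             # [False, False, False, False, False, True]
```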


## Type functions

There are also several built-in functions for type conversion:

- **`'str'`** - tries to convert to a string.

- **`'int'`** - tries to convert to an integer.

- **`'float'`** - tries to convert to a floating-point number.

- **`'date'`** - tries to convert to a date object (can parse many string formats).

- **`'datetime'`** - tries to convert to a datetime object (can parse many string formats).

All these functions take a `default` argument, the value used when conversion fails. It defaults to `0` for `int`/`float` and to `None` for the others. One more function is helpful when dealing with types:

- **`'none_to'`** - replaces `None` values with the given value.

Unlike the other apply functions, these default to `separate=False`, so they modify the column in place. An example:

In [6]:
# Clean sample data
df2 = df1.copy()
df2['tel'].apply('none_to', -1)
df2['tel'].apply('int', default=0)
df2

type,size,weight,tel
A,1,12,79360360193
B,4,12,84505505151
A,3,10,31415926535
B,6,14,0
A,4,15,23816326412
A,2,11,-1
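
The two conversions in the cell above can be approximated as follows. `none_to` and `to_int` are hypothetical helpers written from the descriptions in this section. Lemuras's own `'int'` may accept more input formats (for example float strings), which this sketch simply maps to `default`.

```python
def none_to(value, replacement):
    """Replace None with `replacement`; leave other values unchanged."""
    return replacement if value is None else value


def to_int(value, default=0):
    """Try int(value); fall back to `default` if the conversion fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


tel = ['+79360360193', 84505505151, '+31415926535', '', '23816326412', None]

# Same order as the cell above: first replace None, then convert to int
tel = [none_to(v, -1) for v in tel]
tel = [to_int(v, default=0) for v in tel]
print(tel)  # [79360360193, 84505505151, 31415926535, 0, 23816326412, -1]
```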


# Aggregate columns

You can also aggregate columns, i.e. compute a single value from a whole column (whereas apply functions produce a new column from a column). This is useful for statistics. The following aggregation functions are built in and available by name:

- **`'count'`** - number of elements (group size).

- **`'min'`** - the lowest value.

- **`'max'`** - the highest value.

- **`'sum'`** - sum of the elements.

- **`'avg'`**, **`'mean'`** - average value.

- **`'mode'`** - the most common value.

- **`'middle'`**, **`'median'`** - the value that splits the data in half: half of the values are lower and half are higher.

- **`'q1'`** - first quartile (25th percentile).

- **`'q2'`** - second quartile (50th percentile), the same as **median**.

- **`'q3'`** - third quartile (75th percentile).

- **`'std'`** - standard deviation.

- **`'first'`** - the first value of the column (handy when you just need a sample value rather than a specific element).

- **`'last'`** - the last value of the column (for the same reason).

- **`'get'`** - returns the value at a specific index; pass the index as an argument.

- **`'nunique'`** - number of unique values in the column.

- **`'nones'`**, **`'nulls'`** - number of `None` values in the column.

You can also use your own functions or lambdas. Aggregation uses the `.calc(task, *args, **kwargs)` method, which takes a function or lambda that receives an iterable of all the column values.

In [7]:
print('{} {}'.format(df2['weight'].calc('count'), df2['weight'].calc('nunique')))
print('{} {}'.format(df2['tel'].calc('first'), df2['tel'].calc('last')))

6 5
79360360193 -1
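
Below is a rough standard-library equivalent of the aggregation names, with `calc` dispatching either a name or your own callable. It is a sketch under several assumptions:

- `AGGREGATES` and `calc` are hypothetical names.
- `'std'` uses the population formula (`statistics.pstdev`), which yields the `1.7` shown for `weight` later in this notebook.
- The quartile interpolation method is a guess, since quartile definitions vary between libraries.

`statistics.quantiles` needs Python 3.8+.

```python
import statistics

# Hypothetical name -> aggregate mapping built on the standard library
AGGREGATES = {
    'count': len,
    'min': min,
    'max': max,
    'sum': sum,
    'avg': statistics.mean,
    'mode': statistics.mode,
    'median': statistics.median,
    # Quartiles: the 'inclusive' interpolation method is an assumption
    'q1': lambda vs: statistics.quantiles(vs, n=4, method='inclusive')[0],
    'q3': lambda vs: statistics.quantiles(vs, n=4, method='inclusive')[2],
    'std': statistics.pstdev,  # population standard deviation (divides by n)
    'first': lambda vs: vs[0],
    'last': lambda vs: vs[-1],
    'get': lambda vs, index: vs[index],
    'nunique': lambda vs: len(set(vs)),
    'nones': lambda vs: sum(v is None for v in vs),
}
# Aliases from the list above
AGGREGATES['mean'] = AGGREGATES['avg']
AGGREGATES['middle'] = AGGREGATES['q2'] = AGGREGATES['median']
AGGREGATES['nulls'] = AGGREGATES['nones']


def calc(values, task, *args, **kwargs):
    """Aggregate column values with a built-in name or a custom callable."""
    func = AGGREGATES[task] if isinstance(task, str) else task
    return func(list(values), *args, **kwargs)


weight = [12, 12, 10, 14, 15, 11]
print(calc(weight, 'count'), calc(weight, 'nunique'))                # 6 5
print(round(calc(weight, 'avg'), 1), round(calc(weight, 'std'), 1))  # 12.3 1.7
print(calc(weight, lambda vs: max(vs) - min(vs)))                    # 5 (custom aggregate)
```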


# Code simplification

All these built-in functions can also be called directly as methods, without `.apply` or `.calc`, and they accept the same arguments. Both columns and tables support this shorthand:

In [8]:
# Clean sample data
df2 = df1.copy()
df2

type,size,weight,tel
A,1,12,79360360193.0
B,4,12,84505505151.0
A,3,10,31415926535.0
B,6,14,
A,4,15,23816326412.0
A,2,11,


---
**Column type conversion**

In [9]:
df2['tel'].int()

**Column applying**  

In [10]:
df2['size'].isin((2,4), separate=False)
df2

type,size,weight,tel
A,False,12,79360360193
B,True,12,84505505151
A,False,10,31415926535
B,False,14,0
A,True,15,23816326412
A,True,11,0


---
**Column aggregation**

In [11]:
print('Weight is about {:.1f} ± {:.1f}'.format(df2['weight'].avg(), df2['weight'].std()))

Weight is about 12.3 ± 1.7
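
One plausible way to provide such shorthand in Python is `__getattr__`. It is called only when normal attribute lookup fails, so `col.avg()` can be forwarded to `col.calc('avg')`. `ShorthandColumn` is hypothetical and supports only three aggregates; lemuras's actual mechanism may differ.

```python
import statistics
from functools import partial


class ShorthandColumn:
    """Hypothetical column: unknown attribute names are looked up as aggregates."""

    AGGREGATES = {
        'count': len,
        'avg': statistics.mean,
        'std': statistics.pstdev,
    }

    def __init__(self, values):
        self.values = list(values)

    def calc(self, task, *args, **kwargs):
        func = self.AGGREGATES[task] if isinstance(task, str) else task
        return func(self.values, *args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for `col.avg`
        if name in self.AGGREGATES:
            return partial(self.calc, name)
        raise AttributeError(name)


weight = ShorthandColumn([12, 12, 10, 14, 15, 11])
print('Weight is about {:.1f} ± {:.1f}'.format(weight.avg(), weight.std()))
# Weight is about 12.3 ± 1.7
```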


---
### Tables

In [12]:
# Clean sample data
df2 = df1.copy()
df2

type,size,weight,tel
A,1,12,79360360193.0
B,4,12,84505505151.0
A,3,10,31415926535.0
B,6,14,
A,4,15,23816326412.0
A,2,11,


---
**Table aggregation**  
It always creates a new Table object.

In [13]:
df2.first()

Column,first
type,A
size,1
weight,12
tel,+79360360193


---
**Table type conversion**  
It always modifies the original table in place.

In [14]:
df2.int()
df2

type,size,weight,tel
0,1,12,79360360193
0,4,12,84505505151
0,3,10,31415926535
0,6,14,0
0,4,15,23816326412
0,2,11,0


---
**Table applying**  
It always creates a new Table object.

In [15]:
df2.isin((2,3,4,12))

type,size,weight,tel
False,False,True,False
False,True,True,False
False,True,False,False
False,False,False,False
False,True,False,False
False,True,False,False
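
The three table-level behaviours can be sketched with a plain dict of column lists:

- Aggregation returns a new summary.
- Type conversion edits the table in place.
- Applying returns a new table.

The `table_first`, `table_int` and `table_isin` helpers are hypothetical. They reproduce the outputs above on the sample data but are not lemuras's implementation.

```python
def table_first(table):
    """Aggregate every column; returns a NEW list of (column, value) rows."""
    return [(name, values[0]) for name, values in table.items()]


def table_int(table, default=0):
    """Convert every cell to int IN PLACE, using `default` when conversion fails."""
    for values in table.values():
        for i, v in enumerate(values):
            try:
                values[i] = int(v)
            except (TypeError, ValueError):
                values[i] = default


def table_isin(table, container):
    """Apply to every cell; returns a NEW table and leaves `table` untouched."""
    return {name: [v in container for v in values] for name, values in table.items()}


# Hypothetical table stored as {column name: list of values}
table = {
    'type': ['A', 'B', 'A', 'B', 'A', 'A'],
    'size': [1, 4, 3, 6, 4, 2],
    'weight': [12, 12, 10, 14, 15, 11],
    'tel': ['+79360360193', 84505505151, '+31415926535', '', '23816326412', None],
}

print(table_first(table))  # [('type', 'A'), ('size', 1), ('weight', 12), ('tel', '+79360360193')]
table_int(table)           # modifies `table`; nothing is returned
print(table['type'])       # [0, 0, 0, 0, 0, 0]
print(table_isin(table, (2, 3, 4, 12))['size'])  # [False, True, True, False, True, True]
```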


---
# Applying functions to table rows

Sometimes you need to build a new column from an entire row rather than from a single column. For this, use the `Table.calc(task, *args, **kwargs)` method. It takes a function that receives a row object and returns a value; the values returned for all the rows make up the new column. Any other arguments to `calc` are passed on to your function. It is simpler than it sounds:

In [16]:
# Clean sample data
df2 = df1.copy()
df2

type,size,weight,tel
A,1,12,79360360193.0
B,4,12,84505505151.0
A,3,10,31415926535.0
B,6,14,
A,4,15,23816326412.0
A,2,11,


Custom function:

In [17]:
def process(row, special):
    # Rows of the `special` type get size + weight; all other rows get size * weight
    if row['type'] == special:
        return row['size'] + row['weight']
    else:
        return row['size'] * row['weight']

Applying:

In [18]:
df2['something'] = df2.calc(process, special='A')
df2

type,size,weight,tel,something
A,1,12,79360360193.0,13
B,4,12,84505505151.0,48
A,3,10,31415926535.0,13
B,6,14,,84
A,4,15,23816326412.0,19
A,2,11,,13
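
A row-wise `calc` amounts to calling the function once per row and collecting the results. In the sketch below a row is modelled as a plain `dict`, since the example above relies only on `row['name']` indexing. `calc_rows` is a hypothetical helper, not part of lemuras.

```python
def process(row, special):
    # Rows of the `special` type get size + weight; all other rows get size * weight
    if row['type'] == special:
        return row['size'] + row['weight']
    return row['size'] * row['weight']


def calc_rows(columns, rows, task, *args, **kwargs):
    """Hypothetical row-wise calc: call `task` once per row, collect results as a column."""
    results = []
    for values in rows:
        row = dict(zip(columns, values))  # lets the function use row['name']
        results.append(task(row, *args, **kwargs))
    return results


cols = ['type', 'size', 'weight', 'tel']
rows = [
    ['A', 1, 12, '+79360360193'],
    ['B', 4, 12, 84505505151],
    ['A', 3, 10, '+31415926535'],
    ['B', 6, 14, ''],
    ['A', 4, 15, '23816326412'],
    ['A', 2, 11, None],
]

print(calc_rows(cols, rows, process, special='A'))  # [13, 48, 13, 84, 19, 13]
```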
