# Function Application

To apply your own or another library’s functions to Pandas objects, you should be aware of three important methods:
- Table-wise function application: `pipe()`
- Row- or column-wise function application: `apply()`
- Element-wise function application: `applymap()`

The appropriate method depends on whether your function expects to operate on the entire DataFrame, on individual rows or columns, or on individual elements.
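
To make the distinction concrete, the sketch below records what kind of object each method hands to your function. The helpers `kind_of` and `recorder` and the `seen` dict are illustrative names, not pandas API. The element-wise step uses `DataFrame.map()` where it exists (pandas 2.1+) and falls back to `applymap()` otherwise.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})

seen = {}  # hypothetical log: method name -> kind of object the function received


def kind_of(x):
    """Classify the argument a pandas method passed to our function."""
    if isinstance(x, pd.DataFrame):
        return 'DataFrame'
    if isinstance(x, pd.Series):
        return 'Series'
    return 'scalar'


def recorder(method):
    """Build an identity function that logs what it receives under `method`."""
    def f(x):
        seen.setdefault(method, kind_of(x))
        return x
    return f


# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions
elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap

df.pipe(recorder('pipe'))          # the whole table at once
df.apply(recorder('apply'))        # one column (a Series) at a time
elementwise(recorder('applymap'))  # one value at a time

print(seen)  # {'pipe': 'DataFrame', 'apply': 'Series', 'applymap': 'scalar'}
```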

In [1]:
# !pip install numpy
# !pip install pandas

In [2]:
import numpy as np
import pandas as pd

Defining a function for adding two elements:

In [3]:
def add2(elem1, elem2):
    '''
    Add two numbers
    '''
    return elem1 + elem2

Creating a DataFrame df:

In [4]:
df = pd.DataFrame({'Col1':[1,1,1,1],'Col2':[1,2,3,4],'Col3':[-1,0,1,2]})
df

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 0 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 4 | 2 |


## Table-wise Function Application

**Table-wise Function Application**: `pipe()` passes the whole DataFrame as the first argument to your function, followed by any additional arguments you supply, so the custom operation is performed on the entire DataFrame.

In the example below, we use `pipe()` with `add2` to add 5 to every value in the DataFrame; `df.pipe(add2, 5)` is equivalent to `add2(df, 5)`.

In [5]:
df.pipe(add2,5)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 6 | 6 | 4 |
| 1 | 6 | 7 | 5 |
| 2 | 6 | 8 | 6 |
| 3 | 6 | 9 | 7 |


In [6]:
# df.pipe does not affect the original DataFrame df
df

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 0 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 4 | 2 |


In [7]:
df2 = df.pipe(add2,10)
df2

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 11 | 11 | 9 |
| 1 | 11 | 12 | 10 |
| 2 | 11 | 13 | 11 |
| 3 | 11 | 14 | 12 |
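
`pipe()` is most useful for chaining several table-wise steps into one readable expression. The sketch below reuses `add2` and assumes a hypothetical `scale(factor, data)` function whose DataFrame argument is not the first one; passing a `(callable, 'keyword')` tuple tells `pipe()` which keyword argument should receive the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})


def add2(elem1, elem2):
    """Add two numbers (or a DataFrame and a number)."""
    return elem1 + elem2


def scale(factor, data):
    """Hypothetical helper that expects the DataFrame as its `data` keyword."""
    return data * factor


result = (
    df.pipe(add2, 5)              # add2(df, 5)
      .pipe((scale, 'data'), 2)   # scale(2, data=<previous result>)
)
print(result)
```

Each step receives the previous step's result, so the chain reads top to bottom instead of inside out, as `scale(2, data=add2(df, 5))` would.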


## Row- or Column-wise Function Application

`apply()` performs a custom operation along an axis: by default (`axis=0`) the function receives each column as a Series; with `axis=1` it receives each row.

In [8]:
# Calculating mean by columns
df.apply(np.mean)

Col1    1.0
Col2    2.5
Col3    0.5
dtype: float64

In [9]:
# You can calculate the mean by columns without the apply function
df.mean()

Col1    1.0
Col2    2.5
Col3    0.5
dtype: float64
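
`apply()` also forwards extra positional arguments through its `args` parameter and extra keyword arguments through `**kwargs`. A minimal sketch with a hypothetical `scaled_sum` reducer:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})


def scaled_sum(col, factor, offset=0):
    """Hypothetical column reducer: factor * sum(col) + offset."""
    return factor * col.sum() + offset


by_args = df.apply(scaled_sum, args=(2,))             # factor passed positionally
by_kwargs = df.apply(scaled_sum, factor=2, offset=1)  # both passed as keywords
print(by_args.tolist())    # [8, 20, 4]
print(by_kwargs.tolist())  # [9, 21, 5]
```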

Let's define a function that calculates the sum of the squares of the values in a list (or any iterable, such as a Series):

In [10]:
def sum_squares(values):
    '''
    Return the sum of squares of the values in an iterable
    '''
    return sum(x**2 for x in values)

In [11]:
df.apply(sum_squares)

Col1     4
Col2    30
Col3     6
dtype: int64

In [12]:
# axis=0 is the default, so it gives the same result
df.apply(sum_squares, axis=0)

Col1     4
Col2    30
Col3     6
dtype: int64

In [13]:
# With axis=1, the function is applied to each row:
df.apply(sum_squares,axis=1)

0     3
1     5
2    11
3    21
dtype: int64
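
Because arithmetic on a DataFrame is already vectorized, the same results can usually be obtained without `apply()`, which calls a Python function once per column or row and tends to be slower on large frames. A sketch of the vectorized equivalent of `sum_squares`:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})


def sum_squares(values):
    """Return the sum of squares of an iterable of numbers."""
    return sum(x ** 2 for x in values)


squared = df ** 2              # element-wise square of the whole table
by_column = squared.sum()      # same values as df.apply(sum_squares)
by_row = squared.sum(axis=1)   # same values as df.apply(sum_squares, axis=1)

assert by_column.tolist() == df.apply(sum_squares).tolist()
assert by_row.tolist() == df.apply(sum_squares, axis=1).tolist()
print(by_column.tolist(), by_row.tolist())  # [4, 30, 6] [3, 5, 11, 21]
```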

We can use a lambda function to compute the range (max - min) of each column:

In [14]:
display(df)
df.apply(lambda x: x.max()-x.min())

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 0 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 4 | 2 |


Col1    0
Col2    3
Col3    3
dtype: int64

Or of each row (specifying `axis=1`):

In [15]:
df.apply(lambda x: x.max()-x.min(), axis=1)

0    2
1    2
2    2
3    3
dtype: int64
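
The range computations above can also be written with vectorized reductions instead of a lambda. A sketch; `np.ptp` ("peak to peak") is NumPy's built-in max minus min along an axis:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})

col_range = df.max() - df.min()              # per column, like the axis=0 lambda
row_range = df.max(axis=1) - df.min(axis=1)  # per row, like the axis=1 lambda

# NumPy computes max - min directly on the underlying array
ptp_rows = np.ptp(df.to_numpy(), axis=1)

print(col_range.tolist())  # [0, 3, 3]
print(row_range.tolist())  # [2, 2, 2, 3]
print(ptp_rows.tolist())   # [2, 2, 2, 3]
```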

## Element-wise Function Application

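`applymap()` applies a function to every individual element of the DataFrame. In pandas 2.1 and later it is deprecated in favor of the equivalent `DataFrame.map()`, so the calls below may emit a `FutureWarning` on recent versions.

Let's use a lambda function to multiply every element in the DataFrame by 100: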

In [16]:
display(df)
df.applymap(lambda x: x*100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 0 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 4 | 2 |


|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 100 | 100 | -100 |
| 1 | 100 | 200 | 0 |
| 2 | 100 | 300 | 100 |
| 3 | 100 | 400 | 200 |


In [17]:
# Combining multiple functions
df.applymap(lambda x: x*100).apply(np.mean)

Col1    100.0
Col2    250.0
Col3     50.0
dtype: float64
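
To avoid depending on one pandas version, the element-wise step can dispatch to `DataFrame.map()` when it exists (pandas 2.1+) and fall back to `applymap()` otherwise. A sketch; the `elementwise` helper is a hypothetical name, not pandas API:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})


def elementwise(frame, func):
    """Apply func to every element, preferring DataFrame.map when available."""
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func)
    return frame.applymap(func)


scaled = elementwise(df, lambda x: x * 100)
print(scaled.mean().tolist())  # [100.0, 250.0, 50.0], same column means as above
```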

`Series.map()` substitutes each value in a Series with another value, computed by a function or looked up in a dict or another Series.

Let's use `map()` on the Series `Col1`:

In [18]:
display(df.Col1)
df.Col1.map(lambda x: x*100)

0    1
1    1
2    1
3    1
Name: Col1, dtype: int64

0    100
1    100
2    100
3    100
Name: Col1, dtype: int64
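
When `map()` receives a dict, it acts as a lookup table, and values without a matching key become `NaN`. A sketch with an illustrative ordinal encoding of the `Low`/`Medium`/`High` labels that appear later in this notebook:

```python
import pandas as pd

levels = pd.Series(['Low', 'High', 'Medium', 'Unknown'])

# Illustrative mapping; 'Unknown' has no key, so it maps to NaN
encoded = levels.map({'Low': 1, 'Medium': 2, 'High': 3})
print(encoded.tolist())  # [1.0, 3.0, 2.0, nan]
```

The `NaN` forces the result to a float dtype, which is why the codes print as `1.0`, `3.0`, and so on.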

`pipe()` can do the same over the whole DataFrame: the lambda receives the entire DataFrame, and multiplying a DataFrame by a scalar is already element-wise.

In [19]:
df.pipe(lambda x: x*100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 100 | 100 | -100 |
| 1 | 100 | 200 | 0 |
| 2 | 100 | 300 | 100 |
| 3 | 100 | 400 | 200 |


In [20]:
# Remember: these function applications do not modify the original DataFrame df
display(df)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 1 | -1 |
| 1 | 1 | 2 | 0 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 4 | 2 |


## Iterations

The behavior of basic iteration over Pandas objects depends on the type. 

When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. 

DataFrames, on the other hand, follow the dict-like convention: iterating over them yields the keys, i.e. the column labels.
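
A quick sketch of the two conventions on the small `df` used earlier:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 1, 1, 1], 'Col2': [1, 2, 3, 4], 'Col3': [-1, 0, 1, 2]})

series_items = list(df['Col2'])  # a Series iterates over its values
frame_items = list(df)           # a DataFrame iterates over its column labels

print(series_items)  # [1, 2, 3, 4]
print(frame_items)   # ['Col1', 'Col2', 'Col3']
```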

In [21]:
N = 3

In [22]:
dfi = pd.DataFrame({
    'A': pd.date_range(start='2021-02-15', periods=N, freq='D'),
    'B': np.linspace(0, stop=N-1, num=N),
    'C': np.random.rand(N),
    'D': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
    'E': np.random.normal(100, 10, size=N).tolist()
})
# Columns C, D and E are random, so the displayed values change on every run
dfi.head()


In [23]:
dfi = pd.DataFrame({
    'Date': pd.date_range(start='2022-05-15', periods=N, freq='D'),
    'Values': np.linspace(0, stop=N-1, num=N),
    'Categ': np.random.choice(['Low', 'Medium', 'High'], N).tolist()
})
dfi.head()

|   | Date | Values | Categ |
|---|------|--------|-------|
| 0 | 2022-05-15 | 0.0 | Low |
| 1 | 2022-05-16 | 1.0 | Medium |
| 2 | 2022-05-17 | 2.0 | Medium |


In [24]:
# Getting the names of the columns
for col in dfi:
    print(col)

Date
Values
Categ


To iterate over the contents of a DataFrame, we can use the following methods:
- `items()`: iterate over the columns as (column name, Series) pairs
- `iterrows()`: iterate over the rows as (index, Series) pairs
- `itertuples()`: iterate over the rows as namedtuples

`items()`: yields each column label as the key and the column's values as a Series. Older code uses `iteritems()`, which was deprecated in pandas 1.5 and removed in pandas 2.0.

In [25]:
for key, value in dfi.items():
    print('\nKey =', key)
    print('Value =\n', value)


Key = Date
Value =
 0   2022-05-15
1   2022-05-16
2   2022-05-17
Name: Date, dtype: datetime64[ns]

Key = Values
Value =
 0    0.0
1    1.0
2    2.0
Name: Values, dtype: float64

Key = Categ
Value =
 0       Low
1    Medium
2    Medium
Name: Categ, dtype: object
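
Because `items()` yields `(label, Series)` pairs like `dict.items()`, it fits naturally into a dict comprehension. A sketch that counts the distinct values in each column, using a deterministic copy of `dfi` with the values shown above:

```python
import pandas as pd

dfi = pd.DataFrame({
    'Date': pd.date_range(start='2022-05-15', periods=3, freq='D'),
    'Values': [0.0, 1.0, 2.0],
    'Categ': ['Low', 'Medium', 'Medium'],
})

# items() yields (column label, column Series) pairs
counts = {label: column.nunique() for label, column in dfi.items()}
print(counts)  # {'Date': 3, 'Values': 3, 'Categ': 2}
```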


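`iterrows()`: returns an iterator yielding each index value along with a Series containing the data in that row. Because each row becomes a single Series, values from columns of different types are upcast to a common dtype (here `object`).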

In [26]:
for row in dfi.iterrows():
    print(row, '\n')

(0, Date      2022-05-15 00:00:00
Values                    0.0
Categ                     Low
Name: 0, dtype: object) 

(1, Date      2022-05-16 00:00:00
Values                    1.0
Categ                  Medium
Name: 1, dtype: object) 

(2, Date      2022-05-17 00:00:00
Values                    2.0
Categ                  Medium
Name: 2, dtype: object) 
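
The upcasting also affects purely numeric frames: an integer column mixed with a float column comes back as floats. A sketch:

```python
import pandas as pd

dfi = pd.DataFrame({
    'Date': pd.date_range(start='2022-05-15', periods=3, freq='D'),
    'Values': [0.0, 1.0, 2.0],
    'Categ': ['Low', 'Medium', 'Medium'],
})

index, row = next(dfi.iterrows())
print(index, row.dtype)  # 0 object: mixed columns were upcast to object

nums = pd.DataFrame({'ints': [1, 2], 'floats': [1.5, 2.5]})
_, first = next(nums.iterrows())
print(first['ints'])       # 1.0: the int value was upcast to float in the row
print(nums['ints'].dtype)  # int64: the column itself is unchanged
```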



`itertuples()`: returns an iterator yielding a namedtuple for each row in the DataFrame. The first field of the tuple is the row's index value, and the remaining fields are the row values. To preserve dtypes while iterating over the rows, `itertuples()` is the better choice; it is also generally faster than `iterrows()`.

In [27]:
for row in dfi.itertuples():
    print(row,'\n')

Pandas(Index=0, Date=Timestamp('2022-05-15 00:00:00'), Values=0.0, Categ='Low') 

Pandas(Index=1, Date=Timestamp('2022-05-16 00:00:00'), Values=1.0, Categ='Medium') 

Pandas(Index=2, Date=Timestamp('2022-05-17 00:00:00'), Values=2.0, Categ='Medium') 
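
`itertuples()` accepts `index=False` to drop the `Index` field and `name=` to set the namedtuple class name; fields are then read by attribute. A sketch using a deterministic copy of `dfi`:

```python
import pandas as pd

dfi = pd.DataFrame({
    'Date': pd.date_range(start='2022-05-15', periods=3, freq='D'),
    'Values': [0.0, 1.0, 2.0],
    'Categ': ['Low', 'Medium', 'Medium'],
})

# index=False drops the Index field; name sets the namedtuple class name
for row in dfi.itertuples(index=False, name='Row'):
    print(row.Date.date(), row.Values, row.Categ)

first = next(dfi.itertuples(index=False, name='Row'))
print(first._fields)  # ('Date', 'Values', 'Categ')
```

Column names that are not valid Python identifiers, are repeated, or start with an underscore are replaced by positional field names, so attribute access works best with simple column labels.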



## Sorting

There are two kinds of sorting available in Pandas: by label, with `sort_index()`, and by actual values, with `sort_values()`.

In [28]:
unsorted_df=pd.DataFrame({
    'Z': np.random.randint(0, 5, size=10),
    'A': np.random.randint(2, 5, size=10),
    'K': np.random.randint(1, 8, size=10)},
    index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7])    
unsorted_df

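|   | Z | A | K |
|---|---|---|---|
| 1 | 1 | 4 | 4 |
| 4 | 2 | 2 | 4 |
| 6 | 0 | 3 | 3 |
| 2 | 4 | 2 | 6 |
| 3 | 0 | 2 | 7 |
| 5 | 3 | 4 | 4 |
| 9 | 4 | 4 | 1 |
| 8 | 1 | 3 | 5 |
| 0 | 2 | 4 | 6 |
| 7 | 3 | 4 | 5 |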


In [29]:
# Sorting by label
unsorted_df.sort_index()

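|   | Z | A | K |
|---|---|---|---|
| 0 | 2 | 4 | 6 |
| 1 | 1 | 4 | 4 |
| 2 | 4 | 2 | 6 |
| 3 | 0 | 2 | 7 |
| 4 | 2 | 2 | 4 |
| 5 | 3 | 4 | 4 |
| 6 | 0 | 3 | 3 |
| 7 | 3 | 4 | 5 |
| 8 | 1 | 3 | 5 |
| 9 | 4 | 4 | 1 |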


In [30]:
# Sorting labels in descending order
unsorted_df.sort_index(ascending=False)

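|   | Z | A | K |
|---|---|---|---|
| 9 | 4 | 4 | 1 |
| 8 | 1 | 3 | 5 |
| 7 | 3 | 4 | 5 |
| 6 | 0 | 3 | 3 |
| 5 | 3 | 4 | 4 |
| 4 | 2 | 2 | 4 |
| 3 | 0 | 2 | 7 |
| 2 | 4 | 2 | 6 |
| 1 | 1 | 4 | 4 |
| 0 | 2 | 4 | 6 |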


In [31]:
# Sort the column names
unsorted_df.sort_index(axis=1)

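|   | A | K | Z |
|---|---|---|---|
| 1 | 4 | 4 | 1 |
| 4 | 2 | 4 | 2 |
| 6 | 3 | 3 | 0 |
| 2 | 2 | 6 | 4 |
| 3 | 2 | 7 | 0 |
| 5 | 4 | 4 | 3 |
| 9 | 4 | 1 | 4 |
| 8 | 3 | 5 | 1 |
| 0 | 4 | 6 | 2 |
| 7 | 4 | 5 | 3 |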


In [32]:
# Sort the column names in descending order
unsorted_df.sort_index(axis=1, ascending=False)

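|   | Z | K | A |
|---|---|---|---|
| 1 | 1 | 4 | 4 |
| 4 | 2 | 4 | 2 |
| 6 | 0 | 3 | 3 |
| 2 | 4 | 6 | 2 |
| 3 | 0 | 7 | 2 |
| 5 | 3 | 4 | 4 |
| 9 | 4 | 1 | 4 |
| 8 | 1 | 5 | 3 |
| 0 | 2 | 6 | 4 |
| 7 | 3 | 5 | 4 |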


Sorting by value

In [33]:
# Sorting values by column Z
unsorted_df.sort_values(by='Z')

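|   | Z | A | K |
|---|---|---|---|
| 6 | 0 | 3 | 3 |
| 3 | 0 | 2 | 7 |
| 1 | 1 | 4 | 4 |
| 8 | 1 | 3 | 5 |
| 4 | 2 | 2 | 4 |
| 0 | 2 | 4 | 6 |
| 5 | 3 | 4 | 4 |
| 7 | 3 | 4 | 5 |
| 2 | 4 | 2 | 6 |
| 9 | 4 | 4 | 1 |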
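
Several rows tie on `Z`, and their relative order then depends on the sorting algorithm. The default `kind='quicksort'` is not guaranteed to be stable; `kind='mergesort'` (or `'stable'`) keeps tied rows in their original order. The `kind` option only applies when sorting on a single column. A sketch on a small deterministic frame rather than the random `unsorted_df`:

```python
import pandas as pd

ties = pd.DataFrame({'Z': [1, 0, 1, 0], 'tag': ['a', 'b', 'c', 'd']})

# mergesort is stable: rows with equal Z keep their original relative order
stable = ties.sort_values(by='Z', kind='mergesort')
print(stable['tag'].tolist())  # ['b', 'd', 'a', 'c']
```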


In [34]:
# Sorting values by columns Z and A
unsorted_df.sort_values(by=['Z','A'])

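|   | Z | A | K |
|---|---|---|---|
| 3 | 0 | 2 | 7 |
| 6 | 0 | 3 | 3 |
| 8 | 1 | 3 | 5 |
| 1 | 1 | 4 | 4 |
| 4 | 2 | 2 | 4 |
| 0 | 2 | 4 | 6 |
| 5 | 3 | 4 | 4 |
| 7 | 3 | 4 | 5 |
| 2 | 4 | 2 | 6 |
| 9 | 4 | 4 | 1 |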
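
`ascending` also accepts a list with one entry per sort key, so each column can be sorted in its own direction, and `ignore_index=True` renumbers the result from 0. A sketch on a small deterministic frame:

```python
import pandas as pd

df = pd.DataFrame({'Z': [1, 0, 1, 0], 'A': [2, 3, 4, 1]})

# Z ascending, then A descending within equal Z values; index reset to 0..n-1
mixed = df.sort_values(by=['Z', 'A'], ascending=[True, False], ignore_index=True)
print(mixed)
```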


Reference:
- VanderPlas, J. (2017). *Python Data Science Handbook: Essential Tools for Working with Data*. USA: O’Reilly Media, Inc. Chapter 3.