# Function Application

To apply your own or another library’s functions to pandas objects, you should be aware of three important methods:
- Table-wise function application: `pipe()`
- Row- or column-wise function application: `apply()`
- Element-wise function application: `applymap()` (deprecated since pandas 2.1 in favor of `DataFrame.map()`)

The appropriate method depends on whether your function expects to operate on an entire DataFrame, on one row or column at a time, or on individual elements.

In [1]:
# !pip install numpy
# !pip install pandas

In [2]:
import numpy as np
import pandas as pd

Defining a function for adding two elements:

In [3]:
def add2(elem1, elem2):
    return elem1 + elem2

Creating a DataFrame df:

In [4]:
df = pd.DataFrame({'Col1':[1,2,3,4],'Col2':[5,6,7,8],'Col3':[9,10,11,12]})
df

Unnamed: 0,Col1,Col2,Col3
0,1,5,9
1,2,6,10
2,3,7,11
3,4,8,12


## Table-wise Function Application

**Table-wise Function Application**: `pipe()` passes the whole DataFrame as the first argument to the given function, followed by any additional arguments supplied to `pipe()`. The operation is therefore performed on the entire DataFrame, and `df.pipe(add2, 5)` is equivalent to `add2(df, 5)`.

In the example below, we use `pipe()` to add the value 5 to every element of the DataFrame:

In [5]:
df.pipe(add2,5)

Unnamed: 0,Col1,Col2,Col3
0,6,10,14
1,7,11,15
2,8,12,16
3,9,13,17


In [6]:
df.pipe(add2,100)

Unnamed: 0,Col1,Col2,Col3
0,101,105,109
1,102,106,110
2,103,107,111
3,104,108,112
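
The main benefit of `pipe()` shows up when several table-wise steps are chained. The following is a minimal sketch of that idea; the helper names `add_value` and `scale` are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Hypothetical helpers: each takes a DataFrame first and returns a new one.
def add_value(frame, value):
    return frame + value

def scale(frame, factor=1):
    return frame * factor

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

# Chained form: reads top to bottom in the order the steps run.
result = df.pipe(add_value, 5).pipe(scale, factor=2)

# Equivalent nested form: reads inside out.
nested = scale(add_value(df, 5), factor=2)

print(result)
print(result.equals(nested))
```

Extra positional and keyword arguments given to `pipe()` are forwarded to the function, and the original `df` is left unchanged because each helper returns a new object.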


## Row or Column Wise Function Application

`apply()` applies a function along one axis of the DataFrame: to each column (`axis=0`, the default) or to each row (`axis=1`).

In [7]:
# Calculating mean by columns
df.apply(np.mean)

Col1     2.5
Col2     6.5
Col3    10.5
dtype: float64

In [8]:
# Calculating sum by columns
df.apply(np.sum)

Col1    10
Col2    26
Col3    42
dtype: int64

In [9]:
# The parameter axis=0 gives the same result
df.apply(np.mean,axis=0)

Col1     2.5
Col2     6.5
Col3    10.5
dtype: float64

In [10]:
# Using the parameter axis=1, operations will be performed row wise:
df.apply(np.mean,axis=1)

0    5.0
1    6.0
2    7.0
3    8.0
dtype: float64

We can use a lambda function to compute the range (max - min) of each column:

In [11]:
display(df)
df.apply(lambda x: x.max()-x.min())

Unnamed: 0,Col1,Col2,Col3
0,1,5,9
1,2,6,10
2,3,7,11
3,4,8,12


Col1    3
Col2    3
Col3    3
dtype: int64

Or of each row, by specifying `axis=1`:

In [12]:
df.apply(lambda x: x.max()-x.min(), axis=1)

0    8
1    8
2    8
3    8
dtype: int64
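
The same range calculation can be written as a named function, which makes it easier to pass extra arguments through `apply()`. This is only a sketch: `value_range` and its `scale` parameter are hypothetical names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

# Hypothetical helper: receives one column (or row) as a Series.
def value_range(values, scale=1):
    return (values.max() - values.min()) * scale

col_ranges = df.apply(value_range)             # one result per column
row_ranges = df.apply(value_range, axis=1)     # one result per row
scaled = df.apply(value_range, scale=10)       # keyword forwarded to value_range

# Returning a Series from the function yields a DataFrame of results.
summary = df.apply(lambda col: pd.Series({'min': col.min(), 'max': col.max()}))

print(col_ranges, row_ranges, scaled, summary, sep='\n\n')
```

Keyword arguments that `apply()` does not recognise itself are handed on to the applied function, which is how `scale=10` reaches `value_range`.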

## Element Wise Function Application

**applymap()**: applies the specified function to every element of the DataFrame. In pandas 2.1 and later, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`.

Let's use a lambda function to multiply every element in the DataFrame by 100:

In [13]:
display(df)
df.applymap(lambda x: x*100)

Unnamed: 0,Col1,Col2,Col3
0,1,5,9
1,2,6,10
2,3,7,11
3,4,8,12


Unnamed: 0,Col1,Col2,Col3
0,100,500,900
1,200,600,1000
2,300,700,1100
3,400,800,1200
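
Since `applymap()` is deprecated in pandas 2.1 and later in favor of `DataFrame.map()`, code that has to run on both older and newer versions needs to pick whichever method is available. The sketch below is one possible way to do that, not an official recipe; it also compares the result with the plain vectorized expression `df * 100`, which is usually the simpler choice for arithmetic.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4],
                   'Col2': [5, 6, 7, 8],
                   'Col3': [9, 10, 11, 12]})

# Use DataFrame.map when the installed pandas provides it,
# otherwise fall back to the older applymap.
if hasattr(pd.DataFrame, 'map'):
    elementwise = df.map
else:
    elementwise = df.applymap

by_function = elementwise(lambda x: x * 100)

# For simple arithmetic, the vectorized form gives the same values.
vectorized = df * 100

print(by_function)
print(by_function.equals(vectorized))
```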


**map()** is used to substitute each value in a Series with another value.

Let's use the **map()** function on the Series `Col1`:

In [14]:
display(df)
df['Col1'].map(lambda x: x*100)

Unnamed: 0,Col1,Col2,Col3
0,1,5,9
1,2,6,10
2,3,7,11
3,4,8,12


0    100
1    200
2    300
3    400
Name: Col1, dtype: int64
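
Besides a function, `Series.map()` also accepts a dictionary, which is a common way to substitute values. The example below is a small illustration with made-up data; the `levels` Series and the mapping are assumptions, not part of the notebook above.

```python
import pandas as pd

levels = pd.Series(['Low', 'Medium', 'High', 'Low'])

# Each value is looked up in the dictionary and replaced by its match.
codes = levels.map({'Low': 0, 'Medium': 1, 'High': 2})

# Values without a matching key become NaN.
with_missing = pd.Series(['Low', 'Unknown']).map({'Low': 0})

print(codes)
print(with_missing)
```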

Let's use **pipe()** to do the same over the whole DataFrame:

In [15]:
df.pipe(lambda x: x*100)

Unnamed: 0,Col1,Col2,Col3
0,100,500,900
1,200,600,1000
2,300,700,1100
3,400,800,1200


## Iterations

The behavior of basic iteration over Pandas objects depends on the type. 

When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. 

Other data structures, like DataFrames, follow the dict-like convention of iterating over the keys of the objects.

In [16]:
N = 3

In [17]:
df = pd.DataFrame({
   'A': pd.date_range(start='2021-02-15',periods=N,freq='D'),
   'B': np.linspace(0,stop=N-1,num=N),
   'C': np.random.rand(N),
   'D': np.random.choice(['Low','Medium','High'],N).tolist(),
   'E': np.random.normal(100, 10, size=(N)).tolist()
})
df.head()

Unnamed: 0,A,B,C,D,E
0,2021-02-15,0.0,0.012791,Low,99.884162
1,2021-02-16,1.0,0.154102,Medium,103.726948
2,2021-02-17,2.0,0.58049,High,90.140869


In [18]:
# Getting the names of the columns
for col in df:
    print(col)

A
B
C
D
E
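
The difference between the two conventions can be seen side by side. This is a small sketch with made-up data (`series` and `frame` are illustrative names):

```python
import pandas as pd

series = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
frame = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Iterating over a Series yields its values (array-like behavior).
series_items = list(series)

# Iterating over a DataFrame yields its column labels (dict-like behavior).
frame_items = list(frame)

print(series_items)
print(frame_items)
```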


To iterate over the columns or rows of a DataFrame, we can use the following methods:
- `items()`: iterate over the columns as (label, Series) pairs. This method was also available as `iteritems()`, which was removed in pandas 2.0.
- `iterrows()`: iterate over the rows as (index, Series) pairs
- `itertuples()`: iterate over the rows as namedtuples

**`items()`**: iterates over the columns, yielding each column label as the key and the column contents as a Series.

In [19]:
for key,value in df.items():
   print('\nKey =', key)
   print('Value =\n',value)


Key = A
Value =
 0   2021-02-15
1   2021-02-16
2   2021-02-17
Name: A, dtype: datetime64[ns]

Key = B
Value =
 0    0.0
1    1.0
2    2.0
Name: B, dtype: float64

Key = C
Value =
 0    0.012791
1    0.154102
2    0.580490
Name: C, dtype: float64

Key = D
Value =
 0       Low
1    Medium
2      High
Name: D, dtype: object

Key = E
Value =
 0     99.884162
1    103.726948
2     90.140869
Name: E, dtype: float64
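
Because each column arrives as a (label, Series) pair, the loop can feed a per-column computation directly. The sketch below uses `items()` and a made-up `frame`:

```python
import pandas as pd

frame = pd.DataFrame({'A': [1, 2, 3], 'B': [0.5, 1.5, 2.5]})

# Build a dictionary of column totals from the (label, Series) pairs.
totals = {label: column.sum() for label, column in frame.items()}

print(totals)
```

For a simple aggregation like this, `frame.sum()` gives the same numbers without an explicit loop; the loop form is mainly useful when each column needs different handling.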


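**`iterrows()`**: returns an iterator yielding each index value along with a Series containing the data in that row. Because each row is packed into a single Series, the original column dtypes are not preserved; in the output below, every row has dtype `object`.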

In [20]:
for row in df.iterrows():
   print(row,'\n')

(0, A    2021-02-15 00:00:00
B                    0.0
C               0.012791
D                    Low
E              99.884162
Name: 0, dtype: object) 

(1, A    2021-02-16 00:00:00
B                    1.0
C               0.154102
D                 Medium
E             103.726948
Name: 1, dtype: object) 

(2, A    2021-02-17 00:00:00
B                    2.0
C                0.58049
D                   High
E              90.140869
Name: 2, dtype: object) 
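
The loss of dtypes is easiest to see with purely numeric columns. In the following sketch (with a made-up `frame`), the integer column is upcast to float when a row is packed into a Series:

```python
import pandas as pd

frame = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# Each item from iterrows() is an (index, Series) pair, so it can be unpacked.
index, row = next(frame.iterrows())

print(frame['int_col'].dtype)   # integer dtype in the DataFrame
print(row.dtype)                # a single float dtype for the whole row
print(row['int_col'])           # the integer 1 now appears as a float
```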



**`itertuples()`**: returns an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple is the row’s index value, and the remaining elements are the row values. To preserve dtypes while iterating over the rows, it is better to use `itertuples()` than `iterrows()`.

In [21]:
for row in df.itertuples():
    print(row,'\n')

Pandas(Index=0, A=Timestamp('2021-02-15 00:00:00'), B=0.0, C=0.01279133140863209, D='Low', E=99.88416197137813) 

Pandas(Index=1, A=Timestamp('2021-02-16 00:00:00'), B=1.0, C=0.1541024551908443, D='Medium', E=103.72694837141779) 

Pandas(Index=2, A=Timestamp('2021-02-17 00:00:00'), B=2.0, C=0.5804900039570102, D='High', E=90.1408688856469) 
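
The fields of each namedtuple can be read by attribute, and the `index` and `name` parameters of `itertuples()` control the shape of the result. A short sketch with a made-up `frame`:

```python
import pandas as pd

frame = pd.DataFrame({'int_col': [1, 2], 'float_col': [1.5, 2.5]})

# Default: a namedtuple called "Pandas" with the index as the first field.
first = next(frame.itertuples())
print(first.Index, first.int_col, first.float_col)

# index=False drops the index; name=None yields plain tuples.
plain = list(frame.itertuples(index=False, name=None))
print(plain)
```

Unlike with `iterrows()`, the integer column stays an integer here, because each value is taken from its own column rather than from a combined row Series.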



## Sorting

There are two kinds of sorting available in pandas: by label, using `sort_index()`, and by actual values, using `sort_values()`.

In [22]:
unsorted_df=pd.DataFrame(np.random.randn(10,2),
            index=[1,4,6,2,3,5,9,8,0,7],columns=['ZZZ','AAA'])
unsorted_df

Unnamed: 0,ZZZ,AAA
1,0.47957,-0.792184
4,0.234167,1.75848
6,0.926081,-1.12079
2,2.223561,0.014275
3,-0.442473,0.871344
5,-0.775522,1.405597
9,0.111265,-0.188005
8,1.527415,-0.437406
0,-1.381337,0.139278
7,-0.320916,0.66203


In [23]:
# Sorting by label
sort_df=unsorted_df.sort_index()
sort_df

Unnamed: 0,ZZZ,AAA
0,-1.381337,0.139278
1,0.47957,-0.792184
2,2.223561,0.014275
3,-0.442473,0.871344
4,0.234167,1.75848
5,-0.775522,1.405597
6,0.926081,-1.12079
7,-0.320916,0.66203
8,1.527415,-0.437406
9,0.111265,-0.188005


In [24]:
sort_df=unsorted_df.sort_index(ascending=False)
sort_df

Unnamed: 0,ZZZ,AAA
9,0.111265,-0.188005
8,1.527415,-0.437406
7,-0.320916,0.66203
6,0.926081,-1.12079
5,-0.775522,1.405597
4,0.234167,1.75848
3,-0.442473,0.871344
2,2.223561,0.014275
1,0.47957,-0.792184
0,-1.381337,0.139278


In [25]:
# Sort the columns
sort_df=unsorted_df.sort_index(axis=1)
sort_df

Unnamed: 0,AAA,ZZZ
1,-0.792184,0.47957
4,1.75848,0.234167
6,-1.12079,0.926081
2,0.014275,2.223561
3,0.871344,-0.442473
5,1.405597,-0.775522
9,-0.188005,0.111265
8,-0.437406,1.527415
0,0.139278,-1.381337
7,0.66203,-0.320916


Sorting by value, using `sort_values()`:

In [26]:
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
print('Unsorted dataframe\n',unsorted_df)
sorted_df = unsorted_df.sort_values(by='col1')
print('\nSorted by col1\n',sorted_df)

Unsorted dataframe
    col1  col2
0     2     1
1     1     3
2     1     2
3     1     4

Sorted by col1
    col1  col2
1     1     3
2     1     2
3     1     4
0     2     1


In [27]:
sorted_df = unsorted_df.sort_values(by=['col1','col2'])
sorted_df

Unnamed: 0,col1,col2
2,1,2
1,1,3
3,1,4
0,2,1
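
`sort_values()` has a few more options that are often useful when sorting by several columns. The sketch below reuses the same data and shows a per-column sort direction, a stable sort that keeps tied rows in their original order, and a reset of the index; the variable names are illustrative.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# col1 ascending, then col2 descending within equal col1 values.
mixed = unsorted_df.sort_values(by=['col1', 'col2'], ascending=[True, False])

# A stable algorithm guarantees that ties keep their original order.
stable = unsorted_df.sort_values(by='col1', kind='mergesort')

# ignore_index=True relabels the result 0..n-1 (available since pandas 1.0).
relabeled = unsorted_df.sort_values(by=['col1', 'col2'], ignore_index=True)

print(mixed, stable, relabeled, sep='\n\n')
```

All three calls return new DataFrames and leave `unsorted_df` as it was.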


Reference:
- VanderPlas, J. (2017) Python Data Science Handbook: Essential Tools for Working with Data. USA: O’Reilly Media, Inc. chapter 3