# Function Application

To apply your own functions, or another library’s, to Pandas objects, you should be aware of three important methods:
- Table-wise function application: `pipe()`
- Row- or column-wise function application: `apply()`
- Element-wise function application: `applymap()` (renamed to `DataFrame.map()` in pandas 2.1)

Which method to use depends on whether your function operates on the entire DataFrame, on rows or columns, or on individual elements.

In [1]:
# !pip install numpy
# !pip install pandas

In [2]:
import numpy as np
import pandas as pd

In [3]:
# Defining a function for adding two elements
def add2(elem1, elem2):
    return elem1 + elem2

In [4]:
# Creating a DataFrame df
df = pd.DataFrame({'Col1':[1,2,3,4,5],'Col2':[6,7,8,9,10],'Col3':[11,12,13,14,15]})
df

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 6 | 11 |
| 1 | 2 | 7 | 12 |
| 2 | 3 | 8 | 13 |
| 3 | 4 | 9 | 14 |
| 4 | 5 | 10 | 15 |


## Table-wise Function Application

**Table-wise Function Application**: `pipe()` passes the entire DataFrame as the first argument to your function, followed by any additional arguments you supply, so `df.pipe(add2, 5)` is equivalent to `add2(df, 5)`. The custom operation is therefore performed on the whole DataFrame at once.

In the example below, we use `pipe()` to add 5 to every value in the DataFrame.

In [5]:
df.pipe(add2,5)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 6 | 11 | 16 |
| 1 | 7 | 12 | 17 |
| 2 | 8 | 13 | 18 |
| 3 | 9 | 14 | 19 |
| 4 | 10 | 15 | 20 |


In [6]:
df.pipe(add2,100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 101 | 106 | 111 |
| 1 | 102 | 107 | 112 |
| 2 | 103 | 108 | 113 |
| 3 | 104 | 109 | 114 |
| 4 | 105 | 110 | 115 |
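
A minimal sketch of how `pipe()` forwards its arguments, reusing the same `df` and `add2`. `scale` is a hypothetical helper whose DataFrame parameter is not the first one; for that case `pipe()` also accepts a `(callable, keyword)` tuple naming the parameter that should receive the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5],
                   'Col2': [6, 7, 8, 9, 10],
                   'Col3': [11, 12, 13, 14, 15]})

def add2(elem1, elem2):
    return elem1 + elem2

# Hypothetical helper: expects the DataFrame as the keyword argument `data`
def scale(factor, data):
    return data * factor

# df.pipe(add2, 5) calls add2(df, 5)
assert df.pipe(add2, 5).equals(add2(df, 5))

# The tuple form calls scale(2, data=<previous result>); pipes can be chained
result = df.pipe(add2, 5).pipe((scale, 'data'), 2)
print(result)
```

Chaining `pipe()` calls keeps a sequence of DataFrame-level transformations readable from left to right instead of nesting function calls.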


## Row or Column Wise Function Application

`apply()` applies a function along an axis of the DataFrame: with the default `axis=0` the function receives each column as a Series, and with `axis=1` it receives each row.

In [7]:
# Calculating mean by columns
df.apply(np.mean)

Col1     3.0
Col2     8.0
Col3    13.0
dtype: float64

In [8]:
# Calculating sum by columns
df.apply(np.sum)

Col1    15
Col2    40
Col3    65
dtype: int64

In [9]:
# axis=0 is the default, so this gives the same result
df.apply(np.mean,axis=0)

Col1     3.0
Col2     8.0
Col3    13.0
dtype: float64

In [10]:
# Using the parameter axis=1, operations will be performed row wise:
df.apply(np.mean,axis=1)

0     6.0
1     7.0
2     8.0
3     9.0
4    10.0
dtype: float64

We can use a lambda function to compute the range (max - min) of each column:

In [11]:
display(df)
df.apply(lambda x: x.max()-x.min())

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 6 | 11 |
| 1 | 2 | 7 | 12 |
| 2 | 3 | 8 | 13 |
| 3 | 4 | 9 | 14 |
| 4 | 5 | 10 | 15 |


Col1    4
Col2    4
Col3    4
dtype: int64

or of each row, by specifying `axis=1`:

In [12]:
df.apply(lambda x: x.max()-x.min(), axis=1)

0    10
1    10
2    10
3    10
4    10
dtype: int64
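
`apply()` can do more than reduce each column to a scalar. A minimal sketch, reusing the same `df`: if the function returns a Series, `apply()` assembles the results into a DataFrame, and extra positional arguments can be forwarded through the `args` parameter. The `min`/`max` labels and the offset of 100 are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5],
                   'Col2': [6, 7, 8, 9, 10],
                   'Col3': [11, 12, 13, 14, 15]})

# The function receives one column (a Series) at a time; returning a
# Series makes apply() build a DataFrame with one row per returned label
summary = df.apply(lambda col: pd.Series({'min': col.min(), 'max': col.max()}))
print(summary)

# Extra positional arguments for the function are passed through `args`
shifted = df.apply(lambda col, k: col + k, args=(100,))
print(shifted)
```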

## Element Wise Function Application

`applymap()` applies a function to every individual element of the DataFrame. In pandas 2.1 and later, `applymap()` is deprecated in favor of the equivalent `DataFrame.map()`.

Let's use a lambda function to multiply every element of the DataFrame by 100:

In [13]:
display(df)
df.applymap(lambda x: x*100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 6 | 11 |
| 1 | 2 | 7 | 12 |
| 2 | 3 | 8 | 13 |
| 3 | 4 | 9 | 14 |
| 4 | 5 | 10 | 15 |


|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 100 | 600 | 1100 |
| 1 | 200 | 700 | 1200 |
| 2 | 300 | 800 | 1300 |
| 3 | 400 | 900 | 1400 |
| 4 | 500 | 1000 | 1500 |
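
Because `applymap()` is deprecated in pandas 2.1 and later, here is a sketch that uses `DataFrame.map()` when it exists and falls back to `applymap()` on older versions. `to_label` is a hypothetical element-wise function that zero-pads each integer to three digits; it receives one scalar at a time.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5],
                   'Col2': [6, 7, 8, 9, 10],
                   'Col3': [11, 12, 13, 14, 15]})

# Hypothetical element-wise function: called once per cell
def to_label(x):
    return f'{x:03d}'

# pandas >= 2.1 provides DataFrame.map; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    labels = df.map(to_label)
else:
    labels = df.applymap(to_label)
print(labels)
```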


`Series.map()` substitutes each value in a Series with another value, computed by a function or looked up in a dict or another Series.

Let's use `map()` on the Series `Col1`:

In [14]:
display(df)
df['Col1'].map(lambda x: x*100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 1 | 6 | 11 |
| 1 | 2 | 7 | 12 |
| 2 | 3 | 8 | 13 |
| 3 | 4 | 9 | 14 |
| 4 | 5 | 10 | 15 |


0    100
1    200
2    300
3    400
4    500
Name: Col1, dtype: int64
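
`map()` also accepts a dict, which turns it into a lookup-based substitution. A minimal sketch on the same `Col1`; the number-to-word mapping is purely illustrative, and values that are missing from the dict become `NaN`.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4, 5],
                   'Col2': [6, 7, 8, 9, 10],
                   'Col3': [11, 12, 13, 14, 15]})

# A dict acts as a lookup table; 4 and 5 have no entry, so they map to NaN
labels = df['Col1'].map({1: 'one', 2: 'two', 3: 'three'})
print(labels)
```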

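Because multiplication is vectorized, `pipe()` can do the same over the whole DataFrame in a single call: the lambda receives the entire DataFrame once rather than one element at a time.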

In [15]:
df.pipe(lambda x: x*100)

|   | Col1 | Col2 | Col3 |
|---|------|------|------|
| 0 | 100 | 600 | 1100 |
| 1 | 200 | 700 | 1200 |
| 2 | 300 | 800 | 1300 |
| 3 | 400 | 900 | 1400 |
| 4 | 500 | 1000 | 1500 |


## Iterations

The behavior of basic iteration over Pandas objects depends on the type. 

When iterating over a Series, it is regarded as array-like, and basic iteration produces the values. 

Other data structures, such as DataFrames, follow the dict-like convention of iterating over the keys of the object; for a DataFrame, the keys are the column labels.
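
A short sketch contrasting the two conventions on small illustrative objects: iterating a Series yields its values, iterating a DataFrame yields its column labels, and `Series.items()` gives the (label, value) pairs when both are needed.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
frame = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})

# A Series iterates like an array: values, not index labels
series_values = [v for v in s]
# A DataFrame iterates like a dict: column labels
frame_keys = [k for k in frame]
# items() gives the (label, value) pairs of a Series
series_pairs = list(s.items())

print(series_values)  # [10, 20, 30]
print(frame_keys)     # ['x', 'y']
print(series_pairs)   # [('a', 10), ('b', 20), ('c', 30)]
```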

In [16]:
N = 3

In [17]:
df = pd.DataFrame({
   'A': pd.date_range(start='2021-02-15',periods=N,freq='D'),
   'B': np.linspace(0,stop=N-1,num=N),
   'C': np.random.rand(N),
   'D': np.random.choice(['Low','Medium','High'],N).tolist(),
   'E': np.random.normal(100, 10, size=(N)).tolist()
})
df.head()

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| 0 | 2021-02-15 | 0.0 | 0.050627 | Medium | 105.976508 |
| 1 | 2021-02-16 | 1.0 | 0.391913 | Low | 96.60378 |
| 2 | 2021-02-17 | 2.0 | 0.856296 | Low | 107.349941 |


In [18]:
# Getting the names of the columns
for col in df:
    print(col)

A
B
C
D
E


The following methods iterate over the contents of a DataFrame:
- `items()`: iterate over the columns as (column label, Series) pairs
- `iterrows()`: iterate over the rows as (index, Series) pairs
- `itertuples()`: iterate over the rows as namedtuples

`items()`: yields each column label as the key and the column's values as a Series. It replaces `iteritems()`, which was deprecated in pandas 1.5 and removed in pandas 2.0.

In [19]:
for key, value in df.items():
    print('\nKey =', key)
    print('Value =\n', value)


Key = A
Value =
 0   2021-02-15
1   2021-02-16
2   2021-02-17
Name: A, dtype: datetime64[ns]

Key = B
Value =
 0    0.0
1    1.0
2    2.0
Name: B, dtype: float64

Key = C
Value =
 0    0.050627
1    0.391913
2    0.856296
Name: C, dtype: float64

Key = D
Value =
 0    Medium
1       Low
2       Low
Name: D, dtype: object

Key = E
Value =
 0    105.976508
1     96.603780
2    107.349941
Name: E, dtype: float64


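`iterrows()`: returns an iterator yielding each index value together with a Series containing the data in that row. Because each row is stored in a single Series, values of different dtypes are upcast to a common dtype (here `object`, as the output shows).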

In [20]:
for row in df.iterrows():
    print(row, '\n')

(0, A    2021-02-15 00:00:00
B                    0.0
C               0.050627
D                 Medium
E             105.976508
Name: 0, dtype: object) 

(1, A    2021-02-16 00:00:00
B                    1.0
C               0.391913
D                    Low
E               96.60378
Name: 1, dtype: object) 

(2, A    2021-02-17 00:00:00
B                    2.0
C               0.856296
D                    Low
E             107.349941
Name: 2, dtype: object) 
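
The upcasting is easier to see on a small hypothetical frame with one integer and one float column. In this sketch, the row Series from `iterrows()` turns the integer into a float, while `itertuples()` keeps the original type.

```python
import pandas as pd

# Small hypothetical frame: one int column, one float column
df = pd.DataFrame({'i': [1, 2], 'f': [0.5, 1.5]})

# Each row is packed into a single Series, so the int and float
# values are upcast to one common dtype
idx, first_row = next(df.iterrows())
print(first_row.dtype)   # float64
print(first_row['i'])    # 1.0

# itertuples() keeps each value's original type
first_tuple = next(df.itertuples())
print(first_tuple.i)     # 1
```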



`itertuples()`: returns an iterator yielding a namedtuple for each row in the DataFrame. The first element of the tuple is the row’s index value, and the remaining elements are the row values. Unlike `iterrows()`, `itertuples()` preserves the dtypes of the values, and it is generally faster.

In [21]:
for row in df.itertuples():
    print(row,'\n')

Pandas(Index=0, A=Timestamp('2021-02-15 00:00:00'), B=0.0, C=0.05062735182638867, D='Medium', E=105.97650822625796) 

Pandas(Index=1, A=Timestamp('2021-02-16 00:00:00'), B=1.0, C=0.3919129548161956, D='Low', E=96.60377994449843) 

Pandas(Index=2, A=Timestamp('2021-02-17 00:00:00'), B=2.0, C=0.8562962028381961, D='Low', E=107.3499407883847) 
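
Since each row is a namedtuple, its fields can be read by column name. A minimal sketch on a hypothetical frame that mirrors columns `B` and `D` above (with fixed values instead of random ones); the `index` and `name` parameters control whether the index is included and whether plain tuples are returned instead.

```python
import pandas as pd

df = pd.DataFrame({'B': [0.0, 1.0, 2.0], 'D': ['Medium', 'Low', 'Low']})

# Fields are accessed by column name; `Index` holds the row label
for row in df.itertuples():
    print(row.Index, row.B, row.D)

# index=False drops the Index field; name=None yields plain tuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```

Column names that are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional names in the namedtuple, so attribute access only works for ordinary column names.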



## Sorting

There are two kinds of sorting available in Pandas: by label, with `sort_index()`, and by value, with `sort_values()`.

In [22]:
unsorted_df=pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],columns=['ZZZ','AAA'])
unsorted_df

|   | ZZZ | AAA |
|---|-----|-----|
| 1 | 0.266684 | -0.896683 |
| 4 | 0.803401 | -0.04276 |
| 6 | -1.923721 | 0.121504 |
| 2 | 0.303728 | -2.16719 |
| 3 | -0.451747 | -1.631839 |
| 5 | 0.99676 | -1.898521 |
| 9 | 1.003958 | -0.49565 |
| 8 | 1.04826 | -0.814264 |
| 0 | 2.72736 | -0.337402 |
| 7 | 0.015127 | 0.063375 |


In [23]:
# Sorting by label
sort_df=unsorted_df.sort_index()
sort_df

|   | ZZZ | AAA |
|---|-----|-----|
| 0 | 2.72736 | -0.337402 |
| 1 | 0.266684 | -0.896683 |
| 2 | 0.303728 | -2.16719 |
| 3 | -0.451747 | -1.631839 |
| 4 | 0.803401 | -0.04276 |
| 5 | 0.99676 | -1.898521 |
| 6 | -1.923721 | 0.121504 |
| 7 | 0.015127 | 0.063375 |
| 8 | 1.04826 | -0.814264 |
| 9 | 1.003958 | -0.49565 |


In [24]:
sort_df=unsorted_df.sort_index(ascending=False)
sort_df

|   | ZZZ | AAA |
|---|-----|-----|
| 9 | 1.003958 | -0.49565 |
| 8 | 1.04826 | -0.814264 |
| 7 | 0.015127 | 0.063375 |
| 6 | -1.923721 | 0.121504 |
| 5 | 0.99676 | -1.898521 |
| 4 | 0.803401 | -0.04276 |
| 3 | -0.451747 | -1.631839 |
| 2 | 0.303728 | -2.16719 |
| 1 | 0.266684 | -0.896683 |
| 0 | 2.72736 | -0.337402 |


In [25]:
# Sort the columns by their labels
sort_df=unsorted_df.sort_index(axis=1)
sort_df

|   | AAA | ZZZ |
|---|-----|-----|
| 1 | -0.896683 | 0.266684 |
| 4 | -0.04276 | 0.803401 |
| 6 | 0.121504 | -1.923721 |
| 2 | -2.16719 | 0.303728 |
| 3 | -1.631839 | -0.451747 |
| 5 | -1.898521 | 0.99676 |
| 9 | -0.49565 | 1.003958 |
| 8 | -0.814264 | 1.04826 |
| 0 | -0.337402 | 2.72736 |
| 7 | 0.063375 | 0.015127 |
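
`sort_index()` also accepts a `key` function (pandas 1.1 and later) that transforms the labels before they are compared. A minimal sketch of case-insensitive label sorting on a small hypothetical frame with string labels:

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]}, index=['b', 'C', 'a'])

# Default: string labels compare by code point, so uppercase sorts first
default_order = df.sort_index().index.tolist()
print(default_order)   # ['C', 'a', 'b']

# key receives the Index and must return an Index of the same length
caseless_order = df.sort_index(key=lambda idx: idx.str.lower()).index.tolist()
print(caseless_order)  # ['a', 'b', 'C']
```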


Sorting by value uses `sort_values()`, with the column (or list of columns) to sort on passed as `by`:

In [26]:
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
print('Unsorted dataframe\n',unsorted_df)
sorted_df = unsorted_df.sort_values(by='col1')
print('\nSorted by col1\n',sorted_df)

Unsorted dataframe
    col1  col2
0     2     1
1     1     3
2     1     2
3     1     4

Sorted by col1
    col1  col2
1     1     3
2     1     2
3     1     4
0     2     1


In [27]:
sorted_df = unsorted_df.sort_values(by=['col1','col2'])
sorted_df

|   | col1 | col2 |
|---|------|------|
| 2 | 1 | 2 |
| 1 | 1 | 3 |
| 3 | 1 | 4 |
| 0 | 2 | 1 |
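
Two further `sort_values()` options, sketched on the same `unsorted_df`: `ascending` can take one flag per sort column, and `kind='stable'` guarantees that rows with equal keys keep their original relative order (the default `kind='quicksort'` does not promise this; `kind` only applies when sorting on a single column).

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# One ascending flag per column: col1 ascending, col2 descending within ties
mixed = unsorted_df.sort_values(by=['col1', 'col2'], ascending=[True, False])
print(mixed)

# kind='stable' keeps rows with equal col1 in their original order
stable = unsorted_df.sort_values(by='col1', kind='stable')
print(stable.index.tolist())   # [1, 2, 3, 0]
```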
