#**Python | Pandas.apply()**

`Series.apply()` lets you pass a function and apply it to every value of a pandas Series. This makes it easy to label or transform data according to custom conditions, which is why it is widely used in data science and machine learning.

In [1]:
import pandas as pd

# reading csv
s = pd.read_csv("/content/nba (1).csv")

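# Label a weight value as "Low", "Normal", or "High".
# Note: a missing value (NaN) fails both comparisons and is labelled "High".
def fun(num):
    if num < 200:
        return "Low"
    elif 200 <= num < 260:
        return "Normal"
    else:
        return "High"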


In [5]:
s

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [None]:
type(s)

In [None]:
type(s['Weight'])

In [2]:
# pass the function to apply() and store the returned Series in `new`
new = s['Weight'].apply(fun)

# print the first 3 elements
print(new.head(3))


0       Low
1    Normal
2    Normal
Name: Weight, dtype: object


In [4]:
new

Unnamed: 0,Weight
0,Low
1,Normal
2,Normal
3,Low
4,Normal
...,...
453,Normal
454,Low
455,Normal
456,Normal


In [None]:
# Print a few elements from the middle of the Series
new[205],new[265],new[190]

('Normal', 'Normal', 'Normal')

In [None]:
# Print the last elements of the Series
new.tail()

Unnamed: 0,Weight
453,Normal
454,Low
455,Normal
456,Normal
457,High
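
The last row above shows a side effect of `fun`: a missing weight is labelled "High". A minimal sketch of one alternative, assuming the same thresholds, is `pd.cut`, which bins the values in one call and leaves missing entries missing. The sample weights below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up weights, including one missing value
weights = pd.Series([180.0, 235.0, 260.0, np.nan], name="Weight")

# Left-closed bins: below 200 is Low, 200 up to (not including) 260 is Normal, the rest is High
labels = pd.cut(
    weights,
    bins=[-np.inf, 200, 260, np.inf],
    labels=["Low", "Normal", "High"],
    right=False,
)
print(labels)
```

The result is a categorical Series, so the missing weight stays `NaN` instead of landing in a bin.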


**Example #2:**

In the following example, an anonymous function is defined inline in `.apply()` using `lambda`. It adds 5 to each value in the Series and returns a new Series.

In [None]:
# adding 5 to each value
new = s['Number'].apply(lambda num : num + 5)

# printing first 5 elements of old and new series
print(s['Number'].head(), '\n', new.head())

# printing last 5 elements of old and new series
print('\n\n', s['Number'].tail(), '\n', new.tail())

0     0.0
1    99.0
2    30.0
3    28.0
4     8.0
Name: Number, dtype: float64 
 0      5.0
1    104.0
2     35.0
3     33.0
4     13.0
Name: Number, dtype: float64


 453     8.0
454    25.0
455    21.0
456    24.0
457     NaN
Name: Number, dtype: float64 
 453    13.0
454    30.0
455    26.0
456    29.0
457     NaN
Name: Number, dtype: float64
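
For simple arithmetic like this, the same result can usually be obtained without `apply`, because pandas operators already work element by element. A small sketch on a made-up Series (the values are illustrative, not taken from the CSV):

```python
import numpy as np
import pandas as pd

numbers = pd.Series([0.0, 99.0, np.nan], name="Number")  # made-up sample

via_apply = numbers.apply(lambda num: num + 5)
vectorized = numbers + 5  # same arithmetic, no Python-level function call per element

# equals() treats NaN values in the same position as equal
print(via_apply.equals(vectorized))
```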


#**Apply Function to Every Row in a Pandas DataFrame**

There are several ways to apply a function to every row of a DataFrame. The examples below cover three of them:

Applying User-Defined Function to Every Row of Pandas DataFrame

Apply Lambda to Every Row of DataFrame

Apply NumPy.sum() to Every Row


**1. Applying User-Defined Function to Every Row of Pandas DataFrame**

In [8]:
import pandas as pd

# Function to add
def add_values(row):
    return row['A']+row['B']+row['C']


# Create a dictionary with three fields each
data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# Apply the user-defined function to every row
df['add'] = df.apply(add_values,axis=1)

print('\nAfter Applying Function: ')
# Print the new DataFrame
print(df)


Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

After Applying Function: 
   A  B  C  add
0  1  4  7   12
1  2  5  8   15
2  3  6  9   18


**2. Apply Lambda to Every Row of DataFrame**

In [None]:
# Import pandas package
import pandas as pd

# Function to add three values
def add(a, b, c):
    return a + b + c

data = {
        'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

df['add'] = df.apply(lambda row: add(row['A'],
                                         row['B'], row['C']), axis=1)

print('\nAfter Applying Function: ')
# printing the new dataframe
print(df)


Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

After Applying Function: 
   A  B  C  add
0  1  4  7   12
1  2  5  8   15
2  3  6  9   18


**3. Apply NumPy.sum() to Every Row**

In [None]:
import numpy as np
import pandas as pd

# create a dictionary
data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# applying function to each row in the dataframe
# and storing result in a new column
df['add'] = df.apply(np.sum, axis=1)

print('\nAfter Applying Function: ')
# printing the new dataframe
print(df)

Original DataFrame:
    A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

After Applying Function: 
   A  B  C  add
0  1  4  7   12
1  2  5  8   15
2  3  6  9   18
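
All three examples rely on `axis=1`, which hands each row to the function. A short sketch of how the `axis` argument changes what the function receives; `value_range` is a hypothetical helper written for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Hypothetical helper: spread between the largest and smallest value
def value_range(values):
    return values.max() - values.min()

per_column = df.apply(value_range, axis=0)  # function receives each column
per_row = df.apply(value_range, axis=1)     # function receives each row

print(per_column.tolist())
print(per_row.tolist())
```

For a plain row sum, `df.sum(axis=1)` gives the same totals as the examples above without going through `apply`.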


#Python | Pandas Series.apply()

The pandas `Series.apply()` function invokes the passed function on each element of the given Series object.

**Example #1:** Use Series.apply() function to change the city name to ‘Montreal’ if the city is ‘Rio’.

In [None]:
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio'])

# Create the Index
index_ = ['City 1', 'City 2', 'City 3', 'City 4', 'City 5']

# set the index
sr.index = index_

# Print the series
print(sr)

City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5         Rio
dtype: object


In [None]:
# change 'Rio' to 'Montreal'
# we have used a lambda function
result = sr.apply(lambda x : 'Montreal' if x =='Rio' else x )

# Print the result
print(result)

City 1    New York
City 2     Chicago
City 3     Toronto
City 4      Lisbon
City 5    Montreal
dtype: object
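
When the function needs more than the element itself, `Series.apply()` can forward extra positional arguments through its `args` parameter. A minimal sketch; `rename_city` is a hypothetical helper, and the Series is a shortened version of the one above.

```python
import pandas as pd

# Hypothetical helper: swap one city name for another
def rename_city(city, old, new):
    return new if city == old else city

sr = pd.Series(["New York", "Lisbon", "Rio"],
               index=["City 1", "City 2", "City 3"])

# Each element is passed as `city`; "Rio" and "Montreal" fill `old` and `new`
result = sr.apply(rename_city, args=("Rio", "Montreal"))
print(result.tolist())
```

This keeps the replacement rule reusable instead of hard-coding the city names inside a lambda.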


**Example #2 :** Use Series.apply() function to return True if the value in the given series object is greater than 30 else return False.

In [None]:
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([11, 21, 8, 18, 65, 18, 32, 10, 5, 32, None])

# Create the Index
# apply yearly (year-end) frequency; pandas 2.2+ deprecates the 'A' alias in favor of 'YE'
index_ = pd.date_range('2010-10-09 08:45', periods = 11, freq ='A')

# set the index
sr.index = index_

# Print the series
print(sr)

2010-12-31 08:45:00    11.0
2011-12-31 08:45:00    21.0
2012-12-31 08:45:00     8.0
2013-12-31 08:45:00    18.0
2014-12-31 08:45:00    65.0
2015-12-31 08:45:00    18.0
2016-12-31 08:45:00    32.0
2017-12-31 08:45:00    10.0
2018-12-31 08:45:00     5.0
2019-12-31 08:45:00    32.0
2020-12-31 08:45:00     NaN
Freq: A-DEC, dtype: float64


In [None]:
# return True if greater than 30, else False
# (a NaN value compares as False)
result = sr.apply(lambda x: x > 30)

# Print the result
print(result)

2010-12-31 08:45:00    False
2011-12-31 08:45:00    False
2012-12-31 08:45:00    False
2013-12-31 08:45:00    False
2014-12-31 08:45:00     True
2015-12-31 08:45:00    False
2016-12-31 08:45:00     True
2017-12-31 08:45:00    False
2018-12-31 08:45:00    False
2019-12-31 08:45:00     True
2020-12-31 08:45:00    False
Freq: A-DEC, dtype: bool
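
Note that the missing value in the last row is reported as `False`, because any comparison with `NaN` is false. If the distinction between "not greater than 30" and "unknown" matters, one possible approach (an assumption here, not something the example above does) is to compare on a nullable `Float64` Series, which keeps the missing entry as `<NA>`. The sample values are made up.

```python
import numpy as np
import pandas as pd

sr = pd.Series([11, 65, 32, np.nan])

plain = sr > 30                       # NaN compares as False
nullable = sr.astype("Float64") > 30  # the missing value stays <NA>

print(plain.tolist())
print(nullable.isna().tolist())
```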


#Pandas DataFrame.aggregate() Syntax in Python

In [None]:
#Syntax: DataFrame.aggregate(func, axis=0, *args, **kwargs)
#func : callable, string, dictionary, or list of string/callables. Function to use for aggregating the data.
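
The forms that `func` accepts can be sketched on a small made-up frame: a function name as a string, a list of names, a dictionary mapping columns to names, and a callable.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})  # made-up data

by_name = df.aggregate("sum")                 # Series: one value per column
by_list = df.aggregate(["sum", "min"])        # DataFrame: one row per function
by_dict = df.aggregate({"A": ["sum"], "B": ["min", "max"]})  # per-column choice
by_callable = df.aggregate(lambda col: col.max() - col.min())

print(by_name)
print(by_list)
print(by_dict)
print(by_callable)
```

With the dictionary form, a cell is `NaN` wherever a function was not requested for that column.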

In [None]:
# importing pandas package
import pandas as pd

# making data frame from csv file
df = pd.read_csv('/content/nba (1).csv')

# printing the first 10 rows of the dataframe
df[:10]

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
8,Terry Rozier,Boston Celtics,12.0,PG,22.0,6-2,190.0,Louisville,1824360.0
9,Marcus Smart,Boston Celtics,36.0,PG,22.0,6-4,220.0,Oklahoma State,3431040.0


In [None]:
df.dropna(inplace=True)

In [None]:
# First, convert the height column into decimal feet for easier comparison.
def height_in_float(height):
    feet, inches = map(int, height.split('-'))
    return feet + inches / 12

df['Height_feet'] = df['Height'].apply(height_in_float)
# keep players taller than 6.6 feet (about 6 ft 7 in)
tall_players = df[df['Height_feet'] > 6.6]
tall_players

In [None]:

def convert_height(height_str):
    """
    Converts a height string in the format 'X-Y' to a float value in the format X.Y.

    Note: X.Y is a label, not a measurement. '6-10' becomes 6.1, which
    compares as smaller than '6-2' (6.2).

    Args:
        height_str (str): The height string to be converted.

    Returns:
        float: The converted height value.
    """
    if isinstance(height_str, str):
        feet, inches =  height_str.split('-')
        return float(f"{feet}.{inches}")
    else:
        return height_str

In [None]:
convert_height('6-2')

6.2
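
Because `convert_height` maps '6-10' to 6.1, the resulting floats do not order heights correctly. Where a true measurement is needed, one option is to convert to total inches. This is a sketch under that assumption, not what the cells in this notebook use, and `height_to_inches` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical helper: 'feet-inches' string to total inches
def height_to_inches(height):
    feet, inches = map(int, height.split("-"))
    return feet * 12 + inches

heights = pd.Series(["6-2", "6-10", "7-0"])  # made-up sample
inches = heights.apply(height_to_inches)
print(inches.tolist())
```

With this encoding, '6-10' correctly compares as taller than '6-2'.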

In [None]:
# Convert height to numeric format
df['Height'] = df['Height'].apply(convert_height)

In [None]:
num_cols=df[['Number','Age','Weight','Salary','Height']]

In [None]:
# Applying aggregation across all the columns
num_cols.aggregate(['sum', 'min'])

Unnamed: 0,Number,Age,Weight,Salary,Height
sum,8079.0,12311.0,101236.0,2159837000.0,2978.33
min,0.0,19.0,161.0,30888.0,5.11


In [None]:
# We are going to find aggregation for these columns
df.aggregate({"Number":['sum', 'min'],
              "Age":['max', 'min'],
              "Weight":['min', 'sum'],
              "Height":['min','max'],
              "Salary":['sum']})

Unnamed: 0,Number,Age,Weight,Height,Salary
sum,8079.0,,101236.0,,2159837000.0
min,0.0,19.0,161.0,5.11,
max,,40.0,,7.3,


#Pandas DataFrame.mean() Examples

In [None]:
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A":[12, 4, 5, 44, 1],
                "B":[5, 2, 54, 3, 2],
                "C":[20, 16, 7, 3, 8],
                "D":[14, 3, 17, 2, 6]})

# Print the dataframe
df

Unnamed: 0,A,B,C,D
0,12,5,20,14
1,4,2,16,3
2,5,54,7,17
3,44,3,3,2
4,1,2,8,6


In [None]:
df.mean(axis = 0)

A    13.2
B    13.2
C    10.8
D     8.4
dtype: float64


In [None]:
# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                "B":[7, 2, 54, 3, None],
                "C":[20, 16, 11, 3, 8],
                "D":[14, 3, None, 2, 6]})
print(df)
# skip the Na values while finding the mean
df.mean(axis = 1, skipna = True)

      A     B   C     D
0  12.0   7.0  20  14.0
1   4.0   2.0  16   3.0
2   5.0  54.0  11   NaN
3   NaN   3.0   3   2.0
4   1.0   NaN   8   6.0


0    13.250000
1     6.250000
2    23.333333
3     2.666667
4     5.000000
dtype: float64
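
To see what `skipna` changes, it helps to compare both settings on the same data. A minimal sketch on a made-up two-row frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [12, None], "B": [7, 3]})  # made-up data, one missing value

skipped = df.mean(axis=1, skipna=True)   # missing values are ignored
strict = df.mean(axis=1, skipna=False)   # any missing value makes the row mean NaN

print(skipped.tolist())
print(strict.tolist())
```

With `skipna=True` the second row's mean is computed from the one available value; with `skipna=False` it is `NaN`.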


#Python | Pandas Series.mean()

In [None]:
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series([10, 25, 3, 25, 24, 6])

# Create the Index
index_ = ['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp']

# set the index
sr.index = index_

# Print the series
print(sr)

Coca Cola    10
Sprite       25
Coke          3
Fanta        25
Dew          24
ThumbsUp      6
dtype: int64


In [None]:
# return the mean
result = sr.mean()

# Print the result
print(result)

15.5


#Python | Pandas Series.value_counts()

In [None]:
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'Toronto', 'Lisbon', 'Rio', 'Chicago', 'Lisbon'])

# Print the series
print(sr)

0    New York
1     Chicago
2     Toronto
3      Lisbon
4         Rio
5     Chicago
6      Lisbon
dtype: object


In [None]:
# find the value counts
sr.value_counts()

Unnamed: 0,count
Chicago,2
Lisbon,2
New York,1
Toronto,1
Rio,1
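
Two optional parameters of `value_counts()` are often useful alongside the default call: `normalize=True` returns shares instead of counts, and `dropna=False` keeps missing values in the tally. A short sketch on a made-up Series:

```python
import pandas as pd

sr = pd.Series(["Chicago", "Lisbon", "Chicago", None])  # made-up sample

counts = sr.value_counts()                    # missing values are dropped by default
shares = sr.value_counts(normalize=True)      # fractions of the non-missing values
with_missing = sr.value_counts(dropna=False)  # missing values get their own entry

print(counts)
print(shares)
print(with_missing)
```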


**Python | Pandas Index.value_counts()**

In [None]:
sr.index.value_counts()

Unnamed: 0,count
0,1
1,1
2,1
3,1
4,1
5,1
6,1


In [None]:
# importing pandas as pd
import pandas as pd

# Creating the Series
sr = pd.Series(['New York', 'Chicago', 'New York', 'Lisbon', 'Rio'])

# Create the Index
index_ = ['City 1', 'City 2', 'City 1', 'City 4', 'City 5']

# set the index
sr.index = index_

# Print the series
print(sr)

City 1    New York
City 2     Chicago
City 1    New York
City 4      Lisbon
City 5         Rio
dtype: object


In [None]:
sr.index.value_counts()

Unnamed: 0,count
City 1,2
City 2,1
City 4,1
City 5,1


In [None]:
# importing pandas as pd
import pandas as pd

# Creating the index
idx = pd.Index(['Harry', 'Mike', 'Arther', 'Nick',
                'Harry', 'Arther'], name ='Student')

# Print the Index
print(idx)

Index(['Harry', 'Mike', 'Arther', 'Nick', 'Harry', 'Arther'], dtype='object', name='Student')


In [None]:
# find the count of unique values in the index
idx.value_counts()

Student,count
Harry,2
Arther,2
Mike,1
Nick,1
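
One practical use of `Index.value_counts()` is spotting duplicated labels. The sketch below filters the counts to the labels that occur more than once and checks `Index.is_unique`; the variable names are illustrative.

```python
import pandas as pd

idx = pd.Index(["Harry", "Mike", "Arther", "Nick", "Harry", "Arther"],
               name="Student")

counts = idx.value_counts()
duplicated_labels = counts[counts > 1].index.tolist()  # labels seen more than once

print(sorted(duplicated_labels))
print(idx.is_unique)
```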


#Applying Lambda Functions to Pandas

Below are some of the ways to apply lambda functions in pandas:

DataFrame.assign() on a Single Column

DataFrame.assign() on Multiple Columns

DataFrame.apply() on a Single Row

DataFrame.apply() on Multiple Rows

Lambda Function on Multiple Rows and Columns Simultaneously

**1. DataFrame.assign() on a Single Column**

In this example, we pass a lambda function to DataFrame.assign() to work on a single column. The lambda reads the ‘Total_Marks’ column and produces a new ‘Percentage’ column.

In [None]:
# importing pandas library
import pandas as pd

# creating and initializing a list
values= [['Rohan',455],['Elvish',250],['Deepak',495],
         ['Soni',400],['Radhika',350],['Vansh',450]]

# creating a pandas dataframe
df = pd.DataFrame(values,columns=['Name','Total_Marks'])

# Applying lambda function to find
# percentage of 'Total_Marks' column
# using df.assign()
df = df.assign(Percentage = lambda x: (x['Total_Marks'] /500 * 100))

# displaying the data frame
df

Unnamed: 0,Name,Total_Marks,Percentage
0,Rohan,455,91.0
1,Elvish,250,50.0
2,Deepak,495,99.0
3,Soni,400,80.0
4,Radhika,350,70.0
5,Vansh,450,90.0


**2. DataFrame.assign() on Multiple Columns**

In this example, we pass a lambda function to DataFrame.assign() that works on multiple columns. The lambda multiplies three columns, ‘Field_1’, ‘Field_2’, and ‘Field_3’, into a new ‘Product’ column.

In [None]:
# importing pandas library
import pandas as pd

# creating and initializing a nested list
values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]

# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])

# Applying lambda function to find
# the product of 3 columns using
# df.assign()
df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))

# printing dataframe
df

Unnamed: 0,Field_1,Field_2,Field_3,Product
0,15,2.5,100,3750.0
1,20,4.5,50,4500.0
2,25,5.2,80,10400.0
3,45,5.8,48,12528.0
4,40,6.3,70,17640.0
5,41,6.4,90,23616.0
6,51,2.3,111,13020.3
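
`assign()` also accepts several keyword arguments in one call, and a later lambda can use a column created by an earlier one. A short sketch; the `Passed` column and the 60% pass mark are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Rohan", "Elvish"], "Total_Marks": [455, 250]})

df = df.assign(
    Percentage=lambda x: x["Total_Marks"] / 500 * 100,
    # Uses the Percentage column created just above; 60 is a hypothetical pass mark
    Passed=lambda x: x["Percentage"] >= 60,
)
print(df)
```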


**3. DataFrame.apply() on a Single Row**

In this example, we pass a lambda function to DataFrame.apply() to change a single row. The lambda squares every value in the row labelled ‘d’ and leaves the other rows unchanged.

In [None]:
# importing pandas and numpy libraries
import pandas as pd
import numpy as np

# creating and initializing a nested list
values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]

# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])


# Apply function numpy.square() to square
# the values of one row only i.e. row
# with index name 'd'
df = df.apply(lambda x: np.square(x) if x.name == 'd' else x, axis=1)


# printing dataframe
df

Unnamed: 0,Field_1,Field_2,Field_3
a,15.0,2.5,100.0
b,20.0,4.5,50.0
c,25.0,5.2,80.0
d,2025.0,33.64,2304.0
e,40.0,6.3,70.0
f,41.0,6.4,90.0
g,51.0,2.3,111.0


**4. DataFrame.apply() on Multiple Rows**

In this example, we apply the lambda function to multiple rows using DataFrame.apply(). The lambda squares the values in the three rows labelled ‘a’, ‘e’, and ‘g’.

In [None]:
# importing pandas and numpy libraries
import pandas as pd
import numpy as np

# creating and initializing a nested list
values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]

# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])


# Apply function numpy.square() to square
# the values of 3 rows only i.e. with row
# index name 'a', 'e' and 'g' only
df = df.apply(lambda x: np.square(x) if x.name in [
              'a', 'e', 'g'] else x, axis=1)

# printing dataframe
df

Unnamed: 0,Field_1,Field_2,Field_3
a,225.0,6.25,10000.0
b,20.0,4.5,50.0
c,25.0,5.2,80.0
d,45.0,5.8,48.0
e,1600.0,39.69,4900.0
f,41.0,6.4,90.0
g,2601.0,5.29,12321.0
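
Selecting the target rows with `.loc` is another way to update only some rows, shown here as a sketch on a shortened, made-up version of the frame. Unlike the row-wise `apply` above, which returns every column as float, this leaves the integer columns as integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80]],
                  columns=["Field_1", "Field_2", "Field_3"],
                  index=["a", "b", "c"])

rows = ["a", "c"]                          # row labels to square
df.loc[rows] = np.square(df.loc[rows])     # other rows are left untouched
print(df)
```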


**5. Lambda Function on Multiple Rows and Columns Simultaneously**

In this example, we apply lambda functions to multiple rows and multiple columns in the same cell, using DataFrame.apply() followed by DataFrame.assign().

In [None]:
# importing pandas and numpy libraries
import pandas as pd
import numpy as np

# creating and initializing a nested list
values_list = [[1.5, 2.5, 10.0], [2.0, 4.5, 5.0], [2.5, 5.2, 8.0],
               [4.5, 5.8, 4.8], [4.0, 6.3, 70], [4.1, 6.4, 9.0],
               [5.1, 2.3, 11.1]]

# creating a pandas dataframe
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'],
                  index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])


# Apply function numpy.square() to square
# the values of 2 rows only i.e. with row
# index name 'b' and 'f' only
df = df.apply(lambda x: np.square(x) if x.name in ['b', 'f'] else x, axis=1)

# Applying lambda function to find product of 3 columns
# i.e 'Field_1', 'Field_2' and 'Field_3'
df = df.assign(Product=lambda x: (x['Field_1'] * x['Field_2'] * x['Field_3']))


# printing dataframe
df

Unnamed: 0,Field_1,Field_2,Field_3,Product
a,1.5,2.5,10.0,37.5
b,4.0,20.25,25.0,2025.0
c,2.5,5.2,8.0,104.0
d,4.5,5.8,4.8,125.28
e,4.0,6.3,70.0,1764.0
f,16.81,40.96,81.0,55771.5456
g,5.1,2.3,11.1,130.203
