Lesson Note: Vectorized Operations with apply(), map(), and Lambda Functions in Pandas

Overview
Pandas lets you transform data without writing explicit Python loops. This lesson covers how to use apply(), map(), and lambda functions to transform DataFrames and Series. Keep in mind that apply() and map() with a Python function still call that function once per column, row, or element, so they are usually slower than pandas' built-in vectorized arithmetic; they are most useful when no built-in operation fits. The examples use the iris dataset, loaded through seaborn.

Learning Objectives
Understand the concept of vectorized operations in pandas.
Learn how to use apply() to perform operations on DataFrame rows or columns.
Utilize map() for element-wise transformations on Series.
Implement lambda functions for concise, anonymous function definitions.

Prerequisites
Basic understanding of Python and pandas.
Familiarity with DataFrame and Series structures in pandas.

Dataset
We will use the iris dataset, loaded with seaborn's load_dataset() function. It contains sepal and petal measurements for 150 flowers from three iris species (setosa, versicolor, and virginica).

In [45]:
import pandas as pd
import seaborn as sns

# Load Iris dataset
iris = sns.load_dataset("iris")
iris.head()



,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


Section 1: Using apply()
The apply() function applies a function along an axis of a DataFrame. With axis=0 (the default) the function receives each column as a Series; with axis=1 it receives each row.

Example 1: Applying a function to each column

In [46]:
# Function to calculate the square of each value
def square(x):
    return x ** 2

# Apply the function to each numeric column (iloc[:, :-1] leaves out the last column, species)
squared_df = iris.iloc[:, :-1].apply(square)
squared_df.head()


,sepal_length,sepal_width,petal_length,petal_width
0,26.01,12.25,1.96,0.04
1,24.01,9.0,1.96,0.04
2,22.09,10.24,1.69,0.04
3,21.16,9.61,2.25,0.04
4,25.0,12.96,1.96,0.04
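
The example above works column by column. To illustrate the row-wise case as well, here is a minimal sketch on a small hand-typed frame holding the five rows shown by iris.head(), so that it runs without downloading anything. The names sample, column_max, and sepal_area are illustrative and not part of the lesson's notebook.

```python
import pandas as pd

# First five iris rows, typed in by hand (values taken from the head() output above)
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
})

# axis=0 (the default): the function receives each column as a Series
column_max = sample.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series indexed by column name
sepal_area = sample.apply(
    lambda row: row["sepal_length"] * row["sepal_width"], axis=1
)

print(column_max)
print(sepal_area)
```

Row-wise apply() calls the function once per row, so for simple arithmetic like this the direct expression sample["sepal_length"] * sample["sepal_width"] is normally the faster choice.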


Section 2: Using map()
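The map() function transforms a Series element by element. It accepts a function, a dictionary, or another Series; when given a dictionary, any value with no matching key becomes NaN.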

Example: Mapping a dictionary to replace values

In [47]:
# Create a dictionary to map species names to numerical values
species_mapping = {'setosa': 0, 'versicolor': 1, 'virginica': 2}

# Map the species column to numerical values
iris['species'] = iris['species'].map(species_mapping)
iris.head()


,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
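
One detail worth checking when mapping with a dictionary is what happens to values that have no key. The sketch below uses a small hand-built Series and a deliberately incomplete dictionary (partial_mapping is a hypothetical name used only for this illustration).

```python
import pandas as pd

species = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

# Mapping that deliberately omits 'virginica'
partial_mapping = {"setosa": 0, "versicolor": 1}

codes = species.map(partial_mapping)
print(codes)

# Values without a matching key become NaN, which also turns the result into floats
n_missing = codes.isna().sum()
print(n_missing)
```

Counting the NaN values after a dictionary map() is a quick way to catch a misspelled or missing key before the column is used further.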


Section 3: Using Lambda Functions
Lambda functions provide a way to write small, anonymous functions inline.

Example 1: Using a lambda function with apply()

In [48]:
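# Use a lambda function to calculate the length of each value's string form.
# Note: species holds the integer codes 0, 1, 2 from Section 2 at this point,
# so every length is 1 (the code's digit count), not the species name length.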
species_name_length = iris['species'].apply(lambda x: len(str(x)))
species_name_length.head()


0    1
1    1
2    1
3    1
4    1
Name: species, dtype: int64
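
Every length above is 1 because the species column holds the integer codes from Section 2 at this point in the notebook. As a rough sketch of what the same lambda returns on the species names themselves, the snippet below uses a small hand-built Series (not the notebook's iris frame) and compares it with the built-in .str.len() accessor.

```python
import pandas as pd

species_names = pd.Series(["setosa", "versicolor", "virginica"])

# Same lambda as in the lesson, applied to the names instead of the codes
via_apply = species_names.apply(lambda x: len(str(x)))

# Built-in string accessor that computes the same lengths without a Python lambda
via_str = species_names.str.len()

print(via_apply.tolist())
print(via_str.tolist())
```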

Example 2: Using a lambda function with map()

In [49]:
# Use a lambda function to convert the species codes back to species names
reverse_species_mapping = {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
iris['species'] = iris['species'].map(lambda x: reverse_species_mapping[x])
iris.head()


,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
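
Since map() also accepts a dictionary directly, the lambda in this example is optional. The sketch below, on a small hand-built Series of codes, suggests how the two forms compare; the names codes, unknown, and lambda_raised are illustrative only.

```python
import pandas as pd

codes = pd.Series([0, 1, 2, 0])
reverse_species_mapping = {0: "setosa", 1: "versicolor", 2: "virginica"}

# Both forms give the same result when every code has a key
via_lambda = codes.map(lambda x: reverse_species_mapping[x])
via_dict = codes.map(reverse_species_mapping)
print(via_lambda.tolist() == via_dict.tolist())

# They differ when a code is missing from the dictionary
unknown = pd.Series([0, 3])
dict_result = unknown.map(reverse_species_mapping)  # missing key becomes NaN
try:
    unknown.map(lambda x: reverse_species_mapping[x])
    lambda_raised = False
except KeyError:
    lambda_raised = True  # plain dict indexing raises on the missing key

print(dict_result.isna().tolist())
print(lambda_raised)
```

Which behavior is preferable depends on the task: the dictionary form fails silently with NaN, while the lambda form stops at the first unknown code.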


Section 4: Combining apply(), map(), and Lambda Functions
Example: Standardizing numeric columns

In [50]:
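# Lambda that standardizes a Series to z-scores: subtract the mean, divide by
# the standard deviation (pandas' std() uses the sample formula, ddof=1)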
standardize = lambda x: (x - x.mean()) / x.std()

# Apply the standardize function to each numeric column
standardized_df = iris.iloc[:, :-1].apply(standardize)
standardized_df.head()


,sepal_length,sepal_width,petal_length,petal_width
0,-0.897674,1.015602,-1.335752,-1.311052
1,-1.1392,-0.131539,-1.335752,-1.311052
2,-1.380727,0.327318,-1.392399,-1.311052
3,-1.50149,0.097889,-1.279104,-1.311052
4,-1.018437,1.24503,-1.335752,-1.311052
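
The same standardization can also be written as whole-frame arithmetic, without apply(). The following is a sketch under stated assumptions: it uses three columns of the five head() rows typed in by hand (petal_width is left out because it is constant in those rows, which would give a zero standard deviation), so the numbers differ from the full-dataset output above.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
})

# Column-by-column version, as in the lesson
standardize = lambda x: (x - x.mean()) / x.std()
via_apply = sample.apply(standardize)

# Whole-frame version: pandas aligns the per-column means and stds by column label
via_broadcast = (sample - sample.mean()) / sample.std()

print(np.allclose(via_apply, via_broadcast))

# After standardizing, each column should have mean ~0 and sample std ~1
print(via_apply.mean().round(10))
print(via_apply.std().round(10))
```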


Example: Categorizing data based on conditions

In [51]:
# Lambda function to categorize sepal length: 'short' below 5, 'long' otherwise
categorize_sepal_length = lambda x: 'short' if x < 5 else 'long'

# Apply the lambda function to the 'sepal_length' column
iris['Sepal.Length.Category'] = iris['sepal_length'].apply(categorize_sepal_length)
iris.head()


,sepal_length,sepal_width,petal_length,petal_width,species,Sepal.Length.Category
0,5.1,3.5,1.4,0.2,setosa,long
1,4.9,3.0,1.4,0.2,setosa,short
2,4.7,3.2,1.3,0.2,setosa,short
3,4.6,3.1,1.5,0.2,setosa,short
4,5.0,3.6,1.4,0.2,setosa,long
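
For a simple two-way condition like this one, NumPy's where() is one possible vectorized alternative to the element-wise lambda. The sketch below uses the five sepal_length values from the output above, typed in by hand; via_apply and via_where are illustrative names.

```python
import numpy as np
import pandas as pd

sepal_length = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0], name="sepal_length")

# Element-wise version, as in the lesson
via_apply = sepal_length.apply(lambda x: "short" if x < 5 else "long")

# np.where evaluates the condition on the whole array at once
via_where = pd.Series(
    np.where(sepal_length < 5, "short", "long"), index=sepal_length.index
)

print(via_apply.tolist())
print(via_where.tolist())
```

Both versions label the five rows long, short, short, short, long, matching the Sepal.Length.Category column shown above.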
