In [12]:
import pandas as pd
import numpy as np

# Set the seed for NumPy's random number generator
np.random.seed(42)

# Bulk Mapping with Pandas

pandas provides several tools for transforming many values at once, including map(), apply(), replace(), and vectorized operations.
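
Vectorized operations are mentioned above but not demonstrated elsewhere in this notebook, so here is a minimal sketch. The names `demo_df`, `doubled`, `roots`, and `total` are illustrative only. Arithmetic operators and NumPy universal functions act on whole columns or the whole DataFrame without an explicit Python-level function call per element.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (hypothetical name, same values as the map() example)
demo_df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Arithmetic operators act on every element of the DataFrame at once
doubled = demo_df * 2

# NumPy universal functions accept a DataFrame and keep its row and column labels
roots = np.sqrt(demo_df)

# Arithmetic between columns is vectorized as well
total = demo_df['A'] + demo_df['B']
```

When an operation can be written this way, it is usually the simplest option; map() and apply() remain useful when the logic cannot be expressed with operators or built-in methods.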

## map() method

**Note: in [version 2.1.0](https://pandas.pydata.org/docs/whatsnew/v2.1.0.html#new-dataframe-map-method-and-support-for-extensionarrays), DataFrame.applymap was deprecated and renamed to DataFrame.map.**

The [map()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.map.html) method transforms a DataFrame element by element and returns a new DataFrame of the same shape.

It calls a function that accepts and returns a scalar on every element of the DataFrame. The function can be a built-in function, a user-defined function, or a lambda function.

### Examples:

In [3]:
# Example DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
})
df

Unnamed: 0,A,B
0,1,4
1,2,5
2,3,6


In [4]:
# Function to be applied to each element
def multiply_by_two(x):
    return x * 2

# Applying the function to each element of the DataFrame
mapped_df = df.map(multiply_by_two)
mapped_df

Unnamed: 0,A,B
0,2,8
1,4,10
2,6,12


In [5]:
# The same transformation written with a lambda function
mapped_df = df.map(lambda x: x * 2)
mapped_df

Unnamed: 0,A,B
0,2,8
1,4,10
2,6,12


### Use Case: 

map() is ideal for performing a specific operation that affects each element individually, without consideration for row or column context. Common uses include formatting numbers, converting data types, or applying a mathematical transformation to every element.
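
As a sketch of the number-formatting use case mentioned above, the following example turns every value into a string with two decimal places. The DataFrame `prices` and its values are made up for illustration. The `hasattr` check is there only so the example also runs on pandas versions older than 2.1, where the method is still called applymap.

```python
import pandas as pd

# Hypothetical data used only for this illustration
prices = pd.DataFrame({'cost': [1.5, 20.25], 'tax': [0.3, 4.05]})

# DataFrame.map exists from pandas 2.1 on; earlier versions call it applymap
elementwise = prices.map if hasattr(prices, 'map') else prices.applymap

# Format every number as a string with two decimal places
formatted = elementwise(lambda x: f"{x:.2f}")
formatted
```

Note that the result holds strings, not numbers, so this kind of formatting is best done as a last step before display or export.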

### Limitations: 

map() cannot be used for operations that need to consider the entire row or column, such as aggregations or operations that depend on a specific axis, because the function only ever receives one scalar at a time.

## apply() method

The [apply()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) method applies a function along an axis of the DataFrame: with axis=0 (the default) the function is called once per column, and with axis=1 it is called once per row. The function can be a built-in function, a user-defined function, or a lambda function.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).

This method is more flexible than map() and can be used for operations that affect entire rows or columns, rather than individual elements.
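
To make the point about what the function receives concrete, here is a small sketch with a hypothetical DataFrame `small_df` that has row labels `r1` and `r2`. Each call simply reports the index labels of the Series it was given.

```python
import pandas as pd

# Hypothetical DataFrame with explicit row labels
small_df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])

# axis=0 (default): the function gets one column at a time,
# and that Series is indexed by the DataFrame's row labels
col_view = small_df.apply(lambda col: ", ".join(col.index))

# axis=1: the function gets one row at a time,
# and that Series is indexed by the DataFrame's column labels
row_view = small_df.apply(lambda row: ", ".join(row.index), axis=1)

print(col_view)
print(row_view)
```

With axis=0 the result has one entry per column (`A`, `B`), each containing `r1, r2`; with axis=1 it has one entry per row (`r1`, `r2`), each containing `A, B`.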

### Examples:

In [14]:
### Applying a Function to Each Column

# Sample DataFrame
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 3)), columns=['A', 'B', 'C'])
df

Unnamed: 0,A,B,C
0,52,1,87
1,29,37,1
2,63,59,20
3,32,75,57
4,21,88,48


In [17]:
# Apply a simple function to calculate the range (max - min) of each column.
# The result is a Series with one value per column.
column_ranges = df.apply(lambda x: x.max() - x.min())
column_ranges

A    42
B    87
C    86
dtype: int64

In [19]:
### Applying a Function to Each Row

# Using lambda to calculate a custom metric across columns for each row
df['custom_metric'] = df.apply(lambda row: (row['A'] + row['B']) / row['C'], axis=1)
df

Unnamed: 0,A,B,C,custom_metric
0,52,1,87,0.609195
1,29,37,1,66.0
2,63,59,20,6.1
3,32,75,57,1.877193
4,21,88,48,2.270833


### Using apply() for custom aggregation

Applying a function to aggregate over a series of elements in a pandas DataFrame can be achieved with the apply() method. This approach is particularly useful for performing custom aggregations that aren't directly supported by built-in pandas methods.

Consider the following example: we have a DataFrame of scores in three subjects for a group of students, and we want to calculate a custom aggregate metric for each student that takes into account their scores across all subjects. Here the metric is the gap between each student's highest and lowest score, used as a simple measure of consistency.

In [21]:
# Create a sample DataFrame with student scores
data = {
    'Math': np.random.randint(50, 100, size=5),
    'Science': np.random.randint(50, 100, size=5),
    'English': np.random.randint(50, 100, size=5)
}
students_df = pd.DataFrame(data)

# Add a 'Student' column with the student names as the first column
student_names = ['Student A', 'Student B', 'Student C', 'Student D', 'Student E']
students_df.insert(loc=0, column='Student', value=student_names)

students_df

Unnamed: 0,Student,Math,Science,English
0,Student A,74,51,93
1,Student B,63,69,57
2,Student C,99,77,96
3,Student D,58,96,84
4,Student E,75,56,63


In [22]:
# Apply the function across rows (axis=1) to calculate the consistency metric
# (highest score minus lowest score) for each student
students_df['Consistency'] = (
    students_df[['Math', 'Science', 'English']]
    .apply(lambda row: row.max() - row.min(), axis=1)
)

# Display the DataFrame with the added consistency metric
students_df

Unnamed: 0,Student,Math,Science,English,Consistency
0,Student A,74,51,93,42
1,Student B,63,69,57,12
2,Student C,99,77,96,22
3,Student D,58,96,84,38
4,Student E,75,56,63,19
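
This particular metric can also be computed without apply(), using the built-in row-wise `max` and `min` methods. The sketch below rebuilds the scores shown in the output above under the hypothetical name `scores` and checks that both approaches agree.

```python
import pandas as pd

# Scores copied from the output above (hypothetical variable name)
scores = pd.DataFrame({
    'Math': [74, 63, 99, 58, 75],
    'Science': [51, 69, 77, 96, 56],
    'English': [93, 57, 96, 84, 63],
})

# Row-wise apply(), as in the cell above
via_apply = scores.apply(lambda row: row.max() - row.min(), axis=1)

# Equivalent computation with built-in row-wise reductions
via_builtin = scores.max(axis=1) - scores.min(axis=1)

print(via_apply.equals(via_builtin))
```

apply() with axis=1 stays the more general tool when the per-row logic has no built-in equivalent.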


## replace() method

The [replace()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html) method replaces specified values in a DataFrame or Series with other values.

Although it is not designed for applying custom functions across elements like map() or apply(), replace() can perform transformations by mapping specific values to their replacements.

It accepts single values, lists of values, or dictionaries of replacements, which makes it well suited to data cleaning tasks.

replace() is particularly useful for replacing placeholder or missing values and for standardizing categorical data, for instance replacing various spellings of "Yes" (e.g., "yes", "YES", "Y") with a single standardized "Yes" across your dataset.
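
The three input forms mentioned above can be compared side by side. This is a minimal sketch on a made-up Series called `answers`; values that match nothing are left unchanged.

```python
import pandas as pd

# Hypothetical Series used only for this illustration
answers = pd.Series(['yes', 'Y', 'no', 'unknown'])

# Single value: replace one value with another
single = answers.replace('Y', 'yes')

# List of values: every listed value gets the same replacement
from_list = answers.replace(['yes', 'Y'], 'Yes')

# Dictionary: each key is replaced by its own value
from_dict = answers.replace({'yes': 'Yes', 'Y': 'Yes', 'no': 'No'})

print(single.tolist())
print(from_list.tolist())
print(from_dict.tolist())
```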

### Example: using list of values to be replaced

In [30]:
# Sample DataFrame
data = {
    'Response': ['yes', 'No', 'Y', 'n', 'YES', 'no', 'Yes', 'N'],
}
df = pd.DataFrame(data)
df

Unnamed: 0,Response
0,yes
1,No
2,Y
3,n
4,YES
5,no
6,Yes
7,N


In [33]:
# Define the variations of "Yes" and "No" that you want to standardize
variations_of_yes = ['yes', 'YES', 'Y', 'y']
variations_of_no = ['no', 'NO', 'n', 'N']

df = df.replace(variations_of_yes, 'Yes')
df = df.replace(variations_of_no, 'No')
df

Unnamed: 0,Response
0,Yes
1,No
2,Yes
3,No
4,Yes
5,No
6,Yes
7,No
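
Listing every capitalization by hand becomes tedious as the data grows. One possible alternative, shown here as a sketch rather than the notebook's original approach, is to lower-case the column first and then apply a single dictionary. The names `responses` and `canonical` are illustrative.

```python
import pandas as pd

# Same responses as in the example above (hypothetical variable name)
responses = pd.DataFrame({
    'Response': ['yes', 'No', 'Y', 'n', 'YES', 'no', 'Yes', 'N'],
})

# One entry per lower-case spelling is enough after normalizing the case
canonical = {'yes': 'Yes', 'y': 'Yes', 'no': 'No', 'n': 'No'}

responses['Response'] = responses['Response'].str.lower().replace(canonical)
responses
```

Any value that is not a key in `canonical` would stay in its lower-cased form, so it is worth checking the distinct values afterwards.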


### Example: using mapping dictionary for replacements

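Suppose you have a DataFrame containing ages of individuals and you want to label each age with a predefined age group. Using replace() with a dictionary maps each listed value to its replacement. Note that the dictionary matches exact values only, so every age present in the data needs its own entry; ages without an entry are left unchanged.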

In [35]:
# Sample DataFrame with ages
data = {
    'Name': ['Ivan', 'Maria', 'Georgi', 'Sofia', 'Petar'],
    'Age': [23, 37, 12, 45, 67]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age
0,Ivan,23
1,Maria,37
2,Georgi,12
3,Sofia,45
4,Petar,67


In [37]:

# Define a dictionary for replacing ages with age groups
age_groups = {
    12: 'Child',
    23: 'Young Adult',
    37: 'Adult',
    45: 'Middle Aged',
    67: 'Senior'
}

# Replace ages with corresponding age groups
df['Age Group'] = df['Age'].replace(age_groups)
df

Unnamed: 0,Name,Age,Age Group
0,Ivan,23,Young Adult
1,Maria,37,Adult
2,Georgi,12,Child
3,Sofia,45,Middle Aged
4,Petar,67,Senior
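
Because the dictionary above matches exact ages only, it does not generalize to ages that are not listed. When the groups are really ranges, `pd.cut` is one way to express that. The sketch below uses the hypothetical name `people`, and the bin edges are assumptions chosen only so that the result matches the output above; they are not taken from the original example.

```python
import pandas as pd

# Same data as above (hypothetical variable name)
people = pd.DataFrame({
    'Name': ['Ivan', 'Maria', 'Georgi', 'Sofia', 'Petar'],
    'Age': [23, 37, 12, 45, 67],
})

# Assumed bin edges: (0, 17], (17, 29], (29, 44], (44, 64], (64, 120]
bins = [0, 17, 29, 44, 64, 120]
labels = ['Child', 'Young Adult', 'Adult', 'Middle Aged', 'Senior']

# pd.cut assigns each age to the interval it falls into (right edge inclusive by default)
people['Age Group'] = pd.cut(people['Age'], bins=bins, labels=labels)
people
```

The resulting column has a categorical dtype, and any age outside the outermost edges would become a missing value.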
