In [1]:
import pandas as pd
import numpy as np

# Set the seed for NumPy's random number generator
np.random.seed(42)

# <a id='toc1_'></a>[Bulk Mapping with Pandas](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Bulk Mapping with Pandas](#toc1_)    
  - [map() method](#toc1_1_)    
    - [Examples:](#toc1_1_1_)    
    - [Use Case:](#toc1_1_2_)    
    - [Limitations:](#toc1_1_3_)    
  - [apply() method](#toc1_2_)    
      - [Examples:](#toc1_2_1_1_)    
      - [Using apply() for custom aggregation](#toc1_2_1_2_)    
  - [replace() method](#toc1_3_)    
    - [Example: using list of values to be replaced](#toc1_3_1_)    
    - [Example: using mapping dictionary for replacements](#toc1_3_2_)    
  - [rename() method](#toc1_4_)    
    - [How to Use `rename()`](#toc1_4_1_)    
    - [Key Parameters](#toc1_4_2_)    
    - [Example: Renaming Columns Using a Dictionary](#toc1_4_3_)    
    - [Example: Renaming Index with a Function](#toc1_4_4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

Pandas provides several methods for efficient bulk mapping, including map(), apply(), replace() and rename(), alongside its vectorized operations.

## <a id='toc1_1_'></a>[map() method](#toc0_)

**Note that in [version 2.1.0](https://pandas.pydata.org/docs/whatsnew/v2.1.0.html#new-dataframe-map-method-and-support-for-extensionarrays), DataFrame.applymap was deprecated and renamed to DataFrame.map.**

The [map()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.map.html) method transforms every value of a DataFrame into a new value, one element at a time.

This method applies a function that accepts and returns a scalar to every element of a DataFrame. This function can be a built-in function, a user-defined function, or even a lambda function.

### <a id='toc1_1_1_'></a>[Examples:](#toc0_)

In [2]:
# Example DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
})
df

Unnamed: 0,A,B
0,1,4
1,2,5
2,3,6


In [3]:
# Function to be applied to each element
def multiply_by_two(x):
    return x * 2

# Applying the function to each element of the DataFrame
mapped_df = df.map(multiply_by_two)
mapped_df

Unnamed: 0,A,B
0,2,8
1,4,10
2,6,12


In [4]:
# You can also pass a lambda function to map:
mapped_df = df.map(lambda x: x * 2)
mapped_df

Unnamed: 0,A,B
0,2,8
1,4,10
2,6,12


### <a id='toc1_1_2_'></a>[Use Case:](#toc0_)

map() is ideal for performing a specific operation that affects each element individually, without consideration for row or column context. Common uses include formatting numbers, converting data types, or applying a mathematical transformation to every element.
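
As a small illustration of the formatting use case, the sketch below turns every number into a two-decimal string. The `prices` data and the `cell_map` name are made up for this example, and the fallback to `applymap` is only an assumption about how one might support pandas versions older than 2.1.

```python
import pandas as pd

# Hypothetical data, used only for this illustration.
prices = pd.DataFrame({"net": [1.5, 20.25], "gross": [1.8, 24.3]})

# DataFrame.map exists from pandas 2.1 on; older versions call it applymap.
cell_map = prices.map if hasattr(pd.DataFrame, "map") else prices.applymap

# Format every element as a string with two decimals.
formatted = cell_map(lambda x: f"{x:.2f}")
print(formatted)
```

Each cell is processed on its own, so the function never needs to know which row or column the value came from.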

### <a id='toc1_1_3_'></a>[Limitations:](#toc0_)

map() cannot be used for operations that need to consider an entire row or column, such as aggregations or operations that depend on a specific axis.
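
To make the limitation concrete, here is a sketch of a calculation that needs the whole column: each value as a share of its column total. An element-wise function only ever sees one scalar, so this example uses apply() on whole columns instead. The `df_lim` and `shares` names are illustrative.

```python
import pandas as pd

df_lim = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# The function receives a whole column (a Series), so it can use col.sum().
shares = df_lim.apply(lambda col: col / col.sum())
print(shares)
```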

## <a id='toc1_2_'></a>[apply() method](#toc0_)

The [apply()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) method applies a function along an axis of the DataFrame: axis=0 (the default) passes each column to the function, and axis=1 passes each row. This function can be a built-in function, a user-defined function, or even a lambda function.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1).

This method is more flexible than map() and can be used for operations that affect entire rows or columns, rather than individual elements.
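
One way to see what the function receives is to return the labels of the Series it is handed. The frame and variable names below are illustrative only.

```python
import pandas as pd

df_axes = pd.DataFrame(
    {"A": [1, 2], "B": [3, 4], "C": [5, 6]},
    index=["r0", "r1"],
)

# axis=0: called once per column; the Series is indexed by the row labels.
col_view = df_axes.apply(lambda s: ",".join(s.index), axis=0)

# axis=1: called once per row; the Series is indexed by the column labels.
row_view = df_axes.apply(lambda s: ",".join(s.index), axis=1)

print(col_view)
print(row_view)
```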

#### <a id='toc1_2_1_1_'></a>[Examples:](#toc0_)

In [5]:
### Applying a Function to Each Column

# Sample DataFrame
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 3)), columns=['A', 'B', 'C'])
df

Unnamed: 0,A,B,C
0,51,92,14
1,71,60,20
2,82,86,74
3,74,87,99
4,23,2,21


In [6]:
# Apply a simple function to calculate the range of each column:
range_df = df.apply(lambda x: x.max() - x.min())
range_df

A    59
B    90
C    85
dtype: int64

In [7]:
### Applying a Function to Each Row

# Using lambda to calculate a custom metric across columns for each row
df['custom_metric'] = df.apply(lambda row: (row['A'] + row['B']) / row['C'], axis=1)
df

Unnamed: 0,A,B,C,custom_metric
0,51,92,14,10.214286
1,71,60,20,6.55
2,82,86,74,2.27027
3,74,87,99,1.626263
4,23,2,21,1.190476


#### <a id='toc1_2_1_2_'></a>[Using apply() for custom aggregation](#toc0_)

Applying a function to aggregate over a series of elements in a pandas DataFrame can be achieved with the apply() method. This approach is particularly useful for performing custom aggregations that aren't directly supported by built-in pandas methods.

Consider the following example: suppose we have a DataFrame representing scores in three different subjects for a group of students, and we want to calculate a custom aggregate metric for each student that takes into account their scores across all subjects.

In [8]:
# Create a sample DataFrame with student scores
data = {
    'Math': np.random.randint(50, 100, size=5),
    'Science': np.random.randint(50, 100, size=5),
    'English': np.random.randint(50, 100, size=5)
}
students_df = pd.DataFrame(data)

# Add a 'Student' column with the student names
student_names = ['Student A', 'Student B', 'Student C', 'Student D', 'Student E']
students_df.insert(loc=0, column='Student', value=student_names)

students_df

Unnamed: 0,Student,Math,Science,English
0,Student A,51,51,93
1,Student B,73,70,74
2,Student C,93,82,98
3,Student D,79,61,76
4,Student E,87,71,91


In [9]:
# Apply the function across rows (axis=1) to calculate the consistency metric for each student
students_df['Consistency'] = (
    students_df[['Math', 'Science', 'English']]
    .apply(lambda row: row.max() - row.min(), axis=1)
)

# Display the DataFrame with the added consistency metric
students_df

Unnamed: 0,Student,Math,Science,English,Consistency
0,Student A,51,51,93,42
1,Student B,73,70,74,4
2,Student C,93,82,98,16
3,Student D,79,61,76,18
4,Student E,87,71,91,20
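
For a metric as simple as max minus min, built-in vectorized methods can give the same result without a Python-level function call per row. The sketch below copies the scores shown above into a standalone frame (`scores` is an illustrative name) and compares both approaches.

```python
import pandas as pd

# Scores copied from the output above so the example runs on its own.
scores = pd.DataFrame({
    "Math": [51, 73, 93, 79, 87],
    "Science": [51, 70, 82, 61, 71],
    "English": [93, 74, 98, 76, 91],
})

# Row-wise apply, as in the cell above.
via_apply = scores.apply(lambda row: row.max() - row.min(), axis=1)

# Vectorized equivalent using built-in row-wise reductions.
vectorized = scores.max(axis=1) - scores.min(axis=1)

print(vectorized)
```

apply() remains the more general tool when the per-row logic cannot be expressed with built-in reductions.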


## <a id='toc1_3_'></a>[replace() method](#toc0_)

The [replace()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html) method in pandas is another tool that allows for replacing values in a DataFrame or Series based on some criteria. 

While not specifically designed for applying custom functions across elements like map() or apply(), replace() can be used to perform transformations by mapping specific values to their replacements. 

It is very versatile and can handle single values, lists of values, or even dictionaries of replacements, making it quite powerful for data cleaning tasks.

replace() is particularly useful for replacing placeholder values that stand for missing data or for standardizing categorical data. For instance, it can replace various versions of "Yes" (e.g., "yes", "YES", "Y") with a standardized "Yes" across your dataset.
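
The three input forms can be sketched on a small Series. The `readings` data and the idea that -999 marks a missing reading are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings; -999 and -1 are assumed placeholder codes.
readings = pd.Series([10, -999, 12, -1, 15])

# Single value: turn the placeholder into a real missing value.
single = readings.replace(-999, np.nan)

# List of values: every listed value gets the same replacement.
from_list = readings.replace([-999, -1], 0)

# Dictionary: each key is replaced by its own value.
from_dict = readings.replace({-999: 0, -1: 1})

print(single, from_list, from_dict, sep="\n\n")
```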

### <a id='toc1_3_1_'></a>[Example: using list of values to be replaced](#toc0_)

In [10]:
# Sample DataFrame
data = {
    'Response': ['yes', 'No', 'Y', 'n', 'YES', 'no', 'Yes', 'N'],
}
df = pd.DataFrame(data)
df

Unnamed: 0,Response
0,yes
1,No
2,Y
3,n
4,YES
5,no
6,Yes
7,N


In [11]:
# Define the variations of "Yes" and "No" that you want to standardize
variations_of_yes = ['yes', 'YES', 'Y', 'y']
variations_of_no = ['no', 'NO', 'n', 'N']

df = df.replace(variations_of_yes, 'Yes')
df = df.replace(variations_of_no, 'No')
df

Unnamed: 0,Response
0,Yes
1,No
2,Yes
3,No
4,Yes
5,No
6,Yes
7,No


### <a id='toc1_3_2_'></a>[Example: using mapping dictionary for replacements](#toc0_)

Suppose you have a DataFrame containing the ages of individuals and you want to label each age with a predefined age group. replace() with a dictionary can do this when every age to convert appears as a key, because it matches exact values rather than ranges.

In [12]:
# Sample DataFrame with ages
data = {
    'Name': ['Ivan', 'Maria', 'Georgi', 'Sofia', 'Petar'],
    'Age': [23, 37, 12, 45, 67]
}
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age
0,Ivan,23
1,Maria,37
2,Georgi,12
3,Sofia,45
4,Petar,67


In [13]:

# Define a dictionary for replacing ages with age groups
age_groups = {
    12: 'Child',
    23: 'Young Adult',
    37: 'Adult',
    45: 'Middle Aged',
    67: 'Senior'
}

# Replace ages with corresponding age groups
df['Age Group'] = df['Age'].replace(age_groups)
df

Unnamed: 0,Name,Age,Age Group
0,Ivan,23,Young Adult
1,Maria,37,Adult
2,Georgi,12,Child
3,Sofia,45,Middle Aged
4,Petar,67,Senior
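
One detail worth checking with a dictionary is what happens to values that have no key. In the sketch below (illustrative names and data), replace() leaves the unmatched age as it is, whereas Series.map with the same dictionary, shown only for contrast, returns NaN for it.

```python
import pandas as pd

ages = pd.Series([23, 37, 50])
partial_groups = {23: "Young Adult", 37: "Adult"}  # no entry for 50

# replace(): values without a matching key are kept unchanged.
replaced = ages.replace(partial_groups)

# Series.map(): values without a matching key become NaN.
mapped = ages.map(partial_groups)

print(replaced)
print(mapped)
```

Which behavior is preferable depends on whether unmatched values should pass through or be flagged as missing.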


## <a id='toc1_4_'></a>[rename() method](#toc0_)

The [rename()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html) method in pandas is used to alter the index labels and column names of a DataFrame. 

It provides a flexible way to change the names of your DataFrame's axes (rows and columns) without altering the data itself. 

This method is particularly useful for making your DataFrame labels more informative or for correcting any mistakes in the initial labeling.

### <a id='toc1_4_1_'></a>[How to Use `rename()`](#toc0_)

`rename()` can be applied through various parameters to achieve the desired renaming:

- **mapper**: Accepts a dictionary or a function. When a dictionary is provided, it maps existing labels to new ones. If a function is given, it transforms the labels according to its logic.
- **index** and **columns**: These parameters accept a dictionary or a function that maps old labels to new ones for the index and columns, respectively. They offer a more targeted approach than using `mapper`.
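
The two calling styles can be compared side by side. This is a minimal sketch with made-up names (`df_r`, `by_mapper`, `by_keyword`, `by_index`).

```python
import pandas as pd

df_r = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# mapper plus axis: the dictionary is applied to the chosen axis.
by_mapper = df_r.rename({"A": "Alpha"}, axis="columns")

# columns= keyword: same result, with the target axis named directly.
by_keyword = df_r.rename(columns={"A": "Alpha"})

# index= keyword: renames row labels instead.
by_index = df_r.rename(index={0: "first"})

print(by_mapper.columns.tolist(), by_index.index.tolist())
```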

### <a id='toc1_4_2_'></a>[Key Parameters](#toc0_)

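- **axis**: Determines whether `mapper` applies to the index (`axis=0` or `'index'`) or the columns (`axis=1` or `'columns'`). If `mapper` is used without specifying `axis`, it renames the index.
- **copy**: Controls whether the underlying data is copied into the returned DataFrame. `rename()` returns a new object either way, and `copy=False` does not rename the original's labels. The pandas documentation notes that this keyword is ignored under Copy-on-Write and is slated for removal.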
- **inplace**: When set to `True`, changes the DataFrame in place, returning `None`. The default `False` returns a new DataFrame with the changes.
- **level**: In the case of a MultiIndex, specifies the level that should be renamed.
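
A short sketch of how these parameters behave in practice; all frame and variable names here are illustrative.

```python
import pandas as pd

# The label "a" appears both as a row label and as a column name.
df_p = pd.DataFrame({"a": [1, 2]}, index=["a", "b"])

# mapper without axis targets the index; the column "a" is untouched.
default_axis = df_p.rename({"a": "z"})

# inplace=True changes df_p itself and returns None.
result = df_p.rename(columns={"a": "value"}, inplace=True)

# level: restrict the renaming to one level of a MultiIndex.
multi = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.MultiIndex.from_tuples(
        [("a", "a"), ("b", "a")], names=["outer", "inner"]
    ),
)
renamed_level = multi.rename(index={"a": "A"}, level="inner")

print(default_axis)
print(renamed_level)
```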

### <a id='toc1_4_3_'></a>[Example: Renaming Columns Using a Dictionary](#toc0_)

In [14]:
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Using a dictionary to rename columns
df_renamed = df.rename(columns={'A': 'Alpha', 'B': 'Beta'})
df_renamed

Unnamed: 0,Alpha,Beta
0,1,4
1,2,5
2,3,6


### <a id='toc1_4_4_'></a>[Example: Renaming Index with a Function](#toc0_)

In [15]:
df_renamed = df.rename(index=lambda x: f'row{x + 1}')
df_renamed

Unnamed: 0,A,B
row1,1,4
row2,2,5
row3,3,6
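
Functions work for column labels as well. The sketch below (illustrative names) passes the built-in `str.lower` for the columns and then renames both axes in a single call.

```python
import pandas as pd

df_f = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Any callable that takes a label and returns a new label can be used.
lowered = df_f.rename(columns=str.lower)

# index= and columns= can be combined in one call.
both = df_f.rename(
    index=lambda x: f"row{x + 1}",
    columns=lambda c: f"col_{c}",
)

print(lowered)
print(both)
```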
