# Applying Custom Functions in Pandas

### What Are Custom Function Applications in Pandas?

In many real-world datasets, raw data alone isn’t enough. We often need to **transform**, **categorize**, **normalize**, or **extract** information using our own logic. That’s where **custom functions** come in — and Pandas offers three core tools to apply them:

- `.apply()`
- `.map()`
- `.applymap()`

These allow us to apply **user-defined or lambda functions** across Series, rows, columns, or even entire DataFrames.

### `.apply()`

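`.apply()` is the most flexible. It can be used on **Series or DataFrames**: on a Series it calls the function once per value, and on a DataFrame it passes each column (`axis=0`, the default) or each row (`axis=1`) to the function.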

**Example**

Compute the length of each passenger's name:

In [1]:
import pandas as pd
df = pd.read_csv("data/train.csv")

df['NameLength'] = df['Name'].apply(len)
print(df['NameLength'].head())

0    23
1    51
2    22
3    44
4    24
Name: NameLength, dtype: int64


Or more complex:

In [2]:
def age_category(age):
    if pd.isnull(age):
        return 'Unknown'
    elif age < 18:
        return 'Child'
    elif age < 60:
        return 'Adult'
    else:
        return 'Senior'

df['AgeGroup'] = df['Age'].apply(age_category)
print(df[['Age', 'AgeGroup']].head())

    Age AgeGroup
0  22.0    Adult
1  38.0    Adult
2  26.0    Adult
3  35.0    Adult
4  35.0    Adult
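
For simple numeric bucketing like this, `pd.cut` is a vectorized alternative to `.apply()`. The sketch below is only an illustration: it uses a small made-up `ages` Series rather than the Titanic data, assumes ages are non-negative, and picks bin edges that mirror the thresholds in `age_category`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen to hit every branch of age_category
ages = pd.Series([22.0, 5.0, np.nan, 71.0, 18.0])

# right=False makes the bins [0, 18), [18, 60), [60, inf),
# matching the `<` comparisons in age_category
age_group = pd.cut(
    ages,
    bins=[0, 18, 60, np.inf],
    right=False,
    labels=["Child", "Adult", "Senior"],
)

# Missing ages come back as NaN; add an explicit 'Unknown' category for them
age_group = age_group.cat.add_categories("Unknown").fillna("Unknown")

print(age_group.tolist())
```

The custom function remains the better fit when the categories depend on more than one threshold rule or on non-numeric conditions.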


We can also use lambda functions:

In [3]:
df['FareLevel'] = df['Fare'].apply(lambda x: 'High' if x > 100 else 'Low')
print(df[['Fare', 'FareLevel']].head())

      Fare FareLevel
0   7.2500       Low
1  71.2833       Low
2   7.9250       Low
3  53.1000       Low
4   8.0500       Low


### `.map()`

`.map()`, as used in this lesson, is a **Series** method and is often used for **value mapping or replacing**. It accepts a dictionary, another Series, or a function.

In [4]:
# Replace male/female with 0/1
df['Sex_num'] = df['Sex'].map({'male': 0, 'female': 1})
print(df[['Sex', 'Sex_num']].head())

      Sex  Sex_num
0    male        0
1  female        1
2  female        1
3  female        1
4    male        0


It’s simpler than `.apply()` and often faster for lookups or remapping. Note that any value without a matching key in the dictionary becomes `NaN`.
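
A dictionary passed to `.map()` acts as a lookup table, so anything it does not contain turns into `NaN`. The sketch below shows this on a made-up `sex` Series (not the Titanic column) and one possible way to handle it; the `-1` placeholder is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical column with one unexpected label and one missing value
sex = pd.Series(["male", "female", "Female", None])

mapping = {"male": 0, "female": 1}
encoded = sex.map(mapping)

# "Female" (capital F) and None have no matching key, so both become NaN
print(encoded.isna().sum())

# One possible policy: normalize case first, then mark whatever is
# still unmapped with -1 so the column can be stored as integers
encoded_clean = sex.str.lower().map(mapping).fillna(-1).astype(int)
print(encoded_clean.tolist())
```

Checking `isna().sum()` right after a dictionary mapping is a cheap way to catch typos or unexpected categories before they reach a model.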

### `.applymap()`

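`.applymap()` is for **element-wise operations across the entire DataFrame**, usually with numeric data or transformations. Since pandas 2.1 it is deprecated in favor of `DataFrame.map()`, which takes the same kind of function, so newer versions emit a deprecation warning for the calls below.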

In [None]:
df_numeric = df.select_dtypes(include='number')
print(df_numeric.head())
df_squared = df_numeric.applymap(lambda x: x**2)
print(df_squared.head())

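This is rarely needed in practice: it calls the function once per cell, so for plain arithmetic such as squaring, a vectorized expression like `df_numeric ** 2` gives the same result and is usually faster. It is most useful when the per-element logic has no vectorized equivalent.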
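
As a rough sketch of how the element-wise call relates to plain arithmetic, the example below squares a small made-up DataFrame (`toy` is hypothetical) both ways. It picks `DataFrame.map` when the installed pandas provides it and falls back to `applymap` otherwise; that version check is an assumption about the reader's environment, not something the lesson's notebook does.

```python
import pandas as pd

# Hypothetical numeric data standing in for df_numeric
toy = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# DataFrame.map exists in pandas 2.1+; older versions only have applymap
elementwise = toy.map if hasattr(toy, "map") else toy.applymap

# Calls the lambda once for every cell
squared_elementwise = elementwise(lambda x: x ** 2)

# Vectorized arithmetic: one operation over each whole column
squared_vectorized = toy ** 2

print((squared_elementwise == squared_vectorized).all().all())
```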

In AI/ML preprocessing, applying functions helps with:

- Feature creation (age buckets, income levels, etc.)
- Label encoding
- Custom metrics (ratios, scores)
- Cleaning text and numerical inputs

### When and Why to Use `.apply()`, `.map()`, and `.applymap()`

These three functions let us apply **custom logic** that built-in vectorized Pandas operations do not cover. Here's how and when to use each:

### Use `.apply()` when:

- We need **row-wise or column-wise** logic
- Logic depends on multiple values in a row
- We want to apply complex or nested `if/else` logic
- We’re creating **new feature columns**

**Example**

Tag passengers with “Family” or “Solo” based on the number of siblings/spouses (`SibSp`) and parents/children (`Parch`) aboard:

In [5]:
def family_status(row):
    return 'Family' if (row['SibSp'] + row['Parch']) > 0 else 'Solo'

df['FamilyStatus'] = df.apply(family_status, axis=1)
print(df[['SibSp', 'Parch', 'FamilyStatus']].head())

   SibSp  Parch FamilyStatus
0      1      0       Family
1      1      0       Family
2      0      0         Solo
3      1      0       Family
4      0      0         Solo


Note that `axis=1` tells `.apply()` to work **row-wise**, passing each row to the function as a Series.
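
To make the `axis` argument concrete, here is a minimal sketch on a made-up three-row DataFrame (`toy` is hypothetical and only borrows the Titanic column names). It contrasts what the function receives under each setting.

```python
import pandas as pd

# Hypothetical rows using the same column names as the Titanic data
toy = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})

# axis=0 (the default): the function receives each column as a Series
col_max = toy.apply(lambda col: col.max(), axis=0)

# axis=1: the function receives each row as a Series indexed by column name
row_total = toy.apply(lambda row: row["SibSp"] + row["Parch"], axis=1)

print(col_max.to_dict())
print(row_total.tolist())
```

With `axis=0` the result has one entry per column; with `axis=1` it has one entry per row, which is why row-wise results can be assigned directly to a new column.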

### Use `.map()` when:

- We’re working with a **single column** (Series)
- We want to **replace values** based on a dictionary or function
- We don’t need access to multiple columns

**Example**

Convert gender to numeric:

In [6]:
# Caution: this overwrites the original 'male'/'female' strings in place
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
print(df['Sex'].head())

0    0
1    1
2    1
3    1
4    0
Name: Sex, dtype: int64


We can also use lambdas:

In [7]:
df['FareLevel'] = df['Fare'].map(lambda x: 'High' if x > 100 else 'Low')
print(df[['Fare', 'FareLevel']].head())

      Fare FareLevel
0   7.2500       Low
1  71.2833       Low
2   7.9250       Low
3  53.1000       Low
4   8.0500       Low
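
When a column may contain missing values, a function passed to `.map()` sees them like any other input. The sketch below uses a made-up `fares` Series with one `NaN` to show the `na_action='ignore'` option of `Series.map`, which skips missing values instead of passing them to the function. The helper name `fare_label` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical fares, including one missing value
fares = pd.Series([7.25, 512.33, np.nan])

def fare_label(x):
    """Same rule as the lambda above: fares over 100 count as 'High'."""
    return "High" if x > 100 else "Low"

# Without na_action, NaN reaches the function; NaN > 100 is False,
# so the missing fare is silently labelled 'Low'
naive = fares.map(fare_label)

# With na_action='ignore', NaN is passed through untouched
safe = fares.map(fare_label, na_action="ignore")

print(naive.tolist())
print(safe.isna().tolist())
```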


### Use `.applymap()` when:

- We want to apply a **function to every element** of a DataFrame (not row/column)
- We're working with **numeric data**
- We need to normalize, round, or scale every value

**Example**

In [None]:
df_numeric = df.select_dtypes(include='number')
print(df_numeric.head())
df_scaled = df_numeric.applymap(lambda x: round(x / 10, 2))
print(df_scaled.head())
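
Scaling by a constant, as above, treats every cell independently. Normalization is different, because each value has to be compared with its own column's minimum and maximum. The sketch below illustrates that distinction on a made-up DataFrame (`toy` is hypothetical); it uses column-wise `.apply()` for min-max normalization and assumes no column is constant, since that would divide by zero.

```python
import pandas as pd

# Hypothetical numeric columns
toy = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 110.0]})

# Dividing by a constant is element-wise and needs no custom function
scaled = (toy / 10).round(2)

# Min-max normalization needs each column's own min and max,
# so the function must see a whole column at a time (axis=0 is the default)
normalized = toy.apply(lambda col: (col - col.min()) / (col.max() - col.min()))

print(scaled)
print(normalized)
```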

In AI/ML pipelines, these functions are a **core part of feature engineering** — converting messy or inconsistent raw values into machine-readable, informative features that drive better model performance.

### Exercises

Q1. Convert 'Sex' column to numeric using `.map()`

In [8]:
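# In [6] overwrote 'Sex' with 0/1, so reload the raw data to restore the string labels first
df = pd.read_csv("data/train.csv")
df['Sex_num'] = df['Sex'].map({'male': 0, 'female': 1})
print(df[['Sex', 'Sex_num']].head())

      Sex  Sex_num
0    male        0
1  female        1
2  female        1
3  female        1
4    male        0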


Q2. Create an 'AgeGroup' column using `.apply()`

In [9]:
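# Handle missing ages first; otherwise NaN fails both comparisons and is labelled 'Senior'
df['AgeGroup'] = df['Age'].apply(lambda x: 'Unknown' if pd.isnull(x) else 'Child' if x < 18 else 'Adult' if x < 60 else 'Senior')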
print(df[['Age', 'AgeGroup']].head())

    Age AgeGroup
0  22.0    Adult
1  38.0    Adult
2  26.0    Adult
3  35.0    Adult
4  35.0    Adult


Q3. Apply `.applymap()` to square all numeric values

In [None]:
df_numeric = df.select_dtypes(include='number')
print(df_numeric.head())
df_squared = df_numeric.applymap(lambda x: x**2)
print(df_squared.head())

Q4. Use `.apply()` to calculate family size and label as 'Small' or 'Large'

In [10]:
df['FamilySize'] = df.apply(lambda row: row['SibSp'] + row['Parch'], axis=1)
df['FamilyLabel'] = df['FamilySize'].apply(lambda x: 'Large' if x > 2 else 'Small')
print(df[['FamilySize', 'FamilyLabel']].head())

   FamilySize FamilyLabel
0           1       Small
1           1       Small
2           0       Small
3           1       Small
4           0       Small
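
The exercise asks for `.apply()`, but the same result can be reached with column arithmetic and `np.where`. The sketch below is an alternative for comparison, run on a made-up DataFrame (`toy` is hypothetical and only reuses the Titanic column names).

```python
import numpy as np
import pandas as pd

# Hypothetical passengers
toy = pd.DataFrame({"SibSp": [1, 0, 3, 1], "Parch": [0, 0, 2, 2]})

# Column arithmetic adds the two Series element by element, no row loop needed
toy["FamilySize"] = toy["SibSp"] + toy["Parch"]

# np.where picks 'Large' where the condition holds and 'Small' elsewhere
toy["FamilyLabel"] = np.where(toy["FamilySize"] > 2, "Large", "Small")

# The row-wise .apply() version from the exercise gives the same sizes
via_apply = toy.apply(lambda row: row["SibSp"] + row["Parch"], axis=1)

print(toy[["FamilySize", "FamilyLabel"]])
print((via_apply == toy["FamilySize"]).all())
```

Row-wise `.apply()` calls a Python function once per row, so the vectorized form tends to scale better on large tables, while `.apply()` stays useful when the rule is too irregular to express as column operations.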


### Summary

In this lesson, we unlocked one of the most powerful tools in a data scientist’s toolkit: **custom function application**. The trio of `.apply()`, `.map()`, and `.applymap()` gives us **full control** over our DataFrame's transformation logic — turning raw, untidy, or complex data into clean, meaningful features ready for modeling.

- Use `.apply()` when we need **flexible logic** over rows or columns — it’s our go-to for feature engineering that involves more than one variable (e.g., total family size, conditional categorization).
- Use `.map()` when working on **a single column transformation** or simple value replacement. It is usually the simplest choice for categorical remapping (e.g., "male" to 0, "female" to 1); values missing from the mapping become `NaN`.
- Use `.applymap()` (or `DataFrame.map()` in pandas 2.1 and later, where `.applymap()` is deprecated) when applying **a mathematical or text transformation to every element** in a numeric or object DataFrame (though this is less common in feature engineering).

These tools not only make data more **useful** and **machine-readable**, but also let us embed **domain-specific knowledge** directly into the dataset. This can improve model performance, especially in projects where real-world logic matters (e.g., fraud detection, customer segmentation, survival prediction like Titanic).

Without these techniques, we are limited to the transformations Pandas provides out of the box. With them, we can create **new features**, **fix inconsistencies**, and **capture nuance**, all of which feed into better AI/ML models.