## Introduction to pandas actions

Pandas is a powerful open-source data analysis and manipulation library for Python. It provides data structures like DataFrames (2-dimensional labeled data structures with columns of potentially different types) and Series (1-dimensional labeled array) that are essential for working with structured data.

One of the key strengths of Pandas lies in its rich set of methods, referred to here as **"actions"**. These are functions built into DataFrame and Series objects that let you clean, transform, aggregate, and analyze your data directly. Vectorized methods are highly optimized and often much faster than equivalent standard Python loops. Methods that call a Python function once per element, row, or group (such as `apply()` with a lambda) trade some of that speed for flexibility.

Using these built-in methods makes your code more concise, readable, and performant. Instead of iterating through rows or columns manually, you can often achieve the desired result with a single method call.

Some common and powerful Pandas actions include:

*   **`apply()`**: Apply a function along an axis of the DataFrame or Series.
*   **`assign()`**: Assign new columns to a DataFrame.
*   **`groupby()`**: Group rows based on column values for aggregation.
*   **`agg()`**: Perform aggregation operations on grouped data.
*   **`sort_values()`**: Sort DataFrame by column values.
*   **`fillna()`**: Fill missing values.

In the following sections, we explore how to combine lambda functions with some of these Pandas actions to perform flexible and efficient data manipulations.

## Using Lambda Functions with the `apply` Method in Pandas

The `apply()` method in Pandas applies a function along an axis of a DataFrame or to the values of a Series. Combined with lambda functions, `apply()` lets you perform custom element-wise or row/column-wise operations without writing explicit loops.

**Applying to a Series:**

When `apply()` is used on a Series, the function (which can be a lambda function) is applied to each element of the Series by default. This is useful for transforming individual values in a column.

**Applying to a DataFrame:**

When `apply()` is used on a DataFrame, the function is applied to each column (by default, `axis=0`) or each row (`axis=1`). When applying along an axis, the function receives a Series (either a column or a row) as its input. This allows for more complex operations that involve multiple values within a row or column.
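
The two axis settings can be illustrated with a minimal sketch. The frame `df_num` and its column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical numeric frame, used only for this illustration
df_num = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 (the default): the lambda receives each column as a Series
col_range = df_num.apply(lambda col: col.max() - col.min())

# axis=1: the lambda receives each row as a Series
row_total = df_num.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_range)
print(row_total)
```

With `axis=0` the result has one entry per column, and with `axis=1` it has one entry per row.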

In [None]:
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

print("Original Series:")
display(s)

s_squared = s.apply(lambda x: x**2)


print("\nSeries after applying lambda function (squaring):")
display(s_squared)

Original Series:


|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |



Series after applying lambda function (squaring):


|   | 0 |
|---|---|
| 0 | 1 |
| 1 | 4 |
| 2 | 9 |
| 3 | 16 |
| 4 | 25 |


In [None]:
s_strings = pd.Series(['apple', 'banana', 'cherry'])
print("\nOriginal String Series:")
display(s_strings)

s_uppercase = s_strings.apply(lambda x: x.upper())
print("\nString Series after applying lambda function (uppercase):")
display(s_uppercase)


Original String Series:


|   | 0 |
|---|---|
| 0 | apple |
| 1 | banana |
| 2 | cherry |



String Series after applying lambda function (uppercase):


|   | 0 |
|---|---|
| 0 | APPLE |
| 1 | BANANA |
| 2 | CHERRY |
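
For simple transformations like squaring and uppercasing, Pandas also offers vectorized equivalents that avoid calling a Python function per element. The sketch below rebuilds the two Series from the cells above and compares both approaches. It is an optional alternative, not a replacement for `apply()` when the logic is truly custom.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
s_strings = pd.Series(["apple", "banana", "cherry"])

# Element-wise lambda versus vectorized arithmetic
squared_apply = s.apply(lambda x: x**2)
squared_vectorized = s**2

# Element-wise lambda versus the vectorized .str accessor
upper_apply = s_strings.apply(lambda x: x.upper())
upper_vectorized = s_strings.str.upper()

# Both approaches produce the same values
print(squared_apply.tolist() == squared_vectorized.tolist())
print(upper_apply.tolist() == upper_vectorized.tolist())
```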


In [None]:
data = {'col1': [10, 20, 30, 40, 50],
        'col2': ['A', 'B', 'C', 'D', 'E'],
        'col3': [100, 200, 150, 250, 300]}

df = pd.DataFrame(data)

print("Original DataFrame:")
display(df)


df['col1_transformed'] = df['col1'].apply(lambda x: x + 5)
df['col2_formatted'] = df['col2'].apply(lambda x: f"ID_{x}")

print("\nDataFrame after applying lambda functions to columns:")
display(df)

Original DataFrame:


|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | 10 | A | 100 |
| 1 | 20 | B | 200 |
| 2 | 30 | C | 150 |
| 3 | 40 | D | 250 |
| 4 | 50 | E | 300 |



DataFrame after applying lambda functions to columns:


|   | col1 | col2 | col3 | col1_transformed | col2_formatted |
|---|---|---|---|---|---|
| 0 | 10 | A | 100 | 15 | ID_A |
| 1 | 20 | B | 200 | 25 | ID_B |
| 2 | 30 | C | 150 | 35 | ID_C |
| 3 | 40 | D | 250 | 45 | ID_D |
| 4 | 50 | E | 300 | 55 | ID_E |


## Lambda functions with pandas `assign`

The `assign()` method in Pandas is a convenient way to create new columns for a DataFrame. It returns a *new* DataFrame with the new columns added, rather than modifying the original DataFrame in place (though you can reassign the result back to the original variable).

The `assign()` method is particularly useful with lambda functions because it allows you to define new columns based on the DataFrame's existing columns in a clean and readable way. The lambda function passed to `assign` takes the entire DataFrame as its argument, making it easy to reference other columns.

The basic syntax is `df.assign(new_col_name=lambda x: expression_using_x)`. Here, `x` represents the DataFrame itself within the lambda function's scope. You can assign multiple new columns by providing multiple keyword arguments to `assign`.
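
Because the keyword arguments are evaluated in order, a later lambda can refer to a column created earlier in the same `assign()` call. The following is a minimal sketch, where `df_demo`, `total`, and `col1_share` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
df_demo = pd.DataFrame({"col1": [10, 20, 30], "col3": [100, 200, 150]})

df_chained = df_demo.assign(
    total=lambda x: x["col1"] + x["col3"],
    # 'total' already exists here because it was assigned just above
    col1_share=lambda x: x["col1"] / x["total"],
)

print(df_chained)
# The original frame is left unchanged
print(list(df_demo.columns))
```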

In [None]:
display(df)


df_assigned = df.assign(
    product_col1_col3=lambda x: x['col1'] * x['col3'],
    col1_is_large=lambda x: x['col1'] > 30
)

# Display the DataFrame with the newly created columns
print("DataFrame after using assign with lambda functions:")
display(df_assigned)

|   | col1 | col2 | col3 | col1_transformed | col2_formatted |
|---|---|---|---|---|---|
| 0 | 10 | A | 100 | 15 | ID_A |
| 1 | 20 | B | 200 | 25 | ID_B |
| 2 | 30 | C | 150 | 35 | ID_C |
| 3 | 40 | D | 250 | 45 | ID_D |
| 4 | 50 | E | 300 | 55 | ID_E |


DataFrame after using assign with lambda functions:


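|   | col1 | col2 | col3 | col1_transformed | col2_formatted | product_col1_col3 | col1_is_large |
|---|---|---|---|---|---|---|---|
| 0 | 10 | A | 100 | 15 | ID_A | 1000 | False |
| 1 | 20 | B | 200 | 25 | ID_B | 4000 | False |
| 2 | 30 | C | 150 | 35 | ID_C | 4500 | False |
| 3 | 40 | D | 250 | 45 | ID_D | 10000 | True |
| 4 | 50 | E | 300 | 55 | ID_E | 15000 | True |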


## Demonstrate Groupby

In [None]:
import pandas as pd

# 1. Create a sample DataFrame
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Value': [10, 15, 20, 25, 30, 35, 40, 45],
    'City': ['New York', 'London', 'New York', 'Paris', 'London', 'New York', 'Paris', 'London']
}
df_groupby = pd.DataFrame(data)

# 2. Print the original DataFrame
print("Original DataFrame:")
display(df_groupby)

# 3. Group by 'Category' and calculate the sum of 'Value'
print("\nGrouping by 'Category' and summing 'Value':")


category_sum = df_groupby.groupby('Category')['Value'].sum()
display(category_sum)

# 4. Group by multiple columns ('Category' and 'City') and calculate the mean of 'Value'

print("\nGrouping by 'Category' and 'City' and calculating the mean of 'Value':")

category_city_mean = df_groupby.groupby(['Category', 'City'])['Value'].mean()
display(category_city_mean)

Original DataFrame:


|   | Category | Value | City |
|---|---|---|---|
| 0 | A | 10 | New York |
| 1 | B | 15 | London |
| 2 | A | 20 | New York |
| 3 | B | 25 | Paris |
| 4 | A | 30 | London |
| 5 | C | 35 | New York |
| 6 | B | 40 | Paris |
| 7 | C | 45 | London |



Grouping by 'Category' and summing 'Value':


| Category | Value |
|---|---|
| A | 60 |
| B | 80 |
| C | 80 |



Grouping by 'Category' and 'City' and calculating the mean of 'Value':


| Category | City | Value |
|---|---|---|
| A | London | 30.0 |
| A | New York | 15.0 |
| B | London | 15.0 |
| B | Paris | 32.5 |
| C | London | 45.0 |
| C | New York | 35.0 |
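
The grouping key does not have to be an existing column. As a sketch, a lambda can derive a label from each value and the resulting Series can be passed to `groupby()`. The `Bucket` name and the threshold of 30 are assumptions made for this illustration.

```python
import pandas as pd

df_groupby = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "C", "B", "C"],
    "Value": [10, 15, 20, 25, 30, 35, 40, 45],
})

# Hypothetical bucket label: 'high' for values of 30 or more, else 'low'
size_bucket = (
    df_groupby["Value"]
    .apply(lambda v: "high" if v >= 30 else "low")
    .rename("Bucket")
)

# Group by the derived Series (aligned on the index) and sum 'Value'
bucket_sum = df_groupby.groupby(size_bucket)["Value"].sum()
print(bucket_sum)
```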


## Demonstrate Aggregation

The `agg()` method applies one or more aggregation functions to grouped data. The cell below uses built-in aggregation names as well as a lambda function for a custom aggregation.


In [None]:
import pandas as pd

# Group by 'Category'
display(df_groupby)


grouped_data = df_groupby.groupby('Category')

# 1. Apply agg() with built-in aggregation functions (mean and sum)
print("\nAggregating 'Value' by 'Category' using built-in functions (mean, sum):")
category_agg_builtin = grouped_data['Value'].agg(['mean', 'sum'])
display(category_agg_builtin)





# 2. Apply agg() with a lambda function for a custom aggregation (range: max - min)
print("\nAggregating 'Value' by 'Category' using a custom lambda function (range):")
category_agg_custom_lambda = grouped_data['Value'].agg(lambda x: x.max() - x.min())
category_agg_custom_lambda.name = 'Value_range' # Rename the series for clarity
display(category_agg_custom_lambda)




# 3. Combine built-in and custom aggregations with named aggregation (keyword arguments of the form output_name=function)
print("\nAggregating 'Value' by 'Category' using a dictionary with built-in and custom aggregations:")
category_multiple_agg = grouped_data['Value'].agg(
    mean_value='mean', # Built-in aggregation with custom name
    sum_value='sum',   # Built-in aggregation with custom name
    range_value=lambda x: x.max() - x.min(), # Custom aggregation with custom name
    count_elements='count' # Another built-in aggregation
)
display(category_multiple_agg)

|   | Category | Value | City |
|---|---|---|---|
| 0 | A | 10 | New York |
| 1 | B | 15 | London |
| 2 | A | 20 | New York |
| 3 | B | 25 | Paris |
| 4 | A | 30 | London |
| 5 | C | 35 | New York |
| 6 | B | 40 | Paris |
| 7 | C | 45 | London |



Aggregating 'Value' by 'Category' using built-in functions (mean, sum):


| Category | mean | sum |
|---|---|---|
| A | 20.0 | 60 |
| B | 26.666667 | 80 |
| C | 40.0 | 80 |



Aggregating 'Value' by 'Category' using a custom lambda function (range):


| Category | Value_range |
|---|---|
| A | 20 |
| B | 25 |
| C | 10 |



Aggregating 'Value' by 'Category' using a dictionary with built-in and custom aggregations:


| Category | mean_value | sum_value | range_value | count_elements |
|---|---|---|---|---|
| A | 20.0 | 60 | 20 | 3 |
| B | 26.666667 | 80 | 25 | 3 |
| C | 40.0 | 80 | 10 | 2 |
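
The last aggregation above passes keyword arguments (named aggregation). For comparison, here is a minimal sketch of the dictionary form, which maps column names to aggregation functions on the grouped DataFrame. Counting distinct cities is an assumption picked for illustration.

```python
import pandas as pd

df_groupby = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "C", "B", "C"],
    "Value": [10, 15, 20, 25, 30, 35, 40, 45],
    "City": ["New York", "London", "New York", "Paris",
             "London", "New York", "Paris", "London"],
})

# Dictionary form: {column name: aggregation}, one entry per column
category_dict_agg = df_groupby.groupby("Category").agg({
    "Value": "sum",                    # built-in aggregation by name
    "City": lambda s: s.nunique(),     # custom lambda: distinct cities
})
print(category_dict_agg)
```

The dictionary form keeps the original column names in the result, whereas named aggregation lets you choose the output names.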


## Demonstrate Sort Values

The `sort_values()` method sorts a DataFrame by one or more columns. The cell below sorts by a custom criterion (the length of each name) by first computing a helper column with a lambda function and then sorting on it.


In [None]:
import pandas as pd

# 1. Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Score': [85, 92, 78, 95, 88, 75],
    'City': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin']
}
df_sort = pd.DataFrame(data)

# 2. Print the original DataFrame
print("Original DataFrame:")
display(df_sort)

# 3. Create a new temporary column 'Name_Length' using a lambda function
df_sort['Name_Length'] = df_sort['Name'].apply(lambda x: len(x))
display(df_sort)


# 4. Sort the DataFrame by 'Name_Length' in ascending order
df_sorted_by_length = df_sort.sort_values(by='Name_Length', ascending=True)


# 5. Print the DataFrame sorted by the custom logic (name length)
print("\nDataFrame sorted by Name Length:")
display(df_sorted_by_length)

# 6. Optionally, drop the temporary 'Name_Length' column
df_sorted_by_length_cleaned = df_sorted_by_length.drop(columns=['Name_Length'])
print("\nDataFrame sorted by Name Length (without temporary column):")
display(df_sorted_by_length_cleaned)


Original DataFrame:


|   | Name | Score | City |
|---|---|---|---|
| 0 | Alice | 85 | New York |
| 1 | Bob | 92 | London |
| 2 | Charlie | 78 | Paris |
| 3 | David | 95 | Tokyo |
| 4 | Eve | 88 | Sydney |
| 5 | Frank | 75 | Berlin |


|   | Name | Score | City | Name_Length |
|---|---|---|---|---|
| 0 | Alice | 85 | New York | 5 |
| 1 | Bob | 92 | London | 3 |
| 2 | Charlie | 78 | Paris | 7 |
| 3 | David | 95 | Tokyo | 5 |
| 4 | Eve | 88 | Sydney | 3 |
| 5 | Frank | 75 | Berlin | 5 |



DataFrame sorted by Name Length:


|   | Name | Score | City | Name_Length |
|---|---|---|---|---|
| 1 | Bob | 92 | London | 3 |
| 4 | Eve | 88 | Sydney | 3 |
| 3 | David | 95 | Tokyo | 5 |
| 0 | Alice | 85 | New York | 5 |
| 5 | Frank | 75 | Berlin | 5 |
| 2 | Charlie | 78 | Paris | 7 |



DataFrame sorted by Name Length (without temporary column):


|   | Name | Score | City |
|---|---|---|---|
| 1 | Bob | 92 | London |
| 4 | Eve | 88 | Sydney |
| 3 | David | 95 | Tokyo |
| 0 | Alice | 85 | New York |
| 5 | Frank | 75 | Berlin |
| 2 | Charlie | 78 | Paris |
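
In pandas 1.1 and later, `sort_values()` also accepts a `key` argument, so the helper column can be skipped. The sketch below passes a lambda that receives the whole column as a Series and returns the values to sort by. Using `kind="stable"` is a choice made here so that names of equal length keep their original relative order. The default sort algorithm does not guarantee that, which is why tied names appear in a different order in the output above.

```python
import pandas as pd

df_sort = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
    "Score": [85, 92, 78, 95, 88, 75],
})

# The key lambda receives the 'Name' column as a Series and must
# return a Series of the same length holding the sort keys.
df_sorted_key = df_sort.sort_values(
    by="Name",
    key=lambda col: col.str.len(),
    kind="stable",  # ties keep their original order
)
print(df_sorted_key)
```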


## Demonstrate Fillna

The `fillna()` method replaces missing values. The cell below creates a DataFrame with missing values and fills them in two ways: per column, using a dictionary of column means, and globally, using a constant.


In [None]:
import pandas as pd
import numpy as np

# Create a DataFrame with missing values (note: this rebinds the name `df` used earlier)
data = {
    'Category': ['A','B','A','B','A','C','B','C'],
    'Value1': [10,15,np.nan,25,30,np.nan,40,45],
    'Value2': [100,np.nan,120,130,np.nan,150,160,170],
    'Aux_Value': [5,10,7,12,8,15,10,18]
}
df = pd.DataFrame(data)

print("Original:")
display(df)

# ------------------------------------------------------
# 1. Fill NaN for MULTIPLE columns using a dictionary
# ------------------------------------------------------

df_multiple = df.fillna({
    'Value1': df['Value1'].mean(),
    'Value2': df['Value2'].mean()
})

print("\nFill specific columns using a dictionary:")
display(df_multiple)



# ------------------------------------------------------
# 2. Fill EVERYTHING with a constant (global fill)
# ------------------------------------------------------

df_zero = df.fillna(0)

print("\nGlobal fillna(0):")
display(df_zero)

Original:


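|   | Category | Value1 | Value2 | Aux_Value |
|---|---|---|---|---|
| 0 | A | 10.0 | 100.0 | 5 |
| 1 | B | 15.0 | NaN | 10 |
| 2 | A | NaN | 120.0 | 7 |
| 3 | B | 25.0 | 130.0 | 12 |
| 4 | A | 30.0 | NaN | 8 |
| 5 | C | NaN | 150.0 | 15 |
| 6 | B | 40.0 | 160.0 | 10 |
| 7 | C | 45.0 | 170.0 | 18 |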



Fill specific columns using a dictionary:


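|   | Category | Value1 | Value2 | Aux_Value |
|---|---|---|---|---|
| 0 | A | 10.0 | 100.0 | 5 |
| 1 | B | 15.0 | 138.333333 | 10 |
| 2 | A | 27.5 | 120.0 | 7 |
| 3 | B | 25.0 | 130.0 | 12 |
| 4 | A | 30.0 | 138.333333 | 8 |
| 5 | C | 27.5 | 150.0 | 15 |
| 6 | B | 40.0 | 160.0 | 10 |
| 7 | C | 45.0 | 170.0 | 18 |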



Global fillna(0):


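|   | Category | Value1 | Value2 | Aux_Value |
|---|---|---|---|---|
| 0 | A | 10.0 | 100.0 | 5 |
| 1 | B | 15.0 | 0.0 | 10 |
| 2 | A | 0.0 | 120.0 | 7 |
| 3 | B | 25.0 | 130.0 | 12 |
| 4 | A | 30.0 | 0.0 | 8 |
| 5 | C | 0.0 | 150.0 | 15 |
| 6 | B | 40.0 | 160.0 | 10 |
| 7 | C | 45.0 | 170.0 | 18 |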
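
A lambda can also drive group-aware filling. The sketch below rebuilds the same data under the hypothetical name `df_na` and fills `Value1` with the mean of its `Category` group via `groupby().transform()`. It then fills `Value2` from another column, where the rule "10 times `Aux_Value`" is an assumption chosen only for illustration.

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A", "C", "B", "C"],
    "Value1": [10, 15, np.nan, 25, 30, np.nan, 40, 45],
    "Value2": [100, np.nan, 120, 130, np.nan, 150, 160, 170],
    "Aux_Value": [5, 10, 7, 12, 8, 15, 10, 18],
})

# Fill each missing Value1 with the mean of its own Category group.
# transform() returns a Series aligned with the original index.
value1_filled = df_na.groupby("Category")["Value1"].transform(
    lambda s: s.fillna(s.mean())
)

# Hypothetical rule: where Value2 is missing, use 10 * Aux_Value
value2_filled = df_na["Value2"].fillna(df_na["Aux_Value"] * 10)

df_filled = df_na.assign(Value1=value1_filled, Value2=value2_filled)
print(df_filled)
```

If every value in a group is missing, the group mean is itself NaN and those entries stay unfilled, so a follow-up global `fillna()` may still be needed.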
