# 6. Filtering Groups

Group filtering keeps or drops entire groups based on a condition computed for each group, such as its mean or its size. It is useful for narrowing a dataset to the groups relevant to an analysis.

## Using `.filter()` to Retain Only Specific Groups Based on a Condition
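`.filter()` calls a function on each group's sub-DataFrame. The function must return a single `True` or `False`, and the result contains all original rows of the groups that returned `True`, not one aggregated row per group.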



In [None]:
import pandas as pd
# Load the COVID-19 dataset
data_path = '../DataSets/Data_COVID19_Indonesia.csv'
covid_data = pd.read_csv(data_path)

# Retain groups with mean Total Cases above a threshold
filtered_groups = covid_data.groupby('Location').filter(lambda x: x['Total Cases'].mean() > 10000)
print('Filtered Groups with Mean Total Cases > 10000:')
print(filtered_groups.head())
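To make the behavior of `.filter()` easy to check by hand, here is a minimal sketch on a small made-up DataFrame. The `toy` data and its values are assumptions for illustration only and are not taken from the COVID-19 file.

```python
import pandas as pd

# Hypothetical toy data, not the COVID-19 file
toy = pd.DataFrame({
    'Location': ['A', 'A', 'B', 'B', 'C'],
    'Total Cases': [20000, 30000, 100, 300, 15000],
})

# Group means: A = 25000, B = 200, C = 15000
kept = toy.groupby('Location').filter(lambda g: g['Total Cases'].mean() > 10000)

# The result holds the original rows of groups A and C, in their original order
print(kept)
```

Group B is dropped as a whole because its mean is below the threshold, even though the check is made once per group rather than once per row.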

## Drop Groups with Fewer than a Specific Number of Entries
You can also filter groups by size with `.filter()`. The example below keeps only locations with at least 5 rows, which drops every group with fewer than 5 entries.



In [None]:
# Drop groups with fewer than 5 entries
large_groups = covid_data.groupby('Location').filter(lambda x: len(x) >= 5)
print('Groups with 5 or More Entries:')
print(large_groups.head())
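A lambda passed to `.filter()` runs once per group in Python, which can be slow when there are many groups. One possible alternative, sketched here on made-up data, is to broadcast each group's size back to its rows with `transform('size')` and use the result as a boolean mask. The `toy` frame and the threshold of 2 are assumptions chosen to keep the example small.

```python
import pandas as pd

# Hypothetical toy data: group A has 3 rows, B has 1, C has 2
toy = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'C', 'C'],
    'New Cases': [1, 2, 3, 4, 5, 6],
})

# Broadcast each group's row count back onto its rows
group_sizes = toy.groupby('Location')['New Cases'].transform('size')

# Keep rows that belong to groups with at least 2 entries
large_mask = toy[group_sizes >= 2]

# Same selection written with .filter(), for comparison
large_filter = toy.groupby('Location').filter(lambda g: len(g) >= 2)

print(large_mask)
```

Both approaches select the same rows here; the mask version avoids the per-group Python call.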

# 7. Aggregation on Multi-Index DataFrames

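Grouping by more than one column produces a result with a `MultiIndex` (hierarchical index) that has one level per grouping column. You can then aggregate over, or slice along, each level.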

## Grouping and Aggregating on Hierarchical Indices
The example below sums `New Cases` for each (`Location`, `Date`) pair. Because a single column is selected, the result is a Series with a two-level index.



In [None]:
# Group by Location and Date; the result is a Series with a (Location, Date) MultiIndex
multi_index_group = covid_data.groupby(['Location', 'Date'])['New Cases'].sum()
print('Multi-Index Grouping:')
print(multi_index_group.head())
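Once a result carries a `MultiIndex`, you can aggregate again over one of its levels with `groupby(level=...)`. The following is a minimal sketch on made-up data; the `toy` frame, its dates, and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data with two rows for (A, 2021-08-01)
toy = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['2021-08-01', '2021-08-01', '2021-08-02', '2021-08-01', '2021-08-02'],
    'New Cases': [10, 5, 7, 3, 4],
})

# First aggregation: one value per (Location, Date) pair
daily = toy.groupby(['Location', 'Date'])['New Cases'].sum()

# Second aggregation over a single index level
per_location = daily.groupby(level='Location').sum()   # total per location
per_date_max = daily.groupby(level='Date').max()       # largest daily total per date

print(daily)
print(per_location)
print(per_date_max)
```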

## Cross-Sections of Grouped Data Using `IndexSlice`
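`pd.IndexSlice` provides a readable syntax for slicing along individual levels of a `MultiIndex` inside `.loc`, for example selecting all locations on one date or a range of dates for one location. Range slices require the index to be sorted, which is the case for the output of `groupby` by default.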



In [None]:
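# Access the value for one Location and Date.
# The keys must match the values stored in the CSV exactly; adjust them
# if your file uses different location names or date formats.
idx = pd.IndexSlice
specific_group = multi_index_group.loc[idx['Jakarta', '2021-08-01']]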
print('Specific Group Data:')
print(specific_group)
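`IndexSlice` is most useful when you slice across a level rather than look up a single key. The sketch below uses made-up data to show a cross-section over all locations for one date, a date range within one location, and the equivalent `.xs()` call. The `toy` frame and its keys are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['2021-08-01', '2021-08-01', '2021-08-02', '2021-08-01', '2021-08-02'],
    'New Cases': [10, 5, 7, 3, 4],
})

# groupby sorts the keys, so the resulting MultiIndex supports range slicing
daily = toy.groupby(['Location', 'Date'])['New Cases'].sum()

idx = pd.IndexSlice

# All locations on one date
one_date = daily.loc[idx[:, '2021-08-01']]

# One location over an inclusive date range
one_location_range = daily.loc[idx['A', '2021-08-01':'2021-08-02']]

# .xs() gives the same cross-section for a single level value
one_date_xs = daily.xs('2021-08-01', level='Date')

print(one_date)
print(one_location_range)
print(one_date_xs)
```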

# 8. Practical: Aggregating Large Datasets

This section applies the techniques above to two typical reporting tasks. The datasets are small, hand-built samples so the output is easy to verify; the same code applies unchanged to larger data.

## Example Task 1: Total Revenue and Average Profit Margin by Region and Product Category


In [None]:
# Example dataset for sales data
sales_data = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'North', 'East'],
    'Product Category': ['A', 'B', 'A', 'C', 'B'],
    'Revenue': [2000, 3000, 1500, 4000, 2500],
    'Profit Margin': [0.2, 0.25, 0.15, 0.3, 0.18]
})

# Group by Region and Product Category
revenue_profit = sales_data.groupby(['Region', 'Product Category']).agg({
    'Revenue': 'sum',
    'Profit Margin': 'mean'
})
print('Total Revenue and Average Profit Margin:')
print(revenue_profit)
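A plain mean of `Profit Margin` treats every row equally, regardless of how much revenue it represents. If a revenue-weighted margin is what you need, one possible approach is sketched below using named aggregation. The `sales` frame, the derived `Profit` column, and the output column names are assumptions introduced for this illustration.

```python
import pandas as pd

# Hypothetical toy data with two rows in the (North, A) group
sales = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Product Category': ['A', 'A', 'B'],
    'Revenue': [1000, 3000, 2000],
    'Profit Margin': [0.10, 0.30, 0.25],
})

# Profit per row, so it can be summed within each group
sales['Profit'] = sales['Revenue'] * sales['Profit Margin']

# Named aggregation gives flat, descriptive output column names
summary = (
    sales.groupby(['Region', 'Product Category'])
    .agg(
        total_revenue=('Revenue', 'sum'),
        total_profit=('Profit', 'sum'),
        mean_margin=('Profit Margin', 'mean'),
    )
    .reset_index()
)

# Revenue-weighted margin: total profit divided by total revenue
summary['weighted_margin'] = summary['total_profit'] / summary['total_revenue']
print(summary)
```

For the (North, A) group the simple mean of the margins is 0.20, while the revenue-weighted margin is 0.25, because the larger sale has the higher margin.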

## Example Task 2: Percentage Growth in Revenue Year-Over-Year by Region


In [None]:
# Example dataset for revenue growth

revenue_data = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Year': [2020, 2021, 2020, 2021],
    'Revenue': [5000, 6000, 4000, 4500]
})

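# Sort so each region's rows are in year order, then compute the percentage
# change from the previous year within each region (first year is NaN)
revenue_data = revenue_data.sort_values(['Region', 'Year'])
revenue_data['Growth'] = revenue_data.groupby('Region')['Revenue'].pct_change() * 100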
print('Revenue Growth Year-Over-Year:')
print(revenue_data)
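`pct_change()` compares each row with the row before it inside the same group, so the row order matters. The sketch below starts from deliberately unsorted made-up data, sorts it, and checks the result against the explicit formula `(current - previous) / previous * 100`. The `rev` frame and the extra column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of year order
rev = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Year': [2021, 2021, 2020, 2020],
    'Revenue': [6000, 4500, 5000, 4000],
})

# Put each region's rows in chronological order first
rev = rev.sort_values(['Region', 'Year']).reset_index(drop=True)

# Explicit formula using the previous year's revenue within each region
previous = rev.groupby('Region')['Revenue'].shift(1)
rev['Growth'] = (rev['Revenue'] - previous) / previous * 100

# Same calculation with pct_change, for comparison
rev['Growth_pct_change'] = rev.groupby('Region')['Revenue'].pct_change() * 100

print(rev)
```

The first year of each region has no previous value, so its growth is `NaN` in both columns.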

## Conclusion

This part covered filtering whole groups with `.filter()`, aggregating and slicing results that carry a `MultiIndex`, and two worked aggregation tasks. Together, these techniques let you summarize grouped data at several levels of detail.