# 8. Aggregation and Grouping

This section explores how to group data and perform aggregations using Pandas. Grouping and aggregation are essential techniques for summarizing and analyzing large datasets effectively.

## 1. Introduction to Grouping and Aggregation

**Definition:**
- **Grouping**: Splitting data into subsets based on unique values in one or more columns.
- **Aggregation**: Calculating summary statistics (e.g., mean, sum, count) for each group.

**Importance:**
- Summarizes data efficiently.
- Enables insights into patterns and relationships.
- Supports data transformations and analysis.

**Overview of `groupby()` in Pandas:**
- **Split-Apply-Combine Paradigm:**
  1. **Split**: Partition data into groups based on unique values.
  2. **Apply**: Perform operations on each group.
  3. **Combine**: Assemble the per-group results into a single Series or DataFrame.
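
The three steps can be sketched by hand and compared with a single `groupby()` call. This is a minimal illustration on a small, made-up DataFrame (the values are hypothetical, not taken from the COVID-19 dataset used below). It iterates over the `(key, sub-DataFrame)` pairs that a GroupBy object yields, applies `sum()` to each, and combines the results into a Series.

```python
import pandas as pd

# Hypothetical toy data (illustrative values only)
df = pd.DataFrame({
    "Location": ["A", "B", "A", "B", "C"],
    "New Cases": [10, 5, 20, 15, 7],
})

# 1. Split: iterating a GroupBy object yields (key, sub-DataFrame) pairs
# 2. Apply: sum "New Cases" within each sub-DataFrame
partial_results = {key: group["New Cases"].sum() for key, group in df.groupby("Location")}

# 3. Combine: assemble the per-group results into one Series
manual = pd.Series(partial_results, name="New Cases")
manual.index.name = "Location"

# groupby() performs all three steps in a single expression
builtin = df.groupby("Location")["New Cases"].sum()

print(manual)
print(builtin)
print(manual.equals(builtin))  # True
```

In practice the one-line `groupby()` form is preferred; the manual loop only makes the individual steps visible.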



## 2. Grouping Data with `groupby()`

The `groupby()` method in Pandas splits a DataFrame into groups that can then be aggregated into summary results.

### Grouping by a Single Column
Passing a single column name to `groupby()` creates one group for each unique value in that column.


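```python
import pandas as pd

# Load the COVID-19 dataset
data_path = '../DataSets/Data_COVID19_Indonesia.csv'
covid_data = pd.read_csv(data_path)
print('Dataset Preview:')
print(covid_data.head())

# Group by Location and get the total cases for each location.
# Assuming 'Total Cases' is a running (cumulative) daily count, the largest
# value per location is its total; summing it across dates would overcount.
grouped = covid_data.groupby('Location')
total_cases = grouped['Total Cases'].max()
print(total_cases)
```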
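
Calling `groupby()` on its own returns a `DataFrameGroupBy` object rather than a finished result, and that object can be inspected before any aggregation is applied. The sketch below uses a small hypothetical DataFrame standing in for `covid_data` and shows `ngroups`, `size()`, and `get_group()`.

```python
import pandas as pd

# Hypothetical toy data standing in for covid_data
df = pd.DataFrame({
    "Location": ["A", "B", "A", "C", "A"],
    "New Cases": [10, 5, 20, 7, 3],
})

grouped = df.groupby("Location")

# groupby() returns a GroupBy object, not an aggregated result
print(type(grouped).__name__)   # DataFrameGroupBy
print(grouped.ngroups)          # 3 distinct locations
print(grouped.size())           # rows per location: A 3, B 1, C 1
print(grouped.get_group("A"))   # the sub-DataFrame for location "A"
```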

### Grouping by Multiple Columns
Passing a list of columns creates hierarchical groups, one for each unique combination of values. The result is indexed by a `MultiIndex`.


```python
# Group by Location and Date
grouped_multi = covid_data.groupby(['Location', 'Date'])
cases_summary = grouped_multi['New Cases'].sum()
print(cases_summary)
```
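
A result grouped by two columns is a Series with a two-level `MultiIndex`, which can be reshaped for easier reading. This sketch uses hypothetical toy data with the same column names as the dataset: `unstack()` pivots the inner level (`Date`) into columns, and `reset_index()` turns the index levels back into ordinary columns.

```python
import pandas as pd

# Hypothetical toy data with two grouping keys
df = pd.DataFrame({
    "Location": ["A", "A", "B", "B", "A"],
    "Date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02", "2021-01-01"],
    "New Cases": [10, 20, 5, 15, 1],
})

summary = df.groupby(["Location", "Date"])["New Cases"].sum()

# The result is indexed by a two-level MultiIndex (Location, Date)
print(summary.index.nlevels)             # 2
print(summary.loc[("A", "2021-01-01")])  # 11 (10 + 1)

# unstack() moves the inner level (Date) into columns: one row per Location
wide = summary.unstack()
print(wide)

# reset_index() flattens the MultiIndex back into regular columns
flat = summary.reset_index()
print(flat.columns.tolist())  # ['Location', 'Date', 'New Cases']
```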

## 3. Basic Aggregations

Pandas provides a variety of built-in aggregation methods to calculate summary statistics.

### Common Aggregation Methods
- **`mean()`**: Calculate the average value for each group.
- **`sum()`**: Calculate the total value for each group.
- **`count()`**: Count the number of non-missing values in each group.
- **`min()`**: Find the minimum value in each group.
- **`max()`**: Find the maximum value in each group.
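
The difference between `count()` and the number of rows matters when data is missing. The sketch below uses hypothetical toy data with one `NaN`: `count()` skips it while `size()` counts every row, and `agg()` with a list of names applies all five methods above in one call (`mean`, `sum`, `min`, and `max` also skip `NaN` by default).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one "New Cases" value is missing (NaN)
df = pd.DataFrame({
    "Location": ["A", "A", "A", "B", "B"],
    "New Cases": [10, np.nan, 20, 5, 15],
})

grouped = df.groupby("Location")["New Cases"]

# count() skips NaN, while size() counts every row in the group
print(grouped.count())  # A 2, B 2
print(grouped.size())   # A 3, B 2

# agg() applies several built-in aggregations at once; one column per function
stats = grouped.agg(["mean", "sum", "count", "min", "max"])
print(stats)
```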

### Applying a Single Aggregation Function to a Grouped Object


```python
# Calculate average new cases by location
avg_new_cases = covid_data.groupby('Location')['New Cases'].mean()
print('Average New Cases by Location:')
print(avg_new_cases)
```
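
By default the group keys become the index of the result, so aggregating one selected column returns a Series. If a flat DataFrame is more convenient (for example, for merging or exporting), `groupby()` accepts `as_index=False`. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data standing in for covid_data
df = pd.DataFrame({
    "Location": ["A", "B", "A", "B"],
    "New Cases": [10, 4, 20, 6],
})

# Default: group keys become the index and a Series is returned
as_series = df.groupby("Location")["New Cases"].mean()

# as_index=False keeps "Location" as a regular column and returns a DataFrame
as_frame = df.groupby("Location", as_index=False)["New Cases"].mean()

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame)
```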

### Examples
#### Average Total Cases by Continent
Computing the mean for each group makes it easy to compare typical values across groups.


```python
# Example: Calculate average total cases by continent
avg_total_cases = covid_data.groupby('Continent')['Total Cases'].mean()
print('Average Total Cases by Continent:')
print(avg_total_cases)
```
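
Because an aggregated result is an ordinary Series, groups can be ranked directly. The sketch below uses made-up continent values (not drawn from the dataset) to show `sort_values(ascending=False)` for a full ranking and `nlargest(n)` for only the top groups.

```python
import pandas as pd

# Hypothetical toy data: several rows per continent (illustrative values only)
df = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "Total Cases": [100, 300, 50, 150, 80],
})

avg_by_continent = df.groupby("Continent")["Total Cases"].mean()

# Rank groups from highest to lowest average
ranked = avg_by_continent.sort_values(ascending=False)
print(ranked)

# nlargest(n) keeps only the top n groups
print(avg_by_continent.nlargest(2))
```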

#### Maximum New Cases per Location
Applying `max()` after grouping returns the largest value in each group.


```python
# Example: Find the maximum new cases by location
max_new_cases = covid_data.groupby('Location')['New Cases'].max()
print('Maximum New Cases by Location:')
print(max_new_cases)
```
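
`max()` reports the peak value but not where it occurred. To recover the full row for each peak (for example, the date), `idxmax()` returns the row label of each group's maximum, which can then be passed to `.loc`. A sketch on hypothetical toy data with the dataset's column names:

```python
import pandas as pd

# Hypothetical toy data standing in for covid_data
df = pd.DataFrame({
    "Location": ["A", "A", "B", "B"],
    "Date": ["2021-01-01", "2021-01-02", "2021-01-01", "2021-01-02"],
    "New Cases": [10, 25, 40, 5],
})

# idxmax() returns, for each group, the row label where the maximum occurs
peak_rows = df.groupby("Location")["New Cases"].idxmax()

# Use those labels to select the full rows, including the Date of each peak
peaks = df.loc[peak_rows]
print(peaks)
```

If several rows tie for the maximum, `idxmax()` returns the first matching label.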

## Conclusion

The `groupby()` method in Pandas is a powerful tool for aggregating and analyzing data. By splitting data into groups and applying functions to each group, you can uncover patterns and gain meaningful insights from your dataset.