In [1]:
import pandas as pd

In pandas, the `groupby()` method groups rows by one or more columns so that aggregate operations (such as sum, count, or mean) can be applied to each group. This is similar to SQL’s `GROUP BY` clause.

**Basic Syntax in Pandas**

```
df.groupby('column_name').agg(aggregate_function)
```

Where:

- `df` is the DataFrame.
- `column_name` is the column(s) you want to group by.
- `aggregate_function` is the aggregation to apply to each group, given as a name such as `'sum'`, `'mean'`, or `'count'`, or as a callable.
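
As a rough illustration of the syntax above, the sketch below passes an aggregation name to `.agg()` and compares the result with the equivalent shortcut method. The `toy` DataFrame and its `key` and `value` columns are made up for this illustration and do not come from the examples that follow.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({'key': ['a', 'b', 'a'], 'value': [1, 2, 3]})

# Pass the aggregation by name to .agg() ...
by_name = toy.groupby('key').agg('sum')

# ... or call the shortcut method directly on the grouped object
by_method = toy.groupby('key').sum()

# Both approaches should produce the same DataFrame
print(by_name.equals(by_method))
```

The shortcut form is shorter for a single aggregation, while `.agg()` is convenient when different columns need different aggregations.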

**Example 1: Grouping by a single column and applying aggregation**

Let's say we have a DataFrame with sales data:

In [6]:
# Sample DataFrame
data = {
    'product_id': [1, 2, 1, 3, 2, 1],
    'amount': [100, 200, 150, 300, 250, 200],
    'sale_date': ['2024-01-01', '2024-01-02', '2024-01-02', '2024-01-03', '2024-01-03', '2024-01-04']
}

df = pd.DataFrame(data)

print(df)

   product_id  amount   sale_date
0           1     100  2024-01-01
1           2     200  2024-01-02
2           1     150  2024-01-02
3           3     300  2024-01-03
4           2     250  2024-01-03
5           1     200  2024-01-04


Now, if we want to calculate the total sales (`sum`) for each `product_id`, we can use:

In [9]:
group = df.groupby('product_id').agg({'amount': 'sum'})
print(group)

            amount
product_id        
1              450
2              450
3              300


**Explanation:**
- `df.groupby('product_id')`: groups the rows by the `product_id` column.
- `.agg({'amount': 'sum'})`: calculates the sum of the `amount` column for each `product_id`.
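
If more than one statistic per group is needed, one possible variation is pandas' named aggregation, sketched below on the same sales data. The output column names `total_amount` and `num_sales` are chosen here for illustration, and `as_index=False` is used so that `product_id` stays an ordinary column instead of becoming the index.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [1, 2, 1, 3, 2, 1],
    'amount': [100, 200, 150, 300, 250, 200],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby('product_id', as_index=False).agg(
    total_amount=('amount', 'sum'),   # total sales per product
    num_sales=('amount', 'count'),    # number of sales per product
)
print(summary)
```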

**Example 2: Grouping by multiple columns**

You can group by multiple columns to get more detailed aggregations. For example, to calculate the total sales per `product_id` per `sale_date`:

In [11]:
group = df.groupby(['product_id', 'sale_date']).agg({'amount': 'sum'})
print(group)

                       amount
product_id sale_date         
1          2024-01-01     100
           2024-01-02     150
           2024-01-04     200
2          2024-01-02     200
           2024-01-03     250
3          2024-01-03     300
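
Grouping by two columns produces a result indexed by a two-level `MultiIndex`, which is why `product_id` appears only once per block in the output above. As a minimal sketch, assuming the same sample data, the code below looks up a single group by its index tuple and uses `reset_index()` to turn the index levels back into regular columns. The variable name `flat` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [1, 2, 1, 3, 2, 1],
    'amount': [100, 200, 150, 300, 250, 200],
    'sale_date': ['2024-01-01', '2024-01-02', '2024-01-02',
                  '2024-01-03', '2024-01-03', '2024-01-04'],
})

group = df.groupby(['product_id', 'sale_date']).agg({'amount': 'sum'})

# Look up one (product_id, sale_date) pair in the MultiIndexed result
print(group.loc[(1, '2024-01-02'), 'amount'])

# Move both index levels back into ordinary columns
flat = group.reset_index()
print(flat)
```

A flat table like this is often easier to merge with other DataFrames or export to a file.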
