# pandas.DataFrame.groupby
`pandas.DataFrame.groupby` is one of the most widely used methods in pandas. It groups rows by one or more keys so that an aggregation, transformation, or filter can then be applied to each group.

## 1. Basic Concept
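`DataFrame.groupby` is similar to the SQL `GROUP BY` clause. It splits the DataFrame into groups based on some criteria, lets you perform an operation on each group, and then combines the results into a single DataFrame or Series. Calling `groupby` on its own returns a `DataFrameGroupBy` object; the per-group work is deferred until an operation such as `sum()` is applied to it.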
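
To make the SQL comparison concrete, here is a minimal sketch. The `orders` table and its `region` and `sales` columns are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data standing in for an "orders" table
orders = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'sales': [100, 200, 150, 50],
})

# Roughly: SELECT region, SUM(sales) AS sales FROM orders GROUP BY region
totals = orders.groupby('region', as_index=False)['sales'].sum()

print(totals)
```

Passing `as_index=False` keeps `region` as an ordinary column, which is closer to the shape of a SQL result set.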

## 2. Syntax

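```python
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=NoDefault.no_default, observed=False, dropna=True)
```

This is the signature as documented for pandas 1.x. The `squeeze` parameter was deprecated in pandas 1.1 and removed in pandas 2.0, so it is not covered below.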

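- `by`: What to group by: a column label, a list of labels, a mapping, a function, or an array of keys.
- `axis`: The axis to split along (default is 0, which means grouping rows). Deprecated since pandas 2.1.
- `level`: For a MultiIndex, the level or levels (by name or position) to group by.
- `as_index`: If `True` (the default), the group labels become the index of the aggregated result. If `False`, they are kept as regular columns.
- `sort`: Whether to sort the group keys (default is `True`). With `False`, groups appear in order of first occurrence.
- `group_keys`: When calling `apply`, add group keys to the index to identify each piece (default is `True`).
- `observed`: For categorical groupers, whether to show only the observed category values (default is `False` in the signature above; since pandas 2.1 relying on that default is deprecated).
- `dropna`: If `True`, don’t include groups whose keys are `NaN` (default is `True`).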
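
The effect of `as_index`, `sort`, and `dropna` is easiest to see side by side. The following is a small sketch on made-up data, not an exhaustive tour of the parameters; `dropna` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['B', 'A', np.nan, 'A'],
    'Values': [1, 2, 3, 4],
})

# Defaults: keys are sorted, used as the index, and NaN keys are dropped
default_result = df.groupby('Category')['Values'].sum()

# as_index=False keeps 'Category' as a regular column
flat_result = df.groupby('Category', as_index=False)['Values'].sum()

# sort=False keeps groups in order of first appearance ('B' before 'A')
unsorted_result = df.groupby('Category', sort=False)['Values'].sum()

# dropna=False keeps the NaN key as its own group
with_nan = df.groupby('Category', dropna=False)['Values'].sum()

print(default_result, flat_result, unsorted_result, with_nan, sep='\n\n')
```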


## 3. Steps in Using groupby
1. **Splitting**: The data is split into groups based on the values of the specified key(s).
2. **Applying**: An operation is applied to each group independently. These operations can include:
   - Aggregation: e.g., sum, mean, count, etc.
   - Transformation: e.g., standardizing data within groups.
   - Filtration: e.g., filtering groups based on a condition.
3. **Combining**: The results are combined back into a DataFrame or Series.
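
The three steps above can be spelled out by hand. This sketch is only meant to show what `groupby` does conceptually; in practice the built-in aggregation on the last line is the idiomatic form.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25],
})

# 1. Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs
# 2. Apply: compute the sum of 'Values' for each piece
partial_sums = {}
for key, group in df.groupby('Category'):
    partial_sums[key] = group['Values'].sum()

# 3. Combine: assemble the per-group results into one Series
manual = pd.Series(partial_sums, name='Values')

# The built-in aggregation performs the same three steps in one call
builtin = df.groupby('Category')['Values'].sum()

print(manual)
print(builtin)
```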

## 4. Common Aggregations
Here are some common aggregation functions you can use after grouping:

- `sum()`: Sum of values in each group.
- `mean()`: Mean of values in each group.
- `size()`: Number of rows in each group, including rows with missing values.
- `count()`: Number of non-NA/null observations in each group.
- `min()`/`max()`: Minimum/Maximum value in each group.
- `first()`/`last()`: First/last non-null value in each group.
- `std()`/`var()`: Standard deviation/variance of the values in each group (sample statistics, `ddof=1` by default).
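
The difference between `size()` and `count()` only shows up when values are missing. A short sketch on made-up data with deliberate `NaN` entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Values': [np.nan, 2.0, 3.0, np.nan],
})
grouped = df.groupby('Category')['Values']

sizes = grouped.size()    # rows per group, missing values included
counts = grouped.count()  # non-missing values per group
firsts = grouped.first()  # first non-missing value per group

print(pd.DataFrame({'size': sizes, 'count': counts, 'first': firsts}))
```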

## 5. Examples

### Example 1: Basic Grouping and Aggregation

```python
import pandas as pd

# Sample DataFrame
data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25, 10, 30]
}
df = pd.DataFrame(data)

# Group by 'Category' and sum the 'Values'
grouped = df.groupby('Category').sum()

print(grouped)
```

**Explanation:** The DataFrame is grouped by the `Category` column, and `sum()` is applied to the remaining `Values` column. The result is a new DataFrame indexed by `Category`, with the `Values` summed per category (35 for `A`, 75 for `B`).

### Example 2: Grouping by Multiple Columns

```python
data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sub-Category': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Values': [1, 2, 3, 4, 5, 6]
}
df = pd.DataFrame(data)

# Group by 'Category' and 'Sub-Category', then calculate the mean of 'Values'
grouped = df.groupby(['Category', 'Sub-Category']).mean()

print(grouped)
```

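**Explanation:** The DataFrame is grouped by both `Category` and `Sub-Category`, and `mean()` is applied to the `Values` column, giving the mean for each combination of the two keys. Because every combination occurs exactly once in this data, each mean equals the original value. The result is indexed by a two-level MultiIndex.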
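
Grouping by several columns produces a MultiIndex, which is not always the most convenient shape. The sketch below reuses the data from Example 2 and shows two common ways to reshape the result; which one fits depends on what you do next.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sub-Category': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Values': [1, 2, 3, 4, 5, 6],
})

grouped = df.groupby(['Category', 'Sub-Category'])['Values'].mean()

# Option 1: turn the index levels back into ordinary columns
flat = grouped.reset_index()

# Option 2: pivot the inner level into columns (one row per Category)
wide = grouped.unstack('Sub-Category')

print(flat)
print(wide)
```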

### Example 3: Grouping and Applying Multiple Aggregations

```python
data = {
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25, 10, 30]
}
df = pd.DataFrame(data)

# Group by 'Category' and apply multiple aggregation functions
grouped = df.groupby('Category')['Values'].agg(['sum', 'mean', 'count'])

print(grouped)
```

**Explanation:** The DataFrame is grouped by the `Category` column, and several aggregation functions (`sum`, `mean`, and `count`) are applied to the `Values` column. The result is a new DataFrame with one row per category and one column per aggregation.
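
When the default column names (`sum`, `mean`, `count`) are not descriptive enough, named aggregation is one alternative. In this sketch the output names `total`, `average`, and `n_rows` are arbitrary choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25, 10, 30],
})

# Each keyword is an output column: name=(input column, aggregation)
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    n_rows=('Values', 'count'),
)

print(summary)
```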

### Example 4: Grouping and Filtering

```python
data = {
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Values': [10, 20, 10, 30, 50, 60]
}
df = pd.DataFrame(data)

# Group by 'Category' and filter groups where the sum of 'Values' > 50
filtered = df.groupby('Category').filter(lambda x: x['Values'].sum() > 50)

print(filtered)
```

**Explanation:** The DataFrame is grouped by the `Category` column. `filter()` keeps the original rows of only those groups where the sum of `Values` is greater than 50; it does not aggregate. Here the sums are 30 for `A`, 40 for `B`, and 110 for `C`, so only the rows of category `C` remain.
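
The same selection can also be written as a boolean mask built with `transform`. This is offered as an alternative sketch, not a replacement for `filter()`; it avoids calling a Python lambda once per group, which may matter on data with many groups.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Values': [10, 20, 10, 30, 50, 60],
})

# Broadcast each group's sum back onto its rows, then compare
group_sums = df.groupby('Category')['Values'].transform('sum')
mask = group_sums > 50

# Keep only the rows whose group passes the condition
filtered = df[mask]

print(filtered)
```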

## 6. Advanced Usage

### Example 5: Grouping and Transforming Data

```python
data = {
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25, 10, 30]
}
df = pd.DataFrame(data)

# Group by 'Category' and subtract the mean of 'Values' within each group
df['Adjusted Values'] = df.groupby('Category')['Values'].transform(lambda x: x - x.mean())

print(df)
```

**Explanation:** The DataFrame is grouped by `Category`, and the mean of `Values` within each group is subtracted from each value in that group. This is done with `transform()`, which returns a Series with the same index and length as the original column, so it can be assigned straight back as a new column.
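
Standardizing values within groups, mentioned earlier as a typical transformation, follows the same pattern. The sketch below computes a per-group z-score; the column name `Z-Score` is an arbitrary choice, and the sample standard deviation is used because that is what `std` computes by default.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Values': [10, 20, 15, 25, 10, 30],
})

values_by_group = df.groupby('Category')['Values']

# Passing the function name as a string uses pandas' built-in implementation
group_mean = values_by_group.transform('mean')
group_std = values_by_group.transform('std')  # sample standard deviation (ddof=1)

# Both results are aligned with df's index, so plain arithmetic works
df['Z-Score'] = (df['Values'] - group_mean) / group_std

print(df)
```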

## 7. Summary
- `groupby()` allows you to split your data into groups based on some criteria, perform operations on each group, and then combine the results.
- Aggregation functions such as `sum()`, `mean()`, and `count()` are commonly used with `groupby()`.
- You can group by multiple columns, apply multiple aggregation functions, filter groups, or even transform data within groups.
- `groupby()` is essential for data analysis tasks where you need to summarize or manipulate data based on specific groupings.