# Group By Operations in Pandas
This notebook demonstrates how to use the pandas `groupby` method to count, sum, and average values per client in a transactions dataset.

## Importing Required Libraries
We start by importing the pandas library, which is essential for data manipulation and analysis.

In [None]:
import pandas as pd

## Loading the Dataset
We load the transactions dataset from a CSV file and display the first few rows to understand its structure.

In [None]:
transactions = pd.read_csv("../data/transactions.csv")
transactions.head()

## Grouping and Aggregating Data
### Counting All Columns
This operation groups the data by the `idCliente` column and, for each client, counts the non-null values in every other column. Missing values are excluded, so the counts can differ from one column to the next.

In [None]:
# Group by 'idCliente' and count non-null values in all columns
transactions.groupby(by=["idCliente"]).count()
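
Because `count` skips missing values, it is not always the same as the number of rows in a group. The sketch below contrasts it with `size` on a small, made-up DataFrame (the `toy` data is hypothetical and only borrows the column names used in this notebook).

```python
import pandas as pd

# Hypothetical toy data; the real transactions.csv is not reproduced here.
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10.0, None, 5.0],  # one missing value for client "a"
})

# count(): non-null values per column, per group
counts = toy.groupby(by=["idCliente"]).count()

# size(): number of rows per group, missing values included
sizes = toy.groupby(by=["idCliente"]).size()

print(counts)
print(sizes)
```

For client `a`, `count` reports 2 for `idTransacao` but only 1 for `qtdePontos`, while `size` reports 2 rows.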

### Counting a Specific Column
Here, we count the number of transactions (`idTransacao`) for each client (`idCliente`).

In [None]:
# Group by 'idCliente' and count the 'idTransacao' column
transactions.groupby(by=["idCliente"])["idTransacao"].count()

### Returning a DataFrame Instead of a Series
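Selecting a single column with one pair of brackets returns a Series. To get a DataFrame instead, we select with double square brackets, which pass a list of column names. We also set `as_index=False` so that `idCliente` stays a regular column instead of becoming the index.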

In [None]:
# Group by 'idCliente' and count 'idTransacao', returning a DataFrame
transactions.groupby(by=["idCliente"], as_index=False)[["idTransacao"]].count()
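
To make the difference concrete, here is a minimal sketch that runs both forms on a made-up DataFrame and checks the result types. The `toy` data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook.
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
})

# Single brackets: the result is a Series indexed by idCliente
as_series = toy.groupby(by=["idCliente"])["idTransacao"].count()

# Double brackets plus as_index=False: a DataFrame with idCliente as a column
as_frame = toy.groupby(by=["idCliente"], as_index=False)[["idTransacao"]].count()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```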

### Aggregating Multiple Columns
We use the `agg` method with a dictionary that maps each column to a list of aggregation functions. Here we count the transactions (`idTransacao`) and calculate the sum and mean of points (`qtdePontos`) for each client.

In [None]:
# Group by 'idCliente' and apply multiple aggregation functions
summary = (transactions.groupby(by=["idCliente"], as_index=False)
 .agg({"idTransacao": ["count"],
       "qtdePontos": ["sum", "mean"]}))

### Understanding MultiIndex Columns
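Because each column was mapped to a list of functions, the resulting DataFrame has MultiIndex columns: the first level holds the original column name and the second level holds the aggregation function. Let's inspect the column structure.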

In [None]:
# Display the column structure of the summary DataFrame
summary.columns
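
As a rough illustration of what that structure looks like, the sketch below rebuilds the same aggregation on a made-up DataFrame and lists the column labels as tuples. The `toy` data and the `toy_summary` name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook.
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10, 20, 5],
})

toy_summary = (toy.groupby(by=["idCliente"], as_index=False)
               .agg({"idTransacao": ["count"],
                     "qtdePontos": ["sum", "mean"]}))

# Each column label is a tuple: (original column, aggregation function)
column_pairs = toy_summary.columns.tolist()
n_levels = toy_summary.columns.nlevels

print(n_levels)
print(column_pairs)
```

The group key `idCliente` also appears as a tuple, typically paired with an empty second-level label since no function was applied to it.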

### Accessing MultiIndex Columns
To access a specific column in a DataFrame with MultiIndex columns, pass a tuple with one label per level, here the original column name followed by the aggregation function.

In [None]:
# Access the mean of 'qtdePontos' from the summary DataFrame
summary[("qtdePontos", "mean")]
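
A tuple selects exactly one column, while a bare first-level label selects every column beneath it. The sketch below shows both on a made-up DataFrame (`toy` and `toy_summary` are hypothetical names).

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook.
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10, 20, 5],
})

toy_summary = (toy.groupby(by=["idCliente"], as_index=False)
               .agg({"idTransacao": ["count"],
                     "qtdePontos": ["sum", "mean"]}))

# Full tuple: a single Series holding the mean of qtdePontos per client
mean_points = toy_summary[("qtdePontos", "mean")]

# First-level label only: a DataFrame with the 'sum' and 'mean' columns
points_block = toy_summary["qtdePontos"]

print(mean_points)
print(points_block)
```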

### Flattening MultiIndex Columns
To simplify the column structure, we can assign a flat list of descriptive names to `summary.columns`, which replaces the MultiIndex. The list must contain one name per existing column, in the same order.

In [None]:
# Rename columns to remove MultiIndex and make them more descriptive
summary.columns = ["idCliente", "qtdeTransacao", "totalPontos", "avgPontos"]
summary
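
Typing the names by hand works for a few columns. Two alternatives are sketched below on a made-up DataFrame: joining the MultiIndex levels programmatically, and using named aggregation so the columns are flat from the start. The `toy` data and the `flat` and `named` variables are hypothetical; the output names in the second approach simply mirror the ones chosen above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook.
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10, 20, 5],
})

# Option 1: aggregate with a dict, then join the two column levels with "_".
# Empty level labels (as on the group key) are skipped.
flat = (toy.groupby(by=["idCliente"], as_index=False)
        .agg({"idTransacao": ["count"],
              "qtdePontos": ["sum", "mean"]}))
flat.columns = ["_".join(part for part in col if part) for col in flat.columns]

# Option 2: named aggregation, where each keyword is the output column name
# and each value is a (source column, function) pair.
named = (toy.groupby(by=["idCliente"], as_index=False)
         .agg(qtdeTransacao=("idTransacao", "count"),
              totalPontos=("qtdePontos", "sum"),
              avgPontos=("qtdePontos", "mean")))

print(flat.columns.tolist())
print(named.columns.tolist())
```

Named aggregation keeps the column names next to the functions that produce them, so the two cannot drift out of order the way a separately assigned list can.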