# Custom Grouping with Pandas
This notebook demonstrates how to perform custom grouping and aggregation using pandas. It includes examples of defining custom aggregation functions and applying them to grouped data.

## Importing Libraries
We start by importing the necessary libraries: pandas for data manipulation and numpy for numerical operations.

In [None]:
import pandas as pd
import numpy as np

## Loading the Dataset
We load the transactions dataset, which contains information about customer transactions, including transaction IDs, customer IDs, creation dates, and points earned.

In [None]:
transactions = pd.read_csv("../data/transactions.csv")
# Display the first few rows of the dataset to understand its structure
transactions.head()

## Defining a Custom Aggregation Function
We define a custom function `diff_amp` that measures how far a Series' amplitude (max - min) lies from its mean. It returns the square root of the squared difference between the two, which is the same as their absolute difference, so the result is never negative.

In [None]:
def diff_amp(x: pd.Series):
    """
    Calculate the difference amplitude statistic for a pandas Series.
    Parameters:
        x (pd.Series): A pandas Series containing numerical data.
    Returns:
        float: The absolute difference between the amplitude (max - min)
            and the mean of the Series, computed as the square root of
            the squared difference.
    """
    amplitude = x.max() - x.min()  # Calculate the amplitude (max - min)
    average = x.mean()  # Calculate the mean of the Series
    return np.sqrt((amplitude - average) ** 2)  # Return the computed statistic

### Example Usage of `diff_amp`
We test the `diff_amp` function on a small Series of ages to verify its behavior.

In [None]:
ages = pd.Series([34, 23, 67, 89, 12, 3, 27, 87, 15])
# Apply the custom function to the Series
diff_amp(ages)
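
As a sanity check, the same value can be worked out by hand. The sketch below repeats the definition of `diff_amp` so that it runs on its own, then compares the function's result with the plain absolute difference. The helper names `amplitude`, `average`, and `expected` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd


def diff_amp(x: pd.Series) -> float:
    # Same statistic as defined earlier in the notebook
    amplitude = x.max() - x.min()
    average = x.mean()
    return np.sqrt((amplitude - average) ** 2)


ages = pd.Series([34, 23, 67, 89, 12, 3, 27, 87, 15])

amplitude = ages.max() - ages.min()  # 89 - 3 = 86
average = ages.mean()                # 357 / 9, roughly 39.67
expected = abs(amplitude - average)  # roughly 46.33

# sqrt(d ** 2) equals |d|, so both routes should agree
print(diff_amp(ages), expected)
```

Both printed values should match, which confirms that the square root of the squared difference behaves like an absolute value here.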

## Grouping and Aggregating Data
We group the transactions dataset by `idCliente` and apply various aggregation functions, including the custom `diff_amp` function, to summarize the data.

In [None]:
summary = (transactions.groupby(by=["idCliente"], as_index=False)
 .agg({
     "idTransacao": ["count"],  # Count the number of transactions per customer
     "qtdePontos": ["sum", "mean", diff_amp]  # Sum, mean, and custom aggregation for points
 }))
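
Passing a dictionary of lists to `.agg` produces two-level column labels of the form (source column, aggregation name). The sketch below shows this on a small made-up frame. The `toy` data is hypothetical and stands in for the real transactions file, which is not reproduced here.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the transactions dataset
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10, 30, 5],
})

toy_summary = toy.groupby(by=["idCliente"], as_index=False).agg({
    "idTransacao": ["count"],
    "qtdePontos": ["sum", "mean"],
})

# Each label is a tuple such as ("qtdePontos", "sum")
column_labels = toy_summary.columns.tolist()
print(column_labels)
```

Because the labels are tuples rather than plain strings, flattening them into descriptive names makes the result easier to work with.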

### Renaming Columns for Clarity
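The aggregation returns two-level column labels (source column, aggregation name). We replace them with a flat list of descriptive names, given in the same order as the aggregated columns.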

In [None]:
summary.columns = [
    "idCliente",  # Customer ID
    "qtdeTransacao",  # Number of transactions
    "totalPontos",  # Total points earned
    "mediaPontos",  # Average points per transaction
    "ampMeanDiff"  # Custom difference amplitude statistic
]

# Display the summarized DataFrame
summary
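
As an alternative, pandas' named aggregation can assign the final column names directly inside `.agg`, which avoids the separate renaming step. This is a minimal sketch on hypothetical toy data, not the notebook's original approach. The `toy` frame and the `named` variable are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def diff_amp(x: pd.Series) -> float:
    # Same statistic as defined earlier in the notebook
    amplitude = x.max() - x.min()
    average = x.mean()
    return np.sqrt((amplitude - average) ** 2)


# Hypothetical toy data using the same column names as the transactions dataset
toy = pd.DataFrame({
    "idCliente": ["a", "a", "b"],
    "idTransacao": [1, 2, 3],
    "qtdePontos": [10, 30, 5],
})

# Each keyword is an output column name mapped to (source column, aggregation)
named = toy.groupby("idCliente", as_index=False).agg(
    qtdeTransacao=("idTransacao", "count"),
    totalPontos=("qtdePontos", "sum"),
    mediaPontos=("qtdePontos", "mean"),
    ampMeanDiff=("qtdePontos", diff_amp),
)
print(named)
```

With this form, the output names sit next to the aggregations they describe, so reordering or adding an aggregation cannot silently mislabel a column.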