#### Pandas Rolling Window Functions and GroupBy - Part 87

This notebook covers rolling window functions such as `kurt()`, `apply()`, and `aggregate()`, custom window indexers, and the basics of GroupBy objects and their methods.

In [None]:
import pandas as pd
import numpy as np
import scipy.stats

##### Rolling Window Functions

### Rolling.kurt() - Rolling Kurtosis

The `kurt()` method calculates the rolling kurtosis using Fisher's definition (excess kurtosis, so a normal distribution scores 0) with bias correction. Each window needs at least four observations.

In [None]:
# Create a sample series
arr = [1, 2, 3, 4, 999]
s = pd.Series(arr)
print("Series:")
print(s)

In [None]:
# Calculate rolling kurtosis with window size 4
rolling_kurt = s.rolling(4).kurt()
print("\nRolling kurtosis with window size 4:")
print(rolling_kurt)

In [None]:
# Compare with scipy.stats.kurtosis
print(f"\nSciPy kurtosis for first 4 values: {scipy.stats.kurtosis(arr[:-1], bias=False):.6f}")
print(f"SciPy kurtosis for last 4 values: {scipy.stats.kurtosis(arr[1:], bias=False):.6f}")
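
To make the "Fisher's definition without bias" wording concrete, here is a minimal sketch that computes the standard bias-corrected excess kurtosis by hand and compares it with the first full rolling window. The helper name `sample_excess_kurtosis` is hypothetical, and the formula is the textbook estimator, which is assumed (not confirmed by this notebook) to be the one pandas applies.

```python
import numpy as np
import pandas as pd

def sample_excess_kurtosis(values):
    """Bias-corrected excess (Fisher) kurtosis of a 1-D sample with n >= 4."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    var = (dev ** 2).sum() / (n - 1)             # unbiased sample variance
    scaled_fourth = (dev ** 4).sum() / var ** 2  # fourth-power sum over s^4
    correction = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    offset = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return correction * scaled_fourth - offset

s = pd.Series([1, 2, 3, 4, 999])
manual_first = sample_excess_kurtosis(s.iloc[:4])  # window [1, 2, 3, 4]
rolling_first = s.rolling(4).kurt().iloc[3]        # first complete window
print(manual_first, rolling_first)
```

Both values should agree up to floating-point error for the window `[1, 2, 3, 4]`.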

### Rolling.apply() - Custom Rolling Window Function

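The `apply()` method applies a custom function to each window of data. The function must reduce the window to a single value. By default (`raw=False`) each window is passed as a Series; with `raw=True` it is passed as a NumPy array.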

In [None]:
# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1]
})
print("DataFrame:")
print(df)

In [None]:
# Define a custom function to calculate the range (max - min) of each window
def window_range(x):
    return x.max() - x.min()

# Apply the custom function to each window
rolling_range = df.rolling(window=3).apply(window_range)
print("\nRolling range with window size 3:")
print(rolling_range)

In [None]:
# Using raw=True for better performance with NumPy functions
def numpy_range(x):
    return np.max(x) - np.min(x)

rolling_range_raw = df.rolling(window=3).apply(numpy_range, raw=True)
print("\nRolling range with window size 3 using raw=True:")
print(rolling_range_raw)
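
The difference between the two modes is easiest to see by recording what the callback receives. The sketch below uses two hypothetical helper functions and a small illustrative Series; it only inspects the argument type and does not measure performance.

```python
import pandas as pd

seen_types = {}

def mean_as_series(window):
    # With raw=False the window arrives as a pandas Series
    seen_types["raw=False"] = type(window).__name__
    return window.mean()

def mean_as_array(window):
    # With raw=True the window arrives as a NumPy ndarray
    seen_types["raw=True"] = type(window).__name__
    return window.mean()

s = pd.Series([1.0, 2.0, 3.0, 4.0])
from_series = s.rolling(2).apply(mean_as_series, raw=False)
from_array = s.rolling(2).apply(mean_as_array, raw=True)

print(seen_types)
print(from_series.equals(from_array))
```

The results are identical; only the object handed to the function changes, which is why `raw=True` suits functions that work on plain arrays.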

### Rolling.aggregate() - Multiple Aggregations

The `aggregate()` (or `agg()`) method allows you to apply multiple aggregation functions to each window.

In [None]:
# Apply multiple aggregation functions
rolling_agg = df.rolling(window=3).agg(['mean', 'std', 'min', 'max'])
print("Rolling aggregation with window size 3:")
print(rolling_agg)

In [None]:
# Apply different functions to different columns
rolling_agg_dict = df.rolling(window=3).agg({
    'A': ['mean', 'max'],
    'B': ['min', 'std']
})
print("\nRolling aggregation with different functions per column:")
print(rolling_agg_dict)
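
Passing a list of functions produces two-level column labels, which can be awkward to read at first. As a small sketch (the data mirrors the DataFrame above, and the variable names are illustrative), a single result can be selected with a tuple, and one function across all columns with `xs()`:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [5, 4, 3, 2, 1]})
rolling_agg = df.rolling(window=3).agg(["mean", "max"])

# Columns are (original column, function name) pairs
print(rolling_agg.columns.tolist())

# Select one result with a tuple key
a_mean = rolling_agg[("A", "mean")]

# Select the same function for every column by slicing the inner level
all_means = rolling_agg.xs("mean", axis=1, level=1)
print(all_means)
```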

##### Custom Window Indexer

Pandas provides the `BaseIndexer` class for defining custom window boundaries.

In [None]:
from pandas.api.indexers import BaseIndexer

# Define a custom indexer whose windows all start at row 0 (an expanding window)
class VariableWindowIndexer(BaseIndexer):
    # pandas 1.5+ checks that this signature matches BaseIndexer's, which
    # includes `step`; remove `step` when running on older versions.
    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        # Window i covers rows [0, i + 1), so the sizes are 1, 2, 3, 4, 5, etc.
        start = np.zeros(num_values, dtype=np.int64)
        end = np.arange(1, num_values + 1, dtype=np.int64)
        return start, end

In [None]:
# Create a sample series
s = pd.Series([1, 2, 3, 4, 5])
print("Series:")
print(s)

In [None]:
# Apply the custom indexer
indexer = VariableWindowIndexer()
variable_window_mean = s.rolling(window=indexer).mean()
print("\nVariable window mean:")
print(variable_window_mean)
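
Besides custom subclasses, pandas ships a few ready-made indexers. The sketch below uses `FixedForwardWindowIndexer` (available in `pandas.api.indexers` since pandas 1.1) to build forward-looking windows; the window size of 2 and `min_periods=1` are illustrative choices.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

s = pd.Series([1, 2, 3, 4, 5])

# Each window covers the current row and the next one
forward = FixedForwardWindowIndexer(window_size=2)

# min_periods=1 keeps the last row, whose window is cut short by the end of the data
forward_sum = s.rolling(window=forward, min_periods=1).sum()
print(forward_sum)
```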

##### GroupBy Objects

GroupBy objects are returned by `groupby()` calls and provide methods for aggregating and transforming data by groups.

In [None]:
# Create a sample DataFrame
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two'],
    'C': [1, 2, 3, 4, 5, 6],
    'D': [10, 20, 30, 40, 50, 60]
})
print("DataFrame:")
print(df)

In [None]:
# Group by column 'A'
grouped = df.groupby('A')
print("\nGrouped by column 'A':")
print(grouped)

### GroupBy.__iter__() - Iterating Over Groups

Iterating over a GroupBy object (via `__iter__()`) yields `(name, group)` pairs, where `group` is the subset of rows belonging to that group.

In [None]:
# Iterate over groups
print("Iterating over groups:")
for name, group in grouped:
    print(f"\nGroup name: {name}")
    print(group)
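
Because iteration yields plain `(name, group)` pairs, it combines naturally with comprehensions. A minimal sketch, using a trimmed copy of the DataFrame above, counts rows per group by hand and compares the result with the built-in `size()`:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 6],
})
grouped = df.groupby("A")

# Each iteration step gives the group key and that group's sub-DataFrame
rows_per_group = {name: len(group) for name, group in grouped}
print(rows_per_group)

# The same counts, computed by pandas directly
print(grouped.size().to_dict())
```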

### GroupBy.groups - Group Labels

The `groups` property returns a dictionary mapping each group name to the index labels of the rows in that group.

In [None]:
# Get group labels
print("Group labels:")
print(grouped.groups)

### GroupBy.indices - Group Indices

The `indices` property returns a dictionary mapping each group name to the integer positions of the rows in that group.

In [None]:
# Get group indices
print("Group indices:")
print(grouped.indices)
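
With the default integer index, `groups` and `indices` look almost the same. The difference shows up once the index holds other labels, as in this small sketch (the column names and string index are illustrative):

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {"key": ["x", "y", "x"], "val": [1, 2, 3]},
    index=["a", "b", "c"],
)
g = df_labeled.groupby("key")

# groups: group name -> index labels
print(g.groups)

# indices: group name -> integer row positions
print(g.indices)
```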

### GroupBy.get_group() - Get a Specific Group

The `get_group()` method allows you to retrieve a specific group by its name.

In [None]:
# Get a specific group
foo_group = grouped.get_group('foo')
print("Group 'foo':")
print(foo_group)
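
For a single grouping column, `get_group()` returns the same rows as a boolean filter on that column. The sketch below checks this on a trimmed copy of the DataFrame above; it is meant as an illustration of what the method selects, not as a statement about how pandas implements it.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4, 5, 6],
})

via_get_group = df.groupby("A").get_group("foo")
via_mask = df[df["A"] == "foo"]

print(via_get_group)
print(via_get_group.equals(via_mask))
```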

### GroupBy with Multiple Columns

In [None]:
# Group by multiple columns
multi_grouped = df.groupby(['A', 'B'])
print("Grouped by columns 'A' and 'B':")
print(multi_grouped.groups)

In [None]:
# Get a specific group from multi-level groupby
foo_one_group = multi_grouped.get_group(('foo', 'one'))
print("\nGroup ('foo', 'one'):")
print(foo_one_group)
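
When grouping by several columns, each group key is a tuple and aggregated results carry a MultiIndex. A short sketch on the same `A`, `B`, and `C` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "B": ["one", "one", "two", "three", "two", "two"],
    "C": [1, 2, 3, 4, 5, 6],
})
multi_grouped = df.groupby(["A", "B"])

# Number of distinct (A, B) combinations present in the data
print(multi_grouped.ngroups)

# The aggregated Series is indexed by (A, B) tuples
c_sums = multi_grouped["C"].sum()
print(c_sums)
print(c_sums.loc[("foo", "two")])
```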

### Using Grouper for Time-Based Grouping

In [None]:
# Create a DataFrame with date index (seeded so the random values are reproducible)
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=10)
df_dates = pd.DataFrame({
    'A': np.random.randn(10),
    'B': np.random.randn(10)
}, index=dates)
print("DataFrame with date index:")
print(df_dates)

In [None]:
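# Group by month using Grouper. All 10 dates fall in January, so this yields one group.
# Note: pandas 2.2+ deprecates the 'M' alias in favor of 'ME' (month end).
monthly_grouped = df_dates.groupby(pd.Grouper(freq='M'))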
print("\nMonthly groups:")
for name, group in monthly_grouped:
    print(f"\nMonth: {name}")
    print(group)

In [None]:
# Calculate monthly statistics
monthly_stats = monthly_grouped.agg(['mean', 'std', 'min', 'max'])
print("\nMonthly statistics:")
print(monthly_stats)
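
Since ten consecutive days fit inside one month, a shorter frequency makes the binning easier to see. The sketch below uses a 5-day frequency and deterministic values instead of random ones; both are illustrative choices, not part of the original example.

```python
import pandas as pd

dates = pd.date_range("2023-01-01", periods=10)
df_values = pd.DataFrame({"value": range(1, 11)}, index=dates)

# Bins of five calendar days, starting at the first day in the data
five_day_sums = df_values.groupby(pd.Grouper(freq="5D"))["value"].sum()
print(five_day_sums)
```

Each bin is labeled by its start date, so the result has one row for 1 January and one for 6 January.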

### Practical Example: Sales Data Analysis

In [None]:
# Create a sample sales DataFrame
sales_data = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=20),
    'product': ['A', 'B'] * 10,
    'region': ['East', 'East', 'West', 'West'] * 5,
    'sales': np.random.randint(100, 1000, 20),
    'quantity': np.random.randint(1, 10, 20)
})
print("Sales data:")
print(sales_data.head())

In [None]:
# Group by product and region
product_region_grouped = sales_data.groupby(['product', 'region'])

# Calculate total sales and average quantity by product and region
sales_summary = product_region_grouped.agg({
    'sales': 'sum',
    'quantity': 'mean'
})
print("\nSales summary by product and region:")
print(sales_summary)
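
The dictionary form keeps the original column names, so the output does not say which statistic each column holds. Named aggregation is one alternative; the sketch below uses a small deterministic dataset, and the output names `total_sales` and `avg_quantity` are illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    "product": ["A", "A", "B", "B", "A"],
    "region": ["East", "East", "East", "West", "West"],
    "sales": [100, 150, 200, 400, 300],
    "quantity": [1, 3, 2, 4, 5],
})

# Each keyword is an output column: (source column, aggregation function)
summary = sales.groupby(["product", "region"]).agg(
    total_sales=("sales", "sum"),
    avg_quantity=("quantity", "mean"),
)
print(summary)
```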

In [None]:
# Group by date (weekly) and product
weekly_product_grouped = sales_data.groupby([pd.Grouper(key='date', freq='W'), 'product'])

# Calculate weekly sales by product
weekly_sales = weekly_product_grouped['sales'].sum().unstack()
print("\nWeekly sales by product:")
print(weekly_sales)
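
Two details of the weekly grouping are worth checking on a tiny dataset: `freq='W'` labels each bin by the Sunday that ends the week, and `unstack()` leaves `NaN` wherever a product had no sales in a week. The sketch below uses four hand-picked dates and `fill_value=0` to replace those gaps; the data is illustrative.

```python
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-09"]),
    "product": ["A", "B", "A", "A"],
    "sales": [10, 20, 30, 40],
})

weekly = (
    sales.groupby([pd.Grouper(key="date", freq="W"), "product"])["sales"]
    .sum()
    .unstack(fill_value=0)  # weeks without sales for a product become 0
)
print(weekly)
```

Here 1 January 2023 is itself a Sunday, so it closes its own week, while 2 and 3 January fall into the week ending 8 January.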

In [None]:
# Visualize weekly sales by product
import matplotlib.pyplot as plt

weekly_sales.plot(kind='bar', figsize=(12, 6))
plt.title('Weekly Sales by Product')
plt.xlabel('Week')
plt.ylabel('Sales')
plt.tight_layout()
plt.show()