#### Part 28: Advanced Rolling Operations and GroupBy Basics

In this notebook, we'll explore:
- Advanced rolling window operations with custom functions
- Using Numba for performance optimization
- Weighted rolling windows
- Introduction to GroupBy operations

##### Setup
First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set the plotting style
plt.style.use('ggplot')

# Make plots appear in the notebook
%matplotlib inline

##### 1. Rolling Apply with Custom Functions

The rolling `apply()` method runs a custom function over each window; the function must reduce the window to a single value. Let's create a sample time series first:

In [None]:
# Create a Series for rolling window examples
s = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
s = s.cumsum()
s.head()

Now, let's define a custom function to compute the mean absolute deviation on a rolling basis:

In [None]:
def mad(x):
    return np.fabs(x - x.mean()).mean()

In [None]:
# Apply the custom function to a rolling window (raw=True passes each window as a NumPy array)
s.rolling(window=60).apply(mad, raw=True).plot(style='k', figsize=(10, 6), 
                                             title='Rolling Mean Absolute Deviation')
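
To make the computation concrete, the sketch below applies the same `mad` function to a four-point series with a window of 3, so each result can be checked by hand. The input values and the name `small` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def mad(x):
    # Mean absolute deviation of one window
    return np.fabs(x - x.mean()).mean()

small = pd.Series([1.0, 2.0, 4.0, 7.0])

# raw=True hands each window to mad() as a NumPy array
rolled = small.rolling(window=3).apply(mad, raw=True)

# The first two positions are NaN because the window is not yet full
print(rolled)

# Hand check for the window [1, 2, 4]: mean = 7/3, deviations = 4/3, 1/3, 5/3
expected_first = (4/3 + 1/3 + 5/3) / 3
print(np.isclose(rolled.iloc[2], expected_first))
```

With `raw=False`, each window would arrive as a `Series` instead; `mad` works either way, but the array form avoids the per-window overhead of building a `Series`.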

##### 2. Using Numba for Performance Optimization

The rolling `apply()` method can run your function through Numba, an optional dependency that must be installed separately. Numba is a JIT (just-in-time) compiler that can significantly speed up numerical Python functions.

To use Numba, pass `engine='numba'` together with `raw=True`; the Numba engine only accepts windows as raw NumPy arrays. Let's see an example:

In [None]:
# Create a large Series for performance comparison
data = pd.Series(range(1_000_000))
roll = data.rolling(10)

In [None]:
# Define a simple function for rolling apply
def f(x):
    return np.sum(x) + 5

In [None]:
# Time the first Numba call
# Note: the first run is slower because it includes the JIT compilation overhead
%timeit -r 1 -n 1 roll.apply(f, engine='numba', raw=True)

In [None]:
# Later Numba runs are faster because the compiled function is cached
%timeit roll.apply(f, engine='numba', raw=True)

In [None]:
# Compare with Cython engine
%timeit roll.apply(f, engine='cython', raw=True)
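
Whichever engine is used, a function this simple can also be written with the built-in rolling aggregation, which avoids `apply()` altogether. The sketch below checks that the two forms agree on a smaller series; the size of 1,000 and the variable names are illustrative choices.

```python
import numpy as np
import pandas as pd

def f(x):
    return np.sum(x) + 5

data = pd.Series(range(1_000))
roll = data.rolling(10)

via_apply = roll.apply(f, raw=True)   # default engine, no Numba required
via_builtin = roll.sum() + 5          # vectorized equivalent

# Both results start with 9 NaNs (incomplete windows) and match elsewhere
print(np.allclose(via_apply, via_builtin, equal_nan=True))
```

When a built-in aggregation exists, it is usually worth timing it alongside the `apply()` variants before reaching for Numba.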

##### 3. Weighted Rolling Windows

Passing `win_type` to `rolling()` creates a weighted window: each observation in the window is multiplied by a weight from the named window function before it is aggregated. The window functions come from `scipy.signal.windows`, so SciPy must be installed.

Let's create a small Series to demonstrate this:

In [None]:
# Create a small Series for weighted rolling window examples
ser = pd.Series(np.random.randn(10), 
               index=pd.date_range('1/1/2000', periods=10))
ser

In [None]:
# Apply a triangular window
ser.rolling(window=5, win_type='triang').mean()
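
For intuition, a weighted rolling mean is the weighted sum of the window divided by the sum of the weights. The sketch below reproduces the last value of a triangular rolling mean by hand, using weights from `scipy.signal.windows.triang`; the five-point input is an illustrative assumption, and SciPy must be installed.

```python
import numpy as np
import pandas as pd
from scipy.signal.windows import triang

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

weights = triang(5)  # [1/3, 2/3, 1, 2/3, 1/3]
manual = np.sum(weights * values.to_numpy()) / np.sum(weights)

rolled = values.rolling(window=5, win_type='triang').mean()

# Only the last position has a full window of five observations
print(rolled.iloc[-1], manual)
```

Because the triangular weights peak in the middle of the window, observations near the center influence the result more than those at the edges.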

Let's compare different window types visually:

In [None]:
# Create a larger Series for visualization
larger_ser = pd.Series(np.random.randn(100).cumsum(), 
                      index=pd.date_range('1/1/2000', periods=100))

# Create a figure with multiple window types
fig, axes = plt.subplots(3, 2, figsize=(14, 10))
window_types = ['boxcar', 'triang', 'blackman', 'hamming', 'bartlett', 'bohman']

# Plot original data on all subplots
for i, ax in enumerate(axes.flatten()):
    larger_ser.plot(ax=ax, alpha=0.5, label='Original', legend=True)
    if i < len(window_types):
        # Apply the window type
        win_type = window_types[i]
        larger_ser.rolling(window=20, win_type=win_type).mean().plot(
            ax=ax, label=f'{win_type} window', legend=True)
        ax.set_title(f'Window Type: {win_type}')

plt.tight_layout()

##### 4. Introduction to GroupBy Operations

GroupBy operations allow you to split your data into groups, apply a function to each group independently, and then combine the results. This is often referred to as the "split-apply-combine" pattern.

Let's start with a simple example:

In [None]:
# Create a simple Series with repeated index labels
s = pd.Series([1, 2, 3, 10, 20, 30], index=[1, 2, 3, 1, 2, 3])
s

In [None]:
# Group by index
grouped = s.groupby(level=0)
grouped

In [None]:
# Get the first value in each group
grouped.first()

In [None]:
# Get the last value in each group
grouped.last()

In [None]:
# Sum the values in each group
grouped.sum()
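
The split-apply-combine steps can also be spelled out by hand. The sketch below iterates over the groups, sums each one, and collects the results, then compares that with the built-in `sum()`; the names `manual` and `builtin` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 10, 20, 30], index=[1, 2, 3, 1, 2, 3])

# Split: iterating a GroupBy yields (key, sub-Series) pairs
# Apply: sum each sub-Series
# Combine: collect the results into a new Series keyed by group
manual = pd.Series({key: group.sum() for key, group in s.groupby(level=0)})

builtin = s.groupby(level=0).sum()

print(manual)
print(builtin)
```

The built-in aggregation is the one to use in practice; the explicit loop is only meant to show what happens at each step.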

###### 4.1 GroupBy Sorting

By default, the group keys are sorted during the groupby operation. Pass `sort=False` to keep the keys in order of first appearance, which can also be faster:

In [None]:
# Create a DataFrame for groupby examples
df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A'], 'Y': [1, 2, 3, 4]})
df2

In [None]:
# Default sorting (alphabetical)
df2.groupby(['X']).sum()

In [None]:
# No sorting (order of appearance)
df2.groupby(['X'], sort=False).sum()

Within each group, GroupBy preserves the order in which the rows appear in the original DataFrame:

In [None]:
# Create another DataFrame for groupby examples
df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})
df3

In [None]:
# Get group 'A' - order is preserved
# (group by the scalar 'X' so get_group accepts a scalar key without a warning)
df3.groupby('X').get_group('A')

In [None]:
# Get group 'B' - order is preserved
df3.groupby('X').get_group('B')
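
Both ordering rules in this section can be checked directly. The sketch below rebuilds the two small frames and extracts the key order and the within-group row order as plain lists; the variable names are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A'], 'Y': [1, 2, 3, 4]})
df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})

# Key order: sorted by default, order of first appearance with sort=False
sorted_keys = df2.groupby('X').sum().index.tolist()
unsorted_keys = df2.groupby('X', sort=False).sum().index.tolist()
print(sorted_keys, unsorted_keys)

# Row order inside a group follows the original frame, not the values
b_values = df3.groupby('X').get_group('B')['Y'].tolist()
print(b_values)
```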

###### 4.2 GroupBy Object Attributes

The `groups` attribute is a dict whose keys are the computed unique groups and whose values are the axis labels belonging to each group.

In [None]:
# Create a DataFrame for more complex groupby examples
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                  'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                  'C': np.random.randn(8),
                  'D': np.random.randn(8)})
df

In [None]:
# Get the groups when grouping by column 'A'
df.groupby('A').groups

In [None]:
# Define a function to identify vowels and consonants
# (case-insensitive, because the column names are uppercase)
def get_letter_type(letter):
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

In [None]:
# Group the columns by a function; transpose first because `axis=1` is deprecated (pandas 2.1+)
df.T.groupby(get_letter_type).groups

In [None]:
# Group by multiple columns
grouped = df.groupby(['A', 'B'])
grouped.groups

In [None]:
# Get the number of groups
len(grouped)
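
Since the notebook shows no output, the sketch below uses only the deterministic columns `A` and `B` from the frame above to show what `groups` and `len()` return. The variable names `groups` and `n_pairs` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
})

groups = df.groupby('A').groups

# Each key maps to the row labels that belong to that group
for key, labels in groups.items():
    print(key, list(labels))

# Grouping by two columns yields one group per distinct (A, B) pair
n_pairs = len(df.groupby(['A', 'B']))
print(n_pairs)
```

Only combinations that actually occur in the data become groups, so `n_pairs` counts observed pairs rather than every possible pairing of `A` and `B` values.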

Let's create a different DataFrame with a DatetimeIndex to demonstrate more GroupBy functionality:

In [None]:
# Create a DataFrame with a DatetimeIndex
df = pd.DataFrame({
    'height': np.random.normal(loc=60, scale=10, size=8),
    'weight': np.random.normal(loc=160, scale=15, size=8),
    'gender': np.random.choice(['male', 'female'], size=8)
}, index=pd.date_range('1/1/2000', periods=8))
df
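
One way to put this frame to use, offered as a sketch rather than as the notebook's own continuation, is to group by the `gender` column and aggregate the numeric columns. The seeded generator `rng` is an assumption added here for reproducibility; the original cell uses the unseeded `np.random` functions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)

df = pd.DataFrame({
    'height': rng.normal(loc=60, scale=10, size=8),
    'weight': rng.normal(loc=160, scale=15, size=8),
    'gender': rng.choice(['male', 'female'], size=8),
}, index=pd.date_range('1/1/2000', periods=8))

by_gender = df.groupby('gender')

# Number of rows per group; the counts always add up to len(df)
sizes = by_gender.size()
print(sizes)

# Per-group mean of each numeric column
means = by_gender[['height', 'weight']].mean()
print(means)
```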

##### Summary

In this notebook, we've explored:

1. Advanced rolling operations with custom functions using `apply()`
2. Performance optimization with Numba for rolling operations
3. Weighted rolling windows with various window types
4. Introduction to GroupBy operations, including:
   - Basic groupby functionality
   - GroupBy sorting options
   - GroupBy object attributes

These techniques provide powerful tools for time series analysis and data aggregation in pandas.