#### Pandas GroupBy and Resampling Methods - Part 89

This notebook covers `DataFrame.diff()`, the fill methods of DataFrameGroupBy objects (`ffill()`, `fillna()`), and the `aggregate()`, `transform()`, and `pipe()` methods of Resampler objects. It closes with a small financial time series example.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

##### DataFrame.diff() Method

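The `diff()` method computes the difference between each element and the element `periods` rows earlier (one row by default). A negative `periods` compares against later rows instead, and rows with no counterpart come back as NaN. It is most often used to compute discrete differences in time series data.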

In [None]:
# Create a sample DataFrame
df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [1, 1, 2, 3, 5, 8],
    'c': [1, 4, 9, 16, 25, 36]
})
print("Original DataFrame:")
print(df)

In [None]:
# Calculate the difference with previous row (default)
print("\nDifference with previous row (default):")
print(df.diff())

In [None]:
# Calculate the difference with 3rd previous row
print("\nDifference with 3rd previous row:")
print(df.diff(periods=3))

In [None]:
# Calculate the difference with following row
print("\nDifference with following row:")
print(df.diff(periods=-1))
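
The relationship between `diff()` and `shift()` can be checked directly. The sketch below rebuilds part of the sample frame under the hypothetical name `df_check` so that it runs on its own; it is an illustration of the definition, not a description of how pandas implements `diff()` internally.

```python
import pandas as pd

# Same values as columns 'a' and 'b' above, rebuilt so the snippet is self-contained
df_check = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [1, 1, 2, 3, 5, 8],
})

# diff(periods=n) subtracts the row n positions earlier: x[i] - x[i - n]
via_diff = df_check.diff(periods=2)
via_shift = df_check - df_check.shift(2)

# Both leave the first two rows as NaN and agree everywhere else
print(via_diff.equals(via_shift))
```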

##### DataFrameGroupBy Methods

When you group data using `groupby()`, you get a DataFrameGroupBy object that provides various methods for manipulating and analyzing grouped data.

In [None]:
# Create a sample DataFrame with groups
df_group = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, np.nan, np.nan, 5, 6],
    'C': [np.nan, np.nan, 3, 4, 5, 6]
})
print("DataFrame with groups:")
print(df_group)

In [None]:
# Group by column 'A'
grouped = df_group.groupby('A')
print("\nGrouped by column 'A':")
for name, group in grouped:
    print(f"\nGroup: {name}")
    print(group)

### DataFrameGroupBy.ffill() - Forward Fill

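The `ffill()` method fills NA/NaN values by propagating the last valid observation forward within each group, so a value is never carried over from one group into another. The result keeps the original row order and omits the grouping column.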

In [None]:
# Forward fill NA values within each group
filled_ffill = grouped.ffill()
print("Forward filled values within groups:")
print(filled_ffill)

In [None]:
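# Forward fill with limit: fill at most 1 consecutive NaN per gap.
# Every gap in this data is one row long, so the output matches the unlimited fill.
filled_ffill_limit = grouped.ffill(limit=1)
print("\nForward filled values with limit=1:")
print(filled_ffill_limit)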
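
To see `limit` change the result, the data needs a gap longer than one row. The following is a minimal sketch on made-up data; `df_gap`, `key`, and `val` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'x' has a run of two consecutive NaNs,
# group 'y' starts with a NaN that has no earlier value to copy
df_gap = pd.DataFrame({
    'key': ['x', 'x', 'x', 'x', 'y', 'y'],
    'val': [1.0, np.nan, np.nan, 4.0, np.nan, 6.0],
})
g = df_gap.groupby('key')

unlimited = g['val'].ffill()        # fills the whole run in 'x'
limited = g['val'].ffill(limit=1)   # fills only the first NaN of the run

# The leading NaN in 'y' stays NaN in both cases: fills never cross groups
print(pd.DataFrame({'unlimited': unlimited, 'limit=1': limited}))
```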

### DataFrameGroupBy.fillna() - Fill NA Values

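The `fillna()` method fills NA/NaN values within each group, either with a given value or with a fill method. `DataFrameGroupBy.fillna()` has been deprecated since pandas 2.1: use `ffill()` or `bfill()` on the grouped object for directional fills, and `DataFrame.fillna()` for a constant value. The cells below use those replacements.

In [None]:
# Fill NA values with a constant. A constant fill does not depend on the groups,
# so call fillna() on the DataFrame itself (the grouping column 'A' is kept).
filled_value = df_group.fillna(value=0)
print("NA values filled with 0:")
print(filled_value)

In [None]:
# Backward fill NA values within each group
filled_bfill = grouped.bfill()
print("\nNA values filled using backfill within groups:")
print(filled_bfill)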
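
A common group-aware fill that neither a constant nor a directional fill covers is replacing each NaN with the mean of its own group. The sketch below shows one way to do this with `transform('mean')` followed by an ordinary `Series.fillna()`; `df_fill` and `B_filled` are hypothetical names, and this is a pattern rather than a dedicated pandas method.

```python
import numpy as np
import pandas as pd

df_fill = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': [1, 2, np.nan, np.nan, 5, 6],
})

# Per-group mean (NaNs skipped), broadcast back to the original row order
group_mean = df_fill.groupby('A')['B'].transform('mean')

# Series.fillna accepts an index-aligned Series of replacement values
df_fill['B_filled'] = df_fill['B'].fillna(group_mean)
print(df_fill)
```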

##### Resampling Methods

Resampling is a time series-specific operation that allows you to change the frequency of your time series data. The `resample()` method returns a Resampler object that provides various methods for aggregating and transforming time series data.

In [None]:
# Create a sample time series
np.random.seed(0)  # For reproducibility
dates = pd.date_range('20230101', periods=10, freq='D')
ts = pd.Series(np.random.randn(10), index=dates)
print("Original time series:")
print(ts)

### Resampler.aggregate() / Resampler.agg() - Aggregate Resampled Data

The `aggregate()` (or `agg()`) method allows you to apply one or more aggregation functions to the resampled data.

In [None]:
# Resample to 3-day frequency and calculate the sum.
# Pass the string 'sum' rather than np.sum, which triggers a FutureWarning in recent pandas.
resampled_sum = ts.resample('3D').agg('sum')
print("\nResampled to 3-day frequency (sum):")
print(resampled_sum)
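
Because the series above is random, it is hard to see how rows map to bins. The sketch below uses a deterministic stand-in (`ts_demo`, a hypothetical name) with the values 0 to 9 on the same ten dates, so the bin contents can be read off the sums and counts.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random series: values 0..9 on ten consecutive days
idx = pd.date_range('2023-01-01', periods=10, freq='D')
ts_demo = pd.Series(np.arange(10), index=idx)

r = ts_demo.resample('3D')
sums = r.sum()
counts = r.count()

# Ten daily points fall into four 3-day bins, each labelled by its first day;
# the last bin holds a single point
print(pd.DataFrame({'sum': sums, 'count': counts}))
```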

In [None]:
# Resample to 3-day frequency and apply multiple aggregation functions
resampled_multi = ts.resample('3D').agg(['sum', 'mean', 'std', 'max'])
print("\nResampled to 3-day frequency (multiple aggregations):")
print(resampled_multi)

In [None]:
# Resample to 3-day frequency and apply custom aggregations
# (for a Series, the dict keys become the output column names)
resampled_custom = ts.resample('3D').agg({
    'result': lambda x: x.mean() / x.std() if x.std() != 0 else np.nan,
    'total': 'sum'
})
print("\nResampled to 3-day frequency (custom aggregations):")
print(resampled_custom)
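
When the resampled object is a DataFrame, a dictionary passed to `agg()` maps column names to the aggregation applied to each column. The sketch below illustrates this on a small made-up frame; `df_demo`, `price`, and `volume` are hypothetical names chosen for the example.

```python
import pandas as pd

# Hypothetical two-column frame on six consecutive days
idx = pd.date_range('2023-01-01', periods=6, freq='D')
df_demo = pd.DataFrame({
    'price': [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    'volume': [100, 200, 300, 400, 500, 600],
}, index=idx)

# Dict keys select columns; each column gets its own aggregation
per_column = df_demo.resample('3D').agg({'price': 'mean', 'volume': 'sum'})
print(per_column)
```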

### Resampler.transform() - Transform Resampled Data

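The `transform()` method applies a function to each resampling bin and returns a Series with the same index as the original data. Every input row keeps its own (transformed) value instead of being collapsed into one row per bin.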

In [None]:
# Resample to 3-day frequency and standardize the values within each group
resampled_transform = ts.resample('3D').transform(lambda x: (x - x.mean()) / x.std() if x.std() != 0 else 0)
print("\nResampled and standardized values:")
print(resampled_transform)
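
The difference in output shape between `transform()` and an aggregation is easiest to see on deterministic data. This sketch subtracts the bin mean from each value; `ts_demo` is a hypothetical stand-in for the random series above.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2023-01-01', periods=10, freq='D')
ts_demo = pd.Series(np.arange(10, dtype=float), index=idx)

# transform: one output value per input row, same index as the input
demeaned = ts_demo.resample('3D').transform(lambda x: x - x.mean())

# aggregation: one output value per bin
bin_means = ts_demo.resample('3D').mean()

print(len(demeaned), len(bin_means))
print(demeaned)
```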

### Resampler.pipe() - Chain Operations

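`Resampler.pipe()` passes the Resampler object itself to a function, which helps when one function needs several aggregations of the same bins. Once an aggregation such as `mean()` has been applied, the result is an ordinary DataFrame, so the `pipe()` calls in the example below are `DataFrame.pipe()` calls chained onto the resampled result.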

In [None]:
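# Define custom functions; assign() returns a new DataFrame instead of mutating the input
def add_mean_column(df):
    # Row-wise mean across the series columns
    return df.assign(mean=df.mean(axis=1))

def add_std_column(df):
    # Row-wise std across the original series only: drop the derived 'mean' column first
    return df.assign(std=df.drop(columns='mean', errors='ignore').std(axis=1))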

# Create a DataFrame with multiple time series
df_ts = pd.DataFrame({
    'A': np.random.randn(10),
    'B': np.random.randn(10),
    'C': np.random.randn(10)
}, index=dates)
print("DataFrame with multiple time series:")
print(df_ts)

In [None]:
# Aggregate each 3-day bin, then chain DataFrame.pipe() calls on the result
result = (df_ts.resample('3D')
          .mean()
          .pipe(add_mean_column)
          .pipe(add_std_column))
print("\nResampled and processed using pipe:")
print(result)
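
For comparison, here is a minimal sketch of `pipe()` called on the Resampler itself, before any aggregation. The function `bin_range` and the frame `df_demo` are hypothetical names used only for this illustration.

```python
import pandas as pd

idx = pd.date_range('2023-01-01', periods=6, freq='D')
df_demo = pd.DataFrame({'A': [1, 4, 2, 8, 5, 9]}, index=idx)

def bin_range(resampler):
    # Receives the Resampler, so it can run several aggregations on the same bins
    return resampler.max() - resampler.min()

# Equivalent to bin_range(df_demo.resample('3D')), written as a chain
ranges = df_demo.resample('3D').pipe(bin_range)
print(ranges)
```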

##### Practical Example: Financial Time Series Analysis

In [None]:
# Create a sample financial time series
dates = pd.date_range('20230101', periods=30, freq='D')
np.random.seed(42)  # For reproducibility
stock_prices = pd.DataFrame({
    'Stock A': 100 + np.cumsum(np.random.normal(0.1, 1, 30)),
    'Stock B': 100 + np.cumsum(np.random.normal(0.05, 1.2, 30)),
    'Stock C': 100 + np.cumsum(np.random.normal(0.2, 0.8, 30))
}, index=dates)
print("Stock prices:")
print(stock_prices.head())

In [None]:
# Calculate daily returns
daily_returns = stock_prices.pct_change().dropna()
print("\nDaily returns:")
print(daily_returns.head())
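
`pct_change()` computes the simple return `p[t] / p[t-1] - 1`. The sketch below checks that definition on three made-up prices; `prices_demo` is a hypothetical name.

```python
import numpy as np
import pandas as pd

prices_demo = pd.Series([100.0, 110.0, 99.0])

returns_demo = prices_demo.pct_change()
manual = prices_demo / prices_demo.shift(1) - 1

# The first entry is NaN in both, since there is no previous price
print(np.allclose(returns_demo, manual, equal_nan=True))
```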

In [None]:
# Resample to weekly frequency and calculate various statistics
weekly_stats = daily_returns.resample('W').agg(['mean', 'std', 'min', 'max'])
print("\nWeekly return statistics:")
print(weekly_stats)
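
Applying a list of aggregations to a multi-column DataFrame produces columns with two levels (original column, statistic). The sketch below shows a few ways to select from such a result, using made-up values rather than real returns; `weekly_demo` and the other variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Two full Monday-to-Sunday weeks of hypothetical values
idx = pd.date_range('2023-01-02', periods=14, freq='D')
values_demo = pd.DataFrame({
    'Stock A': np.arange(1.0, 15.0),
    'Stock B': np.full(14, 2.0),
}, index=idx)

weekly_demo = values_demo.resample('W').agg(['mean', 'max'])

stock_a = weekly_demo['Stock A']                      # all statistics for one stock
all_means = weekly_demo.xs('mean', axis=1, level=1)   # one statistic for all stocks
a_max = weekly_demo.loc[:, ('Stock A', 'max')]        # a single (stock, statistic) column

print(stock_a)
print(all_means)
```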

In [None]:
# Calculate cumulative returns
cumulative_returns = (1 + daily_returns).cumprod() - 1
print("\nCumulative returns:")
print(cumulative_returns.tail())
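
Compounding daily simple returns should reproduce the return measured directly from the first price, `p[t] / p[0] - 1`. The sketch below checks this on a short made-up price series; `prices_demo` is a hypothetical name.

```python
import numpy as np
import pandas as pd

prices_demo = pd.Series([100.0, 110.0, 99.0, 108.9])

daily = prices_demo.pct_change().dropna()
compounded = (1 + daily).cumprod() - 1

# Return relative to the first price, skipping the first row to match 'daily'
direct = (prices_demo / prices_demo.iloc[0] - 1).iloc[1:]

print(np.allclose(compounded, direct))
```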

In [None]:
# Visualize stock prices
# DataFrame.plot() opens its own figure, so set the size there instead of via plt.figure()
ax = stock_prices.plot(figsize=(12, 6))
ax.set_title('Stock Prices')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Visualize cumulative returns
# DataFrame.plot() opens its own figure, so set the size there instead of via plt.figure()
ax = cumulative_returns.plot(figsize=(12, 6))
ax.set_title('Cumulative Returns')
ax.set_xlabel('Date')
ax.set_ylabel('Cumulative Return')
ax.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Calculate rolling statistics
rolling_mean = stock_prices.rolling(window=7).mean()
rolling_std = stock_prices.rolling(window=7).std()

# Visualize rolling statistics for Stock A
plt.figure(figsize=(12, 6))
plt.plot(stock_prices['Stock A'], label='Stock A')
plt.plot(rolling_mean['Stock A'], label='7-day Moving Average')
plt.fill_between(rolling_std.index,
                 rolling_mean['Stock A'] - rolling_std['Stock A'],
                 rolling_mean['Stock A'] + rolling_std['Stock A'],
                 alpha=0.2, label='±1 rolling std')
plt.title('Stock A with 7-day Moving Average and Standard Deviation')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
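
With `window=7`, the first six rolling values are NaN because a full window is not yet available, which is why the moving average in the plot starts a week late. The sketch below shows the effect and how `min_periods` relaxes it; `s_demo` is a hypothetical series.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series(np.arange(1.0, 11.0))  # 1.0, 2.0, ..., 10.0

strict = s_demo.rolling(window=7).mean()                  # needs 7 observations
relaxed = s_demo.rolling(window=7, min_periods=1).mean()  # uses whatever is available

print(pd.DataFrame({'strict': strict, 'relaxed': relaxed}))
```

Whether partial windows are acceptable depends on the analysis: early values computed from only a few points are noisier than the rest of the line.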