# Polars Time-Series Analysis - group_by_dynamic & Advanced Features

Deep dive into Polars' powerful time-series capabilities.

## Topics:
- `group_by_dynamic` - Dynamic time-based grouping
- `group_by_rolling` - Rolling window aggregations
- Time-based resampling (upsampling/downsampling)
- Sliding windows and offset configurations
- Time-based joins and asof joins
- Handling irregular time series
- Business calendar operations
- Real-world financial and IoT time-series examples

In [None]:
import polars as pl
from datetime import datetime, date, timedelta
import numpy as np

# Set display options
pl.Config.set_tbl_rows(20)

## Part 1: Understanding group_by_dynamic

`group_by_dynamic` is one of Polars' most powerful features for time-series analysis. It allows you to:
- Group data by time-based windows
- Handle irregular time series
- Create custom rolling windows
- Resample data to different frequencies

### Basic group_by_dynamic Example

In [None]:
# Create sample data with irregular timestamps
np.random.seed(42)

df = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 0, 5),
        datetime(2024, 1, 1, 0, 17),
        datetime(2024, 1, 1, 0, 28),
        datetime(2024, 1, 1, 0, 45),
        datetime(2024, 1, 1, 1, 3),
        datetime(2024, 1, 1, 1, 22),
        datetime(2024, 1, 1, 1, 47),
        datetime(2024, 1, 1, 2, 15),
    ],
    'value': [10, 15, 8, 12, 20, 18, 14, 22]
})

print("Original irregular data:")
print(df)

In [None]:
# Group by 30-minute windows
result_30min = df.group_by_dynamic(
    'timestamp',
    every='30m'
).agg([
    pl.col('value').sum().alias('sum'),
    pl.col('value').mean().alias('mean'),
    pl.col('value').count().alias('count')
])

print("\nGrouped by 30-minute windows:")
print(result_30min)

### The 'every' Parameter - Time Window Sizes

The `every` parameter supports various time units:
- `ns` - nanoseconds
- `us` - microseconds  
- `ms` - milliseconds
- `s` - seconds
- `m` - minutes
- `h` - hours
- `d` - days
- `w` - weeks
- `mo` - months
- `q` - quarters
- `y` - years

In [None]:
# Different window sizes
print("1-hour windows:")
print(df.group_by_dynamic('timestamp', every='1h').agg(pl.col('value').sum()))

print("\n15-minute windows:")
print(df.group_by_dynamic('timestamp', every='15m').agg(pl.col('value').sum()))

### The 'period' Parameter - Window Duration

`every` sets the interval between window starts; `period` sets how long each window lasts and defaults to `every`.
A `period` longer than `every` produces overlapping windows, while an equal value produces non-overlapping ones.

In [None]:
# Create hourly data for clearer demonstration
hourly_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 12, 0),
        '1h',
        eager=True
    ),
    'value': range(13)
})

print("Original hourly data:")
print(hourly_df)

In [None]:
# every=3h, period=3h (non-overlapping 3-hour windows)
non_overlapping = hourly_df.group_by_dynamic(
    'timestamp',
    every='3h',
    period='3h'
).agg([
    pl.col('value').sum().alias('sum'),
    pl.col('value').count().alias('count')
])

print("\nNon-overlapping 3-hour windows:")
print(non_overlapping)

In [None]:
# every=1h, period=3h (overlapping windows - sliding window)
overlapping = hourly_df.group_by_dynamic(
    'timestamp',
    every='1h',
    period='3h'
).agg([
    pl.col('value').sum().alias('sum'),
    pl.col('value').count().alias('count')
])

print("\nOverlapping 3-hour windows (sliding every 1 hour):")
print(overlapping)

### The 'offset' Parameter - Shifting Window Boundaries

The `offset` parameter shifts when windows start, useful for aligning to specific times.

In [None]:
# Default: windows start at midnight
default_windows = hourly_df.group_by_dynamic(
    'timestamp',
    every='4h'
).agg(pl.col('value').sum())

print("Windows starting at midnight (default):")
print(default_windows)

In [None]:
# Offset by 2 hours: windows start at 02:00, 06:00, 10:00, etc.
offset_windows = hourly_df.group_by_dynamic(
    'timestamp',
    every='4h',
    offset='2h'
).agg(pl.col('value').sum())

print("\nWindows offset by 2 hours:")
print(offset_windows)

### The 'truncate' Parameter - Window Alignment

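Controls how each window is labelled in the output. With `truncate=True` the index column holds the window's lower boundary; with `truncate=False` it holds the first data point in the window. The windows themselves are the same in both cases. Newer Polars releases deprecate `truncate` in favor of the `label` parameter.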

In [None]:
# Data starting at an odd time
odd_start_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 17),  # Starts at 00:17
        datetime(2024, 1, 1, 5, 17),
        '1h',
        eager=True
    ),
    'value': range(6)
})

print("Data starting at 00:17:")
print(odd_start_df)

In [None]:
# truncate=True (default): windows align to clean boundaries
truncated = odd_start_df.group_by_dynamic(
    'timestamp',
    every='2h',
    truncate=True
).agg(pl.col('value').sum())

print("\nWith truncate=True (aligned to hour boundaries):")
print(truncated)

In [None]:
# truncate=False: each window is labelled with its first data point
not_truncated = odd_start_df.group_by_dynamic(
    'timestamp',
    every='2h',
    truncate=False
).agg(pl.col('value').sum())

print("\nWith truncate=False (labelled by first data point):")
print(not_truncated)

### The 'include_boundaries' Parameter

Adds explicit boundary columns showing window start/end times.

In [None]:
with_boundaries = hourly_df.group_by_dynamic(
    'timestamp',
    every='3h',
    include_boundaries=True
).agg([
    pl.col('value').sum().alias('sum')
])

print("With window boundaries:")
print(with_boundaries)

## Part 2: group_by_rolling - Rolling Window Aggregations

`group_by_rolling` creates one rolling (moving) window per row, anchored at that row's timestamp rather than at fixed time boundaries. Newer Polars releases expose the same operation as `DataFrame.rolling`.

In [None]:
# Sample time series data
ts_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 10),
        '1d',
        eager=True
    ),
    'value': [10, 12, 8, 15, 11, 14, 9, 13, 16, 10]
})

print("Daily time series data:")
print(ts_df)

In [None]:
# Rolling 3-day window
rolling_3d = ts_df.group_by_rolling(
    'timestamp',
    period='3d'
).agg([
    pl.col('value').mean().alias('rolling_mean'),
    pl.col('value').sum().alias('rolling_sum'),
    pl.col('value').std().alias('rolling_std')
])

print("\n3-day rolling window statistics:")
print(rolling_3d)

### Closed Parameter - Window Boundary Inclusion

Controls which boundaries are included in the window:
- `'right'` (default): includes right boundary, excludes left
- `'left'`: includes left boundary, excludes right
- `'both'`: includes both boundaries
- `'none'`: excludes both boundaries

In [None]:
# Compare different closed options
simple_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 5),
        '1d',
        eager=True
    ),
    'value': [1, 2, 3, 4, 5]
})

print("Original data:")
print(simple_df)

In [None]:
# 2-day rolling with different closed options
for closed in ['right', 'left', 'both', 'none']:
    result = simple_df.group_by_rolling(
        'timestamp',
        period='2d',
        closed=closed
    ).agg([
        pl.col('value').sum().alias(f'sum_{closed}')
    ])
    print(f"\n2-day rolling sum with closed='{closed}':")
    print(result)

### offset in group_by_rolling

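The `offset` parameter sets where each window starts relative to the current row. It defaults to `-period`, so by default the window ends at the current row; other values shift it backward or forward.

In [None]:
# period='2d' with offset='-1d': each window spans (t - 1d, t + 1d],
# i.e. the current row plus the next day
backward_rolling = simple_df.group_by_rolling(
    'timestamp',
    period='2d',
    offset='-1d'
).agg([
    pl.col('value').sum().alias('sum')
])

print("Rolling window starting 1 day before each row:")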
print(backward_rolling)

## Part 3: Advanced Time-Series Patterns

### Downsampling - Reducing Frequency

In [None]:
# High-frequency data (every minute)
np.random.seed(42)
high_freq = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 2, 0),
        '1m',
        eager=True
    ),
    'temperature': np.random.normal(20, 2, 121),
    'humidity': np.random.normal(60, 5, 121)
})

print(f"High-frequency data: {len(high_freq)} records (1-minute intervals)")
print(high_freq.head(10))

In [None]:
# Downsample to 15-minute intervals
downsampled_15m = high_freq.group_by_dynamic(
    'timestamp',
    every='15m'
).agg([
    pl.col('temperature').mean().alias('temp_mean'),
    pl.col('temperature').min().alias('temp_min'),
    pl.col('temperature').max().alias('temp_max'),
    pl.col('humidity').mean().alias('humidity_mean')
])

print(f"\nDownsampled to 15-minute intervals: {len(downsampled_15m)} records")
print(downsampled_15m)

### Upsampling - Increasing Frequency

For upsampling, we create the desired time range and join with original data.

In [None]:
# Sparse data (hourly)
sparse_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 5, 0),
        '1h',
        eager=True
    ),
    'value': [10, 15, 12, 18, 14, 20]
})

print("Sparse hourly data:")
print(sparse_df)

In [None]:
# Create 15-minute grid
upsampled_grid = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 5, 0),
        '15m',
        eager=True
    )
})

# Forward fill: each grid point takes the last known value at or before it,
# which corresponds to the 'backward' asof strategy
upsampled = upsampled_grid.join_asof(
    sparse_df,
    on='timestamp',
    strategy='backward'
)

print("\nUpsampled to 15-minute intervals (forward fill):")
print(upsampled)

In [None]:
# Linear interpolation for upsampling
upsampled_interp = upsampled_grid.join(
    sparse_df,
    on='timestamp',
    how='left'
).with_columns([
    pl.col('value').interpolate().alias('value_interpolated')
])

print("\nUpsampled with linear interpolation:")
print(upsampled_interp)

## Part 4: Multi-Column and Grouped Dynamic Aggregations

### group_by_dynamic with Multiple Groups

In [None]:
# Multi-sensor data
np.random.seed(42)
sensors_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 6, 0),
        '30m',
        eager=True
    ).repeat_by(3).explode(),
    'sensor_id': ['sensor_A', 'sensor_B', 'sensor_C'] * 13,
    'reading': np.random.uniform(10, 30, 39)
}).sort(['sensor_id', 'timestamp'])

print("Multi-sensor readings:")
print(sensors_df.head(15))

In [None]:
# Group by sensor AND time window
sensor_hourly = sensors_df.group_by_dynamic(
    'timestamp',
    every='2h',
    by='sensor_id'  # Additional grouping column
).agg([
    pl.col('reading').mean().alias('avg_reading'),
    pl.col('reading').std().alias('std_reading'),
    pl.col('reading').count().alias('num_readings')
]).sort(['sensor_id', 'timestamp'])

print("\n2-hour aggregations per sensor:")
print(sensor_hourly)

## Part 5: ASOF Joins - Time-Series Joins

ASOF (as-of) joins are crucial for time-series data where exact timestamp matches are rare.

In [None]:
# Stock prices (sampled every few minutes)
stock_prices = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 5),
        datetime(2024, 1, 1, 9, 12),
        datetime(2024, 1, 1, 9, 18),
        datetime(2024, 1, 1, 9, 25),
    ],
    'price': [100.0, 101.5, 99.8, 102.3, 103.1]
})

# Trade events (irregular timing)
trades = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 9, 3),
        datetime(2024, 1, 1, 9, 8),
        datetime(2024, 1, 1, 9, 15),
        datetime(2024, 1, 1, 9, 23),
    ],
    'quantity': [100, 200, 150, 300]
})

print("Stock prices:")
print(stock_prices)
print("\nTrades:")
print(trades)

In [None]:
# Join trades with most recent price (backward strategy)
trades_with_price = trades.join_asof(
    stock_prices,
    on='timestamp',
    strategy='backward'  # Use most recent price before trade
)

print("\nTrades matched with most recent price (backward):")
print(trades_with_price)

In [None]:
# Forward strategy - use next available price
trades_forward = trades.join_asof(
    stock_prices,
    on='timestamp',
    strategy='forward'
)

print("\nTrades matched with next available price (forward):")
print(trades_forward)

In [None]:
# Nearest strategy - use closest price in time
trades_nearest = trades.join_asof(
    stock_prices,
    on='timestamp',
    strategy='nearest'
)

print("\nTrades matched with nearest price:")
print(trades_nearest)

### ASOF Join with Tolerance

In [None]:
# Only match if price is within 3 minutes of trade
trades_with_tolerance = trades.join_asof(
    stock_prices,
    on='timestamp',
    strategy='backward',
    tolerance='3m'
)

print("\nTrades with price (max 3-minute tolerance):")
print(trades_with_tolerance)

### ASOF Join with Multiple Keys

In [None]:
# Multiple stocks
multi_stock_prices = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 10),
        datetime(2024, 1, 1, 9, 10),
        datetime(2024, 1, 1, 9, 20),
        datetime(2024, 1, 1, 9, 20),
    ],
    'symbol': ['AAPL', 'GOOGL', 'AAPL', 'GOOGL', 'AAPL', 'GOOGL'],
    'price': [150.0, 2800.0, 151.5, 2795.0, 149.8, 2810.0]
})

multi_trades = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 9, 5),
        datetime(2024, 1, 1, 9, 12),
        datetime(2024, 1, 1, 9, 15),
    ],
    'symbol': ['AAPL', 'GOOGL', 'AAPL'],
    'quantity': [100, 50, 200]
})

# ASOF join on both timestamp AND symbol
matched_trades = multi_trades.join_asof(
    multi_stock_prices,
    on='timestamp',
    by='symbol',
    strategy='backward'
)

print("\nTrades matched by symbol and time:")
print(matched_trades)

## Part 6: Real-World Example 1 - Financial Time Series

In [None]:
# Simulate tick data (high-frequency stock prices)
np.random.seed(42)

# Generate irregular timestamps (realistic trading scenario)
base_time = datetime(2024, 1, 1, 9, 30)  # Market open
num_ticks = 10_000  # ~83 minutes at a 500 ms mean interval, enough bars for the 20-period indicators below

# Random millisecond intervals between ticks
intervals_ms = np.random.exponential(500, num_ticks).astype(int)
cumulative_ms = np.cumsum(intervals_ms)

tick_data = pl.DataFrame({
    'timestamp': [base_time + timedelta(milliseconds=int(ms)) for ms in cumulative_ms],
    'price': 100 + np.cumsum(np.random.normal(0, 0.1, num_ticks)),
    'volume': np.random.randint(100, 1000, num_ticks)
})

print(f"Tick data: {len(tick_data)} records")
print(tick_data.head(10))

### Create OHLC (Open-High-Low-Close) Bars

In [None]:
# 1-minute OHLC bars
ohlc_1m = tick_data.group_by_dynamic(
    'timestamp',
    every='1m'
).agg([
    pl.col('price').first().alias('open'),
    pl.col('price').max().alias('high'),
    pl.col('price').min().alias('low'),
    pl.col('price').last().alias('close'),
    pl.col('volume').sum().alias('volume'),
    pl.col('price').count().alias('num_ticks')
])

print("\n1-minute OHLC bars:")
print(ohlc_1m.head(10))

### Calculate Technical Indicators

In [None]:
# Calculate common technical indicators using rolling windows
technical_indicators = ohlc_1m.with_columns([
    # Simple Moving Average (SMA)
    pl.col('close').rolling_mean(window_size=5).alias('sma_5'),
    pl.col('close').rolling_mean(window_size=20).alias('sma_20'),
    
    # Exponential Moving Average (EMA) - using built-in ewm_mean
    pl.col('close').ewm_mean(span=5).alias('ema_5'),
    pl.col('close').ewm_mean(span=20).alias('ema_20'),
    
    # Bollinger Bands (20-period)
    pl.col('close').rolling_std(window_size=20).alias('std_20'),
]).with_columns([
    # Calculate Bollinger Bands
    (pl.col('sma_20') + 2 * pl.col('std_20')).alias('bb_upper'),
    (pl.col('sma_20') - 2 * pl.col('std_20')).alias('bb_lower'),
    
    # Calculate returns
    (pl.col('close') / pl.col('close').shift(1) - 1).alias('return_1m'),
]).with_columns([
    # Volatility (rolling standard deviation of returns)
    pl.col('return_1m').rolling_std(window_size=20).alias('volatility_20')
])

print("\nTechnical indicators:")
print(technical_indicators.select([
    'timestamp', 'close', 'sma_5', 'sma_20', 'ema_5', 'ema_20', 
    'bb_upper', 'bb_lower', 'return_1m', 'volatility_20'
]).tail(10))

### Volume-Weighted Average Price (VWAP)

In [None]:
# Calculate VWAP for each 5-minute period
vwap_5m = tick_data.group_by_dynamic(
    'timestamp',
    every='5m'
).agg([
    (pl.col('price') * pl.col('volume')).sum().alias('price_volume'),
    pl.col('volume').sum().alias('total_volume')
]).with_columns([
    (pl.col('price_volume') / pl.col('total_volume')).alias('vwap')
]).select(['timestamp', 'vwap', 'total_volume'])

print("\n5-minute VWAP:")
print(vwap_5m.head(10))

## Part 7: Real-World Example 2 - IoT Sensor Data

In [None]:
# Simulate IoT sensor data with missing readings and irregular intervals
np.random.seed(42)

# Generate 1 week of data with some gaps
all_timestamps = pl.datetime_range(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 7, 23, 59),
    '5m',
    eager=True
)

# Randomly drop 20% of readings (simulate sensor failures)
keep_indices = np.random.choice(len(all_timestamps), int(len(all_timestamps) * 0.8), replace=False)
keep_indices.sort()

iot_data = pl.DataFrame({
    'timestamp': [all_timestamps[i] for i in keep_indices],
    'temperature': np.random.normal(20, 3, len(keep_indices)),
    'humidity': np.random.normal(60, 10, len(keep_indices)),
    'pressure': np.random.normal(1013, 5, len(keep_indices))
})

print(f"IoT sensor data: {len(iot_data)} readings (20% missing)")
print(iot_data.head(10))

### Handle Missing Data with Resampling

In [None]:
# Create complete 5-minute grid
complete_grid = pl.DataFrame({
    'timestamp': all_timestamps
})

# Join and interpolate missing values
complete_iot = complete_grid.join(
    iot_data,
    on='timestamp',
    how='left'
).with_columns([
    pl.col('temperature').interpolate().alias('temperature'),
    pl.col('humidity').interpolate().alias('humidity'),
    pl.col('pressure').interpolate().alias('pressure')
])

print(f"\nComplete data with interpolation: {len(complete_iot)} readings")
print(complete_iot.head(10))

### Detect Anomalies Using Rolling Statistics

In [None]:
# Add some anomalies
anomaly_iot = complete_iot.clone()

# Insert temperature spikes at random positions
spike_indices = np.random.choice(len(anomaly_iot), 10, replace=False)
for idx in spike_indices:
    row = int(idx)  # plain Python int for DataFrame indexing
    anomaly_iot[row, 'temperature'] = float(anomaly_iot[row, 'temperature'] + np.random.choice([15, -15]))

# Calculate rolling mean and std
with_stats = anomaly_iot.with_columns([
    pl.col('temperature').rolling_mean(window_size=12).alias('temp_mean_1h'),  # 12 * 5min = 1 hour
    pl.col('temperature').rolling_std(window_size=12).alias('temp_std_1h')
]).with_columns([
    # Flag anomalies (> 3 standard deviations from mean)
    ((pl.col('temperature') - pl.col('temp_mean_1h')).abs() > 3 * pl.col('temp_std_1h')).alias('is_anomaly')
])

# Show anomalies
anomalies = with_stats.filter(pl.col('is_anomaly'))
print(f"\nDetected {len(anomalies)} anomalies:")
print(anomalies.select(['timestamp', 'temperature', 'temp_mean_1h', 'temp_std_1h']))

### Multi-Resolution Analysis

In [None]:
# Analyze at different time scales
resolutions = [
    ('15m', '15 minutes'),
    ('1h', '1 hour'),
    ('6h', '6 hours'),
    ('1d', '1 day')
]

for interval, label in resolutions:
    aggregated = complete_iot.group_by_dynamic(
        'timestamp',
        every=interval
    ).agg([
        pl.col('temperature').mean().alias('temp_mean'),
        pl.col('temperature').std().alias('temp_std'),
        pl.col('humidity').mean().alias('humidity_mean')
    ])
    
    print(f"\n{label} resolution: {len(aggregated)} records")
    print(aggregated.head(5))

## Part 8: Business Calendar Operations

In [None]:
# Create business data with timestamps
business_df = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 31),
        '1d',
        eager=True
    ),
    'sales': np.random.uniform(1000, 5000, 31)
})

print("Daily business data:")
print(business_df.head(10))

### Filter Business Days Only

In [None]:
# Keep only weekdays (Polars uses ISO numbering: Monday=1 to Sunday=7)
business_days_only = business_df.filter(
    pl.col('timestamp').dt.weekday() <= 5
)

print(f"\nBusiness days only: {len(business_days_only)} records")
print(business_days_only.head(10))

### Week-over-Week Comparison

In [None]:
# Aggregate by week and calculate WoW growth
weekly = business_df.group_by_dynamic(
    'timestamp',
    every='1w'  # Weekly windows already start on Monday; no offset needed
).agg([
    pl.col('sales').sum().alias('weekly_sales')
]).with_columns([
    # Week-over-week growth
    ((pl.col('weekly_sales') / pl.col('weekly_sales').shift(1)) - 1).alias('wow_growth')
])

print("\nWeekly sales with WoW growth:")
print(weekly)

### Month-to-Date Calculations

In [None]:
# Calculate cumulative month-to-date sales
mtd = business_df.with_columns([
    pl.col('timestamp').dt.year().alias('year'),
    pl.col('timestamp').dt.month().alias('month')
]).with_columns([
    pl.col('sales').cum_sum().over(['year', 'month']).alias('mtd_sales')
])

print("\nMonth-to-date sales:")
print(mtd.select(['timestamp', 'sales', 'mtd_sales']))

## Part 9: Performance Tips for Time-Series

### 1. Ensure Data is Sorted

In [None]:
# group_by_dynamic and rolling operations require the index column to be sorted;
# unsorted input can raise an error or produce incorrect windows

unsorted_df = pl.DataFrame({
    'timestamp': [
        datetime(2024, 1, 1, 12, 0),
        datetime(2024, 1, 1, 8, 0),
        datetime(2024, 1, 1, 10, 0),
    ],
    'value': [1, 2, 3]
})

# Sort before time-series operations
sorted_df = unsorted_df.sort('timestamp')

print("Always sort by timestamp first:")
print(sorted_df)

### 2. Use Lazy Evaluation for Large Datasets

In [None]:
# Create a large time-series in lazy mode
large_ts = pl.LazyFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 12, 31, 23, 59),
        '1m',
        eager=True
    ),
    'value': range(527040)  # Minutes in 2024 (a leap year): 366 * 1440
})

# Chain operations in lazy mode
result = (large_ts
    .group_by_dynamic('timestamp', every='1d')
    .agg([
        pl.col('value').mean().alias('daily_mean')
    ])
    .with_columns([
        pl.col('daily_mean').rolling_mean(window_size=7).alias('weekly_ma')
    ])
    .collect()  # Execute all at once
)

print(f"Processed {len(result)} days of data efficiently")
print(result.head())

### 3. Choose Appropriate Time Granularity

In [None]:
# Don't use higher resolution than needed
# If you need daily stats, aggregate to daily first, then calculate

minute_data = pl.DataFrame({
    'timestamp': pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 7, 23, 59),  # 7 full days = 10080 one-minute rows
        '1m',
        eager=True
    ),
    'value': range(10080)
})

# Aggregate to daily first for better performance
daily_first = (
    minute_data
    .group_by_dynamic('timestamp', every='1d')
    .agg(pl.col('value').mean().alias('daily_mean'))
    # Then do further analysis on daily data
    .with_columns([
        pl.col('daily_mean').rolling_mean(window_size=3).alias('3d_ma')
    ])
)

print("Efficient aggregation:")
print(daily_first)

## Summary

### Key Time-Series Features:

**group_by_dynamic**:
- `every`: Interval between window starts
- `period`: Window duration (can differ from `every` for overlapping windows)
- `offset`: Shift window boundaries
- `truncate`: Label windows by lower boundary (`True`) or first data point (`False`)
- `include_boundaries`: Show window start/end
- `by`: Additional grouping columns

**group_by_rolling**:
- `period`: Rolling window size
- `offset`: Shift window position
- `closed`: Window boundary inclusion ('left', 'right', 'both', 'none')
- `by`: Additional grouping columns

**ASOF Joins**:
- `strategy`: 'backward', 'forward', or 'nearest'
- `tolerance`: Maximum time difference allowed
- `by`: Join on multiple keys

### Best Practices:
1. Always sort data by timestamp before time-series operations
2. Use lazy evaluation for large datasets
3. Choose appropriate time granularity
4. Use ASOF joins for aligning time series with different timestamps
5. Consider `period` vs `every` for overlapping windows
6. Use interpolation for missing data points
7. Apply rolling statistics for anomaly detection
8. Leverage `by` parameter for multi-series analysis

### Common Use Cases:
- **Financial**: OHLC bars, technical indicators, VWAP
- **IoT**: Sensor data aggregation, anomaly detection
- **Business**: Sales analysis, WoW/MoM growth, business calendar operations
- **Resampling**: Downsampling (reduce frequency), upsampling (increase frequency)
- **Multi-resolution**: Analyze data at multiple time scales