# Module 05: Advanced Stream Processing

**Estimated Time:** 90 minutes

## Learning Objectives

By the end of this module, you will:
- Master advanced windowing techniques (session, sliding)
- Implement watermark strategies for late data
- Perform stream joins (window joins, interval joins)
- Use process functions for custom logic
- Handle side outputs and multiple streams
- Optimize stream processing performance

---

## 1. Advanced Windowing

### Session Windows

**Concept**: Windows defined by periods of activity separated by gaps

```
Events with 5-minute inactivity gap:

10:00 ─ 10:02 ─ 10:03 ───── 10:10 ─ 10:12 ───── 10:20
  e1     e2      e3           e4      e5           e6
  └──Session 1───┘          └Session 2┘         └Session 3┘
     (3 events)                (2 events)          (1 event)

Use Cases:
- User sessions on website
- Activity bursts
- Conversation threads
```
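Stream processors such as Flink usually implement session windows as *merging* windows: each event starts its own window `[t, t + gap)`, and overlapping windows are merged into one session. The sketch below shows that idea in plain Python. It assumes a single key and timestamps given as minutes after 10:00. Engines differ on what happens when an event arrives exactly `gap` after the previous one; here it starts a new session.

```python
def merge_sessions(timestamps, gap):
    """Assign each event the window [t, t + gap) and merge overlapping windows.

    Returns (start, end, count) tuples, where end = last event time + gap.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t < sessions[-1][1]:
            start, _, count = sessions[-1]
            # Event falls inside the open window: extend the session to t + gap
            sessions[-1] = (start, t + gap, count + 1)
        else:
            # First event, or the gap was exceeded: open a new session window
            sessions.append((t, t + gap, 1))
    return sessions


# e1..e6 from the diagram, as minutes after 10:00, with a 5-minute gap
diagram_events = [0, 2, 3, 10, 12, 20]
for start, end, count in merge_sessions(diagram_events, gap=5):
    print(f"session [{start}, {end}) with {count} event(s)")
```

Sorting first means each event can only merge with the most recent session, because that session always has the largest end time.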

### Sliding Windows

**Concept**: Overlapping windows that slide by a fixed interval

```
Window size: 10 minutes, Slide: 5 minutes

00:00 ──────────── 00:10
  [    Window 1    ]
       00:05 ──────────── 00:15
         [    Window 2    ]
              00:10 ──────────── 00:20
                [    Window 3    ]

Each event appears in size / slide = 10 / 5 = 2 windows!

Use Cases:
- Moving averages
- Trend detection
- Smoothed metrics
```
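To see why an event lands in exactly `size / slide` windows, here is a sketch of the window-assignment step. It follows the same idea as Flink's sliding event-time assigner, where window starts are aligned to multiples of the slide, but it assumes a zero offset and integer timestamps in minutes and is not Flink's code.

```python
def assign_sliding_windows(timestamp, size, slide):
    """Return every [start, end) sliding window that contains `timestamp`.

    Window starts are aligned to multiples of `slide` (offset 0), so the
    latest window containing the event starts at timestamp - timestamp % slide.
    """
    start = timestamp - (timestamp % slide)
    windows = []
    # Walk backwards one slide at a time while the window still covers the event
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


# Size 10, slide 5, as in the diagram above
print(assign_sliding_windows(7, size=10, slide=5))   # [(0, 10), (5, 15)]
print(assign_sliding_windows(12, size=10, slide=5))  # [(5, 15), (10, 20)]
```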

### Global Windows

**Concept**: Single window containing all events (use with triggers)

```
All Events: [e1, e2, e3, e4, e5, ...]
            └─── Global Window ───────┘

Trigger determines when to compute results:
- Every 100 events
- Every 10 seconds
- On specific conditions

Use Cases:
- Custom windowing logic
- Complex triggering conditions
```
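A global window never closes by itself, so the trigger alone decides when results are emitted. The sketch below models the "every N events" case with a small count. It assumes the window is purged after each firing. In Flink, `countWindow(n)` is built the same way: a global window with a `CountTrigger` wrapped in a `PurgingTrigger`. The class name here is hypothetical.

```python
class GlobalWindowCountTrigger:
    """Single global window that fires (and then purges) every `count` events."""

    def __init__(self, count):
        self.count = count
        self.buffer = []

    def on_element(self, value):
        self.buffer.append(value)
        if len(self.buffer) >= self.count:
            # Trigger fires: aggregate the window contents, then purge them
            result = sum(self.buffer)
            self.buffer.clear()
            return result
        return None  # keep accumulating


window = GlobalWindowCountTrigger(count=3)
results = []
for v in range(1, 8):
    r = window.on_element(v)
    if r is not None:
        results.append(r)
print(results)  # [6, 15]; the value 7 is still buffered
```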

In [None]:
# Setup
import json
import time
from datetime import datetime, timedelta
from collections import defaultdict, deque
import random

print("[OK] Ready for advanced stream processing")

In [None]:
# Implement Session Window
class SessionWindowProcessor:
    """Process events in session windows based on inactivity gap"""

    def __init__(self, gap_seconds=5):
        self.gap = timedelta(seconds=gap_seconds)
        self.sessions = {}  # user_id -> current session
        self.completed_sessions = []  # Completed sessions

    def process_event(self, event):
        """Process event and manage sessions"""
        user_id = event["user_id"]
        event_time = datetime.fromisoformat(event["timestamp"])

        if user_id not in self.sessions:
            # Start new session
            self.sessions[user_id] = {
                "start": event_time,
                "last_event": event_time,
                "events": [event],
                "user_id": user_id,
            }
        else:
            session = self.sessions[user_id]
            gap_duration = event_time - session["last_event"]

            if gap_duration > self.gap:
                # Gap exceeded, close current session and start new one
                session["end"] = session["last_event"]
                session["duration"] = (session["end"] - session["start"]).total_seconds()
                self.completed_sessions.append(session)

                # Start new session
                self.sessions[user_id] = {
                    "start": event_time,
                    "last_event": event_time,
                    "events": [event],
                    "user_id": user_id,
                }
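            else:
                # Continue current session; max() keeps last_event from moving
                # backwards if a user's events arrive slightly out of order
                session["last_event"] = max(session["last_event"], event_time)
                session["events"].append(event)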

    def close_all_sessions(self):
        """Close all active sessions (end of stream)"""
        for session in self.sessions.values():
            session["end"] = session["last_event"]
            session["duration"] = (session["end"] - session["start"]).total_seconds()
            self.completed_sessions.append(session)
        self.sessions.clear()

    def print_sessions(self):
        """Print completed sessions"""
        print("\n[DATA] Session Windows:\n")

        for i, session in enumerate(self.completed_sessions, 1):
            print(f"Session {i} ({session['user_id']}):")
            print(f"  Start: {session['start'].strftime('%H:%M:%S')}")
            print(f"  End: {session['end'].strftime('%H:%M:%S')}")
            print(f"  Duration: {session['duration']:.1f}s")
            print(f"  Events: {len(session['events'])}")
            print()


# Generate events with gaps
def generate_session_events():
    """Generate events with natural session gaps"""
    processor = SessionWindowProcessor(gap_seconds=3)

    base_time = datetime.now()

    # Alice: active session, 5-second gap, then another session
    events = [
        {"user_id": "Alice", "action": "login", "timestamp": base_time.isoformat()},
        {
            "user_id": "Alice",
            "action": "view",
            "timestamp": (base_time + timedelta(seconds=1)).isoformat(),
        },
        {
            "user_id": "Alice",
            "action": "click",
            "timestamp": (base_time + timedelta(seconds=2)).isoformat(),
        },
        # Gap of 5 seconds
        {
            "user_id": "Alice",
            "action": "search",
            "timestamp": (base_time + timedelta(seconds=7)).isoformat(),
        },
        {
            "user_id": "Alice",
            "action": "logout",
            "timestamp": (base_time + timedelta(seconds=8)).isoformat(),
        },
        # Bob: two sessions separated by a 4-second gap
        {
            "user_id": "Bob",
            "action": "login",
            "timestamp": (base_time + timedelta(seconds=0)).isoformat(),
        },
        {
            "user_id": "Bob",
            "action": "view",
            "timestamp": (base_time + timedelta(seconds=1)).isoformat(),
        },
        # Gap of 4 seconds
        {
            "user_id": "Bob",
            "action": "login",
            "timestamp": (base_time + timedelta(seconds=5)).isoformat(),
        },
    ]

    print("[OK] Processing events with session windows...\n")

    for event in events:
        processor.process_event(event)
        print(
            f"Event: {event['user_id']:5s} - {event['action']:10s} at {event['timestamp'][11:19]}"
        )

    processor.close_all_sessions()
    processor.print_sessions()


generate_session_events()
print("[SUCCESS] Session windowing complete!")

In [None]:
# Implement Sliding Window
class SlidingWindowProcessor:
    """Process events in sliding (overlapping) windows"""

    def __init__(self, window_size_seconds=10, slide_seconds=5):
        self.window_size = window_size_seconds
        self.slide = slide_seconds
        self.events = []  # All events

    def add_event(self, event):
        """Add event to buffer"""
        self.events.append(event)

    def compute_sliding_windows(self):
        """Compute results for all sliding windows"""
        if not self.events:
            return []

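        # Get time range (min/max, so the buffer does not need to be sorted)
        timestamps = [datetime.fromisoformat(e["timestamp"]) for e in self.events]
        start_time, end_time = min(timestamps), max(timestamps)

        windows = []
        # Start (size - slide) before the first event so the earliest events are
        # also covered by size/slide windows. Simplification: windows are anchored
        # at the first event rather than aligned to the epoch as Flink does.
        current = start_time - timedelta(seconds=max(self.window_size - self.slide, 0))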

        while current <= end_time:
            window_end = current + timedelta(seconds=self.window_size)

            # Get events in this window
            window_events = [
                e
                for e in self.events
                if current <= datetime.fromisoformat(e["timestamp"]) < window_end
            ]

            if window_events:
                # Calculate metrics
                values = [e.get("value", 0) for e in window_events]
                windows.append(
                    {
                        "start": current,
                        "end": window_end,
                        "count": len(window_events),
                        "sum": sum(values),
                        "avg": sum(values) / len(values) if values else 0,
                    }
                )

            # Slide to next window
            current += timedelta(seconds=self.slide)

        return windows

    def print_windows(self, windows):
        """Print sliding window results"""
        print("\n[DATA] Sliding Window Results:")
        print(f"Window size: {self.window_size}s, Slide: {self.slide}s\n")

        for i, window in enumerate(windows, 1):
            print(
                f"Window {i}: {window['start'].strftime('%H:%M:%S')} to {window['end'].strftime('%H:%M:%S')}"
            )
            print(f"  Count: {window['count']}, Sum: {window['sum']}, Avg: {window['avg']:.2f}")


# Generate events for sliding windows
processor = SlidingWindowProcessor(window_size_seconds=10, slide_seconds=5)

base_time = datetime.now()
for i in range(20):
    event = {
        "id": i,
        "timestamp": (base_time + timedelta(seconds=i)).isoformat(),
        "value": random.randint(1, 10),
    }
    processor.add_event(event)

print("[OK] Computing sliding windows...")
windows = processor.compute_sliding_windows()
processor.print_windows(windows)

print("\n[OK] Notice: windows overlap, so one event can fall into several windows")
print("     (up to size / slide = 10 / 5 = 2 here), producing a moving-average effect.")

---

## 2. Watermarks and Late Data

### Watermark Strategies

**1. Bounded Out-of-Orderness:**
```
Watermark = max_event_time - max_out_of_orderness

Example: max_out_of_orderness = 5 seconds
Latest event: 10:00:10
Watermark: 10:00:05

Meaning: "Events with timestamps before 10:00:05 are no longer expected; any that still arrive are late"
```

**2. Periodic Watermarks:**
```
Emit a watermark every N milliseconds (processing time)
Value is based on the maximum event timestamp seen so far

Trade-off:
- Frequent: Lower latency, more overhead
- Infrequent: Higher latency, less overhead
```
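The sketch below separates "seeing events" from "emitting watermarks" to make the latency/overhead trade-off concrete. Between ticks, events only update the maximum timestamp; a watermark goes out at most once per interval. The processing-time clock is simulated, and all times are plain integers (an assumption for the demo). In Flink the interval is set with `pipeline.auto-watermark-interval`, which defaults to 200 ms.

```python
class PeriodicWatermarkEmitter:
    """Emit a bounded-out-of-orderness watermark at most once per `interval`."""

    def __init__(self, interval, max_out_of_orderness):
        self.interval = interval
        self.bound = max_out_of_orderness
        self.max_ts = None
        self.next_emit = interval
        self.emitted = []  # (processing_time, watermark) pairs

    def on_event(self, event_ts):
        # Events only track the maximum timestamp; no watermark is emitted here
        self.max_ts = event_ts if self.max_ts is None else max(self.max_ts, event_ts)

    def on_processing_time(self, now):
        # Called by the (simulated) clock; emits only when the interval has elapsed
        if now >= self.next_emit and self.max_ts is not None:
            self.emitted.append((now, self.max_ts - self.bound))
            self.next_emit = now + self.interval


emitter = PeriodicWatermarkEmitter(interval=200, max_out_of_orderness=5)
# (processing_time, event_timestamp) arrivals
for now, ts in [(50, 100), (120, 104), (210, 110), (300, 108), (420, 120)]:
    emitter.on_event(ts)
    emitter.on_processing_time(now)
print(emitter.emitted)  # [(210, 105), (420, 115)]
```

A shorter interval would have emitted more watermarks, and thus fired windows sooner, at the cost of more watermark traffic.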

**3. Punctuated Watermarks:**
```
Generate watermark based on special markers in stream

Example: End-of-file markers, control messages
```
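A punctuated generator emits a watermark only when it sees a marker record in the stream. The sketch assumes a hypothetical marker shape, `{"type": "marker", "watermark": t}`, chosen only for illustration. Markers that would move the watermark backwards are ignored, because watermarks never decrease.

```python
def punctuated_watermarks(stream):
    """Yield ("event", record) for data and ("watermark", t) when a marker is seen."""
    current = None
    for record in stream:
        if record.get("type") == "marker":
            # Only advance: a marker carrying an older time is ignored
            if current is None or record["watermark"] > current:
                current = record["watermark"]
                yield ("watermark", current)
        else:
            yield ("event", record)


stream = [
    {"id": 1, "ts": 3},
    {"id": 2, "ts": 7},
    {"type": "marker", "watermark": 5},
    {"id": 3, "ts": 9},
    {"type": "marker", "watermark": 4},  # stale marker, ignored
    {"type": "marker", "watermark": 10},
]
for kind, payload in punctuated_watermarks(stream):
    print(kind, payload)
```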

### Handling Late Data

**Strategy 1: Allowed Lateness**
```
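Window: [10:00, 10:10)
Allowed lateness: 2 minutes (event time, measured against the watermark)

Timeline (watermark progress):
Watermark ≥ 10:10    ─── Watermark passes window end, fire window
Watermark = 10:11    ─── Late event arrives → Update window, fire again
Watermark = 10:11:30 ─── Another late event → Update window, fire again
Watermark ≥ 10:12    ─── Allowed lateness expires → Window state purged;
                         later events for this window are dropped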
```
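A single window with allowed lateness can be simulated as below. This is a simplified sketch, not Flink's windowing operator. Times are minutes after 10:00, and the window fires once the watermark reaches its end. Each late event re-fires it until the watermark reaches `end + allowed_lateness`; after that, the state is purged and further events are dropped. `firings` records how many events each result covered.

```python
class WindowWithAllowedLateness:
    """One tumbling window [start, end) with allowed lateness, in event time."""

    def __init__(self, start, end, allowed_lateness):
        self.start, self.end = start, end
        self.cleanup_time = end + allowed_lateness
        self.events = []
        self.fired = False
        self.purged = False
        self.firings = []  # number of events included in each firing
        self.dropped = []  # events that arrived after cleanup

    def on_watermark(self, wm):
        if not self.fired and wm >= self.end:
            self.fired = True
            self.firings.append(len(self.events))
        if wm >= self.cleanup_time:
            self.purged = True  # window state is discarded

    def on_event(self, ts):
        if not (self.start <= ts < self.end):
            return  # belongs to a different window
        if self.purged:
            self.dropped.append(ts)
        else:
            self.events.append(ts)
            if self.fired:
                # Late but within allowed lateness: update and fire again
                self.firings.append(len(self.events))


w = WindowWithAllowedLateness(start=0, end=10, allowed_lateness=2)
w.on_event(3)
w.on_event(8)
w.on_watermark(10)  # watermark passes 10:10, first firing
w.on_event(6)       # late, but before 10:12, fires again
w.on_watermark(11)
w.on_event(9)       # late again, fires again
w.on_watermark(12)  # allowed lateness expired, state purged
w.on_event(5)       # too late, dropped
print(w.firings, w.dropped)  # [2, 3, 4] [5]
```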

**Strategy 2: Side Outputs**
```
Main output: On-time events
Side output: Late events (for special handling)

Use cases:
- Log late events for analysis
- Store in separate table
- Alert for data quality issues
```

In [None]:
# Watermark generator
class WatermarkGenerator:
    """Generate watermarks with bounded out-of-orderness"""

    def __init__(self, max_out_of_orderness_seconds=5):
        self.max_out_of_orderness = timedelta(seconds=max_out_of_orderness_seconds)
        self.max_timestamp = None
        self.current_watermark = None

    def on_event(self, event_timestamp):
        """Update watermark based on new event"""
        event_time = datetime.fromisoformat(event_timestamp)

        # Track max timestamp
        if self.max_timestamp is None or event_time > self.max_timestamp:
            self.max_timestamp = event_time

        # Generate watermark
        new_watermark = self.max_timestamp - self.max_out_of_orderness

        # Watermarks never go backwards
        if self.current_watermark is None or new_watermark > self.current_watermark:
            self.current_watermark = new_watermark
            return True  # Watermark advanced

        return False  # No change

    def get_watermark(self):
        """Get current watermark"""
        return self.current_watermark

    def is_late(self, event_timestamp):
        """Check if event is late"""
        if self.current_watermark is None:
            return False

        event_time = datetime.fromisoformat(event_timestamp)
        return event_time < self.current_watermark


# Simulate out-of-order events
base_time = datetime.now()
events = [
    {"id": 1, "timestamp": (base_time + timedelta(seconds=0)).isoformat(), "value": 10},
    {"id": 2, "timestamp": (base_time + timedelta(seconds=2)).isoformat(), "value": 20},
    {"id": 3, "timestamp": (base_time + timedelta(seconds=5)).isoformat(), "value": 30},
    {"id": 4, "timestamp": (base_time + timedelta(seconds=10)).isoformat(), "value": 40},
    # Late event! Timestamp 3 but arrives after event with timestamp 10
    {"id": 5, "timestamp": (base_time + timedelta(seconds=3)).isoformat(), "value": 25},
    {"id": 6, "timestamp": (base_time + timedelta(seconds=12)).isoformat(), "value": 50},
]

watermark_gen = WatermarkGenerator(max_out_of_orderness_seconds=2)

print("[DATA] Processing events with watermark tracking:\n")

on_time_events = []
late_events = []

for event in events:
    is_late = watermark_gen.is_late(event["timestamp"])
    watermark_gen.on_event(event["timestamp"])

    event_time = event["timestamp"][11:19]
    watermark_time = watermark_gen.get_watermark()
    watermark_str = watermark_time.strftime("%H:%M:%S") if watermark_time else "None"

    status = "[LATE]" if is_late else "[ON-TIME]"
    print(f"Event {event['id']}: time={event_time}, watermark={watermark_str} {status}")

    if is_late:
        late_events.append(event)
    else:
        on_time_events.append(event)

print(f"\n[DATA] Summary:")
print(f"  On-time events: {len(on_time_events)}")
print(f"  Late events: {len(late_events)}")

if late_events:
    print(f"\n  Late event IDs: {[e['id'] for e in late_events]}")

print("\n[OK] Watermarks help detect and handle late-arriving data!")

---

## 3. Stream Joins

### Types of Joins

**1. Window Join:**
```
Join two streams within the same window

Stream A: [a1, a2, a3, ...]
Stream B: [b1, b2, b3, ...]

Window: 10:00-10:10
  A events: [a1, a2]
  B events: [b1, b3]
  Join on key → Matches (assuming a1/b1 and a2/b3 share keys): [(a1,b1), (a2,b3)]

Use cases:
- Correlate user clicks with purchases
- Match orders with shipments
```

**2. Interval Join:**
```
Join events from Stream B that occur within
a time interval of events from Stream A

Example: Purchase within 1 hour of click
Click at 10:00 → Join with purchases 10:00-11:00

   Click        Purchase Range
   10:00  →  [10:00 ──── 11:00]
               ↓
          Any purchase in this range gets joined
```
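The interval condition can be written directly: a purchase matches a click with the same key when `click.ts + lower <= purchase.ts <= click.ts + upper`. Flink's `intervalJoin(...).between(lower, upper)` uses the same inclusive-bounds condition by default. The batch-style sketch below works over in-memory lists, with times in minutes after 10:00. A streaming engine would instead buffer each side in state and expire entries once the watermark rules out further matches.

```python
from collections import defaultdict


def interval_join(clicks, purchases, lower, upper):
    """Inner-join purchases to same-user clicks within [click.ts + lower, click.ts + upper]."""
    by_user = defaultdict(list)
    for p in purchases:
        by_user[p["user_id"]].append(p)

    matches = []
    for c in clicks:
        for p in by_user.get(c["user_id"], []):
            if c["ts"] + lower <= p["ts"] <= c["ts"] + upper:
                matches.append((c["id"], p["id"]))
    return matches


clicks = [
    {"id": "c1", "user_id": "Alice", "ts": 0},   # 10:00
    {"id": "c2", "user_id": "Bob", "ts": 30},    # 10:30
]
purchases = [
    {"id": "p1", "user_id": "Alice", "ts": 45},  # 10:45, within 1 hour of c1
    {"id": "p2", "user_id": "Alice", "ts": 75},  # 11:15, too late for c1
    {"id": "p3", "user_id": "Bob", "ts": 20},    # 10:20, before c2
]
print(interval_join(clicks, purchases, lower=0, upper=60))  # [('c1', 'p1')]
```

A negative `lower` bound would also match purchases that happened shortly *before* the click, as `p3` would with `lower=-15`.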

**3. Temporal Join (Table Join):**
```
Stream join with versioned table

Stream: User events
Table: User profiles (changes over time)

Join: Enrich event with profile as of event time
```
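"Profile as of event time" means looking up the latest table version whose validity started at or before the event's timestamp. Flink SQL expresses this with `FOR SYSTEM_TIME AS OF` temporal joins. The sketch below is only a minimal in-memory model using `bisect` on per-key version timestamps. `VersionedTable` is a hypothetical name, and times are plain integers.

```python
import bisect


class VersionedTable:
    """Per-key history of (valid_from, row), queried as of a timestamp."""

    def __init__(self):
        self.times = {}  # key -> sorted list of valid_from timestamps
        self.rows = {}   # key -> rows aligned with self.times[key]

    def upsert(self, key, valid_from, row):
        times = self.times.setdefault(key, [])
        rows = self.rows.setdefault(key, [])
        i = bisect.bisect_right(times, valid_from)
        times.insert(i, valid_from)
        rows.insert(i, row)

    def as_of(self, key, ts):
        # Latest version with valid_from <= ts, or None if no version existed yet
        times = self.times.get(key, [])
        i = bisect.bisect_right(times, ts)
        return self.rows[key][i - 1] if i else None


profiles = VersionedTable()
profiles.upsert("Alice", 0, {"tier": "free"})
profiles.upsert("Alice", 50, {"tier": "premium"})

for e in [{"user_id": "Alice", "ts": 10}, {"user_id": "Alice", "ts": 60}]:
    print(e["ts"], profiles.as_of(e["user_id"], e["ts"]))
```

Unlike a join against the *current* table, an event at time 10 is enriched with the free tier even though Alice has since upgraded.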

In [None]:
# Implement Window Join
class WindowJoin:
    """Join two streams within windows"""

    def __init__(self, window_size_seconds=10):
        self.window_size = window_size_seconds
        self.stream_a = {}  # window_id -> events
        self.stream_b = {}  # window_id -> events

    def get_window_id(self, timestamp):
        """Get window ID from timestamp"""
        event_time = datetime.fromisoformat(timestamp)
        epoch = int(event_time.timestamp())
        return (epoch // self.window_size) * self.window_size

    def add_to_stream_a(self, event):
        """Add event to stream A"""
        window_id = self.get_window_id(event["timestamp"])
        if window_id not in self.stream_a:
            self.stream_a[window_id] = []
        self.stream_a[window_id].append(event)

    def add_to_stream_b(self, event):
        """Add event to stream B"""
        window_id = self.get_window_id(event["timestamp"])
        if window_id not in self.stream_b:
            self.stream_b[window_id] = []
        self.stream_b[window_id].append(event)

    def join_windows(self):
        """Perform inner join on all windows"""
        results = []

        # Find common windows
        common_windows = set(self.stream_a.keys()) & set(self.stream_b.keys())

        for window_id in sorted(common_windows):
            a_events = self.stream_a[window_id]
            b_events = self.stream_b[window_id]

            # Join on key
            for a_event in a_events:
                for b_event in b_events:
                    if a_event["user_id"] == b_event["user_id"]:
                        results.append(
                            {
                                "window_id": window_id,
                                "user_id": a_event["user_id"],
                                "click": a_event,
                                "purchase": b_event,
                            }
                        )

        return results


# Example: Join clicks with purchases
join = WindowJoin(window_size_seconds=5)

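# Align base_time to a 5-second boundary: window IDs are computed from epoch
# seconds, so an unaligned datetime.now() could split related events across windows
base_time = datetime.fromtimestamp((int(time.time()) // 5) * 5)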

# Stream A: Clicks
clicks = [
    {"user_id": "Alice", "page": "/product/1", "timestamp": base_time.isoformat()},
    {
        "user_id": "Bob",
        "page": "/product/2",
        "timestamp": (base_time + timedelta(seconds=1)).isoformat(),
    },
    {
        "user_id": "Alice",
        "page": "/product/3",
        "timestamp": (base_time + timedelta(seconds=6)).isoformat(),
    },
]

# Stream B: Purchases
purchases = [
    {
        "user_id": "Alice",
        "product_id": 1,
        "amount": 50,
        "timestamp": (base_time + timedelta(seconds=2)).isoformat(),
    },
    {
        "user_id": "Charlie",
        "product_id": 4,
        "amount": 30,
        "timestamp": (base_time + timedelta(seconds=3)).isoformat(),
    },
    {
        "user_id": "Alice",
        "product_id": 3,
        "amount": 75,
        "timestamp": (base_time + timedelta(seconds=7)).isoformat(),
    },
]

print("[OK] Adding events to streams...\n")

for click in clicks:
    join.add_to_stream_a(click)
    print(f"Click: {click['user_id']:7s} viewed {click['page']}")

print()

for purchase in purchases:
    join.add_to_stream_b(purchase)
    print(
        f"Purchase: {purchase['user_id']:7s} bought product {purchase['product_id']} for ${purchase['amount']}"
    )

# Perform join
joined = join.join_windows()

print("\n[DATA] Join Results (Click → Purchase):\n")

for match in joined:
    print(f"User: {match['user_id']}")
    print(f"  Clicked: {match['click']['page']}")
    print(
        f"  Purchased: Product {match['purchase']['product_id']} for ${match['purchase']['amount']}"
    )
    print()

print(f"[SUCCESS] Found {len(joined)} matches in the same windows!")

---

## 4. Process Functions

### What are Process Functions?

**Low-level API** for custom stream processing logic:
- Access to event timestamps and watermarks
- Register timers for future callbacks
- Access and modify state
- Emit to side outputs

### Process Function Types

**1. ProcessFunction:**
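```python
from pyflink.datastream.functions import ProcessFunction

class MyProcessFunction(ProcessFunction):
    def process_element(self, value, ctx: ProcessFunction.Context):
        # Custom processing logic goes here
        # ctx.timestamp() returns the element's timestamp (epoch milliseconds)
        # Timers can only be registered on keyed streams (KeyedProcessFunction)
        yield value  # emit zero or more results per input element
```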

**2. KeyedProcessFunction:**
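```python
# For keyed streams (after key_by); state and timers are scoped to the current key
from pyflink.common import Types
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ValueStateDescriptor

class MyKeyedProcessFunction(KeyedProcessFunction):
    def open(self, runtime_context: RuntimeContext):
        # Per-key state handles are obtained from the runtime context in open()
        self.count = runtime_context.get_state(ValueStateDescriptor("count", Types.LONG()))

    def process_element(self, value, ctx: KeyedProcessFunction.Context):
        self.count.update((self.count.value() or 0) + 1)
        yield value
```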

### Timers

**Register callbacks for future times:**
```
Event arrives at 10:00
  ↓
Register event-time timer for 10:05
  ↓
Continue processing...
  ↓
Watermark reaches 10:05
  ↓
Timer fires → on_timer() called
(Processing-time timers fire on the wall clock instead of the watermark)

Use cases:
- Timeout detection
- Session window close
- Scheduled cleanup
```
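Conceptually, a timer service is a priority queue ordered by timestamp. Advancing the watermark pops and fires every timer at or below it, in timestamp order. Flink's documentation notes that it keeps only one timer per key and timestamp, so duplicate registrations collapse into one `on_timer()` call. The sketch below models that behavior with `heapq` and lazy deletion. It is an illustration, not Flink's internal implementation; the class name is hypothetical.

```python
import heapq


class EventTimeTimerService:
    """Per-key event-time timers in a min-heap, deduplicated by (key, timestamp)."""

    def __init__(self):
        self.heap = []
        self.registered = set()

    def register(self, key, timestamp):
        if (key, timestamp) not in self.registered:
            self.registered.add((key, timestamp))
            heapq.heappush(self.heap, (timestamp, key))

    def delete(self, key, timestamp):
        # Lazy deletion: the heap entry stays but is skipped when popped
        self.registered.discard((key, timestamp))

    def advance_watermark(self, watermark, on_timer):
        while self.heap and self.heap[0][0] <= watermark:
            timestamp, key = heapq.heappop(self.heap)
            if (key, timestamp) in self.registered:
                self.registered.discard((key, timestamp))
                on_timer(key, timestamp)


fired = []
timers = EventTimeTimerService()
timers.register("Alice", 5)   # event at 0, timeout 5 later
timers.register("Bob", 4)
timers.register("Bob", 4)     # duplicate registration, kept once
timers.delete("Alice", 5)     # Alice sent another event: reset her timer
timers.register("Alice", 7)

timers.advance_watermark(4, lambda k, t: fired.append((k, t)))
timers.advance_watermark(10, lambda k, t: fired.append((k, t)))
print(fired)  # [('Bob', 4), ('Alice', 7)]
```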

In [None]:
# Simulate Process Function with timers
class TimeoutDetector:
    """
    Detect when users are inactive for too long
    (Simulates KeyedProcessFunction with timers)
    """

    def __init__(self, timeout_seconds=5):
        self.timeout = timedelta(seconds=timeout_seconds)
        self.last_seen = {}  # user_id -> last event time
        self.timers = {}  # user_id -> timer time
        self.timeouts = []  # Timeout events

    def process_event(self, event, current_time):
        """Process event and manage timers"""
        user_id = event["user_id"]
        event_time = datetime.fromisoformat(event["timestamp"])

        # Update last seen
        self.last_seen[user_id] = event_time

        # Register/update timer
        timer_time = event_time + self.timeout
        self.timers[user_id] = timer_time

        # Check if any timers should fire
        self.check_timers(current_time)

    def check_timers(self, current_time):
        """Fire timers that have passed"""
        current_dt = datetime.fromisoformat(current_time)

        fired = []
        for user_id, timer_time in list(self.timers.items()):
            if current_dt >= timer_time:
                # Timer fired!
                self.on_timer(user_id, timer_time)
                fired.append(user_id)

        # Remove fired timers
        for user_id in fired:
            del self.timers[user_id]

    def on_timer(self, user_id, timer_time):
        """Callback when timer fires"""
        last_seen = self.last_seen.get(user_id)
        if last_seen:
            inactive_duration = (timer_time - last_seen).total_seconds()
            self.timeouts.append(
                {
                    "user_id": user_id,
                    "last_seen": last_seen.isoformat(),
                    "timeout_at": timer_time.isoformat(),
                    "inactive_duration": inactive_duration,
                }
            )


# Test timeout detection
detector = TimeoutDetector(timeout_seconds=3)

base_time = datetime.now()
events = [
    {"user_id": "Alice", "action": "click", "timestamp": base_time.isoformat()},
    {
        "user_id": "Bob",
        "action": "view",
        "timestamp": (base_time + timedelta(seconds=1)).isoformat(),
    },
    {
        "user_id": "Alice",
        "action": "scroll",
        "timestamp": (base_time + timedelta(seconds=2)).isoformat(),
    },
    # Alice timeout: last event at 2s, timeout at 5s
    # Bob timeout: last event at 1s, timeout at 4s
]

print("[OK] Processing events with timeout detection...\n")

for event in events:
    detector.process_event(event, event["timestamp"])
    print(f"Event: {event['user_id']:7s} - {event['action']} at {event['timestamp'][11:19]}")

# Advance time to trigger timeouts
print("\n[OK] Advancing time to trigger timeouts...\n")

for i in range(3, 8):
    current_time = (base_time + timedelta(seconds=i)).isoformat()
    detector.check_timers(current_time)

print("[DATA] Timeout Events:\n")
for timeout in detector.timeouts:
    print(f"User {timeout['user_id']} timed out:")
    print(f"  Last seen: {timeout['last_seen'][11:19]}")
    print(f"  Timeout at: {timeout['timeout_at'][11:19]}")
    print(f"  Inactive: {timeout['inactive_duration']:.1f}s")
    print()

print("[SUCCESS] Timers allow detection of patterns over time!")

---

## 5. Side Outputs

### What are Side Outputs?

**Multiple output streams from a single operator:**
```
Input Stream
     ↓
 [Processor]
     ↓
    ├─→ Main Output (valid events)
    ├─→ Side Output 1 (late events)
    └─→ Side Output 2 (errors)
```

### Use Cases

**1. Late Data Handling:**
```
Main: On-time events
Side: Late events (for monitoring/reprocessing)
```

**2. Error Handling:**
```
Main: Successfully processed events
Side: Parse errors, validation failures
```

**3. Multi-Tenancy:**
```
Main: Tenant A events
Side 1: Tenant B events
Side 2: Tenant C events
```
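The Flink side-output documentation shows PyFlink routing to a side output by yielding an `(output_tag, value)` pair from `process_element`; plain yielded values go to the main output. The sketch below imitates that tagging convention in plain Python. Here `OutputTag` is a hypothetical stand-in (not the PyFlink class), and `route`/`run` are illustrative helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputTag:
    """Hypothetical stand-in for an engine's side-output tag (name only)."""
    name: str


LATE = OutputTag("late")
ERRORS = OutputTag("errors")


def route(event, watermark):
    """Yield plain values for the main output and (tag, value) for side outputs."""
    if not isinstance(event.get("value"), (int, float)):
        yield ERRORS, event
    elif event["ts"] < watermark:
        yield LATE, event
    else:
        yield event


def run(events, watermark):
    outputs = {"main": [], LATE.name: [], ERRORS.name: []}
    for event in events:
        for item in route(event, watermark):
            if isinstance(item, tuple) and isinstance(item[0], OutputTag):
                outputs[item[0].name].append(item[1])
            else:
                outputs["main"].append(item)
    return outputs


result = run(
    [{"id": 1, "value": 10, "ts": 9}, {"id": 2, "value": "x", "ts": 9}, {"id": 3, "value": 5, "ts": 2}],
    watermark=8,
)
print({k: [e["id"] for e in v] for k, v in result.items()})
# {'main': [1], 'late': [3], 'errors': [2]}
```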

In [None]:
# Simulate side outputs
class MultiOutputProcessor:
    """Process events with multiple outputs"""

    def __init__(self):
        self.main_output = []  # Valid events
        self.error_output = []  # Errors
        self.late_output = []  # Late events
        self.watermark = None

    def set_watermark(self, watermark):
        """Update watermark"""
        self.watermark = watermark

    def process_event(self, event):
        """Process and route to appropriate output"""
        try:
            # Validate event
            if "value" not in event or event["value"] < 0:
                # Invalid → Error output
                self.error_output.append({**event, "error": "Invalid value"})
                return "error"

            # Check if late
            if self.watermark is not None:
                event_time = datetime.fromisoformat(event["timestamp"])
                if event_time < self.watermark:
                    # Late → Late output
                    self.late_output.append(event)
                    return "late"

            # Valid → Main output
            self.main_output.append(event)
            return "main"

        except Exception as e:
            # Exception → Error output
            self.error_output.append({**event, "error": str(e)})
            return "error"

    def print_outputs(self):
        """Print all outputs"""
        print("\n[DATA] Output Streams:\n")

        print(f"Main Output ({len(self.main_output)} events):")
        for event in self.main_output[:3]:
            print(f"  {event}")
        if len(self.main_output) > 3:
            print(f"  ... and {len(self.main_output) - 3} more")

        print(f"\nLate Output ({len(self.late_output)} events):")
        for event in self.late_output:
            print(f"  {event}")

        print(f"\nError Output ({len(self.error_output)} events):")
        for event in self.error_output:
            print(f"  {event}")


# Test multi-output processing
processor = MultiOutputProcessor()

base_time = datetime.now()

events = [
    {"id": 1, "value": 10, "timestamp": base_time.isoformat()},
    {"id": 2, "value": 20, "timestamp": (base_time + timedelta(seconds=5)).isoformat()},
    {"id": 3, "value": -5, "timestamp": (base_time + timedelta(seconds=6)).isoformat()},  # Invalid!
    {"id": 4, "value": 30, "timestamp": (base_time + timedelta(seconds=10)).isoformat()},
    # Late event
    {"id": 5, "value": 15, "timestamp": (base_time + timedelta(seconds=2)).isoformat()},
]

print("[OK] Processing events with multiple outputs...\n")

# Set watermark after event 4
for i, event in enumerate(events):
    if i == 4:
        # Set watermark to 8 seconds
        watermark = base_time + timedelta(seconds=8)
        processor.set_watermark(watermark)
        print(f"[Watermark advanced to {watermark.strftime('%H:%M:%S')}]\n")

    output = processor.process_event(event)
    print(f"Event {event['id']}: value={event.get('value', 'N/A'):3} → {output} output")

processor.print_outputs()

print("\n[SUCCESS] Side outputs enable flexible event routing!")

---

## 6. Key Takeaways

[OK] **Session Windows**: Dynamic windows based on inactivity gaps

[OK] **Sliding Windows**: Overlapping windows for moving averages

[OK] **Watermarks**: Track event time progress and detect late data

[OK] **Stream Joins**: Correlate events across multiple streams

[OK] **Process Functions**: Low-level API for custom logic with timers

[OK] **Side Outputs**: Route events to multiple output streams

### Design Patterns

**1. Session Detection:**
```
User events → Session windows → Aggregate per session
```

**2. Late Data Handling:**
```
Events → Watermarks → Main output + Late side output
```

**3. Event Correlation:**
```
Stream A ──┐
           ├─→ Window Join → Correlated events
Stream B ──┘
```

### Best Practices

1. **Choose window type** based on use case
2. **Configure watermarks** with an out-of-orderness bound that matches observed event delays
3. **Handle late data** explicitly (allowed lateness or side outputs)
4. **Use timers** for timeout detection
5. **Monitor watermark lag** in production
6. **Test with realistic delays** to validate late data handling

---

## 7. Practice Exercises

1. **Implement session window** for user activity tracking
2. **Build sliding window** for moving average calculation
3. **Create interval join** to correlate clicks with purchases
4. **Add timeout detection** using process function with timers
5. **Implement multi-output** processor with error handling

In [None]:
# Your practice code here

---

## 8. Next Steps

Congratulations on completing Module 05!

### What You've Learned

- [OK] Advanced windowing (session, sliding)
- [OK] Watermark strategies and late data handling
- [OK] Stream joins (window, interval)
- [OK] Process functions with timers
- [OK] Side outputs for flexible routing

### Coming Up in Module 06: State Management and Checkpointing

You'll learn:
- State backends (Memory, RocksDB)
- Checkpoint configuration and tuning
- Savepoints and job migration
- State schema evolution
- Queryable state

### Resources

- [Flink Windowing](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/)
- [Event Time and Watermarks](https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/time/)
- [Process Function](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/process_function/)
- [Side Outputs](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/side_output/)

---

**Ready for state management?** Open `06_state_management_checkpointing.ipynb` to continue!