In [None]:
#@title 🎧 Download Narration Audio & Play Introduction
import os as _os
if not _os.path.exists("/content/narration"):
    !pip install -q gdown
    import gdown
    gdown.download(id="1K3QJmxvgc0_ZwXawSeV_oQll-6WeibaO", output="/content/narration.zip", quiet=False)
    !unzip -q /content/narration.zip -d /content/narration
    !rm /content/narration.zip
    print(f"Loaded {len(_os.listdir('/content/narration'))} narration segments")
else:
    print("Narration audio already loaded.")

from IPython.display import Audio, display
display(Audio("/content/narration/01_00_intro.mp3"))


In [None]:
#@title 🎧 Code Walkthrough: Setup
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_01_setup.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


In [None]:
# 🔧 Setup: Run this cell first!
# Check GPU availability and set random seeds

import torch
import sys

# Check GPU
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f"✅ GPU available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    device = torch.device('cpu')
    print("⚠️ No GPU detected. Some cells may run slowly.")
    print("   Go to Runtime → Change runtime type → GPU")

print(f"\n📦 Python {sys.version.split()[0]}")
print(f"🔥 PyTorch {torch.__version__}")

# Set random seeds for reproducibility
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

print(f"🎲 Random seed set to {SEED}")

%matplotlib inline

# 🚀 Conversation Rollouts and Next-State Signals: Turning Chat into Training Data

*Part 1 of the Vizuara series on OpenClaw-RL*
*Estimated time: 45 minutes*

# 🤖 AI Teaching Assistant

Need help with this notebook? Open the **AI Teaching Assistant**. It has already read this entire notebook and can help with concepts, code, and exercises.

**[👉 Open AI Teaching Assistant](https://pods.vizuara.ai/courses/openclaw-rl/practice/1/assistant)**

*Tip: Open it in a separate tab and work through this notebook side-by-side.*


In [None]:
#@title 🎧 Listen: Why It Matters
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_02_why_it_matters.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 1. Why Does This Matter?

Every time you chat with an AI assistant, you generate valuable training signal, but current systems typically discard it.

When you say "No, I wanted Python not JavaScript," that correction contains precise information about your preferences. When you say "Perfect, thanks!", that confirmation tells the model it did something right.

**OpenClaw-RL captures this signal.** In this notebook, we will build the **rollout collection** system from scratch: the component that transforms your everyday conversations into structured training data.

By the end of this notebook, you will have a working `ConversationRolloutCollector` that:
- Tracks multi-turn conversation sessions
- Classifies each turn as trainable or non-trainable
- Extracts **next-state signals**, where the user's next message becomes the feedback for the model's previous response
- Produces ready-to-train samples as (context, response, feedback) triples

In [None]:
# 🎯 Teaser: Here's what we'll build
# Input: A raw multi-turn conversation
# Output: Structured training samples with natural feedback signals

# conversation = [
#     {"role": "user", "content": "Write me a sorting function in Python"},
#     {"role": "assistant", "content": "Here's a JavaScript sort: ..."},
#     {"role": "user", "content": "No, I said Python not JavaScript"},
#     {"role": "assistant", "content": "Sorry! Here's the Python version: ..."},
#     {"role": "user", "content": "Perfect, thanks!"},
# ]
#
# collector.process_conversation(conversation) →
#   Sample 1: response="JS sort", feedback="No, I said Python" → NEGATIVE
#   Sample 2: response="Python sort", feedback="Perfect, thanks!" → POSITIVE

In [None]:
#@title 🎧 Listen: Intuition
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_03_intuition.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 2. Building Intuition

Let us think about what happens in a typical conversation with an AI assistant.

You ask something. The model responds. Then you react to that response.

That reaction, your next message, is highly informative:

- **"Great, exactly what I needed!"** → The model did well. Positive signal.
- **"No, I meant the other thing."** → The model misunderstood. Negative signal with direction.
- **"Can you also add error handling?"** → The model did okay, but there is room for improvement. Neutral/mildly positive signal.
- **"That code has a bug on line 3."** → Specific negative signal with a precise correction.

This is the **next-state signal** idea. In traditional reinforcement learning, we need a separate reward function to tell the agent how well it did. In OpenClaw-RL, the reward is already there, embedded naturally in the conversation flow.

But not every message in a conversation is useful for training. Some messages are:
- System messages or metadata
- The user asking for clarification (not reacting to a response)
- Tool calls or function results

We need to classify each turn and extract only the useful training pairs.

### 🤔 Think About This

Before we start coding, consider this question:

In a 10-turn conversation, how many training samples do you think we can extract? Is it 10? 5? Just 1?

Think about which turns have a clear (response, feedback) pair and which do not.

In [None]:
#@title 🎧 Listen: Math
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_04_math.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 3. The Mathematics

The rollout collection system does not involve heavy mathematics, but it does have a formal structure.

A **conversation session** is a sequence of turns:

$$\tau = \{(r_1, m_1), (r_2, m_2), \ldots, (r_T, m_T)\}$$

where $r_t \in \{\text{user}, \text{assistant}, \text{system}\}$ is the role and $m_t$ is the message content.

A turn $(r_t, m_t)$ is **trainable** (main-line) if:
1. $r_t = \text{assistant}$ (the model produced a response)
2. There exists a subsequent user turn $(r_{t+k}, m_{t+k})$ with $k \geq 1$, $r_{t+k} = \text{user}$, and $k$ minimal
3. That user turn is a **reaction** to the assistant's response (not a new topic)

The **next-state signal** for a trainable assistant turn at position $t$ is:

$$\text{feedback}_t = m_{t+k}$$

where $t+k$ is the position of the next user message after turn $t$.

Computationally, this means: for every assistant response, we look ahead in the conversation to find the user's next message, and treat that message as natural feedback.

The resulting training sample is a triple:

$$s_t = (\text{context}_t, \text{response}_t, \text{feedback}_t)$$

where $\text{context}_t$ is everything the model saw before generating $\text{response}_t$.
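
The look-ahead rule above can be sketched directly on plain `(role, message)` tuples. This is a minimal illustration, not OpenClaw-RL's implementation: the names `next_user_index` and `extract_triples` are hypothetical, and condition 3 (checking that the user turn is a reaction rather than a new topic) is deliberately left out.

```python
from typing import List, Optional, Tuple

Turn = Tuple[str, str]  # (role, message), i.e. (r_t, m_t)

def next_user_index(session: List[Turn], t: int) -> Optional[int]:
    """Return the smallest index t+k (k >= 1) holding a user turn, or None."""
    for j in range(t + 1, len(session)):
        if session[j][0] == "user":
            return j
    return None

def extract_triples(session: List[Turn]) -> list:
    """Build (context_t, response_t, feedback_t) for each assistant turn
    that has a later user turn. Hypothetical helper for illustration."""
    triples = []
    for t, (role, message) in enumerate(session):
        if role != "assistant":
            continue
        j = next_user_index(session, t)
        if j is None:
            continue  # no next-state signal available for this response
        context = session[:t]      # everything before the response
        feedback = session[j][1]   # feedback_t = m_{t+k}
        triples.append((context, message, feedback))
    return triples

session = [
    ("user", "Write a sort function in Python"),
    ("assistant", "Here is a JavaScript sort"),
    ("user", "No, I said Python"),
    ("assistant", "Here is the Python version"),
]
triples = extract_triples(session)
print(len(triples))  # the last assistant turn has no follow-up, so only one triple
```

The final assistant turn produces no sample because no user message follows it, which is exactly the case the trainability conditions exclude.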

In [None]:
#@title 🎧 Code Walkthrough: Data Structures
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_05_data_structures.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 4. Let's Build It, Component by Component

### 4.1 Setup and Imports

In [None]:
import json
import uuid
import hashlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

print("✅ All imports ready!")

### 4.2 Define the Data Structures

First, let us define the core data types. In OpenClaw-RL, every turn has a **role**, a **classification**, and optional metadata.

In [None]:
class TurnRole(Enum):
    """Who sent this message?"""
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"
    TOOL = "tool"

class TurnClassification(Enum):
    """Is this turn useful for training?"""
    MAIN_LINE = "main_line"     # Trainable: core interaction turn
    SIDE = "side"               # Non-trainable: metadata, system, or off-topic
    UNCLASSIFIED = "unclassified"

@dataclass
class ConversationTurn:
    """A single turn in a conversation."""
    role: TurnRole
    content: str
    turn_index: int
    classification: TurnClassification = TurnClassification.UNCLASSIFIED
    timestamp: Optional[str] = None
    is_tool_call: bool = False

    def __repr__(self):
        cls = self.classification.value[:4].upper()
        return f"Turn {self.turn_index} [{self.role.value:>9}] ({cls}): {self.content[:60]}..."

@dataclass
class TrainingSample:
    """A structured training sample extracted from a conversation."""
    session_id: str
    context: list           # All turns before the response
    response: str           # The assistant's response
    feedback: str           # The user's next message (next-state signal)
    response_turn_index: int
    feedback_turn_index: int

    def __repr__(self):
        return (f"TrainingSample(response='{self.response[:40]}...', "
                f"feedback='{self.feedback[:40]}...')")

print("✅ Data structures defined!")

Let us create a sample conversation to work with:

In [None]:
# A realistic multi-turn conversation
sample_conversation = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write me a function to sort a list in Python."},
    {"role": "assistant", "content": "Here's a sorting function in JavaScript:\n\nfunction sortArray(arr) {\n  return arr.sort((a, b) => a - b);\n}"},
    {"role": "user", "content": "No, I said Python not JavaScript. Please use Python."},
    {"role": "assistant", "content": "Sorry about that! Here's the Python version:\n\ndef sort_list(lst):\n    return sorted(lst)"},
    {"role": "user", "content": "Perfect, thanks! Can you also add a reverse parameter?"},
    {"role": "assistant", "content": "Sure! Here you go:\n\ndef sort_list(lst, reverse=False):\n    return sorted(lst, reverse=reverse)"},
    {"role": "user", "content": "Great, exactly what I needed!"},
]

print(f"Sample conversation has {len(sample_conversation)} turns")
for i, turn in enumerate(sample_conversation):
    content_preview = turn['content'][:60].replace('\n', ' ')
    print(f"  Turn {i}: [{turn['role']:>9}] {content_preview}...")

In [None]:
#@title 🎧 Code Walkthrough: Classifier
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_06_classifier.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


### 4.3 The Turn Classifier

Now let us build the turn classifier. This is the component that decides which turns are **main-line** (trainable) and which are **side** (non-trainable).

The rules are:
1. **System turns** → always SIDE (they are instructions, not interactions)
2. **Tool turns** → always SIDE (tool results are not conversational feedback)
3. **Assistant turns** → MAIN_LINE if followed by a user reaction; SIDE otherwise
4. **User turns** → MAIN_LINE if they follow an assistant turn (they might be feedback) or open the conversation (first turn, or directly after the system prompt); SIDE otherwise, for example after a tool turn

In addition, any user or assistant turn whose content contains a metadata keyword is classified as SIDE.

In [None]:
class TurnClassifier:
    """Classifies conversation turns as main-line (trainable) or side (non-trainable)."""

    # Keywords that indicate a message is metadata, not real feedback
    METADATA_KEYWORDS = [
        "function_call", "tool_result", "system:",
        "[METADATA]", "[INTERNAL]"
    ]

    def classify_turns(self, raw_turns: list) -> list:
        """
        Classify each turn in a conversation.

        Args:
            raw_turns: List of dicts with 'role' and 'content' keys

        Returns:
            List of ConversationTurn objects with classifications set
        """
        turns = []
        for i, raw in enumerate(raw_turns):
            role = TurnRole(raw["role"])
            is_tool = raw.get("is_tool_call", False)
            turn = ConversationTurn(
                role=role,
                content=raw["content"],
                turn_index=i,
                is_tool_call=is_tool,
                timestamp=raw.get("timestamp", datetime.now().isoformat())
            )
            turns.append(turn)

        # Now classify each turn based on context
        for i, turn in enumerate(turns):
            turn.classification = self._classify_single(turn, i, turns)

        return turns

    def _classify_single(self, turn, index, all_turns):
        """Classify a single turn based on its role and context."""
        # Rule 1: System and tool turns are always side
        if turn.role == TurnRole.SYSTEM:
            return TurnClassification.SIDE
        if turn.is_tool_call or turn.role == TurnRole.TOOL:
            return TurnClassification.SIDE

        # Rule 2: Check for metadata content
        if self._is_metadata(turn.content):
            return TurnClassification.SIDE

        # Rule 3: Assistant turns are main-line if followed by a user turn
        if turn.role == TurnRole.ASSISTANT:
            has_user_followup = any(
                t.role == TurnRole.USER
                for t in all_turns[index + 1:]
            )
            return TurnClassification.MAIN_LINE if has_user_followup else TurnClassification.SIDE

        # Rule 4: User turns are main-line if they follow an assistant turn
        if turn.role == TurnRole.USER:
            if index > 0 and all_turns[index - 1].role == TurnRole.ASSISTANT:
                return TurnClassification.MAIN_LINE
            # First user message (the initial prompt) is also main-line
            if index == 0 or (index > 0 and all_turns[index - 1].role == TurnRole.SYSTEM):
                return TurnClassification.MAIN_LINE
            return TurnClassification.SIDE

        return TurnClassification.SIDE

    def _is_metadata(self, content: str) -> bool:
        """Check if content looks like metadata rather than real conversation."""
        content_lower = content.lower()
        return any(kw.lower() in content_lower for kw in self.METADATA_KEYWORDS)

# Test the classifier
classifier = TurnClassifier()
classified_turns = classifier.classify_turns(sample_conversation)

print("Classified turns:")
for turn in classified_turns:
    emoji = "🟢" if turn.classification == TurnClassification.MAIN_LINE else "⚪"
    print(f"  {emoji} {turn}")

In [None]:
#@title 🎧 Code Walkthrough: Viz Flow
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_07_viz_flow.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


### 📊 Visualization: Conversation Flow with Classifications

Let us visualize which turns are trainable and which are not.

In [None]:
def visualize_conversation_flow(turns):
    """Visualize a conversation with turn classifications."""
    fig, ax = plt.subplots(figsize=(14, len(turns) * 0.8 + 1))

    colors = {
        TurnClassification.MAIN_LINE: '#2ecc71',  # Green
        TurnClassification.SIDE: '#95a5a6',         # Gray
    }
    role_x = {
        TurnRole.SYSTEM: 0.1,
        TurnRole.USER: 0.3,
        TurnRole.ASSISTANT: 0.7,
    }

    for i, turn in enumerate(turns):
        y = len(turns) - i - 1
        x = role_x.get(turn.role, 0.5)
        color = colors.get(turn.classification, '#bdc3c7')

        # Draw the turn box
        content_short = turn.content[:50].replace('\n', ' ')
        bbox = dict(boxstyle="round,pad=0.3", facecolor=color, alpha=0.7)
        ax.text(x, y, f"[{turn.role.value}] {content_short}...",
                fontsize=9, ha='center', va='center', bbox=bbox,
                fontfamily='monospace')

        # Draw arrow from previous turn
        if i > 0:
            prev_y = len(turns) - (i - 1) - 1
            prev_x = role_x.get(turns[i-1].role, 0.5)
            ax.annotate('', xy=(x, y + 0.3), xytext=(prev_x, prev_y - 0.3),
                       arrowprops=dict(arrowstyle='->', color='#34495e', lw=1.5))

    # Legend
    main_patch = mpatches.Patch(color='#2ecc71', alpha=0.7, label='Main-line (trainable)')
    side_patch = mpatches.Patch(color='#95a5a6', alpha=0.7, label='Side (non-trainable)')
    ax.legend(handles=[main_patch, side_patch], loc='upper right', fontsize=10)

    ax.set_xlim(-0.1, 1.1)
    ax.set_ylim(-0.8, len(turns) + 0.2)
    ax.axis('off')
    ax.set_title('Conversation Flow with Turn Classifications', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

visualize_conversation_flow(classified_turns)

In [None]:
#@title 🎧 Code Walkthrough: Extractor
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_08_extractor.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


### 4.4 The Next-State Signal Extractor

Now for the key insight of OpenClaw-RL. For every assistant response, we extract the **next-state signal**: the user's next message, which serves as natural feedback.

In [None]:
class NextStateExtractor:
    """Extracts next-state signals from classified conversation turns."""

    def extract_pairs(self, turns: list) -> list:
        """
        For each main-line assistant turn, find the next user turn
        and create a (response, feedback) pair.

        Args:
            turns: List of classified ConversationTurn objects

        Returns:
            List of (assistant_turn, user_feedback_turn) pairs
        """
        pairs = []

        for i, turn in enumerate(turns):
            # Only process main-line assistant turns
            if turn.role != TurnRole.ASSISTANT:
                continue
            if turn.classification != TurnClassification.MAIN_LINE:
                continue

            # Look ahead to find the next user turn
            feedback_turn = self._find_next_user_turn(turns, i)

            if feedback_turn is not None:
                pairs.append((turn, feedback_turn))

        return pairs

    def _find_next_user_turn(self, turns, start_index):
        """Find the next user turn after position start_index."""
        for j in range(start_index + 1, len(turns)):
            if turns[j].role == TurnRole.USER:
                return turns[j]
        return None

# Extract next-state pairs
extractor = NextStateExtractor()
pairs = extractor.extract_pairs(classified_turns)

print(f"Extracted {len(pairs)} (response, feedback) pairs:\n")
for i, (response_turn, feedback_turn) in enumerate(pairs):
    print(f"Pair {i + 1}:")
    print(f"  üì§ Assistant: {response_turn.content[:80]}...")
    print(f"  üì• Feedback:  {feedback_turn.content[:80]}...")
    print()

### 🤔 Think About This

Look at the pairs we extracted:
- Pair 1: The assistant wrote JavaScript, and the user corrected it → **negative signal**
- Pair 2: The assistant fixed it to Python, and the user said "Perfect" → **positive signal**
- Pair 3: The assistant added the reverse parameter, and the user confirmed → **positive signal**

Notice how we got 3 training samples from a single 8-turn conversation, all without any manual labeling!
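
To sanity-check that count, here is a rough sketch that works on role sequences alone. It assumes no turn is flagged as a tool call or metadata, and the helper name `count_samples` is hypothetical; it simply counts assistant turns that have at least one later user turn, which is the condition the classifier and extractor above rely on.

```python
def count_samples(roles):
    """Count assistant turns followed (anywhere later) by a user turn.

    Illustrative helper: ignores metadata and tool-call checks.
    """
    count = 0
    for i, role in enumerate(roles):
        if role == "assistant" and "user" in roles[i + 1:]:
            count += 1
    return count

# Role sequence of the 8-turn sample conversation used above
sample_roles = ["system", "user", "assistant", "user",
                "assistant", "user", "assistant", "user"]

# A strictly alternating 10-turn conversation that starts with the user
alternating_10 = ["user", "assistant"] * 5

print(count_samples(sample_roles))    # 3
print(count_samples(alternating_10))  # 4
```

Under these assumptions the alternating 10-turn conversation yields 4 samples rather than 5, because its final assistant response never receives a user reaction.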

In [None]:
#@title 🎧 Before You Start: Todo Context
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_09_todo_context.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 5. 🔧 Your Turn

### TODO 1: Build the Context Assembler

The training sample needs the full **context**: everything the model saw before generating its response. This includes all previous turns in the conversation.

Implement the `build_context` function:

In [None]:
def build_context(turns: list, response_turn_index: int) -> list:
    """
    Build the context (conversation history) that the model saw
    before generating the response at response_turn_index.

    Args:
        turns: All conversation turns
        response_turn_index: Index of the assistant's response turn

    Returns:
        List of dicts with 'role' and 'content' for all turns
        BEFORE the response (not including the response itself)

    Example:
        If turns are [system, user, assistant, user, assistant]
        and response_turn_index = 4, the context should include
        turns at indices [0, 1, 2, 3]
    """
    # ============ TODO ============
    # Collect all turns before response_turn_index
    # Return them as a list of {"role": ..., "content": ...} dicts
    # ==============================

    context = ???  # YOUR CODE HERE

    return context

# ✅ Verification
test_context = build_context(classified_turns, 2)  # Context for first assistant response
assert len(test_context) == 2, f"❌ Expected 2 turns in context, got {len(test_context)}"
assert test_context[0]["role"] == "system", f"❌ First turn should be system, got {test_context[0]['role']}"
assert test_context[1]["role"] == "user", f"❌ Second turn should be user, got {test_context[1]['role']}"
print("✅ Context assembler works correctly!")

In [None]:
#@title 🎧 Before You Start: Todo Collector
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_10_todo_collector.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


### TODO 2: Implement the Complete Rollout Collector

Now combine everything into a single `ConversationRolloutCollector` class:

In [None]:
class ConversationRolloutCollector:
    """
    The complete rollout collection pipeline.
    Takes raw conversations and produces structured training samples.
    """

    def __init__(self):
        self.classifier = TurnClassifier()
        self.extractor = NextStateExtractor()
        self.sessions = {}  # session_id -> list of samples

    def process_conversation(self, raw_turns: list, session_id: str = None) -> list:
        """
        Process a raw conversation into training samples.

        Args:
            raw_turns: List of dicts with 'role' and 'content'
            session_id: Optional session identifier

        Returns:
            List of TrainingSample objects

        Steps:
            1. Generate session_id if not provided
            2. Classify all turns
            3. Extract (response, feedback) pairs
            4. Build context for each pair
            5. Create TrainingSample objects
        """
        if session_id is None:
            session_id = str(uuid.uuid4())[:8]

        # ============ TODO ============
        # Step 1: Classify turns using self.classifier
        # Step 2: Extract pairs using self.extractor
        # Step 3: For each pair, build context and create TrainingSample
        # Step 4: Store in self.sessions and return
        # ==============================

        samples = ???  # YOUR CODE HERE

        return samples

    def get_all_samples(self) -> list:
        """Return all training samples across all sessions."""
        all_samples = []
        for samples in self.sessions.values():
            all_samples.extend(samples)
        return all_samples

    def get_stats(self) -> dict:
        """Return statistics about collected rollouts."""
        all_samples = self.get_all_samples()
        return {
            "total_sessions": len(self.sessions),
            "total_samples": len(all_samples),
            "avg_samples_per_session": len(all_samples) / max(len(self.sessions), 1),
        }

# ✅ Verification
collector = ConversationRolloutCollector()
samples = collector.process_conversation(sample_conversation, session_id="test_001")

assert len(samples) == 3, f"❌ Expected 3 training samples, got {len(samples)}"
assert samples[0].feedback == "No, I said Python not JavaScript. Please use Python.", \
    f"❌ First feedback should be the correction, got: {samples[0].feedback[:50]}"
assert samples[1].feedback == "Perfect, thanks! Can you also add a reverse parameter?", \
    "❌ Second feedback should be positive"
print(f"✅ Rollout collector works! Extracted {len(samples)} samples from 1 conversation.")

In [None]:
#@title 🎧 Code Walkthrough: Putting Together
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_11_putting_together.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 6. Putting It All Together

Let us now test our collector on multiple realistic conversations:

In [None]:
# Multiple conversation sessions
conversations = [
    {
        "session_id": "session_001",
        "turns": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What's the capital of France?"},
            {"role": "assistant", "content": "The capital of France is Paris."},
            {"role": "user", "content": "Correct! And what about Germany?"},
            {"role": "assistant", "content": "The capital of Germany is Berlin."},
            {"role": "user", "content": "Right, thanks!"},
        ]
    },
    {
        "session_id": "session_002",
        "turns": [
            {"role": "user", "content": "Explain recursion in simple terms."},
            {"role": "assistant", "content": "Recursion is when a function calls itself. It's like looking into two mirrors facing each other: the reflection contains itself infinitely."},
            {"role": "user", "content": "That's a great analogy! Can you show me a code example?"},
            {"role": "assistant", "content": "Sure! Here's a recursive factorial function:\n\ndef factorial(n):\n    if n <= 1:\n        return 1\n    return n * factorial(n - 1)"},
            {"role": "user", "content": "Perfect, very clear."},
        ]
    },
    {
        "session_id": "session_003",
        "turns": [
            {"role": "user", "content": "Help me write a REST API in Flask."},
            {"role": "assistant", "content": "Here's how to create a REST API using Django:\n\nfrom django.http import JsonResponse..."},
            {"role": "user", "content": "I asked for Flask, not Django. Please use Flask."},
            {"role": "assistant", "content": "My apologies! Here's the Flask version:\n\nfrom flask import Flask, jsonify\n\napp = Flask(__name__)\n\n@app.route('/api/items')\ndef get_items():\n    return jsonify({'items': []})"},
            {"role": "user", "content": "Now that's what I wanted. Add a POST endpoint too."},
            {"role": "assistant", "content": "Here's the updated version with both GET and POST:\n\n@app.route('/api/items', methods=['GET', 'POST'])\ndef items():\n    if request.method == 'POST':\n        data = request.get_json()\n        return jsonify(data), 201\n    return jsonify({'items': []})"},
            {"role": "user", "content": "Excellent work!"},
        ]
    },
]

# Process all conversations
collector = ConversationRolloutCollector()
all_samples = []

for conv in conversations:
    samples = collector.process_conversation(conv["turns"], conv["session_id"])
    all_samples.extend(samples)
    print(f"Session {conv['session_id']}: {len(samples)} training samples")

stats = collector.get_stats()
print(f"\n📊 Overall Statistics:")
print(f"  Total sessions: {stats['total_sessions']}")
print(f"  Total samples: {stats['total_samples']}")
print(f"  Avg samples per session: {stats['avg_samples_per_session']:.1f}")

In [None]:
#@title 🎧 Code Walkthrough: Analysis
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_12_analysis.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 7. Training Data Analysis

Let us analyze the quality of our extracted training data:

In [None]:
import re

def analyze_feedback_signals(samples):
    """Analyze the distribution of feedback signals in our training data."""

    # Simple heuristic-based sentiment classification on whole words
    positive_keywords = {"perfect", "great", "correct", "thanks", "excellent",
                         "good", "right", "exactly", "wonderful", "clear"}
    negative_keywords = {"no", "wrong", "not", "incorrect", "don't",
                         "didn't", "mistake", "error", "instead"}

    sentiments = []
    for sample in samples:
        # Tokenize into words so that "no" does not match inside "now" or "know"
        words = set(re.findall(r"[a-z']+", sample.feedback.lower()))
        pos_score = len(words & positive_keywords)
        neg_score = len(words & negative_keywords)

        if pos_score > neg_score:
            sentiments.append("positive")
        elif neg_score > pos_score:
            sentiments.append("negative")
        else:
            sentiments.append("neutral")

    return sentiments

sentiments = analyze_feedback_signals(all_samples)

# Visualize the distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Sentiment distribution
colors_map = {"positive": "#2ecc71", "negative": "#e74c3c", "neutral": "#f39c12"}
sentiment_counts = {s: sentiments.count(s) for s in ["positive", "negative", "neutral"]}
bars = axes[0].bar(sentiment_counts.keys(), sentiment_counts.values(),
                   color=[colors_map[s] for s in sentiment_counts.keys()],
                   edgecolor='white', linewidth=2)
axes[0].set_title("Feedback Signal Distribution", fontsize=13, fontweight='bold')
axes[0].set_ylabel("Count")
for bar, count in zip(bars, sentiment_counts.values()):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                str(count), ha='center', fontsize=12, fontweight='bold')

# Right: Samples per session
session_counts = [len(s) for s in collector.sessions.values()]
session_labels = list(collector.sessions.keys())
axes[1].barh(session_labels, session_counts, color='#3498db', edgecolor='white', linewidth=2)
axes[1].set_title("Training Samples per Session", fontsize=13, fontweight='bold')
axes[1].set_xlabel("Number of Samples")

plt.tight_layout()
plt.show()

print(f"\n📊 Feedback Signal Analysis:")
for sentiment, count in sentiment_counts.items():
    pct = 100 * count / len(sentiments)
    print(f"  {sentiment.capitalize():>8}: {count} ({pct:.0f}%)")

### 📊 Visualization: The Next-State Signal Flow

In [None]:
def visualize_next_state_flow(samples, max_samples=5):
    """Visualize how next-state signals connect responses to feedback."""
    fig, ax = plt.subplots(figsize=(14, max(len(samples[:max_samples]) * 2, 4)))

    for i, sample in enumerate(samples[:max_samples]):
        y = len(samples[:max_samples]) - i - 1

        # Draw response box
        resp_text = sample.response[:45].replace('\n', ' ')
        ax.text(0.15, y * 2, f"🤖 {resp_text}...", fontsize=9,
                ha='left', va='center',
                bbox=dict(boxstyle="round", facecolor='#3498db', alpha=0.3))

        # Draw feedback box
        fb_text = sample.feedback[:45].replace('\n', ' ')
        sentiment = analyze_feedback_signals([sample])[0]
        fb_color = {"positive": "#2ecc71", "negative": "#e74c3c", "neutral": "#f39c12"}[sentiment]
        ax.text(0.15, y * 2 - 0.7, f"👤 {fb_text}...", fontsize=9,
                ha='left', va='center',
                bbox=dict(boxstyle="round", facecolor=fb_color, alpha=0.3))

        # Draw arrow
        ax.annotate('next-state\nsignal',
                    xy=(0.13, y * 2 - 0.5), xytext=(0.13, y * 2 - 0.2),
                    arrowprops=dict(arrowstyle='->', color='#e74c3c', lw=2),
                    fontsize=7, ha='center', color='#e74c3c')

    ax.set_xlim(0, 1)
    ax.set_ylim(-1.5, len(samples[:max_samples]) * 2)
    ax.axis('off')
    ax.set_title('Next-State Signal Flow: Response → Feedback', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

visualize_next_state_flow(all_samples)

In [None]:
#@title 🎧 Code Walkthrough: JSONL
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_13_jsonl.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 8. JSONL Logging: Just Like OpenClaw-RL

OpenClaw-RL logs all rollouts to JSONL files for analysis and debugging. Let us implement this:

In [None]:
import json
from datetime import datetime

def samples_to_jsonl(samples, filepath="rollout_log.jsonl"):
    """Export training samples to JSONL format, matching OpenClaw-RL's logging."""
    records = []
    for sample in samples:
        record = {
            "session_id": sample.session_id,
            "response_turn": sample.response_turn_index,
            "feedback_turn": sample.feedback_turn_index,
            "context": sample.context,
            "response": sample.response,
            "feedback": sample.feedback,
            # Export time, not the time the turn was originally produced
            "timestamp": datetime.now().isoformat(),
        }
        records.append(record)

    # Write one JSON object per line
    with open(filepath, 'w', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record) + '\n')

    print(f"📝 Logged {len(records)} training samples to {filepath}")
    return records

# Export our samples
records = samples_to_jsonl(all_samples)

# Show one record
print("\nSample JSONL record:")
print(json.dumps(records[0], indent=2)[:500])
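
A log is only useful if it can be read back. The sketch below is a standalone round-trip check: it writes two hypothetical records to a temporary file and reloads them with a small `load_jsonl` helper. The helper name, the demo records, and the shape of the `context` field are assumptions made for illustration; only the key names are taken from `samples_to_jsonl` above.

```python
import json
import os
import tempfile

def load_jsonl(filepath):
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(filepath, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a corrupt log is easy to locate
                raise ValueError(f"{filepath}:{line_number}: invalid JSON") from exc
    return records

# Hypothetical records that reuse the key names from samples_to_jsonl
demo_records = [
    {"session_id": "demo_0", "response_turn": 1, "feedback_turn": 2,
     "context": [{"role": "user", "content": "What is 2+2?"}],
     "response": "4.", "feedback": "Correct!"},
    {"session_id": "demo_1", "response_turn": 1, "feedback_turn": 2,
     "context": [{"role": "user", "content": "Hi there!"}],
     "response": "Hello! How can I help?", "feedback": "Just testing."},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_rollout_log.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        for record in demo_records:
            f.write(json.dumps(record) + "\n")
    loaded = load_jsonl(demo_path)

round_trip_ok = loaded == demo_records
print(f"Loaded {len(loaded)} records; round trip OK: {round_trip_ok}")
```

The same helper should work on `rollout_log.jsonl`, provided every field in the real samples is JSON-serializable.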

In [None]:
#@title 🎧 Code Walkthrough: Final Output
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_14_final_output.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 9. 🎯 Final Output: The At-Least-One Guarantee

OpenClaw-RL guarantees that **every session contributes at least one training sample**. Let us verify this property and show our final system in action:

In [None]:
# Test the at-least-one guarantee
def test_at_least_one_guarantee():
    """Verify that every processed session produces at least one sample."""
    test_collector = ConversationRolloutCollector()

    # Even a minimal conversation should produce a sample
    minimal_conversations = [
        [
            {"role": "user", "content": "Hi there!"},
            {"role": "assistant", "content": "Hello! How can I help?"},
            {"role": "user", "content": "Just testing."},
        ],
        [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "4."},
            {"role": "user", "content": "Correct!"},
        ],
    ]

    all_pass = True
    for i, conv in enumerate(minimal_conversations):
        samples = test_collector.process_conversation(conv, f"minimal_{i}")
        passed = len(samples) >= 1
        emoji = "✅" if passed else "❌"
        print(f"  {emoji} Conversation {i}: {len(samples)} sample(s)")
        all_pass = all_pass and passed

    return all_pass

print("🔍 Testing at-least-one guarantee:")
guarantee_holds = test_at_least_one_guarantee()

if guarantee_holds:
    print("\n✅ At-least-one guarantee holds for all test cases!")
else:
    print("\n❌ Guarantee violated: some sessions produced no samples.")
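
Both test conversations end on a user turn, so each has an obvious response-feedback pair. The guarantee is more interesting for sessions that lack one. The standalone sketch below uses a hypothetical `extract_pairs` helper, not the notebook's `ConversationRolloutCollector`, and assumes a pair is simply an assistant turn directly followed by a user turn. It only shows which inputs yield zero pairs under that naive rule; how OpenClaw-RL handles such sessions is not something this sketch claims to reproduce.

```python
def extract_pairs(conversation):
    """Return (response, feedback) pairs: an assistant turn directly followed by a user turn."""
    pairs = []
    for current, following in zip(conversation, conversation[1:]):
        if current["role"] == "assistant" and following["role"] == "user":
            pairs.append((current["content"], following["content"]))
    return pairs

# Hypothetical edge cases; only "with_feedback" has a user turn after the response
edge_cases = {
    "ends_on_assistant": [
        {"role": "user", "content": "Hi there!"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    "user_only": [
        {"role": "user", "content": "Hello?"},
    ],
    "empty": [],
    "with_feedback": [
        {"role": "user", "content": "What is 2+2?"},
        {"role": "assistant", "content": "4."},
        {"role": "user", "content": "Correct!"},
    ],
}

pair_counts = {name: len(extract_pairs(conv)) for name, conv in edge_cases.items()}
for name, count in pair_counts.items():
    print(f"{name}: {count} pair(s)")
```

Feeding the same edge cases to `process_conversation` would show whether the collector's guarantee covers sessions with no follow-up turn.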

In [None]:
# 🎯 Final summary visualization
fig, ax = plt.subplots(figsize=(12, 3))

# Pipeline flow
steps = [
    ("Raw\nConversation", "#ecf0f1"),
    ("Turn\nClassification", "#3498db"),
    ("Next-State\nExtraction", "#2ecc71"),
    ("Training\nSamples", "#e74c3c"),
    ("JSONL\nLog", "#9b59b6"),
]

for i, (label, color) in enumerate(steps):
    x = i * 2.2
    rect = mpatches.FancyBboxPatch((x, 0), 1.8, 1.2,
                                     boxstyle="round,pad=0.1",
                                     facecolor=color, alpha=0.7)
    ax.add_patch(rect)
    ax.text(x + 0.9, 0.6, label, ha='center', va='center',
            fontsize=10, fontweight='bold', color='white' if color != '#ecf0f1' else 'black')

    if i < len(steps) - 1:
        ax.annotate('', xy=((i+1)*2.2, 0.6), xytext=(x + 1.8, 0.6),
                    arrowprops=dict(arrowstyle='->', lw=2, color='#34495e'))

ax.set_xlim(-0.3, len(steps) * 2.2)
ax.set_ylim(-0.3, 1.7)
ax.axis('off')
ax.set_title('OpenClaw-RL Rollout Collection Pipeline: Built from Scratch!',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

stats = collector.get_stats()
print(f"\n🎉 Congratulations! You've built the rollout collection pipeline from scratch!")
print(f"   Processed {stats['total_sessions']} sessions → {stats['total_samples']} training samples")
print(f"   Average: {stats['avg_samples_per_session']:.1f} samples per session")
print(f"   Every conversation you have with your AI now generates training data! 🚀")

In [None]:
#@title 🎧 Wrap-Up: Closing
from IPython.display import Audio, display
import os as _os
_f = "/content/narration/01_15_closing.mp3"
if _os.path.exists(_f):
    display(Audio(_f))
else:
    print("Run the first cell to download narration audio.")


## 10. Reflection and Next Steps

### 🤔 Reflection Questions
1. What happens if a user asks a completely new question (changes topic) instead of providing feedback on the assistant's response? Should that be treated as a next-state signal?
2. In OpenClaw-RL, the rollout collector runs **asynchronously**: it processes conversations while the model continues serving. What challenges does this introduce?
3. We used simple keyword matching for sentiment classification. In the full system, a PRM (Process Reward Model) handles this. What advantages would a learned model have over keywords?
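
As a starting point for question 3, the sketch below probes a keyword classifier with inputs that contain negation and sarcasm. The word lists and the `keyword_sentiment` function are hypothetical stand-ins, not the notebook's `analyze_feedback_signals`, so the exact failures may differ from what the notebook's classifier produces.

```python
# Hypothetical keyword lists, chosen only for illustration
POSITIVE_WORDS = {"thanks", "great", "correct", "perfect", "helpful"}
NEGATIVE_WORDS = {"wrong", "incorrect", "bad", "useless"}

def keyword_sentiment(text):
    """Classify text by keyword lookup; negative keywords take priority."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    if words & NEGATIVE_WORDS:
        return "negative"
    if words & POSITIVE_WORDS:
        return "positive"
    return "neutral"

probes = [
    "Thanks, that is correct!",     # plain praise
    "That is not helpful at all.",  # negation flips the meaning
    "Oh great, it crashed again.",  # sarcasm
]

predictions = [keyword_sentiment(text) for text in probes]
for text, label in zip(probes, predictions):
    print(f"{label:>8}: {text}")
```

All three probes come back as `positive`, although a human reader would mark the last two as complaints.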

### 🏆 Optional Challenges
1. **Handle tool calls**: Extend the turn classifier to handle conversations that include tool calls (e.g., the assistant calling a calculator or web search).
2. **Sliding window context**: For very long conversations, implement a sliding window that keeps only the last N turns as context.
3. **Session splitting**: Implement logic to split a single long conversation into multiple sessions when the user changes topic.