feat: add message trace support for LLM generation #272
Conversation
Add support for capturing full conversation traces during LLM generation, enabling debugging and fine-tuning dataset creation.

Changes:
- Add `with_trace` field to `LLMTextColumnConfig` for per-column trace control
- Add `debug_override_save_all_column_traces` to `RunConfig` for global trace enablement
- Introduce `ChatMessage` dataclass for structured message representation
- Update `ModelFacade.generate()` to return the full message trace
- Rename trace column postfix from `__reasoning_trace` to `__trace`
- Add comprehensive traces documentation

Traces capture system/user/assistant messages in order, giving visibility into the full generation conversation, including correction retries.
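For orientation, a minimal sketch of how a column might opt into traces. Only `with_trace`, `debug_override_save_all_column_traces`, and the `__trace` column postfix come from this PR; the import path and the other constructor arguments are illustrative assumptions:

```python
# Hypothetical import path; field names other than with_trace and
# debug_override_save_all_column_traces are assumptions for illustration.
from data_designer.config import LLMTextColumnConfig, RunConfig

# Per-column opt-in: the output dataset gains a "summary__trace" column
# holding the ordered system/user/assistant messages for each row.
column = LLMTextColumnConfig(
    name="summary",
    prompt="Summarize: {{ document }}",
    with_trace=True,
)

# Global debug override: save traces for every LLM column,
# regardless of each column's with_trace setting.
run = RunConfig(debug_override_save_all_column_traces=True)
```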
Greptile Overview

Greptile Summary

This PR adds comprehensive message trace support for LLM generation, enabling users to capture the full conversation history (system/user/assistant messages with reasoning content) during generation. This is valuable for debugging, prompt iteration, and understanding model behavior.

Key Changes
Critical Issue Found

Logic bug reported in the `curr_num_correction_steps` handling in `facade.py` (see the review comment and rebuttal below).

Additional Notes
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
participant User
participant LLMGenerator as LLM Column Generator
participant Facade as ModelFacade
participant LiteLLM as LiteLLM Router
participant Dataset
User->>LLMGenerator: generate(data)
LLMGenerator->>LLMGenerator: render prompt template
LLMGenerator->>Facade: generate(prompt, parser, system_prompt)
Note over Facade: Build initial messages
Facade->>Facade: prompt_to_messages()
Facade->>Facade: Create ChatMessage objects
loop Until valid or max retries
Facade->>Facade: completion(messages)
Facade->>LiteLLM: completion(model, message_dicts)
LiteLLM-->>Facade: ModelResponse
Facade->>Facade: Extract content + reasoning_content
Facade->>Facade: Append ChatMessage.as_assistant()
alt Parser succeeds
Facade->>Facade: parser(response)
Facade-->>LLMGenerator: (parsed_obj, trace)
else Parser fails
Facade->>Facade: Check retry limits
alt Can retry
Facade->>Facade: Append ChatMessage.as_user(error)
Note over Facade: Continue loop for correction
else Max retries reached
Facade-->>LLMGenerator: Raise GenerationValidationFailureError
end
end
end
alt with_trace enabled or debug override
LLMGenerator->>LLMGenerator: Serialize trace to dicts
LLMGenerator->>Dataset: Store {name}__trace column
end
LLMGenerator->>Dataset: Store {name} column
LLMGenerator-->>User: data with generated columns
```
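The "Serialize trace to dicts" step near the bottom of the diagram could look roughly like this sketch. Using `dataclasses.asdict` as the serializer is an assumption; the `{name}__trace` column name comes from the PR:

```python
from dataclasses import asdict

def serialize_trace(trace: list["ChatMessage"]) -> list[dict]:
    """Convert a list[ChatMessage] trace into plain dicts so it can be
    stored alongside the generated value in the "{name}__trace" column."""
    return [asdict(message) for message in trace]
```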
Claude is happy with the extraction @nabinchha 👇

Review: feat/traces Branch vs feat/mcp

Summary of Extraction

The `feat/traces` branch extracts the message-trace functionality from `feat/mcp` into a standalone PR, leaving the MCP/tool-calling work behind.
What feat/traces Includes

1. ChatMessage Dataclass (utils.py)

New `@dataclass` (imports shown here for completeness):

```python
from dataclasses import dataclass, field
from typing import Any, Literal


@dataclass
class ChatMessage:
    role: Literal["user", "assistant", "system", "tool"]
    content: str | list[dict[str, Any]] = ""
    reasoning_content: str | None = None
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_call_id: str | None = None
```

Includes factory methods: `as_user`, `as_assistant`, `as_system`, `as_tool`.
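As a quick usage sketch: the `as_assistant` keyword arguments appear in the diff below, but the `as_system`/`as_user` signatures (content as the first positional argument) are assumptions:

```python
messages = [
    ChatMessage.as_system("You are a helpful assistant."),
    ChatMessage.as_user("Summarize this document in one paragraph."),
]

# Assistant turns can carry optional reasoning alongside the visible content.
messages.append(
    ChatMessage.as_assistant(content="Here is the summary...", reasoning_content="The document covers...")
)
```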
2. Updated ModelFacade.generate() (facade.py)

3. Column Config Changes (column_configs.py)
4. RunConfig Addition (run_config.py)
5. Generator Updates (llm_completion.py)
6. Documentation (traces.md)

New comprehensive trace documentation.
What Remains in feat/mcp (Not in feat/traces)

- MCP Infrastructure
- CLI Extensions
- ModelFacade Tool Calling
- Column Config Extensions
- Extended Documentation
Compatibility Analysis

feat/traces is Forward-Compatible

The trace-related changes in `feat/traces` are a subset of what `feat/mcp` builds on. This means `feat/mcp` can be rebased onto `feat/traces` after merge, with only the overlapping files needing reconciliation.

Differences to Reconcile After Merge
Review Findings

Correctness
Potential Issues to Verify
Documentation Quality
@nabinchha – all tutorial notebooks and example recipes ran successfully
```python
reasoning_trace = reasoning_trace.strip()

messages.append(ChatMessage.as_assistant(content=response, reasoning_content=reasoning_trace or None))
curr_num_correction_steps += 1
```
`curr_num_correction_steps` is incremented unconditionally before attempting parsing, causing an off-by-one error. On the first attempt, this becomes 1 immediately, so when `max_correction_steps=0`, the condition on line 222 (`curr_num_correction_steps <= max_correction_steps`) evaluates to `1 <= 0` = False, incorrectly skipping correction.
Move the increment after the try/except block, only incrementing when retrying:
```diff
- curr_num_correction_steps += 1
  messages.append(ChatMessage.as_assistant(content=response, reasoning_content=reasoning_trace or None))
```
Your boss disagrees:
I'll search for the code mentioned in this feedback to evaluate its accuracy.
[2 tools called]
Let me also look for any tests related to this function to understand the intended behavior.
[2 tools called]
Based on my analysis, the feedback is incorrect. The current implementation is correct and the tests confirm this.
Analysis
Let me trace through the current code behavior:
Current code logic (lines 198-235 in facade.py):
```python
curr_num_correction_steps = 0
while True:
    completion_response = self.completion(...)  # Make LLM call
    curr_num_correction_steps += 1              # Incremented BEFORE parsing
    try:
        output_obj = parser(response)
        break
    except ParserException as exc:
        if curr_num_correction_steps <= max_correction_steps:
            # Attempt correction
            ...
```

Test expectations from test_facade.py (lines 66-75):
```python
@pytest.mark.parametrize(
    "max_correction_steps,max_conversation_restarts,total_calls",
    [
        (0, 0, 1),  # 0 corrections, 0 restarts → 1 call
        (5, 0, 6),  # 5 corrections, 0 restarts → 6 calls (initial + 5)
        (0, 5, 6),  # 0 corrections, 5 restarts → 6 calls (1 per restart)
    ],
)
```

Tracing `max_correction_steps=5`:
- Call 1: `curr=1`, fails, `1 <= 5` → correct
- Call 2: `curr=2`, fails, `2 <= 5` → correct
- ...
- Call 5: `curr=5`, fails, `5 <= 5` → correct
- Call 6: `curr=6`, fails, `6 <= 5` → stop (raises error)
- Total: 6 calls (matches test ✓)
Tracing `max_correction_steps=0`:

- Call 1: `curr=1`, fails, `1 <= 0` → skip correction (go to restart or fail)
- Total: 1 call per restart window (matches test ✓)
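To make the counting concrete, here is a standalone simulation of the quoted loop with a parser that always fails. The restart handling is an assumption based on the test semantics, not the exact `facade.py` code:

```python
def total_calls(max_correction_steps: int, max_conversation_restarts: int) -> int:
    """Count LLM calls made with an always-failing parser under the
    increment-before-check logic quoted above."""
    calls = 0
    for _restart in range(max_conversation_restarts + 1):  # initial window + restarts
        curr_num_correction_steps = 0
        while True:
            calls += 1                       # one completion call per iteration
            curr_num_correction_steps += 1   # incremented BEFORE parsing
            # parsing always fails in this worst case
            if curr_num_correction_steps <= max_correction_steps:
                continue                     # correction retry within this window
            break                            # budget exhausted: restart or raise
    return calls

assert total_calls(0, 0) == 1  # matches (0, 0, 1)
assert total_calls(5, 0) == 6  # matches (5, 0, 6)
assert total_calls(0, 5) == 6  # matches (0, 5, 6)
```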
Why the feedback is wrong
The feedback claims "incorrectly skipping correction" when `max_correction_steps=0`. However:

- `max_correction_steps=0` semantically means "no corrections allowed"; skipping is the CORRECT behavior
- The tests explicitly verify this: `(0, 5, 6)` expects 6 total calls with 0 corrections and 5 restarts (1 call per window, no corrections)
- `max_correction_steps=N` means N corrections after the initial attempt; with N=5, we get 6 total calls (initial + 5 corrections)
The variable `curr_num_correction_steps` tracks the number of attempts in the current conversation window, and the check `curr_num_correction_steps <= max_correction_steps` correctly allows exactly `max_correction_steps` additional attempts after the initial one.
Conclusion
The current implementation is correct - there is no off-by-one error. The feedback author appears to have missed that `max_correction_steps=0` should disable corrections entirely, which is exactly what the current code does.
Resolve conflicts between feat/traces PR (#272) and feat/mcp:

- column_configs.py: Keep both `with_trace` and `tool_alias` fields
- run_config.py: Include tool messages in trace description
- facade.py: Keep MCP tool calling loop with trace support
- traces.md: Include tool use examples alongside basic traces
- columns.md: Document both trace and `tool_alias` features
📋 Summary
Add message trace support for LLM generation, allowing users to capture the full conversation history (system/user/assistant messages) during generation. This is essential for debugging, understanding model behavior, and iterating on prompts. Changes pulled from feat/mcp.
🔄 Changes
✨ Added
- `docs/concepts/traces.md` documentation explaining trace functionality
- `ChatMessage` dataclass in `engine/models/utils.py` for representing conversation messages, with factory methods (`as_user`, `as_assistant`, `as_system`, `as_tool`)
- `with_trace` field on `LLMTextColumnConfig` to enable per-column traces
- `debug_override_save_all_column_traces` field on `RunConfig` for global trace enablement
🔧 Changed

- `REASONING_TRACE_COLUMN_POSTFIX` → `TRACE_COLUMN_POSTFIX` (`__reasoning_trace` → `__trace`)
- `ModelFacade.generate()` now returns `tuple[Any, list[ChatMessage]]` instead of `tuple[Any, str | None]` to provide full conversation history
- `ModelFacade.completion()` now accepts `list[ChatMessage]` instead of `list[dict]`
- `llm_completion.py` generator updated to save traces when enabled via column config or run config
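A consumer-side sketch of the new `generate()` return shape, as described below. The argument names follow the sequence diagram above; everything else is illustrative:

```python
# Before this PR: (parsed_obj, reasoning_trace: str | None)
# After this PR:  (parsed_obj, trace: list[ChatMessage])
parsed_obj, trace = facade.generate(prompt, parser=parser, system_prompt=system_prompt)

# The trace is the full ordered conversation, including any
# correction-retry user messages appended during validation.
for message in trace:
    print(f"[{message.role}] {message.content}")
```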
🔍 Attention Areas

- `facade.py` - Signature change: `generate()` now returns the full message trace instead of just reasoning content
- `utils.py` - New `ChatMessage` dataclass that replaces dict-based message handling

🤖 Generated with AI