feat: add message trace support for LLM generation#272

Merged
johnnygreco merged 1 commit into main from feat/traces
Jan 30, 2026
Conversation

@johnnygreco (Contributor)

📋 Summary

Add message trace support for LLM generation, allowing users to capture the full conversation history (system/user/assistant messages) during generation. This is essential for debugging, understanding model behavior, and iterating on prompts. Changes pulled from feat/mcp.

🔄 Changes

✨ Added

  • New docs/concepts/traces.md documentation explaining trace functionality
  • ChatMessage dataclass in engine/models/utils.py for representing conversation messages with factory methods (as_user, as_assistant, as_system, as_tool)
  • with_trace field on LLMTextColumnConfig to enable per-column traces
  • debug_override_save_all_column_traces field on RunConfig for global trace enablement

🔧 Changed

  • Renamed `REASONING_TRACE_COLUMN_POSTFIX` → `TRACE_COLUMN_POSTFIX` (`__reasoning_trace` → `__trace`)
  • ModelFacade.generate() now returns tuple[Any, list[ChatMessage]] instead of tuple[Any, str | None] to provide full conversation history
  • ModelFacade.completion() now accepts list[ChatMessage] instead of list[dict]
  • Updated llm_completion.py generator to save traces when enabled via config or run config
  • Updated column configs documentation to reflect new trace behavior

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • facade.py - Signature change for generate() method now returns full message trace instead of just reasoning content
  • utils.py - New ChatMessage dataclass that replaces dict-based message handling

🤖 Generated with AI

Add support for capturing full conversation traces during LLM generation,
enabling debugging and fine-tuning dataset creation.

Changes:
- Add `with_trace` field to LLMTextColumnConfig for per-column trace control
- Add `debug_override_save_all_column_traces` to RunConfig for global trace
- Introduce ChatMessage dataclass for structured message representation
- Update ModelFacade.generate() to return full message trace
- Rename trace column postfix from `__reasoning_trace` to `__trace`
- Add comprehensive traces documentation

Traces capture system/user/assistant messages in order, enabling visibility
into the full generation conversation including correction retries.
@johnnygreco johnnygreco requested a review from a team as a code owner January 30, 2026 18:49
@greptile-apps

greptile-apps bot commented Jan 30, 2026

Greptile Overview

Greptile Summary

This PR adds comprehensive message trace support for LLM generation, enabling users to capture the full conversation history (system/user/assistant messages with reasoning content) during generation. This is valuable for debugging, prompt iteration, and understanding model behavior.

Key Changes

  • New ChatMessage dataclass (utils.py): Replaces dict-based message handling with a strongly-typed dataclass featuring factory methods (as_user, as_assistant, as_system, as_tool) and proper handling of reasoning content
  • Signature change (facade.py): ModelFacade.generate() now returns tuple[Any, list[ChatMessage]] instead of tuple[Any, str | None], providing the full conversation trace instead of just reasoning content
  • Per-column trace control: New with_trace field on LLMTextColumnConfig enables trace capture for specific columns
  • Global debug override: New debug_override_save_all_column_traces field on RunConfig enables traces for all LLM columns
  • Trace column naming: Renamed from __reasoning_trace to __trace to reflect broader scope
  • Comprehensive documentation: New docs/concepts/traces.md with clear examples of trace structure for simple and retry scenarios

Critical Issue Found

Logic bug in facade.py: curr_num_correction_steps is incremented unconditionally before parsing (line 211), causing an off-by-one error. When max_correction_steps=0, the first failure incorrectly skips the correction branch because 1 <= 0 evaluates to False. This breaks the retry logic entirely when correction steps are disabled.

Additional Notes

  • Tests have been updated to handle ChatMessage objects and include basic trace functionality coverage
  • The refactoring from dicts to ChatMessage is clean and improves type safety
  • Documentation is thorough and includes helpful examples of both simple generation and retry scenarios

Confidence Score: 2/5

  • This PR contains a critical logic bug that breaks retry functionality
  • The off-by-one error in correction step counting (line 211 of facade.py) is a blocking issue that will cause incorrect behavior in production. The increment happens before attempting to parse, meaning the first attempt starts at 1 instead of 0, breaking the comparison logic for retry limits.
  • Pay close attention to packages/data-designer-engine/src/data_designer/engine/models/facade.py (contains critical logic bug in retry loop)

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-engine/src/data_designer/engine/models/utils.py | Introduces ChatMessage dataclass with factory methods to replace dict-based message handling; clean refactoring with comprehensive docstrings |
| packages/data-designer-engine/src/data_designer/engine/models/facade.py | Changes return type to include full message trace and updates message handling; has critical logic bug in correction step counting |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/llm_completion.py | Updates to serialize full message traces when enabled via config; correctly handles both per-column and global trace settings |
| packages/data-designer-config/src/data_designer/config/column_configs.py | Adds with_trace field to LLMTextColumnConfig and renames trace column postfix; documentation updates accurately reflect new behavior |

Sequence Diagram

sequenceDiagram
    participant User
    participant LLMGenerator as LLM Column Generator
    participant Facade as ModelFacade
    participant LiteLLM as LiteLLM Router
    participant Dataset

    User->>LLMGenerator: generate(data)
    LLMGenerator->>LLMGenerator: render prompt template
    LLMGenerator->>Facade: generate(prompt, parser, system_prompt)
    
    Note over Facade: Build initial messages
    Facade->>Facade: prompt_to_messages()
    Facade->>Facade: Create ChatMessage objects
    
    loop Until valid or max retries
        Facade->>Facade: completion(messages)
        Facade->>LiteLLM: completion(model, message_dicts)
        LiteLLM-->>Facade: ModelResponse
        Facade->>Facade: Extract content + reasoning_content
        Facade->>Facade: Append ChatMessage.as_assistant()
        
        alt Parser succeeds
            Facade->>Facade: parser(response)
            Facade-->>LLMGenerator: (parsed_obj, trace)
        else Parser fails
            Facade->>Facade: Check retry limits
            alt Can retry
                Facade->>Facade: Append ChatMessage.as_user(error)
                Note over Facade: Continue loop for correction
            else Max retries reached
                Facade-->>LLMGenerator: Raise GenerationValidationFailureError
            end
        end
    end
    
    alt with_trace enabled or debug override
        LLMGenerator->>LLMGenerator: Serialize trace to dicts
        LLMGenerator->>Dataset: Store {name}__trace column
    end
    
    LLMGenerator->>Dataset: Store {name} column
    LLMGenerator-->>User: data with generated columns

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

@johnnygreco (Contributor, Author)

Claude is happy with the extraction @nabinchha 👇

Review: feat/traces Branch vs feat/mcp

Summary of Extraction

The feat/traces branch successfully extracts the foundational trace infrastructure from feat/mcp. This is a clean separation: traces provide the message history capture mechanism, while MCP adds tool-calling capabilities on top.

| Branch | Scope | Files Changed | Lines |
| --- | --- | --- | --- |
| feat/traces | Core trace/ChatMessage infrastructure | 14 files | +372/-125 |
| feat/mcp | Traces + MCP tool-use + CLI + docs | 85 files | +12,345/-428 |

What feat/traces Includes

1. ChatMessage Dataclass (utils.py)

New ChatMessage dataclass replaces raw dictionaries:

@dataclass
class ChatMessage:
    role: Literal["user", "assistant", "system", "tool"]
    content: str | list[dict[str, Any]] = ""
    reasoning_content: str | None = None
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_call_id: str | None = None

Includes factory methods: as_user(), as_assistant(), as_system(), as_tool()
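
For illustration, here is a self-contained sketch of how the dataclass and its factory methods might be used. The field layout comes from the PR; the factory-method bodies and `to_dict()` implementation below are assumptions for demonstration, not the engine's actual code:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any, Literal


@dataclass
class ChatMessage:
    # Field layout mirrors the dataclass shown above.
    role: Literal["user", "assistant", "system", "tool"]
    content: str | list[dict[str, Any]] = ""
    reasoning_content: str | None = None
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_call_id: str | None = None

    # Factory bodies below are illustrative assumptions.
    @classmethod
    def as_system(cls, content: str) -> ChatMessage:
        return cls(role="system", content=content)

    @classmethod
    def as_user(cls, content: str) -> ChatMessage:
        return cls(role="user", content=content)

    @classmethod
    def as_assistant(
        cls, content: str, reasoning_content: str | None = None
    ) -> ChatMessage:
        return cls(role="assistant", content=content, reasoning_content=reasoning_content)

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


# A minimal trace built with the factory methods:
trace = [
    ChatMessage.as_system("You are a helpful assistant."),
    ChatMessage.as_user("Generate a product name."),
    ChatMessage.as_assistant("AcmeWidget", reasoning_content="Short and brandable."),
]
print([m.role for m in trace])  # ['system', 'user', 'assistant']
```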

2. Updated ModelFacade.generate() (facade.py)

  • Changed return type: tuple[Any, str | None] -> tuple[Any, list[ChatMessage]]
  • Now returns full conversation trace instead of just reasoning
  • Messages internally use ChatMessage objects

3. Column Config Changes (column_configs.py)

  • Added with_trace: bool = False field to LLMTextColumnConfig
  • Renamed constant: REASONING_TRACE_COLUMN_POSTFIX -> TRACE_COLUMN_POSTFIX ("__trace")
  • Updated side_effect_columns to return {name}__trace

4. RunConfig Addition (run_config.py)

  • Added debug_override_save_all_column_traces: bool = False
  • Enables traces for ALL LLM columns globally (debug mode)

5. Generator Updates (llm_completion.py)

  • Conditional trace saving based on with_trace or debug_override_save_all_column_traces
  • Serializes trace as [message.to_dict() for message in trace]
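
The conditional save can be sketched as follows. The function name, record shape, and message stub are hypothetical; the flag check and `to_dict()` serialization mirror the description above:

```python
from typing import Any


def maybe_save_trace(
    record: dict[str, Any],
    column_name: str,
    trace: list[Any],
    with_trace: bool,
    debug_override_save_all_column_traces: bool,
) -> None:
    """Store the serialized trace under the ``{name}__trace`` side-effect
    column when either the per-column or the global debug flag is set."""
    if with_trace or debug_override_save_all_column_traces:
        record[f"{column_name}__trace"] = [m.to_dict() for m in trace]


class _Msg:  # tiny stand-in for ChatMessage
    def __init__(self, role: str, content: str) -> None:
        self.role, self.content = role, content

    def to_dict(self) -> dict[str, Any]:
        return {"role": self.role, "content": self.content}


row = {"name": "AcmeWidget"}
maybe_save_trace(
    row, "name", [_Msg("user", "hi")],
    with_trace=True, debug_override_save_all_column_traces=False,
)
print(sorted(row))  # ['name', 'name__trace']
```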

6. Documentation (traces.md)

New comprehensive trace documentation covering:

  • Enabling traces per-column and globally
  • Trace data structure (roles, fields)
  • Example traces (simple and with correction retries)
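
As an illustration of the serialized shape, a simple (no-retry) generation might produce something like the following. Field names follow the ChatMessage dataclass; the exact dict layout is an assumption, not copied from traces.md:

```python
# Hypothetical serialized trace for a single successful generation.
example_trace = [
    {"role": "system", "content": "You are a data generator.", "reasoning_content": None},
    {"role": "user", "content": "Write a one-line product tagline.", "reasoning_content": None},
    {"role": "assistant", "content": "Build faster.", "reasoning_content": "Keep it punchy."},
]
print([m["role"] for m in example_trace])  # ['system', 'user', 'assistant']
```

With correction retries, additional user (error feedback) and assistant turns would be appended in order.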

What Remains in feat/mcp (Not in feat/traces)

MCP Infrastructure

  • packages/data-designer-engine/src/data_designer/engine/mcp/ (facade, registry, io, factory, errors)
  • MCP provider and tool configuration system
  • Tool schemas and execution

CLI Extensions

  • dd list mcp-providers and dd list tools commands
  • MCP provider and tool builder forms
  • Controllers, services, repositories for MCP management

ModelFacade Tool Calling

  • tool_alias parameter in generate()
  • mcp_registry parameter in __init__
  • Tool call processing loop with turn limits
  • _get_mcp_facade() method

Column Config Extensions

  • tool_alias: str | None = None field on LLM columns

Extended Documentation

  • Tool use examples in traces.md
  • MCP concept docs (docs/concepts/mcp/, docs/concepts/tool_use_and_mcp.md)
  • MCP recipes

Compatibility Analysis

feat/traces is Forward-Compatible

The ChatMessage dataclass already includes fields needed for MCP:

  • tool_calls: list[dict[str, Any]] - For assistant tool requests
  • tool_call_id: str | None - For tool response linking
  • as_tool() factory method - For creating tool messages

This means feat/mcp can be cleanly rebased onto main after feat/traces merges.

Differences to Reconcile After Merge

| File | feat/traces | feat/mcp adds |
| --- | --- | --- |
| column_configs.py | with_trace field | tool_alias field |
| run_config.py | debug_override_save_all_column_traces | (same) |
| facade.py | ChatMessage usage, trace return | mcp_registry, tool_alias, tool loop |
| llm_completion.py | Conditional trace saving | tool_alias passthrough |
| traces.md | Basic trace examples | Tool use examples |

Review Findings

Correctness

  • The extraction is clean and self-contained
  • All trace functionality works independently of MCP
  • Tests updated appropriately for new ChatMessage type

Potential Issues to Verify

  1. Test coverage: The test_facade.py changes look correct, but verify reasoning trace handling still works
  2. Backward compatibility: Existing code expecting (output, reasoning_trace_str) tuple needs migration

Documentation Quality

  • traces.md is comprehensive but omits "correction retry" example that feat/mcp has
  • feat/traces uses simpler correction example; feat/mcp adds tool use examples

@johnnygreco (Contributor, Author)

@nabinchha – all tutorial notebooks and example recipes ran successfully

@johnnygreco johnnygreco reopened this Jan 30, 2026
@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

reasoning_trace = reasoning_trace.strip()

messages.append(ChatMessage.as_assistant(content=response, reasoning_content=reasoning_trace or None))
curr_num_correction_steps += 1

curr_num_correction_steps is incremented unconditionally before attempting parsing, causing an off-by-one error. On the first attempt, this becomes 1 immediately, so when max_correction_steps=0, the condition on line 222 (curr_num_correction_steps <= max_correction_steps) evaluates to 1 <= 0 = False, incorrectly skipping correction.

Move the increment after the try/except block, only incrementing when retrying:

Suggested change — drop the unconditional increment here, keeping only:

messages.append(ChatMessage.as_assistant(content=response, reasoning_content=reasoning_trace or None))

@johnnygreco (Contributor, Author)

Your boss disagrees:


I'll search for the code mentioned in this feedback to evaluate its accuracy.

[2 tools called]

Let me also look for any tests related to this function to understand the intended behavior.

[2 tools called]

Based on my analysis, the feedback is incorrect. The current implementation is correct and the tests confirm this.

Analysis

Let me trace through the current code behavior:

Current code logic (lines 198-235 in facade.py):

curr_num_correction_steps = 0
while True:
    completion_response = self.completion(...)  # Make LLM call
    curr_num_correction_steps += 1  # Incremented BEFORE parsing
    
    try:
        output_obj = parser(response)
        break
    except ParserException as exc:
        if curr_num_correction_steps <= max_correction_steps:
            # Attempt correction

Test expectations from test_facade.py (lines 66-75):

@pytest.mark.parametrize(
    "max_correction_steps,max_conversation_restarts,total_calls",
    [
        (0, 0, 1),   # 0 corrections, 0 restarts → 1 call
        (5, 0, 6),   # 5 corrections, 0 restarts → 6 calls (initial + 5)
        (0, 5, 6),   # 0 corrections, 5 restarts → 6 calls (1 per restart)
    ],
)

Tracing max_correction_steps=5:

  • Call 1: curr=1, fails, 1 <= 5 → correct
  • Call 2: curr=2, fails, 2 <= 5 → correct
  • ...
  • Call 5: curr=5, fails, 5 <= 5 → correct
  • Call 6: curr=6, fails, 6 <= 5 → stop (raises error)
  • Total: 6 calls (matches test ✓)

Tracing max_correction_steps=0:

  • Call 1: curr=1, fails, 1 <= 0 → skip correction (go to restart or fail)
  • Total: 1 call per restart window (matches test ✓)

Why the feedback is wrong

The feedback claims "incorrectly skipping correction" when max_correction_steps=0. However:

  1. max_correction_steps=0 semantically means "no corrections allowed" - skipping is the CORRECT behavior
  2. The tests explicitly verify this - (0, 5, 6) expects 6 total calls with 0 corrections and 5 restarts (1 call per window, no corrections)
  3. max_correction_steps=N means N corrections after the initial attempt - with N=5, we get 6 total calls (initial + 5 corrections)

The variable curr_num_correction_steps tracks the number of attempts in the current conversation window, and the check curr_num_correction_steps <= max_correction_steps correctly allows exactly max_correction_steps additional attempts after the initial one.
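
The counting argument can be checked with a small self-contained simulation. The loop structure mirrors the increment-before-parse code traced above; the parser is assumed to always fail, so every conversation window exhausts its budget:

```python
def count_calls(max_correction_steps: int, max_conversation_restarts: int) -> int:
    """Count LLM calls when the parser always fails, mirroring the
    increment-before-parse loop traced above."""
    total_calls = 0
    for _restart in range(max_conversation_restarts + 1):
        curr_num_correction_steps = 0
        while True:
            total_calls += 1                # the completion() call
            curr_num_correction_steps += 1  # incremented BEFORE parsing
            # Parsing always fails in this simulation.
            if curr_num_correction_steps <= max_correction_steps:
                continue                    # attempt a correction
            break                           # budget exhausted -> restart or raise
    return total_calls


print(count_calls(0, 0))  # 1
print(count_calls(5, 0))  # 6
print(count_calls(0, 5))  # 6
```

These match the parametrized expectations in test_facade.py, supporting the reading that the early increment is intentional.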

Conclusion

The current implementation is correct - there is no off-by-one error. The feedback author appears to have misunderstood that max_correction_steps=0 should disable corrections entirely, which is exactly what the current code does.

@johnnygreco johnnygreco merged commit 0d51539 into main Jan 30, 2026
92 checks passed
johnnygreco added a commit that referenced this pull request Jan 30, 2026
Resolve conflicts between feat/traces PR (#272) and feat/mcp:
- column_configs.py: Keep both with_trace and tool_alias fields
- run_config.py: Include tool messages in trace description
- facade.py: Keep MCP tool calling loop with trace support
- traces.md: Include tool use examples alongside basic traces
- columns.md: Document both trace and tool_alias features

2 participants