Add customizable prompts and per-scenario LLM config overrides (#52) by richardkiene · Pull Request #53 · Liquescent-Development/mcprobe

richardkiene · 2026-01-23T21:15:01Z

Summary

Add the ability to customize judge and synthetic user behavior through:

extra_instructions that augment the default prompts
Per-scenario LLM configuration overrides
Separate LLM configs for judge and synthetic_user

Configuration Priority (highest to lowest)

1. CLI arguments
2. Scenario-level config (from scenario YAML)
3. Component-specific config (judge:, synthetic_user:)
4. Shared LLM config (llm:)
5. Defaults

Note: extra_instructions do not follow this override order. Instructions from every level are appended together rather than replaced.
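The PR text does not include the resolution code. The following is a minimal sketch of the rule described above. It assumes each level is a flat record of `provider`, `model` and `extra_instructions`, and that instructions are joined lowest-priority first. `LLMLayer` and `resolve` are hypothetical names, not mcprobe's API. The real function is `resolve_llm_config`, and its signature is not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LLMLayer:
    """One configuration level (hypothetical); None means 'not set at this level'."""
    provider: str | None = None
    model: str | None = None
    extra_instructions: str | None = None


def resolve(*layers: LLMLayer) -> LLMLayer:
    """Resolve layers passed from LOWEST to HIGHEST priority.

    Scalar fields: the highest-priority layer that sets a value wins.
    extra_instructions: non-empty values from every layer are kept and
    joined, lowest priority first (the ordering is an assumption).
    """
    result = LLMLayer()
    instructions: list[str] = []
    for layer in layers:  # later (higher-priority) layers overwrite earlier ones
        if layer.provider is not None:
            result.provider = layer.provider
        if layer.model is not None:
            result.model = layer.model
        if layer.extra_instructions:
            instructions.append(layer.extra_instructions.strip())
    result.extra_instructions = "\n".join(instructions) or None
    return result


# Judge settings from the examples below: defaults -> llm: -> judge: -> scenario -> CLI
defaults = LLMLayer()  # built-in defaults omitted in this sketch
shared = LLMLayer(provider="openai", model="gpt-4o")
judge = LLMLayer(model="gpt-4o-mini",
                 extra_instructions="Be lenient about formatting differences.")
scenario = LLMLayer(model="gpt-4o",
                    extra_instructions="This scenario requires exact numerical precision.")
cli = LLMLayer()  # no CLI flags given

effective = resolve(defaults, shared, judge, scenario, cli)
print(effective.model)  # gpt-4o
print(effective.extra_instructions)
```

Here the scenario's `model` overrides the judge's `gpt-4o-mini`, while both instruction blocks survive.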

Example Usage

Global Config (`mcprobe.yaml`)

llm:
  provider: openai
  model: gpt-4o

judge:
  model: gpt-4o-mini  # Use cheaper model for judging
  extra_instructions: |
    Be lenient about formatting differences.
    Consider partial credit for directionally correct answers.

synthetic_user:
  model: gpt-4o
  extra_instructions: |
    Push back firmly on vague answers.

Per-Scenario Override

name: Complex Financial Analysis
description: Tests multi-step calculations

config:
  judge:
    model: gpt-4o  # Need smarter model for this one
    extra_instructions: |
      This scenario requires exact numerical precision.
  synthetic_user:
    extra_instructions: |
      You are a CFO who expects exact figures.

synthetic_user:
  persona: A CFO reviewing quarterly reports
  initial_query: What was our Q3 revenue growth?

evaluation:
  correctness_criteria:
    - Provides exact revenue figures
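The PR names `ScenarioLLMOverride` and `ScenarioConfig` but does not show their definitions. The following is a minimal dataclass sketch that mirrors only the YAML keys used above (`config:`, `judge:`, `synthetic_user:`, `model`, `extra_instructions`). The field layout and the `from_dict` helpers are assumptions, and the real models may be defined differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ScenarioLLMOverride:
    """Per-component overrides (sketch); unset fields fall through to lower levels."""
    model: str | None = None
    extra_instructions: str | None = None

    @classmethod
    def from_dict(cls, data: dict[str, Any] | None) -> ScenarioLLMOverride | None:
        # Hypothetical helper: a missing judge:/synthetic_user: key means "no override".
        if data is None:
            return None
        return cls(model=data.get("model"),
                   extra_instructions=data.get("extra_instructions"))


@dataclass
class ScenarioConfig:
    judge: ScenarioLLMOverride | None = None
    synthetic_user: ScenarioLLMOverride | None = None

    @classmethod
    def from_dict(cls, data: dict[str, Any] | None) -> ScenarioConfig | None:
        if data is None:  # the config: section of a scenario is optional
            return None
        return cls(
            judge=ScenarioLLMOverride.from_dict(data.get("judge")),
            synthetic_user=ScenarioLLMOverride.from_dict(data.get("synthetic_user")),
        )


# The config: section of the scenario above, as a YAML loader would return it.
raw = {
    "judge": {"model": "gpt-4o",
              "extra_instructions": "This scenario requires exact numerical precision.\n"},
    "synthetic_user": {"extra_instructions": "You are a CFO who expects exact figures.\n"},
}
cfg = ScenarioConfig.from_dict(raw)
```

Because `synthetic_user` sets no `model` here, that component keeps whatever model the global config resolves to.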

Changes

Added extra_instructions field to LLMConfig model
Added ScenarioLLMOverride and ScenarioConfig to scenario model
Added optional config section to TestScenario
Updated resolve_llm_config to handle scenario-level overrides
Updated prompt builders to append extra_instructions
Updated pytest plugin, MCP server, and CLI to pass scenario configs
Added LLMDefaults dataclass to avoid too-many-arguments lint issue
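The prompt-builder change is described only as "append extra_instructions". The following is a minimal sketch of that idea, in which the default prompt is kept and the collected instruction blocks are added after it. The function name `append_extra_instructions`, the "Additional instructions:" separator and the example base prompt are all hypothetical, not mcprobe's actual wording.

```python
from __future__ import annotations

from collections.abc import Iterable


def append_extra_instructions(base_prompt: str, extra: Iterable[str | None]) -> str:
    """Append non-empty instruction blocks to a default prompt without replacing it."""
    blocks = [text.strip() for text in extra if text and text.strip()]
    if not blocks:
        return base_prompt  # nothing configured: the default prompt is used unchanged
    # Hypothetical separator so the model can tell defaults from user additions.
    return base_prompt.rstrip() + "\n\nAdditional instructions:\n" + "\n".join(blocks)


prompt = append_extra_instructions(
    "You are a strict evaluator.",  # hypothetical default judge prompt
    ["Be lenient about formatting differences.",            # from judge:
     None,                                                   # a level with no instructions
     "This scenario requires exact numerical precision."],   # from the scenario
)
print(prompt)
```

Appending rather than substituting keeps the default grading rules in force, which is why the PR treats extra_instructions as augmenting the prompt.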

Test plan

All 255 unit tests pass
Ruff linting passes
Mypy type checking passes
Manual testing with scenario overrides

Closes #52

Add the ability to customize judge and synthetic user behavior through:
- extra_instructions that augment the default prompts
- Per-scenario LLM configuration overrides
- Separate LLM configs for judge and synthetic_user (already existed but now properly documented and supported with scenario-level overrides)

Configuration priority (highest to lowest):
1. CLI arguments
2. Scenario-level config (from scenario YAML)
3. Component-specific config (judge:, synthetic_user:)
4. Shared LLM config (llm:)
5. Defaults

Extra instructions are appended from all levels (not replaced).

Example global config (mcprobe.yaml):

judge:
  model: gpt-4o-mini
  extra_instructions: |
    Be lenient about formatting differences.

Example per-scenario override:

config:
  judge:
    model: gpt-4o
    extra_instructions: |
      This scenario requires exact precision.

Adds documentation for features from PR #53:
- Add extra_instructions field to LLMConfig table in configuration reference
- Document extra_instructions usage with examples
- Add config section to scenario format schema
- Document ScenarioConfig and ScenarioLLMOverride fields
- Add per-scenario configuration examples
- Update complete example to include config section

richardkiene merged commit b44a02a into main Jan 23, 2026
3 checks passed

richardkiene deleted the feature/customizable-prompts-52 branch January 23, 2026 21:41

richardkiene mentioned this pull request Jan 23, 2026 in #59, "Document extra_instructions and per-scenario config overrides" (merged, 3 tasks)


richardkiene merged 1 commit into main from feature/customizable-prompts-52


Global Config (`mcprobe.yaml`)