
Dspy playground #28

Merged
abossard merged 81 commits into main from dspy-playground
Mar 25, 2026

Conversation

@abossard
Owner

This pull request introduces significant improvements to the repository's onboarding, documentation, and developer experience, especially for LLM-powered features and CSV ticket data workflows. The changes clarify project boundaries, add environment and VS Code setup guides, and provide more comprehensive instructions for LLM and KBA Drafter integration. The most important changes are grouped below.

Documentation and Onboarding Improvements:

  • Expanded the README.md with detailed instructions for usecase demo development, KBA Drafter setup, OpenAI API key configuration, and a new DSPy prompt tuning notebook section. Added links to new and existing documentation for LLM, KBA, and CSV AI workflows. [1] [2] [3] [4]
  • Added a comprehensive .env.example file with explanations for LLM, OpenAI, and KBA Drafter environment variables, including future SharePoint and ITSM adapters.
  • Added CLAUDE.md with clear setup, run, test, and architecture instructions for Claude Code, covering both backend and frontend workflows, LLM configuration, and testing strategies.
  • Rewrote .github/copilot-instructions.md to clarify the repository's learning-focused scope, CSV data structure, and boundaries for Copilot/AI contributions, emphasizing what should and should not be changed.

Developer Experience:

  • Added .vscode/extensions.json and .vscode/launch.json to recommend key extensions and provide launch configurations for backend, frontend, and notebook workflows, supporting full stack and agentic development. [1] [2]

Environment and Tooling:

  • Updated .claude/settings.local.json and .dockerignore to align with the .venv Python environment, ensuring consistent activation and Docker context. [1] [2]
  • Added .gitattributes to strip outputs from Jupyter notebooks and enable proper notebook diffs, supporting clean version control for notebook-based workflows.

These changes collectively make the repository easier to set up, contribute to, and use for LLM-powered CSV ticket analysis and KBA generation.

Copilot AI and others added 30 commits November 19, 2025 08:14
Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
…ted On"

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
…Node 20 LTS

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
- Change virtual environment creation to use `.venv` at the repo root
- Update activation commands in various documentation files
- Modify setup and start scripts to reflect new virtual environment structure
- Ensure consistency across installation guides and troubleshooting documentation
- Added `httpx` dependency for async HTTP requests to Ollama API.
- Implemented OllamaChat component in frontend for user interaction with the LLM.
- Created backend service for handling chat requests and model listing.
- Updated setup scripts to check for Ollama installation and pull required models.
- Added API endpoints for chat and model listing in the backend.
- Implemented end-to-end tests for Ollama integration, covering model listing and chat functionality.
- Enhanced error handling and user feedback in the chat interface.
* feat: Implement MCP JSON-RPC 2.0 handler and refactor API decorators

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Update API decorators for optional HTTP path and clean up imports
docs: Enhance LEVEL_UP.md with Copilot chat testing instructions

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Migrate task management to SQLModel ORM and update related documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Implement LangGraph agent with Azure OpenAI integration and extend API decorators for tool conversion

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to use OpenAI SDK and enhance tool integration

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to integrate LangGraph and replace OpenAI SDK usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Correct docstring formatting in tool_wrapper for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Update documentation structure for Day 4 lessons and announcements

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* refactor: update Azure OpenAI configuration and streamline environment variable usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: integrate FastMCP client for external tool support and add tests

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement Ticket MCP integration with FastMCP client and add REST endpoints for ticket management

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Add QA tickets management with new TicketList component and API integration

* feat: Add initial diagram for project planning in explain.drawio

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add RULES.md to document project guidelines

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add ticket models and reminder functionality for "Assigned without Assignee"

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Rearrange imports and enhance startup logging for REST API and MCP JSON-RPC

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding mapping functions and updating QA tickets endpoint

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add TicketsWithoutAnAssignee component to display unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Clean up code formatting and improve ticket handling in various components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor Ollama integration to use Azure OpenAI agent; remove OllamaChat component and related API calls, add AgentChat component for task management; update frontend routing and backend operations accordingly.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance AgentService with detailed logging for MCP tool calls and agent execution

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ariable handling in agents.py (#11)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Implement CSV Ticket Viewer

- Refactor App component to replace existing features with CSV Ticket Table.
- Add CSVTicketTable component for displaying tickets from CSV data source.
- Introduce API functions for fetching CSV ticket fields, tickets, and statistics.
- Create CSV data source in backend to handle loading and processing of CSV files.
- Enhance AgentChat component to display error details from API responses.
- Update styles and layout for improved user experience in ticket viewing.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Update import formatting and enhance status badge display in CSVTicketTable component

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add Nivo chart visualizations for CSV tickets and enhance documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* refactor: update configuration from Azure OpenAI to OpenAI and enhance agent service initialization

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: update AgentChat component for OpenAI integration and enhance markdown support

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
- Added backend orchestration for usecase demo agent runs in `usecase_demo.py`.
- Created documentation for CSV ticket guidance in `CSV_AI_GUIDANCE.md`.
- Developed frontend components for usecase demo description and page in `UsecaseDemoDescription.jsx` and `UsecaseDemoPage.jsx`.
- Introduced demo definitions for usecase demos in `demoDefinitions.js`.
- Implemented result views for structured table and markdown in `resultViews.jsx`.
- Added utility functions for handling usecase demo runs in `usecaseDemoUtils.js`.
- Included a network diagram in `net.drawio`.
* feat: Enhance SLA Breach Risk functionality and UI integration
- Increased max_length for agent prompt to 5000
- Added fields parameter to list and search tickets for selective data retrieval
- Updated timeout for usecase demo agent to 300 seconds
- Introduced SLA Breach Risk demo with detailed prompt and ticket analysis
- Added E2E tests for SLA Breach Risk demo page

* feat: add incident_id field to ticket model and related components

- Added incident_id to the ticket mapping in app.py.
- Updated csv_data.py to include incident_id when converting CSV rows to tickets.
- Modified operations.py to define incident_id as a CSV ticket field.
- Enhanced the Ticket model in tickets.py to include incident_id.
- Updated usecase_demo.py to accommodate changes in ticket structure.
- Modified CSVTicketTable.jsx to display incident_id in the ticket table.
- Updated TicketList.jsx to filter and display incident_id in the ticket list.
- Enhanced TicketsWithoutAnAssignee.jsx to include incident_id in ticket operations.
- Updated UsecaseDemoPage.jsx to pass matchingTickets to the render function.
- Enhanced demoDefinitions.js to improve prompts for use case demos.
- Added SLA Breach Overview result view in resultViews.jsx to visualize SLA status of tickets.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: clean up import statements across multiple components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: standardize import statement formatting in resultViews.jsx

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: add SLA breach reporting functionality and related API endpoints

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement SLA breach report retrieval for unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ig.js (#16)

Co-authored-by: luca Spring <luca.spring@bit.admin.ch>
* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding incident ID support and improve UI components for better user experience

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add tool invocation logging with latency tracking in WorkbenchService

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* Implemented KBA draft

* - Removed test files
- Cleaned up structure
- Adjusted README.md
- Created plan in learning_mechanism.md
- Design fixes

* feat: add search questions generation with database migration and UI

Database & Backend:
- Add search_questions column migration in operations.py (ALTER TABLE for existing databases)
- Add /api/kba/drafts/{id}/replace endpoint in app.py
- Fix backward compatibility in kba_service.py (_table_to_draft, _draft_to_table)
- Add search questions generation to replace_draft workflow
- Fix NULL constraint errors by ensuring empty strings for required fields
- Update related_tickets validation: accept INC + 9-12 digits (was fixed at 12)

Frontend:
- Add Text component import to KBADrafterPage.jsx (fix TypeError)
- Add full-screen blur overlay with centered spinner during KBA generation
- Show overlay for both new draft creation and replacement operations
- Update styles: loadingOverlay with backdrop-filter blur effect

Documentation:
- Update kba_prompts.py: clarify related_tickets format with examples
- Update GENERAL.md: correct related_tickets format specification

Fixes #1 - KBA drafts not loading (missing DB column)
Fixes #2 - Replace endpoint not found (405 error)
Fixes #3 - Ticket ID validation too strict
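The loosened ticket-ID validation (INC plus 9-12 digits, previously fixed at 12) can be sketched as a small regex check. A minimal illustration; the helper name is an assumption, not the actual code in the PR.

```python
import re

# Accept "INC" followed by 9-12 digits (the old rule required exactly 12).
_TICKET_ID_RE = re.compile(r"^INC\d{9,12}$")

def is_valid_ticket_id(ticket_id: str) -> bool:
    """Return True if the ID matches the loosened INC + 9-12 digit format."""
    return bool(_TICKET_ID_RE.fullmatch(ticket_id))
```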

* View tickets in a popup

* feat(kba-drafter): add ability to reset reviewed KBAs back to draft

- Add "Zurück zu Entwurf" button for reviewed status KBAs
- Add handleUnreview() handler to update status from "reviewed" to "draft"
- Import ArrowUndo24Regular icon for the unreview action
- Allow users to continue editing KBAs after review without deletion

This enables editing of reviewed KBAs that need changes before publishing.

* feat(kba-drafter): add ticket viewer, unreview, status filter, and UI improvements

- Add ticket viewer dialog to display original incident details
  * New "Ticket" button in KBA header with DocumentSearch icon
  * Modal dialog showing incident data (ID, summary, status, priority, assignee, notes, resolution)
  * Backend endpoint /api/csv-tickets/by-incident/<incident_id> for incident ID lookup
  * Frontend API function getCSVTicketByIncident()

- Add unreview functionality for reviewed KBAs
  * "Zurück zu Entwurf" button with ArrowUndo icon
  * Allows resetting reviewed KBAs back to draft status for further editing

- Redesign KBA overview list
  * Replace corner delete button with professional overflow menu (⋮)
  * Horizontal layout: content left, status badge right-aligned, menu button
  * Menu component with delete option

- Add status filter dropdown to KBA overview
  * Filter options: All, draft, reviewed, published
  * Dropdown in card header for easy filtering

- Align EditableList "Add" button width with input fields
  * Use invisible placeholder buttons for exact width matching
  * Ensures consistent layout regardless of allowReorder setting

Files modified:
- frontend/src/features/kba-drafter/KBADrafterPage.jsx
- frontend/src/features/kba-drafter/components/EditableList.jsx
- frontend/src/services/api.js
- backend/app.py

* fix(kba): fix draft deletion bug and add collapsible AutoGenSettings

- Fix delete draft error: use response.items instead of response.drafts
- Make AutoGenSettings card collapsible with chevron icon
  - Starts collapsed to reduce visual dominance
  - Smooth slide-down animation when expanded
  - Status badge visible in collapsed header
  - Clickable header with keyboard support (Enter key)

* fix(kba): auto-scroll to top when opening draft

When clicking on a draft from the list after scrolling down,
the page now automatically scrolls to the top with a smooth animation.
This ensures users always start at the beginning of the draft content.

* feat: replace browser confirms with custom modal dialogs for unsaved changes

Replace native window.confirm() with ConfirmDialog component for better UX
consistency and modern appearance. Adds centered warning modal when user
attempts to discard unsaved changes (close draft, switch to preview, or
load different draft).

Changes:
- Add unsavedChangesDialogOpen and pendingAction states
- Update toggleEditMode, loadDraft, and handleClose to trigger modal
- Add handleDiscardChanges and handleCancelDiscard handlers
- Add ConfirmDialog with warning intent at end of component

* fix: address code review issues and add KBA drafter e2e tests

Fixes:
- Fix CSV folder case mismatch (CSV -> csv) in app.py and operations.py
- Remove duplicate get_ticket_by_incident_id method in csv_data.py
- Replace inefficient len(session.exec().all()) with SQL COUNT(*) in kba_service.py
- Replace hardcoded placeholder credentials with env var lookups in kba_service.py
- Fix scheduler swallowing exceptions (remove bare raise, return None)
- Add settings reload at start of each scheduler run to fix race condition
- Add generation_warnings field to surface search questions failures to users
- Add schema migration for generation_warnings column
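The COUNT fix above replaces materializing every row in Python with a single aggregate query. Illustrated here with stdlib sqlite3 (the actual code uses SQLModel); the table name is invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kba_draft (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO kba_draft (title) VALUES (?)", [("a",), ("b",), ("c",)])

# Inefficient: fetches and materializes every row just to count them.
count_slow = len(conn.execute("SELECT * FROM kba_draft").fetchall())

# Efficient: the database counts; only one integer crosses the connection.
count_fast = conn.execute("SELECT COUNT(*) FROM kba_draft").fetchone()[0]
```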

Tests:
- Add 19 Playwright e2e tests for KBA Drafter feature covering:
  page load, navigation, LLM health status, draft generation,
  draft display, draft list, editing, review workflow,
  duplicate handling, and backend API integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add LiteLLM fallback, Playwright tests, and remove OpenAI hard dependency

- LiteLLM is now the default LLM backend (no .env or API key needed)
- Multistage model fallback chain: claude-sonnet-4 → gpt-4o → gpt-4o-mini
- OpenAI SDK still used when OPENAI_API_KEY is explicitly set
- agents.py and workbench service use ChatLiteLLM when no OpenAI key
- Added csv_ticket_stats and csv_sla_breach_tickets to agent tools
- Added KBA Drafter to Playwright nav tests and menu screenshots
- Added e2e tests: publish, delete, status filter, ticket viewer
- 32 unit tests + 5 live integration tests for LLM service
- Updated .env.example with LiteLLM-first documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
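The multistage fallback chain (claude-sonnet-4 → gpt-4o → gpt-4o-mini) boils down to trying backends in order until one succeeds. A minimal sketch, assuming each backend is a callable; the real implementation goes through LiteLLM.

```python
def complete_with_fallback(prompt: str, backends):
    """Try each (name, callable) backend in order; return the first success.

    Raises RuntimeError with the collected errors if every backend fails.
    """
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # any backend failure falls through to the next
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")
```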

---------

Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Extract agent builder into extensible module with tests

Create backend/agent_builder/ as a standalone, deeply layered module
following Grokking Simplicity (data/calculations/actions separation)
and A Philosophy of Software Design (deep modules).

Structure:
- models/: Pure data (Pydantic/SQLModel) - agent, run, evaluation, chat
- tools/: ToolRegistry, schema converter, MCP adapter
- engine/: Unified ReAct runner, callbacks, prompt builder
- evaluator.py: Success criteria evaluation (mostly calculations)
- persistence/: DB engine setup + repository pattern
- service.py: WorkbenchService (deep module facade)
- chat_service.py: ChatService using shared ReAct engine
- routes.py: Quart Blueprint replacing 200+ lines from app.py
- tests/: 107 tests (unit + integration + E2E)

Key improvements:
- Eliminated duplicate ReAct agent building (was in both agents.py
  and agent_workbench/service.py)
- DRY error handling in routes via Blueprint
- Repository pattern isolates DB from business logic
- Pure calculation modules (prompt_builder, schema_converter,
  evaluator) are independently testable
- Backward-compatible: agent_workbench/__init__.py shims to new module

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add per-agent LLM config: model, temperature, recursion_limit, max_tokens, output_instructions

Each AgentDefinition now stores configurable LLM parameters:
- model: override service default (e.g. gpt-4o vs gpt-4o-mini)
- temperature: 0.0-2.0 (deterministic to creative)
- recursion_limit: 1-100 max ReAct loop iterations
- max_tokens: cap response length (0 = unlimited)
- output_instructions: custom formatting (replaces default markdown)

Changes:
- models/agent.py: 5 new fields with validation (ge/le bounds)
- persistence/database.py: migrations for existing DBs
- engine/react_runner.py: build_llm accepts temperature+max_tokens
- engine/prompt_builder.py: append_output_instructions for custom formatting
- service.py: _resolve_llm_for_agent builds per-agent LLM when config differs
- routes.py: ui-config v2 exposes llm_config_fields and defaults
- 12 new tests (model validation, CRUD, E2E roundtrip via REST)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add output_schema for type-safe structured output, fix defaults

Changes:
- recursion_limit default: 10 → 3 (most agents finish in 1-3 tool calls)
- max_tokens default: 0 → 4096 (sensible cap instead of unlimited)
- New field: output_schema (JSON Schema stored as JSON in DB)

output_schema is config, not code. You define the expected response
shape as a JSON Schema:
  {"type":"object","properties":{"breaches":{"type":"array",...}}}

At runtime this does two things:
1. Injected into system prompt so the LLM knows the expected structure
2. Takes priority over output_instructions and default markdown

Priority chain for output formatting:
  output_schema (strict JSON) > output_instructions (free text) > default markdown

128 tests pass (9 new tests for schema handling).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
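The priority chain above (output_schema > output_instructions > default markdown) can be sketched as a small resolver. Names and the default instruction text are assumptions for illustration.

```python
DEFAULT_MARKDOWN_INSTRUCTIONS = "Respond in GitHub-flavored Markdown."

def resolve_output_format(output_schema, output_instructions):
    """Pick the output mode: strict schema wins, then free-text instructions,
    then the default markdown behavior."""
    if output_schema:
        return ("schema", output_schema)
    if output_instructions:
        return ("instructions", output_instructions)
    return ("markdown", DEFAULT_MARKDOWN_INSTRUCTIONS)
```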

* Add suggest-schema endpoint and UI button

New endpoint: POST /api/workbench/suggest-schema
Takes agent name, description, system_prompt and asks the LLM to
propose a JSON Schema for the agent's structured output.

Backend:
- service.py: suggest_schema() method - builds a prompt, calls LLM,
  parses JSON response (handles markdown fences), falls back to
  generic schema on parse failure
- routes.py: POST /api/workbench/suggest-schema route

Frontend:
- api.js: suggestOutputSchema() function
- WorkbenchPage.jsx: output schema textarea + Suggest Schema button
  in the create form. Schema is editable JSON, sent as output_schema
  on agent creation. Button disabled until name or prompt is filled.

129 tests pass (1 new E2E test for suggest-schema endpoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Wire output_schema to LangGraph response_format for SDK-level enforcement

When an agent has output_schema configured, it now does TWO things:

1. Prompt injection (existing) — schema is described in the system prompt
   so the LLM understands the expected structure
2. SDK enforcement (new) — schema is passed as response_format to
   create_react_agent(), which uses LangGraph's built-in structured
   output mechanism (provider-native or tool-based)

At runtime, structured_response from the LangGraph result takes
priority over raw message content. If the agent has no output_schema,
behavior is unchanged (markdown output from final message).

The output pipeline:
  output_schema defined → response_format=schema → structured_response → JSON
  no output_schema → final message content → markdown (default)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Always use structured_response with default schema

Every agent now always returns structured output via LangGraph's
response_format — no more untyped markdown strings.

Default schema (when no custom output_schema is set):
  {
    "message": "string (markdown)",
    "referenced_tickets": ["string"]
  }

This means:
- Plain agents → get {message: '...markdown...', referenced_tickets: [...]}
- Custom schema agents → get whatever schema they define
- Both enforced at SDK level via response_format, not just prompt

Changes:
- prompt_builder.py: DEFAULT_OUTPUT_SCHEMA, resolve_output_schema()
- service.py: always passes effective schema to create_react_agent
- routes.py: ui-config exposes default_output_schema for frontend
- Tests updated (132 pass)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
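The always-on default schema described above can be expressed as a fallback resolver: a custom schema wins, otherwise every agent gets the `{message, referenced_tickets}` shape. A sketch; the exact field descriptions in the repository may differ.

```python
DEFAULT_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "message": {"type": "string", "description": "markdown body"},
        "referenced_tickets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["message", "referenced_tickets"],
}

def resolve_output_schema(custom_schema=None):
    """Return the agent's custom schema if set, else the default structured shape."""
    return custom_schema or DEFAULT_OUTPUT_SCHEMA
```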

* Add comprehensive docs with mermaid diagrams, clean up stale docs

New: docs/AGENT_BUILDER.md — full architecture documentation with:
- Architecture diagram (module layers + data flow)
- Sequence diagram (agent run lifecycle)
- Structured output pipeline flowchart
- ER diagram (DB schema)
- Data/Calculations/Actions separation diagram
- Deep modules table
- Extensibility flowchart
- API endpoint reference
- Testing commands

Updated:
- AGENTS_IMPLEMENTATION.md — replaced stale content with summary + pointer
- docs/AGENTS.md — replaced stale architecture with mermaid + pointer
- docs/PROJECT_STRUCTURE.md — added agent_builder/ to tree

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Docs overhaul + remove ~1800 lines of dead code/stale docs

Documentation:
- README.md: Complete rewrite with features table, screenshots, mermaid
  architecture diagram, agent builder section, correct tech stack
- PROJECT_STRUCTURE.md: Full rewrite matching actual codebase
- AGENTS.md: Fixed AgentService→WorkbenchService, updated examples
- LEARNING.md: Fixed broken link

Deleted stale docs:
- AGENTS_IMPLEMENTATION.md (was a 3-line redirect stub)
- docs/RULES.md (empty file)
- docs/SQLMODEL_MIGRATION.md (historical, migration complete)

Dead code removed from agents.py (~250 lines):
- MCP client stubs (_mcp_tool_to_langchain, _ensure_ticket_mcp_connection, close)
- Schema helpers only used by dead MCP code (_json_type_to_python, _schema_to_pydantic)
- OpenAI logging callback (duplicated in agent_builder/engine/callbacks.py)
- _build_state_graph learning example (dead code)
- Unused imports (get_langchain_tools, MCPClient, create_model)

Deleted old agent_workbench/ source files (~1030 lines):
- models.py, service.py, evaluator.py, tool_registry.py
- Only __init__.py shim remains for backward compatibility

132 backend tests + 15 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Playwright tests for suggest-schema and agent chat

New E2E tests in workbench.spec.js:
- 'creates agent with output schema via suggest button' — mocks
  /api/workbench/suggest-schema, clicks Suggest Schema, verifies
  schema populates textarea, creates agent, deletes it
- 'sends message and displays mocked response' (Agent Chat UI) —
  mocks /api/agents/run, types message, clicks send, verifies
  markdown heading and tool badge render

17 Playwright tests pass (was 15, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add VPN agent and failure handling Playwright tests

New Agent Fabric E2E tests:
- 'runs VPN troubleshooting agent and verifies structured output'
  Creates agent with VPN analysis prompt, runs it (mocked),
  verifies structured JSON output with ticket IDs (INC-101, INC-312),
  referenced_tickets field, and VPN content in rendered output
- 'handles agent run failure gracefully'
  Creates agent, runs it with mocked failure response,
  verifies UI doesn't crash and shows completion state

19 Playwright tests pass (was 17, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix structured output rendering in Agent Fabric UI

The output is now always structured JSON ({message, referenced_tickets}).
The UI now parses it and renders each part appropriately:

- message → rendered as GitHub-flavored Markdown (ReactMarkdown)
- referenced_tickets → rendered as monospace badges below the output
- Extra custom schema fields → rendered as formatted JSON in a pre block
- Button preview → shows message text, not raw JSON

Also handles non-JSON output gracefully (falls back to raw markdown).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MCP App technical documentation

New: docs/MCP_APP.md — comprehensive guide on how this project
works as an MCP application:

- What an MCP App is (app that exposes business logic via MCP protocol)
- Architecture diagrams: consumers (Claude, Copilot, agents) → MCP endpoint
- Full protocol sequence diagram (initialize → tools/list → tools/call)
- The @operation decorator: single source of truth for REST + MCP + LangChain
- How to connect clients (Claude Desktop, Python, curl examples)
- 4-layer architecture diagram (business logic → operations → adapters → consumers)
- Extension roadmap: Resources, Prompts, SSE streaming
- Security considerations table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
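The single-source-of-truth idea behind the @operation decorator is that a function is registered once and the REST, MCP, and LangChain adapters all read the same registry. A heavily simplified sketch, not the project's actual decorator signature.

```python
OPERATIONS: dict[str, dict] = {}

def operation(name: str, description: str = ""):
    """Register a plain function in a shared registry that each adapter
    (REST route, MCP tools/list, LangChain tool) can expose."""
    def wrap(fn):
        OPERATIONS[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@operation("csv_get_ticket", description="Fetch one ticket by ID")
def csv_get_ticket(ticket_id: str) -> dict:
    # Stand-in body; the real operation loads the ticket from the CSV source.
    return {"id": ticket_id}
```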

* Add SchemaRenderer + visual SchemaEditor with x-ui widget system

SchemaRenderer (frontend/src/features/workbench/SchemaRenderer.jsx):
- Generic component: takes {data, schema} and renders each property
  using x-ui widget annotations
- Widgets: markdown, table, badge-list, stat-card, bar-chart (Nivo),
  pie-chart (Nivo), json, hidden
- Auto-detection when no x-ui: string→markdown, integer→stat-card,
  array of objects→table, array of strings→badge-list, object→json
- Console debug logging, data-testid per field for E2E testing

SchemaEditor (frontend/src/features/workbench/SchemaEditor.jsx):
- Visual property list editor (no raw JSON editing needed)
- Add/remove properties, set name/type/description
- Widget picker dropdown with all available widgets
- Context-sensitive options (columns for table, label for stat-card,
  indexBy/keys for bar-chart)
- Syncs with suggest-schema: LLM suggestion populates visual editor
- Outputs valid JSON Schema with x-ui annotations

Backend:
- DEFAULT_OUTPUT_SCHEMA now has x-ui annotations (markdown + badge-list)
- suggest_schema prompt updated to suggest x-ui widgets per property

Wiring:
- WorkbenchPage uses SchemaRenderer for run output (replaces hardcoded)
- WorkbenchPage uses SchemaEditor for create form (replaces textarea)

20 Playwright tests pass (including new SchemaRenderer widget test).
132 backend tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
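The widget auto-detection rules listed above (string→markdown, integer→stat-card, array of objects→table, array of strings→badge-list, object→json) map cleanly to a small dispatcher. A Python sketch of logic that lives in JSX in the repository.

```python
def default_widget(prop_schema: dict) -> str:
    """Map a JSON Schema property to a renderer widget when no x-ui annotation
    overrides it. Falls back to the json widget for unknown shapes."""
    if "x-ui" in prop_schema:
        return prop_schema["x-ui"]
    t = prop_schema.get("type")
    if t == "string":
        return "markdown"
    if t == "integer":
        return "stat-card"
    if t == "array":
        item_type = prop_schema.get("items", {}).get("type")
        return "table" if item_type == "object" else "badge-list"
    return "json"
```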

* Improve suggest-schema prompt with full data domain + widget docs

The suggest-schema LLM prompt now includes:
- Ticket data domain (all field names, types, enum values, example cities)
- Available tools with descriptions (csv_list_tickets, csv_search_tickets, etc.)
- Full widget documentation with use-cases and options for each:
  markdown, table (columns), badge-list, stat-card (label),
  bar-chart (indexBy, keys), pie-chart, json, hidden
- Explicit rules: always include message+referenced_tickets,
  match widget to data shape, use snake_case names

This gives the LLM enough context to suggest schemas that actually
match the ticket data (e.g. status distribution → pie-chart,
ticket list → table with incident_id/summary/status columns).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix latency issues: schema title bug + recursion_limit headroom

Investigation found 3 root causes for slow AI calls:

1. gpt-5-nano is a REASONING model — burns 192-832 reasoning tokens
   per LLM call (invisible chain-of-thought), taking 2-8s each.
   A simple 'say hello' costs 8.4s with 832 reasoning tokens.

2. response_format adds a 3rd LLM call — LangGraph's
   generate_structured_response makes a separate LLM call to format
   the output as JSON after the ReAct loop finishes.
   Without: 4.7s (2 calls). With: 13s (3 calls).

3. Missing 'title' in output_schema crashed with_structured_output.
   OpenAI's API requires a top-level 'title' in the JSON Schema.

Fixes applied:
- resolve_output_schema() now auto-adds 'title': 'AgentOutput'
  when missing (both default and custom schemas)
- DEFAULT_OUTPUT_SCHEMA has explicit 'title' field
- recursion_limit: user's setting (default 3) is now multiplied by 4
  for the actual LangGraph graph, with a floor of 10. This prevents
  GraphRecursionError when response_format adds extra graph steps.

Note: The main latency driver (reasoning tokens) is inherent to the
model choice. Users can switch to gpt-4o-mini via per-agent 'model'
field for ~10x faster non-reasoning responses.

133 backend + 20 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix agent tool token bloat: compact fields + lower default limits

Root cause: csv_list_tickets tool returned full Ticket objects with ALL
fields (notes, description, resolution, work logs) — ~65K tokens for
100 tickets. The LLM had to process all of this, causing 30-60s per
step with a reasoning model.

Changes to operations.py:
- csv_list_tickets: returns compact dicts (10 fields, not 30+),
  default limit 25 (was 100), max limit 100 (was 500)
- csv_search_tickets: same compact treatment, limit 25 (was 50)
- csv_get_ticket: now accepts optional 'fields' parameter for
  selective detail drill-down, returns dict (was full Ticket)
- Tool descriptions updated to guide agents: 'use csv_get_ticket
  for full details' pattern

Token impact per tool call:
  Before: 100 tickets × ~650 tokens = ~65,000 tokens
  After:  25 tickets × ~60 tokens = ~1,500 tokens (97% reduction)

Expected latency improvement:
  Before: ~13s per tool call (65K token input processing)
  After:  ~3-5s per tool call (1.5K token input)

153 tests pass (133 backend + 20 Playwright).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
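The compaction pattern boils down to a field projection; a sketch (the exact compact field list is an assumption — the commit names incident_id, summary, and status among roughly ten fields):

```python
# Hypothetical compact projection; the real field list in operations.py may differ.
COMPACT_FIELDS = ("incident_id", "summary", "status", "priority", "assigned_group")

def to_compact(ticket: dict) -> dict:
    """Project a full ticket record onto a small field subset, dropping
    token-heavy fields (notes, description, resolution, work logs)."""
    return {k: ticket[k] for k in COMPACT_FIELDS if k in ticket}
```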

* Drop response_format to eliminate extra LLM call

LangGraph 1.0.8 implements response_format via a SEPARATE LLM call
(generate_structured_response) — adding 5-10s latency per run.
The refactor to inline tool-based structured output (github.com/
langchain-ai/langgraph/issues/5872) hasn't shipped yet.

Fix: remove response_format from create_react_agent. The system
prompt already instructs the LLM to produce JSON matching the
schema (via append_output_instructions). The frontend's
SchemaRenderer handles both parsed JSON and raw text gracefully.

Latency impact:
  Before: 3 LLM calls (decide tool + answer + format JSON) ~13s
  After:  2 LLM calls (decide tool + answer as JSON)       ~5s

When LangGraph ships inline structured output, we can re-enable
response_format with zero code changes (just pass it back to
build_react_agent).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enable OpenAI JSON mode for guaranteed valid JSON output

Adds response_format: {type: 'json_object'} to the ChatOpenAI
constructor via model_kwargs. This is a model-level setting that
constrains token generation to valid JSON — no extra LLM call,
no post-processing, just guaranteed JSON from every response.

This is different from LangGraph's response_format parameter
(which adds a separate LLM call). This is OpenAI's native JSON
mode applied at the API level during the same call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert JSON mode — incompatible with non-strict tool schemas

OpenAI's response_format: json_object requires all tools to have
strict schemas. Our tools (from @operation decorator) don't set
strict=True, causing: 'csv_search_tickets is not strict. Only
strict function tools can be auto-parsed'.

Reverting to prompt-only JSON enforcement, which tested at 3/3
reliability with gpt-5-nano. The frontend fallback (wraps non-JSON
as {message: raw_text}) provides additional safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add widget E2E tests + strict tools + Agent Chat JSON mode

New Playwright tests (23 total, +3):
- 'renders bar-chart and pie-chart from x-ui annotations' — injects
  mock agent with output_schema containing x-ui widgets, verifies
  SVG rendering for pie/bar charts, stat-card with label, badges
- 'renders raw JSON for object data' — verifies auto-detection:
  objects render as formatted JSON in pre blocks
- 'falls back gracefully for non-JSON output' — verifies plain
  markdown string wraps as {message: text} and renders correctly

Agent Chat (agents.py) fixes:
- Added JSON output mode (response_format: json_object)
- Added strict=True tool binding for compatibility
- Matches the same pattern as agent_builder

Strict tool binding (react_runner.py):
- build_react_agent pre-binds tools with strict=True
- Required for OpenAI JSON mode (response_format: json_object)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NameError: OpenAICallLoggingCallback was removed but still referenced

The class was deleted in the dead code cleanup but agents.py still
used it. Replaced with make_llm_logging_callback from agent_builder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 'Show in Menu' — agents appear as tabs in navigation

When an agent has show_in_menu=true, it appears as a tab in the
main navigation bar. Clicking it opens a dedicated run page with
just the input field, run button, and SchemaRenderer output.

Backend:
- AgentDefinition: new show_in_menu bool field (default false)
- AgentDefinitionCreate/Update: show_in_menu parameter
- Migration for existing DBs
- Service wires it through create/update

Frontend:
- WorkbenchPage: 'Show in menu' checkbox in create form
- App.jsx: fetches agents with show_in_menu=true, injects as tabs
- AgentRunPage.jsx: simple standalone run page (title, description,
  optional input, run button, SchemaRenderer output)
- Dynamic routes: /agent-run/{agentId}

E2E test:
- Creates agent via API with show_in_menu=true
- Verifies tab appears in navigation with agent name
- Clicks tab, verifies AgentRunPage renders
- Runs agent (mocked), verifies output with SchemaRenderer

24 Playwright + 133 backend = 157 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing tools to chat agent: csv_sla_breach_tickets, csv_ticket_stats

The SLA Breach page was slow because the chat agent (agents.py)
didn't have the csv_sla_breach_tickets tool. The prompt said
'call csv_sla_breach_tickets' but the tool didn't exist, so the
LLM tried to replicate SLA breach logic manually using
csv_list_tickets — fetching many tickets and reasoning over them.

Now the chat agent has all 6 CSV tools matching the operations:
- csv_list_tickets (existing)
- csv_get_ticket (existing)
- csv_search_tickets (existing)
- csv_ticket_fields (existing)
- csv_sla_breach_tickets (NEW — pre-computed, ~1000 tokens)
- csv_ticket_stats (NEW — aggregated stats, ~350 bytes)

Expected improvement: 1 tool call (~1000 tokens) instead of
multiple list calls + manual reasoning (~30-60K tokens).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add ticket detail modal and enhance CSV ticket table functionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor CSVTicketTable component: reorder DialogActions import for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add reasoning_effort config + new tools for major speed improvement

Performance:
- reasoning_effort='low' as default — reduces gpt-5-nano from
  512 reasoning tokens (~7s) to 0-192 tokens (~1-3s) per LLM call
- Configurable per agent: low (fast), medium, high (deep), default
- Both agent_builder and legacy chat agent use reasoning_effort='low'

New tools:
- csv_count_tickets: count matching tickets WITHOUT returning data.
  Lets the LLM check 'how many VPN tickets?' (~50 tokens) before
  deciding to fetch details (~5000 tokens)
- csv_search_tickets_with_details: search + return full details
  (notes, resolution, description) in ONE call. Eliminates the
  N × csv_get_ticket drill-down pattern that caused the
  'Ticket Knowledgebase Creator' to make 5+ sequential LLM calls

Impact on 'Ticket Knowledgebase Creator' agent:
  Before: search(compact) → get_ticket × N → generate = 5+ LLM calls × ~5s = 25s+
  After:  search_with_details(query, limit=10) → generate = 2 LLM calls × ~2s = 4s

Also fixed: removed stale response_format: json_object from build_llm
(was causing strict tool errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
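The counting tool amounts to a cheap predicate scan that never serializes ticket bodies; a sketch (filter semantics are an assumption):

```python
# Sketch of csv_count_tickets: count matches WITHOUT returning ticket data.
def csv_count_tickets(tickets: list[dict], **filters) -> int:
    """Return how many tickets match the given equality filters.
    Costs ~tens of tokens instead of thousands, since no rows are returned."""
    def matches(ticket: dict) -> bool:
        return all(ticket.get(key) == value for key, value in filters.items())
    return sum(1 for ticket in tickets if matches(ticket))
```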

* Update incident details in FALL_2_HARDWARE_PERIPHERIE and FALL_3_ZUGRIFF_BERECHTIGUNG documentation for consistency and clarity

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Fix: all E2E tests now clean up created agents

Two tests were creating agents via the UI but not deleting them,
leaving orphans in the DB after each test run:
- 'runs an agent and appends output to run button'
- 'requires and forwards configured run input'

Added Delete button clicks at the end of both tests.
All 10 agent-creating tests now verified to clean up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite workbench e2e tests for tabbed UI

- Add helpers: goToCreateTab, goToAgentsTab, createAgent, createAgentViaAPI, deleteAgentViaAPI, mockEmptyRuns
- Update 'creates and deletes' to use Create Agent tab and agent cards
- Update 'runs an agent' to verify output in RunsSidePanel
- Update 'requires input' to use card inline input field + Go button
- Update 'suggest schema' to navigate to Create tab first
- Update 'failure handling' to check error in run detail panel
- Refactor SchemaRenderer tests to use setupSchemaTest helper (API-created agents, run output in side panel)
- Keep Agent Chat UI and Show in Menu tests unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: redesign workbench with agent cards, runs side panel, and tabbed layout

- Split WorkbenchPage into tabbed UI: Agents (cards grid) + Create Agent
- AgentCardsPanel: icon cards with Run/Edit/Delete buttons per agent
- RunsSidePanel: scrollable run history with click-to-view output
- AgentEditDialog: edit existing agents via dialog
- AgentCreateForm: extracted creation form (reusable for create + edit)
- Added API functions: updateWorkbenchAgent, listAllRuns, getRun
- All 47 Playwright tests pass (12 workbench tests updated for new UI)
- Removed Ollama references from setup.sh and package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: LiteLLM fallback in agent_builder + add live lifecycle test

- Fixed agent_builder/engine/react_runner.py: ChatLiteLLM when no API key
- Fixed agent_builder/service.py: removed hard OpenAI key requirement
- Fixed agent_builder/chat_service.py: same
- Fixed RunsSidePanel output parsing for raw string output
- Added full lifecycle e2e test (live LLM): create → run → edit → re-run → verify history → delete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: suggest schema & tools, default no tools, pure function refactor

- 'Suggest Schema & Tools' button: LLM suggests output schema AND tool selection
- Backend: _build_suggest_prompt and _parse_suggest_response as pure functions
- Frontend: tools default to empty, populated by suggest response
- RunsSidePanel: pure calculations extracted (buildAgentMap, sortRunsNewestFirst,
  resolveOutputSchema, resolveAgentName, parseRunOutput, formatRelativeTime)
- All 49 Playwright tests pass (2 live LLM tests included)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: result dialog, chart rendering, markdown fence parsing

- Run results now open in a large Dialog (900px wide, 85vh max)
- Fixed parseRunOutput: strips markdown code fences from LLM output
- Fixed PieChartWidget: filters non-numeric values, formats labels
- Fixed BarChartWidget: accepts object {key: number} in addition to arrays
- Chart containers: 300px height, 600px max-width
- Tests: close dialog before cleanup (dialog blocks pointer events)
- All 49 Playwright tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: all-live Playwright tests, result dialog fix, runs panel fix

- Rewrote workbench tests: ZERO mocks, all 8 tests use live LLM
- Fixed RunsSidePanel: min-height for layout, runs visible on load
- Fixed parseRunOutput: strips markdown fences from LLM output
- Fixed chart widgets: pie/bar handle non-numeric values, proper sizing
- Fixed dialog close: tests use X button (in viewport) not Close (scrolled)
- Total: 43 tests, all passing, all live (1.1 min)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract shared parseRunOutput, add delete-all-runs

- Extracted parseRunOutput (fence-stripping + JSON parsing) into
  outputUtils.js — shared by RunsSidePanel and AgentRunPage
- Fixed AgentRunPage (show_in_menu): renders markdown instead of raw JSON
- Added DELETE /api/workbench/runs endpoint + trash button in Runs panel
- Runs panel: min-height 500px so content is visible on load

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
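A Python transliteration of the shared fence-stripping parser (the real helper lives in outputUtils.js; the function name and the exact fallback shape are assumptions based on the commit messages):

```python
import json
import re

def parse_run_output(raw: str):
    """Strip markdown code fences from LLM output, then try JSON;
    fall back to wrapping the raw text as {"message": ...}."""
    text = raw.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"message": raw}
```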

* feat: add SSE activity monitor, settings page, agent templates & run history

- Agent Activity page with real-time SSE event stream (tool calls, LLM
  thinking, run lifecycle), filterable by run_id via URL query param
- EventBus pub/sub + StreamingCallbackHandler wired into ReAct engine
- Settings page: drag-and-drop tab reorder, hide/show toggles, icon
  picker (57 FluentUI icons), persisted to localStorage
- Agent templates dropdown (KBA from tickets, worklog stats, next step
  advisor) pre-fills the create agent form
- AgentRunPage now shows filtered run history with detail dialog and
  link to Activity page filtered by run_id
- 19 new Playwright E2E tests (8 activity + 11 settings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
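The EventBus described above can be sketched as a small in-process pub/sub with bounded history (class and method names assumed; the real implementation lives in agent_builder/engine/event_bus.py):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentEvent:
    run_id: str
    event_type: str          # e.g. "tool_start", "llm_thinking", "run_end"
    data: dict = field(default_factory=dict)

class EventBus:
    """Minimal in-process pub/sub with bounded history for SSE replay."""
    def __init__(self, history_size: int = 100):
        self._subscribers: list[Callable[[AgentEvent], None]] = []
        self._history: deque[AgentEvent] = deque(maxlen=history_size)

    def subscribe(self, handler: Callable[[AgentEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: AgentEvent) -> None:
        self._history.append(event)
        for handler in self._subscribers:
            handler(event)

    def history(self) -> list[AgentEvent]:
        return list(self._history)
```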

* feat: add Support Workflow canvas page with interactive editor

Purely browser-side workflow visualization using HTML Canvas:
- 5 node types: Start, End, Action, Decision, Wait (each with
  distinct shape and color)
- Drag-and-drop to reposition nodes
- Shift+drag to create connections between nodes
- Double-click to rename nodes inline
- Animate button shows flowing dots along edges
- Toolbar to add/delete nodes, reset to default workflow
- Default workflow: Ticket Created → Auto-Classify → Priority
  decision → L1/L2 paths → Resolved decision → Close/Reopen
- 9 Playwright E2E tests with screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: metro-map workflow with presets, color picker, agent assignment

Rewrite WorkflowPage as metro-map style inspired by Incident &
Problem Solving methodology:
- 3 workflow presets: Incident Solving, Problem Solving, Change Mgmt
- Metro station circle nodes with thick colored edge lines
- Edge color inherited from outgoing node
- Click node → dialog with color picker (8 colors) and agent selector
  (10 agent presets)
- Agent indicator dot on nodes with assigned agents
- Color legend auto-generated from used colors
- 12 Playwright E2E tests covering presets, node config, animation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: friendlier workflow editor — connect mode, double-click add, dialog edges

- Connect Mode toggle button: click source node then target to draw edge
  (no shift key needed). Crosshair cursor + green '+' hint on target.
- Double-click empty canvas area to add a node at that position
- Node dialog now has 'Connect to…' section with buttons for each
  unconnected node — draw edges without touching the canvas
- Add Node button opens config dialog immediately for the new node
- Dynamic help text updates based on current mode
- Escape key exits connect mode
- Updated Playwright tests for new UX

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add 'Improve my Prompt' button to Agent Fabric

LLM-powered prompt improvement following 2025 best practices:
- Backend: /api/workbench/improve-prompt endpoint + service method
  that rewrites prompts with clear role, goals, numbered steps,
  tool references, output format, and constraints
- Frontend: '✨ Improve my Prompt' button below the system prompt
  textarea, disabled when empty, replaces prompt with improved version
- 4 Playwright E2E tests with before/after screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: prompt improvement skips output format (handled by schema)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve-prompt uses selected tools, not all available

Pass tool_names from frontend form state so the LLM only references
tools the user actually selected for this agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
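The pure prompt-builder behind this fix might look roughly like the following (function name and instruction wording are assumptions; the key point is that only the user-selected tool names are interpolated):

```python
# Hypothetical sketch — only tools the user selected are referenced.
def build_improve_prompt(original: str, tool_names: list[str]) -> str:
    """Assemble the rewrite instruction for the improve-prompt LLM call."""
    tools = ", ".join(tool_names) if tool_names else "none"
    return (
        "Rewrite the following agent system prompt with a clear role, goals, "
        "numbered steps, and constraints. Reference only these tools: "
        f"{tools}.\n\nOriginal prompt:\n{original}"
    )
```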

* fix: remove maxHeight on tools list to avoid scrolling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace worklog template with Topic & Product Analysis

Worklog columns in data.csv are all empty/zero. New template analyzes
topics, products, services, priority distribution, and group workload
using data that actually exists in the CSV.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#25)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Interactive Jupyter notebook series teaching prompt optimization with DSPy.
Organized by learning concepts from Grokking Simplicity and A Philosophy
of Software Design.

Structure:
- 8 notebooks (00-07): Introduction → Data/Calc/Actions → Deep Modules →
  Evaluation as Spec → Optimizer as Compiler → Domain Tuning → Agentic → Finale
- 20 tasks across 4 tiers: Basics, Reasoning, Composition, Agentic
- dspy_tasks/ library: data.py (DATA), calculations.py (CALCULATIONS),
  actions.py (ACTIONS), tools.py, visualize.py (ipywidgets + Plotly)
- 16 JSON datasets (13 generic + 3 CSV-derived from ticket data)
- 165 passing pytest tests covering signatures, metrics, and registry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
abossard and others added 7 commits March 24, 2026 20:08
litellm PyPI versions 1.82.7 and 1.82.8 were compromised by attacker
TeamPCP with credential-stealing malware. See:
BerriAI/litellm#24518

Pinned to known-safe versions:
- backend: litellm==1.82.1
- notebooks: litellm==1.82.6

Do NOT upgrade until BerriAI confirms PyPI is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NB02: removed 'Beliebige Aufgabe' section (belongs in NB03)
NB03: 6-step arc with task catalog + domain tuning:
  1. See all 20 tasks
  2. Pick any task and optimize it
  3. Load real ticket data
  4. Run generic prompt → mediocre
  5. Tune with domain data → much better
  6. Takeaway: your data is your moat

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dump_state() can return a list or dict depending on DSPy version.
Handle both gracefully with type checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
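A defensive normalizer in that spirit (the real handling may differ; merging list entries into one dict is an assumption):

```python
# Sketch: coerce dspy dump_state() output (dict or list, depending on version)
# into a single dict. The merge strategy for the list case is an assumption.
def normalize_state(state):
    if isinstance(state, dict):
        return state
    if isinstance(state, list):
        merged = {}
        for item in state:
            if isinstance(item, dict):
                merged.update(item)
        return merged
    return {"value": state}
```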
- Updated task definitions in notebooks to use more descriptive field names (e.g., 'query' to 'question').
- Changed the default task in domain tuning notebook from "sentiment" to "plan_execute".
- Improved agent behavior optimization by refining prompts and adding explanations for model choices.
- Enhanced search functionality in tools to provide better ticket search results and counts.
- Updated calculations for plan quality and self-correct accuracy to align with new output structures.
- Added MIPROv2 optimization step to improve agent responses based on vague prompts.
- Adjusted dataset for search agent to include more complex queries and answers.
- Updated kernel specifications across notebooks to use Python 3.13.12.
…agent optimization examples. Change model to 'github_copilot/gpt-4o-mini' for faster performance. Enhance explanations for prompt optimization and MIPROv2. Adjust markdown formatting and update kernel specifications across notebooks.
…ot and notebooks directory

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 08:56
@abossard abossard self-assigned this Mar 25, 2026

Copilot AI left a comment


Pull request overview

Adds a “DSPy/LLM playground” developer experience layer (docs + editor config) while also introducing substantial backend functionality for agent tooling, auto-generation scheduling, and persistence.

Changes:

  • Added extensive documentation for KBA Drafter, Agent Builder, CSV AI guidance, and Ubuntu setup.
  • Introduced Agent Builder module + workbench integration, including REST routes, tool registry, evaluation, and SSE event streaming.
  • Added auto-generation scheduler/service/models and migrated tasks.py storage from in-memory to SQLite/SQLModel, plus expanded backend dependencies and tests.

Reviewed changes

Copilot reviewed 89 out of 227 changed files in this pull request and generated 10 comments.

File Description
docs/KBA_DRAFTER.md New KBA Drafter architecture + API + ops documentation.
docs/INSTALL_UBUNTU.md New Ubuntu 22.04 prerequisite install guide.
docs/CSV_AI_GUIDANCE.md New guidance for agents using CSV ticket tools.
docs/AGENT_BUILDER.md New Agent Builder module documentation with diagrams.
backend/workbench_integration.py Wires project operations into Agent Builder ToolRegistry + services.
backend/usecase_demo.py Adds in-memory background runner with markdown→JSON row extraction.
backend/tests/test_workbench_integration_e2e.py E2E-style REST verification for workbench integration with mocked LLM.
backend/tests/test_usecase_demo.py Async tests for UsecaseDemoRunService execution + timeout handling.
backend/tests/test_search_questions.py Adds tests for search-questions schema + cleaning behavior.
backend/tests/test_litellm_integration.py Adds opt-in live LiteLLM/Copilot integration tests (skipped if unauthenticated).
backend/tests/test_kba_schema.py Adds JSON Schema validation tests for legacy KBA schema.
backend/tests/test_agents.py Adds basic tests for operation registry + langchain tool conversion.
backend/tests/conftest.py Ensures backend import path is set up for tests.
backend/test_auto_gen.py Adds a manual “quick test” script for auto-generation components.
backend/tasks.py Migrates tasks to SQLModel/SQLite; adds engine/session helpers and DB init on import.
backend/scheduler.py Adds APScheduler-based auto-generation scheduler + manual trigger.
backend/requirements.txt Expands backend dependencies for agents, SQLModel, schedulers, LLM tooling.
backend/pytest.ini Adds pytest config for async mode and discovery.
backend/mcp_handler.py Adds MCP JSON-RPC handler routing to unified operations.
backend/kba_schemas.py Adds legacy Draft-07 JSON Schema + example for KBA output.
backend/kba_prompts.py Adds prompt builders for KBA generation, retries, markdown fallback, search questions.
backend/kba_output_models.py Adds Pydantic models/validators for structured KBA output + search questions.
backend/kba_exceptions.py Adds explicit KBA exception hierarchy.
backend/kba_audit.py Adds audit logging service for KBA draft lifecycle.
backend/kb_published/KB-81C44260-gerätewechsel-für-vpn-probleme-im-efd-durchführen.md Adds a published KBA markdown artifact.
backend/auto_gen_service.py Implements ticket selection + sequential draft generation and settings updates.
backend/auto_gen_models.py Adds settings DTOs + SQLModel persistence table + run result model.
backend/api_decorators.py Extends Operation with MCP arg parsing, JSON serialization, LangChain tool conversion.
backend/agent_workbench/__init__.py Adds backward-compat shim re-exporting Agent Builder API.
backend/agent_builder/tools/schema_converter.py Converts JSON Schema → Pydantic args models (pure).
backend/agent_builder/tools/registry.py Tool registry for dependency-injected StructuredTools.
backend/agent_builder/tools/mcp_adapter.py Adapts external MCP tools into LangChain StructuredTools.
backend/agent_builder/tools/__init__.py Re-exports tool helpers.
backend/agent_builder/tests/test_service.py Adds WorkbenchService CRUD/tool introspection tests.
backend/agent_builder/tests/test_schema_converter.py Adds schema converter tests.
backend/agent_builder/tests/test_registry.py Adds ToolRegistry tests.
backend/agent_builder/tests/test_prompt_builder.py Adds prompt builder tests (schema/defaults/efficiency mode).
backend/agent_builder/tests/test_persistence.py Adds repository tests with real temp SQLite DB.
backend/agent_builder/tests/test_models.py Adds extensive model validation/roundtrip tests.
backend/agent_builder/tests/test_evaluator.py Adds evaluator tests (criteria types + scoring + llm_judge guard).
backend/agent_builder/tests/test_engine.py Adds extract_tools_used tests.
backend/agent_builder/routes.py Adds Quart blueprint routes for workbench CRUD/runs/eval + SSE stream.
backend/agent_builder/persistence/repository.py Adds persistence repository for agents/runs/evaluations.
backend/agent_builder/persistence/database.py Adds engine builder + lightweight SQLite migrations.
backend/agent_builder/persistence/__init__.py Re-exports persistence public API.
backend/agent_builder/models/run.py Adds SQLModel run table + JSON property accessors + DTO.
backend/agent_builder/models/evaluation.py Adds criteria/evaluation models + SQLModel table.
backend/agent_builder/models/chat.py Adds chat request/response models.
backend/agent_builder/models/agent.py Adds agent definition models + JSON-backed fields + create/update DTOs.
backend/agent_builder/models/__init__.py Re-exports models API.
backend/agent_builder/evaluator.py Implements criteria evaluation + optional llm_judge + scoring.
backend/agent_builder/engine/react_runner.py Implements LLM construction, ReAct agent execution, tool usage extraction.
backend/agent_builder/engine/prompt_builder.py Implements structured output schema prompting + efficiency mode prompt.
backend/agent_builder/engine/event_bus.py Adds in-process pub/sub event bus with history for SSE.
backend/agent_builder/engine/callbacks.py Adds logging + SSE publishing callbacks for tools/LLM calls.
backend/agent_builder/engine/__init__.py Re-exports engine API.
backend/agent_builder/chat_service.py Adds one-shot chat service using shared ReAct runner.
backend/agent_builder/__init__.py Defines Agent Builder public API surface.
backend/=3.10.4 Adds a file containing pip install output (likely accidental).
CLAUDE.md Adds Claude Code guidance: setup/run/test/architecture notes.
.vscode/launch.json Adds VS Code launch configs for backend/frontend/notebooks/browser.
.vscode/extensions.json Adds recommended VS Code extensions.
.github/copilot-instructions.md Rewrites Copilot guidance for CSV-ticket learning scope and boundaries.
.gitattributes Enables nbstripout + notebook diff driver configuration.
.env.example Adds documented environment variable template for LLM/KBA features.
.dockerignore Updates ignore rules to align with .venv naming.
.claude/settings.local.json Updates Claude allowed command to source .venv.

Comment on lines +86 to +89
        Result dictionary with generation stats
        """
        logger.info("Manual trigger of auto-generation")
        result = await self._run_auto_generation()

Copilot AI Mar 25, 2026


run_now() assumes _run_auto_generation() always returns an object with model_dump(), but _run_auto_generation() returns None when auto-generation is disabled (and also on exceptions). This will raise an AttributeError on manual runs; return a consistent result shape (e.g., an AutoGenRunResult with success=False + reason) or handle None explicitly in run_now().

Suggested change

-        Result dictionary with generation stats
-        """
-        logger.info("Manual trigger of auto-generation")
-        result = await self._run_auto_generation()
+        Result dictionary with generation stats. If auto-generation
+        did not run (e.g., disabled or failed), returns a failure
+        dictionary with a reason message.
+        """
+        logger.info("Manual trigger of auto-generation")
+        result = await self._run_auto_generation()
+        if result is None:
+            # Ensure we always return a dictionary, even when auto-generation
+            # is disabled or fails with an exception.
+            return {
+                "success": False,
+                "reason": "Auto-generation did not run; see logs for details.",
+            }

Comment on lines +100 to +103
        settings = self.auto_gen_service.get_settings()
        if not settings.enabled:
            logger.info("Auto-generation is disabled, skipping run")
            return None

Copilot AI Mar 25, 2026


run_now() assumes _run_auto_generation() always returns an object with model_dump(), but _run_auto_generation() returns None when auto-generation is disabled (and also on exceptions). This will raise an AttributeError on manual runs; return a consistent result shape (e.g., an AutoGenRunResult with success=False + reason) or handle None explicitly in run_now().

Comment on lines +185 to +193
    def on_tool_end(self, output: str, *, run_id: Any = None, **kwargs: Any) -> None:
        started = self._start_times.pop(run_id, None)
        duration_ms = int((perf_counter() - started) * 1000) if started is not None else None
        preview = output[:500] if isinstance(output, str) else str(output)[:500]
        event_bus.publish(AgentEvent(
            run_id=run_id_outer,
            event_type="tool_end",
            data={"tool_name": kwargs.get("name", ""), "output": preview, "duration_ms": duration_ms},
        ))

Copilot AI Mar 25, 2026


The SSE tool_end event reports tool_name using kwargs.get("name"), but on_tool_end typically doesn’t receive the tool name in kwargs; this will often publish empty tool names and break UI/telemetry correlation. Persist the tool name from on_tool_start keyed by callback run_id (similar to _start_times) and reuse it in on_tool_end/on_tool_error.
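The reviewer's suggestion can be sketched as follows (class and attribute names are assumptions; the real callback also publishes to the event bus rather than returning the event):

```python
from time import perf_counter

class ToolEventTracker:
    """Sketch: remember the tool name and start time from on_tool_start,
    keyed by the callback run_id, so on_tool_end can report them instead
    of relying on kwargs.get("name")."""

    def __init__(self):
        self._tool_names: dict = {}
        self._start_times: dict = {}

    def on_tool_start(self, serialized: dict, input_str: str, *, run_id=None, **kwargs):
        self._tool_names[run_id] = (serialized or {}).get("name", "")
        self._start_times[run_id] = perf_counter()

    def on_tool_end(self, output, *, run_id=None, **kwargs) -> dict:
        name = self._tool_names.pop(run_id, "")
        started = self._start_times.pop(run_id, None)
        duration_ms = int((perf_counter() - started) * 1000) if started is not None else None
        return {"tool_name": name, "output": str(output)[:500], "duration_ms": duration_ms}
```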

Comment on lines +101 to +104
            count = session.exec(select(AgentRun)).all().__len__()
            session.exec(sql_delete(AgentRun))
            session.commit()
            return count

Copilot AI Mar 25, 2026


delete_all_runs() loads all runs into memory just to compute the count. Prefer a COUNT(*) query (or use a delete statement with rowcount if supported by the driver) to avoid O(n) memory/time for large tables.

Suggested change

-            count = session.exec(select(AgentRun)).all().__len__()
-            session.exec(sql_delete(AgentRun))
-            session.commit()
-            return count
+            result = session.exec(sql_delete(AgentRun))
+            session.commit()
+            # result.rowcount may be None if not supported; normalize to int
+            return int(result.rowcount or 0)
Comment on lines +1 to +9
Collecting APScheduler
Downloading apscheduler-3.11.2-py3-none-any.whl.metadata (6.4 kB)
Collecting tzlocal>=3.0 (from APScheduler)
Downloading tzlocal-5.3.1-py3-none-any.whl.metadata (7.6 kB)
Downloading apscheduler-3.11.2-py3-none-any.whl (64 kB)
Downloading tzlocal-5.3.1-py3-none-any.whl (18 kB)
Installing collected packages: tzlocal, APScheduler

Successfully installed APScheduler-3.11.2 tzlocal-5.3.1

Copilot AI Mar 25, 2026


This looks like accidental pip output committed as a file (backend/=3.10.4). It will confuse contributors and pollute the repo; remove it and consider adding a guard (e.g., .gitignore patterns) to prevent similar artifacts from being committed.

Suggested change
Collecting APScheduler
Downloading apscheduler-3.11.2-py3-none-any.whl.metadata (6.4 kB)
Collecting tzlocal>=3.0 (from APScheduler)
Downloading tzlocal-5.3.1-py3-none-any.whl.metadata (7.6 kB)
Downloading apscheduler-3.11.2-py3-none-any.whl (64 kB)
Downloading tzlocal-5.3.1-py3-none-any.whl (18 kB)
Installing collected packages: tzlocal, APScheduler
Successfully installed APScheduler-3.11.2 tzlocal-5.3.1

Comment on lines +125 to +131
# Database path - use environment variable or default to data/tasks.db
DB_PATH = Path(__file__).parent / "data" / "tasks.db"
DB_PATH.parent.mkdir(parents=True, exist_ok=True)

# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
Copilot AI Mar 25, 2026

The comment says the DB path can be set via environment variable, but the implementation always uses Path(__file__).parent / "data" / "tasks.db". Either read an env var (e.g., TASKS_DB_PATH) or adjust the comment so behavior is accurate.

Comment on lines 242 to +254
def get_stats() -> TaskStats:
"""
Get task statistics.

Consolidated stats calculation - we could split into
count_total(), count_completed(), count_pending(), but
that's unnecessary fragmentation. This is clearer.
Get task statistics using SQLModel queries.
"""
all_tasks = list(_tasks_db.values())
completed = sum(1 for task in all_tasks if task.completed)

return TaskStats(
total=len(all_tasks),
completed=completed,
pending=len(all_tasks) - completed
)
with get_session() as session:
total = len(session.exec(select(Task)).all())
completed = len(session.exec(select(Task).where(Task.completed == True)).all()) # noqa: E712

return TaskStats(
total=total,
completed=completed,
pending=total - completed
)
Copilot AI Mar 25, 2026

get_stats() materializes full result sets just to count rows, which will scale poorly as tasks grow. Use COUNT(*) aggregation queries to compute total and completed without loading rows into Python.
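The difference is easiest to see with stdlib sqlite3 (table and column names invented for illustration; in SQLModel the equivalent would be something like `session.exec(select(func.count()).select_from(Task)).one()`): count in the database instead of materializing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, completed INTEGER)")
conn.executemany("INSERT INTO task (completed) VALUES (?)", [(1,), (0,), (1,)])

# COUNT(*) aggregates server-side; no rows are loaded into Python.
total = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
completed = conn.execute(
    "SELECT COUNT(*) FROM task WHERE completed = 1"
).fetchone()[0]
pending = total - completed
```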

Comment on lines +127 to 147
DB_PATH.parent.mkdir(parents=True, exist_ok=True)

# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)


def init_db():
"""Initialize database - create all tables."""
SQLModel.metadata.create_all(engine)


def get_session():
"""Get database session."""
return Session(engine)


# Initialize database on module import
init_db()


Copilot AI Mar 25, 2026

Initializing/creating the SQLite DB on module import introduces side effects during import (creates directories/files, runs DDL) which can be problematic in test discovery, tooling, or multi-process deployments. Consider moving initialization to an explicit app startup hook (or a lazy “init once” path) so imports remain side-effect free.

Suggested change
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
def init_db():
"""Initialize database - create all tables."""
SQLModel.metadata.create_all(engine)
def get_session():
"""Get database session."""
return Session(engine)
# Initialize database on module import
init_db()
# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
_DB_INITIALIZED = False
def init_db() -> None:
"""Initialize database - ensure directory exists and create all tables.
This function is idempotent and safe to call multiple times.
"""
global _DB_INITIALIZED
if _DB_INITIALIZED:
return
# Create parent directory for the SQLite file if it does not exist.
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
# Create all tables defined on SQLModel metadata.
SQLModel.metadata.create_all(engine)
_DB_INITIALIZED = True
def get_session() -> Session:
"""Get database session, initializing the database on first use."""
init_db()
return Session(engine)
# Database initialization is performed lazily when a session is first requested.

elif hasattr(result, 'model_dump'):
return json.dumps(result.model_dump(mode='json'), indent=2)
elif isinstance(result, bool):
return f"Success: {result}"
Copilot AI Mar 25, 2026

serialize_result() returns a non-JSON string for bool results ("Success: True"), while other branches serialize JSON. For MCP clients expecting consistent JSON text (especially if they parse tool outputs), this inconsistency can cause failures; prefer serializing booleans as JSON (e.g., true/false with json.dumps).

Suggested change
return f"Success: {result}"
return json.dumps(result, indent=2)
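A self-contained sketch of the point (the surrounding branches are paraphrased, not the file's exact code): routing booleans through `json.dumps` keeps every output parseable as JSON.

```python
import json

def serialize_result(result) -> str:
    """Sketch: every branch emits valid JSON text, so clients can
    always json.loads() the tool output."""
    if hasattr(result, "model_dump"):  # pydantic/SQLModel objects
        return json.dumps(result.model_dump(mode="json"), indent=2)
    # bool, int, str, list, dict all round-trip through json.dumps.
    return json.dumps(result, indent=2)
```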

# ============================================================================

class Task(BaseModel):
class Task(SQLModel, table=True):
Copilot AI Mar 25, 2026

The PR description emphasizes onboarding/documentation improvements, but the diff includes major functional/backend changes (e.g., migrating tasks persistence to SQLite/SQLModel, adding agent builder modules, scheduler/auto-generation). Consider updating the PR description to explicitly call out these runtime-affecting changes (or splitting into separate PRs) so reviewers can scope risk and release impact accurately.

- Cleaned up README.md by removing unnecessary blank lines.
- Updated test_llm_service.py to mock OpenAI API key for better error handling.
- Added multiple new screenshot files for documentation and demo purposes.
- Minor formatting adjustments in setup.sh to remove extra blank lines.
- Changed app.spec.js to wait for the full page load instead of DOM content loaded for more reliable test execution.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…M service tests and evaluation notebook

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:13

Copilot AI left a comment


Pull request overview

Copilot reviewed 68 out of 79 changed files in this pull request and generated 2 comments.

Comment on lines +59 to +64
def test_sentiment_baseline_scores(self):
"""Full baseline run on sentiment should produce a score > 0."""
result = run_baseline("sentiment", get_default_model(), max_eval=5)
assert result.score > 0.0
assert len(result.individual_scores) == 5
assert result.elapsed_seconds > 0
Copilot AI Mar 25, 2026

run_baseline is defined as run_baseline(task_id: str, *, max_eval: Optional[int] = None) and uses the currently configured DSPy LM. These calls pass a model as a positional arg, which will raise TypeError: run_baseline() takes 1 positional argument but 2 were given. Configure the model via configure_dspy(model=...) (or dspy.context) and call run_baseline("sentiment", max_eval=...) instead.

Comment on lines +115 to +123
def test_bootstrap_fewshot_runs(self):
"""BootstrapFewShot optimization should complete and return scores."""
result = run_optimization(
"sentiment",
get_default_model(),
"BootstrapFewShot",
max_eval=5,
)
assert result.baseline_score >= 0.0
Copilot AI Mar 25, 2026

run_optimization is defined as run_optimization(task_id: str, optimizer: str = "BootstrapFewShot", *, max_eval=None, instructions=None) and does not take a model argument. Passing get_default_model() as the second positional argument will be interpreted as the optimizer string, and passing "BootstrapFewShot" as the third positional argument will raise a TypeError. Configure the LM separately (e.g., configure_dspy(get_default_model())) and call run_optimization("sentiment", optimizer="BootstrapFewShot", max_eval=5) (or pass instructions= if intended).

…ChatService and WorkbenchService, and enhance LLM selection tests

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
… script

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:29

Copilot AI left a comment


Pull request overview

Copilot reviewed 74 out of 85 changed files in this pull request and generated no new comments.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ntRunPage and RunsSidePanel components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 10:10

Copilot AI left a comment


Pull request overview

Copilot reviewed 76 out of 87 changed files in this pull request and generated 19 comments.

Comment on lines +103 to +107
def test_ticket_routing_baseline(self):
"""Ticket routing baseline on real CSV-derived data."""
result = run_baseline("ticket_routing", get_default_model(), max_eval=3)
assert result.score >= 0.0
assert len(result.individual_scores) == 3
Copilot AI Mar 25, 2026

Same issue as above: run_baseline is keyword-only after task_id, so passing get_default_model() positionally will raise TypeError. Configure the model via configure_dspy(model=...) before calling run_baseline(...), or change the action signature to accept a model.

Comment on lines +81 to +85
def test_math_baseline_scores(self):
"""Math baseline should run and score without errors."""
result = run_baseline("math_word", get_default_model(), max_eval=3)
assert result.score >= 0.0
assert result.llm_calls > 0
Copilot AI Mar 25, 2026

Same issue as above: run_baseline doesn’t accept a positional model argument (only task_id + keyword args). This call will raise a TypeError unless you configure the model separately via configure_dspy(model=...).

Comment on lines +142 to +145
scores = {}
for model in chat_models:
result = run_baseline("sentiment", model, max_eval=3)
scores[model] = result.score
Copilot AI Mar 25, 2026

This loop calls run_baseline("sentiment", model, max_eval=3) but run_baseline does not accept a positional model argument. This will raise TypeError; configure the model with configure_dspy(model) before each baseline run (or change run_baseline to accept a model).

Comment on lines +1 to +8
"""
End-to-end tests for the DSPy Playground — runs real LLM calls via LiteLLM.

These tests hit the actual Copilot/LiteLLM backend. They verify the full
pipeline: config → DSPy module → LiteLLM → model API → metric scoring.

Run: cd notebooks && python -m pytest tests/ -v
"""
Copilot AI Mar 25, 2026

These tests are explicitly making real LLM calls (network + cost + rate limits), which makes the default test suite flaky and non-deterministic in CI. Consider gating them behind an env var (e.g. RUN_LIVE_LLM_TESTS=1) and pytest.skip by default, or marking them (e.g. @pytest.mark.live) and excluding that marker in CI.
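One way to build the suggested gate (the env var name `RUN_LIVE_LLM_TESTS` is the comment's own example, not something the repo defines):

```python
import os

def live_llm_enabled() -> bool:
    """Live LLM tests run only when explicitly opted in via the environment."""
    return os.environ.get("RUN_LIVE_LLM_TESTS") == "1"

# In a pytest module the gate would typically be applied once at the top:
#   pytestmark = pytest.mark.skipif(
#       not live_llm_enabled(),
#       reason="set RUN_LIVE_LLM_TESTS=1 to run live LLM tests",
#   )
```

CI then stays deterministic by default, while developers can still run the full pipeline locally with `RUN_LIVE_LLM_TESTS=1 pytest tests/ -v`.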

Comment on lines +5 to +7
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)
Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.
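A sketch of the portable alternative, assuming the script sits one level below the repo root (e.g. in a `scripts/` folder — the layout is an assumption, adjust `parents[...]` to match):

```python
from pathlib import Path

def data_csv_path(script_file: str) -> Path:
    """Resolve csv/data.csv relative to the repo root rather than
    hard-coding a developer-specific absolute path."""
    return Path(script_file).resolve().parents[1] / "csv" / "data.csv"

# Usage inside the script itself:
#   with open(data_csv_path(__file__), encoding="utf-8", errors="replace") as f:
#       rows = list(csv.DictReader(f))
```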

Comment on lines +214 to +233
def _evaluate_examples(module, examples, metric_fn, timeout_per_example: int = 30) -> list[dict]:
import signal

class TimeoutError(Exception):
pass

def _handler(signum, frame):
raise TimeoutError("LLM call timed out")

results = []
for i, ex in enumerate(examples):
print(f" [{i+1}/{len(examples)}]", end=" ", flush=True)
try:
old_handler = signal.signal(signal.SIGALRM, _handler)
signal.alarm(timeout_per_example)
input_kwargs = {k: ex[k] for k in ex.inputs().keys()}
prediction = module(**input_kwargs)
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)
score = float(metric_fn(ex, prediction) or 0.0)
Copilot AI Mar 25, 2026

signal.SIGALRM is not available on Windows and alarm-based timeouts only work reliably in the main thread. As written, this helper will crash on platforms without SIGALRM. Consider guarding with hasattr(signal, "SIGALRM") and falling back to no timeout (or another timeout mechanism) when unavailable.
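A hedged sketch of the suggested guard (the context-manager shape is one possible refactor, not the file's code): fall back to no timeout when SIGALRM is unavailable or we are not in the main thread.

```python
import signal
from contextlib import contextmanager

@contextmanager
def alarm_timeout(seconds: int):
    """Best-effort timeout: uses SIGALRM where available (POSIX, main
    thread); silently becomes a no-op elsewhere (Windows, worker threads)."""
    if not hasattr(signal, "SIGALRM"):
        yield  # platform has no SIGALRM: run without a timeout
        return

    def _handler(signum, frame):
        raise TimeoutError("LLM call timed out")

    try:
        old_handler = signal.signal(signal.SIGALRM, _handler)
    except ValueError:  # signal.signal only works in the main thread
        yield
        return
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

with alarm_timeout(5):
    result = sum(range(10))  # stand-in for the real module(**input_kwargs) call
```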

Comment on lines +252 to +258
"from dspy_tasks.config import configure_dspy\n",
"from dspy_tasks.data import ClassifySentiment\n",
"import dspy\n",
"\n",
"lm = \n",
"tricky_reviews = [\n",
" (\"Super, schon wieder ein Produkt das nach einer Woche kaputt geht. Genau was ich brauchte.\", \"negative\"),\n",
Copilot AI Mar 25, 2026

This notebook cell contains an incomplete statement (lm =) which will cause a SyntaxError when executed. Remove it or replace it with the intended model configuration (e.g. call configure_dspy(...) and assign the returned LM if you need it).

Comment on lines +85 to +93
"source": [
"\n",
"# SHALLOW: Just predict\n",
"shallow = dspy.Predict(TranslateEnDe)\n",
"result_shallow = shallow(english_text=\"The cat sat on the mat while the dog chased its tail.\")\n",
"\n",
"# DEEP: Chain of Thought\n",
"deep = dspy.ChainOfThought(TranslateEnDe)\n",
"result_deep = deep(english_text=\"The cat sat on the mat while the dog chased its tail.\")\n",
Copilot AI Mar 25, 2026

This cell uses dspy and TranslateEnDe but neither is imported in the notebook prior to use, so execution will fail with NameError. Add the missing imports (e.g. import dspy and an import for TranslateEnDe).

Comment on lines +3 to +6
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)

Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.

Comment on lines +4 to +6
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)
Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.

…e widget resolution in SchemaRenderer

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
@abossard abossard merged commit 593f45f into main Mar 25, 2026
2 checks passed
@abossard abossard deleted the dspy-playground branch March 25, 2026 13:57