
Dspy playground #28

Merged
abossard merged 81 commits into main from dspy-playground
Mar 25, 2026

Conversation

@abossard
Owner

This pull request introduces significant improvements to the repository's onboarding, documentation, and developer experience, especially for LLM-powered features and CSV ticket data workflows. The changes clarify project boundaries, add environment and VS Code setup guides, and provide more comprehensive instructions for LLM and KBA Drafter integration. The most important changes are grouped below.

Documentation and Onboarding Improvements:

  • Expanded the README.md with detailed instructions for usecase demo development, KBA Drafter setup, OpenAI API key configuration, and a new DSPy prompt tuning notebook section. Added links to new and existing documentation for LLM, KBA, and CSV AI workflows. [1] [2] [3] [4]
  • Added a comprehensive .env.example file with explanations for LLM, OpenAI, and KBA Drafter environment variables, including future SharePoint and ITSM adapters.
  • Added CLAUDE.md with clear setup, run, test, and architecture instructions for Claude Code, covering both backend and frontend workflows, LLM configuration, and testing strategies.
  • Rewrote .github/copilot-instructions.md to clarify the repository's learning-focused scope, CSV data structure, and boundaries for Copilot/AI contributions, emphasizing what should and should not be changed.

Developer Experience:

  • Added .vscode/extensions.json and .vscode/launch.json to recommend key extensions and provide launch configurations for backend, frontend, and notebook workflows, supporting full stack and agentic development. [1] [2]

Environment and Tooling:

  • Updated .claude/settings.local.json and .dockerignore to align with the .venv Python environment, ensuring consistent activation and Docker context. [1] [2]
  • Added .gitattributes to strip outputs from Jupyter notebooks and enable proper notebook diffs, supporting clean version control for notebook-based workflows.

These changes collectively make the repository easier to set up, contribute to, and use for LLM-powered CSV ticket analysis and KBA generation.

Copilot AI and others added 30 commits November 19, 2025 08:14
Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
…ted On"

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
…Node 20 LTS

Co-authored-by: abossard <86611+abossard@users.noreply.github.com>
- Change virtual environment creation to use `.venv` at the repo root
- Update activation commands in various documentation files
- Modify setup and start scripts to reflect new virtual environment structure
- Ensure consistency across installation guides and troubleshooting documentation
- Added `httpx` dependency for async HTTP requests to Ollama API.
- Implemented OllamaChat component in frontend for user interaction with the LLM.
- Created backend service for handling chat requests and model listing.
- Updated setup scripts to check for Ollama installation and pull required models.
- Added API endpoints for chat and model listing in the backend.
- Implemented end-to-end tests for Ollama integration, covering model listing and chat functionality.
- Enhanced error handling and user feedback in the chat interface.
* feat: Implement MCP JSON-RPC 2.0 handler and refactor API decorators

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Update API decorators for optional HTTP path and clean up imports
docs: Enhance LEVEL_UP.md with Copilot chat testing instructions

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Migrate task management to SQLModel ORM and update related documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Implement LangGraph agent with Azure OpenAI integration and extend API decorators for tool conversion

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to use OpenAI SDK and enhance tool integration

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor AgentService to integrate LangGraph and replace OpenAI SDK usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* fix: Correct docstring formatting in tool_wrapper for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Update documentation structure for Day 4 lessons and announcements

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* refactor: update Azure OpenAI configuration and streamline environment variable usage

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: integrate FastMCP client for external tool support and add tests

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement Ticket MCP integration with FastMCP client and add REST endpoints for ticket management

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Add QA tickets management with new TicketList component and API integration

* feat: Add initial diagram for project planning in explain.drawio

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add RULES.md to document project guidelines

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add ticket models and reminder functionality for "Assigned without Assignee"

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Rearrange imports and enhance startup logging for REST API and MCP JSON-RPC

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding mapping functions and updating QA tickets endpoint

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add TicketsWithoutAnAssignee component to display unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Clean up code formatting and improve ticket handling in various components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor Ollama integration to use Azure OpenAI agent; remove OllamaChat component and related API calls, add AgentChat component for task management; update frontend routing and backend operations accordingly.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance AgentService with detailed logging for MCP tool calls and agent execution

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ariable handling in agents.py (#11)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Implement CSV Ticket Viewer

- Refactor App component to replace existing features with CSV Ticket Table.
- Add CSVTicketTable component for displaying tickets from CSV data source.
- Introduce API functions for fetching CSV ticket fields, tickets, and statistics.
- Create CSV data source in backend to handle loading and processing of CSV files.
- Enhance AgentChat component to display error details from API responses.
- Update styles and layout for improved user experience in ticket viewing.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: Update import formatting and enhance status badge display in CSVTicketTable component

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add Nivo chart visualizations for CSV tickets and enhance documentation

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* refactor: update configuration from Azure OpenAI to OpenAI and enhance agent service initialization

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: update AgentChat component for OpenAI integration and enhance markdown support

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
- Added backend orchestration for usecase demo agent runs in `usecase_demo.py`.
- Created documentation for CSV ticket guidance in `CSV_AI_GUIDANCE.md`.
- Developed frontend components for usecase demo description and page in `UsecaseDemoDescription.jsx` and `UsecaseDemoPage.jsx`.
- Introduced demo definitions for usecase demos in `demoDefinitions.js`.
- Implemented result views for structured table and markdown in `resultViews.jsx`.
- Added utility functions for handling usecase demo runs in `usecaseDemoUtils.js`.
- Included a network diagram in `net.drawio`.
* feat: Enhance SLA Breach Risk functionality and UI integration
- Increased max_length for agent prompt to 5000
- Added fields parameter to list and search tickets for selective data retrieval
- Updated timeout for usecase demo agent to 300 seconds
- Introduced SLA Breach Risk demo with detailed prompt and ticket analysis
- Added E2E tests for SLA Breach Risk demo page

* feat: add incident_id field to ticket model and related components

- Added incident_id to the ticket mapping in app.py.
- Updated csv_data.py to include incident_id when converting CSV rows to tickets.
- Modified operations.py to define incident_id as a CSV ticket field.
- Enhanced the Ticket model in tickets.py to include incident_id.
- Updated usecase_demo.py to accommodate changes in ticket structure.
- Modified CSVTicketTable.jsx to display incident_id in the ticket table.
- Updated TicketList.jsx to filter and display incident_id in the ticket list.
- Enhanced TicketsWithoutAnAssignee.jsx to include incident_id in ticket operations.
- Updated UsecaseDemoPage.jsx to pass matchingTickets to the render function.
- Enhanced demoDefinitions.js to improve prompts for use case demos.
- Added SLA Breach Overview result view in resultViews.jsx to visualize SLA status of tickets.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: clean up import statements across multiple components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* refactor: standardize import statement formatting in resultViews.jsx

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: add SLA breach reporting functionality and related API endpoints

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: implement SLA breach report retrieval for unassigned tickets

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ig.js (#16)

Co-authored-by: luca Spring <luca.spring@bit.admin.ch>
* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* feat: Implement Tool Registry and Workbench Integration

- Added ToolRegistry class to manage LangChain StructuredTool instances.
- Created workbench_integration.py to wire tools into the Agent Workbench.
- Developed WorkbenchPage component for agent management in the frontend.
- Implemented backend tests for tool registration and agent operations.
- Added end-to-end tests for agent creation and deletion in the UI.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Refactor Agent Workbench to Agent Fabric and enhance tool metadata handling

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add required input handling to agent definitions and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance Markdown output handling in agent workflow and update UI components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Enhance ticket handling by adding incident ID support and improve UI components for better user experience

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* feat: Add tool invocation logging with latency tracking in WorkbenchService

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
* Implemented KBA draft

* - Removed test files
- Cleaned up structure
- Adjusted README.md
- Created plan in learning_mechanism.md
- Design fixes

* feat: add search questions generation with database migration and UI

Database & Backend:
- Add search_questions column migration in operations.py (ALTER TABLE for existing databases)
- Add /api/kba/drafts/{id}/replace endpoint in app.py
- Fix backward compatibility in kba_service.py (_table_to_draft, _draft_to_table)
- Add search questions generation to replace_draft workflow
- Fix NULL constraint errors by ensuring empty strings for required fields
- Update related_tickets validation: accept INC + 9-12 digits (was fixed at 12)

Frontend:
- Add Text component import to KBADrafterPage.jsx (fix TypeError)
- Add full-screen blur overlay with centered spinner during KBA generation
- Show overlay for both new draft creation and replacement operations
- Update styles: loadingOverlay with backdrop-filter blur effect

Documentation:
- Update kba_prompts.py: clarify related_tickets format with examples
- Update GENERAL.md: correct related_tickets format specification

Fixes #1 - KBA drafts not loading (missing DB column)
Fixes #2 - Replace endpoint not found (405 error)
Fixes #3 - Ticket ID validation too strict
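The loosened ticket-ID validation (INC plus 9-12 digits, previously fixed at 12) can be sketched as a small regex check. A minimal illustration; the helper name is an assumption, not the actual code in the PR.

```python
import re

# Accept "INC" followed by 9-12 digits (the old rule required exactly 12).
_TICKET_ID_RE = re.compile(r"^INC\d{9,12}$")

def is_valid_ticket_id(ticket_id: str) -> bool:
    """Return True if the ID matches the loosened INC + 9-12 digit format."""
    return bool(_TICKET_ID_RE.fullmatch(ticket_id))
```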

* View tickets in a popup

* feat(kba-drafter): add ability to reset reviewed KBAs back to draft

- Add "Zurück zu Entwurf" button for reviewed status KBAs
- Add handleUnreview() handler to update status from "reviewed" to "draft"
- Import ArrowUndo24Regular icon for the unreview action
- Allow users to continue editing KBAs after review without deletion

This enables editing of reviewed KBAs that need changes before publishing.

* feat(kba-drafter): add ticket viewer, unreview, status filter, and UI improvements

- Add ticket viewer dialog to display original incident details
  * New "Ticket" button in KBA header with DocumentSearch icon
  * Modal dialog showing incident data (ID, summary, status, priority, assignee, notes, resolution)
  * Backend endpoint /api/csv-tickets/by-incident/<incident_id> for incident ID lookup
  * Frontend API function getCSVTicketByIncident()

- Add unreview functionality for reviewed KBAs
  * "Zurück zu Entwurf" button with ArrowUndo icon
  * Allows resetting reviewed KBAs back to draft status for further editing

- Redesign KBA overview list
  * Replace corner delete button with professional overflow menu (⋮)
  * Horizontal layout: content left, status badge right-aligned, menu button
  * Menu component with delete option

- Add status filter dropdown to KBA overview
  * Filter options: All, draft, reviewed, published
  * Dropdown in card header for easy filtering

- Align EditableList "Add" button width with input fields
  * Use invisible placeholder buttons for exact width matching
  * Ensures consistent layout regardless of allowReorder setting

Files modified:
- frontend/src/features/kba-drafter/KBADrafterPage.jsx
- frontend/src/features/kba-drafter/components/EditableList.jsx
- frontend/src/services/api.js
- backend/app.py

* fix(kba): fix draft deletion bug and add collapsible AutoGenSettings

- Fix delete draft error: use response.items instead of response.drafts
- Make AutoGenSettings card collapsible with chevron icon
  - Starts collapsed to reduce visual dominance
  - Smooth slide-down animation when expanded
  - Status badge visible in collapsed header
  - Clickable header with keyboard support (Enter key)

* fix(kba): auto-scroll to top when opening draft

When clicking on a draft from the list after scrolling down,
the page now automatically scrolls to the top with a smooth animation.
This ensures users always start at the beginning of the draft content.

* feat: replace browser confirms with custom modal dialogs for unsaved changes

Replace native window.confirm() with ConfirmDialog component for better UX
consistency and modern appearance. Adds centered warning modal when user
attempts to discard unsaved changes (close draft, switch to preview, or
load different draft).

Changes:
- Add unsavedChangesDialogOpen and pendingAction states
- Update toggleEditMode, loadDraft, and handleClose to trigger modal
- Add handleDiscardChanges and handleCancelDiscard handlers
- Add ConfirmDialog with warning intent at end of component

* fix: address code review issues and add KBA drafter e2e tests

Fixes:
- Fix CSV folder case mismatch (CSV -> csv) in app.py and operations.py
- Remove duplicate get_ticket_by_incident_id method in csv_data.py
- Replace inefficient len(session.exec().all()) with SQL COUNT(*) in kba_service.py
- Replace hardcoded placeholder credentials with env var lookups in kba_service.py
- Fix scheduler swallowing exceptions (remove bare raise, return None)
- Add settings reload at start of each scheduler run to fix race condition
- Add generation_warnings field to surface search questions failures to users
- Add schema migration for generation_warnings column
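The COUNT fix above replaces materializing every row in Python with a single aggregate query. Illustrated here with stdlib sqlite3 (the actual code uses SQLModel); the table name is invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kba_draft (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO kba_draft (title) VALUES (?)", [("a",), ("b",), ("c",)])

# Inefficient: fetches and materializes every row just to count them.
count_slow = len(conn.execute("SELECT * FROM kba_draft").fetchall())

# Efficient: the database counts; only one integer crosses the connection.
count_fast = conn.execute("SELECT COUNT(*) FROM kba_draft").fetchone()[0]
```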

Tests:
- Add 19 Playwright e2e tests for KBA Drafter feature covering:
  page load, navigation, LLM health status, draft generation,
  draft display, draft list, editing, review workflow,
  duplicate handling, and backend API integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add LiteLLM fallback, Playwright tests, and remove OpenAI hard dependency

- LiteLLM is now the default LLM backend (no .env or API key needed)
- Multistage model fallback chain: claude-sonnet-4 → gpt-4o → gpt-4o-mini
- OpenAI SDK still used when OPENAI_API_KEY is explicitly set
- agents.py and workbench service use ChatLiteLLM when no OpenAI key
- Added csv_ticket_stats and csv_sla_breach_tickets to agent tools
- Added KBA Drafter to Playwright nav tests and menu screenshots
- Added e2e tests: publish, delete, status filter, ticket viewer
- 32 unit tests + 5 live integration tests for LLM service
- Updated .env.example with LiteLLM-first documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
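The multistage fallback chain (claude-sonnet-4 → gpt-4o → gpt-4o-mini) boils down to trying backends in order until one succeeds. A minimal sketch, assuming each backend is a callable; the real implementation goes through LiteLLM.

```python
def complete_with_fallback(prompt: str, backends):
    """Try each (name, callable) backend in order; return the first success.

    Raises RuntimeError with the collected errors if every backend fails.
    """
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except Exception as exc:  # any backend failure falls through to the next
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")
```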

---------

Co-authored-by: SubSonic731 <alessandro.roschi@bit.admin.ch>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Extract agent builder into extensible module with tests

Create backend/agent_builder/ as a standalone, deeply layered module
following Grokking Simplicity (data/calculations/actions separation)
and A Philosophy of Software Design (deep modules).

Structure:
- models/: Pure data (Pydantic/SQLModel) - agent, run, evaluation, chat
- tools/: ToolRegistry, schema converter, MCP adapter
- engine/: Unified ReAct runner, callbacks, prompt builder
- evaluator.py: Success criteria evaluation (mostly calculations)
- persistence/: DB engine setup + repository pattern
- service.py: WorkbenchService (deep module facade)
- chat_service.py: ChatService using shared ReAct engine
- routes.py: Quart Blueprint replacing 200+ lines from app.py
- tests/: 107 tests (unit + integration + E2E)

Key improvements:
- Eliminated duplicate ReAct agent building (was in both agents.py
  and agent_workbench/service.py)
- DRY error handling in routes via Blueprint
- Repository pattern isolates DB from business logic
- Pure calculation modules (prompt_builder, schema_converter,
  evaluator) are independently testable
- Backward-compatible: agent_workbench/__init__.py shims to new module

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add per-agent LLM config: model, temperature, recursion_limit, max_tokens, output_instructions

Each AgentDefinition now stores configurable LLM parameters:
- model: override service default (e.g. gpt-4o vs gpt-4o-mini)
- temperature: 0.0-2.0 (deterministic to creative)
- recursion_limit: 1-100 max ReAct loop iterations
- max_tokens: cap response length (0 = unlimited)
- output_instructions: custom formatting (replaces default markdown)

Changes:
- models/agent.py: 5 new fields with validation (ge/le bounds)
- persistence/database.py: migrations for existing DBs
- engine/react_runner.py: build_llm accepts temperature+max_tokens
- engine/prompt_builder.py: append_output_instructions for custom formatting
- service.py: _resolve_llm_for_agent builds per-agent LLM when config differs
- routes.py: ui-config v2 exposes llm_config_fields and defaults
- 12 new tests (model validation, CRUD, E2E roundtrip via REST)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add output_schema for type-safe structured output, fix defaults

Changes:
- recursion_limit default: 10 → 3 (most agents finish in 1-3 tool calls)
- max_tokens default: 0 → 4096 (sensible cap instead of unlimited)
- New field: output_schema (JSON Schema stored as JSON in DB)

output_schema is config, not code. You define the expected response
shape as a JSON Schema:
  {"type":"object","properties":{"breaches":{"type":"array",...}}}

At runtime this does two things:
1. Injected into system prompt so the LLM knows the expected structure
2. Takes priority over output_instructions and default markdown

Priority chain for output formatting:
  output_schema (strict JSON) > output_instructions (free text) > default markdown

128 tests pass (9 new tests for schema handling).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
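The priority chain above (output_schema > output_instructions > default markdown) can be sketched as a small resolver. Names and the default instruction text are assumptions for illustration.

```python
DEFAULT_MARKDOWN_INSTRUCTIONS = "Respond in GitHub-flavored Markdown."

def resolve_output_format(output_schema, output_instructions):
    """Pick the output mode: strict schema wins, then free-text instructions,
    then the default markdown behavior."""
    if output_schema:
        return ("schema", output_schema)
    if output_instructions:
        return ("instructions", output_instructions)
    return ("markdown", DEFAULT_MARKDOWN_INSTRUCTIONS)
```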

* Add suggest-schema endpoint and UI button

New endpoint: POST /api/workbench/suggest-schema
Takes agent name, description, system_prompt and asks the LLM to
propose a JSON Schema for the agent's structured output.

Backend:
- service.py: suggest_schema() method - builds a prompt, calls LLM,
  parses JSON response (handles markdown fences), falls back to
  generic schema on parse failure
- routes.py: POST /api/workbench/suggest-schema route

Frontend:
- api.js: suggestOutputSchema() function
- WorkbenchPage.jsx: output schema textarea + Suggest Schema button
  in the create form. Schema is editable JSON, sent as output_schema
  on agent creation. Button disabled until name or prompt is filled.

129 tests pass (1 new E2E test for suggest-schema endpoint).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Wire output_schema to LangGraph response_format for SDK-level enforcement

When an agent has output_schema configured, it now does TWO things:

1. Prompt injection (existing) — schema is described in the system prompt
   so the LLM understands the expected structure
2. SDK enforcement (new) — schema is passed as response_format to
   create_react_agent(), which uses LangGraph's built-in structured
   output mechanism (provider-native or tool-based)

At runtime, structured_response from the LangGraph result takes
priority over raw message content. If the agent has no output_schema,
behavior is unchanged (markdown output from final message).

The output pipeline:
  output_schema defined → response_format=schema → structured_response → JSON
  no output_schema → final message content → markdown (default)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Always use structured_response with default schema

Every agent now always returns structured output via LangGraph's
response_format — no more untyped markdown strings.

Default schema (when no custom output_schema is set):
  {
    "message": "string (markdown)",
    "referenced_tickets": ["string"]
  }

This means:
- Plain agents → get {message: '...markdown...', referenced_tickets: [...]}
- Custom schema agents → get whatever schema they define
- Both enforced at SDK level via response_format, not just prompt

Changes:
- prompt_builder.py: DEFAULT_OUTPUT_SCHEMA, resolve_output_schema()
- service.py: always passes effective schema to create_react_agent
- routes.py: ui-config exposes default_output_schema for frontend
- Tests updated (132 pass)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
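The always-on default schema described above can be expressed as a fallback resolver: a custom schema wins, otherwise every agent gets the `{message, referenced_tickets}` shape. A sketch; the exact field descriptions in the repository may differ.

```python
DEFAULT_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "message": {"type": "string", "description": "markdown body"},
        "referenced_tickets": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["message", "referenced_tickets"],
}

def resolve_output_schema(custom_schema=None):
    """Return the agent's custom schema if set, else the default structured shape."""
    return custom_schema or DEFAULT_OUTPUT_SCHEMA
```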

* Add comprehensive docs with mermaid diagrams, clean up stale docs

New: docs/AGENT_BUILDER.md — full architecture documentation with:
- Architecture diagram (module layers + data flow)
- Sequence diagram (agent run lifecycle)
- Structured output pipeline flowchart
- ER diagram (DB schema)
- Data/Calculations/Actions separation diagram
- Deep modules table
- Extensibility flowchart
- API endpoint reference
- Testing commands

Updated:
- AGENTS_IMPLEMENTATION.md — replaced stale content with summary + pointer
- docs/AGENTS.md — replaced stale architecture with mermaid + pointer
- docs/PROJECT_STRUCTURE.md — added agent_builder/ to tree

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Docs overhaul + remove ~1800 lines of dead code/stale docs

Documentation:
- README.md: Complete rewrite with features table, screenshots, mermaid
  architecture diagram, agent builder section, correct tech stack
- PROJECT_STRUCTURE.md: Full rewrite matching actual codebase
- AGENTS.md: Fixed AgentService→WorkbenchService, updated examples
- LEARNING.md: Fixed broken link

Deleted stale docs:
- AGENTS_IMPLEMENTATION.md (was a 3-line redirect stub)
- docs/RULES.md (empty file)
- docs/SQLMODEL_MIGRATION.md (historical, migration complete)

Dead code removed from agents.py (~250 lines):
- MCP client stubs (_mcp_tool_to_langchain, _ensure_ticket_mcp_connection, close)
- Schema helpers only used by dead MCP code (_json_type_to_python, _schema_to_pydantic)
- OpenAI logging callback (duplicated in agent_builder/engine/callbacks.py)
- _build_state_graph learning example (dead code)
- Unused imports (get_langchain_tools, MCPClient, create_model)

Deleted old agent_workbench/ source files (~1030 lines):
- models.py, service.py, evaluator.py, tool_registry.py
- Only __init__.py shim remains for backward compatibility

132 backend tests + 15 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add Playwright tests for suggest-schema and agent chat

New E2E tests in workbench.spec.js:
- 'creates agent with output schema via suggest button' — mocks
  /api/workbench/suggest-schema, clicks Suggest Schema, verifies
  schema populates textarea, creates agent, deletes it
- 'sends message and displays mocked response' (Agent Chat UI) —
  mocks /api/agents/run, types message, clicks send, verifies
  markdown heading and tool badge render

17 Playwright tests pass (was 15, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add VPN agent and failure handling Playwright tests

New Agent Fabric E2E tests:
- 'runs VPN troubleshooting agent and verifies structured output'
  Creates agent with VPN analysis prompt, runs it (mocked),
  verifies structured JSON output with ticket IDs (INC-101, INC-312),
  referenced_tickets field, and VPN content in rendered output
- 'handles agent run failure gracefully'
  Creates agent, runs it with mocked failure response,
  verifies UI doesn't crash and shows completion state

19 Playwright tests pass (was 17, +2 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix structured output rendering in Agent Fabric UI

The output is now always structured JSON ({message, referenced_tickets}).
The UI now parses it and renders each part appropriately:

- message → rendered as GitHub-flavored Markdown (ReactMarkdown)
- referenced_tickets → rendered as monospace badges below the output
- Extra custom schema fields → rendered as formatted JSON in a pre block
- Button preview → shows message text, not raw JSON

Also handles non-JSON output gracefully (falls back to raw markdown).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add MCP App technical documentation

New: docs/MCP_APP.md — comprehensive guide on how this project
works as an MCP application:

- What an MCP App is (app that exposes business logic via MCP protocol)
- Architecture diagrams: consumers (Claude, Copilot, agents) → MCP endpoint
- Full protocol sequence diagram (initialize → tools/list → tools/call)
- The @operation decorator: single source of truth for REST + MCP + LangChain
- How to connect clients (Claude Desktop, Python, curl examples)
- 4-layer architecture diagram (business logic → operations → adapters → consumers)
- Extension roadmap: Resources, Prompts, SSE streaming
- Security considerations table

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
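The single-source-of-truth idea behind the @operation decorator is that a function is registered once and the REST, MCP, and LangChain adapters all read the same registry. A heavily simplified sketch, not the project's actual decorator signature.

```python
OPERATIONS: dict[str, dict] = {}

def operation(name: str, description: str = ""):
    """Register a plain function in a shared registry that each adapter
    (REST route, MCP tools/list, LangChain tool) can expose."""
    def wrap(fn):
        OPERATIONS[name] = {"fn": fn, "description": description}
        return fn
    return wrap

@operation("csv_get_ticket", description="Fetch one ticket by ID")
def csv_get_ticket(ticket_id: str) -> dict:
    # Stand-in body; the real operation loads the ticket from the CSV source.
    return {"id": ticket_id}
```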

* Add SchemaRenderer + visual SchemaEditor with x-ui widget system

SchemaRenderer (frontend/src/features/workbench/SchemaRenderer.jsx):
- Generic component: takes {data, schema} and renders each property
  using x-ui widget annotations
- Widgets: markdown, table, badge-list, stat-card, bar-chart (Nivo),
  pie-chart (Nivo), json, hidden
- Auto-detection when no x-ui: string→markdown, integer→stat-card,
  array of objects→table, array of strings→badge-list, object→json
- Console debug logging, data-testid per field for E2E testing

SchemaEditor (frontend/src/features/workbench/SchemaEditor.jsx):
- Visual property list editor (no raw JSON editing needed)
- Add/remove properties, set name/type/description
- Widget picker dropdown with all available widgets
- Context-sensitive options (columns for table, label for stat-card,
  indexBy/keys for bar-chart)
- Syncs with suggest-schema: LLM suggestion populates visual editor
- Outputs valid JSON Schema with x-ui annotations

Backend:
- DEFAULT_OUTPUT_SCHEMA now has x-ui annotations (markdown + badge-list)
- suggest_schema prompt updated to suggest x-ui widgets per property

Wiring:
- WorkbenchPage uses SchemaRenderer for run output (replaces hardcoded)
- WorkbenchPage uses SchemaEditor for create form (replaces textarea)

20 Playwright tests pass (including new SchemaRenderer widget test).
132 backend tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
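The widget auto-detection rules listed above (string→markdown, integer→stat-card, array of objects→table, array of strings→badge-list, object→json) map cleanly to a small dispatcher. A Python sketch of logic that lives in JSX in the repository.

```python
def default_widget(prop_schema: dict) -> str:
    """Map a JSON Schema property to a renderer widget when no x-ui annotation
    overrides it. Falls back to the json widget for unknown shapes."""
    if "x-ui" in prop_schema:
        return prop_schema["x-ui"]
    t = prop_schema.get("type")
    if t == "string":
        return "markdown"
    if t == "integer":
        return "stat-card"
    if t == "array":
        item_type = prop_schema.get("items", {}).get("type")
        return "table" if item_type == "object" else "badge-list"
    return "json"
```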

* Improve suggest-schema prompt with full data domain + widget docs

The suggest-schema LLM prompt now includes:
- Ticket data domain (all field names, types, enum values, example cities)
- Available tools with descriptions (csv_list_tickets, csv_search_tickets, etc.)
- Full widget documentation with use-cases and options for each:
  markdown, table (columns), badge-list, stat-card (label),
  bar-chart (indexBy, keys), pie-chart, json, hidden
- Explicit rules: always include message+referenced_tickets,
  match widget to data shape, use snake_case names

This gives the LLM enough context to suggest schemas that actually
match the ticket data (e.g. status distribution → pie-chart,
ticket list → table with incident_id/summary/status columns).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix latency issues: schema title bug + recursion_limit headroom

Investigation found 3 root causes for slow AI calls:

1. gpt-5-nano is a REASONING model — burns 192-832 reasoning tokens
   per LLM call (invisible chain-of-thought), taking 2-8s each.
   A simple 'say hello' costs 8.4s with 832 reasoning tokens.

2. response_format adds a 3rd LLM call — LangGraph's
   generate_structured_response makes a separate LLM call to format
   the output as JSON after the ReAct loop finishes.
   Without: 4.7s (2 calls). With: 13s (3 calls).

3. Missing 'title' in output_schema crashed with_structured_output.
   OpenAI's API requires a top-level 'title' in the JSON Schema.

Fixes applied:
- resolve_output_schema() now auto-adds 'title': 'AgentOutput'
  when missing (both default and custom schemas)
- DEFAULT_OUTPUT_SCHEMA has explicit 'title' field
- recursion_limit: user's setting (default 3) is now multiplied by 4
  for the actual LangGraph graph, with a floor of 10. This prevents
  GraphRecursionError when response_format adds extra graph steps.

Note: The main latency driver (reasoning tokens) is inherent to the
model choice. Users can switch to gpt-4o-mini via per-agent 'model'
field for ~10x faster non-reasoning responses.

133 backend + 20 Playwright tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix agent tool token bloat: compact fields + lower default limits

Root cause: csv_list_tickets tool returned full Ticket objects with ALL
fields (notes, description, resolution, work logs) — ~65K tokens for
100 tickets. The LLM had to process all of this, causing 30-60s per
step with a reasoning model.

Changes to operations.py:
- csv_list_tickets: returns compact dicts (10 fields, not 30+),
  default limit 25 (was 100), max limit 100 (was 500)
- csv_search_tickets: same compact treatment, limit 25 (was 50)
- csv_get_ticket: now accepts optional 'fields' parameter for
  selective detail drill-down, returns dict (was full Ticket)
- Tool descriptions updated to guide agents: 'use csv_get_ticket
  for full details' pattern

Token impact per tool call:
  Before: 100 tickets × ~650 tokens = ~65,000 tokens
  After:  25 tickets × ~60 tokens = ~1,500 tokens (97% reduction)

Expected latency improvement:
  Before: ~13s per tool call (65K token input processing)
  After:  ~3-5s per tool call (1.5K token input)

153 tests pass (133 backend + 20 Playwright).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
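The compaction pattern boils down to a field projection; a sketch (the exact compact field list is an assumption — the commit names incident_id, summary, and status among roughly ten fields):

```python
# Hypothetical compact projection; the real field list in operations.py may differ.
COMPACT_FIELDS = ("incident_id", "summary", "status", "priority", "assigned_group")

def to_compact(ticket: dict) -> dict:
    """Project a full ticket record onto a small field subset, dropping
    token-heavy fields (notes, description, resolution, work logs)."""
    return {k: ticket[k] for k in COMPACT_FIELDS if k in ticket}
```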

* Drop response_format to eliminate extra LLM call

LangGraph 1.0.8 implements response_format via a SEPARATE LLM call
(generate_structured_response) — adding 5-10s latency per run.
The refactor to inline tool-based structured output (github.com/
langchain-ai/langgraph/issues/5872) hasn't shipped yet.

Fix: remove response_format from create_react_agent. The system
prompt already instructs the LLM to produce JSON matching the
schema (via append_output_instructions). The frontend's
SchemaRenderer handles both parsed JSON and raw text gracefully.

Latency impact:
  Before: 3 LLM calls (decide tool + answer + format JSON) ~13s
  After:  2 LLM calls (decide tool + answer as JSON)       ~5s

When LangGraph ships inline structured output, we can re-enable
response_format with zero code changes (just pass it back to
build_react_agent).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enable OpenAI JSON mode for guaranteed valid JSON output

Adds response_format: {type: 'json_object'} to the ChatOpenAI
constructor via model_kwargs. This is a model-level setting that
constrains token generation to valid JSON — no extra LLM call,
no post-processing, just guaranteed JSON from every response.

This is different from LangGraph's response_format parameter
(which adds a separate LLM call). This is OpenAI's native JSON
mode applied at the API level during the same call.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert JSON mode — incompatible with non-strict tool schemas

OpenAI's response_format: json_object requires all tools to have
strict schemas. Our tools (from @operation decorator) don't set
strict=True, causing: 'csv_search_tickets is not strict. Only
strict function tools can be auto-parsed'.

Reverting to prompt-only JSON enforcement, which tested at 3/3
reliability with gpt-5-nano. The frontend fallback (wraps non-JSON
as {message: raw_text}) provides additional safety.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add widget E2E tests + strict tools + Agent Chat JSON mode

New Playwright tests (23 total, +3):
- 'renders bar-chart and pie-chart from x-ui annotations' — injects
  mock agent with output_schema containing x-ui widgets, verifies
  SVG rendering for pie/bar charts, stat-card with label, badges
- 'renders raw JSON for object data' — verifies auto-detection:
  objects render as formatted JSON in pre blocks
- 'falls back gracefully for non-JSON output' — verifies plain
  markdown string wraps as {message: text} and renders correctly

Agent Chat (agents.py) fixes:
- Added JSON output mode (response_format: json_object)
- Added strict=True tool binding for compatibility
- Matches the same pattern as agent_builder

Strict tool binding (react_runner.py):
- build_react_agent pre-binds tools with strict=True
- Required for OpenAI JSON mode (response_format: json_object)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NameError: OpenAICallLoggingCallback was removed but still referenced

The class was deleted in the dead code cleanup but agents.py still
used it. Replaced with make_llm_logging_callback from agent_builder.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 'Show in Menu' — agents appear as tabs in navigation

When an agent has show_in_menu=true, it appears as a tab in the
main navigation bar. Clicking it opens a dedicated run page with
just the input field, run button, and SchemaRenderer output.

Backend:
- AgentDefinition: new show_in_menu bool field (default false)
- AgentDefinitionCreate/Update: show_in_menu parameter
- Migration for existing DBs
- Service wires it through create/update

Frontend:
- WorkbenchPage: 'Show in menu' checkbox in create form
- App.jsx: fetches agents with show_in_menu=true, injects as tabs
- AgentRunPage.jsx: simple standalone run page (title, description,
  optional input, run button, SchemaRenderer output)
- Dynamic routes: /agent-run/{agentId}

E2E test:
- Creates agent via API with show_in_menu=true
- Verifies tab appears in navigation with agent name
- Clicks tab, verifies AgentRunPage renders
- Runs agent (mocked), verifies output with SchemaRenderer

24 Playwright + 133 backend = 157 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add missing tools to chat agent: csv_sla_breach_tickets, csv_ticket_stats

The SLA Breach page was slow because the chat agent (agents.py)
didn't have the csv_sla_breach_tickets tool. The prompt said
'call csv_sla_breach_tickets' but the tool didn't exist, so the
LLM tried to replicate SLA breach logic manually using
csv_list_tickets — fetching many tickets and reasoning over them.

Now the chat agent has all 6 CSV tools matching the operations:
- csv_list_tickets (existing)
- csv_get_ticket (existing)
- csv_search_tickets (existing)
- csv_ticket_fields (existing)
- csv_sla_breach_tickets (NEW — pre-computed, ~1000 tokens)
- csv_ticket_stats (NEW — aggregated stats, ~350 bytes)

Expected improvement: 1 tool call (~1000 tokens) instead of
multiple list calls + manual reasoning (~30-60K tokens).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add ticket detail modal and enhance CSV ticket table functionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Refactor CSVTicketTable component: reorder DialogActions import for consistency

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Add reasoning_effort config + new tools for major speed improvement

Performance:
- reasoning_effort='low' as default — reduces gpt-5-nano from
  512 reasoning tokens (~7s) to 0-192 tokens (~1-3s) per LLM call
- Configurable per agent: low (fast), medium, high (deep), default
- Both agent_builder and legacy chat agent use reasoning_effort='low'

New tools:
- csv_count_tickets: count matching tickets WITHOUT returning data.
  Lets the LLM check 'how many VPN tickets?' (~50 tokens) before
  deciding to fetch details (~5000 tokens)
- csv_search_tickets_with_details: search + return full details
  (notes, resolution, description) in ONE call. Eliminates the
  N × csv_get_ticket drill-down pattern that caused the
  'Ticket Knowledgebase Creator' to make 5+ sequential LLM calls

Impact on 'Ticket Knowledgebase Creator' agent:
  Before: search(compact) → get_ticket × N → generate = 5+ LLM calls × ~5s = 25s+
  After:  search_with_details(query, limit=10) → generate = 2 LLM calls × ~2s = 4s

Also fixed: removed stale response_format: json_object from build_llm
(was causing strict tool errors).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
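The counting tool amounts to a cheap predicate scan that never serializes ticket bodies; a sketch (filter semantics are an assumption):

```python
# Sketch of csv_count_tickets: count matches WITHOUT returning ticket data.
def csv_count_tickets(tickets: list[dict], **filters) -> int:
    """Return how many tickets match the given equality filters.
    Costs ~tens of tokens instead of thousands, since no rows are returned."""
    def matches(ticket: dict) -> bool:
        return all(ticket.get(key) == value for key, value in filters.items())
    return sum(1 for ticket in tickets if matches(ticket))
```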

* Update incident details in FALL_2_HARDWARE_PERIPHERIE and FALL_3_ZUGRIFF_BERECHTIGUNG documentation for consistency and clarity

Signed-off-by: Andre Bossard <anbossar@microsoft.com>

* Fix: all E2E tests now clean up created agents

Two tests were creating agents via the UI but not deleting them,
leaving orphans in the DB after each test run:
- 'runs an agent and appends output to run button'
- 'requires and forwards configured run input'

Added Delete button clicks at the end of both tests.
All 10 agent-creating tests now verified to clean up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rewrite workbench e2e tests for tabbed UI

- Add helpers: goToCreateTab, goToAgentsTab, createAgent, createAgentViaAPI, deleteAgentViaAPI, mockEmptyRuns
- Update 'creates and deletes' to use Create Agent tab and agent cards
- Update 'runs an agent' to verify output in RunsSidePanel
- Update 'requires input' to use card inline input field + Go button
- Update 'suggest schema' to navigate to Create tab first
- Update 'failure handling' to check error in run detail panel
- Refactor SchemaRenderer tests to use setupSchemaTest helper (API-created agents, run output in side panel)
- Keep Agent Chat UI and Show in Menu tests unchanged

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: redesign workbench with agent cards, runs side panel, and tabbed layout

- Split WorkbenchPage into tabbed UI: Agents (cards grid) + Create Agent
- AgentCardsPanel: icon cards with Run/Edit/Delete buttons per agent
- RunsSidePanel: scrollable run history with click-to-view output
- AgentEditDialog: edit existing agents via dialog
- AgentCreateForm: extracted creation form (reusable for create + edit)
- Added API functions: updateWorkbenchAgent, listAllRuns, getRun
- All 47 Playwright tests pass (12 workbench tests updated for new UI)
- Removed Ollama references from setup.sh and package.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: LiteLLM fallback in agent_builder + add live lifecycle test

- Fixed agent_builder/engine/react_runner.py: ChatLiteLLM when no API key
- Fixed agent_builder/service.py: removed hard OpenAI key requirement
- Fixed agent_builder/chat_service.py: same
- Fixed RunsSidePanel output parsing for raw string output
- Added full lifecycle e2e test (live LLM): create → run → edit → re-run → verify history → delete

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: suggest schema & tools, default no tools, pure function refactor

- 'Suggest Schema & Tools' button: LLM suggests output schema AND tool selection
- Backend: _build_suggest_prompt and _parse_suggest_response as pure functions
- Frontend: tools default to empty, populated by suggest response
- RunsSidePanel: pure calculations extracted (buildAgentMap, sortRunsNewestFirst,
  resolveOutputSchema, resolveAgentName, parseRunOutput, formatRelativeTime)
- All 49 Playwright tests pass (2 live LLM tests included)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: result dialog, chart rendering, markdown fence parsing

- Run results now open in a large Dialog (900px wide, 85vh max)
- Fixed parseRunOutput: strips markdown code fences from LLM output
- Fixed PieChartWidget: filters non-numeric values, formats labels
- Fixed BarChartWidget: accepts object {key: number} in addition to arrays
- Chart containers: 300px height, 600px max-width
- Tests: close dialog before cleanup (dialog blocks pointer events)
- All 49 Playwright tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: all-live Playwright tests, result dialog fix, runs panel fix

- Rewrote workbench tests: ZERO mocks, all 8 tests use live LLM
- Fixed RunsSidePanel: min-height for layout, runs visible on load
- Fixed parseRunOutput: strips markdown fences from LLM output
- Fixed chart widgets: pie/bar handle non-numeric values, proper sizing
- Fixed dialog close: tests use X button (in viewport) not Close (scrolled)
- Total: 43 tests, all passing, all live (1.1 min)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: extract shared parseRunOutput, add delete-all-runs

- Extracted parseRunOutput (fence-stripping + JSON parsing) into
  outputUtils.js — shared by RunsSidePanel and AgentRunPage
- Fixed AgentRunPage (show_in_menu): renders markdown instead of raw JSON
- Added DELETE /api/workbench/runs endpoint + trash button in Runs panel
- Runs panel: min-height 500px so content is visible on load

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
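A Python transliteration of the shared fence-stripping parser (the real helper lives in outputUtils.js; the function name and the exact fallback shape are assumptions based on the commit messages):

```python
import json
import re

def parse_run_output(raw: str):
    """Strip markdown code fences from LLM output, then try JSON;
    fall back to wrapping the raw text as {"message": ...}."""
    text = raw.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"message": raw}
```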

* feat: add SSE activity monitor, settings page, agent templates & run history

- Agent Activity page with real-time SSE event stream (tool calls, LLM
  thinking, run lifecycle), filterable by run_id via URL query param
- EventBus pub/sub + StreamingCallbackHandler wired into ReAct engine
- Settings page: drag-and-drop tab reorder, hide/show toggles, icon
  picker (57 FluentUI icons), persisted to localStorage
- Agent templates dropdown (KBA from tickets, worklog stats, next step
  advisor) pre-fills the create agent form
- AgentRunPage now shows filtered run history with detail dialog and
  link to Activity page filtered by run_id
- 19 new Playwright E2E tests (8 activity + 11 settings)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
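The EventBus described above can be sketched as a small in-process pub/sub with bounded history (class and method names assumed; the real implementation lives in agent_builder/engine/event_bus.py):

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentEvent:
    run_id: str
    event_type: str          # e.g. "tool_start", "llm_thinking", "run_end"
    data: dict = field(default_factory=dict)

class EventBus:
    """Minimal in-process pub/sub with bounded history for SSE replay."""
    def __init__(self, history_size: int = 100):
        self._subscribers: list[Callable[[AgentEvent], None]] = []
        self._history: deque[AgentEvent] = deque(maxlen=history_size)

    def subscribe(self, handler: Callable[[AgentEvent], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: AgentEvent) -> None:
        self._history.append(event)
        for handler in self._subscribers:
            handler(event)

    def history(self) -> list[AgentEvent]:
        return list(self._history)
```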

* feat: add Support Workflow canvas page with interactive editor

Purely browser-side workflow visualization using HTML Canvas:
- 5 node types: Start, End, Action, Decision, Wait (each with
  distinct shape and color)
- Drag-and-drop to reposition nodes
- Shift+drag to create connections between nodes
- Double-click to rename nodes inline
- Animate button shows flowing dots along edges
- Toolbar to add/delete nodes, reset to default workflow
- Default workflow: Ticket Created → Auto-Classify → Priority
  decision → L1/L2 paths → Resolved decision → Close/Reopen
- 9 Playwright E2E tests with screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: metro-map workflow with presets, color picker, agent assignment

Rewrite WorkflowPage as metro-map style inspired by Incident &
Problem Solving methodology:
- 3 workflow presets: Incident Solving, Problem Solving, Change Mgmt
- Metro station circle nodes with thick colored edge lines
- Edge color inherited from outgoing node
- Click node → dialog with color picker (8 colors) and agent selector
  (10 agent presets)
- Agent indicator dot on nodes with assigned agents
- Color legend auto-generated from used colors
- 12 Playwright E2E tests covering presets, node config, animation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: friendlier workflow editor — connect mode, double-click add, dialog edges

- Connect Mode toggle button: click source node then target to draw edge
  (no shift key needed). Crosshair cursor + green '+' hint on target.
- Double-click empty canvas area to add a node at that position
- Node dialog now has 'Connect to…' section with buttons for each
  unconnected node — draw edges without touching the canvas
- Add Node button opens config dialog immediately for the new node
- Dynamic help text updates based on current mode
- Escape key exits connect mode
- Updated Playwright tests for new UX

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat: add 'Improve my Prompt' button to Agent Fabric

LLM-powered prompt improvement following 2025 best practices:
- Backend: /api/workbench/improve-prompt endpoint + service method
  that rewrites prompts with clear role, goals, numbered steps,
  tool references, output format, and constraints
- Frontend: '✨ Improve my Prompt' button below the system prompt
  textarea, disabled when empty, replaces prompt with improved version
- 4 Playwright E2E tests with before/after screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: prompt improvement skips output format (handled by schema)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: improve-prompt uses selected tools, not all available

Pass tool_names from frontend form state so the LLM only references
tools the user actually selected for this agent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
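The pure prompt-builder behind this fix might look roughly like the following (function name and instruction wording are assumptions; the key point is that only the user-selected tool names are interpolated):

```python
# Hypothetical sketch — only tools the user selected are referenced.
def build_improve_prompt(original: str, tool_names: list[str]) -> str:
    """Assemble the rewrite instruction for the improve-prompt LLM call."""
    tools = ", ".join(tool_names) if tool_names else "none"
    return (
        "Rewrite the following agent system prompt with a clear role, goals, "
        "numbered steps, and constraints. Reference only these tools: "
        f"{tools}.\n\nOriginal prompt:\n{original}"
    )
```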

* fix: remove maxHeight on tools list to avoid scrolling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: replace worklog template with Topic & Product Analysis

Worklog columns in data.csv are all empty/zero. New template analyzes
topics, products, services, priority distribution, and group workload
using data that actually exists in the CSV.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#25)

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ionality

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Interactive Jupyter notebook series teaching prompt optimization with DSPy.
Organized by learning concepts from Grokking Simplicity and A Philosophy
of Software Design.

Structure:
- 8 notebooks (00-07): Introduction → Data/Calc/Actions → Deep Modules →
  Evaluation as Spec → Optimizer as Compiler → Domain Tuning → Agentic → Finale
- 20 tasks across 4 tiers: Basics, Reasoning, Composition, Agentic
- dspy_tasks/ library: data.py (DATA), calculations.py (CALCULATIONS),
  actions.py (ACTIONS), tools.py, visualize.py (ipywidgets + Plotly)
- 16 JSON datasets (13 generic + 3 CSV-derived from ticket data)
- 165 passing pytest tests covering signatures, metrics, and registry

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
abossard and others added 7 commits March 24, 2026 20:08
litellm PyPI versions 1.82.7 and 1.82.8 were compromised by attacker
TeamPCP with credential-stealing malware. See:
BerriAI/litellm#24518

Pinned to known-safe versions:
- backend: litellm==1.82.1
- notebooks: litellm==1.82.6

Do NOT upgrade until BerriAI confirms PyPI is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NB02: removed 'Beliebige Aufgabe' section (belongs in NB03)
NB03: 6-step arc with task catalog + domain tuning:
  1. See all 20 tasks
  2. Pick any task and optimize it
  3. Load real ticket data
  4. Run generic prompt → mediocre
  5. Tune with domain data → much better
  6. Takeaway: your data is your moat

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dump_state() can return a list or dict depending on DSPy version.
Handle both gracefully with type checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
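A defensive normalizer in that spirit (the real handling may differ; merging list entries into one dict is an assumption):

```python
# Sketch: coerce dspy dump_state() output (dict or list, depending on version)
# into a single dict. The merge strategy for the list case is an assumption.
def normalize_state(state):
    if isinstance(state, dict):
        return state
    if isinstance(state, list):
        merged = {}
        for item in state:
            if isinstance(item, dict):
                merged.update(item)
        return merged
    return {"value": state}
```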
- Updated task definitions in notebooks to use more descriptive field names (e.g., 'query' to 'question').
- Changed the default task in domain tuning notebook from "sentiment" to "plan_execute".
- Improved agent behavior optimization by refining prompts and adding explanations for model choices.
- Enhanced search functionality in tools to provide better ticket search results and counts.
- Updated calculations for plan quality and self-correct accuracy to align with new output structures.
- Added MIPROv2 optimization step to improve agent responses based on vague prompts.
- Adjusted dataset for search agent to include more complex queries and answers.
- Updated kernel specifications across notebooks to use Python 3.13.12.
…agent optimization examples. Change model to 'github_copilot/gpt-4o-mini' for faster performance. Enhance explanations for prompt optimization and MIPROv2. Adjust markdown formatting and update kernel specifications across notebooks.
…ot and notebooks directory

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 08:56
@abossard abossard self-assigned this Mar 25, 2026

Copilot AI left a comment


Pull request overview

Adds a “DSPy/LLM playground” developer experience layer (docs + editor config) while also introducing substantial backend functionality for agent tooling, auto-generation scheduling, and persistence.

Changes:

  • Added extensive documentation for KBA Drafter, Agent Builder, CSV AI guidance, and Ubuntu setup.
  • Introduced Agent Builder module + workbench integration, including REST routes, tool registry, evaluation, and SSE event streaming.
  • Added auto-generation scheduler/service/models and migrated tasks.py storage from in-memory to SQLite/SQLModel, plus expanded backend dependencies and tests.

Reviewed changes

Copilot reviewed 89 out of 227 changed files in this pull request and generated 10 comments.

File Description
docs/KBA_DRAFTER.md New KBA Drafter architecture + API + ops documentation.
docs/INSTALL_UBUNTU.md New Ubuntu 22.04 prerequisite install guide.
docs/CSV_AI_GUIDANCE.md New guidance for agents using CSV ticket tools.
docs/AGENT_BUILDER.md New Agent Builder module documentation with diagrams.
backend/workbench_integration.py Wires project operations into Agent Builder ToolRegistry + services.
backend/usecase_demo.py Adds in-memory background runner with markdown→JSON row extraction.
backend/tests/test_workbench_integration_e2e.py E2E-style REST verification for workbench integration with mocked LLM.
backend/tests/test_usecase_demo.py Async tests for UsecaseDemoRunService execution + timeout handling.
backend/tests/test_search_questions.py Adds tests for search-questions schema + cleaning behavior.
backend/tests/test_litellm_integration.py Adds opt-in live LiteLLM/Copilot integration tests (skipped if unauthenticated).
backend/tests/test_kba_schema.py Adds JSON Schema validation tests for legacy KBA schema.
backend/tests/test_agents.py Adds basic tests for operation registry + langchain tool conversion.
backend/tests/conftest.py Ensures backend import path is set up for tests.
backend/test_auto_gen.py Adds a manual “quick test” script for auto-generation components.
backend/tasks.py Migrates tasks to SQLModel/SQLite; adds engine/session helpers and DB init on import.
backend/scheduler.py Adds APScheduler-based auto-generation scheduler + manual trigger.
backend/requirements.txt Expands backend dependencies for agents, SQLModel, schedulers, LLM tooling.
backend/pytest.ini Adds pytest config for async mode and discovery.
backend/mcp_handler.py Adds MCP JSON-RPC handler routing to unified operations.
backend/kba_schemas.py Adds legacy Draft-07 JSON Schema + example for KBA output.
backend/kba_prompts.py Adds prompt builders for KBA generation, retries, markdown fallback, search questions.
backend/kba_output_models.py Adds Pydantic models/validators for structured KBA output + search questions.
backend/kba_exceptions.py Adds explicit KBA exception hierarchy.
backend/kba_audit.py Adds audit logging service for KBA draft lifecycle.
backend/kb_published/KB-81C44260-gerätewechsel-für-vpn-probleme-im-efd-durchführen.md Adds a published KBA markdown artifact.
backend/auto_gen_service.py Implements ticket selection + sequential draft generation and settings updates.
backend/auto_gen_models.py Adds settings DTOs + SQLModel persistence table + run result model.
backend/api_decorators.py Extends Operation with MCP arg parsing, JSON serialization, LangChain tool conversion.
backend/agent_workbench/__init__.py Adds backward-compat shim re-exporting Agent Builder API.
backend/agent_builder/tools/schema_converter.py Converts JSON Schema → Pydantic args models (pure).
backend/agent_builder/tools/registry.py Tool registry for dependency-injected StructuredTools.
backend/agent_builder/tools/mcp_adapter.py Adapts external MCP tools into LangChain StructuredTools.
backend/agent_builder/tools/__init__.py Re-exports tool helpers.
backend/agent_builder/tests/test_service.py Adds WorkbenchService CRUD/tool introspection tests.
backend/agent_builder/tests/test_schema_converter.py Adds schema converter tests.
backend/agent_builder/tests/test_registry.py Adds ToolRegistry tests.
backend/agent_builder/tests/test_prompt_builder.py Adds prompt builder tests (schema/defaults/efficiency mode).
backend/agent_builder/tests/test_persistence.py Adds repository tests with real temp SQLite DB.
backend/agent_builder/tests/test_models.py Adds extensive model validation/roundtrip tests.
backend/agent_builder/tests/test_evaluator.py Adds evaluator tests (criteria types + scoring + llm_judge guard).
backend/agent_builder/tests/test_engine.py Adds extract_tools_used tests.
backend/agent_builder/routes.py Adds Quart blueprint routes for workbench CRUD/runs/eval + SSE stream.
backend/agent_builder/persistence/repository.py Adds persistence repository for agents/runs/evaluations.
backend/agent_builder/persistence/database.py Adds engine builder + lightweight SQLite migrations.
backend/agent_builder/persistence/__init__.py Re-exports persistence public API.
backend/agent_builder/models/run.py Adds SQLModel run table + JSON property accessors + DTO.
backend/agent_builder/models/evaluation.py Adds criteria/evaluation models + SQLModel table.
backend/agent_builder/models/chat.py Adds chat request/response models.
backend/agent_builder/models/agent.py Adds agent definition models + JSON-backed fields + create/update DTOs.
backend/agent_builder/models/__init__.py Re-exports models API.
backend/agent_builder/evaluator.py Implements criteria evaluation + optional llm_judge + scoring.
backend/agent_builder/engine/react_runner.py Implements LLM construction, ReAct agent execution, tool usage extraction.
backend/agent_builder/engine/prompt_builder.py Implements structured output schema prompting + efficiency mode prompt.
backend/agent_builder/engine/event_bus.py Adds in-process pub/sub event bus with history for SSE.
backend/agent_builder/engine/callbacks.py Adds logging + SSE publishing callbacks for tools/LLM calls.
backend/agent_builder/engine/__init__.py Re-exports engine API.
backend/agent_builder/chat_service.py Adds one-shot chat service using shared ReAct runner.
backend/agent_builder/__init__.py Defines Agent Builder public API surface.
backend/=3.10.4 Adds a file containing pip install output (likely accidental).
CLAUDE.md Adds Claude Code guidance: setup/run/test/architecture notes.
.vscode/launch.json Adds VS Code launch configs for backend/frontend/notebooks/browser.
.vscode/extensions.json Adds recommended VS Code extensions.
.github/copilot-instructions.md Rewrites Copilot guidance for CSV-ticket learning scope and boundaries.
.gitattributes Enables nbstripout + notebook diff driver configuration.
.env.example Adds documented environment variable template for LLM/KBA features.
.dockerignore Updates ignore rules to align with .venv naming.
.claude/settings.local.json Updates Claude allowed command to source .venv.

Comment on lines +86 to +89
        Result dictionary with generation stats
        """
        logger.info("Manual trigger of auto-generation")
        result = await self._run_auto_generation()

Copilot AI Mar 25, 2026


run_now() assumes _run_auto_generation() always returns an object with model_dump(), but _run_auto_generation() returns None when auto-generation is disabled (and also on exceptions). This will raise an AttributeError on manual runs; return a consistent result shape (e.g., an AutoGenRunResult with success=False + reason) or handle None explicitly in run_now().

Suggested change

-        Result dictionary with generation stats
-        """
-        logger.info("Manual trigger of auto-generation")
-        result = await self._run_auto_generation()
+        Result dictionary with generation stats. If auto-generation
+        did not run (e.g., disabled or failed), returns a failure
+        dictionary with a reason message.
+        """
+        logger.info("Manual trigger of auto-generation")
+        result = await self._run_auto_generation()
+        if result is None:
+            # Ensure we always return a dictionary, even when auto-generation
+            # is disabled or fails with an exception.
+            return {
+                "success": False,
+                "reason": "Auto-generation did not run; see logs for details.",
+            }

Comment on lines +100 to +103
        settings = self.auto_gen_service.get_settings()
        if not settings.enabled:
            logger.info("Auto-generation is disabled, skipping run")
            return None

Copilot AI Mar 25, 2026


run_now() assumes _run_auto_generation() always returns an object with model_dump(), but _run_auto_generation() returns None when auto-generation is disabled (and also on exceptions). This will raise an AttributeError on manual runs; return a consistent result shape (e.g., an AutoGenRunResult with success=False + reason) or handle None explicitly in run_now().

Comment on lines +185 to +193
    def on_tool_end(self, output: str, *, run_id: Any = None, **kwargs: Any) -> None:
        started = self._start_times.pop(run_id, None)
        duration_ms = int((perf_counter() - started) * 1000) if started is not None else None
        preview = output[:500] if isinstance(output, str) else str(output)[:500]
        event_bus.publish(AgentEvent(
            run_id=run_id_outer,
            event_type="tool_end",
            data={"tool_name": kwargs.get("name", ""), "output": preview, "duration_ms": duration_ms},
        ))

Copilot AI Mar 25, 2026


The SSE tool_end event reports tool_name using kwargs.get("name"), but on_tool_end typically doesn’t receive the tool name in kwargs; this will often publish empty tool names and break UI/telemetry correlation. Persist the tool name from on_tool_start keyed by callback run_id (similar to _start_times) and reuse it in on_tool_end/on_tool_error.
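The reviewer's suggestion can be sketched as follows (class and attribute names are assumptions; the real callback also publishes to the event bus rather than returning the event):

```python
from time import perf_counter

class ToolEventTracker:
    """Sketch: remember the tool name and start time from on_tool_start,
    keyed by the callback run_id, so on_tool_end can report them instead
    of relying on kwargs.get("name")."""

    def __init__(self):
        self._tool_names: dict = {}
        self._start_times: dict = {}

    def on_tool_start(self, serialized: dict, input_str: str, *, run_id=None, **kwargs):
        self._tool_names[run_id] = (serialized or {}).get("name", "")
        self._start_times[run_id] = perf_counter()

    def on_tool_end(self, output, *, run_id=None, **kwargs) -> dict:
        name = self._tool_names.pop(run_id, "")
        started = self._start_times.pop(run_id, None)
        duration_ms = int((perf_counter() - started) * 1000) if started is not None else None
        return {"tool_name": name, "output": str(output)[:500], "duration_ms": duration_ms}
```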

Comment on lines +101 to +104
            count = session.exec(select(AgentRun)).all().__len__()
            session.exec(sql_delete(AgentRun))
            session.commit()
            return count

Copilot AI Mar 25, 2026


delete_all_runs() loads all runs into memory just to compute the count. Prefer a COUNT(*) query (or use a delete statement with rowcount if supported by the driver) to avoid O(n) memory/time for large tables.

Suggested change

-            count = session.exec(select(AgentRun)).all().__len__()
-            session.exec(sql_delete(AgentRun))
-            session.commit()
-            return count
+            result = session.exec(sql_delete(AgentRun))
+            session.commit()
+            # result.rowcount may be None if not supported; normalize to int
+            return int(result.rowcount or 0)
Comment on lines +1 to +9
Collecting APScheduler
Downloading apscheduler-3.11.2-py3-none-any.whl.metadata (6.4 kB)
Collecting tzlocal>=3.0 (from APScheduler)
Downloading tzlocal-5.3.1-py3-none-any.whl.metadata (7.6 kB)
Downloading apscheduler-3.11.2-py3-none-any.whl (64 kB)
Downloading tzlocal-5.3.1-py3-none-any.whl (18 kB)
Installing collected packages: tzlocal, APScheduler

Successfully installed APScheduler-3.11.2 tzlocal-5.3.1

Copilot AI Mar 25, 2026


This looks like accidental pip output committed as a file (backend/=3.10.4). It will confuse contributors and pollute the repo; remove it and consider adding a guard (e.g., .gitignore patterns) to prevent similar artifacts from being committed.

Suggested change
Collecting APScheduler
Downloading apscheduler-3.11.2-py3-none-any.whl.metadata (6.4 kB)
Collecting tzlocal>=3.0 (from APScheduler)
Downloading tzlocal-5.3.1-py3-none-any.whl.metadata (7.6 kB)
Downloading apscheduler-3.11.2-py3-none-any.whl (64 kB)
Downloading tzlocal-5.3.1-py3-none-any.whl (18 kB)
Installing collected packages: tzlocal, APScheduler
Successfully installed APScheduler-3.11.2 tzlocal-5.3.1

Comment on lines +125 to +131
# Database path - use environment variable or default to data/tasks.db
DB_PATH = Path(__file__).parent / "data" / "tasks.db"
DB_PATH.parent.mkdir(parents=True, exist_ok=True)

# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
Copilot AI Mar 25, 2026

The comment says the DB path can be set via environment variable, but the implementation always uses Path(__file__).parent / "data" / "tasks.db". Either read an env var (e.g., TASKS_DB_PATH) or adjust the comment so behavior is accurate.

Comment on lines 242 to +254
def get_stats() -> TaskStats:
"""
Get task statistics.

Consolidated stats calculation - we could split into
count_total(), count_completed(), count_pending(), but
that's unnecessary fragmentation. This is clearer.
Get task statistics using SQLModel queries.
"""
all_tasks = list(_tasks_db.values())
completed = sum(1 for task in all_tasks if task.completed)

return TaskStats(
total=len(all_tasks),
completed=completed,
pending=len(all_tasks) - completed
)
with get_session() as session:
total = len(session.exec(select(Task)).all())
completed = len(session.exec(select(Task).where(Task.completed == True)).all()) # noqa: E712

return TaskStats(
total=total,
completed=completed,
pending=total - completed
)
Copilot AI Mar 25, 2026

get_stats() materializes full result sets just to count rows, which will scale poorly as tasks grow. Use COUNT(*) aggregation queries to compute total and completed without loading rows into Python.
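The difference is easiest to see with stdlib sqlite3 (table and column names invented for illustration; in SQLModel the equivalent would be something like `session.exec(select(func.count()).select_from(Task)).one()`): count in the database instead of materializing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, completed INTEGER)")
conn.executemany("INSERT INTO task (completed) VALUES (?)", [(1,), (0,), (1,)])

# COUNT(*) aggregates server-side; no rows are loaded into Python.
total = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
completed = conn.execute(
    "SELECT COUNT(*) FROM task WHERE completed = 1"
).fetchone()[0]
pending = total - completed
```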

Comment on lines +127 to 147
DB_PATH.parent.mkdir(parents=True, exist_ok=True)

# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)


def init_db():
"""Initialize database - create all tables."""
SQLModel.metadata.create_all(engine)


def get_session():
"""Get database session."""
return Session(engine)


# Initialize database on module import
init_db()


Copilot AI Mar 25, 2026

Initializing/creating the SQLite DB on module import introduces side effects during import (creates directories/files, runs DDL) which can be problematic in test discovery, tooling, or multi-process deployments. Consider moving initialization to an explicit app startup hook (or a lazy “init once” path) so imports remain side-effect free.

Suggested change
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
def init_db():
"""Initialize database - create all tables."""
SQLModel.metadata.create_all(engine)
def get_session():
"""Get database session."""
return Session(engine)
# Initialize database on module import
init_db()
# Create SQLAlchemy engine
DATABASE_URL = f"sqlite:///{DB_PATH}"
engine = create_engine(DATABASE_URL, echo=False)
_DB_INITIALIZED = False
def init_db() -> None:
"""Initialize database - ensure directory exists and create all tables.
This function is idempotent and safe to call multiple times.
"""
global _DB_INITIALIZED
if _DB_INITIALIZED:
return
# Create parent directory for the SQLite file if it does not exist.
DB_PATH.parent.mkdir(parents=True, exist_ok=True)
# Create all tables defined on SQLModel metadata.
SQLModel.metadata.create_all(engine)
_DB_INITIALIZED = True
def get_session() -> Session:
"""Get database session, initializing the database on first use."""
init_db()
return Session(engine)
# Database initialization is performed lazily when a session is first requested.

elif hasattr(result, 'model_dump'):
return json.dumps(result.model_dump(mode='json'), indent=2)
elif isinstance(result, bool):
return f"Success: {result}"
Copilot AI Mar 25, 2026

serialize_result() returns a non-JSON string for bool results ("Success: True"), while other branches serialize JSON. For MCP clients expecting consistent JSON text (especially if they parse tool outputs), this inconsistency can cause failures; prefer serializing booleans as JSON (e.g., true/false with json.dumps).

Suggested change
return f"Success: {result}"
return json.dumps(result, indent=2)
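A self-contained sketch of the point (the surrounding branches are paraphrased, not the file's exact code): routing booleans through `json.dumps` keeps every output parseable as JSON.

```python
import json

def serialize_result(result) -> str:
    """Sketch: every branch emits valid JSON text, so clients can
    always json.loads() the tool output."""
    if hasattr(result, "model_dump"):  # pydantic/SQLModel objects
        return json.dumps(result.model_dump(mode="json"), indent=2)
    # bool, int, str, list, dict all round-trip through json.dumps.
    return json.dumps(result, indent=2)
```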

# ============================================================================

class Task(BaseModel):
class Task(SQLModel, table=True):
Copilot AI Mar 25, 2026

The PR description emphasizes onboarding/documentation improvements, but the diff includes major functional/backend changes (e.g., migrating tasks persistence to SQLite/SQLModel, adding agent builder modules, scheduler/auto-generation). Consider updating the PR description to explicitly call out these runtime-affecting changes (or splitting into separate PRs) so reviewers can scope risk and release impact accurately.

- Cleaned up README.md by removing unnecessary blank lines.
- Updated test_llm_service.py to mock OpenAI API key for better error handling.
- Added multiple new screenshot files for documentation and demo purposes.
- Minor formatting adjustments in setup.sh to remove extra blank lines.
- Changed app.spec.js to wait for the full page load instead of DOM content loaded for more reliable test execution.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…M service tests and evaluation notebook

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:13

Copilot AI left a comment


Pull request overview

Copilot reviewed 68 out of 79 changed files in this pull request and generated 2 comments.

Comment on lines +59 to +64
def test_sentiment_baseline_scores(self):
"""Full baseline run on sentiment should produce a score > 0."""
result = run_baseline("sentiment", get_default_model(), max_eval=5)
assert result.score > 0.0
assert len(result.individual_scores) == 5
assert result.elapsed_seconds > 0
Copilot AI Mar 25, 2026

run_baseline is defined as run_baseline(task_id: str, *, max_eval: Optional[int] = None) and uses the currently configured DSPy LM. These calls pass a model as a positional arg, which will raise TypeError: run_baseline() takes 1 positional argument but 2 were given. Configure the model via configure_dspy(model=...) (or dspy.context) and call run_baseline("sentiment", max_eval=...) instead.

Comment on lines +115 to +123
def test_bootstrap_fewshot_runs(self):
"""BootstrapFewShot optimization should complete and return scores."""
result = run_optimization(
"sentiment",
get_default_model(),
"BootstrapFewShot",
max_eval=5,
)
assert result.baseline_score >= 0.0
Copilot AI Mar 25, 2026

run_optimization is defined as run_optimization(task_id: str, optimizer: str = "BootstrapFewShot", *, max_eval=None, instructions=None) and does not take a model argument. Passing get_default_model() as the second positional argument will be interpreted as the optimizer string, and passing "BootstrapFewShot" as the third positional argument will raise a TypeError. Configure the LM separately (e.g., configure_dspy(get_default_model())) and call run_optimization("sentiment", optimizer="BootstrapFewShot", max_eval=5) (or pass instructions= if intended).

…ChatService and WorkbenchService, and enhance LLM selection tests

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
… script

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 09:29

Copilot AI left a comment


Pull request overview

Copilot reviewed 74 out of 85 changed files in this pull request and generated no new comments.

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
…ntRunPage and RunsSidePanel components

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
Copilot AI review requested due to automatic review settings March 25, 2026 10:10

Copilot AI left a comment


Pull request overview

Copilot reviewed 76 out of 87 changed files in this pull request and generated 19 comments.

Comment on lines +103 to +107
def test_ticket_routing_baseline(self):
"""Ticket routing baseline on real CSV-derived data."""
result = run_baseline("ticket_routing", get_default_model(), max_eval=3)
assert result.score >= 0.0
assert len(result.individual_scores) == 3
Copilot AI Mar 25, 2026

Same issue as above: run_baseline is keyword-only after task_id, so passing get_default_model() positionally will raise TypeError. Configure the model via configure_dspy(model=...) before calling run_baseline(...), or change the action signature to accept a model.

Comment on lines +81 to +85
def test_math_baseline_scores(self):
"""Math baseline should run and score without errors."""
result = run_baseline("math_word", get_default_model(), max_eval=3)
assert result.score >= 0.0
assert result.llm_calls > 0
Copilot AI Mar 25, 2026

Same issue as above: run_baseline doesn’t accept a positional model argument (only task_id + keyword args). This call will raise a TypeError unless you configure the model separately via configure_dspy(model=...).

Comment on lines +142 to +145
scores = {}
for model in chat_models:
result = run_baseline("sentiment", model, max_eval=3)
scores[model] = result.score
Copilot AI Mar 25, 2026

This loop calls run_baseline("sentiment", model, max_eval=3) but run_baseline does not accept a positional model argument. This will raise TypeError; configure the model with configure_dspy(model) before each baseline run (or change run_baseline to accept a model).

Comment on lines +1 to +8
"""
End-to-end tests for the DSPy Playground — runs real LLM calls via LiteLLM.

These tests hit the actual Copilot/LiteLLM backend. They verify the full
pipeline: config → DSPy module → LiteLLM → model API → metric scoring.

Run: cd notebooks && python -m pytest tests/ -v
"""
Copilot AI Mar 25, 2026

These tests are explicitly making real LLM calls (network + cost + rate limits), which makes the default test suite flaky and non-deterministic in CI. Consider gating them behind an env var (e.g. RUN_LIVE_LLM_TESTS=1) and pytest.skip by default, or marking them (e.g. @pytest.mark.live) and excluding that marker in CI.
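One way to build the suggested gate (the env var name `RUN_LIVE_LLM_TESTS` is the comment's own example, not something the repo defines):

```python
import os

def live_llm_enabled() -> bool:
    """Live LLM tests run only when explicitly opted in via the environment."""
    return os.environ.get("RUN_LIVE_LLM_TESTS") == "1"

# In a pytest module the gate would typically be applied once at the top:
#   pytestmark = pytest.mark.skipif(
#       not live_llm_enabled(),
#       reason="set RUN_LIVE_LLM_TESTS=1 to run live LLM tests",
#   )
```

CI then stays deterministic by default, while developers can still run the full pipeline locally with `RUN_LIVE_LLM_TESTS=1 pytest tests/ -v`.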

Comment on lines +5 to +7
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)
Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.
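A sketch of the portable alternative, assuming the script sits one level below the repo root (e.g. in a `scripts/` folder — the layout is an assumption, adjust `parents[...]` to match):

```python
from pathlib import Path

def data_csv_path(script_file: str) -> Path:
    """Resolve csv/data.csv relative to the repo root rather than
    hard-coding a developer-specific absolute path."""
    return Path(script_file).resolve().parents[1] / "csv" / "data.csv"

# Usage inside the script itself:
#   with open(data_csv_path(__file__), encoding="utf-8", errors="replace") as f:
#       rows = list(csv.DictReader(f))
```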

Comment on lines +214 to +233
def _evaluate_examples(module, examples, metric_fn, timeout_per_example: int = 30) -> list[dict]:
import signal

class TimeoutError(Exception):
pass

def _handler(signum, frame):
raise TimeoutError("LLM call timed out")

results = []
for i, ex in enumerate(examples):
print(f" [{i+1}/{len(examples)}]", end=" ", flush=True)
try:
old_handler = signal.signal(signal.SIGALRM, _handler)
signal.alarm(timeout_per_example)
input_kwargs = {k: ex[k] for k in ex.inputs().keys()}
prediction = module(**input_kwargs)
signal.alarm(0)
signal.signal(signal.SIGALRM, old_handler)
score = float(metric_fn(ex, prediction) or 0.0)
Copilot AI Mar 25, 2026

signal.SIGALRM is not available on Windows and alarm-based timeouts only work reliably in the main thread. As written, this helper will crash on platforms without SIGALRM. Consider guarding with hasattr(signal, "SIGALRM") and falling back to no timeout (or another timeout mechanism) when unavailable.
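A hedged sketch of the suggested guard (the context-manager shape is one possible refactor, not the file's code): fall back to no timeout when SIGALRM is unavailable or we are not in the main thread.

```python
import signal
from contextlib import contextmanager

@contextmanager
def alarm_timeout(seconds: int):
    """Best-effort timeout: uses SIGALRM where available (POSIX, main
    thread); silently becomes a no-op elsewhere (Windows, worker threads)."""
    if not hasattr(signal, "SIGALRM"):
        yield  # platform has no SIGALRM: run without a timeout
        return

    def _handler(signum, frame):
        raise TimeoutError("LLM call timed out")

    try:
        old_handler = signal.signal(signal.SIGALRM, _handler)
    except ValueError:  # signal.signal only works in the main thread
        yield
        return
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)

with alarm_timeout(5):
    result = sum(range(10))  # stand-in for the real module(**input_kwargs) call
```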

Comment on lines +252 to +258
"from dspy_tasks.config import configure_dspy\n",
"from dspy_tasks.data import ClassifySentiment\n",
"import dspy\n",
"\n",
"lm = \n",
"tricky_reviews = [\n",
" (\"Super, schon wieder ein Produkt das nach einer Woche kaputt geht. Genau was ich brauchte.\", \"negative\"),\n",
Copilot AI Mar 25, 2026

This notebook cell contains an incomplete statement (lm =) which will cause a SyntaxError when executed. Remove it or replace it with the intended model configuration (e.g. call configure_dspy(...) and assign the returned LM if you need it).

Comment on lines +85 to +93
"source": [
"\n",
"# SHALLOW: Just predict\n",
"shallow = dspy.Predict(TranslateEnDe)\n",
"result_shallow = shallow(english_text=\"The cat sat on the mat while the dog chased its tail.\")\n",
"\n",
"# DEEP: Chain of Thought\n",
"deep = dspy.ChainOfThought(TranslateEnDe)\n",
"result_deep = deep(english_text=\"The cat sat on the mat while the dog chased its tail.\")\n",
Copilot AI Mar 25, 2026

This cell uses dspy and TranslateEnDe but neither is imported in the notebook prior to use, so execution will fail with NameError. Add the missing imports (e.g. import dspy and an import for TranslateEnDe).

Comment on lines +3 to +6
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)

Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.

Comment on lines +4 to +6
with open('/Users/abossard/Desktop/projects/python-quart-vite-react/csv/data.csv', encoding='utf-8', errors='replace') as f:
reader = csv.DictReader(f)
rows = list(reader)
Copilot AI Mar 25, 2026

This script uses an absolute, developer-specific file path (/Users/.../csv/data.csv), which makes it non-portable. Use a repo-relative Path(...) to csv/data.csv instead.

…e widget resolution in SchemaRenderer

Signed-off-by: Andre Bossard <anbossar@microsoft.com>
@abossard abossard merged commit 593f45f into main Mar 25, 2026
2 checks passed
@abossard abossard deleted the dspy-playground branch March 25, 2026 13:57