- Langfuse sessions now correctly show root input and output.
- Simplified the Langfuse session code.
- Session (short-term) memory now works, and the agent can answer questions about previous queries too.
…de_metrics option; introduce raw_count_runs_per_experiment and corresponding tool wrapper
…app entry point and improve error handling in tool wrappers
…g list_runs_tool to include metrics, and adding error handling middleware; introduce manual test checklist for MLflow agent workflows.
Pull request overview
This PR updates the MLflow agent CLI to support structured, schema-driven assistant outputs, adds Langfuse tracing setup utilities, and improves the CLI UI/UX and documentation around manual testing.
Changes:
- Added a BlockResponse JSON schema + middleware/hooks to support structured agent outputs and friendlier tool error handling.
- Refactored MLflow run listing to reduce token usage (single-experiment runs, optional metric/param previews) and added a “count runs per experiment” tool.
- Introduced Langfuse tracing setup utilities, a new CLI entrypoint (src/app.py), and updated CLI rendering + docs/assets.
Reviewed changes
Copilot reviewed 11 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/mlflow_tools/schemas.py | Updates tool arg schemas (single experiment run listing, metrics inclusion flag, count-runs params). |
| src/mlflow_tools/data_access.py | Implements updated MLflow tool behaviors, adds count-runs helper/tool, adjusts tool wrapper return serialization. |
| src/llm/tracing.py | Adds centralized Langfuse setup helper used by the CLI agent. |
| src/app.py | New CLI entrypoint wrapper that runs the agent and attempts to finish/flush Langfuse on interrupt. |
| src/agent/langchain_agent.py | Wires structured response strategy, middleware, tracing metadata, and improves interactive input/output loop. |
| src/agent/console_ui.py | Adds user message printing and BlockResponse rendering; updates welcome banner styling. |
| src/agent/agent_middleware.py | Adds BlockResponse JSON schema and tool-call middleware for error handling/schema forcing. |
| run_agent.sh | Points the CLI launcher to src/app.py. |
| README.md | Updates project banner image reference. |
| docs/superpowers/specs/2026-05-21-mlflow-agent-manual-test-checklist-design.md | Adds a comprehensive manual test checklist for MLflow agent workflows. |
| diary.md | Updates project notes/diary entries. |
| .gitignore | Ignores additional agent/workflow artifacts. |
| src/agent/__init__.py | Present as part of the agent package (no content changes). |
Comment on lines +60 to +71

    # Add glow layer (dim version as shadow)
    banner_glow = "\n".join(banner_lines)
    t.append(banner_glow + "\n\n\n", style="dim #5533ff")

    # Overlay gradient banner on top
    t_banner = Text()
    for line, color in zip(banner_lines, colors):
        t_banner.append(line + "\n", style=f"bold {color}")

    t = Text()
    for line, color in zip(banner_lines, colors):
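The excerpt pairs each banner line with an entry from `colors`, but the construction of that list is not shown. A minimal sketch of one way to produce a purple-to-blue gradient with one color per line follows; the helper name `hex_gradient`, the placeholder banner text, and the end color `#3399ff` are assumptions for illustration, not taken from the PR.

```python
from typing import List


def hex_gradient(start: str, end: str, steps: int) -> List[str]:
    """Linearly interpolate between two '#rrggbb' colors (hypothetical helper)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [start.lower()]
    a = [int(start[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(end[i:i + 2], 16) for i in (1, 3, 5)]
    colors = []
    for step in range(steps):
        # Integer interpolation keeps every channel inside 0..255.
        rgb = [x + (y - x) * step // (steps - 1) for x, y in zip(a, b)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


banner_lines = ["line one", "line two", "line three"]  # placeholder banner text
colors = hex_gradient("#5533ff", "#3399ff", len(banner_lines))
```

The first and last entries equal the start and end colors exactly, so the list can be zipped with the banner lines as in the excerpt.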
Comment on lines +3 to +4

    sys.path.append('./src/agent')  # Add agent directory to path for imports
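A path such as './src/agent' is resolved against the current working directory, so the import only works when the script is launched from the repository root. One common alternative, sketched here under the assumption that the agent package sits in an `agent` directory next to the entry file, derives the path from the entry file itself. The function name `add_agent_dir_to_path` is hypothetical.

```python
import sys
from pathlib import Path


def add_agent_dir_to_path(entry_file: str) -> str:
    """Append <directory of entry_file>/agent to sys.path once and return it."""
    agent_dir = str(Path(entry_file).resolve().parent / "agent")
    if agent_dir not in sys.path:
        sys.path.append(agent_dir)
    return agent_dir


# In src/app.py this would typically be called with __file__.
example_dir = add_agent_dir_to_path("/tmp/project/src/app.py")
```

Because the directory is computed from the file location, the result does not change with the directory the launcher script is run from.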
Comment on lines +8 to +21

    try:
        main()
    except KeyboardInterrupt:
        print("\nInterrupted. Finishing Langfuse run if present...")
        try:
            if lf_run is not None:
                if hasattr(lf_run, 'finish'):
                    lf_run.finish()
                elif fuse_client is not None and hasattr(fuse_client, 'finish_run'):
                    fuse_client.finish_run(conversation_id)
            if fuse_client is not None:
                fuse_client.flush()
        except Exception:
            pass
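The handler above refers to `lf_run`, `fuse_client`, and `conversation_id`, none of which the excerpt shows being defined in that scope, and it silently swallows any cleanup failure. A minimal sketch of the same intent, with the client passed in explicitly and the flush placed in `finally`, could look like this. `run_with_flush` and `FakeClient` are hypothetical; the only thing assumed about the client is a `flush()` method, as used in the excerpt.

```python
from typing import Callable, Optional


def run_with_flush(main: Callable[[], None], client: Optional[object]) -> bool:
    """Run main() and always flush the tracing client afterwards.

    Returns True if main() finished normally, False if it was interrupted.
    """
    try:
        main()
        return True
    except KeyboardInterrupt:
        print("\nInterrupted. Flushing traces...")
        return False
    finally:
        # Runs on normal exit, on Ctrl-C, and on other exceptions alike.
        if client is not None and hasattr(client, "flush"):
            try:
                client.flush()
            except Exception as exc:  # report the failure instead of hiding it
                print(f"Could not flush traces: {exc}")


class FakeClient:
    """Stand-in for a tracing client; only flush() is assumed."""

    def __init__(self) -> None:
        self.flushed = 0

    def flush(self) -> None:
        self.flushed += 1


def interrupted_main() -> None:
    raise KeyboardInterrupt


fake = FakeClient()
finished = run_with_flush(interrupted_main, fake)
```

Passing the client as an argument avoids relying on names that may not exist in the entry point's scope.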

    def setup_langfuse(config):
        """
        Set up Langfuse tracing based on config and environment variables.
        Returns: langfuse_handler, fuse_client, conversation_id, lf_run, FLUSH_PER_QUERY
Comment on lines 87 to 100

    def raw_list_runs(
        experiment_ids: List[str],
        experiment_id: str,
        status: Optional[List[str]] = None,
        start_time: Optional[int] = None,
        end_time: Optional[int] = None,
        order_by: Optional[str] = None,
        max_results: int = 100,
        include_metrics: bool = False,
    ) -> List[Dict[str, Any]]:
        """Return summarized runs for given experiments.

        Note: MLflowClient.search_runs accepts experiment_ids and order_by; more complex
        filtering can be added later.
        """
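The function body is not part of the excerpt. As a rough sketch of how an `include_metrics` flag can keep the default summary small, the hypothetical helper below builds a per-run dictionary and attaches metrics only on request. The attribute names (`info.run_id`, `info.status`, `info.start_time`, `data.metrics`) follow MLflow's Run entity; `summarize_run` and the fake run object are illustrative assumptions.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_run(run: Any, include_metrics: bool = False) -> Dict[str, Any]:
    """Return a compact summary of one run; metrics are opt-in."""
    summary: Dict[str, Any] = {
        "run_id": run.info.run_id,
        "status": run.info.status,
        "start_time": run.info.start_time,
    }
    if include_metrics:
        # Copy so callers cannot mutate the run's own metrics mapping.
        summary["metrics"] = dict(run.data.metrics)
    return summary


# Stand-in object shaped like an MLflow Run, for illustration only.
fake_run = SimpleNamespace(
    info=SimpleNamespace(run_id="abc123", status="FINISHED", start_time=1700000000000),
    data=SimpleNamespace(metrics={"rmse": 0.42}),
)
brief = summarize_run(fake_run)
detailed = summarize_run(fake_run, include_metrics=True)
```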
Comment on lines +127 to +138

    # New: Count runs per experiment ID
    def raw_count_runs_per_experiment(experiment_ids: List[str]) -> Dict[str, int]:
        """Return a dict of experiment_id -> number of runs."""
        counts = {}
        for exp_id in experiment_ids:
            try:
                runs = client.search_runs([exp_id], max_results=50000)
                counts[exp_id] = len(runs)
            except MlflowException as e:
                logging.error(f"Error counting runs for experiment {exp_id}: {e}")
                counts[exp_id] = -1
        return counts
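A single `search_runs` call returns at most `max_results` runs, so the count above would be too low for an experiment that holds more than that, and it loads every run object just to call `len`. `MlflowClient.search_runs` returns a paged list with a `token` attribute and accepts a `page_token` argument, which allows counting page by page. The sketch below assumes only that interface; `count_runs`, `FakePage`, and `FakeClient` are hypothetical names, and the fake client exists so the example runs without an MLflow server.

```python
from typing import Any, List, Optional


def count_runs(client: Any, experiment_id: str, page_size: int = 1000) -> int:
    """Count runs in one experiment by following page tokens."""
    total = 0
    page_token: Optional[str] = None
    while True:
        page = client.search_runs(
            [experiment_id], max_results=page_size, page_token=page_token
        )
        total += len(page)
        page_token = getattr(page, "token", None)
        if not page_token:
            return total


class FakePage(list):
    """List with a token attribute, mimicking a paged result."""

    def __init__(self, items, token: Optional[str] = None) -> None:
        super().__init__(items)
        self.token = token


class FakeClient:
    """In-memory stand-in that pages through n_runs dummy runs."""

    def __init__(self, n_runs: int) -> None:
        self.n_runs = n_runs

    def search_runs(self, experiment_ids: List[str], max_results: int = 1000,
                    page_token: Optional[str] = None) -> FakePage:
        start = int(page_token or 0)
        end = min(start + max_results, self.n_runs)
        token = str(end) if end < self.n_runs else None
        return FakePage(range(start, end), token)


total_runs = count_runs(FakeClient(2500), "1")
```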
Comment on lines +57 to +59

    return ToolMessage(
        content=f"Tool error: Please check your input and try again. ({str(e)})",
        tool_call_id=request.tool_call["id"]
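The excerpt places the full exception text into the message returned to the model. A library-free sketch of the same pattern follows, with the error detail truncated and the full traceback sent to the log instead. A plain dict stands in for `ToolMessage`; `call_tool_safely`, `failing_tool`, and the 200-character limit are assumptions for illustration, not part of the PR.

```python
import logging
from typing import Any, Callable, Dict

MAX_ERROR_CHARS = 200  # assumed limit, not taken from the PR


def call_tool_safely(handler: Callable[[Dict[str, Any]], Any],
                     tool_call: Dict[str, Any]) -> Dict[str, Any]:
    """Run a tool handler and convert any exception into an error message."""
    try:
        return {"tool_call_id": tool_call["id"], "content": handler(tool_call)}
    except Exception as exc:
        # Keep the full traceback in the log; send only a short detail onward.
        logging.exception("Tool %s failed", tool_call.get("name"))
        detail = str(exc)[:MAX_ERROR_CHARS]
        return {
            "tool_call_id": tool_call["id"],
            "content": f"Tool error: Please check your input and try again. ({detail})",
        }


def failing_tool(tool_call: Dict[str, Any]) -> Any:
    raise ValueError("x" * 1000)


result = call_tool_safely(failing_tool, {"id": "call_1", "name": "list_runs", "args": {}})
```

Truncating the detail bounds the number of tokens a failing tool can add to the conversation.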
Comment on lines +190 to +192

    # User pressed Ctrl-C; continue the loop to allow graceful exit
    print("\nInterrupted. Goodbye!")
    continue
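In the excerpt the goodbye message is followed by `continue`, which returns to the top of the loop rather than leaving it. A minimal sketch of an input loop in which Ctrl-C and end-of-input both end the session is shown below. `chat_loop`, `read_input`, `handle`, and the "exit"/"quit" commands are hypothetical and chosen for illustration.

```python
from typing import Callable, List


def chat_loop(read_input: Callable[[str], str], handle: Callable[[str], None]) -> None:
    """Read queries until the user exits, presses Ctrl-C, or input ends."""
    while True:
        try:
            query = read_input("You: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nInterrupted. Goodbye!")
            break  # leave the loop instead of prompting again
        if query.lower() in {"exit", "quit"}:
            break
        if query:  # ignore empty lines
            handle(query)


scripted = iter(["hello", "", "count runs"])


def fake_input(prompt: str) -> str:
    """Scripted replacement for input() that ends with a simulated Ctrl-C."""
    try:
        return next(scripted)
    except StopIteration:
        raise KeyboardInterrupt from None


seen: List[str] = []
chat_loop(fake_input, seen.append)
```

In an interactive CLI the built-in `input` would be passed as `read_input`.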
This pull request introduces several improvements and additions across documentation, user interface, agent middleware, and development workflow. The main highlights are the addition of a comprehensive manual test checklist for MLflow agent workflows, enhancements to the CLI welcome banner and user message formatting, the introduction of agent middleware for structured tool responses, and a minor update to the project banner.
Documentation & Testing
- Added a comprehensive manual test checklist (2026-05-21-mlflow-agent-manual-test-checklist-design.md) covering real-world MLflow agent scenarios, including acceptance criteria and prompt templates for manual verification.
User Interface Enhancements
- Enhanced the welcome banner in print_welcome with a purple-to-blue gradient and glow effect for a more visually appealing introduction.
- Added print_user to clearly display user messages with a "You" tag for better distinction between user and agent output.
Agent Middleware & Structured Output
- Added agent middleware (agent_middleware.py) that enforces a strict JSON schema for tool responses and provides custom error handling, ensuring agent outputs are consistently structured and user-friendly.
Development & Workflow
- Updated run_agent.sh to use src/app.py instead of the previous script, aligning with the new application structure.
Project Branding
- Updated the project banner image in README.md to a new version for improved branding.