-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Problem
The HED assistant is generating invalid HED tags (e.g., Muscle-artifact) without validating them. This defeats the core value proposition of providing precise, validated HED annotations.
Example: When asked "How can I annotate muscle artifacts for EEG data?", the assistant:
- Generates
Muscle-artifactwhich is not a valid HED tag (the correct tag isEMG-artifact) - Provides no validation of the generated annotations
- Constructs elaborate BIDS examples using fake tags
Root Causes
Three compounding issues:
1. Prompt softening reduced tool-calling pressure
Commit 4202d79 changed the system prompt from "use tools liberally, prefer calling a tool over making assumptions" to "use when in doubt." This gives the model an escape route: if it is confidently (but incorrectly) generating tags, it skips tool calls.
2. suggest_hed_tags silently returns empty results in production
The suggest_hed_tags tool depends on hed-suggest CLI being installed. In the Docker container, this CLI is not available, so the tool always returns {"term": []} for every search term. The model gets no schema-grounded suggestions and falls back to parametric knowledge (which includes hallucinated tags).
3. validate_hed_string depends on external API
The validation tool calls hedtools.org/hed/services_submit. On any failure, it returns "Use examples from documentation instead", which the model interprets as permission to skip validation entirely.
Expected Behavior
- The assistant MUST validate every HED annotation before presenting it to users
- If validation tools are unavailable, the assistant should clearly state it cannot validate rather than presenting unvalidated tags as correct
- The
suggest_hed_tagstool should either work in production or be removed/replaced
Affected Files
src/assistants/hed/config.yaml- System prompt (tool usage instructions)src/assistants/hed/tools.py-suggest_hed_tags(silent empty return),validate_hed_string(fallback message)src/agents/base.py- Tool invocation (model decides whether to call tools)
Proposed Fix
- Revert prompt to strong tool-calling imperative: "ALWAYS validate HED annotations before showing them. Never present unvalidated tags."
- Make
suggest_hed_tagsfailure visible: Return an error message instead of empty lists when CLI is not found - Fix validation fallback: Change the error message to "Could not validate. DO NOT present unvalidated HED tags to users."
- Add guardrail: Consider a post-processing check that flags responses containing HED-like tags that weren't validated