
feat: add DingTalk media message support (image, file, voice, video)#392

Merged
wisdomqin merged 54 commits into dataelement:enhance from nap-liu:pr/dingtalk-media-support-v2
Apr 13, 2026

Conversation


@nap-liu nap-liu commented Apr 13, 2026

Summary

DingTalk media message support — extracted from PR #370 per reviewer feedback, excluding the image context rehydration feature (#343).

New files:

  • backend/app/services/dingtalk_token.py — Global access_token cache with auto-refresh before expiry
  • backend/app/services/dingtalk_reaction.py — Thinking indicator (reaction emoji) shown during LLM processing

Modified files:

  • backend/app/services/dingtalk_stream.py — Media download pipeline (picture, richText, audio, video, file), auto-reconnect with exponential backoff, media upload & send helpers
  • backend/app/api/dingtalk.py — Accept image_base64_list and saved_file_paths in message processing, forward media to LLM with vision support
  • backend/app/services/dingtalk_service.py — add download_dingtalk_media convenience wrapper
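The token cache in dingtalk_token.py boils down to a refresh-before-expiry pattern. A minimal sketch, with illustrative names and refresh margin (not the actual implementation — the real module fetches from DingTalk's token endpoint; here the fetcher is injected so the cache logic stands alone):

```python
import time

class TokenCache:
    """Cache an access_token and refresh it shortly before expiry."""

    def __init__(self, fetch, margin: float = 60.0):
        self._fetch = fetch      # callable returning (token, ttl_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.time()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, ttl = self._fetch()
            self._expires_at = now + ttl
        return self._token
```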

Supersedes: PR #370 (media + image context bundle), PR #369 (reconnect-only, now included here)

Test plan

  • Configure a DingTalk bot in websocket mode and verify stream connects
  • Send text message → verify normal reply
  • Send image → verify download, base64 encoding, and LLM vision response
  • Send file/audio/video → verify download and save to workspace/uploads
  • Kill network briefly → verify auto-reconnect with backoff
  • Verify thinking reaction appears while LLM is processing

🤖 Generated with Claude Code

39499740 and others added 30 commits April 8, 2026 10:16
…num (dataelement#316)

- relationships.py: fix tool name from send_agent_message to send_message_to_agent
  in relationships.md generation (LLM was guided to call non-existent tool)
- gateway.py: fix tool name in OpenClaw A2A prompt to prevent potential
  infinite recursion when LLM tries the similarly-named real tool
- tool_seeder.py: unify msg_type enum to match agent_tools.py definition
  [notify, consult, task_delegate] instead of [chat, task_request, info_share]

Closes dataelement#315
…t chevron (dataelement#321)

* feat(ui): polish chat sidebar — segment control, session items, select chevron

- Replace underline tabs with segment control for My Sessions / Other Users toggle
  - 28px height, accent-colored active state (primary button style in dark mode, white in light mode)
  - Uses design tokens (--segment-active-bg/--segment-active-text) that follow admin theme config
- Rework session item delete button
  - Inline flow layout instead of absolute positioning; title text shrinks gracefully on hover
  - Trash icon (14px) with larger 24px hit area; message count hides on hover
  - Hover transitions at 0.35s ease-in-out
- Global select element restyling
  - Hide native dropdown arrow (appearance: none), add Tabler chevron-down icon via CSS variable
  - Icon colored to --text-secondary per theme; proper right padding prevents text overlap
  - Fix inline style using background shorthand that was overriding background-image

Made-with: Cursor

* fix(ui): only show unpin icon when hovering the pin button itself

Previously, hovering anywhere on a pinned agent sidebar item would
swap the pin icon to unpin and turn it red. Now the pin icon stays
unchanged on item hover; the unpin icon and red color only appear
when the mouse is directly over the pin button.

Made-with: Cursor

* fix(ui): unify new-session button style and fix icon/text vertical alignment

Extract shared .new-session-btn CSS class so both the regular user and
admin new-session buttons render identically. Move hover styles from
inline JS to CSS. Add display:block on SVG icon and line-height:1 on
the button to fix vertical centering.

Made-with: Cursor
DingTalk enforces ~20 QPS per app per API interface. The
DingTalkOrgSyncAdapter was issuing requests in tight loops without
any delay, causing QPS throttle errors (subcode=90018) during org
sync for organizations with many departments.

- Add 60ms delay between requests in fetch_departments() and
  fetch_users() to stay safely under the 20 QPS limit
- Move asyncio import to module level (was locally imported in
  FeishuOrgSyncAdapter)
- Consistent with FeishuOrgSyncAdapter which already uses
  Semaphore(15) for rate limiting

Fixes dataelement#373
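The 60ms pacing described above can be sketched as follows (fetch_page is a hypothetical stand-in for the real department/user API calls in DingTalkOrgSyncAdapter):

```python
import asyncio

REQUEST_DELAY = 0.06  # 60ms between requests keeps us well under ~20 QPS

async def fetch_all_pages(fetch_page, page_count: int) -> list:
    """Call fetch_page(i) for each page, sleeping between requests."""
    results = []
    for i in range(page_count):
        results.append(await fetch_page(i))
        await asyncio.sleep(REQUEST_DELAY)  # simple inter-request throttle
    return results
```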
feat(ui): implement drag-and-drop file upload across application
Add Exa (exa.ai) as a search provider in two ways:
- Standalone exa_search tool with full feature support (category
  filtering, domain filtering, content modes: text/highlights/summary)
- New engine option in the existing web_search tool for simple use

Files changed:
- backend/app/services/tool_seeder.py: Exa search tool definition + web_search engine option
- backend/app/services/agent_tools.py: _search_exa (simple) and _exa_search (full) functions
- backend/app/config.py: EXA_API_KEY setting
- .env.example: EXA_API_KEY documentation
feat: add Exa AI-powered search tool
Split web_search's multi-engine selector into independent tools:
- duckduckgo_search  (free, no API key)
- tavily_search      (Tavily API key)
- google_search      (Google Custom Search, API_KEY:CX_ID)
- bing_search        (Azure Bing API key)

Each tool has its own config_schema (API key, language, etc.)
and delegates to the existing private _search_* implementations.
web_search is kept for backward compatibility.
exa_search and jina_search were already standalone.
This reverts commit 41e343e, reversing
changes made to f673c61.
The INSERT path for builtin tools was missing parameters_schema,
causing newly-seeded tools (e.g. duckduckgo_search, tavily_search)
to have null schema in DB. The LLM then saw no parameters and
would call the tool without arguments, getting 'Please provide
search keywords' every time.

Also: deprecate web_search (is_default=False) in favor of the new
standalone search engine tools.
fix: add rate limiting to DingTalk org sync API calls
Instead of re-reading images from disk on every LLM call (rehydration),
store the [image_data:data:image/...;base64,...] marker directly in the
chat_messages.content column when the user sends an image.

This makes message history self-contained: subsequent conversation turns
already have the image data available without any disk dependency.
The existing [image_data:] stripping logic in call_llm() handles
non-vision models (no double work needed).

Also keeps display_content for session titles so base64 is never
exposed in the UI.
Saved messages now contain [image_data:data:image/...;base64,xxx]
markers for multi-turn vision context. The parseMessage function
was not stripping them, causing:
  1. Raw base64 rendered as visible text below the image thumbnail
  2. Scroll broken by the massive inline string

Fix: in parseMessage(), after stripping the [file:] prefix,
detect and remove any [image_data:...] markers from the displayed
content, and extract the data URL into imageUrl so the image
thumbnail still appears in history.
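The marker handling amounts to a regex extract-and-strip pass. A Python sketch of the same logic (the actual fix lives in the frontend's parseMessage, in TypeScript; names here are illustrative):

```python
import re

# Matches markers like [image_data:data:image/png;base64,....]
_IMAGE_MARKER = re.compile(r"\[image_data:(data:image/[^;]+;base64,[^\]]+)\]")

def split_image_markers(content: str) -> tuple:
    """Return (display_text, image_data_urls): extract every marker's
    data URL for thumbnail rendering and strip it from the shown text."""
    urls = _IMAGE_MARKER.findall(content)
    text = _IMAGE_MARKER.sub("", content).strip()
    return text, urls
```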
Root cause: .main-content uses min-height: 100vh (grows with content),
so .chat-messages overflow-y: auto never triggers.

Fix:
- Add .main-content.chat-page with height: 100vh + overflow: hidden
- Detect chat route in Layout.tsx (useMatch) to apply the modifier
- Chat root div now fills 100% height (flex column)
- Restore padding via scoped CSS rules for page-header and chat-container
Root cause: the live WebSocket chat wrapper div had flex:1 but no
minHeight:0. In CSS flexbox, without minHeight:0 a child can grow to
its content height (1373px) ignoring the parent's constrained flex
boundary (457px = 100vh - 206px). This prevented overflow-y:auto on
the inner chatContainerRef from ever triggering a scroll.

Fix: add minHeight:0 and overflow:hidden to the wrapper div so the
flex chain correctly constrains the scrollable message area.
…precated /chat route

Changes:
1. AgentDetail ChatMessageItem: strip [image_data:data:url] markers from
   displayed text, extract image data URLs, and render them as thumbnails.
   Applies to both live WebSocket messages and history loaded from DB.

2. App.tsx: remove deprecated 'agents/:id/chat' route and Chat import.
   The chat interface lives in AgentDetail as a tab (#chat), not as a
   standalone route. Keeping this route caused repeated confusion where
   fixes were applied to the wrong component.
…/task_delegate)

Implement async A2A communication patterns as proposed in upstream dataelement#310.

- notify: fire-and-forget, saves message and wakes target asynchronously,
  returns immediately
- task_delegate: async with callback, creates focus item + on_message
  trigger on source agent, wakes target asynchronously, source agent is
  notified when target completes
- consult: synchronous request-response (unchanged from original behaviour)

Add helper functions:
- _resolve_a2a_target: shared agent lookup logic
- _ensure_a2a_session: shared session creation logic
- _create_on_message_trigger: programmatic trigger creation
- _append_focus_item: write focus items to agent workspace
- _wake_agent_async: wake target agent via trigger invocation path
- trigger_daemon.wake_agent_with_context: public API for waking agents

Update tool descriptions in agent_tools.py and tool_seeder.py to
document the new msg_type behaviours.

Refs: dataelement#310
11 tests covering:
- notify: returns immediately, wakes target async
- task_delegate: creates focus item + on_message trigger, wakes target async
- consult: calls LLM synchronously, returns reply
- default msg_type falls back to notify
- error cases: missing agent_name, no relationship
- helper functions: _append_focus_item, _create_on_message_trigger,
  _wake_agent_async
- OpenClaw targets still use gateway queue regardless of msg_type
Two issues found during integration testing:

1. Wake storm: wake_agent_with_context had no chain depth or dedup
   protection, causing A→B→A→B infinite loops. Now tracks per-pair
   chain depth (_A2A_WAKE_CHAIN) with max 3 hops, and respects the
   existing 30s DEDUP_WINDOW before waking an agent.

2. Notify silent failure: if log_activity or _wake_agent_async threw
   an exception, the entire tool call failed silently — the LLM never
   received a tool result and produced no user-visible reply. Both
   notify and task_delegate paths now wrap auxiliary operations in
   try/except so the return string always reaches the LLM.

- wake_agent_with_context accepts from_agent_id for chain tracking
- _wake_agent_async passes from_agent_id through
- All auxiliary calls (log_activity, focus, trigger, wake) are
  individually wrapped with error handling
a2a_wake is an internal A2A mechanism. Its Reflection Session output
should not be pushed to the user's active chat session or WebSocket.
Only user-facing triggers (on_message from users, webhooks, cron, etc.)
should produce visible notifications.
A2A on_message triggers created by task_delegate now have:
- max_fires=1: auto-disable after receiving one reply
- expires_at: 24h TTL as safety net

Without max_fires, on_message triggers monitoring another agent's
assistant messages would fire indefinitely in a loop:
A wakes → replies (assistant msg) → B's trigger fires → B replies
→ A's trigger fires → ... ad infinitum.

This was an existing system issue exposed by the async A2A feature.
The two storm triggers (wait_manager_meeting_followup and
wait_xiaozhi_meeting_sync_r2) were manually disabled in the DB.
39499740 and others added 23 commits April 13, 2026 00:07
The DEDUP_WINDOW (30s) was preventing legitimate message deliveries
from waking the target agent. This caused task_delegate callbacks to
be silently dropped:

1. A delegates task to B via task_delegate
2. B completes and sends notify reply to A via send_message_to_agent
3. A was recently woken by the trigger daemon → DEDUP skips A's wake
4. A never receives B's reply → user sees no result

Fix: All _wake_agent_async calls from send_message_to_agent now pass
skip_dedup=True. Chain depth protection (max 3 hops) is still active
to prevent storms, but individual message deliveries are never dropped.

wake_agent_with_context gains a skip_dedup parameter that bypasses
the _last_invoke dedup check while keeping chain depth protection.
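How chain-depth protection and skip_dedup interact can be sketched as follows (hypothetical module-level state and function name; the real logic lives in trigger_daemon.py):

```python
import time

MAX_CHAIN_DEPTH = 3    # max hops per (from, to) pair before a wake is refused
DEDUP_WINDOW = 30.0    # seconds, matching the existing dedup window

_wake_chain = {}    # (from_id, to_id) -> hop count
_last_invoke = {}   # agent_id -> last wake timestamp

def may_wake(from_id: str, to_id: str, skip_dedup: bool = False) -> bool:
    """Allow a wake unless the chain is too deep, or the target was woken
    within the dedup window (skip_dedup bypasses only the latter)."""
    pair = (from_id, to_id)
    if _wake_chain.get(pair, 0) >= MAX_CHAIN_DEPTH:
        return False                       # storm protection always applies
    if not skip_dedup and time.time() - _last_invoke.get(to_id, 0.0) < DEDUP_WINDOW:
        return False                       # drop duplicate wakes
    _wake_chain[pair] = _wake_chain.get(pair, 0) + 1
    _last_invoke[to_id] = time.time()
    return True
```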
…tion

Problem: When users said 'summarize tasks assigned to Manager' without
specifying msg_type, the LLM defaulted to notify (fire-and-forget),
so the user never got the results back.

Changes:
- msg_type is now a required parameter — LLM must choose explicitly
  every time it calls send_message_to_agent
- Tool description now includes concrete examples for each type:
  notify = one-way announcement, consult = quick question,
  task_delegate = delegate work and get results back
- Explicit guidance: 'When the user asks another agent to perform a
  task, use task_delegate, NOT notify'
- Updated tool_seeder.py to match

This ensures the LLM will auto-select task_delegate when the user
asks another agent to do work, without the user needing to specify
the msg_type manually.
The key insight: the LLM should decide msg_type based on ONE question:
'Does the target need to DO WORK and return results?'

- If yes → task_delegate (most common for user requests)
- If just FYI → notify
- If quick factual question → consult
- When in doubt → prefer task_delegate (safer, guarantees result)

Added concrete verb examples (analyze, research, summarize, write,
compare, plan, review, find out, confirm) so the LLM can match
ambiguous user phrases like 'check with X', 'look into Y', 'get back
to me on Z' to task_delegate without explicit keywords.
Before: ⚡ Trigger fired: wait_manager_api_key_reply
After:  ⚡ Watching for 经理's reply about the API Key

The trigger name is an internal identifier, meaningless to users.
Now uses the trigger's reason field (human-readable description)
as the notification headline, with a 80-char truncation limit.
Before: ⚡ 经理 is expected to reply after completing a delegated task...
After:  ⚡ Waiting for 经理 to finish the task and reply

The trigger reason is an internal instruction for the agent (in English),
not suitable for user-facing notifications. Added a notification_summary
field in trigger config (_notification_summary) that stores a concise,
user-friendly headline. The notification builder checks this field first
before falling back to the reason field.
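The fallback order for the headline can be sketched as follows (field names from the commits above; the function name is illustrative):

```python
MAX_HEADLINE = 80  # truncation limit for notification headlines

def notification_headline(trigger_config: dict, reason: str, name: str) -> str:
    """Prefer the user-friendly _notification_summary, then the
    human-readable reason, then the internal trigger name."""
    headline = trigger_config.get("_notification_summary") or reason or name
    return headline[:MAX_HEADLINE]
```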
The a2a_wait trigger's Reflection Session output was being pushed
directly to the user's chat, but the agent didn't know its output
was user-facing. So it wrote internal monologue (trigger management,
focus state, reasoning process) that confused the user.

Updated the trigger reason to explicitly state:
- 'Your reply will be shown to the user'
- 'Do NOT mention triggers, focus items, internal state, or reasoning'
- 'Just give the user the actionable outcome'

Before:
  ⚡ Waiting for 经理 to finish the task and reply
  Processing complete. 经理's summary confirms: trigger a2await经理 has been cancelled...
  Focus item wait经理task is now marked [x] complete...

After (expected):
  ⚡ Waiting for 经理 to finish the task and reply
  经理 has finished the task summary; the results are: ...
Prompt improvement + regex post-processing to ensure user-facing
notifications read naturally.

1. Prompt: Explicitly list banned terms (trigger name, focus item,
   a2a_wait, task_delegate, etc.) and tell agent to write as if
   talking to a colleague.

2. Post-processing: Regex filter strips any remaining internal
   identifiers from the notification text before pushing to user.
   Patterns cover: a2a_wait_*, wait_*_task, focus_item, trigger
   status lines, bullet points with internal terms, etc.

Before:
  ✅ Trigger a2await经理 cancelled
  ✅ Focus item wait经理task marked as complete
  📋 Verification result: no new tasks

After (expected):
  经理 confirms: no new tasks or changes; the only outstanding item is still the API Key issue.
Added patterns to catch remaining leaks:
- resolve_* identifiers (e.g. resolve_smithery_api_key)
- "已静默清理触发器" / "已静默处理完毕" ("silently cleaned up trigger" / "silently finished processing")
- "继续待命" / "待命" ("remaining on standby" / "standby")
- Trailing punctuation cleanup

Expected before: 已静默清理触发器。唯一活跃待办仍为resolve_smithery_api_key。继续待命。 ("Silently cleaned up trigger. The only active to-do is still resolve_smithery_api_key. Remaining on standby.")
Expected after:  唯一活跃待办仍为API Key问题。 ("The only active to-do is still the API Key issue.")
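A sketch of the scrub pass (only a hypothetical subset of the patterns; the real filter covers many more, including the Chinese phrases listed above):

```python
import re

_INTERNAL_PATTERNS = [
    re.compile(r"a2a_wait_\w+"),
    re.compile(r"wait_\w+_task"),
    re.compile(r"resolve_\w+"),
    re.compile(r"focus[ _]item\S*", re.IGNORECASE),
]

def scrub_internal_terms(text: str) -> str:
    """Strip internal identifiers from a user-facing notification,
    then collapse the whitespace they leave behind."""
    for pat in _INTERNAL_PATTERNS:
        text = pat.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```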
a2a_wake (notify) triggers don't need the full 50-round tool loop.
Limiting to 2 rounds saves significant tokens since the agent only
needs to: read the message → maybe update memory or take one action.

Token savings per notify:
- Before: up to 50 rounds of tool calls (each round = full context)
- After: max 2 rounds (read message + one action)

Also added max_tool_rounds_override parameter to call_llm() so
callers can cap the tool loop without modifying the agent's DB config.
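The override can be sketched as follows (hypothetical helper; in the PR this is a parameter on call_llm()):

```python
def resolve_tool_rounds(agent_max_rounds: int, override=None) -> int:
    """Cap the tool loop per-call without touching the agent's DB config.
    a2a_wake passes override=2; normal calls leave it as None."""
    if override is None:
        return agent_max_rounds
    return min(override, agent_max_rounds)
```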
New field on Agent model: a2a_async_enabled (boolean, default=False)

- When False (default): send_message_to_agent silently converts
  notify and task_delegate to consult. Behavior identical to before
  this PR — all A2A communication is synchronous.
- When True: full async A2A features (notify, task_delegate, consult)

This allows safe rollout:
1. Deploy the code — all agents work exactly as before (flag off)
2. Enable per-agent via API: PATCH /agents/{id} {"a2a_async_enabled": true}
3. If issues arise, flip the flag back to false — instant rollback

Changes:
- Agent model: new column a2a_async_enabled
- AgentUpdate schema: accepts a2a_async_enabled in PATCH
- AgentOut schema: returns a2a_async_enabled in GET
- _send_message_to_agent: checks flag before branching
- Alembic migration: add_a2a_async_enabled
- Tests: feature flag off/on scenarios
Reasons for company-level instead of agent-level:
1. A2A communication involves two agents — if Alice has it on but Bob
   doesn't, which mode should be used? Company-level = consistent behavior
2. Simpler admin UX: one toggle for the whole company
3. Matches existing pattern (min_heartbeat_interval_minutes is also
   tenant-level)

Changes:
- Removed a2a_async_enabled from Agent model and schemas
- Added a2a_async_enabled to Tenant model (default False)
- _send_message_to_agent now queries tenant.a2a_async_enabled
- Updated all 13 tests with tenant mock
- Migration: drop agents.a2a_async_enabled, add tenants.a2a_async_enabled
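The downgrade behind the tenant-level flag can be sketched as (function name illustrative):

```python
ASYNC_TYPES = {"notify", "task_delegate"}

def effective_msg_type(msg_type: str, a2a_async_enabled: bool) -> str:
    """With the tenant flag off, silently downgrade async modes to the
    synchronous consult, restoring pre-PR behavior."""
    if not a2a_async_enabled and msg_type in ASYNC_TYPES:
        return "consult"
    return msg_type
```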
Added a toggle switch in EnterpriseSettings > Company Info tab:

- Title: 'Agent-to-Agent Async Communication' with BETA badge
- Description: explains the three modes and that disabling restores
  the previous synchronous behavior
- Toggle: on/off with confirmation dialog when enabling
- Confirmation dialog lists known issues:
  • Agent replies may contain internal terms
  • task_delegate callbacks may be delayed
  • Token consumption will increase
  • Agent loops may occur
  • Instructions to disable if issues arise

Backend changes:
- TenantOut schema: added a2a_async_enabled field
- TenantUpdate schema: added a2a_async_enabled field
- PUT /tenants/{id} now accepts a2a_async_enabled

Frontend changes:
- Toggle switch in Company Management section
- Reads from currentTenant.a2a_async_enabled
- Updates via PUT /tenants/{id}
- Confirmation dialog on enable (window.confirm)
Added enterprise.a2aAsync translations to both language files:

en.json:
- title: Agent-to-Agent Async Communication
- description: explains three modes and fallback behavior
- enableWarning: confirmation dialog with known issues

zh.json:
- title: 数字员工间异步通信
- description: Chinese description of the three modes
- enableWarning: Chinese confirmation dialog, including the known-issues list

Fixed duplicate keys in zh.json enterprise.tabs section.
Alembic requires explicit 'revision' and 'down_revision' variables
in each migration file. The previous version only had them in the
docstring, which alembic couldn't parse.

Also fixed: DB had stale alembic_version pointing to a deleted
migration (add_llm_concurrency_group). Updated to point to the
actual latest revision (d9cbd43b62e5).
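A minimal migration skeleton showing the required module-level variables (ids are placeholders except d9cbd43b62e5, which the commit mentions; down_revision here is hypothetical):

```python
"""add a2a_async_enabled to tenants"""
from alembic import op
import sqlalchemy as sa

# Alembic parses these module-level variables; ids in the docstring alone
# are not enough.
revision = "d9cbd43b62e5"
down_revision = "0123abcd4567"  # hypothetical previous revision

def upgrade():
    op.add_column(
        "tenants",
        sa.Column("a2a_async_enabled", sa.Boolean(),
                  nullable=False, server_default=sa.false()),
    )

def downgrade():
    op.drop_column("tenants", "a2a_async_enabled")
```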
1. restart.sh: Added automatic 'alembic upgrade head' before backend
   start. New DB columns will now be applied on every restart, no more
   manual ALTER TABLE needed.

2. SQL wildcard injection fix (MEDIUM):
   - trigger_daemon.py: Sanitize from_agent_name and from_user_name
     before ilike interpolation (same pattern as agent_tools.py)

3. Deprecation fix (LOW):
   - asyncio.get_event_loop() → asyncio.get_running_loop()

4. Memory leak fix (LOW):
   - Added _cleanup_stale_invoke_cache() to periodically evict
     stale entries from _last_invoke dict (runs every ~60s)

5. Regex scope restriction (conflict prevention):
   - Internal term regex filter now ONLY applies to a2a_wait_*
     triggers, not to all trigger notifications. Prevents false
     positives on user on_message, heartbeat, cron, etc.

6. Conflicts analysis (all clear):
   - OpenClaw path: early return before msg_type branching ✅
   - max_tool_rounds_override: defaults None, only for a2a_wake ✅
   - msg_type required: code defaults to 'notify', flag forces 'consult' ✅
   - _notification_summary: safe .get() read, transparent to existing code ✅
   - Tenant DB query: PK lookup <1ms, acceptable at current scale ✅
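The wildcard sanitization in item 2 can be sketched as (helper name illustrative; the real code mirrors the existing pattern in agent_tools.py):

```python
def escape_like(term: str) -> str:
    """Escape LIKE/ILIKE wildcards in user-supplied text so a
    from_agent_name of '%' cannot match every row."""
    return (term.replace("\\", "\\\\")
                .replace("%", r"\%")
                .replace("_", r"\_"))
```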
When a2a_async_enabled is False, the send_message_to_agent tool
schema is dynamically simplified to remove the msg_type parameter.
This prevents the LLM from selecting notify/task_delegate modes
(which get silently overridden to consult) and confusing users
who see the raw tool call arguments in the chat UI.
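The schema simplification can be sketched as follows (assumed JSON-schema layout; function name illustrative):

```python
import copy

def simplify_tool_schema(schema: dict, a2a_async_enabled: bool) -> dict:
    """With the flag off, drop msg_type from the tool's parameter schema
    so the LLM cannot pick modes that would be overridden to consult."""
    if a2a_async_enabled:
        return schema
    simplified = copy.deepcopy(schema)
    simplified.get("properties", {}).pop("msg_type", None)
    required = simplified.get("required")
    if required and "msg_type" in required:
        required.remove("msg_type")
    return simplified
```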
Agent-to-agent sessions store the creator's user_id, causing them to be
filtered out from the Other Users admin view. Exempt source_channel=agent
sessions from the user_id filter so they always appear.
- DingTalk org sync rate limiting
- Multimodal image context persistence + chat display
- A2A async communication (notify / task_delegate / consult)
- Feature flag (tenant-level, default OFF)
- Original A2A code by haoyi (39499740)
- Add dingtalk_token.py: global access_token cache with auto-refresh
- Add dingtalk_reaction.py: thinking indicator (reaction) during LLM processing
- Enhance dingtalk_stream.py: media download pipeline, auto-reconnect with
  exponential backoff, support for picture/richText/audio/video/file messages
- Update dingtalk.py: accept image_base64_list and saved_file_paths in message
  processing, forward media to LLM with vision support
- Update dingtalk_service.py: add download_dingtalk_media convenience wrapper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wisdomqin wisdomqin changed the base branch from main to enhance April 13, 2026 14:49
@wisdomqin wisdomqin merged commit ff408fb into dataelement:enhance Apr 13, 2026
@wisdomqin
Contributor

🎉 Merged! Thanks @nap-liu for the solid contribution — the DingTalk media support, token cache, thinking reaction, and auto-reconnect are all well-implemented. We're staging this on our dev environment for verification before it goes to main. Appreciate you following our feedback and splitting this out cleanly from #370.
