
feat: add DingTalk media message support (image, file, voice, video)#392

Merged
wisdomqin merged 54 commits into dataelement:enhance from nap-liu:pr/dingtalk-media-support-v2
Apr 13, 2026

Conversation


@nap-liu nap-liu commented Apr 13, 2026

Summary

DingTalk media message support — extracted from PR #370 per reviewer feedback, excluding the image context rehydration feature (#343).

New files:

  • backend/app/services/dingtalk_token.py — Global access_token cache with auto-refresh before expiry
  • backend/app/services/dingtalk_reaction.py — Thinking indicator (reaction emoji) shown during LLM processing

Modified files:

  • backend/app/services/dingtalk_stream.py — Media download pipeline (picture, richText, audio, video, file), auto-reconnect with exponential backoff, media upload & send helpers
  • backend/app/api/dingtalk.py — Accept image_base64_list and saved_file_paths in message processing, forward media to LLM with vision support
  • backend/app/services/dingtalk_service.py — add download_dingtalk_media convenience wrapper
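The token cache in dingtalk_token.py boils down to a refresh-before-expiry pattern. A minimal sketch, with illustrative names and refresh margin (not the actual implementation — the real module fetches from DingTalk's token endpoint; here the fetcher is injected so the cache logic stands alone):

```python
import time

class TokenCache:
    """Cache an access_token and refresh it shortly before expiry."""

    def __init__(self, fetch, margin: float = 60.0):
        self._fetch = fetch      # callable returning (token, ttl_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = time.time()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, ttl = self._fetch()
            self._expires_at = now + ttl
        return self._token
```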

Supersedes: PR #370 (media + image context bundle), PR #369 (reconnect-only, now included here)

Test plan

  • Configure a DingTalk bot in websocket mode and verify stream connects
  • Send text message → verify normal reply
  • Send image → verify download, base64 encoding, and LLM vision response
  • Send file/audio/video → verify download and save to workspace/uploads
  • Kill network briefly → verify auto-reconnect with backoff
  • Verify thinking reaction appears while LLM is processing

🤖 Generated with Claude Code

39499740 and others added 30 commits April 8, 2026 10:16
…num (dataelement#316)

- relationships.py: fix tool name from send_agent_message to send_message_to_agent
  in relationships.md generation (LLM was guided to call non-existent tool)
- gateway.py: fix tool name in OpenClaw A2A prompt to prevent potential
  infinite recursion when LLM tries the similarly-named real tool
- tool_seeder.py: unify msg_type enum to match agent_tools.py definition
  [notify, consult, task_delegate] instead of [chat, task_request, info_share]

Closes dataelement#315
…t chevron (dataelement#321)

* feat(ui): polish chat sidebar — segment control, session items, select chevron

- Replace underline tabs with segment control for My Sessions / Other Users toggle
  - 28px height, accent-colored active state (primary button style in dark mode, white in light mode)
  - Uses design tokens (--segment-active-bg/--segment-active-text) that follow admin theme config
- Rework session item delete button
  - Inline flow layout instead of absolute positioning; title text shrinks gracefully on hover
  - Trash icon (14px) with larger 24px hit area; message count hides on hover
  - Hover transitions at 0.35s ease-in-out
- Global select element restyling
  - Hide native dropdown arrow (appearance: none), add Tabler chevron-down icon via CSS variable
  - Icon colored to --text-secondary per theme; proper right padding prevents text overlap
  - Fix inline style using background shorthand that was overriding background-image

Made-with: Cursor

* fix(ui): only show unpin icon when hovering the pin button itself

Previously, hovering anywhere on a pinned agent sidebar item would
swap the pin icon to unpin and turn it red. Now the pin icon stays
unchanged on item hover; the unpin icon and red color only appear
when the mouse is directly over the pin button.

Made-with: Cursor

* fix(ui): unify new-session button style and fix icon/text vertical alignment

Extract shared .new-session-btn CSS class so both the regular user and
admin new-session buttons render identically. Move hover styles from
inline JS to CSS. Add display:block on SVG icon and line-height:1 on
the button to fix vertical centering.

Made-with: Cursor
DingTalk enforces ~20 QPS per app per API interface. The
DingTalkOrgSyncAdapter was issuing requests in tight loops without
any delay, causing QPS throttle errors (subcode=90018) during org
sync for organizations with many departments.

- Add 60ms delay between requests in fetch_departments() and
  fetch_users() to stay safely under the 20 QPS limit
- Move asyncio import to module level (was locally imported in
  FeishuOrgSyncAdapter)
- Consistent with FeishuOrgSyncAdapter which already uses
  Semaphore(15) for rate limiting

Fixes dataelement#373
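The 60ms pacing described above can be sketched as follows (fetch_page is a hypothetical stand-in for the real department/user API calls in DingTalkOrgSyncAdapter):

```python
import asyncio

REQUEST_DELAY = 0.06  # 60ms between requests keeps us well under ~20 QPS

async def fetch_all_pages(fetch_page, page_count: int) -> list:
    """Call fetch_page(i) for each page, sleeping between requests."""
    results = []
    for i in range(page_count):
        results.append(await fetch_page(i))
        await asyncio.sleep(REQUEST_DELAY)  # simple inter-request throttle
    return results
```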
feat(ui): implement drag-and-drop file upload across application
Add Exa (exa.ai) as a search provider in two ways:
- Standalone exa_search tool with full feature support (category
  filtering, domain filtering, content modes: text/highlights/summary)
- New engine option in the existing web_search tool for simple use

Files changed:
- backend/app/services/tool_seeder.py: Exa search tool definition + web_search engine option
- backend/app/services/agent_tools.py: _search_exa (simple) and _exa_search (full) functions
- backend/app/config.py: EXA_API_KEY setting
- .env.example: EXA_API_KEY documentation
feat: add Exa AI-powered search tool
Split web_search's multi-engine selector into independent tools:
- duckduckgo_search  (free, no API key)
- tavily_search      (Tavily API key)
- google_search      (Google Custom Search, API_KEY:CX_ID)
- bing_search        (Azure Bing API key)

Each tool has its own config_schema (API key, language, etc.)
and delegates to the existing private _search_* implementations.
web_search is kept for backward compatibility.
exa_search and jina_search were already standalone.
This reverts commit 41e343e, reversing
changes made to f673c61.
The INSERT path for builtin tools was missing parameters_schema,
causing newly-seeded tools (e.g. duckduckgo_search, tavily_search)
to have null schema in DB. The LLM then saw no parameters and
would call the tool without arguments, getting 'Please provide
search keywords' every time.

Also: deprecate web_search (is_default=False) in favor of the new
standalone search engine tools.
fix: add rate limiting to DingTalk org sync API calls
Instead of re-reading images from disk on every LLM call (rehydration),
store the [image_data:data:image/...;base64,...] marker directly in the
chat_messages.content column when the user sends an image.

This makes message history self-contained: subsequent conversation turns
already have the image data available without any disk dependency.
The existing [image_data:] stripping logic in call_llm() handles
non-vision models (no double work needed).

Also keeps display_content for session titles so base64 is never
exposed in the UI.
Saved messages now contain [image_data:data:image/...;base64,xxx]
markers for multi-turn vision context. The parseMessage function
was not stripping them, causing:
  1. Raw base64 rendered as visible text below the image thumbnail
  2. Scroll broken by the massive inline string

Fix: in parseMessage(), after stripping the [file:] prefix,
detect and remove any [image_data:...] markers from the displayed
content, and extract the data URL into imageUrl so the image
thumbnail still appears in history.
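The marker handling amounts to a regex extract-and-strip pass. A Python sketch of the same logic (the actual fix lives in the frontend's parseMessage, in TypeScript; names here are illustrative):

```python
import re

# Matches markers like [image_data:data:image/png;base64,....]
_IMAGE_MARKER = re.compile(r"\[image_data:(data:image/[^;]+;base64,[^\]]+)\]")

def split_image_markers(content: str) -> tuple:
    """Return (display_text, image_data_urls): extract every marker's
    data URL for thumbnail rendering and strip it from the shown text."""
    urls = _IMAGE_MARKER.findall(content)
    text = _IMAGE_MARKER.sub("", content).strip()
    return text, urls
```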
Root cause: .main-content uses min-height: 100vh (grows with content),
so .chat-messages overflow-y: auto never triggers.

Fix:
- Add .main-content.chat-page with height: 100vh + overflow: hidden
- Detect chat route in Layout.tsx (useMatch) to apply the modifier
- Chat root div now fills 100% height (flex column)
- Restore padding via scoped CSS rules for page-header and chat-container
Root cause: the live WebSocket chat wrapper div had flex:1 but no
minHeight:0. In CSS flexbox, without minHeight:0 a child can grow to
its content height (1373px) ignoring the parent's constrained flex
boundary (457px = 100vh - 206px). This prevented overflow-y:auto on
the inner chatContainerRef from ever triggering a scroll.

Fix: add minHeight:0 and overflow:hidden to the wrapper div so the
flex chain correctly constrains the scrollable message area.
…precated /chat route

Changes:
1. AgentDetail ChatMessageItem: strip [image_data:data:url] markers from
   displayed text, extract image data URLs, and render them as thumbnails.
   Applies to both live WebSocket messages and history loaded from DB.

2. App.tsx: remove deprecated 'agents/:id/chat' route and Chat import.
   The chat interface lives in AgentDetail as a tab (#chat), not as a
   standalone route. Keeping this route caused repeated confusion where
   fixes were applied to the wrong component.
…/task_delegate)

Implement async A2A communication patterns as proposed in upstream dataelement#310.

- notify: fire-and-forget, saves message and wakes target asynchronously,
  returns immediately
- task_delegate: async with callback, creates focus item + on_message
  trigger on source agent, wakes target asynchronously, source agent is
  notified when target completes
- consult: synchronous request-response (unchanged from original behaviour)

Add helper functions:
- _resolve_a2a_target: shared agent lookup logic
- _ensure_a2a_session: shared session creation logic
- _create_on_message_trigger: programmatic trigger creation
- _append_focus_item: write focus items to agent workspace
- _wake_agent_async: wake target agent via trigger invocation path
- trigger_daemon.wake_agent_with_context: public API for waking agents

Update tool descriptions in agent_tools.py and tool_seeder.py to
document the new msg_type behaviours.

Refs: dataelement#310
11 tests covering:
- notify: returns immediately, wakes target async
- task_delegate: creates focus item + on_message trigger, wakes target async
- consult: calls LLM synchronously, returns reply
- default msg_type falls back to notify
- error cases: missing agent_name, no relationship
- helper functions: _append_focus_item, _create_on_message_trigger,
  _wake_agent_async
- OpenClaw targets still use gateway queue regardless of msg_type
Two issues found during integration testing:

1. Wake storm: wake_agent_with_context had no chain depth or dedup
   protection, causing A→B→A→B infinite loops. Now tracks per-pair
   chain depth (_A2A_WAKE_CHAIN) with max 3 hops, and respects the
   existing 30s DEDUP_WINDOW before waking an agent.

2. Notify silent failure: if log_activity or _wake_agent_async threw
   an exception, the entire tool call failed silently — the LLM never
   received a tool result and produced no user-visible reply. Both
   notify and task_delegate paths now wrap auxiliary operations in
   try/except so the return string always reaches the LLM.

- wake_agent_with_context accepts from_agent_id for chain tracking
- _wake_agent_async passes from_agent_id through
- All auxiliary calls (log_activity, focus, trigger, wake) are
  individually wrapped with error handling
a2a_wake is an internal A2A mechanism. Its Reflection Session output
should not be pushed to the user's active chat session or WebSocket.
Only user-facing triggers (on_message from users, webhooks, cron, etc.)
should produce visible notifications.
A2A on_message triggers created by task_delegate now have:
- max_fires=1: auto-disable after receiving one reply
- expires_at: 24h TTL as safety net

Without max_fires, on_message triggers monitoring another agent's
assistant messages would fire indefinitely in a loop:
A wakes → replies (assistant msg) → B's trigger fires → B replies
→ A's trigger fires → ... ad infinitum.

This was an existing system issue exposed by the async A2A feature.
The two storm triggers (wait_manager_meeting_followup and
wait_xiaozhi_meeting_sync_r2) were manually disabled in the DB.
39499740 and others added 23 commits April 13, 2026 00:07
The DEDUP_WINDOW (30s) was preventing legitimate message deliveries
from waking the target agent. This caused task_delegate callbacks to
be silently dropped:

1. A delegates task to B via task_delegate
2. B completes and sends notify reply to A via send_message_to_agent
3. A was recently woken by the trigger daemon → DEDUP skips A's wake
4. A never receives B's reply → user sees no result

Fix: All _wake_agent_async calls from send_message_to_agent now pass
skip_dedup=True. Chain depth protection (max 3 hops) is still active
to prevent storms, but individual message deliveries are never dropped.

wake_agent_with_context gains a skip_dedup parameter that bypasses
the _last_invoke dedup check while keeping chain depth protection.
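How chain-depth protection and skip_dedup interact can be sketched as follows (hypothetical module-level state and function name; the real logic lives in trigger_daemon.py):

```python
import time

MAX_CHAIN_DEPTH = 3    # max hops per (from, to) pair before a wake is refused
DEDUP_WINDOW = 30.0    # seconds, matching the existing dedup window

_wake_chain = {}    # (from_id, to_id) -> hop count
_last_invoke = {}   # agent_id -> last wake timestamp

def may_wake(from_id: str, to_id: str, skip_dedup: bool = False) -> bool:
    """Allow a wake unless the chain is too deep, or the target was woken
    within the dedup window (skip_dedup bypasses only the latter)."""
    pair = (from_id, to_id)
    if _wake_chain.get(pair, 0) >= MAX_CHAIN_DEPTH:
        return False                       # storm protection always applies
    if not skip_dedup and time.time() - _last_invoke.get(to_id, 0.0) < DEDUP_WINDOW:
        return False                       # drop duplicate wakes
    _wake_chain[pair] = _wake_chain.get(pair, 0) + 1
    _last_invoke[to_id] = time.time()
    return True
```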
…tion

Problem: When users said 'summarize tasks assigned to Manager' without
specifying msg_type, the LLM defaulted to notify (fire-and-forget),
so the user never got the results back.

Changes:
- msg_type is now a required parameter — LLM must choose explicitly
  every time it calls send_message_to_agent
- Tool description now includes concrete examples for each type:
  notify = one-way announcement, consult = quick question,
  task_delegate = delegate work and get results back
- Explicit guidance: 'When the user asks another agent to perform a
  task, use task_delegate, NOT notify'
- Updated tool_seeder.py to match

This ensures the LLM will auto-select task_delegate when the user
asks another agent to do work, without the user needing to specify
the msg_type manually.
The key insight: the LLM should decide msg_type based on ONE question:
'Does the target need to DO WORK and return results?'

- If yes → task_delegate (most common for user requests)
- If just FYI → notify
- If quick factual question → consult
- When in doubt → prefer task_delegate (safer, guarantees result)

Added concrete verb examples (analyze, research, summarize, write,
compare, plan, review, find out, confirm) so the LLM can match
ambiguous user phrases like 'check with X', 'look into Y', 'get back
to me on Z' to task_delegate without explicit keywords.
Before: ⚡ Trigger fired: wait_manager_api_key_reply
After:  ⚡ Watching for 经理's reply about the API Key

The trigger name is an internal identifier, meaningless to users.
Now uses the trigger's reason field (human-readable description)
as the notification headline, with a 80-char truncation limit.
Before: ⚡ 经理 is expected to reply after completing a delegated task...
After:  ⚡ Waiting for 经理 to finish the task and reply

The trigger reason is an internal instruction for the agent (in English),
not suitable for user-facing notifications. Added a notification_summary
field in trigger config (_notification_summary) that stores a concise,
user-friendly headline. The notification builder checks this field first
before falling back to the reason field.
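The fallback order for the headline can be sketched as follows (field names from the commits above; the function name is illustrative):

```python
MAX_HEADLINE = 80  # truncation limit for notification headlines

def notification_headline(trigger_config: dict, reason: str, name: str) -> str:
    """Prefer the user-friendly _notification_summary, then the
    human-readable reason, then the internal trigger name."""
    headline = trigger_config.get("_notification_summary") or reason or name
    return headline[:MAX_HEADLINE]
```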
The a2a_wait trigger's Reflection Session output was being pushed
directly to the user's chat, but the agent didn't know its output
was user-facing. So it wrote internal monologue (trigger management,
focus state, reasoning process) that confused the user.

Updated the trigger reason to explicitly state:
- 'Your reply will be shown to the user'
- 'Do NOT mention triggers, focus items, internal state, or reasoning'
- 'Just give the user the actionable outcome'

Before:
  ⚡ Waiting for 经理 to finish the task and reply
  Processing complete. 经理's summary confirms: trigger a2await经理 has been cancelled...
  Focus item wait经理task is now marked [x] complete...

After (expected):
  ⚡ Waiting for 经理 to finish the task and reply
  经理 has finished the task summary; the results are: ...
Prompt improvement + regex post-processing to ensure user-facing
notifications read naturally.

1. Prompt: Explicitly list banned terms (trigger name, focus item,
   a2a_wait, task_delegate, etc.) and tell agent to write as if
   talking to a colleague.

2. Post-processing: Regex filter strips any remaining internal
   identifiers from the notification text before pushing to user.
   Patterns cover: a2a_wait_*, wait_*_task, focus_item, trigger
   status lines, bullet points with internal terms, etc.

Before:
  ✅ Trigger a2await经理 cancelled
  ✅ Focus item wait经理task marked as complete
  📋 Verification result: no new tasks

After (expected):
  经理 confirms: no new tasks or changes; the only outstanding item is still the API Key issue.
Added patterns to catch remaining leaks:
- resolve_* identifiers (e.g. resolve_smithery_api_key)
- "已静默清理触发器" / "已静默处理完毕" ("silently cleaned up trigger" / "silently finished processing")
- "继续待命" / "待命" ("remaining on standby" / "standby")
- Trailing punctuation cleanup

Expected before: 已静默清理触发器。唯一活跃待办仍为resolve_smithery_api_key。继续待命。 ("Silently cleaned up trigger. The only active to-do is still resolve_smithery_api_key. Remaining on standby.")
Expected after:  唯一活跃待办仍为API Key问题。 ("The only active to-do is still the API Key issue.")
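A sketch of the scrub pass (only a hypothetical subset of the patterns; the real filter covers many more, including the Chinese phrases listed above):

```python
import re

_INTERNAL_PATTERNS = [
    re.compile(r"a2a_wait_\w+"),
    re.compile(r"wait_\w+_task"),
    re.compile(r"resolve_\w+"),
    re.compile(r"focus[ _]item\S*", re.IGNORECASE),
]

def scrub_internal_terms(text: str) -> str:
    """Strip internal identifiers from a user-facing notification,
    then collapse the whitespace they leave behind."""
    for pat in _INTERNAL_PATTERNS:
        text = pat.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```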
a2a_wake (notify) triggers don't need the full 50-round tool loop.
Limiting to 2 rounds saves significant tokens since the agent only
needs to: read the message → maybe update memory or take one action.

Token savings per notify:
- Before: up to 50 rounds of tool calls (each round = full context)
- After: max 2 rounds (read message + one action)

Also added max_tool_rounds_override parameter to call_llm() so
callers can cap the tool loop without modifying the agent's DB config.
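The override can be sketched as follows (hypothetical helper; in the PR this is a parameter on call_llm()):

```python
def resolve_tool_rounds(agent_max_rounds: int, override=None) -> int:
    """Cap the tool loop per-call without touching the agent's DB config.
    a2a_wake passes override=2; normal calls leave it as None."""
    if override is None:
        return agent_max_rounds
    return min(override, agent_max_rounds)
```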
New field on Agent model: a2a_async_enabled (boolean, default=False)

- When False (default): send_message_to_agent silently converts
  notify and task_delegate to consult. Behavior identical to before
  this PR — all A2A communication is synchronous.
- When True: full async A2A features (notify, task_delegate, consult)

This allows safe rollout:
1. Deploy the code — all agents work exactly as before (flag off)
2. Enable per-agent via API: PATCH /agents/{id} {"a2a_async_enabled": true}
3. If issues arise, flip the flag back to false — instant rollback

Changes:
- Agent model: new column a2a_async_enabled
- AgentUpdate schema: accepts a2a_async_enabled in PATCH
- AgentOut schema: returns a2a_async_enabled in GET
- _send_message_to_agent: checks flag before branching
- Alembic migration: add_a2a_async_enabled
- Tests: feature flag off/on scenarios
Reasons for company-level instead of agent-level:
1. A2A communication involves two agents — if Alice has it on but Bob
   doesn't, which mode should be used? Company-level = consistent behavior
2. Simpler admin UX: one toggle for the whole company
3. Matches existing pattern (min_heartbeat_interval_minutes is also
   tenant-level)

Changes:
- Removed a2a_async_enabled from Agent model and schemas
- Added a2a_async_enabled to Tenant model (default False)
- _send_message_to_agent now queries tenant.a2a_async_enabled
- Updated all 13 tests with tenant mock
- Migration: drop agents.a2a_async_enabled, add tenants.a2a_async_enabled
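The downgrade behind the tenant-level flag can be sketched as (function name illustrative):

```python
ASYNC_TYPES = {"notify", "task_delegate"}

def effective_msg_type(msg_type: str, a2a_async_enabled: bool) -> str:
    """With the tenant flag off, silently downgrade async modes to the
    synchronous consult, restoring pre-PR behavior."""
    if not a2a_async_enabled and msg_type in ASYNC_TYPES:
        return "consult"
    return msg_type
```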
Added a toggle switch in EnterpriseSettings > Company Info tab:

- Title: 'Agent-to-Agent Async Communication' with BETA badge
- Description: explains the three modes and that disabling restores
  the previous synchronous behavior
- Toggle: on/off with confirmation dialog when enabling
- Confirmation dialog lists known issues:
  • Agent replies may contain internal terms
  • task_delegate callbacks may be delayed
  • Token consumption will increase
  • Agent loops may occur
  • Instructions to disable if issues arise

Backend changes:
- TenantOut schema: added a2a_async_enabled field
- TenantUpdate schema: added a2a_async_enabled field
- PUT /tenants/{id} now accepts a2a_async_enabled

Frontend changes:
- Toggle switch in Company Management section
- Reads from currentTenant.a2a_async_enabled
- Updates via PUT /tenants/{id}
- Confirmation dialog on enable (window.confirm)
Added enterprise.a2aAsync translations to both language files:

en.json:
- title: Agent-to-Agent Async Communication
- description: explains three modes and fallback behavior
- enableWarning: confirmation dialog with known issues

zh.json:
- title: 数字员工间异步通信
- description: Chinese description of the three modes
- enableWarning: Chinese confirmation dialog, including the known-issues list

Fixed duplicate keys in zh.json enterprise.tabs section.
Alembic requires explicit 'revision' and 'down_revision' variables
in each migration file. The previous version only had them in the
docstring, which alembic couldn't parse.

Also fixed: DB had stale alembic_version pointing to a deleted
migration (add_llm_concurrency_group). Updated to point to the
actual latest revision (d9cbd43b62e5).
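A minimal migration skeleton showing the required module-level variables (ids are placeholders except d9cbd43b62e5, which the commit mentions; down_revision here is hypothetical):

```python
"""add a2a_async_enabled to tenants"""
from alembic import op
import sqlalchemy as sa

# Alembic parses these module-level variables; ids in the docstring alone
# are not enough.
revision = "d9cbd43b62e5"
down_revision = "0123abcd4567"  # hypothetical previous revision

def upgrade():
    op.add_column(
        "tenants",
        sa.Column("a2a_async_enabled", sa.Boolean(),
                  nullable=False, server_default=sa.false()),
    )

def downgrade():
    op.drop_column("tenants", "a2a_async_enabled")
```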
1. restart.sh: Added automatic 'alembic upgrade head' before backend
   start. New DB columns will now be applied on every restart, no more
   manual ALTER TABLE needed.

2. SQL wildcard injection fix (MEDIUM):
   - trigger_daemon.py: Sanitize from_agent_name and from_user_name
     before ilike interpolation (same pattern as agent_tools.py)

3. Deprecation fix (LOW):
   - asyncio.get_event_loop() → asyncio.get_running_loop()

4. Memory leak fix (LOW):
   - Added _cleanup_stale_invoke_cache() to periodically evict
     stale entries from _last_invoke dict (runs every ~60s)

5. Regex scope restriction (conflict prevention):
   - Internal term regex filter now ONLY applies to a2a_wait_*
     triggers, not to all trigger notifications. Prevents false
     positives on user on_message, heartbeat, cron, etc.

6. Conflicts analysis (all clear):
   - OpenClaw path: early return before msg_type branching ✅
   - max_tool_rounds_override: defaults None, only for a2a_wake ✅
   - msg_type required: code defaults to 'notify', flag forces 'consult' ✅
   - _notification_summary: safe .get() read, transparent to existing code ✅
   - Tenant DB query: PK lookup <1ms, acceptable at current scale ✅
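The wildcard sanitization in item 2 can be sketched as (helper name illustrative; the real code mirrors the existing pattern in agent_tools.py):

```python
def escape_like(term: str) -> str:
    """Escape LIKE/ILIKE wildcards in user-supplied text so a
    from_agent_name of '%' cannot match every row."""
    return (term.replace("\\", "\\\\")
                .replace("%", r"\%")
                .replace("_", r"\_"))
```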
When a2a_async_enabled is False, the send_message_to_agent tool
schema is dynamically simplified to remove the msg_type parameter.
This prevents the LLM from selecting notify/task_delegate modes
(which get silently overridden to consult) and confusing users
who see the raw tool call arguments in the chat UI.
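The schema simplification can be sketched as follows (assumed JSON-schema layout; function name illustrative):

```python
import copy

def simplify_tool_schema(schema: dict, a2a_async_enabled: bool) -> dict:
    """With the flag off, drop msg_type from the tool's parameter schema
    so the LLM cannot pick modes that would be overridden to consult."""
    if a2a_async_enabled:
        return schema
    simplified = copy.deepcopy(schema)
    simplified.get("properties", {}).pop("msg_type", None)
    required = simplified.get("required")
    if required and "msg_type" in required:
        required.remove("msg_type")
    return simplified
```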
Agent-to-agent sessions store the creator's user_id, causing them to be
filtered out from the Other Users admin view. Exempt source_channel=agent
sessions from the user_id filter so they always appear.
- DingTalk org sync rate limiting
- Multimodal image context persistence + chat display
- A2A async communication (notify / task_delegate / consult)
- Feature flag (tenant-level, default OFF)
- Original A2A code by haoyi (39499740)
- Add dingtalk_token.py: global access_token cache with auto-refresh
- Add dingtalk_reaction.py: thinking indicator (reaction) during LLM processing
- Enhance dingtalk_stream.py: media download pipeline, auto-reconnect with
  exponential backoff, support for picture/richText/audio/video/file messages
- Update dingtalk.py: accept image_base64_list and saved_file_paths in message
  processing, forward media to LLM with vision support
- Update dingtalk_service.py: add download_dingtalk_media convenience wrapper

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wisdomqin wisdomqin changed the base branch from main to enhance April 13, 2026 14:49
@wisdomqin wisdomqin merged commit ff408fb into dataelement:enhance Apr 13, 2026
@wisdomqin
Contributor

🎉 Merged! Thanks @nap-liu for the solid contribution — the DingTalk media support, token cache, thinking reaction, and auto-reconnect are all well-implemented. We're staging this on our dev environment for verification before it goes to main. Appreciate you following our feedback and splitting this out cleanly from #370.
