fix: orpheus TTS default + auto-port-fallback (1.0.0-alpha.2) #72
Merged
… tools
- SkillLoader: track which provider created vectors, use SAME provider for matching
- generateEmbedding: accepts forceProvider param to match vector provider
- TF-IDF vectors use 0.05 threshold, OpenAI uses 0.25 - no more mixing
- CORE_TOOLS: added sendFile + replyWithFile (main agent can send files to users)
…re, file sharing instruction in SOUL.md
…Call, reload, transcribeAudio
…lio Console setup
…gent sees what each can do (~334 tokens)
…wnAgent description is the source
…m Settings Tool Config
…#30)

## Summary

### Core Tools & Agent Architecture
- Dynamic profile list with unique tools per profile (~334 tokens) - main agent sees what each profile can do
- Orphan tools fixed: generateImage→designer, gitTool→coder/devops, makeVoiceCall→meeting-attendant, reload→sysadmin, transcribeAudio→assistant/researcher
- Removed duplicate tools: replyWithFile (duplicate of sendFile), searchFiles/searchContent (duplicates of glob/grep)
- sendFile added to CORE_TOOLS - main agent can send files directly
- SOUL.md: "use sendFile to send actual file, not content as text"
- Sub-agent rules: "Save output to files - main agent handles sending"
- Removed hardcoded profile list from SOUL.md - dynamic list in spawnAgent is the source
- MCP agent: removed teamTask (sub-agents don't orchestrate)
- Cleaned all replyWithFile references from permissions, TaskRunner, channels

### Cron Agent
- listPresets action - agent discovers delivery presets dynamically
- Cron agent context: 5 instruction lines (autonomy, retry, presets, history, delivery)
- Previous run status injected
- SOUL.md: scheduling section with preset discovery flow

### Embedding Fix
- Skill vectors store which provider created them
- Matching uses SAME provider as creation (TF-IDF vectors → TF-IDF query → 0.05 threshold)
- Prevents mismatch when vault loads OpenAI key after embeddings init

### Bug Fixes
- Skill paths use full path (no ~ tilde) - prevents readFile ENOENT
- Removed plugin tools from Settings → Tool Config (configure via Plugins page only)
- WhatsApp auto-configure webhook URL from tunnel

### Setup Wizard
- Removed plugin-specific key prompts - points to Settings → Plugins

### Version
- Bumped to 2026.1.0-beta.15

## Test plan
- [ ] Main agent spawnAgent shows profile list with tools in description
- [ ] Sub-agents don't have orchestration tools
- [ ] sendFile works from main agent on Telegram/WhatsApp
- [ ] Skill paths resolve correctly (no ~ error)
- [ ] TF-IDF embedding threshold works (0.05 not 0.25)
- [ ] Cron jobs show instruction context in logs
- [ ] Settings page shows only ElevenLabs in Tool Config (not plugin tools)
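The embedding fix above comes down to one rule: score a query against a skill vector only after embedding the query with the provider that produced that vector, then apply that provider's threshold. A minimal sketch of the rule follows. The `SkillVector` shape, the `matchSkills` helper, and the `embedQuery` callback (standing in for `generateEmbedding` with `forceProvider`) are illustrative assumptions, not the project's actual code; only the 0.05 and 0.25 thresholds come from the description.

```typescript
// Hypothetical shapes: the real SkillLoader types are not shown in this PR.
type Provider = "tfidf" | "openai";

interface SkillVector {
  name: string;
  provider: Provider; // which provider created this vector
  values: number[];
}

// Thresholds quoted in the PR description.
const THRESHOLDS: Record<Provider, number> = { tfidf: 0.05, openai: 0.25 };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// embedQuery is an assumed stand-in for generateEmbedding(text, forceProvider).
function matchSkills(
  skills: SkillVector[],
  query: string,
  embedQuery: (text: string, provider: Provider) => number[],
): string[] {
  const queryByProvider = new Map<Provider, number[]>();
  const matched: string[] = [];
  for (const skill of skills) {
    // Embed the query once per provider, using the provider that made the vector.
    let q = queryByProvider.get(skill.provider);
    if (!q) {
      q = embedQuery(query, skill.provider);
      queryByProvider.set(skill.provider, q);
    }
    if (cosine(q, skill.values) >= THRESHOLDS[skill.provider]) {
      matched.push(skill.name);
    }
  }
  return matched;
}
```

With this shape, a similarity of about 0.1 passes for a TF-IDF vector but not for an OpenAI vector, which is the mixing problem the fix describes.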
…odel (orpheus) STT: auto-detects Groq (free) → OpenAI. whisper-large-v3-turbo default on Groq. TTS: fixed from decommissioned playai-tts-arabic to canopylabs/orpheus-v1-english. Both configurable via provider/model params or env vars (STT_MODEL, TTS_GROQ_MODEL).
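The resolution order described in this commit (explicit params, then env vars, then provider defaults) can be sketched as below. `STT_MODEL`, `TTS_GROQ_MODEL`, `whisper-large-v3-turbo`, and `canopylabs/orpheus-v1-english` are taken from the commit message. The key variable names `GROQ_API_KEY` and `OPENAI_API_KEY`, the OpenAI default `whisper-1`, and both function names are assumptions made for illustration.

```typescript
type Env = Record<string, string | undefined>;
type SttProvider = "groq" | "openai";

interface SttChoice {
  provider: SttProvider;
  model: string;
}

// Assumed key names: GROQ_API_KEY / OPENAI_API_KEY are not named in the commit.
function resolveStt(
  env: Env,
  opts: { provider?: SttProvider; model?: string } = {},
): SttChoice | null {
  const detected: SttProvider | null = env["GROQ_API_KEY"]
    ? "groq" // Groq is tried first, as in the commit message
    : env["OPENAI_API_KEY"]
      ? "openai"
      : null;
  const provider = opts.provider ?? detected;
  if (!provider) return null;
  // "whisper-1" as the OpenAI default is an assumption for this sketch.
  const fallback = provider === "groq" ? "whisper-large-v3-turbo" : "whisper-1";
  return { provider, model: opts.model ?? env["STT_MODEL"] ?? fallback };
}

function resolveGroqTtsModel(env: Env, model?: string): string {
  return model ?? env["TTS_GROQ_MODEL"] ?? "canopylabs/orpheus-v1-english";
}
```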
… are not commands rule
…rors, generic message for tool failures
…instead of crashing to user
…, cost, all preserved on retry
… linked automatically First global channel message creates the admin tenant + marks it globalAdmin. Subsequent global channel messages reuse the same tenant. All channel identities auto-linked to the admin tenant. Cross-channel sending works: Telegram admin can send files to Discord.
- cronTool: don't auto-set "announce" delivery when deliveryPreset is provided - preset takes priority. Display actual job.delivery.mode instead of local variable.
- CronExecutor: task instructions explicitly forbid manual delivery (sendFile/sendMessage) - delivery is automatic post-completion.
- CronExecutor: _freshMeta resolves instanceKey for per-tenant channel instances so delivery routes through tenant's own bot, not the (possibly non-running) global channel.
- Pencil icon on each job card opens edit dialog pre-filled with current values
- Edit dialog supports all fields: name, schedule, task input, model, retries, timeout, delivery
- Delivery mode editable: none, preset, multi-target, webhook
- Schedule type changeable between cron/every/at
Agent needs to use sendFile/sendMessage for task execution. Automatic delivery is supplementary, not a replacement.
Agent now sees exactly which channels are available for delivery:
- Preset: channel names from preset targets
- Multi-target: unique channel list
- Announce: single channel name
imageAnalysis on different frames, generateImage with different prompts, etc. are distinct operations — not loops.
…cles
- imageAnalysis: use resolveDefaultModel() instead of hardcoded preference list
- No more fallback to gpt-5-mini - always uses the configured default model
- LoopDetector: skip ping-pong detection when both tools are value-sensitive with different params (e.g. imageAnalysis→executeCommand verify-fix cycle)
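One way to read the ping-pong exemption is sketched below: an A→B→A→B pattern counts as a loop unless both tools are value-sensitive and at least one of them was called with different params the second time. The set membership, the four-call window, and the `isPingPong` name are assumptions for illustration; the actual LoopDetector may use different windows and comparisons.

```typescript
interface ToolCall {
  tool: string;
  params: unknown;
}

// Illustrative membership only: the PR names these tools in its examples,
// but the full contents of VALUE_SENSITIVE_TOOLS are not shown.
const VALUE_SENSITIVE_TOOLS = new Set(["imageAnalysis", "executeCommand", "readFile"]);

const callKey = (c: ToolCall): string => `${c.tool}:${JSON.stringify(c.params)}`;

function isPingPong(history: ToolCall[]): boolean {
  const last4 = history.slice(-4);
  if (last4.length < 4) return false;
  const [a1, b1, a2, b2] = last4 as [ToolCall, ToolCall, ToolCall, ToolCall];

  // Tools must alternate A, B, A, B.
  const alternates = a1.tool === a2.tool && b1.tool === b2.tool && a1.tool !== b1.tool;
  if (!alternates) return false;

  // Exemption: both tools value-sensitive and the params actually changed,
  // as in a verify-fix cycle over different frames or commands.
  const bothSensitive = VALUE_SENSITIVE_TOOLS.has(a1.tool) && VALUE_SENSITIVE_TOOLS.has(b1.tool);
  const paramsChanged = callKey(a1) !== callKey(a2) || callKey(b1) !== callKey(b2);
  return !(bothSensitive && paramsChanged);
}
```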
…er, loop detection, skills, crew
## Summary
- **Media Studio crew** - merged video-editor + media-creator into one crew. Generates AI images/videos/music, edits existing videos via Remotion (38 rule files), adds captions, transitions, effects, titles.
- **Provider failover** - classifies errors (transient vs permanent), exponential backoff retry, automatic provider cooldown and fallback.
- **Loop detection** - 4 strategies (exact repeat, ping-pong, semantic repeat, polling) with VALUE_SENSITIVE_TOOLS for legitimate multi-call workflows (imageAnalysis on different frames, readFile on different paths).
- **Live status** - typing indicators on Discord/Telegram, status reactions tracking task progress.
- **Prompt cache stability** - deterministic sorting of tools/skills for Anthropic cache hits.
- **Crew routing** - system prompt now shows crew capabilities (plugin tools) so main agent picks the right crew.
- **60 skills** - added github, discord-ops, slack-ops, remotion, pdf-editing, google-workspace, macos-automation.
- **New tools** - generateVideo, generateMusic, imageOps, createPoll, textToSpeech, transcribeAudio.
- **Channel enhancements** - sendTyping, editMessage, deleteMessage, sendThreadReply, sendPoll, sendEmbed on Discord/Telegram/Slack.
- **Discord catch-up** - fetches missed DMs via REST API on restart, persists last-seen timestamp.
- **Media persistence** - files saved to data/media/ instead of /tmp/.
- **imageAnalysis** - uses configured default model instead of hardcoded fallback chain.
- **Chat UI** - single-session, renders images/video/audio inline.
- **Version** - 2026.1.2-beta.0

## Test plan
- [ ] Send "create an image of X" on Discord → should route to video-editor crew, generate image, send back
- [ ] Send video editing prompt → crew reads Remotion skill + rules, creates project, renders MP4
- [ ] Trigger 429 from provider → verify retry with backoff, then fallback
- [ ] imageAnalysis on multiple frames → verify no LoopDetector blocking
- [ ] Restart server → verify Discord catch-up fetches missed messages
- [ ] Check typing indicator visible on Discord during task processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
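The provider failover item above (classify, retry with exponential backoff, then fall back) can be sketched as follows. This is a synchronous model with an injected `wait` callback so the behaviour is easy to follow; real provider calls would be async. The `ProviderError` class, the status-code rule (429 and 5xx treated as transient), and the backoff constants are assumptions, and provider cooldown is left out.

```typescript
// Hypothetical error type carrying an HTTP-like status code.
class ProviderError extends Error {
  status: number;
  constructor(status: number) {
    super(`provider error ${status}`);
    this.status = status;
  }
}

// Assumed classification: rate limits and server errors are worth retrying.
function isTransient(err: unknown): boolean {
  return err instanceof ProviderError && (err.status === 429 || err.status >= 500);
}

// Exponential backoff: base * 2^attempt, capped.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

function callWithFailover<T>(
  providers: string[],
  call: (provider: string) => T,
  wait: (ms: number) => void,
  maxRetries = 2,
): T {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return call(provider);
      } catch (err) {
        lastError = err;
        // Permanent errors and exhausted retries move on to the next provider.
        if (!isTransient(err) || attempt === maxRetries) break;
        wait(backoffMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A repeated 429 from the first provider therefore produces waits of 500 ms and 1000 ms before the second provider is tried.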
…on Studio
- Import 53 OpenClaw skills as subdirectories with SKILL.md (word-for-word)
- Remove OpenClaw-specific skills (clawhub, canvas, taskflow, healthcheck, etc.)
- Remove duplicates and CLI-bound skills we don't ship
- Rebrand all OpenClaw references to Daemora in metadata and content
- SkillLoader: fix subdirectory skill name parsing (foo/SKILL.md → "foo")
- Rename video-editor crew to "Motion Studio" with creation-focused description
- Update remotion skill to clean format with proper rule paths
UI:
- Add "Clear" button to Chat page top-right corner
- Calls DELETE /api/sessions/main, recreates session, clears messages
- Disabled when loading or no messages
- Confirms before deletion
- Rebuild dist

Security (issue #46 - command injection findings):
- clipboard.js: replace shell echo|pipe with spawnSync stdin (no shell)
- iMessageChannel.js: replace osascript -e shell escape with stdin via spawnSync
- screenCapture.js: parseInt region coords + duration to prevent injection
- cli.js: validate daemon logs --lines arg as integer

All changes use spawnSync with shell:false where untrusted input flows into subprocess args. No shell parsing means no escape-based injection.
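For the numeric arguments (region coords, duration, `--lines`), the idea is to turn untrusted input into a plain number before it gets near a subprocess argument list. The sketch below is stricter than bare `parseInt`, which accepts leading digits (`parseInt("10; rm -rf /", 10)` returns `10`). The `parseBoundedInt` helper and its bounds are hypothetical, not the project's code.

```typescript
// Hypothetical helper: accept only a whole non-negative decimal integer in range.
function parseBoundedInt(raw: unknown, min: number, max: number): number {
  const text = String(raw).trim();
  // Reject anything that is not purely digits, so trailing shell
  // metacharacters never survive into an argument.
  if (!/^\d+$/.test(text)) {
    throw new Error(`not an integer: ${JSON.stringify(text)}`);
  }
  const value = Number.parseInt(text, 10);
  if (value < min || value > max) {
    throw new Error(`out of range [${min}, ${max}]: ${value}`);
  }
  return value;
}

// Arguments are then passed as an array (no shell string is built).
// The "-n" flag and the path are illustrative only.
function buildTailArgs(linesArg: unknown, logPath: string): string[] {
  const lines = parseBoundedInt(linesArg, 1, 10000);
  return ["-n", String(lines), logPath];
}
```

Validation like this complements the no-shell `spawnSync` approach; it does not replace it.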
The skill list shown in the system prompt is the matched subset, not all of them. Agents were guessing counts from context (e.g. saying '20 skills' when there are 58 total). Now they're told explicitly to use listDirectory when asked for exact counts or full lists.
…ver final response
## Summary
- **Imported 53 OpenClaw skills** word-for-word as subdirectories with SKILL.md (github, coding-agent, discord, slack, notion, obsidian, tmux, trello, video-frames, things-mac, apple-notes, apple-reminders, summarize, weather, model-usage, skill-creator, etc.)
- **Removed OpenClaw-specific skills** (clawhub, canvas, taskflow, healthcheck, mcporter, node-connect, blogwatcher, blucli, eightctl, gifgrep, songsee, ordercli, peekaboo, wacli, xurl, himalaya, bluebubbles, imsg, nano-pdf, openhue, oracle, sherpa-onnx-tts, goplaces, 1password, bear-notes, gh-issues, sag, gog, openai-whisper, sonoscli, spotify-player, gemini, voice-call, session-logs)
- **Removed duplicates** (document-pdf.md, pdf-editing.md - kept pdf.md)
- **Rebranded** all OpenClaw references → Daemora across skills and helper scripts
- **Renamed crew** `video-editor` → "Motion Studio" with creation-focused description (branded videos, animated mockups, freelancer demos, motion graphics, social reels, charts, 3D, audiograms, plus editing existing recordings)
- **Updated remotion skill** to clean format with rule paths under `crew/video-editor/rules/`
- **Fixed SkillLoader**:
  - Subdirectory skill name parsing (`foo/SKILL.md` → "foo" instead of `foo/SKILL`)
  - Path-based lookup for `<dir>/SKILL.md` (uses parent dir name)
- **Version**: 2026.1.2-beta.1

## Test plan
- [x] SkillLoader loads 58 skills cleanly (0 with empty descriptions)
- [x] All key skill names resolve: remotion, github, coding-agent, discord, slack, notion, obsidian, video-frames, tmux, apple-notes
- [x] Path-based lookup works: `skills/discord/SKILL.md` resolves to discord skill
- [x] Crew plugin.json validates: 18 tools, 3 skills, "Motion Studio" name
- [x] Remotion skill: 37 rule references all resolve to existing files (38 .md in rules/)
- [x] No "openclaw" references anywhere in skills/
- [x] Live test: send video creation prompt → main agent picks Motion Studio crew → reads remotion skill → reads rule files → renders MP4

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary - Move the embedded demo video to appear right after the intro paragraph, above the "What Daemora Can Do" section - Better hook for first-time visitors — they see Daemora in action immediately 🤖 Generated with [Claude Code](https://claude.com/claude-code)
UI:
- Cron: remove "Multi-Target (pick tenants)" - was leftover from removed multi-tenant system
- Cron: replace tenant tree picker with channel destinations (multi-select active channels)
- Cron: pulls from /api/channels/destinations like Watchers does
- Cron: presets dialog also uses channel destinations
- Edit job: maps server "multi" mode → UI "channels" mode + restores selected destinations
- SchedulePicker: minute dropdowns now show all 60 minutes (was 00/15/30/45)

Backend:
- CronExecutor: use sessionId="main" so cron runs are part of continuous chat history instead of isolated per-job sessions
The mid-loop steerQueue drain at onStepFinish was consuming items but silently throwing them away. Follow-ups sent while a task was running just disappeared, leaving the user waiting forever. Fix: don't drain mid-step. After generateText finishes, check the steerQueue. If items present, append them to the conversation and re-enter the loop with the follow-up so the agent acknowledges and continues working.
…step) Previous fix waited for the entire generateText call to finish before injecting the follow-up — could be 30+ steps. Now we use stopWhen to gracefully halt after the current step completes (~1-5s) and re-enter the loop with the follow-up appended. Completed steps are preserved in result.response.messages so no work is lost.
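The step-boundary behaviour described in these two commits can be modelled with a small synchronous loop: after each completed step, any queued follow-ups are appended to the conversation and the loop continues instead of finishing. This is a simplified model, not the AgentLoop itself; `runStep`, `Message`, and `runWithSteering` are hypothetical names, and the real code halts `generateText` through `stopWhen` rather than stepping manually.

```typescript
type Message = { role: "user" | "assistant"; content: string };

interface StepResult {
  message: Message;
  done: boolean;
}

function runWithSteering(
  initial: Message[],
  steerQueue: string[],
  runStep: (messages: Message[], stepIndex: number) => StepResult,
  maxSteps = 50,
): Message[] {
  const messages = [...initial];
  for (let step = 0; step < maxSteps; step++) {
    const { message, done } = runStep(messages, step);
    messages.push(message); // completed steps are kept, so no work is lost

    if (steerQueue.length > 0) {
      // Step boundary: drain follow-ups INTO the conversation (not away),
      // then keep going so the agent can acknowledge them.
      for (const text of steerQueue.splice(0)) {
        messages.push({ role: "user", content: text });
      }
      continue;
    }
    if (done) break;
  }
  return messages;
}
```

A follow-up that arrives during step 1 shows up as a user message right after step 1's output, and the loop runs at least one more step even if step 1 reported it was done.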
Streaming (channel-agnostic, reusable for future channels):
- AgentLoop: new `streaming` param. When true, use streamText instead of
generateText. Drains textStream emitting `text:delta` events on EventBus
tagged with taskId. Resolves text/usage/response/finishReason via
PromiseLike. Returns identical shape to generateText path so all
downstream logic (steerQueue, retry, fallback, persistence) works the
same.
- BaseChannel: new `supportsStreaming` getter (default false). Channels
override to opt in.
- HttpChannel: returns `supportsStreaming = true`.
- TaskRunner: passes `streaming: ch.supportsStreaming` based on the
task's originating channel.
- SSE endpoint (/api/tasks/:id/stream): forwards `text:delta` and
`text:end` events filtered by taskId.
- Chat.tsx: listens for `text:delta`, appends to a streaming assistant
message. First delta creates the message, subsequent deltas append.
text:end resets the streaming flag so the next iteration starts a fresh
message.
Future channels (Discord/Slack/Telegram) just need to:
1. Set `get supportsStreaming() { return true; }`
2. Subscribe to `text:delta` events on EventBus (filter by taskId)
3. Implement OpenClaw-style buffer-and-edit pattern (1-1.2s throttle)
UI follow-ups:
- Chat.tsx: input no longer disabled while task is running. Send button
always works. handleSend detects mid-task follow-up and routes via
/api/chat (backend already pushes to steerQueue and AgentLoop picks it
up at next step boundary).
- Status pill: don't render the floating "PROCESSING..." bubble below an
active streaming message — show a tiny inline indicator instead. Big
pill only shows before the first token arrives.
Remotion rules (4 new):
- crew/video-editor/rules/cursor-and-clicks.md
- crew/video-editor/rules/focus-zoom.md
- crew/video-editor/rules/theme-switching.md
- crew/video-editor/rules/ui-chrome.md
- skills/remotion.md updated with index entries pointing to them
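The subscription side of the streaming design (filter `text:delta` by taskId, append to a buffer, reset on `text:end`) can be sketched with a tiny in-memory bus. `TinyBus` and `attachStream` are stand-ins invented for this example; the project's EventBus API is not shown in the commit, so treat the method names as assumptions.

```typescript
interface StreamEvent {
  taskId: string;
  text?: string;
}
type Listener = (e: StreamEvent) => void;

// Minimal stand-in for the project's EventBus (API assumed).
class TinyBus {
  private listeners = new Map<string, Listener[]>();

  on(name: string, fn: Listener): void {
    const list = this.listeners.get(name) ?? [];
    list.push(fn);
    this.listeners.set(name, list);
  }

  emit(name: string, e: StreamEvent): void {
    for (const fn of this.listeners.get(name) ?? []) fn(e);
  }
}

// Accumulates deltas for one task and hands the growing text to `render`.
function attachStream(bus: TinyBus, taskId: string, render: (text: string) => void) {
  let buffer = "";
  let streaming = false;

  bus.on("text:delta", (e) => {
    if (e.taskId !== taskId) return; // ignore other tasks' tokens
    streaming = true;
    buffer += e.text ?? "";
    render(buffer);
  });

  bus.on("text:end", (e) => {
    if (e.taskId !== taskId) return;
    streaming = false;
    buffer = ""; // the next iteration starts a fresh message
  });

  return { isStreaming: () => streaming };
}
```

A chat channel would replace `render` with its own buffer-and-edit logic, which is where the per-channel throttle comes in.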
…ty hardening (#49)

## Summary
A batch of improvements built on top of PR #48. The biggest one is **token-level streaming for HTTP/UI** built as a generic, reusable pattern so future channels just opt in.

### Streaming (channel-agnostic)
- AgentLoop accepts a `streaming` flag. When true, uses `streamText` instead of `generateText`. Drains the textStream and emits `text:delta` events on EventBus tagged with taskId.
- BaseChannel exposes a `supportsStreaming` getter (default false). HttpChannel sets it true.
- TaskRunner reads `ch.supportsStreaming` from the originating channel and passes it through.
- SSE endpoint forwards `text:delta` and `text:end` events filtered by taskId.
- Chat.tsx listens for `text:delta` and appends to a streaming assistant message.
- Future channels (Discord/Slack/Telegram) just need to override `supportsStreaming = true` and subscribe to `text:delta` with their own buffer-and-edit logic. Zero AgentLoop changes needed for them.

### UI follow-ups (the input was blocking the user)
- Chat.tsx input no longer disabled while a task is running.
- Mid-task follow-ups route through `/api/chat`. Backend pushes them onto the running task's `steerQueue`.
- AgentLoop's `stopWhen` now also halts when the steerQueue has items, so the next step boundary picks up the follow-up (~1-5s wait, not the whole task).
- Status pill: when an active streaming message exists, the floating "PROCESSING..." pill is hidden and replaced with a tiny inline indicator. Big pill only shows before the first token arrives.

### Scheduler / Cron
- Removed leftover "Multi-Target (pick tenants)" option from the multi-tenant cleanup.
- Cron now uses the same channel-destinations picker as Watchers (`/api/channels/destinations`).
- SchedulePicker minute selectors show all 60 minutes (was hardcoded to 00/15/30/45).
- CronExecutor uses `sessionId = "main"` so scheduled jobs run in the same session as your chat (continuous history).

### Skills + crew
- Imported 53 OpenClaw skills as subdirectories with SKILL.md (word-for-word). Removed OpenClaw-specific skills (clawhub, canvas, taskflow, healthcheck, etc.) and CLI-bound ones we don't ship. Rebranded all "OpenClaw" → "Daemora" in metadata and content.
- Renamed video-editor crew → "Motion Studio" with a creation-focused description (branded videos, mockups, motion graphics, demo reels).
- 4 new Remotion rules: cursor-and-clicks, focus-zoom, theme-switching, ui-chrome.
- SkillLoader: fixed subdirectory skill name parsing so `foo/SKILL.md` becomes "foo".
- Crew section in system prompt now also shows plugin tools so the main agent picks the right crew for media tasks.

### Tools / instructions
- replyToUser description tightened - only for mid-task progress, never the final response.
- Agent instructed to use `listDirectory` for exact skill/crew counts instead of guessing from the matched subset shown in the prompt.
- imageAnalysis now uses the configured default model (gpt-5.4 in your case) instead of falling back to a hardcoded preference list.
- LoopDetector: media tools (`imageAnalysis`, `imageOps`, `generateImage`, `generateMusic`, `generateVideo`) added to `VALUE_SENSITIVE_TOOLS` so calls on different inputs aren't flagged as semantic repeats. Verify-fix cycles (e.g. `imageAnalysis` → `executeCommand` → `imageAnalysis`) no longer get blocked as ping-pong.

### Security (closes #46 - automated scan findings)
- `clipboard.js` - write now uses `spawnSync` with stdin instead of `echo \${text} | pbcopy/clip`. No shell, no injection.
- `iMessageChannel.js` - osascript now invoked via stdin instead of `osascript -e '\${script}'`. No more incomplete escape vulnerabilities.
- `screenCapture.js` - region coords and duration coerced via `parseInt` to prevent injection through agent params.
- `cli.js` - daemon logs `--lines` arg validated as integer.

### UI
- Chat: themed AlertDialog for clear-history confirmation (was native browser confirm).
- Chat: top-right "Clear" button to reset history.
- Chat: removed multi-session sidebar; single "main" session, channel-bound.
- README: embedded demo video moved above "What Daemora Can Do".

## Test plan
- [x] All modules load (smoke test)
- [x] SkillLoader: 58 skills, 0 with empty descriptions, key skills resolve by name and path
- [x] HttpChannel.supportsStreaming = true; DiscordChannel = false (default)
- [x] AgentLoop syntax + streaming branch wires through correctly (static review of streamText API contract against ai SDK 6 types)
- [x] UI builds clean
- [ ] Live: send chat message via UI, verify token-level streaming
- [ ] Live: send follow-up while task running, verify it gets injected at next step
- [ ] Live: verify Discord/Telegram still work unchanged (supportsStreaming = false)
- [ ] Live: verify cron job runs in main session, channel destinations save correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Streaming (channel-agnostic, OpenClaw draft-stream-loop port):
- New src/channels/StreamingEditor.js - direct port of OpenClaw's createDraftStreamLoop. Pure caller-driven utility: trailing-edge throttle with dynamic delay, in-flight coalescing, drain loop that picks up new text without setTimeout latency.
- DiscordChannel: supportsStreaming=true. Subscribes to text:delta and text:end, buffers, calls loop.update(buffer). 1.2s throttle, 1990 char limit. Falls back to chunked delivery on stream failure.
- TelegramChannel: same pattern. 1s throttle, 4090 char limit.
- SlackChannel: same pattern. 1s throttle, 3800 char limit.
- HTTP/UI streaming continues to work via SSE text:delta forwarding.
- Channels that don't override supportsStreaming stay non-streaming.

UI hot-reload fix:
- src/index.js: index.html now read with mtime cache instead of cached on module load. UI rebuilds (with new asset hashes) are picked up on next request without a server restart.

Embeddings batched + parallel:
- src/utils/Embeddings.js: new generateEmbeddings(texts[]) using Vercel AI SDK's embedMany. One HTTP request handles up to 2048 inputs.
- src/skills/SkillLoader.js: embedSkills() now collects all pending skills into one batch call. ~30s sequential -> ~500ms batched. Falls back to parallel per-skill (concurrency 8) if batch fails.

Version: 2026.1.2-beta.2
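The UI hot-reload fix relies on an mtime cache: re-read the file only when its modification time changes. A minimal sketch under stated assumptions is shown below. The `FileApi` interface and `createMtimeCache` name are hypothetical and injected so the example stays self-contained; in Node the two callbacks would typically wrap `fs.statSync(path).mtimeMs` and `fs.readFileSync(path, "utf8")`.

```typescript
// Hypothetical file-access interface, injected so the sketch has no Node dependency.
interface FileApi {
  mtimeMs(path: string): number;
  read(path: string): string;
}

// Returns a getter that re-reads the file only when its mtime has changed.
function createMtimeCache(files: FileApi, path: string): () => string {
  let cachedMtime = Number.NaN; // NaN never equals a real mtime, forcing the first read
  let cachedContent = "";
  return () => {
    const mtime = files.mtimeMs(path);
    if (mtime !== cachedMtime) {
      cachedContent = files.read(path);
      cachedMtime = mtime;
    }
    return cachedContent;
  };
}
```

The cost per request is one stat call, which is what lets a rebuilt `index.html` with new asset hashes appear without a restart.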
)

## Summary

### Streaming on Discord / Telegram / Slack (OpenClaw port)
- **New `src/channels/StreamingEditor.js`** - direct port of OpenClaw's `createDraftStreamLoop` (`agents/openclaw/src/channels/draft-stream-loop.ts`). Pure caller-driven utility:
  - Trailing-edge throttle with dynamic delay (`Math.max(0, throttleMs - elapsed)`)
  - In-flight coalescing - never overlaps API calls
  - Drain loop - picks up new text mid-send without waiting for another setTimeout
  - Skip if unchanged
  - Stops on send failure (returns false)
- Each channel subscribes to `text:delta` and `text:end` itself, buffers tokens, and calls `loop.update(buffer)`.
- After task completes: removes listeners, awaits `loop.flush()`, calls `loop.stop()`.
- The streamed message IS the final delivery - no duplicate send.
- Falls back to normal chunked delivery on any stream failure (overflow, API error, rate limit).

| Channel | Streaming | Throttle | Max chars |
|---|---|---|---|
| HTTP/UI (SSE) | ✅ | n/a | n/a |
| Discord | ✅ | 1.2s | 1990 |
| Telegram | ✅ | 1s | 4090 |
| Slack | ✅ | 1s | 3800 |
| WhatsApp / Email / iMessage / etc. | ❌ (final reply only) | - | - |

### UI hot-reload (fixes blank-page after rebuild)
- `src/index.js`: index.html is now read with an mtime cache instead of cached on module load. Previously, after `npm run build` produced new asset hashes, the running server still served the OLD HTML referencing deleted asset files → blank page until restart.
- Now: UI rebuilds are picked up live on the next request.

### Batched embeddings (fixes slow startup)
- `src/utils/Embeddings.js`: new `generateEmbeddings(texts[], provider?)` that uses Vercel AI SDK's `embedMany`. Handles OpenAI, Google, Ollama, TF-IDF.
- `src/skills/SkillLoader.js`: `embedSkills()` now collects all pending skills into a **single batch call**. Falls back to parallel per-skill (concurrency 8) if batch fails.
- **Result**: ~30s sequential → **~500ms batched** for 58 skills.

### Version
- Bumped to `2026.1.2-beta.2`

## Test plan
- [x] All channel modules load + `supportsStreaming` returns expected values
- [x] `createStreamingLoop` smoke test sends progressive updates correctly
- [x] Server module loads cleanly
- [x] UI builds successfully
- [ ] Live: send a long-response message on Discord → verify smooth streaming, no duplicate final reply
- [ ] Live: send a long-response message on Telegram → same
- [ ] Live: send a long-response message on Slack (in a thread) → same
- [ ] Live: rebuild UI while server runs → refresh browser → loads new bundle without restart
- [ ] Live: clear `data/skill-embeddings.json` and restart → verify batched embedding completes in <1s
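The throttle rules listed for `StreamingEditor.js` can be illustrated with a simplified, synchronous model driven by an explicit clock. This is not the port itself: the real loop uses timers and async sends, and `createDraftLoop`, `tick`, and `delayUntilNextSend` are names invented for this sketch. It keeps the properties named in the description: latest text wins (coalescing), delay is `Math.max(0, throttleMs - elapsed)`, unchanged text is skipped, and the loop stops on overflow or send failure so the caller can fall back to chunked delivery.

```typescript
interface DraftLoop {
  update(text: string): void;
  delayUntilNextSend(now: number): number;
  tick(now: number): boolean; // false once the loop has stopped
}

function createDraftLoop(
  send: (text: string) => boolean,
  throttleMs: number,
  maxChars: number,
): DraftLoop {
  let pending: string | null = null;
  let lastSent = "";
  let lastSentAt = Number.NEGATIVE_INFINITY;
  let stopped = false;

  const delayUntilNextSend = (now: number): number =>
    Math.max(0, throttleMs - (now - lastSentAt));

  return {
    update(text) {
      if (!stopped) pending = text; // coalesce: only the newest text is kept
    },
    delayUntilNextSend,
    tick(now) {
      if (stopped) return false;
      if (pending === null || pending === lastSent) return true; // skip if unchanged
      if (delayUntilNextSend(now) > 0) return true; // still inside the throttle window
      if (pending.length > maxChars || !send(pending)) {
        stopped = true; // overflow or send failure: caller falls back
        return false;
      }
      lastSent = pending;
      lastSentAt = now;
      pending = null;
      return true;
    },
  };
}
```

With the Discord settings from the table (1200 ms, 1990 chars), updates arriving 500 ms after a send wait another 700 ms, and only the newest buffer is sent.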
Restores the full JS commit ancestry under main so all original contributors stay in the contributors list. Uses 'ours' strategy — the TypeScript working tree is preserved entirely, no JS files come back. Only the commit graph changes: legacy/main (cec4289 and its 916 ancestors) is recorded as a parent of this merge.
…t dist-tag Adds a post-publish step that runs: npm dist-tag add daemora@<version> latest so plain `npm install daemora` resolves to the freshly published version instead of the legacy 2026.1.2-beta.2 JS release pinned to `latest` from before the rewrite.
…1.0.0-alpha.2
- Voice: replace deprecated `playai-tts-english` default with `canopylabs/orpheus-v1-english` (PlayAI was sunset 2025-12-23 on Groq). Default voice updated to `troy` to match Orpheus's accepted set [autumn, diana, hannah, austin, daniel, troy]. Fixes 404 "model_not_found" on voice worker startup.
- GroqTTS: drop dead playai entries from sample-rate / voice maps.
- providers.ts: register orpheus-v1-english + orpheus-arabic-saudi as the Groq TTS choices in the model picker.
- start.ts: on EADDRINUSE bind to port 0 (random) instead of crashing, and propagate the actual bound port to tunnel.start, banner URL, and autoOpen so the printed URL matches the real port.
- Bump package.json to 1.0.0-alpha.2 so the next Publish workflow run ships these fixes and re-points npm `latest` at the alpha (publish.yml already promotes whatever it publishes to latest).
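The port fallback in `start.ts` can be sketched as: try the preferred port, and on `EADDRINUSE` ask the OS for any free port by binding to port 0, then use the port that was actually bound everywhere downstream. The `listen` callback below is a hypothetical synchronous abstraction that returns the bound port; in Node, listening is asynchronous and the bound port is read from `server.address()`. `bindWithFallback` and `bannerUrl` are illustrative names only.

```typescript
// Node-style errors carry a string `code` such as "EADDRINUSE".
interface ListenError extends Error {
  code?: string;
}

// `listen` is an assumed abstraction: binds and returns the port actually bound.
function bindWithFallback(listen: (port: number) => number, preferred: number): number {
  try {
    return listen(preferred);
  } catch (err) {
    const inUse = err instanceof Error && (err as ListenError).code === "EADDRINUSE";
    if (!inUse) throw err; // only port conflicts trigger the fallback
    return listen(0); // port 0 lets the OS pick any free port
  }
}

// The bound port, not the requested one, feeds the banner, tunnel, and auto-open.
function bannerUrl(boundPort: number): string {
  return `http://localhost:${boundPort}`;
}
```

Deriving the banner URL from the return value is what keeps the printed address in step with the real port after a fallback.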
Summary
Fixes two runtime errors that surfaced after installing `daemora@alpha` and bumps the version so the next Publish run repoints npm `latest` at the alpha.
Test plan