v2026.05.22#312

Merged
xiami762 merged 526 commits into main from dev
May 22, 2026
@stephamie7
Contributor

No description provided.

xiami762 and others added 30 commits April 27, 2026 13:11
Wire the OneSEC, NGTIP, and Qingteng API handlers to respect the configured SSL verification toggle so private deployments can bypass certificate checks when needed. Add regression tests that cover both enabled and disabled verification paths and isolate Qingteng tests from local machine config.
Introduce a unified entry skill for all `onesig_*` tools, mirroring the
style of `onesec-use` / `qingteng-use`. Any task that mentions OneSIG /
SIG / Secure Internet Gateway must load this skill first instead of
calling the tools directly.

- SKILL.md: API-vs-browser decision flow and write-action confirmation
  protocol; declares the skill as the single decision entry-point.
- references/api-reference.md: action routing table for the six grouped
  tools (`onesig_login` / `assets` / `device` / `helper` / `monitoring`
  / `strategy`), business primary keys (uniqueId / uid / pid / groupId /
  ruleId / srcIp / server+port, etc.), high-frequency call examples,
  binary/file endpoints (`document_preview`, multipart uploads), the
  RSA-OAEP auto-encrypted fields (`password`, `dupPassword`), the
  mandatory `type=physical` flag for API Key endpoints, and the fact
  that `ips_rule_create` is actually a query.
- references/browser-workflow.md: console navigation map and
  `agent-browser` operating rules for the fallback path when the API is
  unavailable.

The reference has been cross-checked against the vendor SIGWEBAPI docs
and `onesig.handler.py`, fixing legacy drifts such as
assetType -> type, name -> username, fileName -> id,
port_protect_group_port_list -> port_protect_port_list, and the
top-level placement of `condition` / `comments` in `whitelist_add`.

Made-with: Cursor
…pi_prefix

Three interlocking changes that make OneSIG v2.5.x easier to bring online
out of the box and align SSL / cookie behaviour with the other built-in
providers (onesec / ngtip / qingteng).

- Persistent login session: after a successful login the aiohttp
  CookieJar is serialised into `~/.flocks/config/.secret.json` under a
  key shaped like `onesig_session_cookie__<sha1[:12]>`. On process
  restart, if at least one cookie is still alive the jar is rehydrated
  in-place and the captcha -> pubkey -> /v3/login -> /v3/account chain
  is skipped entirely; the very first business request that returns
  401 / responseCode 1019..1022 still triggers exactly one auto re-login
  to preserve previous behaviour. `logout()` wipes the on-disk entry,
  `close()` does not. The `persist_cookies` toggle defaults to True and
  honours camelCase / `custom_settings` / `ONESIG_PERSIST_COOKIES`
  fallbacks.
- SSL verification defaults to OFF, matching onesec / ngtip / qingteng.
  Five sources are recognised in priority order: `verify_ssl`,
  `ssl_verify`, `verifySsl`, `custom_settings.verify_ssl`, and the
  `ONESIG_VERIFY_SSL` env var, falling back to False. The
  `verify_ssl` field is removed from `provider.credential_fields` so
  the global "SSL verify" switch on `ServiceDetailPanel` becomes the
  single source of truth.
- `api_prefix` default changes from `"/api"` to `""`, matching the
  common v2.5.x deployments where nginx already routes `/v3/...` to
  the backend. Reverse-proxy deployments can set it back to `"/api"`.
  `_provider.yaml` notes are updated with the 404 -> flip-prefix
  troubleshooting tip.
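
The SSL precedence described above can be sketched as follows. This is a minimal illustration, not the handler's actual code: the function names `resolve_verify_ssl` and `_coerce_bool`, the accepted string spellings, and the plain-dict config shape are assumptions; only the five source names and the False fallback come from the commit message.

```python
import os
from typing import Any, Mapping, Optional

# Assumed spellings; the real handler may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _coerce_bool(value: Any) -> Optional[bool]:
    """Return True/False for recognisable values, None when unset or unparseable."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return bool(value)
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
    return None


def resolve_verify_ssl(config: Mapping[str, Any],
                       env: Optional[Mapping[str, str]] = None) -> bool:
    """Walk the five sources in priority order; the first usable value wins."""
    env = os.environ if env is None else env
    custom = config.get("custom_settings")
    if not isinstance(custom, Mapping):
        custom = {}
    candidates = [
        config.get("verify_ssl"),
        config.get("ssl_verify"),
        config.get("verifySsl"),
        custom.get("verify_ssl"),
        env.get("ONESIG_VERIFY_SSL"),
    ]
    for candidate in candidates:
        parsed = _coerce_bool(candidate)
        if parsed is not None:
            return parsed
    return False  # verification is OFF unless something turns it on
```

With aiohttp, the resolved value would typically end up in the `ssl=` argument of the request call, which is what the tests below assert on.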

`tests/tool/test_onesig_api_tool.py` covers: the three `verify_ssl`
aliases ending up as the right `ssl=` argument on
`aiohttp.session.request`, cookie snapshot purity (round-trip and
expired-cookie filtering), `__init__` load / `login()` save /
`logout()` delete, and the persisted-cookie path bypassing the full
captcha -> pubkey -> login -> account chain.
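
A rough sketch of the cookie snapshot behaviour these tests pin down. The commit only states the key shape `onesig_session_cookie__<sha1[:12]>`; what gets hashed is not stated, so hashing `base_url` plus username is an assumption here, as are the helper names and the dict-based cookie representation (the real code serialises an aiohttp CookieJar).

```python
import hashlib
import time
from typing import Any, Dict, List, Optional


def cookie_secret_key(base_url: str, username: str) -> str:
    """Build a per-deployment secret key (hash input is an assumption)."""
    digest = hashlib.sha1(f"{base_url}|{username}".encode("utf-8")).hexdigest()
    return f"onesig_session_cookie__{digest[:12]}"


def snapshot_cookies(cookies: List[Dict[str, Any]],
                     now: Optional[float] = None) -> List[Dict[str, Any]]:
    """Copy the cookies that are still alive; the input list is never mutated."""
    now = time.time() if now is None else now
    return [
        dict(cookie)
        for cookie in cookies
        # A cookie without an expiry is treated as a live session cookie.
        if cookie.get("expires") is None or cookie["expires"] > now
    ]
```

If the snapshot comes back empty on startup, the full captcha -> pubkey -> login -> account chain would run as before.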

Made-with: Cursor
Some providers (onesig in particular) reuse the persisted `base_url`
as `default_value` on the metadata endpoint. The previous reload logic
in `ServiceDetailPanel` treated `value === effectiveDefault` as a
"placeholder" case and cleared the input, so reopening a configured
service showed an empty API URL and saving immediately overwrote the
backend record with an empty string.

The form now renders whatever the backend returns under `fields`
(falling back to legacy keys only) and never compares against
`default_value`. `ServiceDetailPanelApi.test.tsx` adds a regression
test where metadata returns the same value in both `default_value` and
`fields`, asserting the form still shows the persisted URL.

Made-with: Cursor
…irmware

Older OneSIG v2.5 builds reject RSA-OAEP ciphertext on POST /v3/login and
only accept the raw password, so we ship a sibling plugin (registered as
service_id `onesig_v2_5_older_api`) that mirrors the standard `onesig`
handler with one targeted change: the captcha → pubkey → encrypt → login
chain is collapsed to captcha → login, sending `self.config.password` in
the clear. All other paths — cookie session, persistence, captcha / TOTP
fallback, 401 / 1019..1022 auto-relogin, and RSA-OAEP encryption of
sensitive *non-login* write fields (change_password, user_create,
user_delete, audit-log purge, interface password, device-upgrade
password) — are kept identical to the encrypted variant so the two can
coexist on the same flocks instance without rewriting business code.

Namespacing keeps the two plugins isolated end-to-end: SERVICE_ID,
secret_id (`onesig_v2_5_older_password`), persisted-cookie prefix
(`onesig_v2_5_older_session_cookie__`), output filename prefix, and env
vars (`ONESIG_V2_5_OLDER_*`, with `ONESIG_*` retained as fallback) are
all distinct, so the persisted cookie jars and credentials of the new
and legacy plugins never clobber each other.
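
The env-var fallback mentioned above could look roughly like this. The helper name `read_plugin_env` and the rule that empty strings count as unset are assumptions; only the two prefixes and their order come from the commit message.

```python
import os
from typing import Mapping, Optional

# Plugin-specific prefix first, shared prefix as fallback.
_ENV_PREFIXES = ("ONESIG_V2_5_OLDER_", "ONESIG_")


def read_plugin_env(name: str,
                    env: Optional[Mapping[str, str]] = None,
                    default: Optional[str] = None) -> Optional[str]:
    """Look up NAME under each prefix in order; empty values are skipped."""
    env = os.environ if env is None else env
    for prefix in _ENV_PREFIXES:
        value = env.get(prefix + name)
        if value:
            return value
    return default
```

For example, `read_plugin_env("VERIFY_SSL")` would consult `ONESIG_V2_5_OLDER_VERIFY_SSL` before `ONESIG_VERIFY_SSL`.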

The skill (`plugins/skills/onesig-use`) and existing tests are left
untouched — they continue to document the standard encrypted flow.

Made-with: Cursor
…ount lockout

The "Test connectivity" button looped over `service_tools[:5]` and ran
`_build_param_sets()` (4–6 enum-driven actions per tool) for each one.
For session-based services (OneSIG / OneSec / Qingteng / ...) every
failed attempt triggers a fresh login round-trip, so a single click on
wrong credentials could fire ~30 failed logins and trip the server-side
account lockout.

Changes
* Add `_is_action_dispatch_login_probe()`: classifies tools whose name
  ends in `_login` and whose required `action` parameter is an enum
  (e.g. `onesig_login`, `onesec_login`, `onesig_v2_5_older_login`).
* `_tool_sort_key()` now puts those at the top alongside parameter-free
  login probes (priority -1).
* Replace the nested probe loop with a **single tool, single param set**
  call. For action-dispatch login tools we force `action="test"`, which
  the handler's `_dispatch_group` special-cases to a read-only call from
  `_CONNECTIVITY_TEST_ACTIONS[group]` (onesig → `get_account`, onesec →
  `common_threat_type_list`, ...). No `login` / `logout` /
  `change_password` / `get_pubkey` enum value is ever invoked.
* Surface the lockout-prevention rationale in the failure message so
  users know a single failed probe is not a definitive verdict.
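
A simplified sketch of the probe selection described in the list above. The JSON-schema-like tool dict is a hypothetical shape, the function names drop the leading underscore to mark them as illustrations, and the parameter-free login probes that also get priority -1 are left out for brevity.

```python
from typing import Any, Dict, List, Tuple


def is_action_dispatch_login_probe(tool: Dict[str, Any]) -> bool:
    """True for `*_login` tools whose required `action` parameter is an enum."""
    if not tool.get("name", "").endswith("_login"):
        return False
    params = tool.get("parameters", {})
    action = params.get("properties", {}).get("action", {})
    return "action" in params.get("required", []) and bool(action.get("enum"))


def tool_sort_key(tool: Dict[str, Any]) -> Tuple[int, str]:
    """Login probes sort first (priority -1); everything else keeps name order."""
    priority = -1 if is_action_dispatch_login_probe(tool) else 0
    return (priority, tool["name"])


def pick_probe(tools: List[Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Choose exactly one tool and one parameter set for the connectivity test."""
    chosen = min(tools, key=tool_sort_key)
    args = {"action": "test"} if is_action_dispatch_login_probe(chosen) else {}
    return chosen["name"], args
```

Because only one call is issued per click, a wrong password costs at most one failed login instead of roughly thirty.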

Tradeoff: a non-auth failure on the chosen probe (e.g. a synthesized
param value the endpoint rejects) shows as a false negative; the user
re-tests after fixing config — strictly better than discovering the
service account got locked.

Tests
* Replace `test_failed_attempts_are_aggregated_in_message` (which
  asserted the removed multi-attempt aggregation) with
  `test_single_attempt_only_to_avoid_account_lockout`, asserting
  `execute.await_count == 1` and the new "为避免连续失败导致账号锁定"
  message.
* New `test_action_dispatch_login_tool_uses_test_action` pins the
  contract that `onesig_login`-style tools are picked over their
  sibling `_assets` / `_monitoring` groups and invoked with exactly
  `{"action": "test"}`.

Made-with: Cursor
Avoid rendering generic MCP tools with category fields as delegate task cards, while preserving child-session links and URL-driven session selection behavior.
…pdate (#198)

- Add normalizeNpmRegistry and resolveAgentBrowserNpmRegistry helpers
- Pass explicit registry via npm_config_registry when spawning npm update -g
- Refresh bundled agent-browser core skill content and add trust-boundaries ref
- Add unit tests for registry normalization and resolution precedence
* add post-login notification modal

* fix notification dismissal behavior
…ies (#194)

- Default WebUI to same-origin /api proxy for non-loopback backends; opt-in
  direct VITE_* URLs via FLOCKS_WEBUI_DIRECT_BACKEND_URLS.
- Resolve session from cookie early in apply_auth_for_request; broaden
  browser-like detection for SSE/reverse-proxy (session cookie, Mozilla UA).
- Enable xfwd on Vite dev proxies so X-Forwarded-* reach the backend.
- Document LAN/reverse-proxy behavior in README; extend CLI and auth tests.
* feat(tdp-api): semantic tool params and handler mappings

- Expand TDP tool YAML schemas with top-level filters, keyword, pagination
- Extend tdp.handler.py to map params to condition/page/fuzzy for APIs
- Update tdp-use skill and api-reference for preferred calling patterns
- Add Skyeye API plugin regression test

* feat(tdp): semantic tool params, log SQL guard, handler fixes

- Extend TDP YAML tools and api-reference for clearer agent-facing params
- Handler: interface risk condition defaults, disposal_log_list action,
  incident timeline show_attack, log_search default sql + reject full SQL
- service_asset_list: document time range as N/A for inventory APIs
- Tests: tdp_api_tools, skyeye plugins; add integration live config test
- Increase file read defaults: 2000 lines, 2000 chars/line, 20 KB cap
- Raise registry truncation: 1000 lines, 100 KB, 100k hard max chars
- Add tests for read limits and truncation constants
- Align skill tool tests with registry defaults
…tion (#201)

Emit llm before/after hook stages around model calls and expose normalized query metadata. Improve tool discovery with canonical alias matching and select:name batching so known tools can be loaded deterministically in one call.
- Persist messages under message:<session_id> and message_parts:<session_id>
  instead of per-message keys so WebUI/Message.list_with_parts work.
- Normalize legacy exported parts (tool state, reasoning time, etc.).
- Add tests for aggregated storage format and legacy reasoning payloads.
Align the skill identifier and header with the agent-browser naming, and remove outdated quickstart/reference text to keep documentation focused on current usage.
…ks (#205)

- Restart loads last recorded backend/webui host and port when CLI and
  env omit them; CLI and env still override runtime defaults.
- On Windows, validate tracked PIDs against command line and image name
  so stale PID reuse does not kill unrelated processes or skip cleanup.
- Add tests for restart defaults and Windows PID reuse scenarios.
Wires the QAX NGSOC-BD/NGSOC-LV V4.0 (R4.15.1) WebAPI into the flocks
plugin framework, modelled after the onesig integration but simplified
for NGSOC's static NGSOC-Access-Token header auth (no captcha / cookie /
RSA negotiation).

Coverage (manual sections 5.1 - 5.8):
- alarms      14 actions  (list / detail / dispose / PCAP export / AI judge SSE / judgment record)
- assets       3 actions  (asset detail, group id list, group list)
- vuls         5 actions  (per-asset vuls + config, vul / web / weak-pwd lists)
- risks        2 actions  (asset risk list, per-asset risk score)
- users        1 action   (account nicknames)
- workorders   3 actions  (status update, list, detail)
- bigscreens   6 actions  (vuln top, asset top, threat type, attack ip, attack list, victim survey)
- storage      1 action   (binary download persisted under ~/.flocks/workspace/outputs)

Highlights of the handler design:
- Declarative ActionSpec routing (rest_keys / query_keys / body_keys /
  passthrough_query / passthrough_body / binary / accept) shared by
  every group entry point; method-agnostic passthrough_query so POST
  endpoints that mix body + query (e.g. /risks/asset/asset-risks) ship
  every field correctly.
- Long-lived aiohttp.ClientSession per device with double-checked
  locking; SSL verify defaults to False (private appliance reality)
  with verify_ssl / ssl_verify / verifySsl / NGSOC_VERIFY_SSL all
  honored for parity with sibling api plugins.
- Boolean query params lowercased to "true" / "false" (NGSOC's
  Spring backend treats "True" as false) while JSON bodies keep
  native Python booleans.
- Binary downloads prefer the user's fileName query param (sanitized
  against ../ traversal and Windows-style paths) and fall back to a
  Content-Type-derived extension.
- action="test" routes to a no-arg connectivity probe per group
  (users / bigscreens / assets) for fast credential validation.
- /app-ai-alarm-judgment/judge SSE response is surfaced as a raw
  event_stream string instead of forcing JSON envelope parsing.
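
Two of the highlights above, boolean query coercion and filename sanitization, can be illustrated with a short sketch. The function names, the `download.bin` fallback, and the choice of `PureWindowsPath` for splitting are assumptions and not the handler's actual code.

```python
from pathlib import PureWindowsPath
from typing import Any, Dict, Mapping


def encode_query(params: Mapping[str, Any]) -> Dict[str, str]:
    """Stringify query params, lowercasing booleans to "true" / "false"."""
    encoded: Dict[str, str] = {}
    for key, value in params.items():
        if value is None:
            continue  # unset params are dropped rather than sent as "None"
        if isinstance(value, bool):
            encoded[key] = "true" if value else "false"
        else:
            encoded[key] = str(value)
    return encoded


def sanitize_filename(name: str, fallback: str = "download.bin") -> str:
    """Keep only the final path component of a user-supplied file name."""
    # PureWindowsPath splits on both "\\" and "/", so one call handles
    # ../ traversal and Windows-style paths alike.
    base = PureWindowsPath(name).name.strip()
    if base in ("", ".", ".."):
        return fallback
    return base
```

JSON bodies would bypass `encode_query` entirely so that native booleans reach the serializer unchanged.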

Tests: 67 pytest cases covering SSL precedence, token injection, URL
composition + REST placeholder validation, ActionSpec request building,
required-param validation, envelope unwrapping, binary persistence,
group dispatch (action="test", unknown action), boolean coercion,
body / query field separation, filename sanitization, and YAML
manifest loading for all 8 groups.

Made-with: Cursor
- Rename plugin dir sangfor_sip -> sangfor_sip_v92 (avoid `.` in dir
  name; service_id stays "sangfor_sip" so existing config & secrets
  continue to work unchanged).
- Add `version: "9.2"` and `defaults.product_version: "9.2"` to
  _provider.yaml, mirroring ngsoc convention.
- Fix login auth: `desc` field now sent as `auth_desc` (was empty
  string, breaking sha1 signature) and login URL appends
  `?verify=false` per the version 9.2 spec.
- Correct endpoint names to lowercase per the version 9.2 spec:
  riskBusiness->riskbusiness, secEvent->riskevent,
  weakPassword->weakpasswd, vulInfo->hole,
  plainTextInfo->plaintexttransmission.
- Add risk_terminal action (/data/riskterminal) and expose it via
  sangfor_sip_risk.yaml `action` parameter.
- Cap maxCount in _fetch_data: 10000 normal / 5000 vulnerability;
  align assets.yaml default & description accordingly.
- Fix terminal classfy1_id description: 2 -> 2,7,8.
- Add `_resolve_verify_ssl` helper that reads `verify_ssl` /
  `ssl_verify` / `custom_settings.verify_ssl` (matches the
  onesec/qingteng/ngtip pattern from PR #193). The bottom
  "SSL 验证" form toggle writes to `custom_settings.verify_ssl`,
  which the handler previously ignored — causing the toggle to
  have no effect at runtime.
- Drop the standalone `verify_ssl` credential_field from
  _provider.yaml so the WebUI no longer renders a duplicate
  text input next to the bottom SSL toggle.
- Refresh _provider.yaml notes to describe the new SSL toggle
  resolution precedence.
…ription)

Surface the optional `version` field declared in `_provider.yaml` (e.g.
`sangfor_sip_v92` → 9.2) to both the WebUI and the agent-facing tool
schema, so operators see which upstream API version a service binds to
and the model can pick version-appropriate parameters at call time.

Backend:
- Add reusable `extract_provider_version(provider_cfg)` helper that
  reads top-level `version`, falling back to `defaults.product_version`
  / `defaults.version`, and coerces to str (handles YAML floats).
- `ToolInfo` gains optional `provider_version`; `yaml_to_tool` fills it
  from the loaded `_provider.yaml`.
- `_load_provider_yaml_metadata` and `_build_api_service_summary` use
  the same helper so `GET /api/provider/{id}/metadata` and
  `GET /api/provider/api-services` both return `version`.
- `APIServiceSummary` adds `version: Optional[str]`.
- Session runner appends `\n\n[Provider: <name> | Version: <ver>]` to
  each tool's description before sending it to the LLM, gated on the
  tool actually carrying a `provider_version`. Annotation is built by
  the new module-level `_annotate_with_provider_version` helper to keep
  it pure and testable; original `ToolInfo` is never mutated.
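
The two backend helpers above might be sketched like this. The lookup order and the annotation format follow the commit message; the handling of empty descriptions and missing providers is an assumption, and `annotate_description` is a hypothetical stand-in for `_annotate_with_provider_version`, which operates on `ToolInfo` objects in the real code.

```python
from typing import Any, Mapping, Optional


def extract_provider_version(provider_cfg: Any) -> Optional[str]:
    """Read `version`, then `defaults.product_version`, then `defaults.version`."""
    if not isinstance(provider_cfg, Mapping):
        return None
    defaults = provider_cfg.get("defaults")
    if not isinstance(defaults, Mapping):
        defaults = {}
    for raw in (provider_cfg.get("version"),
                defaults.get("product_version"),
                defaults.get("version")):
        if raw is None:
            continue
        text = str(raw).strip()  # YAML may parse `version: 9.2` as a float
        if text:
            return text
    return None


def annotate_description(description: Optional[str],
                         provider: Optional[str],
                         version: Optional[str]) -> str:
    """Return a new description string; the inputs are never modified."""
    base = description or ""
    if not provider or not version:
        return base
    return f"{base}\n\n[Provider: {provider} | Version: {version}]"
```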

Frontend:
- `APIServiceSummary` type gains `version?: string`.
- API service card renders a `v9.2` badge next to the existing API tag,
  with `^v` prefix stripped to avoid a `vv9.2` double-prefix when an
  operator writes `version: "v9.2"` in the YAML.
- Detail panel already supported `metadata?.version` via existing
  i18n key `serviceInfo.version`, no change needed there.

Tests:
- 3 new cases in `test_tool_plugin.py` cover top-level `version`,
  fallback to `defaults.product_version`, and the absent case.
- New `tests/session/test_runner_provider_version.py` (7 cases) pins
  the description-annotation contract: presence/absence of version,
  empty/None description, missing provider, no mutation, whitespace
  handling, and tools that don't declare the attr at all (builtin/MCP).

Made-with: Cursor
feat(provider): expose service version end-to-end (UI + LLM tool desc…
…ed names

Following the `sangfor_sip_v92` precedent, declare the deployed product
version on every supported third-party API plugin (top-level `version:`
plus `defaults.product_version:` for backward compatibility) and rename
the plugin directories to include a `_v<dot-replaced-version>` suffix so
the on-disk layout reflects the targeted release.

Versions captured:
- qingteng           -> qingteng_v3_4_1_66
- sangfor_xdr        -> sangfor_xdr_v2_2
- ngtip_api          -> ngtip_v5_1_5
- onesig             -> onesig_v2_5_3_D20260321
- onesec             -> onesec_v2_8_2
- tdp_api            -> tdp_v3_3_10
- skyeye_api         -> skyeye_v4_0_14_0_SP2

`service_id` (and the `provider:` references inside each tool YAML) is
intentionally left unchanged so existing `flocks.json` configs and
`{secret:*_api_key}` references keep working without migration.

Tests that hard-code the old plugin paths are updated accordingly.

Made-with: Cursor
- Updater: npm ci when package-lock exists; 300s timeouts with explicit
  TimeoutExpired handling; retry uv sync without default-index after mirror
  failure; surface restart/build failure exceptions in UI messages
- Server: filter noisy successful polling from request logs; log duration and
  errors; stream tail reads for large log files
- CLI: disable uvicorn access logs (app middleware owns request lines)
- WeCom: bridge SDK logger to Flocks and drop debug/info noise
- WebUI: increase focus-triggered update check min gap to 10 minutes
- Tests: cover log routes, request filters, updater retries and timeouts
Allow multiple versions of the same API product to coexist in
flocks.json under distinct `<service_id>_v<version>` storage keys,
so updating a plugin to a new version no longer overwrites the
previous version's credentials.
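
A minimal sketch of how a `<service_id>_v<version>` storage key could be derived. The function name `derive_storage_key` and the rule of replacing every non-alphanumeric run with `_` are assumptions; the expected shapes match keys that appear in this PR, such as `tdp_api_v3_3_10`.

```python
import re
from typing import Optional


def derive_storage_key(service_id: str, version: Optional[object]) -> str:
    """Append a `_v<version>` suffix, or return the bare id when unversioned."""
    if version is None:
        return service_id
    # Dots and other separators become underscores so the key stays a
    # plain identifier inside flocks.json.
    slug = re.sub(r"[^0-9A-Za-z]+", "_", str(version).strip()).strip("_")
    if not slug:
        return service_id
    return f"{service_id}_v{slug}"
```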

Core changes:
- Add `flocks/config/api_service_versioning.py` with derive/discover
  helpers, bidirectional legacy<->storage_key resolution, shadowing
  detection, and an idempotent copy-only migration that backs up
  flocks.json before its first write.
- Promote `info.provider` to the storage_key in `tool_loader` while
  preserving the unversioned `service_id` on the Tool instance for
  legacy lookups.
- Make `ConfigWriter.get_api_service_raw` version-aware: prefer the
  versioned shadow when an unversioned id is requested, fall back to
  the legacy id when a storage_key has no entry yet (covers
  partially-upgraded environments and isolated tests). Warn on
  asymmetric writes that target a shadowed legacy id.
- Run migration in the lifespan startup before `ToolRegistry.init` so
  freshly loaded tools observe the post-migration layout.
- Hide shadowed legacy entries from the API service listing endpoint
  to avoid duplicate rows in the WebUI after migration.
- Make the provider-route metadata loader resolve a `provider_id`
  against each candidate `_provider.yaml`'s derived storage_key,
  not just its directory name. Without this, plugins whose dir was
  renamed to a shorter form (e.g. `tdp_v3_3_10` for service_id
  `tdp_api`) returned no metadata to the WebUI.

Tests:
- New `tests/config/test_api_service_versioning.py` (41 cases) covers
  derivation, descriptor discovery, legacy resolution, shadowing,
  migration idempotency + backup, ConfigWriter fallback (incl. the
  `"api_services": null` defensive path), and the regression where
  the metadata loader must accept storage_keys whose directory name
  differs.
- Existing tool-plugin tests refreshed for the new `info.provider`
  values (`tdp_api_v3_3_10`, `qingteng_v3_4_1_66`, etc.) and the
  versioned plugin directory paths.

Made-with: Cursor
Add the local Hub catalog, backend install APIs, WebUI browsing experience, and validation coverage so bundled plugins can be discovered and installed globally.

Made-with: Cursor
Limit native tool discovery to direct payload files so provider metadata or nested files do not make a tool package appear installable by mistake.

Made-with: Cursor
…versioning

Centralize the storage-key resolution logic that used to be inlined in
ConfigWriter and route handlers. Domain rules now live in a single
module with a shorter, more intuitive name.

Module rename:
- flocks/config/api_service_versioning.py -> flocks/config/api_versioning.py
  (and the matching tests/config/test_*.py). Aligns length with sibling
  config.py / config_writer.py.

New helpers in api_versioning:
- resolve_api_service(service_id, services): three-step shadow / direct /
  legacy lookup; the only place this rule lives.
- warn_if_shadowing_legacy(service_id, services): structured warning when
  a write targets a legacy id whose versioned shadow already exists.
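
The three-step lookup can be sketched as below. Note the extra `storage_keys` argument (legacy id -> derived storage key): it is an assumption added to keep the example self-contained, whereas the real helper takes only `(service_id, services)` and presumably obtains that mapping from the discovered plugin descriptors.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_api_service(service_id: str,
                        services: Any,
                        storage_keys: Mapping[str, str]) -> Optional[Dict[str, Any]]:
    """Shadow -> direct -> legacy lookup over the `api_services` mapping."""
    if not isinstance(services, dict):
        return None  # covers `"api_services": null` and other garbage

    # 1. Shadow: an unversioned id prefers its versioned entry.
    shadow_key = storage_keys.get(service_id)
    if shadow_key and isinstance(services.get(shadow_key), dict):
        return services[shadow_key]

    # 2. Direct: the id exists as written.
    direct = services.get(service_id)
    if isinstance(direct, dict):
        return direct

    # 3. Legacy: a storage key with no entry yet falls back to its legacy id.
    for legacy_id, key in storage_keys.items():
        if key == service_id and isinstance(services.get(legacy_id), dict):
            return services[legacy_id]
    return None
```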

Slimmed call sites:
- ConfigWriter.get_api_service_raw shrinks from ~45 lines to ~12, just
  reads flocks.json then delegates to resolve_api_service. Falls back to
  a plain dict lookup if the import fails so a versioning bug cannot
  break credential reads.
- ConfigWriter.set_api_service uses setdefault and delegates the
  shadow-warning to the new helper. Log key renamed to
  api_service.write.shadowed_legacy.
- _load_provider_yaml_metadata in server/routes/provider.py now reuses
  discover_api_service_descriptors instead of reimplementing the plugin
  directory walk. Drops ~45 lines and removes the duplicate matching
  logic between provider.py and api_versioning.py.

Cleanups:
- Drop is_legacy_shadowed (one-line wrapper around shadowed_legacy_ids
  with no production callers); tests use the batch API directly.
- Tighten _extract_version's tail return.
- Drop the verbose null-handling commentary in get_api_service_raw; a
  single isinstance check covers null / non-dict garbage.

Net diff: +125 / -185 across 6 files, 41 versioning tests + 234 directly
affected tests still pass, no lint regressions.

Made-with: Cursor
duguwanglong and others added 25 commits May 18, 2026 11:55
- skill: block disabled skills from `delegate_task(load_skills=...)` so
  the toggle in the Skills UI can no longer be bypassed via subagent
  injection. Disabled skills are now reported as "not found" to avoid
  signalling their existence to the LLM.
- device: persist `enabled=false` for storage_keys whose last device
  instance was just removed. `sync_service_tool_state` now accepts a
  `deleted_storage_keys` hint (used by `route_delete_device`) and also
  sweeps the api_services config for entries whose backing devices no
  longer exist, so historical dirty state self-heals on next sync. The
  startup `_sync_all` additionally walks api_services to pick up
  service_ids that have zero remaining DB rows. Idempotent: writes are
  skipped when the config already matches.
- agent: invalidate the per-worker agent cache when another uvicorn
  worker toggles a skill. `Agent.state` now checks the mtime of
  `skill_settings.json` before serving cached prompts, so a PATCH from
  one worker is observed across all workers without IPC.
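
The cross-worker invalidation in the last bullet relies on a file mtime check. A generic sketch of that pattern follows; the `MtimeCache` class is hypothetical and is not how `Agent.state` is actually structured.

```python
import os
from typing import Any, Callable, Optional


class MtimeCache:
    """Cache a computed value and rebuild it when a watched file changes."""

    def __init__(self, path: str, loader: Callable[[], Any]) -> None:
        self._path = path
        self._loader = loader
        self._mtime: Optional[int] = None
        self._value: Any = None
        self._loaded = False

    def _current_mtime(self) -> Optional[int]:
        try:
            return os.stat(self._path).st_mtime_ns
        except FileNotFoundError:
            return None  # a missing file is a valid, comparable state

    def get(self) -> Any:
        mtime = self._current_mtime()
        if not self._loaded or mtime != self._mtime:
            # Another process touched the file (or this is the first call).
            self._value = self._loader()
            self._mtime = mtime
            self._loaded = True
        return self._value
```

Each worker pays one `stat` call per read, which is the price for avoiding IPC between uvicorn workers.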

Co-authored-by: Cursor <cursoragent@cursor.com>
Each chat part was wrapped in its own `<div key>`, so the thinking
block's `mt-2 first:mt-0` always saw itself as the first child of its
wrapper and collapsed the top gap to zero. The result was that a tool
card followed by a thinking block (or vice versa) looked glued
together, while two consecutive thinkings or two consecutive tools
still had 8px between them — visibly uneven spacing.

Move the inter-part gap to the wrapper itself (`mt-2 first:mt-0`) and
strip the redundant `mt-2` from `ChatToolPart` (`<details>` + waiting-
for-answer branch) and the thinking block. With a single source of
truth at the wrapper level, every adjacent pair of parts now has a
uniform 8px gap and the first part still sits flush with the message
header.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ebar_and_add_shortcuts

Merge `origin/dev` into the PR 131 sidebar/shortcuts branch.  Nine files
had textual conflicts; their resolution preserves PR 131's disabled-skill
fixes and UI redesign while picking up dev's slash-command refactor and
skill page throttled refresh.

Conflict resolutions (9 files):
- flocks/command/handler.py: drop legacy 240-line if-chain; route every
  direct slash command through `run_direct_command` (dev refactor).
- flocks/command/direct.py (non-conflict, paired change): port PR 131's
  user-vs-agent visibility split to the new entry point.  `/skills`
  shows the full inventory with `[disabled]` markers when called from a
  user surface (CLI/TUI/WebUI) and stays "enabled-only" when invoked by
  the agent via the `slash_command` tool.
- flocks/session/runner.py: drop the obsolete `_build_system_prompts`
  method (now centralised in `SessionPrompt.build_system_prompts` on
  dev); migrate the PR 131 device-asset hint into a small
  `_build_device_asset_hint` helper, appended after the cached prompt
  list so live device state never pollutes the prompt cache.  Keep
  PR 131's per-turn `skill` tool description refresh in the schema
  builder so disabled skills cannot leak into the tool index.
- flocks/tool/system/skill.py: keep PR 131's enabled-only description
  refresh in the wrapper; merge dev's clarifying docstring.
- flocks/tool/system/slash_command.py: drop inlined `/skills`/`/workflows`
  handlers in favour of `run_direct_command` (dev refactor).
- webui/src/pages/Tool/index.tsx: use the narrower `refreshToolData()`
  on enabled-toggle (dev), avoiding redundant MCP refreshes.
- webui/src/pages/Session/index.tsx: restore dev's
  `readLastSelectedSessionId` effect so the existing
  `writeLastSelectedSessionId` writer has a matching reader.
- webui/src/pages/Skill/index.tsx: union of both sides — keep PR 131's
  pagination/source-filter/toggle state and lucide icons, adopt dev's
  throttled `refreshSkillsAndFetch` (already referenced downstream),
  and fix one stray `toast.error` → `showErrorToast` left over from the
  rename.
- webui/src/pages/Workflow/index.tsx: keep PR 131's toolbar-based
  refresh/create actions and `WorkflowSection` component; drop dev's
  undefined `isUserManaged` helper in favour of the still-defined
  `isBuiltin`.
- webui/src/components/common/SessionChat.tsx: keep PR 131's redesigned
  tool card (status pill + `ChevronDown` + dark code block) and absorb
  dev's `truncateToolDisplayText`/`buildToolInputSummary` helpers plus
  hover-title fallback for long inputs.

Hidden semantic conflicts (silently auto-merged but failing tests):
- webui/src/components/common/SessionChat.test.ts: update className
  assertions — PR 131 moved `max-w-2xl` from the inner bubble to the
  outer `max-w-[50%]` container, so the inner bubble now only owns
  `w-auto`/`w-full`.
- webui/src/pages/Workflow/index.tsx (a11y): wrap each `WorkflowSection`
  in `<section aria-label={title}>` so the dev-side region-role tests
  keep passing without rolling back the redesign.

Pre-existing PR 131 debt also tidied up in this commit:
- webui/src/pages/Tool/ToolDetailDrawer.test.tsx: add the missing
  `listFixtures` stub (the new fixtures effect was added to
  `Tool/index.tsx` without a matching mock).
- webui/src/pages/Skill/SkillSheet.test.tsx: change the Edit-mode
  fixture to `source: 'user'` so the editable code path is exercised.
  `source: 'project'` skills remain read-only on purpose to prevent
  the UI from overwriting repo-tracked files; the read-only branch is
  still covered by `should show name field in edit mode`.

Verification: 192/192 webui vitests pass; `tsc --noEmit` clean;
`py_compile` clean on all five touched Python modules.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Skill list: make the entire name+description+icon block clickable to
  open SkillSheet, matching the Hub catalog row pattern
- Tool list (all tabs): make name+description blocks clickable links;
  use group/name scoped hover colour per tab accent (slate/blue/purple)
- MCP / API / Local tabs: replace fixed `minmax(0,1fr)` name column with
  proportional `minmax(min, Xfr)` grid so excess width is shared across
  all columns instead of collecting as a single gap after the name;
  extract shared constant to `components/gridLayout.ts` so all three
  tabs stay in sync
- ToolTable (all-tools tab): same proportional grid treatment; name capped
  at 420 px, font reduced one step (text-xs) to suit the denser row
- SessionChat user bubble footer: wrap bubble+footer in a `flex flex-col`
  container so the footer stretches to the bubble's intrinsic width,
  keeping the timestamp on the bubble's left edge and action buttons on
  the right edge regardless of message length

Co-authored-by: Cursor <cursoragent@cursor.com>
…rtcuts

feat(webui): sidebar layout, version info, and keyboard shortcut improvements
* feat(provider): add interleaved reasoning replay across providers

Support provider-specific reasoning field replay (interleaved/thinking)
in Anthropic and OpenAI SDK paths, with catalog metadata and runner
integration for multi-turn reasoning preservation.

* fix(agent): harden agent.yaml loading against invalid configs

Validate YAML mapping shape and wrap AgentInfo construction in try/except
so malformed model configs are skipped without breaking agent scans.

* feat(session): halt cross-step tool loops and cap default step budget

Add runner-level guards for repeated exact tool calls and long same-tool
streaks, apply a default max tool step limit when agents omit steps, and
resolve message replay against the session's current model pin.
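
A small sketch of the two runner-level guards described above. The class name `ToolLoopGuard`, the thresholds, and the decision to count only consecutive repeats are all assumptions; the commit does not state the actual limits.

```python
import json
from typing import Any, Dict, Optional, Tuple


class ToolLoopGuard:
    """Flag repeated identical calls and long streaks of the same tool."""

    def __init__(self, max_identical: int = 3, max_same_tool: int = 8) -> None:
        self.max_identical = max_identical
        self.max_same_tool = max_same_tool
        self._last_signature: Optional[Tuple[str, str]] = None
        self._last_tool: Optional[str] = None
        self._identical = 0
        self._same_tool = 0

    def record(self, tool: str, args: Dict[str, Any]) -> Optional[str]:
        """Record one call; return a halt reason, or None to keep going."""
        # Sorting keys makes argument order irrelevant to the comparison.
        signature = (tool, json.dumps(args, sort_keys=True, default=str))
        self._identical = self._identical + 1 if signature == self._last_signature else 1
        self._same_tool = self._same_tool + 1 if tool == self._last_tool else 1
        self._last_signature, self._last_tool = signature, tool
        if self._identical >= self.max_identical:
            return f"{tool} called {self._identical} times with identical arguments"
        if self._same_tool >= self.max_same_tool:
            return f"{tool} called {self._same_tool} times in a row"
        return None
```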

* chore(session): raise default max tool steps to 1000

Agents without an explicit steps limit now get a 1000-step budget
instead of 100 for longer coding and research tasks.
* fix(workflow): compact large outputs and trim execution history

Persist compacted workflow outputs in storage and JSONL audit logs to
avoid bloating SQLite rows. Cap per-workflow execution history at 30
and delete matching JSONL files when pruning old records.

* fix(workflow): compact step inputs and tool result metadata

Extend history compaction to step inputs via compact_step_for_storage,
and return compacted outputs/history in run_workflow ToolResult metadata
so agent context stays bounded alongside SQLite/JSONL storage.
* Classify Python plugin tools as native by source path.

Discover python tool file origins and mark project-scoped tools native while
user ~/.flocks plugins remain non-native; update tool-builder smoke-test docs.

* Rename skill tool to skill_load and refine plugin metadata.

Split on-demand skill loading from flocks_skills management, reconcile
python plugin source/native flags from disk, move tool-builder validator
under scripts/, and update agents, compaction, and tests accordingly.
…ecycle (#298)

Extract shared ToolContext builder, support local/docker publish drivers with
health-aware service status, and make workflow/bash cancellation more reliable.
)

Unify interleaved capability inference across OpenAI-compatible and Anthropic
models, add reasoning transport resolution, remove unused Bedrock SDK, and
improve thinking-block replay in runner and provider options.
…der singleton

The workflow `llm.ask` path shares the process-wide `Provider._providers`
registry with the session/agent runner. Under concurrent load each
workflow LLM call could destabilize an in-flight session call:

* `_prepare_provider` unconditionally rewrote `provider._config` and
  forced `provider._client = None`, dropping the session's live
  httpx connection pool and silently flipping `custom_settings`
  (notably `trust_env`) that the session had set.
* `Provider._ensure_initialized` flipped `_initialized = True`
  before the registry was populated, so a concurrent caller could
  take the fast path and observe `Provider.get(...) is None` for
  built-in providers.
* `Provider.apply_config` is called on every session step and on
  every workflow `llm.ask`; it unconditionally re-`configure`d the
  provider and rebuilt `provider._config_models`, racing readers
  on the mutable list across event loops.
* The same `httpx.AsyncClient` could be inherited across event loops
  (session: uvicorn main loop, workflow: `flocks-workflow-llm-loop`),
  triggering "got Future attached to a different loop" or silent
  hangs.

Changes
-------

`flocks/workflow/llm.py`:
* Serialize `_prepare_provider` per-provider via a `threading.Lock`
  keyed by `provider_id`.
* Make reconfigure idempotent: skip `provider.configure(...)` and the
  client reset when the desired `ProviderConfig` (api_key / base_url
  / custom_settings) already matches what the provider holds.
* Only override `custom_settings['trust_env']` when the user
  explicitly set `workflow.llm.trust_env` in flocks config.
* Track the owning event loop of `provider._client` in a
  `WeakKeyDictionary` and force a client reset when the workflow
  loop id differs from the marker, even if the config is unchanged.
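
A simplified sketch of the three `_prepare_provider` changes listed above: a per-provider lock, an idempotent reconfigure, and a loop-ownership marker. `StubProvider`, `StubClient`, `lock_for`, `remember_client`, and `prepare_provider` are hypothetical stand-ins, and treating an unmarked client as foreign is an assumption of this sketch rather than confirmed behavior.

```python
import threading
import weakref
from dataclasses import dataclass, field


@dataclass
class ProviderConfig:
    api_key: str
    base_url: str
    custom_settings: dict = field(default_factory=dict)


class StubClient:
    """Stand-in for an httpx.AsyncClient (hypothetical)."""


class StubProvider:
    """Stand-in for a provider object with the private fields named above."""

    def __init__(self):
        self._config = None
        self._client = None
        self.configure_calls = 0

    def configure(self, config):
        self._config = config
        self._client = None  # reconfiguring drops the connection pool
        self.configure_calls += 1


_locks_guard = threading.Lock()
_provider_locks = {}
_client_loop = weakref.WeakKeyDictionary()  # client -> id of its owning loop


def lock_for(provider_id):
    """Return the one lock shared by all callers for this provider_id."""
    with _locks_guard:
        return _provider_locks.setdefault(provider_id, threading.Lock())


def remember_client(client, loop_id):
    _client_loop[client] = loop_id


def prepare_provider(provider_id, provider, desired, loop_id):
    with lock_for(provider_id):
        # Idempotent: only reconfigure when the desired config differs.
        if provider._config != desired:
            provider.configure(desired)
        client = provider._client
        # Assumption: a client with no marker is treated as another loop's.
        if client is not None and _client_loop.get(client) != loop_id:
            provider._client = None
    return provider
```

In real code `loop_id` would typically be `id(asyncio.get_running_loop())`; it is passed explicitly here to keep the sketch synchronous. The weak mapping lets a marker disappear together with its client, so no explicit cleanup is needed.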

`flocks/provider/provider.py`:
* `_ensure_initialized` uses double-checked locking with a
  `threading.Lock` and flips `_initialized = True` only after
  `_load_dynamic_providers()` returns, so concurrent callers
  never observe a partially populated registry.
* `apply_config` compares the desired `ProviderConfig` against the
  existing `provider._config` and skips `provider.configure(...)`
  when unchanged. The `_config_models` rebuild is gated by a
  signature-based equality check and assigned atomically.
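
The double-checked locking pattern named above, as a self-contained sketch. `LazyRegistry`, `_load`, and the provider ids are hypothetical; the `race` helper mirrors the barrier-based test described in the Tests section but is not the project's test code.

```python
import threading


class LazyRegistry:
    """Hypothetical stand-in for the Provider registry."""

    _providers = {}
    _initialized = False
    _init_lock = threading.Lock()

    @classmethod
    def _load(cls):
        # Placeholder for the real provider loading step.
        return {"builtin-a": object(), "builtin-b": object()}

    @classmethod
    def _ensure_initialized(cls):
        if cls._initialized:          # fast path, no lock taken
            return
        with cls._init_lock:
            if cls._initialized:      # re-check after acquiring the lock
                return
            cls._providers.update(cls._load())
            # Flip the flag only after the registry is fully populated.
            cls._initialized = True

    @classmethod
    def get(cls, provider_id):
        cls._ensure_initialized()
        return cls._providers.get(provider_id)


def race(n_threads=20):
    """Release n threads at once and record whether each saw a provider."""
    barrier = threading.Barrier(n_threads)
    seen = []

    def worker():
        barrier.wait()
        seen.append(LazyRegistry.get("builtin-a") is not None)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Because the flag is set last and inside the lock, a caller on the fast path can only ever observe a populated registry.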

Tests
-----

13 new tests, all passing:

* `tests/workflow/test_llm_provider_isolation.py`: idempotency,
  trust_env inheritance vs override, api_key / base_url change
  triggering reconfigure, cross-loop client reset, same-loop
  client reuse, per-provider lock identity.
* `tests/provider/test_provider_lazy_init_thread_safe.py`: races
  20 threads through `_ensure_initialized` behind a `Barrier` and
  asserts every observer sees a fully populated registry.
* `tests/provider/test_provider_apply_config_idempotent.py`:
  no-op path on unchanged config; mutation path on api_key change.

Co-authored-by: Cursor <cursoragent@cursor.com>

Keep SkyEye alarm filtering canonical on `hazard_level` while accepting the legacy `threat_level` input to avoid schema precheck failures. Refresh the affected tests and README Docker mirror examples so the branch reflects the current device plugin layout and usage guidance.
…dless (#301)

* feat(browser,web2cli): managed tab lifecycle and multi-operation CLI

Track agent-created tabs in the daemon, expose open_or_attach_tab and
managed_tabs helpers, and refuse closing unmanaged tabs by default.
Extend web2cli spec/CLI generators for multi-operation captures with
subcommands, and restart stale daemons when the IPC protocol is outdated.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(browser-use): add cdp-headless mode and BU_CDP_URL support

Document headless CDP workflow in browser-use skill, treat BU_CDP_WS and
BU_CDP_URL as explicit remote endpoints in setup/doctor, and improve
handshake errors for dedicated headless Chromium instances.

Co-authored-by: Cursor <cursoragent@cursor.com>

* docs(browser-use): clarify headless port and process lifecycle

Require dedicated remote-debugging ports, background browser startup so
the process outlives the shell, and explicit cleanup rules for task-owned
headless Chromium instances.

* fix(browser): reconnect setup for explicit CDP endpoints

Restart the daemon when setup runs with BU_CDP_WS/BU_CDP_URL while an
old daemon is still alive, document PowerShell -c quoting guidance, and
remove the background agent-browser npm auto-update from installation.

Co-authored-by: Cursor <cursoragent@cursor.com>
…olation

fix/workflow llm provider isolation

* fix(session): preserve streamed reasoning content for replay

Accumulate streamed reasoning metadata so provider-facing reasoning_content keeps the full replay payload instead of the last chunk only.

Co-authored-by: Cursor <cursoragent@cursor.com>
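
The reasoning replay fix above amounts to accumulating chunks instead of overwriting them. A minimal sketch, assuming each streamed chunk is a dict that may carry a `reasoning_content` fragment; the chunk shape and the function name `accumulate_reasoning` are hypothetical.

```python
def accumulate_reasoning(chunks):
    """Join every streamed reasoning fragment into the full replay payload."""
    parts = []
    for chunk in chunks:
        fragment = chunk.get("reasoning_content")
        if fragment:  # skip chunks that carry no reasoning text
            parts.append(fragment)
    return "".join(parts)


def last_chunk_only(chunks):
    """The buggy behavior for comparison: later chunks overwrite earlier ones."""
    payload = ""
    for chunk in chunks:
        fragment = chunk.get("reasoning_content")
        if fragment:
            payload = fragment
    return payload
```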

* fix(session): remove same-tool streak loop guard

Keep the loop guard focused on repeated identical tool calls so sessions can continue when the same tool is reused with different arguments.

Move browser daemon socket/port/pid/log paths from system temp to a
stable per-user directory, update related skills/scripts, and fix
tool hook workspace context in stream_processor.
… derivation

Introduce a first-class "device" plugin type in the Hub marketplace,
classified by `integration_type: device` in `_provider.yaml` and
installed under `~/.flocks/plugins/tools/device/`. Device plugins now
surface in the marketplace listing, get recognized during device setup,
and are uninstalled with the correct type.

Also fixes two regressions surfaced by versioned device plugins whose
own `service_id` already contains a `_v...` token (e.g. `onesig_api`
for both v2.5.3 D20260321 and D20250710):

* `storage_key_to_service_id` previously stripped trailing version
  suffixes with a greedy regex, collapsing e.g.
  `onesig_v2_5_3_D20250710_api_v2_5_3_D20250710` to `onesig`. It now
  prefers the exact `ApiServiceDescriptor` cache mapping and falls
  back to a non-greedy regex that removes only the last suffix.
* `row_to_device` recomputes `service_id` on read so historical rows
  with a corrupted column self-heal in the response.
* `device_startup` adds a one-shot migration that rewrites stale
  `device_integrations.service_id` rows.
* Device CRUD routes now derive `service_id` from the row's
  `storage_key` instead of trusting the stored column, keeping
  `sync_service_tool_state` aligned with the descriptor-aware logic.
* Frontend tool filter in the device detail panel now matches on the
  exact versioned `storage_key`, so two versions of the same product
  no longer cross-contaminate the displayed tool list.
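
The `storage_key_to_service_id` fix above can be illustrated as follows. This is a sketch under assumptions: the exact patterns, the `descriptor_cache` argument, and the shape of a version suffix are hypothetical, and an end-anchored pattern stands in here for the non-greedy regex the commit describes.

```python
import re

# Hypothetical suffix shape: _v<digits>[_<digits>...][_D<date>] at the end.
_VERSION_SUFFIX = re.compile(r"_v\d+(?:_\d+)*(?:_D\d+)?$")


def legacy_strip(storage_key):
    """Greedy behavior: everything from the first '_v' onward is removed."""
    return re.sub(r"_v.*$", "", storage_key)


def storage_key_to_service_id(storage_key, descriptor_cache=None):
    """Prefer an exact descriptor mapping, else strip only the last suffix."""
    if descriptor_cache and storage_key in descriptor_cache:
        return descriptor_cache[storage_key]
    return _VERSION_SUFFIX.sub("", storage_key, count=1)
```

With the key from the commit message, `legacy_strip` collapses to `onesig`, while the fallback removes only the trailing version token and keeps the one embedded in the name.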

Bundled `_provider.yaml` files for `onesig_v2_5_3_D20250710`,
`sangfor_af_v8_0_48` and `sangfor_af_v8_0_85` are tagged with
`integration_type: device`; their legacy `tools/api/` copies are
removed in favour of the canonical `tools/device/` layout. Tests cover
both the new plugin-type discovery path and the service_id derivation
fix.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(provider,webui): enable vision for Qwen/Kimi and refresh SessionChat UI

Mark qwen3.6-plus and kimi-k2.6 as vision-capable in the provider catalog and
align WebUI vision gating with those models. Refresh SessionChat bubble layout,
preserve partial streamed text on abort, and add regression tests.

* fix(webui): polish chat scroll, thinking indicator, and sidebar padding

Use stable scrollbar gutter in SessionChat, replace thinking dots with a
Brain icon, and align sidebar logo padding with the chat content column.
…ook-io-llm

Add deepseek-v4-flash model to both ThreatBook LLM providers (CN and IO)
with 200K context window, 128K max output, and CNY ¥1/¥2 per million tokens
pricing (input/output). Update catalog tests to assert the new model's
limits and pricing.

Co-authored-by: Cursor <cursoragent@cursor.com>
feat(catalog): add deepseek-v4-flash to threatbook-cn-llm and threatb…
feat(hub,device): add device plugin type and fix versioned service_id…
#309)

* fix(browser): improve setup flow for stale daemon and remote debugging

Restart stale local daemons during --setup, only prompt inspect on 403
handshake failures, and document the attach/reload workflow in browser-use.

* fix(webui): preserve aborted assistant output and simplify thinking indicator

Mark in-flight assistant messages as stopped on abort or session idle,
freeze running tools, and keep partial streamed text across refetch.
#313)

The fetchData useCallback no longer needs react-hooks/exhaustive-deps suppression.
@xiami762 xiami762 merged commit bfaa415 into main May 22, 2026
2 checks passed