feat(agenthub): refine digital assistant preview support #426

Merged
fslongjin merged 1 commit into TencentCloud:master from tinklone:feature/agenthub-preview-polish
Jun 2, 2026
Conversation

Collaborator

@tinklone tinklone commented Jun 2, 2026

Summary

Refine the AgentHub digital assistant preview/management experience:

CubeAPI

  • Add snapshot and rollback API routes for AgentHub instances
  • Extend handler logic for instance preview lifecycle

WebUI

  • Add preview button and sandbox-open logic in AgentHub page
  • Add i18n translations (EN/ZH) for new preview UI strings
  • Fix template store to use correct default template

Deploy

  • Add SQL migration file (003_agenthub_instances.sql) for operator manual repair scenarios
  • Update env.example with snapshot config note

Docs

  • Add preview documentation in EN/ZH digital assistant guides

Signed-off-by: maxlong <maxlong@tencent.com>


cubesandboxbot Bot commented Jun 2, 2026

Claude finished @tinklone's task in 51s


Code Review: feat(agenthub) — refine digital assistant preview support

I've reviewed the 11 changed files across CubeAPI, WebUI, Deploy, and Docs. Overall the diff is well-structured and the changes are clean. Below are my findings, organized by severity.


🔴 Medium: read_agenthub_instance returns AppError on missing agent — unhelpful 500

File: CubeAPI/src/handlers/agenthub.rs:462

let record = read_agenthub_instance(&state, &agent_id).await?;

If agent_id doesn't exist, this bubbles up as a 500 Internal Server Error. For a health-check endpoint that the frontend polls every 3 seconds from the card UI, receiving a 500 for a missing/deleted agent is unergonomic. Consider mapping the read_agenthub_instance error to a 404 for this endpoint, so the frontend can treat "not found" as "not ready" rather than an unexpected failure.
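
As a rough illustration of the frontend half of this suggestion, the sketch below maps a 404 to "not ready" while still surfacing other failures. The `GatewayHealth` and `HealthResponse` shapes and the `interpretHealthResponse` helper are hypothetical; the real `agentHubApi` client is not shown in this PR and may be structured differently.

```typescript
// Hypothetical response shapes; not taken from the PR.
interface GatewayHealth {
  ready: boolean;
}

// Minimal view of an already-parsed HTTP response.
interface HealthResponse {
  status: number;
  body?: GatewayHealth;
}

// Translate a raw response into a health value.
// 404 means the agent is missing or deleted, which the card UI can treat as "not ready".
// Any other non-2xx status is thrown so genuine failures stay visible.
function interpretHealthResponse(res: HealthResponse): GatewayHealth {
  if (res.status === 404) {
    return { ready: false };
  }
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`gateway health check failed with status ${res.status}`);
  }
  // Only an explicit `ready: true` counts as ready.
  return { ready: res.body?.ready === true };
}
```

With this split, a deleted agent simply shows as not ready, while a 500 still reaches whatever error handling the page already has.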


🟡 Minor: Gateway health check is a shallow connectivity test

File: CubeAPI/src/handlers/agenthub.rs:471-479

let ready = match state.http_client.get(url).timeout(Duration::from_secs(3)).send().await {
    Ok(resp) => resp.status().is_success(),
    Err(_) => false,
};

The health check only verifies that the proxy returns a 2xx for the sandbox route. That shows the proxy can reach the sandbox, not that the OpenClaw gateway itself is serving requests. If the gateway process inside the sandbox has crashed or is not listening, the proxy will likely return 502/503, which correctly sets ready: false. However, a 200 from a different upstream handler on the same port would give a false positive. Consider checking a specific gateway health/status path, or using a TCP connect with a timeout as a secondary signal.

Also worth noting: OPENCLAW_UI_PORT (18789) is hardcoded in the health URL. If the port is ever configurable per-instance, this check would need updating.


🟡 Minor: Hardcoded "wecom" in bots_available computation

File: CubeAPI/src/handlers/agenthub.rs:948-952

let bots_available = ["wecom"]
    .into_iter()
    .filter(|b| !record.bots.iter().any(|v| v == b))
    .map(ToString::to_string)
    .collect();

The only available bot channel is hardcoded to "wecom". If more bot channels are added in the future, this list needs to be updated here and in the filter logic. Consider driving this from configuration or a constant array at the module level to make extension easier.


🟡 Minor: Gateway health polling stops once ready, never re-checks

File: web/src/pages/AgentHub.tsx:647-677

useEffect(() => {
    // ...
    const check = async () => {
      const health = await agentHubApi.getGatewayHealth(agent.id);
      setGatewayReady(health.ready);
      if (!health.ready) {
        timer = window.setTimeout(check, 3000);
      }
    };
    setGatewayReady(false);
    void check();
    // ...
}, [agent.id, agent.sandboxId, isRunning]);

Once gatewayReady is true, the polling stops permanently. If the gateway becomes unreachable after being ready (e.g., the sandbox is paused/resumed), the user will see a stale "ready" state. The "Gateway Manage" button will be enabled but clicking it would open a broken gateway URL.

Two approaches:

  • Re-check when the "Gateway Manage" button is clicked (lazy check)
  • Periodically re-poll even when ready (e.g., every 30s) to detect degraded state
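
A minimal sketch of the second approach, assuming the health call can be reduced to a function returning a boolean. `startGatewayPolling`, `nextPollDelay`, and the injected `schedule` parameter are illustrative names, not code from the PR; the 3000 ms value comes from the quoted effect and 30000 ms is the example interval mentioned above.

```typescript
// 3000 ms matches the quoted effect; 30000 ms is the review's example value.
const NOT_READY_POLL_MS = 3000;
const READY_RECHECK_MS = 30000;

// Pick the delay before the next health check from the latest result.
function nextPollDelay(ready: boolean): number {
  return ready ? READY_RECHECK_MS : NOT_READY_POLL_MS;
}

// Timer abstraction so the sketch does not depend on browser globals.
type Schedule = (fn: () => void, ms: number) => void;

// Keep polling after the gateway becomes ready, just less often.
// Returns a cancel function, similar in role to a useEffect cleanup.
function startGatewayPolling(
  checkHealth: () => Promise<boolean>,
  onResult: (ready: boolean) => void,
  schedule: Schedule,
): () => void {
  let cancelled = false;

  const tick = async (): Promise<void> => {
    let ready = false;
    try {
      ready = await checkHealth();
    } catch {
      ready = false; // a failed request is treated as "not ready"
    }
    if (cancelled) return;
    onResult(ready);
    // Always schedule another check; only the interval changes.
    schedule(() => {
      if (!cancelled) void tick();
    }, nextPollDelay(ready));
  };

  void tick();
  return () => {
    cancelled = true;
  };
}
```

Inside the component, `schedule` could be `(fn, ms) => window.setTimeout(fn, ms)` and `onResult` could be `setGatewayReady`. Unlike the quoted effect, this sketch cancels through a flag rather than clearing the timer, so a pending timer still fires once and then does nothing.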

🟢 Suggestion: previewNoticeDismissed state resets on remount, with no persistence

File: web/src/pages/AgentHub.tsx:99

const [previewNoticeDismissed, setPreviewNoticeDismissed] = useState(false);

The dismissed state is in-memory only: navigate away and back, and the notice reappears. If the intent is to show it once per page visit, this is fine; if a permanent dismiss is desired, consider persisting it to localStorage.
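
If a permanent dismiss is wanted, one possible shape is sketched below. The storage key and both helper names are hypothetical, and the store is passed in as a small interface (which `window.localStorage` satisfies) so the helpers can be exercised outside a browser.

```typescript
// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key name; the PR does not define one.
const PREVIEW_NOTICE_KEY = "agenthub.previewNoticeDismissed";

// Read the persisted flag. Storage access can throw (for example when storage
// is disabled), in which case the notice is simply shown again.
function isPreviewNoticeDismissed(store: KeyValueStore): boolean {
  try {
    return store.getItem(PREVIEW_NOTICE_KEY) === "1";
  } catch {
    return false;
  }
}

// Persist the dismissal; failures are ignored so the UI keeps working.
function dismissPreviewNotice(store: KeyValueStore): void {
  try {
    store.setItem(PREVIEW_NOTICE_KEY, "1");
  } catch {
    // Storage unavailable: fall back to in-memory behaviour.
  }
}
```

In the page, the state could then be seeded with a lazy initializer such as `useState(() => isPreviewNoticeDismissed(window.localStorage))`, with `dismissPreviewNotice` called alongside `setPreviewNoticeDismissed(true)`.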


✅ What's done well

  1. SQL migration (003_agenthub_instances.sql): Clean schema with named lock acquisition for safe concurrent migration. Good use of UNIQUE KEY, composite indexes, and deleted_at for soft-delete support.

  2. Template store port fix (web/src/data/templateStore.ts:71): Adding 8080 to expose_ports is a correct fix — the login env UI was inaccessible without it.

  3. Cleanup in clone_agent_instance (CubeAPI/src/handlers/agenthub.rs:962): Changing bots_available from an empty vec to computed values is a meaningful improvement for the clone flow UX.

  4. Gateway URL hash construction (web/src/pages/AgentHub.tsx:79-84): Properly handles WebSocket URL generation with protocol detection (ws/wss) and token forwarding. The gatewayTokenFromUrl extraction helper is cleanly separated.

  5. i18n coverage: Both EN and ZH locales have the new preview strings, and the docs are updated in both languages too — no localization gap.

  6. Disk space documentation (docs/guide/digital-assistant.md): The disk space sizing guide (3GB per template, 2-3GB per snapshot) is useful operational documentation for anyone deploying this.
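
For readers unfamiliar with the pattern praised in item 4, here is a rough sketch of protocol detection plus token forwarding through the URL hash. It is not the PR's code: the helper names and the `gatewayUrl` and `token` hash parameter names are assumptions made for illustration.

```typescript
// Map the page protocol to the matching WebSocket scheme:
// secure pages must use wss:, plain pages use ws:.
function wsSchemeFor(pageProtocol: string): "ws:" | "wss:" {
  return pageProtocol === "https:" ? "wss:" : "ws:";
}

// Build a hash fragment carrying the WebSocket endpoint and a token.
// The parameter names `gatewayUrl` and `token` are assumed, not confirmed by the PR.
function buildGatewayHash(pageProtocol: string, host: string, token: string): string {
  const wsUrl = `${wsSchemeFor(pageProtocol)}//${host}`;
  return `#gatewayUrl=${encodeURIComponent(wsUrl)}&token=${encodeURIComponent(token)}`;
}
```

Keeping the token in the hash rather than the query string means it is not sent to the server as part of the HTTP request line, which is one plausible reason for the design.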


Summary

| Severity | Count |
| --- | --- |
| 🔴 Medium | 1 |
| 🟡 Minor | 3 |
| 🟢 Suggestion | 1 |

The changes are well-structured and follow existing code conventions. The main concerns are around the health check endpoint's error handling (500 vs 404) and the polling logic not re-checking gateway readiness once established. None are blockers, but addressing them would improve robustness in production use.

- Add snapshot/rollback API routes for AgentHub instances
- Add preview button and sandbox open logic in WebUI
- Fix template store to use correct default
- Update env.example with snapshot config note
- Add preview documentation in EN/ZH guides

Signed-off-by: maxlong <maxlong@tencent.com>
@tinklone tinklone force-pushed the feature/agenthub-preview-polish branch from edcdf80 to ef929cf on June 2, 2026 07:47

cubesandboxbot Bot commented Jun 2, 2026

Claude finished @tinklone's task in 1m 55s


Review: feat(agenthub) — refine digital assistant preview support

I've reviewed the 10 changed files across CubeAPI, WebUI, Deploy, and Docs. The changes are generally well-structured and the feature integration is clean. Below are my findings.


🔴 Fix: get_agent_gateway_health returns 400 on missing agent instead of graceful "not ready"

File: CubeAPI/src/handlers/agenthub.rs:462

let record = read_agenthub_instance(&state, &agent_id).await?;

read_agenthub_instance returns AppError::BadRequest (HTTP 400) when the agent ID doesn't exist (line 1533). For a health-check endpoint that the frontend polls from the card UI, a missing agent is not a client error — it's expected state (e.g., the agent was just deleted or hasn't been created yet). Consider:

  • Mapping the error to a 404, OR
  • Gracefully treating missing agents as {ready: false} instead of propagating the error


🟡 Medium: Gateway health polling never re-checks once ready

File: web/src/pages/AgentHub.tsx:647-677

if (!health.ready) {
  timer = window.setTimeout(check, 3000);
}

Once gatewayReady flips to true, polling stops permanently. If the sandbox is paused/resumed or the gateway process crashes after being healthy, the "Gateway Manage" button stays enabled but would open a broken URL. Consider:

  • Re-checking periodically even when ready (e.g., every 30s)
  • Lazy-checking on button click before opening
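
The lazy-check option could look roughly like the sketch below: run one fresh health check when the button is clicked and only open the gateway if it passes. `decideGatewayClick`, `handleGatewayManageClick`, and the injected callbacks are hypothetical names, not part of the PR.

```typescript
// Outcome of a "Gateway Manage" click after a fresh health check.
type ClickOutcome = { action: "open"; url: string } | { action: "wait" };

// Pure decision step, kept separate so it is easy to test.
function decideGatewayClick(ready: boolean, gatewayUrl: string): ClickOutcome {
  return ready ? { action: "open", url: gatewayUrl } : { action: "wait" };
}

// Click handler sketch: re-check health, then either open the gateway or
// tell the caller it is no longer ready so the UI can fall back to polling.
async function handleGatewayManageClick(
  checkHealth: () => Promise<boolean>,
  gatewayUrl: string,
  openUrl: (url: string) => void,
  onNotReady: () => void,
): Promise<void> {
  let ready = false;
  try {
    ready = await checkHealth();
  } catch {
    ready = false; // a failed request is treated as "not ready"
  }
  const outcome = decideGatewayClick(ready, gatewayUrl);
  if (outcome.action === "open") {
    openUrl(outcome.url);
  } else {
    onNotReady();
  }
}
```

One caveat with this approach: browsers may block a window opened after an awaited call, since it no longer counts as a direct response to the click, so it may need to be paired with the periodic re-check rather than replace it.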


🟡 Minor: Gateway health check only verifies proxy connectivity, not gateway readiness

File: CubeAPI/src/handlers/agenthub.rs:471-479

let ready = match state.http_client.get(url).timeout(Duration::from_secs(3)).send().await {
    Ok(resp) => resp.status().is_success(),
    Err(_) => false,
};

This checks the proxy can reach the sandbox on port 18789 (OpenClaw UI port). A 2xx only confirms something is listening on that port through the proxy — it doesn't verify the gateway process itself is initialized and serving requests. A 200 from a different upstream handler on the same port (e.g., a static file server) would give a false positive.

Hardcoded port: OPENCLAW_UI_PORT (18789) in the health URL path. If this port becomes configurable per-instance, this endpoint needs updating.


🟢 Suggestion: previewNoticeDismissed is session-only

File: web/src/pages/AgentHub.tsx:99

const [previewNoticeDismissed, setPreviewNoticeDismissed] = useState(false);

In-memory state resets when the page component unmounts, for example on navigation. If the intent is to show the notice once per page visit, this is fine. If a permanent dismiss is desired, persist the flag to localStorage.


✅ What's done well

  1. Gateway URL hash construction (web/src/pages/AgentHub.tsx:79-84): Clean WebSocket URL generation with proper protocol detection (ws:/wss:) and token forwarding via hash params. The gatewayTokenFromUrl extraction helper is well-factored.

  2. Template store port fix (web/src/data/templateStore.ts:71): Adding 8080 to expose_ports correctly fixes the inaccessible login env UI.

  3. Cleanup in clone_agent_instance (CubeAPI/src/handlers/agenthub.rs:962): Computing bots_available from actual state (excluding already-configured bots) is a meaningful UX improvement over the previous empty vec.

  4. i18n coverage: Both EN and ZH locales have the new preview strings, and docs are updated in both languages — no localization gap.

  5. Cancellation pattern (web/src/pages/AgentHub.tsx:653,674): The cancelled flag with proper cleanup in the useEffect return is good async hygiene for the polling effect.

  6. Disk space documentation (docs/guide/digital-assistant.md): The sizing guide (3GB per template, 2-3GB per snapshot) is practical operational documentation for deployers.


Outstanding from previous review

The previous review noted hardcoded "wecom" in bots_available computation (CubeAPI/src/handlers/agenthub.rs:948). If more bot channels are added, this list will need updating in multiple places. Consider driving it from a module-level constant or configuration in a follow-up.


Summary

| Severity | Count |
| --- | --- |
| 🔴 Fix | 1 |
| 🟡 Medium | 1 |
| 🟡 Minor | 1 |
| 🟢 Suggestion | 1 |

The changes are well-structured and follow existing conventions. The main concerns are: (1) the health check endpoint returning an error instead of gracefully handling missing agents, and (2) the polling never re-checking gateway readiness once established. Neither is a blocker, but both would improve production robustness.
Branch: feature/agenthub-preview-polish

@fslongjin fslongjin merged commit c4b9e43 into TencentCloud:master Jun 2, 2026
9 checks passed