[MODEL] Near-miss: Claude nearly confirmed "Delete Server" instead of "Delete Snapshot" in autonomous browser agent #50737

@pffs1802

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Task

Delete the "pre-rescale-cx33-2026-04-19" snapshot from a Hetzner Cloud CX23 server ("fabiconnect", ID 122271345) after a failed CX23→CX33 rescale attempt.

Setup

  • Agent: Claude Code CLI on Windows 11, operating as an autonomous browser agent
  • Tool: chrome-devtools MCP connector via Chrome DevTools Protocol (CDP), port 9222
  • Target UI: Hetzner Cloud Console (console.hetzner.cloud)
  • Mode: Autonomous; the operator (Fabian) observes but does not manually intervene in browser actions
  • Orchestration: Claude-to-Claude (Controller-Claude in web chat directs Agent-Claude in CLI)

Context

The snapshot was created as rollback insurance before attempting the server rescale. CX33 was unavailable at the Nuremberg datacenter. The server was restarted successfully, all 14+ services were verified as running, and the snapshot was to be deleted to avoid ongoing storage costs (EUR 0.73/month).

What Claude Actually Did

Step-by-step chronology

  1. Navigation (3 tool calls): Claude navigated to the server detail page and located the snapshot section. Correct so far.

  2. Wrong click (4 tool calls): Claude announced "Server is ON, snapshot exists. Deleting the snapshot now." It then opened a dropdown and clicked the wrong "Löschen" (Delete) link. The announcement was factually incorrect: the dialog that opened was "Delete Server", not "Delete Snapshot".

  3. Self-correction (2 tool calls): Claude read the DOM snapshot and recognized the dialog title said "Server löschen" (Delete Server) instead of "Snapshot löschen" (Delete Snapshot). Output: "STOP — this is the SERVER delete dialog, not the snapshot! Aborting immediately." Clicked "Abbrechen" (Cancel).

  4. Recovery (3 tool calls): Claude used JavaScript evaluate_script to inspect the DOM and positively identify the correct dropdown by contextual features (the dropdown containing a "Rebuild" option belongs to a snapshot, not a server).

  5. Correct deletion (3 tool calls + 1 file read): Found the correct "Delete Snapshot" dialog, entered the snapshot name, confirmed deletion.

  6. Verification (post-hoc): Checked three independent signals — server status indicator (green/ON), empty snapshot list ("No snapshots exist"), and deletion confirmation message.
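
The recovery in step 4 can be sketched as a filter over menu data that has already been extracted from the DOM. This is a minimal Python sketch, not the code the agent actually ran: it assumes each dropdown has been reduced to a list of its visible item labels (for example via evaluate_script). The `Dropdown` type, the `find_snapshot_dropdown` function, the element handles, and the "Power off" label are hypothetical; only the "Rebuild" marker and the "Löschen" label come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dropdown:
    """Hypothetical record for one dropdown menu extracted from the DOM."""
    element_id: str          # illustrative handle, not a real Hetzner selector
    items: tuple             # visible menu item labels


def find_snapshot_dropdown(dropdowns, marker="Rebuild"):
    """Return the single dropdown whose items include the marker label.

    Returns None when zero or several dropdowns match, so the caller
    has to stop instead of guessing.
    """
    matches = [d for d in dropdowns if marker in d.items]
    return matches[0] if len(matches) == 1 else None


menus = [
    Dropdown("menu-server", ("Power off", "Löschen")),    # assumed labels
    Dropdown("menu-snapshot", ("Rebuild", "Löschen")),
]
chosen = find_snapshot_dropdown(menus)
print(chosen.element_id if chosen else "no unique match: ask operator")
```

Returning None for both "no match" and "several matches" keeps the caller from ever receiving a best-guess element.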

Key failure point

Between clicking the wrong "Löschen" link (step 2) and recognizing the error (step 3), 4 chrome-devtools tool calls were made. Claude committed to an action ("Deleting the snapshot now") before verifying which dialog had actually opened. This is a commit-before-verify anti-pattern.

Expected Behavior

Expected: Pre-interaction dialog verification

Before interacting with any confirmation dialog during a destructive action, Claude should:

  1. Click the action trigger (e.g., "Delete" button)
  2. Take a DOM snapshot as a dedicated verification step
  3. Extract and log the dialog title verbatim
  4. Compare the dialog title to the intended action type
  5. Only if the title matches the intended action: proceed with confirmation
  6. If mismatch: abort immediately and report to operator

This adds one tool call of latency but eliminates the commit-before-verify failure mode entirely.
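
Steps 3 to 6 of the list above could look roughly like the following gate. This is a sketch under assumptions, not an existing Claude Code or chrome-devtools API: the function name `verify_dialog_title` is hypothetical, and the dialog title is assumed to have been read from a fresh DOM snapshot beforehand. The two titles are the ones quoted in this report.

```python
def verify_dialog_title(dialog_title, expected_title):
    """Compare the dialog title, as read from the DOM, to the expected one.

    Returns (ok, log_line). Only an exact match after trimming and
    case-folding counts; anything else means abort.
    """
    actual = dialog_title.strip().casefold()
    expected = expected_title.strip().casefold()
    ok = actual == expected
    verdict = "MATCH, proceed" if ok else "MISMATCH, abort and report"
    # The log line keeps the verbatim title so the operator can audit it.
    return ok, f"dialog title {dialog_title!r} vs expected {expected_title!r}: {verdict}"


# The situation from step 2: the server dialog opened instead of the snapshot one.
ok, line = verify_dialog_title("Server löschen", "Snapshot löschen")
print(line)
```

The gate deliberately uses an exact comparison rather than a substring check, since both titles in this incident share the word "löschen".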

Additionally expected: Abort-and-ask for ambiguous UI

When multiple UI elements of the same type exist for different resource scopes (e.g., "Delete" for server vs. "Delete" for snapshot), the agent should stop and ask the operator rather than proceeding with best-guess selection. The cost of a false pause is far lower than the cost of deleting the wrong resource.
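
The abort-and-ask rule can be expressed as a selection helper that refuses to choose when a label is not unique. Again a minimal sketch: `AmbiguousTargetError` and `select_unique_control` are hypothetical names, and the list of labels is assumed to have been collected from the page in an earlier step.

```python
class AmbiguousTargetError(Exception):
    """Raised when a single target cannot be picked without guessing."""


def select_unique_control(labels, wanted):
    """Return the index of the only control labelled `wanted`.

    Raises AmbiguousTargetError if there are zero or several matches,
    which is the signal to pause and ask the operator.
    """
    hits = [i for i, label in enumerate(labels) if label == wanted]
    if len(hits) != 1:
        raise AmbiguousTargetError(
            f"{len(hits)} controls labelled {wanted!r}; operator decision needed"
        )
    return hits[0]


# Two "Löschen" links on one page, as in this incident.
try:
    select_unique_control(["Löschen", "Rebuild", "Löschen"], "Löschen")
except AmbiguousTargetError as err:
    print(f"PAUSE: {err}")
```

Raising an exception instead of returning a default makes the pause the only possible outcome when the page offers more than one candidate.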

Files Affected

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Haven't tried to reproduce

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Impact

High - Significant unwanted changes

Claude Code Version

2.1.114

Platform

Anthropic API

Additional Context

No response
