fix(ssh): auto-repair stale pub that does not pair with local priv #3395
la14-1 merged 2 commits into OpenRouterTeam:main from
Conversation
…ders

When a local SSH .pub file doesn't actually pair with the corresponding .priv (e.g. .pub copied from another machine, regenerated mid-flow, or edited by hand), spawn would still register the .pub with the cloud provider's key store. The registration check passes by fingerprint, the droplet boots with that key in authorized_keys, and SSH then fails with "Permission denied (publickey)" because the local .priv can't prove ownership of the registered .pub.

This produced the silent failure mode where users saw "SSH key 'id_ed25519' already registered with DigitalOcean" immediately followed by 33 "Permission denied" retries.

Adds verifyKeyPair(), which derives the public key from the private key via `ssh-keygen -y -P "" -f priv` and compares it (key type + base64, ignoring the comment field) to the .pub file. discoverSshKeys() now filters out mismatched pairs with a clear warning naming the offending file, and silently skips passphrase-protected or otherwise unverifiable keys (BatchMode SSH can't use them anyway).

Bumps CLI to 1.0.37.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
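The comparison rule described in this commit (key type + base64, comment ignored) can be sketched as below. This is a minimal illustration, not the code in the PR: `parsePublicKeyLine` and `samePublicKey` are hypothetical names, and the sketch assumes both inputs are single-line OpenSSH public keys of the form `<type> <base64> [comment]`.

```typescript
// Hypothetical helpers; only the comparison rule (key type + base64,
// comment ignored) comes from the commit message.
interface ParsedPublicKey {
  keyType: string; // e.g. "ssh-ed25519"
  base64: string; // the encoded key blob
}

// Split a public key line into its first two fields; anything after them
// (the comment) is dropped. Returns null if the line has fewer than two fields.
function parsePublicKeyLine(line: string): ParsedPublicKey | null {
  const [keyType, base64] = line.trim().split(/\s+/);
  if (!keyType || !base64) return null;
  return { keyType, base64 };
}

// True only when both lines parse and agree on key type and key blob.
function samePublicKey(derived: string, onDisk: string): boolean {
  const a = parsePublicKeyLine(derived);
  const b = parsePublicKeyLine(onDisk);
  if (a === null || b === null) return false;
  return a.keyType === b.keyType && a.base64 === b.base64;
}
```

Here `derived` would be the output of `ssh-keygen -y` and `onDisk` the contents of the .pub file. Ignoring the comment matters because a derived key may carry a different comment (or none) than the .pub file, which would make a plain string comparison report a false mismatch.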
Confirmed in the wild: this PR fixes the exact failure reported in Slack (thread) where a user hit
Linked from Slack by SPA
Follow-up filed: #3396 to extend this from diagnose → auto-repair (rewrite the stale
The Slack user who prompted this (thread) wouldn't be unblocked by diagnosis alone; they'd still have to run
Linked from Slack by SPA
When the local .pub doesn't derive from the matching .priv (stale copy from another machine, etc.), the priv is still authoritative: any .pub that doesn't derive from it is wrong by definition.

Previously spawn printed a warning and skipped the pair; now it backs up the stale .pub as .pub.spawn-backup-<timestamp> and rewrites the .pub from the derived key. The next launch uses the correct pub end-to-end, so the droplet boots with a public key that actually pairs with the local priv and the SSH handshake succeeds instead of failing 33 times with "Permission denied (publickey)".

Passphrase-protected keys (ssh-keygen -y cannot derive without the passphrase) are still skipped silently, since there is nothing to repair with.

Bumps CLI to 1.0.38.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
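The backup-then-rewrite step described in this commit might look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `repairPub` and the `FileOps` interface are hypothetical, file access is injected so the ordering is easy to see and test, and the timestamp format is a guess (the commit only says `<timestamp>`).

```typescript
// Minimal file operations the sketch needs. A real caller could back these
// with node:fs (copyFileSync, writeFileSync); that wiring is not shown.
interface FileOps {
  copy(from: string, to: string): void;
  write(path: string, contents: string): void;
}

// Hypothetical helper: preserve the stale .pub under a timestamped name,
// then overwrite the .pub with the key derived from the private key.
// Returns the backup path so the caller can mention it in a log message.
function repairPub(
  pubPath: string,
  derivedPub: string,
  files: FileOps,
  now: Date = new Date(),
): string {
  // Assumed format: ISO 8601 with ":" and "." replaced so the name is filesystem-safe.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  const backupPath = `${pubPath}.spawn-backup-${stamp}`;

  files.copy(pubPath, backupPath); // back up first, so nothing is lost if the write fails
  files.write(pubPath, derivedPub.replace(/\s+$/, "") + "\n"); // normalize to one trailing newline
  return backupPath;
}
```

Copying before writing is the important ordering: if the rewrite fails partway, the stale contents are still recoverable from the backup file.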
Pushed commit
Ready for re-review. Updated from Slack by SPA
la14-1
left a comment
LGTM — fixes the Slack-reported hermes launch failure end-to-end. Verify + auto-repair logic is tasteful (priv is authoritative, stale pub backed up with timestamp, passphrase keys silently skipped). Tests 29/29 and biome clean.
Summary
Fixes the silent-failure mode reported in Slack where `spawn` registered a local `.pub` with DigitalOcean, the droplet booted with that key in `authorized_keys`, and SSH then failed with `Permission denied (publickey)` 33 times because the local `.priv` didn't actually pair with the registered `.pub`.

Adds two exported helpers in `shared/ssh-keys.ts`:

- `verifyKeyPair(priv, pub)`: derives the pub from the priv via `ssh-keygen -y -P "" -f <priv>` and compares key-type + base64 (ignoring comment). Returns `"match" | "mismatch" | "unverifiable"`.
- `repairPubFromPriv(priv, pub)`: on mismatch, backs up the stale `.pub` to `<pub>.spawn-backup-<timestamp>` and rewrites `.pub` from the derived key. The `.priv` is authoritative: any `.pub` that doesn't derive from it is wrong by definition, so the rewrite is safe.

`discoverSshKeys()` now runs verify-then-repair on every pair. Passphrase-protected / otherwise unverifiable keys are skipped silently; `BatchMode` SSH can't use them anyway without an active ssh-agent.

Bumps CLI to `1.0.38`.

Before / After (for the Slack user)
Before:
After:
The orphan stale pub that was previously registered with DigitalOcean stays on the account but is unused. The user can delete it from the DO dashboard if they want.
How the original failure happens
1. `~/.ssh/id_ed25519` (priv A) and `~/.ssh/id_ed25519.pub` (pub B from a different machine, e.g. copied without the matching priv).
2. `ensureSshKey()` fingerprints `.pub` (B), finds it on DO, logs "already registered."
3. `createServer()` attaches all account keys to the droplet, including B.
4. `ssh -i id_ed25519 root@droplet` → priv A presents pub A to the server, server only knows B → publickey denied.

With this PR, step 1 detects the mismatch, rewrites the local `.pub` from priv A (now correct pub A is on disk), and registration proceeds with the correct pub.

Test plan
- `bun test src/__tests__/ssh-keys.test.ts src/__tests__/ssh-keys-cov.test.ts`: 29/29 pass
- `bunx @biomejs/biome check src/`: clean (0 errors across 202 files)
- `.pub` with derived contents and preserves stale contents in a backup
- `verifyKeyPair` still returns `"match" | "mismatch" | "unverifiable"` (no signature change)

Closes #3396.