fix: SSH-tunneled connections failing to reconnect after idle/sleep #739

Merged
datlechin merged 2 commits into main from fix/ssh-tunnel-reconnect-after-idle
Apr 14, 2026

@datlechin
Collaborator

Summary

Fixes #736 — SSH-tunneled connections through AWS EC2 show password/authentication errors after being idle for days.

Root causes identified and fixed:

  • Health monitor never rebuilt SSH tunnels: reconnectDriver reused a stale effectiveConnection pointing at 127.0.0.1:<dead_port>. It now calls buildEffectiveConnection for SSH sessions to create a fresh tunnel.
  • No OS-level TCP keepalive — SSH sockets had no SO_KEEPALIVE, so AWS NAT gateway idle timeouts (350s) silently dropped connections. Added SO_KEEPALIVE + TCP_KEEPALIVE (60s idle).
  • No sleep/wake handling — Added NSWorkspace.didWakeNotification observer to proactively validate SSH tunnels on wake instead of waiting for the 30s health monitor ping.
  • App Nap throttled keepalives — Added ProcessInfo.beginActivity(.userInitiatedAllowingIdleSystemSleep) while tunnels are active.
  • Nil SSH password silently coerced to empty string — Keychain lookup failures produced a generic "authentication failed" error. Now throws a descriptive error and extracts libssh2_session_last_error detail in PasswordAuthenticator and KeyboardInteractiveAuthenticator.
  • 5-retry cap too low — Increased to 10 retries with 120s max backoff (~10 min total window).
  • No reentrancy guard — Both keepalive death and wake handler could fire handleSSHTunnelDied concurrently. Added recoveringConnectionIds: Set<UUID> guard.
  • Tunnel leak on driver connect failure — If buildEffectiveConnection succeeded but driver.connect() failed, the tunnel was orphaned. Added cleanup.
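
The retry budget in the list above can be sanity-checked with a small calculation. This is a sketch only: the PR states 10 retries and a 120s cap, but the 2s base delay, the doubling schedule, and the `BackoffPolicy` type are assumptions chosen for illustration, not taken from the actual code.

```swift
import Foundation

/// Hypothetical capped exponential backoff; not the app's real type.
struct BackoffPolicy {
    let maxRetries: Int
    let baseDelay: TimeInterval   // assumed value, not stated in the PR
    let maxDelay: TimeInterval

    /// Delay before the given zero-based attempt, doubled each time and capped.
    func delay(forAttempt attempt: Int) -> TimeInterval {
        let exponential = baseDelay * pow(2.0, Double(attempt))
        return min(exponential, maxDelay)
    }

    /// Sum of all delays, i.e. how long recovery keeps trying.
    var totalWindow: TimeInterval {
        (0..<maxRetries).reduce(0.0) { $0 + delay(forAttempt: $1) }
    }
}

let policy = BackoffPolicy(maxRetries: 10, baseDelay: 2, maxDelay: 120)
print(policy.totalWindow)
```

Under these assumptions the delays are 2, 4, 8, 16, 32 and 64 seconds, then 120 seconds four times, which sums to 606s and is in line with the roughly 10 minute window quoted above. A different base delay would shift the total.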

Changes

| File | Change |
| --- | --- |
| DatabaseManager+Health.swift | reconnectDriver rebuilds SSH tunnel; returns ReconnectResult with driver + effectiveConnection; cleans up tunnel on driver failure |
| DatabaseManager+SSH.swift | Reentrancy guard via recoveringConnectionIds; 10 retries / 120s max; disconnects the live driver, not a stale snapshot |
| DatabaseManager.swift | Added recoveringConnectionIds: Set<UUID> |
| DatabaseManager+SystemEvents.swift | New: didWakeNotification → validates SSH tunnels on wake |
| AppDelegate.swift | Wires up startObservingSystemEvents() |
| SSHTunnelManager.swift | App Nap prevention via ProcessInfo.beginActivity |
| LibSSH2TunnelFactory.swift | SO_KEEPALIVE + TCP_KEEPALIVE on SSH sockets; nil password guard |
| PasswordAuthenticator.swift | Extracts libssh2 error detail on failure |
| KeyboardInteractiveAuthenticator.swift | Extracts libssh2 error detail on failure |
| CHANGELOG.md | Added entry under [Unreleased] |
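
For reference, enabling keepalive on a socket as described for LibSSH2TunnelFactory.swift might look roughly like the following. The function name `enableKeepalive` is hypothetical, and the non-Darwin branch (`TCP_KEEPIDLE`) is included only so the sketch compiles off macOS; the PR itself names `SO_KEEPALIVE` and `TCP_KEEPALIVE` with a 60s idle time.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: turns on TCP keepalive for `fd` and sets how long the
/// connection may sit idle before the first probe. Returns true on success.
func enableKeepalive(on fd: Int32, idleSeconds: Int32) -> Bool {
    let size = socklen_t(MemoryLayout<Int32>.size)

    // Ask the kernel to send keepalive probes at all.
    var enabled: Int32 = 1
    guard setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enabled, size) == 0 else {
        return false
    }

    // Idle time before the first probe. macOS calls this TCP_KEEPALIVE,
    // Linux calls the equivalent option TCP_KEEPIDLE.
    var idle = idleSeconds
    #if canImport(Darwin)
    let idleOption = TCP_KEEPALIVE
    #else
    let idleOption = TCP_KEEPIDLE
    #endif
    return setsockopt(fd, Int32(IPPROTO_TCP), idleOption, &idle, size) == 0
}
```

A 60s idle time keeps probes well inside the 350s NAT gateway idle timeout mentioned in the summary, so the mapping should not expire while the tunnel is otherwise quiet.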

Test plan

  • Connect to a remote database via SSH tunnel (password auth) — verify connection works
  • Leave the connection idle for several minutes, then interact — verify auto-reconnect succeeds
  • Put Mac to sleep with an active SSH tunnel, wake up — verify tunnel recovers automatically
  • Test with an incorrect SSH password in Keychain — verify descriptive error message appears
  • Test with two SSH-tunneled connections, kill one tunnel — verify the other is unaffected
  • Verify local (non-SSH) connections are unaffected by all changes
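
The two-connection check in the plan depends on recovery state being tracked per connection. A minimal sketch of a guard built around `recoveringConnectionIds: Set<UUID>` is shown below; the `RecoveryGuard` wrapper and its `begin`/`end` methods are hypothetical, and the sketch assumes all calls happen on one actor or thread, as it does no locking of its own.

```swift
import Foundation

/// Hypothetical wrapper around the set named in the PR. Not thread-safe by
/// itself; assumes callers are already serialized (e.g. on the main actor).
final class RecoveryGuard {
    private var recoveringConnectionIds: Set<UUID> = []

    /// Returns true if the caller may start recovery for `id`,
    /// false if a recovery for that connection is already running.
    func begin(_ id: UUID) -> Bool {
        recoveringConnectionIds.insert(id).inserted
    }

    /// Marks recovery for `id` as finished so a later failure can retry.
    func end(_ id: UUID) {
        recoveringConnectionIds.remove(id)
    }
}
```

With this shape, a keepalive failure and a wake notification arriving for the same connection would result in a single recovery attempt, while a second connection with a different id is not blocked.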

…736)

The health monitor's reconnectDriver now rebuilds the SSH tunnel instead
of reusing a stale localhost port. Added OS-level TCP keepalive on SSH
sockets, wake-from-sleep tunnel validation, App Nap prevention, nil
password guard with descriptive errors, reentrancy guard for concurrent
recovery, and increased retry budget from 5 to 10 attempts.

Log libssh2 detail at error level but throw .authenticationFailed
instead of .tunnelCreationFailed to avoid misleading "SSH tunnel
creation failed" prefix in user-facing messages. Replace dead ?? ""
fallbacks with guard let after nil password check.
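
The nil password change described in these commits can be illustrated as follows. This is a sketch under assumptions: the `SSHAuthError` type, its `missingPassword` case, and the `requirePassword` function are hypothetical names, and only the `authenticationFailed` case name comes from the commit message.

```swift
/// Hypothetical error type; only `authenticationFailed` is named in the PR.
enum SSHAuthError: Error, Equatable {
    case missingPassword(host: String)
    case authenticationFailed
}

/// Returns the stored password, or throws instead of falling back to "".
func requirePassword(_ stored: String?, host: String) throws -> String {
    guard let password = stored else {
        // Surface the Keychain miss directly rather than letting an empty
        // password fail later as a generic authentication error.
        throw SSHAuthError.missingPassword(host: host)
    }
    return password
}
```

Compared with `stored ?? ""`, the `guard let` form makes the missing-password case a distinct failure that can be reported to the user with its own message.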
datlechin merged commit 1cc688d into main on Apr 14, 2026
2 checks passed
datlechin deleted the fix/ssh-tunnel-reconnect-after-idle branch on April 14, 2026 03:59
Development

Successfully merging this pull request may close these issues.

Bug: SSH-tunneled staging/production connections show password error after being idle for a few days
