Summary
Long-lived SSH tunnels disconnect every ~60 minutes when using temporary scoped credentials. The Session Manager plugin successfully establishes an initial StartSession, but around the one-hour mark, it attempts to reconnect/resume the data channel and fails with ResumeSession returning HTTP 403.
This is reproducible with newer session-manager-plugin versions, but not with 1.2.553.0.
Environment
- OS: macOS darwin/arm64
- AWS CLI: 2.15.30
- session-manager-plugin versions:
  - 1.2.553.0 = stable ✓
  - 1.2.792.0 = fails ✗
  - 1.2.804.0 = fails ✗
- Region: us-east-2
- Session Type: SSH tunneling / local port forwarding to RDS through EC2 over Session Manager
Impact
- Long-lived SSH tunnels disconnect every ~60 minutes
- RDS client sessions drop mid-work
- Keepalive settings on SSH do not prevent the disconnect
Exact Behavior
- Temporary AWS credentials are obtained and exported into the environment:
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_SESSION_TOKEN
  - AWS_DEFAULT_REGION
- aws ssm start-session is launched from an OpenSSH ProxyCommand (see the config sketch after this list)
- Session starts successfully and the data channel opens
- Around 60 minutes later, the plugin attempts a session resume/reconnect
- ResumeSession returns HTTP 403
- Plugin retries several times, then the websocket closes and the SSH tunnel is dropped
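For reference, the ProxyCommand wiring is of this general shape (a sketch only: the host alias, instance ID, and user below are placeholders, not values from the affected environment):

```
# ~/.ssh/config (illustrative)
Host ssm-bastion
    HostName i-0123456789abcdef0
    User ec2-user
    ServerAliveInterval 60
    ServerAliveCountMax 3
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
```

The tunnel is then opened with something like `ssh -N -L 5432:<rds-endpoint>:5432 ssm-bastion` (5432 assumes a Postgres-style RDS target; the forwarded port is immaterial to the bug).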
Minimal Logic
```sh
tempCreds = getScopedTemporaryCredentials(duration ≈ 3600s)

export AWS_ACCESS_KEY_ID=tempCreds.accessKeyId
export AWS_SECRET_ACCESS_KEY=tempCreds.secretAccessKey
export AWS_SESSION_TOKEN=tempCreds.sessionToken
export AWS_DEFAULT_REGION=us-east-2

exec aws ssm start-session \
  --region us-east-2 \
  --target <instance-id> \
  --document-name AWS-StartSSHSession \
  --parameters portNumber=22
```
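For concreteness, a runnable equivalent of the pseudocode above, assuming the scoped credentials come from `sts assume-role` (the role ARN, session name, and `jq` usage are illustrative; the real credential source differs but likewise yields ≈3600 s STS credentials):

```sh
#!/bin/sh
# Sketch only: $ROLE_ARN and the session name are hypothetical placeholders.
creds=$(aws sts assume-role \
  --role-arn "$ROLE_ARN" \
  --role-session-name ssm-tunnel \
  --duration-seconds 3600 \
  --query Credentials --output json)

export AWS_ACCESS_KEY_ID=$(printf '%s' "$creds" | jq -r .AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(printf '%s' "$creds" | jq -r .SecretAccessKey)
export AWS_SESSION_TOKEN=$(printf '%s' "$creds" | jq -r .SessionToken)
export AWS_DEFAULT_REGION=us-east-2

# Identical start-session invocation; <instance-id> left as a placeholder.
exec aws ssm start-session \
  --region us-east-2 \
  --target <instance-id> \
  --document-name AWS-StartSSHSession \
  --parameters portNumber=22
```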
Results:
- Initial session: ✓ success
- After ~60 min:
  - Plugin tries ResumeSession → HTTP 403
  - Plugin retries
  - Websocket closes
  - SSH tunnel disconnects
Log Evidence
Sanitized plugin logs show:
```
INFO Opening websocket connection ...
INFO Successfully opened websocket connection ...
INFO Connected to instance[...] on port: 22
ERROR Reach the retry limit 5 for receive messages.
ERROR Trying to reconnect the session ...
ERROR Resume Session failed: operation error SSM: ResumeSession, https response error StatusCode: 403
ERROR Failed to get token: operation error SSM: ResumeSession, https response error StatusCode: 403
ERROR Error sending stream data message websocket: close sent
```
More specific failing locations from plugin logs:
- websocketchannel.go.245
- sessionhandler.go.95
- sessionhandler.go.171
- sessionhandler.go.190
- streaming.go.315
Timing
Most recent repro:
- Session established: 2026-04-09 09:37:49 to 09:37:57
- Failure begins: 2026-04-09 10:37:50 Pacific time (almost exactly 3600 s after establishment, matching the ≈3600 s credential duration requested above)
This one-hour timing is consistent across reports.
Tests Already Done
- ✓ SSH keepalives enabled (ServerAliveInterval=60, ServerAliveCountMax=3; example invocation sketched after this list): same failure still occurs
- ✓ SSM_PLUGIN_SKIP_CLIENT_CONFIGURE=true tested: same failure still occurs
- ✓ Downgrading the plugin to 1.2.553.0 removes the issue
- ✓ Upgrading back to 1.2.792.0 / 1.2.804.0 reproduces the issue
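The first two tests were exercised with an invocation of this shape (a sketch: the ssm-bastion alias refers to the hypothetical config above, and the forwarded port and endpoint are placeholders):

```sh
# Placeholders throughout; 5432 assumes a Postgres-style RDS target.
export SSM_PLUGIN_SKIP_CLIENT_CONFIGURE=true
ssh -o ServerAliveInterval=60 \
    -o ServerAliveCountMax=3 \
    -N -L 5432:<rds-endpoint>:5432 \
    ssm-bastion
```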
Questions for AWS Clarification
- Was there a behavioral change in session-manager-plugin after 1.2.553.0 affecting reconnect/resume?
- Is ResumeSession now expected to require fresh locally available AWS credentials at reconnect time, even when the original StartSession succeeded?
- Is there a known regression in 1.2.764.0+, 1.2.792.0, or 1.2.804.0 around websocket rotation / resume?
- What exact IAM permissions and resource patterns are now required for a successful ResumeSession and ssmmessages:OpenDataChannel during reconnect?
- Is this expected when using short-lived scoped STS credentials exported via environment variables for aws ssm start-session?
Important Observation
This does not look like an initial session establishment problem. StartSession succeeds. The failure is specifically in the plugin's reconnect/resume path about one hour later.