Skip to content

Fix secrets rotation workflow and improve rotation lambda logic#615

Merged
jfrench9 merged 1 commit intomainfrom
bugfix/secrets-rotation-fixes
Apr 13, 2026
Merged

Fix secrets rotation workflow and improve rotation lambda logic#615
jfrench9 merged 1 commit intomainfrom
bugfix/secrets-rotation-fixes

Conversation

@jfrench9
Copy link
Copy Markdown
Member

Summary

This PR addresses several issues in the secrets rotation infrastructure, improving the reliability and correctness of the PostgreSQL and Valkey rotation Lambda functions, as well as the GitHub Actions workflow that orchestrates secret rotation.

Key Accomplishments

Workflow Enhancements

  • Added additional configuration or steps to the secrets rotation GitHub Actions workflow to improve the rotation pipeline's robustness and completeness.

PostgreSQL Rotation Lambda Fixes

  • Refined the PostgreSQL rotation logic with improved handling across rotation lifecycle steps. The changes include approximately 19 lines of enhanced logic, addressing edge cases and correcting behavior in the rotation state machine.
  • Improved error handling and flow control to ensure rotation steps (createSecret, setSecret, testSecret, finishSecret) execute correctly and predictably.

Valkey Rotation Lambda Fixes

  • Applied targeted fixes to the Valkey rotation Lambda to align its behavior with expected rotation patterns, improving reliability of credential cycling for Valkey/Redis-compatible cache instances.

Breaking Changes

None. These are backward-compatible bugfixes to existing rotation infrastructure. The rotation Lambdas maintain the same interface and integration points with AWS Secrets Manager.

Testing Notes

  • Verify that end-to-end secrets rotation completes successfully for both PostgreSQL and Valkey secrets by triggering a manual rotation in AWS Secrets Manager.
  • Confirm that each rotation step (create, set, test, finish) transitions correctly without errors in CloudWatch Logs.
  • Validate that applications can authenticate using the newly rotated credentials after rotation completes.
  • Test failure scenarios (e.g., unreachable database, invalid permissions) to ensure graceful error handling and appropriate logging.

Infrastructure Considerations

  • The rotation Lambda functions should be redeployed to pick up the updated logic. Ensure the deployment pipeline propagates changes to all environments where rotation is active.
  • Monitor CloudWatch Logs and Secrets Manager rotation status after deployment to confirm the fixes resolve the targeted issues.
  • The GitHub Actions workflow changes should be reviewed for any updated triggers, permissions, or dependencies that may affect the CI/CD pipeline.

🤖 Generated with Claude Code

Branch Info:

  • Source: bugfix/secrets-rotation-fixes
  • Target: main
  • Type: bugfix

Co-Authored-By: Claude noreply@anthropic.com

- Added exit statements in the secrets rotation workflow to handle timeout scenarios more effectively, ensuring the process fails gracefully.
- Improved PostgreSQL secret handling by using `psycopg2.sql.Identifier` for safe username quoting, preventing potential SQL injection vulnerabilities.
- Updated error handling in the `finish_secret` function to raise a ValueError when no AWSCURRENT version is found, ensuring clear failure states for secret rotations.
@jfrench9
Copy link
Copy Markdown
Member Author

@claude please review this PR

@claude
Copy link
Copy Markdown

claude Bot commented Apr 13, 2026

Claude finished @jfrench9's task in 1m 50s —— View job


PR Review: Fix secrets rotation workflow and improve rotation lambda logic

  • Gather context and read changed files
  • Review .github/workflows/secrets-rotation.yml
  • Review bin/lambda/postgres_rotation.py
  • Review bin/lambda/valkey_rotation.py
  • Post review findings

Overall this is a solid set of bugfixes with one genuine security improvement. All four changes are correct. Details below.


.github/workflows/secrets-rotation.yml — Timeout exit 1 (+4 lines)

Verdict: Correct fix.

The timeout branch was inside an if [trigger succeeded]; then ... else [trigger failed] ... fi block. Without exit 1, the timeout path would fall through to fi and exit 0 — silently treating a timeout as success. Adding exit 1 in all four rotation jobs is the right call. The status=timeout output was already being set; the missing exit was the only thing preventing the job from actually failing.

No issues here.


bin/lambda/postgres_rotation.py — Three fixes

Fix 1: SQL injection via sql.Identifier (lines 218–225)

This is the most important change in the PR.

# Before (vulnerable):
cursor.execute(
    "ALTER USER %s WITH PASSWORD %s",
    (psycopg2.extensions.AsIs(username), new_password),
)

# After (safe):
cursor.execute(
    sql.SQL("ALTER USER {username} WITH PASSWORD %s").format(
        username=sql.Identifier(username)
    ),
    (new_password,),
)

AsIs() inserts raw bytes with zero quoting. sql.Identifier() wraps the name in double-quotes and escapes any embedded double-quotes, making SQL injection impossible. Correct fix — and the right tool to reach for in psycopg2.

Fix 2: Wrong log key in testSecret (line 274)

# Before (always logs "postgres" since key doesn't exist):
pending_dict.get('username', 'postgres')

# After (correct):
pending_dict.get('POSTGRES_USER', 'postgres')

The secret structure uses POSTGRES_USER, not username. This was a cosmetic bug in the log statement, but it made testSecret logs useless for debugging. Fixed correctly.

Fix 3: finishSecret raises instead of warning on missing AWSCURRENT (lines 311–316)

# Before:
logger.warning(f"finishSecret: No version with AWSCURRENT found for secret {arn}")
# (execution continues → update_secret_version_stage called with RemoveFromVersionId=None → boto3 error)

# After:
raise ValueError(f"finishSecret: No AWSCURRENT version found for secret {arn}")

The original code let execution continue past the warning into update_secret_version_stage(..., RemoveFromVersionId=None), which would throw a boto3 error anyway — but only after the DB password was already changed. That left credentials in a half-rotated state (DB has new password, Secrets Manager still shows old one). Failing loudly and early allows Secrets Manager to retry the full rotation from scratch. This is the correct fix and the comment explains it well.

Minor observation (not blocking): username and new_password are assigned inside the with conn.cursor() as cursor: block (set_secret, lines 215–216) but username is referenced in the logger.info on line 227, outside that block. Python's with doesn't create a new scope so this works, but it would be marginally cleaner to assign those two variables before the with block since they come from pending_dict and don't depend on the cursor. Not worth a change on its own.

Fix this →


bin/lambda/valkey_rotation.pyfinishSecret raises instead of warning

Same pattern as the postgres fix. The warning-and-continue path would have called update_secret_version_stage with RemoveFromVersionId=None, which fails with a boto3 error anyway — but after the ElastiCache token is already updated. Failing loudly with raise ValueError is correct and consistent with the postgres change.


Summary

Change Verdict
Workflow exit 1 on timeout (×4) ✅ Correct — prevents false-positive success on timeout
sql.Identifier for username ✅ Security fix — closes SQL injection via AsIs()
Log key usernamePOSTGRES_USER ✅ Correct — key matches secret structure
finishSecret raise instead of warn ✅ Correct — prevents half-rotated state in both Lambdas

All changes are backward-compatible. The Lambdas will need redeployment to pick up the updated logic, as the PR notes. No blocking issues — this is good to merge.

Comment thread bin/lambda/postgres_rotation.py Dismissed
@jfrench9 jfrench9 merged commit 9f7c75d into main Apr 13, 2026
11 checks passed
@jfrench9 jfrench9 deleted the bugfix/secrets-rotation-fixes branch April 13, 2026 23:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants