Prior Search
What happened?
When `wf_tf_deploy` fails in a way that leaves a Terraform state lock in DynamoDB, the `force-unlock` fail-hook cannot release it. The hook dies during `pf config get` because SOPS decryption of `*.secrets.yaml` fails while looking for an AWS profile that exists only on developer laptops.
`force-unlock.sh` Step 3 calls `pf config get --directory $TF_APPLY_DIR`, which makes terragrunt evaluate `locals`, which in turn pulls in SOPS-encrypted YAML. SOPS reads `sops.aws_profile` from the file's metadata header (set by whichever developer last encrypted the file, e.g. `production-superuser`) and looks for it in the runner's `$AWS_CONFIG_FILE`, which contains only `[profile ci]`. SOPS aborts before it ever talks to KMS, so the KMS ARNs in the error are a red herring; the actionable line is `could not load AWS config: failed to get shared config | profile, <name>`.
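To see which files would trip this before the hook runs, a check along these lines may help. It is a rough sketch, not part of `pf`: the function name `missing_sops_profiles` is hypothetical, and it assumes the profile is recorded as a plain `aws_profile:` key somewhere in each file's SOPS metadata.

```shell
# Hypothetical diagnostic (not a pf command): print every aws_profile named in
# *.secrets.yaml under directory $1 that has no matching section in the AWS
# config file $2.
missing_sops_profiles() {
  root="$1"
  config="$2"
  # Assumption: the profile appears as "aws_profile: <name>" in the metadata.
  grep -rhoE --include='*.secrets.yaml' 'aws_profile:[[:space:]]*[^[:space:]]+' "$root" \
    | sed -E 's/aws_profile:[[:space:]]*//' \
    | tr -d "\"'" \
    | sort -u \
    | while read -r profile; do
        # Skip empty values such as aws_profile: ""
        [ -n "$profile" ] || continue
        # AWS config files name the default profile [default], others [profile <name>].
        if [ "$profile" = "default" ]; then
          section="[default]"
        else
          section="[profile ${profile}]"
        fi
        if ! grep -qF "$section" "$config"; then
          echo "$profile"
        fi
      done
}
```

Run from the repo root as `missing_sops_profiles . "$AWS_CONFIG_FILE"`; any name it prints is a profile SOPS would fail to load on the runner.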
`deploy.sh` solves this with `pf wf sops-set-profile . ci` at Step 5, which rewrites `sops.aws_profile` in every `*.secrets.yaml` to `ci` before any terragrunt evaluation. The equivalent step is missing from `force-unlock.sh`.
This is a regression from `dcfa7211` (bash→TS CLI refactor). Pre-refactor, `force-unlock.sh` Step 3 called `pf-get-terragrunt-variables`, which did not trigger SOPS evaluation. Its replacement, `pf config get`, does. `deploy.sh` already had the `sops-set-profile` step pre-refactor (renamed in place from `pf-sops-set-profile`); `force-unlock.sh` was never given an equivalent step.
The fix is one line: insert `pf wf sops-set-profile . ci` between Step 2 (the `AWS_CONFIG_FILE` write) and Step 3 (`pf config get`).
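The intended ordering could look roughly like this. The function wrapper and its name `force_unlock_read_config` are hypothetical and only there to show the sequence; the two `pf` invocations are the ones quoted in this issue, and the real script's surrounding steps are not reproduced.

```shell
# Sketch only: the function name and structure are assumptions about
# force-unlock.sh; the two pf commands are taken from this issue.
force_unlock_read_config() {
  # Step 2 has already written $AWS_CONFIG_FILE, which defines only [profile ci].

  # New step: point every *.secrets.yaml at the ci profile before terragrunt
  # evaluates anything, mirroring deploy.sh Step 5.
  pf wf sops-set-profile . ci || return 1

  # Step 3 (unchanged): triggers terragrunt evaluation and therefore SOPS
  # decryption, which now resolves the ci profile.
  pf config get --directory "$TF_APPLY_DIR"
}
```

Failing fast on the `sops-set-profile` step keeps a rewrite problem from being masked by the later decrypt error.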
Steps to Reproduce
- Set up `wf_tf_deploy` in a consumer repo whose `*.secrets.yaml` files were last encrypted by a developer (so `sops.aws_profile` in the file metadata is something other than `ci`, e.g. `production-superuser`).
- Submit a workflow that will fail in a way that leaves a TF state lock, e.g. a `terragrunt apply --all` whose underlying module is broken or where vault is unreachable mid-apply.
- Observe the `force-unlock` fail-hook firing automatically.
- Watch it abort during `pf config get` with a SOPS / KMS error, leaving the lock in DynamoDB.
Relevant log output
arn:aws:kms:us-west-2:<account>:key/mrk-<key-id>||<dev-profile-name>: FAILED
- | could not load AWS config: failed to get shared config
| profile, <dev-profile-name>
Error: could not decrypt sops file
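
Since the profile name is the only actionable part of this output, a small helper like the one below can pull it out of a saved hook log. This is a sketch under the assumption that the log contains the `profile, <name>` text shown above; the function name `missing_profile_from_log` is hypothetical.

```shell
# Hypothetical helper: print the profile name(s) SOPS failed to load,
# given a file containing the fail-hook output.
missing_profile_from_log() {
  # Match the "profile, <name>" fragment and keep only the name.
  sed -nE 's/.*profile, ([^[:space:]]+).*/\1/p' "$1" | sort -u
}
```

If it prints anything other than `ci`, the corresponding `*.secrets.yaml` files still carry a developer profile.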