When switching between staging and production, the cert file persists on the volume (path is per-domain) but the ACME account only exists under the previous server's directory (path is per-server). This caused 'certbot renew' to silently do nothing while no new account was created. Check for ACME account existence (matching current staging/production mode) instead of just cert file existence. When the account is missing, fall through to obtain_certificate which registers a new ACME account.
Problem
When switching CERTBOT_STAGING between true and false (for example, from production to staging), the dstack-ingress container enters a crash loop and HTTPS becomes completely unavailable.

Root Cause
Certificate files and ACME account files are stored under different directory structures in /etc/letsencrypt/:
- Cert files are stored by domain name (same path regardless of staging/production).
- ACME account files are stored by ACME server (different path for staging vs production).
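
The mismatch between the two layouts can be reproduced in a temporary directory. The sketch below is illustrative only: the `live/<domain>/fullchain.pem` and `accounts/<server>/directory/<id>` paths follow certbot's usual layout and the server hostnames are Let's Encrypt's, but none of them are quoted in this PR, and every function name, the account id, and the domain are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed ACME server hostnames (Let's Encrypt); not quoted in this PR.
PRODUCTION_SERVER = "acme-v02.api.letsencrypt.org"
STAGING_SERVER = "acme-staging-v02.api.letsencrypt.org"


def cert_path(root: Path, domain: str) -> Path:
    """Per-domain path: identical for staging and production."""
    return root / "live" / domain / "fullchain.pem"


def account_dir(root: Path, staging: bool) -> Path:
    """Per-server path: differs between staging and production."""
    server = STAGING_SERVER if staging else PRODUCTION_SERVER
    return root / "accounts" / server / "directory"


def has_account(root: Path, staging: bool) -> bool:
    """True if at least one account directory exists for the selected server."""
    directory = account_dir(root, staging)
    return directory.is_dir() and any(p.is_dir() for p in directory.iterdir())


def simulate_production_volume(root: Path, domain: str) -> None:
    """Create the files a production run would leave on the volume."""
    cert = cert_path(root, domain)
    cert.parent.mkdir(parents=True)
    cert.write_text("dummy certificate")
    account = account_dir(root, staging=False) / "0123abcd"  # hypothetical id
    account.mkdir(parents=True)
    (account / "regr.json").write_text("{}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for /etc/letsencrypt
    simulate_production_volume(root, "example.com")
    cert_found = cert_path(root, "example.com").exists()
    production_account = has_account(root, staging=False)
    staging_account = has_account(root, staging=True)

# The cert is visible in either mode, but only the production account exists.
print(cert_found, production_account, staging_account)
```

After a production run, the cert lookup succeeds in both modes while the staging account lookup fails, which is the state the volume is in right after the switch.
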
When switching from production to staging (or vice versa), the cert-data Docker volume still has the old cert file, but the ACME account only exists under the previous server's directory. This triggers the following failure chain:
1. certificate_exists() returns True (cert file is per-domain, still exists)
2. Takes the renew path instead of obtain (skips account registration)
3. certbot renew --staging exits 0 but does nothing useful (no matching renewal config for staging server)
4. generate-evidences.sh looks for ACME account file under the staging path → not found → exits 1
5. entrypoint.sh has set -e → script exits → nginx never starts → container crash loops

Fix
1. Check ACME account existence before deciding renew vs obtain
Add acme_account_exists() that checks if an account file exists for the current staging/production mode. The auto action now requires both cert file AND matching ACME account to take the renew path. When the account is missing, it falls through to obtain_certificate() which registers a new ACME account.

2. Fix false success reporting in renew_certificate()
certbot renew exits 0 even when "No renewals were attempted". Previously the code treated all exit-0 as successful renewal. Now it checks stdout for "No renewals were attempted" before reporting success, and correctly returns (True, False) instead of (True, True).

3. Remove unreachable dead code
The "No renewals were attempted" check after the if/else block was dead code (unreachable after return statements). Removed.

Test plan
- CERTBOT_STAGING=true on a fresh volume → should obtain cert via staging
- CERTBOT_STAGING=true → should detect missing staging account and re-obtain
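
The two scenarios above, along with the renew reporting fix, can be checked against a small model of the decision logic. This is a minimal sketch, not the code in this PR: `choose_action` and `interpret_renew` are hypothetical names, the tuple is assumed to mean (success, renewed), and the `(False, False)` result for a non-zero exit is an assumption as well.

```python
from typing import Tuple

NO_RENEWALS_MARKER = "No renewals were attempted"


def choose_action(cert_exists: bool, account_exists: bool) -> str:
    """Pick "renew" only when both the cert and a matching account exist."""
    return "renew" if cert_exists and account_exists else "obtain"


def interpret_renew(exit_code: int, stdout: str) -> Tuple[bool, bool]:
    """Map a certbot renew result to an assumed (success, renewed) tuple."""
    if exit_code != 0:
        return (False, False)  # assumption: failure handling is not shown in the PR
    if NO_RENEWALS_MARKER in stdout:
        return (True, False)  # exit 0, but nothing was actually renewed
    return (True, True)


# Fresh volume: no cert and no account, so a certificate is obtained.
fresh_volume = choose_action(cert_exists=False, account_exists=False)

# Volume switched to staging: cert present, staging account missing.
switched_volume = choose_action(cert_exists=True, account_exists=False)

# Steady state: both present, so renewal is attempted.
steady_state = choose_action(cert_exists=True, account_exists=True)

print(fresh_volume, switched_volume, steady_state)
print(interpret_renew(0, "No renewals were attempted."))
```

Requiring both conditions for the renew path means a stale cert file alone can no longer suppress account registration.
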