fix(webhook): 3min readiness timeout for trycloudflare DNS propagation #172
Merged
bobakemamian merged 6 commits into main on Apr 21, 2026
Observed a timeout against a fresh quick tunnel: cloudflared registered the edge connection at +3s, but the ephemeral *.trycloudflare.com subdomain didn't resolve through the local DNS resolver in time, hitting our 60s waitForReady cap. Bumped the cap to 3 minutes, which covers the slowest quick-tunnel DNS propagation we've seen without hanging indefinitely on a genuinely broken tunnel.
Also improved the error hint: when the readiness check fails, the message now points at `buttons webhook setup` as the fix. Named tunnels skip the DNS-propagation wait because the hostname already exists in the user's own CF zone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed in live use: a user ran `buttons webhook setup` and entered their apex domain (autono.co) at the Huh prompt. Setup routed the tunnel to the root, silently overriding the existing DNS records at autono.co and breaking their website, MX, etc., because cloudflared's `tunnel route dns` overwrites whatever was there.
validateWebhookHostname now rejects apex-shaped hostnames (a best-effort dot-count heuristic with a ccTLD-style public-suffix list) and points the user at the safe subdomain form. The check runs on both the interactive Huh prompt and the --hostname flag, so scripted installs get the same guardrail.
Opt-out: --allow-apex, for users who genuinely own a bare domain solely for webhook routing. It logs a prominent WARNING when used.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
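A hedged sketch of the apex heuristic described above. The label-counting approach and the ccTLD-style suffix list come from the commit; the `multiLabelSuffixes` contents, the function signature, and the error text are hypothetical. A production check could use the full Public Suffix List instead of a hand-picked one.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// multiLabelSuffixes is a tiny, hypothetical stand-in for a ccTLD-style
// public-suffix list; a real list would be far longer.
var multiLabelSuffixes = []string{"co.uk", "org.uk", "com.au", "co.jp", "com.br"}

// validateWebhookHostname rejects apex-shaped hostnames unless allowApex is set.
// A hostname is "apex" if it has exactly one label in front of its public suffix.
func validateWebhookHostname(host string, allowApex bool) error {
	host = strings.TrimSuffix(strings.ToLower(strings.TrimSpace(host)), ".")
	labels := strings.Split(host, ".")
	if host == "" || len(labels) < 2 {
		return fmt.Errorf("%q is not a fully qualified hostname", host)
	}
	suffixLabels := 1 // plain TLD such as .co or .com
	for _, s := range multiLabelSuffixes {
		if strings.HasSuffix(host, "."+s) {
			suffixLabels = strings.Count(s, ".") + 1
			break
		}
	}
	if len(labels) <= suffixLabels+1 {
		if !allowApex {
			return fmt.Errorf("%q looks like an apex domain; routing a tunnel there overwrites its existing DNS records. "+
				"Use a subdomain such as webhook.%s, or pass --allow-apex", host, host)
		}
		log.Printf("WARNING: routing tunnel to apex %q; existing DNS records there may be overwritten", host)
	}
	return nil
}
```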
Two fixes observed in live use on branch fix/tunnel-dns-propagation:
(1) DNS conflict. routeDNS was swallowing cloudflared's "already exists" response as idempotent success. That also passed silently on the "record exists and points at something else" path, leaving the user's tunnel config half-wired while their real DNS stayed untouched. It now returns DNSConflictError with a clear remediation: delete the existing CF record, or re-run with --overwrite-dns. cmd/webhook.go maps this to a DNS_CONFLICT error code so agents can handle it.
(2) cloudflared flag ordering. `cloudflared tunnel run --url X --no-autoupdate NAME` fails in cloudflared 2026.x because --no-autoupdate is a top-level flag and is rejected on the `run` subcommand (cloudflared prints its help text and exits 2). Moved --no-autoupdate to the top level: `cloudflared --no-autoupdate tunnel run --url X NAME`. Applied the same fix to startQuick for parity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses cloudflared bug #1295, where the CF API silently rewrites a route target when the hostname's base domain isn't owned by the zone in cert.pem: it appends the authorized zone as a suffix. Setup then reports "Connected" with a silently wrong hostname, and nothing works.
Observed in live use: cert.pem was bound to autonosquads.dev; the user asked for webhook.autono.co; cloudflared created webhook.autono.co.autonosquads.dev without any loud error. See: cloudflare/cloudflared#1295
routeDNS now parses cloudflared's success line ("<HOST> is (already|now) configured to route to your tunnel") and returns a ZoneMismatchError when <HOST> != the requested hostname. RunSetup auto-recovers by deleting cert.pem and re-running Login once, then retries routeDNS. The NoAutoRelogin + ForceLogin flags guard against an infinite loop.
CLI surface:
- `webhook setup --re-login` explicitly forces fresh auth (lets the user pick a different CF account up front).
- When a ZoneMismatch reaches the CLI (auto-retry disabled, or the retry also mismatched), it emits a ZONE_MISMATCH error code with the requested vs. effective hostname and a hint to re-run with --re-login.
Reference: documentation on this behavior is nonexistent. The append logic is server-side in the CF API, cert.pem binds one zoneID, and there's no `cloudflared` subcommand to inspect authorized zones. We detect the mismatch via the success-line hostname rather than parsing cert.pem, so the check stays robust if CF changes the cert format.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
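One way to implement the success-line check, assuming cloudflared prints the line in the form quoted above (the exact wording may differ between cloudflared versions). `checkRouteOutput`, `routeSuccessRe`, and the ZoneMismatchError fields are illustrative names, not the project's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// routeSuccessRe captures <HOST> from "<HOST> is (already|now) configured to
// route to your tunnel"; log prefixes before the hostname are skipped by \S+.
var routeSuccessRe = regexp.MustCompile(`(\S+) is (?:already|now) configured to route to your tunnel`)

// ZoneMismatchError: the hostname cloudflared routed differs from the one requested.
type ZoneMismatchError struct {
	Requested, Effective string
}

func (e *ZoneMismatchError) Error() string {
	return fmt.Sprintf("requested %s but Cloudflare routed %s (see cloudflare/cloudflared#1295)", e.Requested, e.Effective)
}

// checkRouteOutput compares the routed hostname in cloudflared's output with
// the requested one, case-insensitively and ignoring a trailing dot.
func checkRouteOutput(requested, output string) error {
	m := routeSuccessRe.FindStringSubmatch(output)
	if m == nil {
		return fmt.Errorf("no route confirmation found in cloudflared output")
	}
	effective := strings.TrimSuffix(strings.ToLower(m[1]), ".")
	want := strings.TrimSuffix(strings.ToLower(requested), ".")
	if effective != want {
		return &ZoneMismatchError{Requested: requested, Effective: m[1]}
	}
	return nil
}
```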
Adds a CLI-first alternative to cloudflared's cert.pem-based OAuth
flow: users create a scoped API token in the CF dashboard and pass
it to setup. Everything that cloudflared would have done against
cert.pem (list accounts, resolve zone, create tunnel, mint
credentials, create DNS) now runs against the CF REST API. The
listener stays on cloudflared but in --token mode, so no cert.pem
is ever required at runtime.
Why this path:
- cert.pem binds one zoneID; switching accounts requires browser
re-auth. #1295 zone-drift bug lives here.
- An API token can be authorized on multiple zones and multiple
accounts at once. Multi-zone users and headless CI cases work.
- Token permissions are user-scoped + revocable; cleaner security
model than a per-machine cert.
Usage:
buttons webhook setup \
--hostname webhook.example.com \
--api-token $CF_TOKEN \
[--api-account-id <id>] # only needed for multi-account
Or via env: BUTTONS_CF_API_TOKEN=... buttons webhook setup --hostname ...
(env keeps the secret out of argv/process listings.)
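A small sketch of token resolution. The flag and variable names come from the usage above; the precedence (flag before environment) and the `resolveAPIToken` helper are assumptions. An empty result would fall through to the cert.pem path, as described below.

```go
package main

import "os"

// resolveAPIToken prefers --api-token and falls back to BUTTONS_CF_API_TOKEN.
// getenv is injectable for testing; nil means os.Getenv.
// The flag-wins precedence is an assumption, not confirmed behavior.
func resolveAPIToken(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	if getenv == nil {
		getenv = os.Getenv
	}
	return getenv("BUTTONS_CF_API_TOKEN")
}
```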
Required token permissions (CF dashboard → My Profile → API Tokens):
Account — Cloudflare Tunnel — Edit
Zone — DNS — Edit (on target zones)
Implementation:
internal/webhook/cfapi.go: CFAPIClient with exactly the five
operations we need: ListAccounts, FindZoneForHostname,
CreateOrFindTunnel, TunnelToken, CreateDNSRecord. Everything
else stays with cloudflared.
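FindZoneForHostname has to map a hostname onto one of the zones the token can see. A plausible approach, sketched here as a hypothetical `pickZone` helper over an already-fetched zone list, is a longest-suffix match so that a nested zone like dev.example.com wins over example.com. The actual resolution logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Zone is a pared-down view of a Cloudflare zone (illustrative type).
type Zone struct {
	ID, Name string
}

// pickZone returns the zone whose name is the longest suffix of hostname.
// Matching on "."+name prevents notexample.com from matching example.com.
func pickZone(hostname string, zones []Zone) (Zone, error) {
	host := strings.TrimSuffix(strings.ToLower(hostname), ".")
	var best Zone
	for _, z := range zones {
		name := strings.ToLower(z.Name)
		if (host == name || strings.HasSuffix(host, "."+name)) && len(name) > len(best.Name) {
			best = z
		}
	}
	if best.ID == "" {
		return Zone{}, fmt.Errorf("no zone visible to this API token contains %s", hostname)
	}
	return best, nil
}
```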
SetupOpts.APIToken / APIAccountID trigger runSetupViaAPI; empty
string falls through to the cert.pem path unchanged.
Config gains TunnelToken field. When set, startNamed runs
`cloudflared tunnel run --url LOCAL --token TOKEN`. When empty
(cert.pem path), falls back to `cloudflared tunnel run --url
LOCAL NAME` as before.
CreateDNSRecord does its own pre-check: idempotent if the existing
CNAME already points at <tunnel>.cfargotunnel.com, returns
DNSConflictError otherwise (same type the cloudflared path
returns, so CLI error-handling is shared). Honors OverwriteDNS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
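A hedged sketch of the CreateDNSRecord pre-check as pure decision logic, separated from the HTTP calls. `planDNS`, `dnsRecord`, and the action constants are hypothetical; the create / no-op / overwrite / conflict outcomes follow the description in the commit above.

```go
package main

import (
	"fmt"
	"strings"
)

// DNSConflictError mirrors the error type shared with the cloudflared path.
type DNSConflictError struct{ Hostname, Existing string }

func (e *DNSConflictError) Error() string {
	return fmt.Sprintf("%s already has a DNS record (%s); delete it or re-run with --overwrite-dns", e.Hostname, e.Existing)
}

type dnsAction int

const (
	actionCreate    dnsAction = iota // no record yet
	actionNoop                       // already a CNAME to our tunnel
	actionOverwrite                  // conflicting record, OverwriteDNS set
)

// dnsRecord is the existing record for the hostname (nil if none).
type dnsRecord struct{ Type, Content string }

func planDNS(hostname, tunnelID string, existing *dnsRecord, overwrite bool) (dnsAction, error) {
	target := tunnelID + ".cfargotunnel.com"
	switch {
	case existing == nil:
		return actionCreate, nil
	case strings.EqualFold(existing.Type, "CNAME") &&
		strings.EqualFold(strings.TrimSuffix(existing.Content, "."), target):
		return actionNoop, nil
	case overwrite:
		return actionOverwrite, nil
	default:
		return 0, &DNSConflictError{Hostname: hostname, Existing: existing.Type + " " + existing.Content}
	}
}
```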
User feedback: deleting cert.pem on zone-mismatch was sketchy even
though we never read its contents. Removed the entire cert-touching
surface so ~/.cloudflared stays 100% owned by the user.
Removed:
- resetCloudflaredCert() — no longer exists; we don't rm their cert
- SetupOpts.ForceLogin, NoAutoRelogin
- --re-login CLI flag
- Auto-recover-on-ZoneMismatch path that used to delete cert.pem
Kept:
- HasCloudflaredCert() — still checks existence to decide whether
to run `cloudflared tunnel login`. Doesn't read the file.
- cert.pem-based flow for users who already have `cloudflared
tunnel login` done and want the browser OAuth path.
New primary path: --api-token / BUTTONS_CF_API_TOKEN. Users create a
scoped token in the CF dashboard (Account:Cloudflare Tunnel:Edit +
Zone:DNS:Edit), paste it in. Multi-zone capable, headless-friendly,
revocable. This matches how other Cloudflare CLI tools (wrangler,
etc.) handle auth.
ZoneMismatchError remediation now points at --api-token as the fix
instead of "--re-login and re-pick accounts" (that required deleting
their cert).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed failure: cloudflared registered the edge connection in 3s, but the ephemeral *.trycloudflare.com subdomain didn't resolve through the user's local DNS within 60s, which triggered our waitForReady timeout and killed the tunnel. Bumped the timeout to 3min. The error message now points at `buttons webhook setup` as the fix, since named tunnels skip the propagation wait.