fix(webhook): 3min readiness timeout for trycloudflare DNS propagation#172

Merged: bobakemamian merged 6 commits into main from fix/tunnel-dns-propagation on Apr 21, 2026
@bobakemamian
Contributor

Observed failure: cloudflared registered the edge connection in 3s, but the ephemeral *.trycloudflare.com subdomain didn't resolve through the user's local DNS within 60s. That tripped our waitForReady timeout and the tunnel was killed. Bumped the timeout to 3min. The error message now points at `buttons webhook setup` as the fix, since named tunnels skip the propagation wait.

bobakemamian and others added 6 commits April 20, 2026 20:54
Observed timeout against a fresh quick-tunnel: cloudflared registered
the edge connection at +3s, but the ephemeral *.trycloudflare.com
subdomain didn't resolve through the local DNS resolver in time,
hitting our 60s waitForReady cap. Bumped to 3 minutes — covers the
slowest quick-tunnel DNS we've seen without silently hanging on a
truly broken tunnel.

Also improved the error hint: when the readiness check fails, the
message now points at `buttons webhook setup` as the fix — named
tunnels skip the DNS-propagation wait because the hostname already
exists in the user's own CF zone.
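
The readiness wait described above can be sketched as a polling loop with a hard deadline. This is a minimal illustration, not the project's code: the signature of `waitForReady`, the `resolvesViaLocalDNS` helper, and the `readinessTimeout` constant are all assumptions; only the 3-minute cap comes from the commit message.

```go
package main

import (
	"net"
	"time"
)

// readinessTimeout mirrors the 3-minute cap described in the commit message.
const readinessTimeout = 3 * time.Minute

// resolvesViaLocalDNS reports whether host resolves through the system
// resolver. Hypothetical helper; the real check may also probe HTTP.
func resolvesViaLocalDNS(host string) bool {
	addrs, err := net.LookupHost(host)
	return err == nil && len(addrs) > 0
}

// waitForReady polls resolves(host) every interval until it returns true or
// the timeout would be exceeded. It reports whether the host became ready.
// The signature is an assumption made for this sketch.
func waitForReady(host string, timeout, interval time.Duration, resolves func(string) bool) bool {
	deadline := time.Now().Add(timeout)
	for {
		if resolves(host) {
			return true
		}
		// Give up instead of sleeping past the deadline, so a truly
		// broken tunnel fails within the cap rather than hanging.
		if time.Now().Add(interval).After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}
```

A caller would use something like `waitForReady(host, readinessTimeout, 2*time.Second, resolvesViaLocalDNS)`; injecting the resolver keeps the loop testable without real DNS.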

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed in live use: user ran `buttons webhook setup` and entered
their apex (autono.co) at the Huh prompt. Setup routed the tunnel to
the root, silently overriding every existing DNS record at autono.co
— breaking their website, MX, etc. — because cloudflared's `tunnel
route dns` overwrites whatever was there.

validateWebhookHostname now rejects apex-shaped hostnames (best-effort
dot-count heuristic with a ccTLD-style PSL list) and points the user
at the safe subdomain form. The check runs on both the interactive
Huh prompt and the --hostname flag, so scripted installs get the same
guardrail.

Opt-out: --allow-apex when a user genuinely owns a bare domain solely
for webhook routing. Logs a prominent WARNING when used.
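
A dot-count apex heuristic of the kind described above might look like the following sketch. The names `looksLikeApex`, `checkHostname`, and `twoLabelSuffixes` are hypothetical, and the suffix table is a tiny illustrative stand-in for the ccTLD-style list the commit mentions, not the project's actual list.

```go
package main

import (
	"fmt"
	"strings"
)

// twoLabelSuffixes is a deliberately small, illustrative stand-in for a
// public-suffix list; a real check would need many more entries.
var twoLabelSuffixes = map[string]bool{
	"co.uk": true, "com.au": true, "co.jp": true, "com.br": true,
}

// looksLikeApex reports whether hostname appears to be a bare registrable
// domain ("autono.co") rather than a subdomain ("webhook.autono.co").
func looksLikeApex(hostname string) bool {
	h := strings.ToLower(strings.TrimSuffix(strings.TrimSpace(hostname), "."))
	labels := strings.Split(h, ".")
	if len(labels) < 2 {
		return false // single labels are left to other validation
	}
	suffixLen := 1
	if twoLabelSuffixes[strings.Join(labels[len(labels)-2:], ".")] {
		suffixLen = 2
	}
	// An apex is exactly one label in front of the public suffix.
	return len(labels) == suffixLen+1
}

// checkHostname rejects apex-shaped hostnames unless allowApex is set,
// mirroring the --allow-apex opt-out. Hypothetical wrapper.
func checkHostname(hostname string, allowApex bool) error {
	if looksLikeApex(hostname) && !allowApex {
		return fmt.Errorf("%q looks like an apex domain; use a subdomain such as webhook.%s", hostname, hostname)
	}
	return nil
}
```

Because the heuristic is best-effort, a suffix missing from the table would let an apex through as a false negative, which is one reason to treat the list as a tuning point.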

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes for problems observed in live use on branch fix/tunnel-dns-propagation:

(1) DNS conflict. routeDNS was swallowing cloudflared's "already
exists" response as idempotent success. That silently passed on the
"record exists and points at someone else" path — leaving the user's
tunnel config half-wired while their real DNS stayed untouched. Now
returns DNSConflictError with a clear remediation: delete the
existing CF record OR re-run with --overwrite-dns. cmd/webhook.go
maps this to a DNS_CONFLICT error code so agents can handle it.
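
One way to map a typed error to a machine-readable code, as described above, is `errors.As`. This is a sketch under assumptions: the `Hostname` field, the `errorCode` and `withContext` helpers, and the `UNKNOWN` fallback code are all hypothetical; only the `DNSConflictError` name and the `DNS_CONFLICT` code come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// DNSConflictError signals that a DNS record already exists and points
// somewhere other than our tunnel. The field layout is an assumption.
type DNSConflictError struct {
	Hostname string
}

func (e *DNSConflictError) Error() string {
	return fmt.Sprintf("DNS record for %s already exists and points elsewhere; "+
		"delete the existing record or re-run with --overwrite-dns", e.Hostname)
}

// withContext wraps err with %w so callers can still detect the typed error.
func withContext(op string, err error) error {
	return fmt.Errorf("%s: %w", op, err)
}

// errorCode maps an error chain to a stable code for agents to act on.
// "UNKNOWN" is a placeholder fallback, not a code from the project.
func errorCode(err error) string {
	if err == nil {
		return ""
	}
	var conflict *DNSConflictError
	if errors.As(err, &conflict) {
		return "DNS_CONFLICT"
	}
	return "UNKNOWN"
}
```

Using `errors.As` rather than a direct type assertion means the code survives intermediate layers wrapping the error with extra context.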

(2) cloudflared flag ordering. `cloudflared tunnel run --url X
--no-autoupdate NAME` fails in cloudflared 2026.x because
--no-autoupdate is a top-level flag and is rejected on the `run`
subcommand (cloudflared responds with its help dump and exits 2).
Moved --no-autoupdate to the top level: `cloudflared --no-autoupdate
tunnel run --url X NAME`. Same fix applied to startQuick for parity.
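
The corrected ordering can be captured in small argument builders so the global flag always precedes the subcommand. The function names here are hypothetical; the argument order follows the commands quoted above, and the quick-tunnel form assumes the usual `cloudflared tunnel --url X` invocation.

```go
package main

// namedTunnelArgs builds: cloudflared --no-autoupdate tunnel run --url X NAME.
// --no-autoupdate is placed before "tunnel" because it is a top-level flag.
func namedTunnelArgs(localURL, name string) []string {
	return []string{"--no-autoupdate", "tunnel", "run", "--url", localURL, name}
}

// quickTunnelArgs applies the same ordering to the quick-tunnel form
// (assumed here to be: cloudflared --no-autoupdate tunnel --url X).
func quickTunnelArgs(localURL string) []string {
	return []string{"--no-autoupdate", "tunnel", "--url", localURL}
}
```

The resulting slice would be passed to `exec.Command("cloudflared", args...)`; keeping the builders pure makes the ordering easy to assert in a unit test.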

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses cloudflared bug #1295 where the CF API silently rewrites a
route target when the hostname's base domain isn't owned by the zone
in cert.pem — it appends the authorized zone as a suffix. Setup then
reports "Connected" with a silently-wrong hostname and nothing works.

Observed in live use: cert.pem was bound to autonosquads.dev; user
asked for webhook.autono.co; cloudflared created
webhook.autono.co.autonosquads.dev without any loud error. See:
cloudflare/cloudflared#1295

Now routeDNS parses cloudflared's success line ("<HOST> is [already|
now] configured to route to your tunnel") and returns a
ZoneMismatchError when <HOST> != requested. RunSetup auto-recovers
by deleting cert.pem and re-running Login once, then retries
routeDNS. Infinite-loop guarded via NoAutoRelogin + ForceLogin flags.

CLI surface:
- `webhook setup --re-login` explicitly forces fresh auth (lets user
  pick a different CF account up front).
- On ZoneMismatch surfaced to the CLI (auto-retry disabled or second
  retry also mismatched), emits ZONE_MISMATCH error code with the
  requested-vs-effective hostname and a hint to re-run with --re-login.

Reference: docs on this behavior are nonexistent — the append logic
is server-side in the CF API, cert.pem binds one zoneID, and there's
no `cloudflared` subcommand to inspect authorized zones. We detect
via the success-line hostname rather than parsing cert.pem so the
check stays robust if CF changes cert format.
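
Detection via the success line could be sketched as below. The regular expression is built from the wording quoted in this commit message and is not verified against every cloudflared release; the `Requested`/`Effective` fields, the `checkRoutedHostname` helper, and the error returned when no success line is found are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// routedLine matches "<HOST> is already|now configured to route to your tunnel".
var routedLine = regexp.MustCompile(`(\S+) is (?:already|now) configured to route to your tunnel`)

// ZoneMismatchError reports that the route was created for a different
// hostname than the one requested. Field names are assumptions.
type ZoneMismatchError struct {
	Requested string
	Effective string
}

func (e *ZoneMismatchError) Error() string {
	return fmt.Sprintf("requested %s but the route was created for %s", e.Requested, e.Effective)
}

// checkRoutedHostname compares the hostname in cloudflared's success line
// with the requested one and returns a *ZoneMismatchError if they differ.
func checkRoutedHostname(output, requested string) error {
	m := routedLine.FindStringSubmatch(output)
	if m == nil {
		return fmt.Errorf("no routing confirmation found in cloudflared output")
	}
	effective := strings.ToLower(strings.TrimSuffix(m[1], "."))
	if effective != strings.ToLower(requested) {
		return &ZoneMismatchError{Requested: requested, Effective: effective}
	}
	return nil
}
```

With the hostnames from the incident above, a success line naming `webhook.autono.co.autonosquads.dev` against a request for `webhook.autono.co` yields a `ZoneMismatchError` carrying both values.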

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a CLI-first alternative to cloudflared's cert.pem-based OAuth
flow: users create a scoped API token in the CF dashboard and pass
it to setup. Everything that cloudflared would have done against
cert.pem (list accounts, resolve zone, create tunnel, mint
credentials, create DNS) now runs against the CF REST API. The
listener stays on cloudflared but in --token mode, so no cert.pem
is ever required at runtime.

Why this path:
  - cert.pem binds one zoneID; switching accounts requires browser
    re-auth. #1295 zone-drift bug lives here.
  - An API token can be authorized on multiple zones and multiple
    accounts at once. Multi-zone users and headless CI cases work.
  - Token permissions are user-scoped + revocable; cleaner security
    model than a per-machine cert.

Usage:
  buttons webhook setup \
    --hostname webhook.example.com \
    --api-token $CF_TOKEN \
    [--api-account-id <id>]          # only needed for multi-account

Or via env: BUTTONS_CF_API_TOKEN=... buttons webhook setup --hostname ...
(env keeps the secret out of argv/process listings.)
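
Token lookup across the flag and the environment variable might be resolved as in this sketch. The flag-over-env precedence and both function names are assumptions; only the `BUTTONS_CF_API_TOKEN` variable name comes from the text above.

```go
package main

import "os"

// tokenEnvVar is the environment variable named in the commit message.
const tokenEnvVar = "BUTTONS_CF_API_TOKEN"

// resolveAPITokenFrom returns the flag value if set, otherwise the value
// from getenv. An empty result means "no token". Precedence is assumed.
func resolveAPITokenFrom(flagValue string, getenv func(string) string) string {
	if flagValue != "" {
		return flagValue
	}
	return getenv(tokenEnvVar)
}

// resolveAPIToken is the production wrapper around the real environment.
func resolveAPIToken(flagValue string) string {
	return resolveAPITokenFrom(flagValue, os.Getenv)
}
```

An empty return value corresponds to falling through to the cert.pem path.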

Required token permissions (CF dashboard → My Profile → API Tokens):
  Account — Cloudflare Tunnel — Edit
  Zone    — DNS               — Edit  (on target zones)

Implementation:
  internal/webhook/cfapi.go: CFAPIClient with exactly the five
    operations we need: ListAccounts, FindZoneForHostname,
    CreateOrFindTunnel, TunnelToken, CreateDNSRecord. Everything
    else stays with cloudflared.

  SetupOpts.APIToken / APIAccountID trigger runSetupViaAPI; empty
    string falls through to the cert.pem path unchanged.

  Config gains TunnelToken field. When set, startNamed runs
    `cloudflared tunnel run --url LOCAL --token TOKEN`. When empty
    (cert.pem path), falls back to `cloudflared tunnel run --url
    LOCAL NAME` as before.

  CreateDNSRecord does its own pre-check: idempotent if the existing
    CNAME already points at <tunnel>.cfargotunnel.com, returns
    DNSConflictError otherwise (same type the cloudflared path
    returns, so CLI error-handling is shared). Honors OverwriteDNS.
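
The CreateDNSRecord pre-check described above reduces to a small decision function. This sketch is not the project's implementation: `planDNSRecord`, `existingRecord`, the `dnsAction` values, and the `DNSConflictError` fields are hypothetical, and the Cloudflare API calls that fetch the existing record and apply the result are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// DNSConflictError mirrors the shared error type; fields are assumptions.
type DNSConflictError struct {
	Hostname       string
	ExistingTarget string
}

func (e *DNSConflictError) Error() string {
	return fmt.Sprintf("%s already points at %s", e.Hostname, e.ExistingTarget)
}

type dnsAction int

const (
	actionCreate    dnsAction = iota // no record yet: create the CNAME
	actionNone                       // already correct: nothing to do
	actionOverwrite                  // conflicting record, overwrite requested
)

// existingRecord is a minimal view of a DNS record found for the hostname.
type existingRecord struct {
	Type    string
	Content string
}

// planDNSRecord decides what to do given the record (if any) that already
// exists for hostname. A nil existing means no record was found.
func planDNSRecord(hostname, tunnelID string, existing *existingRecord, overwrite bool) (dnsAction, error) {
	want := tunnelID + ".cfargotunnel.com"
	if existing == nil {
		return actionCreate, nil
	}
	target := strings.TrimSuffix(existing.Content, ".")
	if existing.Type == "CNAME" && strings.EqualFold(target, want) {
		return actionNone, nil // idempotent: already routed to this tunnel
	}
	if overwrite {
		return actionOverwrite, nil
	}
	return actionNone, &DNSConflictError{Hostname: hostname, ExistingTarget: existing.Content}
}
```

Separating the decision from the HTTP calls keeps the idempotent, conflict, and overwrite branches testable without a Cloudflare account.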

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback: deleting cert.pem on zone-mismatch was sketchy even
though we never read its contents. Removed the entire cert-touching
surface so ~/.cloudflared stays 100% owned by the user.

Removed:
  - resetCloudflaredCert() — no longer exists; we don't rm their cert
  - SetupOpts.ForceLogin, NoAutoRelogin
  - --re-login CLI flag
  - Auto-recover-on-ZoneMismatch path that used to delete cert.pem

Kept:
  - HasCloudflaredCert() — still checks existence to decide whether
    to run `cloudflared tunnel login`. Doesn't read the file.
  - cert.pem-based flow for users who already have `cloudflared
    tunnel login` done and want the browser OAuth path.

New primary path: --api-token / BUTTONS_CF_API_TOKEN. Users create a
scoped token in the CF dashboard (Account:Cloudflare Tunnel:Edit +
Zone:DNS:Edit), paste it in. Multi-zone capable, headless-friendly,
revocable. Other Cloudflare CLI tools such as wrangler support
API-token auth in the same way.

ZoneMismatchError remediation now points at --api-token as the fix
instead of "--re-login and re-pick accounts" (that required deleting
their cert).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bobakemamian bobakemamian merged commit d2fb029 into main Apr 21, 2026
16 checks passed
@bobakemamian bobakemamian deleted the fix/tunnel-dns-propagation branch April 21, 2026 06:28