Expose the Keystone API via Gateway API in the Quick Start using Envoy Gateway and nip.io #265

@berendt

Category: enhancement | Scope: Medium

Description: Take the optional spec.gateway HTTPRoute sub-reconciler landed in CC-0065 (#238, commit 0b2c7d9) all the way through the Quick Start, so that a fresh kind cluster ends with the Keystone API reachable at a fixed, externally-looking hostname — https://keystone.127-0-0-1.nip.io/v3 — instead of the current kubectl port-forward svc/keystone-api -n openstack 5000:5000 workaround (docs/quick-start.md:498-546). Using the public nip.io wildcard resolver means any *.127-0-0-1.nip.io label resolves to 127.0.0.1 over regular DNS with no /etc/hosts editing, no MetalLB, and no CoreDNS customisation — the hostname is stable, identical on every developer's machine, and round-trips cleanly through the Keystone CRD's status.endpoint derivation (CC-0065: "Derive status.endpoint from spec.gateway.hostname as https://{hostname}/v3 when gateway is set"). Today hack/deploy-infra.sh:707-729 installs the Gateway API standard-install.yaml CRDs but no controller, no GatewayClass, and no Gateway, so spec.gateway is effectively a prod-only code path the Quick Start never exercises. The HTTPRoute E2E (tests/e2e/keystone/httproute/) compensates by applying the stub CRD from 00-httproute-crd.yaml and patching Accepted=True onto the parent status manually — a fresh onboarding user has no such shortcut.
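As a rough illustration of the hostname round trip described above, the fragment below shows only the gateway-related fields of a Keystone CR. The field names and values are the ones quoted in this issue; the rest of the CR is omitted, and the status block is what the operator would be expected to report after reconciliation, not something to apply by hand.

```yaml
# Fragment of a Keystone CR. apiVersion, kind, metadata and all other
# spec fields are intentionally omitted.
spec:
  gateway:
    parentRef:
      name: openstack-gw                 # Gateway in the same namespace (openstack)
    hostname: keystone.127-0-0-1.nip.io  # resolves to 127.0.0.1 via nip.io
    path: /
# Expected to be reported by the operator (derived as https://{hostname}/v3);
# not part of the applied manifest.
status:
  endpoint: https://keystone.127-0-0-1.nip.io/v3
```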

Concrete steps:

  (1) Add a kind-only Envoy Gateway install. This matches the envoy-gateway-system namespace already referenced by operators/keystone/api/v1alpha1/keystone_webhook_test.go:1083,1537, the network-policy E2E (tests/e2e/keystone/network-policy/00-keystone-cr.yaml:38), and the CRD reference example at docs/reference/keystone-crd.md:476. Deliver it as a HelmRepository + HelmRelease under deploy/kind/base/ (new envoy-gateway.yaml) so Flux reconciles it the same way it reconciles cert-manager / MariaDB-operator / memcached-operator (deploy/flux-system/releases/*.yaml); keep the production overlay deploy/flux-system/ untouched, the same kind-only posture as Headlamp (deploy/kind/base/headlamp.yaml) and the OpenBao UI flip (CC-0082).
  (2) Ship a GatewayClass/envoy + Gateway/openstack-gw with a single HTTPS listener on :443, hostname keystone.127-0-0-1.nip.io, TLS terminated with a self-signed certificate issued by the existing selfsigned-cluster-issuer (already set up by the OpenBao chain). The Gateway lives in namespace openstack, not envoy-gateway-system, so Keystone's spec.gateway.parentRef does not need a ReferenceGrant; the operator explicitly does not manage ReferenceGrant, see keystone_types.go:348-354. Use an EnvoyProxy CR to switch the proxy Service to NodePort with a fixed nodePort so kind can route it without MetalLB. Name the Gateway openstack-gw to line up with the E2E fixture at tests/e2e/keystone/httproute/01-keystone-cr.yaml:40.
  (3) Extend hack/kind-config.yaml with extraPortMappings mapping host 443 → the fixed Envoy NodePort so https://keystone.127-0-0-1.nip.io/v3 (which resolves to 127.0.0.1 via the public nip.io wildcard) reaches the Envoy proxy on the kind container.
  (4) Add a Step 2b to hack/deploy-infra.sh, after the Gateway API CRDs install, that waits for the new Envoy Gateway HelmRelease to become Ready and for the Gateway/openstack-gw resource to report Programmed=True; extend the existing Step 4 HelmRelease wait list to include envoy-gateway.
  (5) Rewrite Quick Start Step 7 (docs/quick-start.md:374-414) so the sample Keystone CR includes spec.gateway with parentRef.name: openstack-gw, hostname: keystone.127-0-0-1.nip.io, path: /.
  (6) Rewrite the ## Access Keystone from your local machine section (docs/quick-start.md:498-546) to drop the port-forward and document: export OS_AUTH_URL=https://keystone.127-0-0-1.nip.io/v3, decide whether to trust the self-signed CA or set OS_INSECURE=true, then openstack token issue works without any terminal-blocking port-forward and without any local DNS / hosts-file editing.
  (7) Add an ## Accept the self-signed certificate subsection explaining how to extract the self-signed CA from the cert-manager selfsigned-cluster-issuer and add it to the local trust store, for users who do not want -k / OS_INSECURE.
  (8) Update the docs/reference/infrastructure/e2e-deployment.md:104-108 diagram to insert the Envoy Gateway + Gateway/openstack-gw install between Step 2a (CRDs) and Step 3 (base overlay).
  (9) Add a new E2E suite under tests/e2e/keystone/gateway-quick-start/ (or extend httproute/) that deploys against a real Envoy Gateway on kind and asserts that HTTPRoute.status.parents[0].conditions[type=Accepted]=True arrives from the real controller, not the simulated patch step, so the Quick Start path is covered by CI.
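A minimal sketch of what deploy/kind/base/openstack-gateway.yaml from step (2) might look like. The Certificate and Secret name (keystone-gateway-tls) is hypothetical, and the issuerRef assumes selfsigned-cluster-issuer is a cert-manager ClusterIssuer. The EnvoyProxy CR that switches the proxy Service to NodePort is left out here.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: keystone-gateway-tls          # hypothetical name
  namespace: openstack
spec:
  secretName: keystone-gateway-tls    # hypothetical Secret name, referenced by the listener below
  dnsNames:
    - keystone.127-0-0-1.nip.io
  issuerRef:
    name: selfsigned-cluster-issuer
    kind: ClusterIssuer               # assumption: the issuer is cluster-scoped
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: openstack-gw
  namespace: openstack                # same namespace as the Keystone CR, so no ReferenceGrant
spec:
  gatewayClassName: envoy
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: keystone.127-0-0-1.nip.io
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: keystone-gateway-tls
      allowedRoutes:
        namespaces:
          from: Same                  # only routes from namespace openstack attach
```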

Motivation: CC-0065 (#238) made spec.gateway a first-class feature but kept the Quick Start on the port-forward path. That leaves three concrete gaps for the on-ramp: (a) contributors who read the CRD reference (docs/reference/keystone-crd.md:590-615) and decide to try spec.gateway in kind will set it, watch HTTPRouteReady=GatewayAPINotInstalled stick, and have no documented way to install a Gateway controller on the kind cluster — the operator reports GatewayAPINotInstalled as a terminal state if the HTTPRoute CRD is present but no Gateway controller has claimed the parent, exactly the kind-cluster state today; (b) the status.endpoint field, which the operator derives as https://{hostname}/v3 when spec.gateway is set, is a hollow promise in the Quick Start — it prints a URL that nothing resolves; (c) the current port-forward flow forces a second terminal window and then silently breaks the moment openstack CLI tries to resolve catalog endpoints, because the catalog contains cluster-internal URLs (docs/quick-start.md:541-545) — a real Gateway listener fixes both problems at once for identity-scope commands and sets up the story for follow-on OpenStack services landing under the same Gateway. Envoy Gateway is the natural choice because the project already assumes that namespace (envoy-gateway-system) in tests and reference docs, and because the upstream Envoy Gateway Helm chart supports EnvoyProxy-based NodePort exposure out of the box — no MetalLB dependency. nip.io is chosen over /etc/hosts entries because it needs zero local configuration: every developer hits the exact same hostname, CI gets the same hostname, the Quick Start has no "add this line to your hosts file as root" caveat, and the Keystone CR fixtures are byte-for-byte reproducible across machines. 
Shape and scope match CC-0082 (OpenBao UI, kind-only Flux release + Quick Start section) and CC-0086/#257 (flux-operator Web UI, kind-only Flux release + Quick Start section): one kind-only manifest, one Quick Start rewrite, one E2E suite extension; production overlay untouched.

Affected Areas:

  • deploy/kind/base/envoy-gateway.yaml (new — HelmRepository + HelmRelease for the upstream envoy-gateway chart, pinned; EnvoyProxy CR switching the proxy Service to NodePort with a fixed nodePort so extraPortMappings can target it)
  • deploy/kind/base/openstack-gateway.yaml (new — GatewayClass/envoy (controllerName gateway.envoyproxy.io/gatewayclass-controller), Gateway/openstack-gw in namespace openstack with an HTTPS listener on :443 for hostname keystone.127-0-0-1.nip.io, tls.mode: Terminate, certificateRefs pointing at a Certificate issued by selfsigned-cluster-issuer)
  • deploy/kind/base/kustomization.yaml (add the two new manifests to resources alongside headlamp.yaml; production overlay deploy/flux-system/kustomization.yaml stays unchanged)
  • deploy/flux-system/sources/ (new envoy-gateway-charts.yaml HelmRepository if the chart is not already reachable via c5c3-charts)
  • hack/kind-config.yaml (add extraPortMappings for host 443 → container NodePort; the current file has an empty nodes: list so this is a single additive block)
  • hack/deploy-infra.sh (new Step 2b that waits for HelmRelease/envoy-gateway Ready and Gateway/openstack-gw Programmed=True; extend the existing Step 4 HelmRelease wait list to include envoy-gateway)
  • docs/quick-start.md — multiple edits: (a) Step 3 "What happens" table (:103-116) gets a Step 2b row for Envoy Gateway + Gateway install; (b) Step 7 sample CR (:380-410) gains spec.gateway; (c) the ## Access Keystone from your local machine section (:498-546) is rewritten to use https://keystone.127-0-0-1.nip.io/v3 directly — no /etc/hosts step, no port-forward; (d) a new subsection documents how to trust / bypass the self-signed TLS cert; (e) a short one-liner explains how nip.io works (public wildcard DNS that resolves *.127-0-0-1.nip.io to 127.0.0.1, so no local DNS changes are required — the hostname is the same everywhere); (f) the architecture snapshot (:141-155) lists envoy-gateway-system envoy-gateway-* Ready alongside the existing controllers
  • docs/reference/infrastructure/e2e-deployment.md (insert a Gateway-install block between the existing Step 2a and Step 3 at :104-108)
  • docs/reference/keystone-crd.md:590-615 (add a kind-specific note pointing at the new Quick Start section and clarifying that status.endpoint = https://{hostname}/v3 now actually resolves on a Quick Start cluster)
  • tests/e2e/keystone/gateway-quick-start/ (new Chainsaw suite that deploys a Keystone CR with spec.gateway against the real Envoy Gateway and asserts real Accepted=True from the controller, not a simulated patch)
  • renovate.json (pin the Envoy Gateway chart version with a customManagers entry mirroring the FLUX_OPERATOR_VERSION pattern introduced by CC-0085, so chart bumps auto-PR)
  • Nothing in operators/keystone/: spec.gateway, the webhook, the reconciler, and HTTPRouteReady are already in place from CC-0065
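For the hack/kind-config.yaml entry above, the additive block could look roughly like the following. The containerPort value 30443 is a hypothetical placeholder for whatever fixed nodePort the EnvoyProxy CR ends up using; the two must match.

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30443          # hypothetical fixed NodePort of the Envoy proxy Service
        hostPort: 443                 # port that https://keystone.127-0-0-1.nip.io/v3 hits on the host
        listenAddress: "127.0.0.1"    # optional: bind on loopback only
        protocol: TCP
```

Port mappings are applied when the kind node container is created, so an existing cluster would need to be recreated to pick this up.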

Acceptance Criteria:

  • deploy/kind/base/envoy-gateway.yaml ships a HelmRelease/envoy-gateway in envoy-gateway-system that reaches Ready=True during make deploy-infra within the existing HELMRELEASE_TIMEOUT window
  • deploy/kind/base/openstack-gateway.yaml ships a GatewayClass/envoy and a Gateway/openstack-gw in namespace openstack; on a fresh make deploy-infra run, kubectl get gateway openstack-gw -n openstack -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}' returns True
  • hack/kind-config.yaml exposes the Envoy proxy NodePort on host 443 via extraPortMappings so https://keystone.127-0-0-1.nip.io/v3 is reachable on the developer's machine with no further port-forward
  • The Quick Start Step 7 sample CR sets spec.gateway with parentRef.name: openstack-gw, hostname: keystone.127-0-0-1.nip.io, path: /
  • status.endpoint on the Quick Start CR reports https://keystone.127-0-0-1.nip.io/v3 after reconciliation, matching spec.gateway.hostname
  • curl -k https://keystone.127-0-0-1.nip.io/v3 returns HTTP 200 with a {"version": {"id": "v3", ...}} JSON body on a freshly deployed kind cluster — no /etc/hosts edit required, no kubectl port-forward running
  • With OS_AUTH_URL=https://keystone.127-0-0-1.nip.io/v3, openstack token issue succeeds — no kubectl port-forward, no DNS / hosts-file editing
  • docs/quick-start.md ## Access Keystone from your local machine section is rewritten and no longer references kubectl port-forward svc/keystone-api; the self-signed TLS handling (trust the CA or pass -k / OS_INSECURE) is documented inline; a short note explains the nip.io wildcard so readers understand why no local DNS config is needed
  • docs/reference/infrastructure/e2e-deployment.md diagram (:104-108 block) shows the Gateway install immediately after the Gateway API CRDs step
  • tests/e2e/keystone/gateway-quick-start/ asserts HTTPRoute.status.parents[0].conditions[?(@.type=="Accepted")].status=True arrives from the real Envoy Gateway controller within the Chainsaw timeout envelope, without any manual status patching
  • renovate.json has a customManagers entry that pins the Envoy Gateway chart version with minimumReleaseAge: "3 days" and disabled major bumps, same pattern as FLUX_OPERATOR_VERSION
  • deploy/flux-system/kustomization.yaml, deploy/flux-system/fluxinstance.yaml, and deploy/flux-system/releases/* are not modified — production overlay posture is unchanged
  • The existing kubectl port-forward svc/keystone-api path still works for users on networks that block nip.io resolution; it is moved into a "Fallback" subsection but not deleted
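The real-controller criterion for tests/e2e/keystone/gateway-quick-start/ could be expressed as a Chainsaw assertion along these lines. This is only a sketch: the test name, step name, and HTTPRoute name (keystone) are hypothetical, and the step that applies the Keystone CR is omitted.

```yaml
apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
  name: gateway-quick-start           # hypothetical test name
spec:
  steps:
    - name: httproute-accepted-by-real-controller
      try:
        # No status patch step: the condition has to be written by Envoy Gateway itself.
        - assert:
            resource:
              apiVersion: gateway.networking.k8s.io/v1
              kind: HTTPRoute
              metadata:
                name: keystone        # hypothetical; use the name the operator generates
                namespace: openstack
              status:
                # JMESPath filter: select the Accepted condition of the first parent
                (parents[0].conditions[?type == 'Accepted']):
                  - status: "True"
```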

Non-Goals:

  • Installing Envoy Gateway (or any Gateway controller) in the production deploy/flux-system/ overlay. The operator stays platform-agnostic — customers pick their own Gateway implementation, exactly as the CRD reference already documents (docs/reference/keystone-crd.md:316-318: "The Gateway and GatewayClass are infrastructure concerns managed outside the operator"). This feature is strictly a kind-overlay demo convenience, same posture as CC-0082 (OpenBao UI) and CC-0086 (Flux Web UI).
  • Replacing the port-forward path for the rest of the Quick Start (Headlamp, OpenBao UI). Those stay on kubectl port-forward because they serve cluster-internal HTTP on non-443 ports; exposing them through the Gateway would require per-service HTTPRoute objects that do not match their current lifecycle.
  • Cross-namespace parentRef support. Gateway/openstack-gw lives in the same openstack namespace as the Keystone CR to avoid ReferenceGrant — the operator explicitly does not manage ReferenceGrant (keystone_types.go:348-354: "Cross-namespace references require a ReferenceGrant in the target namespace (out of scope for this operator)").
  • Bringing up MetalLB in kind. Envoy Gateway's EnvoyProxy CR supports NodePort directly, which is the lighter-weight path and matches kind's single-node posture.
  • Switching the E2E httproute suite from its simulated parent-status patch (tests/e2e/keystone/httproute/chainsaw-test.yaml:99-100) to the real controller. That suite exists to exercise the operator reconciler in isolation — a new suite under gateway-quick-start/ covers the real-controller path instead.
  • Editing /etc/hosts or shipping CoreDNS rewrite rules. The explicit design choice is public-wildcard DNS (nip.io) so the Quick Start has zero host-side DNS configuration. If nip.io is unreachable on a developer's network, the retained port-forward fallback covers that case.
  • Wiring OIDC / SSO / real CAs on the kind Gateway. The Quick Start cluster is single-user localhost-only; self-signed is correct, matching the OpenBao UI's posture (docs/quick-start.md:242-245: "Your browser will warn that the certificate is not trusted — this is expected for a kind cluster").

Labels: enhancement
