GEP-1897: TLS from Gateway to Backend for ingress

  • Issue: #1897
  • Status: Provisional

TLDR

This document specifically addresses the topic of conveying HTTPS from the Gateway dataplane to the backend (backend TLS termination), and intends to satisfy the single use case “As a client implementation of Gateway API, I need to know how to connect to a backend pod that has its own certificate”. TLS configuration can be a nebulous topic, so in order to drive resolution this GEP focuses only on this single piece of functionality.

Immediate Goals

  1. The solution must satisfy the following use case: the backend pod has its own certificate and the gateway implementation client needs to know how to connect to the backend pod. (Use case #4 in Gateway API TLS Use Cases)
  2. In terms of the Gateway API personas, only the application developer persona applies in this solution. The application developer should control the gateway to backend TLS settings, not the cluster operator, as requiring a cluster operator to manage certificate renewals and revocations would be extremely cumbersome.
  3. The solution should consider client certificate settings used in the TLS handshake from Gateway to backend, such as TLS versions and cipher suites. (Use case #5 in Gateway API TLS Use Cases)

Longer Term Goals

These are worthy goals, but may need a different GEP for proper attention.

  1. TCPRoute use cases (completed by GA)
  2. Mutual TLS use cases
  3. Service mesh use cases

Non-Goals

These are worthy goals, but will not be covered by this GEP.

  1. Changes to the existing mechanisms for edge or passthrough TLS termination
  2. Providing a mechanism to decorate multiple route instances
  3. TLSRoute use cases
  4. UDPRoute use cases
  5. Controlling certificates used by more than one workload (#6 in Gateway API TLS Use Cases)
  6. Client certificate settings used in TLS from external clients to the Listener (#7 in Gateway API TLS Use Cases)
  7. Providing a mechanism for the cluster operator to override gateway to backend TLS settings.

Already Solved TLS Use Cases

These are worthy goals that are already solved and thus will not be modified by the implementation.

  1. Termination of TLS for HTTP routing (#1 in Gateway API TLS Use Cases)
  2. HTTPS passthrough use cases (#2 in Gateway API TLS Use Cases)
  3. Termination of TLS for non-HTTP TCP streams (#3 in Gateway API TLS Use Cases)

Overview - what do we want to do?

Given that the current ingress solution specifies edge TLS termination (from the client to the gateway), and how to handle passthrough TLS (from the client to the backend pod), this proposed ingress solution specifies TLS origination to the backend (from the gateway to the backend pod). As mentioned, this solution satisfies the use case in which the backend pod has its own certificate and the gateway client needs to know how to connect to the backend pod.

image depicting TLS termination types

Gateway API is missing a mechanism for separately providing the details for the backend TLS handshake, including:

  • use of TLS
  • destination CA (certificate authority) or CA bundle
  • SANs for validating upstream service (server authentication)
  • client certificate of the gateway (client authentication)

Purpose - why do we want to do this?

This proposal is very tightly scoped because we have tried and failed to address this well-known gap in the API specification. The lack of support for this fundamental concept is holding back Gateway API adoption by users that require a solution to the use case. One of the recurring themes that has held up the prior art has been interest related to service mesh and as such this proposal focuses only on the ingress use case to avoid contention there. Another reason for the tight scope is that we have been too focused on a generic representation of everything that TLS can do, which covers too much ground to address in a single GEP.

The history of backend TLS

Work on this topic has spanned over three years, as documented in our repositories and other references, and summarized below.

In January 2020, in issue TLS Termination Policy #52, this use case was discussed. The discussion ended after being diverted by KEP: Adding AppProtocol to Services and Endpoints #1422, which was implemented and later reverted.

In February 2020, HTTPRoute: Add Reencrypt #81 added the dataplane feature as “reencrypt”, but it went stale and was closed in favor of the work done in the next paragraph, which unfortunately didn’t implement the backend TLS termination feature.

In August 2020, it resurfaced with a comment on this pull request: tls: introduce mode and sni to cert matching behavior. The backend TLS termination feature was deferred at that time. Other TLS discussion was documented in [SIG-NETWORK] TLS config in service-apis , a list of TLS features that had been collected in June 2020, itself based on spreadsheet Service API: TLS related issues.

In December 2021, this was discussed as a beta blocker in issue Docs mentions Reencrypt for HTTPRoute and TLSRoute is available #968.

A March 2022 issue documents another request for it: Provide a way to configure TLS from a Gateway to Backends #1067

A June 2022 issue documents a documentation issue related to it: Unclear how to specify upstream (webserver) HTTP protocol #1244

A July 2022 discussion Specify Re-encrypt TLS Termination (i.e., Upstream TLS) #1285 collected most of the historical context preceding the backend TLS termination feature, with the intention of collecting evidence that this feature is still unresolved. This was followed by GEP: Describe Backend Properties #1282.

In August 2022, Add Provisional GEP-1282 document #1333 was created, and in October 2022, a GEP update with proposed implementation GEP-1282 Backend Properties - Update implementation #1430 was followed by intense discussion and closed in favor of a downsize in scope.

In January 2023 we closed GEP-1282 and began a new discussion on enumerating TLS use cases in Gateway API TLS Use Cases, for the purposes of a clear definition and separation of concerns. This GEP is the outcome of the TLS use cases #4 and #5 in Gateway API TLS Use Cases as mentioned in the Goals section above.

API

To allow the gateway client to know how to connect to the backend pod, when the backend pod has its own certificate, we implement a metaresource that is already mentioned as an example in GEP-713: Metaresources and PolicyAttachment, and as a hypothetical in the Policy Attachment documentation.

This metaresource is named TLSConnectionPolicy. In this document, because naming is hard, we chose to retain the name TLSConnectionPolicy to advertise alignment with a previously discussed naming choice, but a new name may be substituted without blocking acceptance of the content of the API change.

The selection of the applicable Gateway API persona is important in the design of this proposal, because each persona is assigned a role which handles specific Gateway API resources. TLSConnectionPolicy is used by the application developer Gateway API persona to convey client certificate settings used in the TLS handshake from Gateway to backend, because this persona handles resources involved with application access and configuration, such as Routes and Services. Choosing any other role would move the application-related responsibility from the application developer role to that role, which violates the role-oriented design principle of Gateway API. As mentioned in Non-goal #7, providing a mechanism for the cluster operator gateway role to override gateway to backend TLS settings is not covered here, but can be addressed in a future update should the need arise.

TLSConnectionPolicy is defined as a Direct Policy Attachment without defaults or overrides, applied to a Service that accesses the backend in question, where the TLSConnectionPolicy resides in the same namespace as the Service it is applied to. The TLSConnectionPolicy and the Service must reside in the same namespace in order to prevent the complications involved with sharing trust across namespace boundaries. By choosing the Service resource rather than the Route resource, we can reuse the same TLSConnectionPolicy for all the different Routes that might point to this Service. For the use case where certificates are stored in their own namespace, users may create Secrets and use ReferenceGrants for a TLSConnectionPolicy-to-Secret binding.

In the API defined here, the definition of TrustedCACertRefs follows a convention established by TLSRoute in https://github.com/kubernetes-sigs/gateway-api/blob/main/apis/v1beta1/gateway_types.go#L340

One of the areas of concern for this API is that we need to indicate how and when the API implementations should use the backend destination certificate authority. This solution proposes, as introduced in GEP-713, that the implementation should watch the connections to a specific port on a specified targetRef (such as a Service), and if the port and Service match a TLSConnectionPolicy, then assume the connection is TLS, and verify that the targetRef’s certificate can be validated by the provided trusted CA certificates before the connection is made. On the question of how to signal that there was a failure in the certificate validation, this is left up to the implementation to return a response error that is appropriate, such as one of the HTTP error codes: 400 (Bad Request), 401 (Unauthorized), 403 (Forbidden), or other signal that makes the failure sufficiently clear to the requester without revealing too much about the transaction, based on established security requirements.
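The matching rule described above can be sketched in Go. This is a minimal illustration under stated assumptions, not the proposed implementation: the `policy` struct is a simplified, hypothetical stand-in for TLSConnectionPolicy, the names `findPolicy` and `backendTLSConfig` are invented for this example, and the CA bundle is assumed to be already read from the referenced Secret as PEM bytes.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// policy is a simplified, hypothetical stand-in for TLSConnectionPolicy.
type policy struct {
	Namespace string // namespace shared by the policy and its target Service
	Service   string // name of the targeted Service
	Port      int    // 0 means "all ports of the Service"
	CABundle  []byte // PEM-encoded trusted CA certificates (assumed already loaded)
}

// findPolicy returns the first policy that targets the given Service and port.
func findPolicy(policies []policy, namespace, service string, port int) (policy, bool) {
	for _, p := range policies {
		if p.Namespace == namespace && p.Service == service && (p.Port == 0 || p.Port == port) {
			return p, true
		}
	}
	return policy{}, false
}

// backendTLSConfig builds a client-side TLS configuration that trusts only
// the CAs listed in the policy, so the backend certificate is verified
// during the handshake, before any request is sent.
func backendTLSConfig(p policy, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(p.CABundle) {
		return nil, errors.New("no usable CA certificate in policy")
	}
	return &tls.Config{RootCAs: pool, ServerName: serverName, MinVersion: tls.VersionTLS12}, nil
}

func main() {
	policies := []policy{{Namespace: "store", Service: "checkout", Port: 8443}}
	_, ok := findPolicy(policies, "store", "checkout", 8443)
	fmt.Println("use TLS on 8443:", ok)
	_, ok = findPolicy(policies, "store", "checkout", 8080)
	fmt.Println("use TLS on 8080:", ok)
}
```

A connection to a Service and port with no matching policy is left untouched; only a match switches the connection to TLS with certificate verification against the policy's CAs.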

Not covered here, but possible to add would be additional configuration options mentioned in SIG-NET Gateway API: TLS to the K8s.Service/Backend. All of these are currently implementation-dependent, with the following recommended defaults:

  • Visibility: the visibility level could be all or none, indicating that if the connection failed due to validation failures, it would drop the connection silently if the visibility level were none, and report an error if the visibility level were all. (default to all)
  • Server Name Indication: enables passing of the server name through server name indication, in the TLS transaction, to assist with selection of certificates when several hosts share the same IP address. (default to enabled)
  • Subject Alternative Name certificates: enable the use of a single certificate that can serve multiple domains. (default to enabled)
  • Version: specifies the minimum TLS version that the connection may use. (default TLSv1.2)
  • Ciphers: specifies enabled ciphers to be used in TLS exchanges. (default to align with TLS versions 1.1, 1.2, and/or 1.3 as described in RFC 4346 TLS 1.1, RFC 5246 TLS 1.2, and RFC 8446 TLS 1.3)

// TLSConnectionPolicy provides a way to publish TLS configuration
// that enables a gateway client to connect to a backend pod.
type TLSConnectionPolicy struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    // Spec defines the desired state of TLSConnectionPolicy.
    Spec TLSConnectionPolicySpec `json:"spec"`

    // Status defines the current state of TLSConnectionPolicy.
    Status TLSConnectionPolicyStatus `json:"status,omitempty"`
}

// TLSConnectionPolicySpec defines the desired state of
// TLSConnectionPolicy.
// Note: there is no Override or Default policy configuration.
type TLSConnectionPolicySpec struct {
    // TargetRef identifies an API object to apply policy to.
    TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"`

    // TLS contains TLS connection policy configuration.
    TLS *TLSConnectionPolicyConfig `json:"tls"`
}

// TLSConnectionPolicyConfig contains TLS connection policy configuration.
type TLSConnectionPolicyConfig struct {
    // TrustedCACertRefs contains one or more references to
    // Kubernetes objects that contain TLS certificates, which are
    // used to establish a TLS handshake between the gateway and
    // backend pod.
    //
    // A single TrustedCACertRef to a Kubernetes Secret has "Core"
    // support.  Implementations MAY choose to support attaching
    // multiple certificates to a backend, but this behavior is 
    // implementation-specific.
    //
    // References to a resource in a different namespace are 
    // invalid.
    // 
    // This field is required for any TLSConnectionPolicyConfig.
    // 
    // Support: Core - A single reference to a Kubernetes Secret of type kubernetes.io/tls
    //
    // Support: Implementation-specific (More than one reference or other resource types)
    //
    // +kubebuilder:validation:MaxItems=64
    // +kubebuilder:validation:MinItems=1
    TrustedCACertRefs []SecretObjectReference `json:"trustedCACertRefs"`

    // Port is the network port that the implementation watches to
    // know if the connection should be TLS and the targetRef’s
    // certificate should be validated by the certs in TrustedCACertRefs
    // If empty, then all ports for the targetRef are watched to know
    // if the connection should be TLS, the targetRef’s certificate
    // should be validated by the certs in TrustedCACertRefs, and a
    // status delivered in the response for validation failures.
    Port PortNumber `json:"port,omitempty"`
}

// TLSConnectionPolicyStatus defines the observed state of TLSConnectionPolicy.
type TLSConnectionPolicyStatus struct {
    // Conditions describe the current conditions of the TLSConnectionPolicy.
    //
    // Implementations should prefer to express TLSConnectionPolicy
    // conditions using the `TLSConnectionPolicyConditionType` and
    // `TLSConnectionPolicyConditionReason` constants so that 
    // operators and tools can converge on a common vocabulary to 
    // describe TLSConnectionPolicy state.
    // Known condition types are:
    // 
    // * “Accepted”
    // 
    // +optional
    // +listType=map
    // +listMapKey=type
    // +kubebuilder:validation:MaxItems=8
    // +kubebuilder:default={type: "Accepted", status: "Unknown", reason:"Pending", message:"Waiting for validation", lastTransitionTime: "1970-01-01T00:00:00Z"}
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// TLSConnectionPolicyConditionType is the type of a condition used
// as a signal by TLSConnectionPolicy.  This type should be used with
// the TLSConnectionPolicyStatus.Conditions field.
type TLSConnectionPolicyConditionType string

// TLSConnectionPolicyConditionReason is a reason that explains why a
// particular TLSConnectionPolicyConditionType was generated.
type TLSConnectionPolicyConditionReason string

const (
    // This condition indicates that the TLSConnectionPolicy has been
    // accepted as valid.
    // Possible reason for this condition to be True is:
    // * “Valid”
    // Possible reasons for this condition to be False are:
    // * “Invalid”
    // * “Pending”
    TLSConnectionPolicyConditionAccepted TLSConnectionPolicyConditionType = "Accepted"

    // This reason is used with the "Accepted" condition when the condition is true.
    TLSConnectionPolicyReasonAccepted TLSConnectionPolicyConditionReason = "Valid"

    // This reason is used with the "Accepted" condition when the TLSConnectionPolicy is invalid, e.g. crossing namespace boundaries.
    TLSConnectionPolicyReasonInvalid TLSConnectionPolicyConditionReason = "Invalid"

    // This reason is used with the "Accepted" condition when the TLSConnectionPolicy is pending validation.
    TLSConnectionPolicyReasonPending TLSConnectionPolicyConditionReason = "Pending"
)
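
As an illustration of how a controller might derive the “Accepted” condition from these rules, here is a small self-contained sketch. It makes several assumptions: `certRef` is a hypothetical, simplified stand-in for SecretObjectReference, `acceptedCondition` is an invented helper name, and the check follows the field comment that treats cross-namespace references as invalid (it does not model ReferenceGrants).

```go
package main

import "fmt"

// certRef is a hypothetical, simplified stand-in for SecretObjectReference.
type certRef struct {
	Name      string
	Namespace string // empty means "same namespace as the policy"
}

// acceptedCondition returns the status and reason a controller might set on
// the "Accepted" condition for a policy that lives in policyNamespace.
// Assumption: the reference count limits mirror the MinItems/MaxItems markers.
func acceptedCondition(policyNamespace string, refs []certRef) (status, reason string) {
	if len(refs) < 1 || len(refs) > 64 {
		return "False", "Invalid"
	}
	for _, r := range refs {
		if r.Namespace != "" && r.Namespace != policyNamespace {
			// Crossing a namespace boundary makes the policy invalid.
			return "False", "Invalid"
		}
	}
	return "True", "Valid"
}

func main() {
	status, reason := acceptedCondition("store", []certRef{{Name: "backend-ca"}})
	fmt.Println(status, reason)

	status, reason = acceptedCondition("store", []certRef{{Name: "backend-ca", Namespace: "certs"}})
	fmt.Println(status, reason)
}
```

A policy that has not yet been evaluated would keep the default shown in the status definition: status "Unknown" with reason "Pending".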

How a client behaves

This table describes the effect that a TLSConnectionPolicy has on a Route. There are only two cases where the TLSConnectionPolicy will signal a Route to connect to a backend using TLS: an HTTPRoute with a backend that is targeted by a TLSConnectionPolicy, either with or without listener TLS configured. (There are a few other cases where it may be possible, but these are purposely marked “not supported” to reduce confusion about the assigned purpose of each of the protocol-affiliated types of Routes.)

| Route Type | Gateway Config | Backend is targeted by a TLSConnectionPolicy? | Connect to backend with TLS? |
|------------|----------------|-----------------------------------------------|------------------------------|
| HTTPRoute | Listener TLS | Yes | Yes |
| HTTPRoute | No listener TLS | Yes | Yes |
| HTTPRoute | Listener TLS | No | No |
| HTTPRoute | No listener TLS | No | No |
| TLSRoute | Listener Mode: Passthrough | Yes | No |
| TLSRoute | Listener Mode: Terminate | Yes | Not supported |
| TLSRoute | Listener Mode: Passthrough | No | No |
| TLSRoute | Listener Mode: Terminate | No | No |
| TCPRoute | Listener TLS | Yes | Not supported |
| TCPRoute | No listener TLS | Yes | Not supported |
| TCPRoute | Listener TLS | No | No |
| TCPRoute | No listener TLS | No | No |
| UDPRoute | Listener TLS | N/A | No |
| UDPRoute | No listener TLS | N/A | No |
| GRPCRoute | N/A | N/A | No |
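
The table can also be read as a small decision function. The sketch below encodes it in Go; the function name `backendTLS`, the `outcome` type, and the string values used for route types and listener modes are assumptions made for this example and are not part of the proposed API.

```go
package main

import "fmt"

// outcome mirrors the last column of the table.
type outcome string

const (
	useTLS       outcome = "Yes"
	noTLS        outcome = "No"
	notSupported outcome = "Not supported"
)

// backendTLS reports whether the gateway connects to the backend with TLS.
// targeted is true when the backend is targeted by a TLSConnectionPolicy.
// listenerMode only matters for TLSRoute ("Passthrough" or "Terminate").
func backendTLS(routeType, listenerMode string, targeted bool) outcome {
	if !targeted {
		return noTLS // without a policy, nothing changes for any route type
	}
	switch routeType {
	case "HTTPRoute":
		return useTLS // with or without listener TLS
	case "TLSRoute":
		if listenerMode == "Terminate" {
			return notSupported
		}
		return noTLS // passthrough: the client's TLS session reaches the backend
	case "TCPRoute":
		return notSupported
	default: // UDPRoute, GRPCRoute
		return noTLS
	}
}

func main() {
	fmt.Println(backendTLS("HTTPRoute", "", true))
	fmt.Println(backendTLS("TLSRoute", "Terminate", true))
	fmt.Println(backendTLS("TCPRoute", "", false))
}
```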

Request Flow

One additional step would be added to the typical client/gateway API request flow for a gateway implemented using a reverse proxy. This is shown as step 6 below.

  1. A client makes a request to http://foo.example.com.
  2. DNS resolves the name to a Gateway address.
  3. The reverse proxy receives the request on a Listener and uses the Host header to match an HTTPRoute.
  4. Optionally, the reverse proxy can perform request header and/or path matching based on match rules of the HTTPRoute.
  5. Optionally, the reverse proxy can modify the request, i.e. add/remove headers, based on filter rules of the HTTPRoute.
  6. (New) Optionally, the reverse proxy can determine the outcome of verifying the cert served by the backend, based on backendRef rules of the HTTPRoute.
  7. Lastly, the reverse proxy forwards the request to one or more objects, i.e. Service, in the cluster based on backendRefs rules of the HTTPRoute.

Alternatives

Most alternatives are enumerated in the section on the history of backend TLS above. A couple of additional alternatives are also listed here.

  1. Expand BackendRef, which is already an expansion point. At first, it seems logical that since listeners are handling the client-gateway certs, BackendRefs could handle the gateway-backend certs. However, when multiple Routes target the same Service, there would be unnecessary copying of the BackendRef every time the Service was targeted. As well, there could be multiple BackendRefs with multiple rules on a Route, each of which might need the gateway-backend cert configuration, so it is not the appropriate pattern.
  2. Extend HTTPRoute to indicate TLS backend support. Extending HTTPRoute would interfere with deployed implementations too much to be a practical solution.
  3. Add a new type of Route for backend TLS. This is impractical because we might want to enable backend TLS on other route types in the future, and because we might want to have both TLS listeners and backend TLS on a single route.

Prior Art

TLS from gateway to backend for ingress exists in several implementations, and was developed independently.

Istio Gateway supports this with a DestinationRule:

  • A secret representing a certificate/key pair, where the certificate is valid for the route host
  • Set Gateway spec.servers[].port.protocol: HTTPS, spec.servers[].tls.mode=SIMPLE, spec.servers[].tls.credentialName
  • Set DestinationRule spec.trafficPolicy.tls.mode: SIMPLE

Ref: Istio / Understanding TLS Configuration and Istio / Destination Rule

OpenShift Route (comparable to GW API Gateway) supports this with the following route configuration items:

  • A certificate/key pair, where the certificate is valid for the route host
  • A separate destination CA certificate enables the Ingress Controller to trust the destination’s certificate
  • An optional, separate CA certificate that completes the certificate chain

Ref: Secured routes - Configuring Routes | Networking | OpenShift Container Platform 4.12

Contour supports this from Envoy to the backend using:

  • An Envoy client certificate
  • A CA certificate and SubjectName which are both used to verify the backend endpoint’s identity
  • Kubernetes Service annotation: projectcontour.io/upstream-protocol.tls

Ref: Upstream TLS

GKE supports a way to encrypt traffic to the backend pods using:

  • AppProtocol on Service set to HTTPS
  • Load balancer does not verify the certificate used by backend pods

Ref: Secure a Gateway

Emissary supports encrypted traffic to services

  • In the Mapping definition, set https:// in the spec.service field
  • A spec.tls in the Mapping definition, with the name of a TLSContext
  • A TLSContext to provide a client certificate, set minimum TLS version support, SNI

Ref: TLS Origination

NGINX implementation through CRDs (Comparable to Route or Policy of Gateway API) supports both TLS and mTLS

  • In the Upstream section of a VirtualServer or VirtualServerRoute (equivalent to HTTPRoute) there is a simple toggle to enable TLS. This does not validate the certificate of the backend and implicitly trusts the backend in order to form the SSL tunnel. This is not about validating the certificate but obfuscating the traffic with TLS/SSL.
  • A Policy attachment called egressMTLS (egress from the proxy to the upstream) can be provided when certificate validation is required. This can be tuned to perform various certificate validation tests. It was created as a Policy because it implies some type of AuthN/AuthZ due to the additional checks. This was also compatible with Open Service Mesh and NGINX Service Mesh and removed the need for a sidecar at the ingress controller.
  • A corresponding 'IngressMTLS' policy also exists for mTLS verification of client connections to the proxy. The Policy object is used for anything that implies AuthN/AuthZ.

Ref: Upstream.TLS Ref: EgressMTLS Ref: IngressMTLS

Open Questions

This section records issues that warranted discussion in the API section before this GEP moves out of Provisional status.

  1. Bowei recommended that we mention the approach of cross-namespace referencing between Route and Service. Be explicit about using the standard rules with respect to attaching policies to resources. This is mentioned in the API section.
  2. Costin recommended that Gateway SHOULD authenticate with either a JWT with audience or client cert or some other means - so gateway added headers can be trusted, amongst other things. This is out of scope for this proposal, which centers around application developer persona resources such as HTTPRoute and Service.
  3. Costin mentioned we need to answer the question - is configuring the connection to a backend and TLS something the route author decides - or the backend owner? Same for SAN (Subject Alternative Name) certificates. The use of SAN certificates and the use of SNI, as a part of TLS, can be implementation-dependent, though the application can still reject any request or certificate that it doesn’t support, including requests with SNI or a certificate with SANs.

References

  • Gateway API TLS Use Cases
  • https://gateway-api.sigs.k8s.io/geps/gep-713/
  • https://gateway-api.sigs.k8s.io/references/policy-attachment/
  • https://gateway-api.sigs.k8s.io/v1alpha2/guides/tls/
  • https://docs.nginx.com/nginx-ingress-controller/configuration/policy-resource/#egressmtls
  • SIG-NET[Gateway API]: TLS to the K8s.Service/Backend
  • https://serverfault.com/questions/807959/what-is-the-difference-between-san-and-sni-ssl-certificates
  • GEP-713: Metaresources and PolicyAttachment
  • Policy Attachment
  • RFC 4346 TLS 1.1
  • RFC 5246 TLS 1.2
  • RFC 8446 TLS 1.3