109 changes: 45 additions & 64 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,66 +1,47 @@
# Routing Release

This repository is a [BOSH release](https://github.com/cloudfoundry/bosh) for
deploying Gorouter, TCP Routing, and other associated tasks that provide HTTP and TCP routing in Cloud Foundry foundations.

## Downloads

Our BOSH release is available on [bosh.io](http://bosh.io/releases/github.com/cloudfoundry/routing-release)
and on our [GitHub Releases page](https://github.com/cloudfoundry/routing-release/releases).

## Getting Help

If you have a concrete issue to report or a change to request, please create a
[GitHub issue on
routing-release](https://github.com/cloudfoundry/routing-release/issues/new/choose).

Issues with any related submodules
([Gorouter](https://github.com/cloudfoundry/gorouter), [Routing
API](https://github.com/cloudfoundry/routing-api), [Route
Registrar](https://github.com/cloudfoundry/route-registrar), [CF TCP
Router](https://github.com/cloudfoundry/cf-tcp-router)) should be created here
instead.

You can also reach us on Slack at
[cloudfoundry.slack.com](https://cloudfoundry.slack.com) in the
[`#cf-for-vms-networking`](https://cloudfoundry.slack.com/app_redirect?channel=C01ABMVNE9E)
channel.

## Contributing
See the [Contributing.md](./.github/CONTRIBUTING.md) for more information on how to contribute.

## Table of Contents
1. [Routing Operator Resources](#routing-operator-resources)
1. [Routing App Developer Resources](#routing-app-developer-resources)
1. [Routing Contributor Resources](#routing-contributor-resources)

---
## <a name="routing-operator-resources"></a> Routing Operator Resources
### <a name="high-availability"></a> High Availability

The TCP Router and Routing API are stateless and horizontally scalable. The TCP
Routers must be fronted by a load balancer for high availability. The Routing
API depends on a database that can be clustered for high availability. To achieve
high availability, deploy multiple instances of each job, distributed across regions
of your infrastructure.

### <a name="routing-api"></a> Routing API
For details refer to [Routing API](https://github.com/cloudfoundry/routing-api/blob/master/README.md).

### <a name="metrics"></a> Metrics
For documentation on metrics available for streaming from Routing components
through the Loggregator
[Firehose](https://docs.cloudfoundry.org/loggregator/architecture.html), visit
the [CloudFoundry
Documentation](http://docs.cloudfoundry.org/loggregator/all_metrics.html#routing).
You can use the [NOAA Firehose sample app](https://github.com/cloudfoundry/noaa)
to quickly consume metrics from the Firehose.
## <a name="routing-app-developer-resources"></a> Routing App Developer Resources

### <a name="session-affinity"></a> Session Affinity
For more information on how Routing release accomplishes session affinity, i.e.
sticky sessions, refer to the [Session Affinity document](docs/session-affinity.md).

### <a name="headers"></a> Headers
[X-CF Headers](/docs/x_cf_headers.md) describes the X-CF headers that are set on requests and responses inside of CF.

This repository is a [BOSH](https://github.com/cloudfoundry/bosh)
release for deploying Gorouter, TCP Routing, and other associated tasks
that provide HTTP and TCP routing in Cloud Foundry foundations.

For information on getting started with Cloud Foundry, see the docs
for [CF Deployment](https://github.com/cloudfoundry/cf-deployment).

# Docs

- [How To enable Quotas for TCP
Routing](./docs/03-how-to-enable-quota-tcp-routing.md)
- [How To Limit Trusted CAs for
Gorouter](./docs/03-how-to-limit-trusted-cas-for-gorouter.md)
- [How To Use NATS Client](./docs/03-how-to-use-nats-client.md)
- [How To Use Session
Affinity](./docs/03-how-to-use-session-affinity.md)
- [How To Use X-CF Headers](./docs/03-how-to-use-x-cf-headers.md)
- [(go1.15) Fixing Bad
Transfer-Encoding](./docs/04-go1.15-fixing-bad-transfer-encoding.md)
- [(go1.15) X.509 CommonName
deprecation](./docs/04-go1.15-x509-commonname-deprecation.md)
- [(go1.20) Multiple Expect 100-continue
responses](./docs/04-go1.20-multiple-expect-100-continue.md)
- [(routing-release-0.262.0) Healthy App Route
Pruning](./docs/04-routing-0.262.0-healthy-app-route-pruning.md)
- [(routing-release-0.277.0) TCP Router Port
Conflict](./docs/04-routing-0.277.0-tcp-router-port-conflict.md)
- [High Availability & Scaling](./docs/05-high-availbility-scaling.md)

# Contributing

See the [Contributing.md](./.github/CONTRIBUTING.md) for more
information on how to contribute.

# Working Group Charter

This repository is maintained by [App Runtime
Platform](https://github.com/cloudfoundry/community/blob/main/toc/working-groups/app-runtime-platform.md)
under the `Networking` area.

> \[!IMPORTANT\]
>
> Content in this file is managed by the [CI task
> `sync-readme`](https://github.com/cloudfoundry/wg-app-platform-runtime-ci/blob/c83c224ad06515ed52f51bdadf6075f56300ec93/shared/tasks/sync-readme/metadata.yml)
> and is generated by CI following a convention.
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# Enable Quotas for TCP Routing
---
title: How To enable Quotas for TCP Routing
expires_at: never
tags: [routing-release]
---

# How To Enable Quotas for TCP Routing

As ports can be a limited resource in some environments, the default quotas in
Cloud Foundry for IaaS other than BOSH Lite do not allow reservation of route
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# Limiting Trusted CAs for Gorouter
---
title: How To Limit Trusted CAs for Gorouter
expires_at: never
tags: [routing-release]
---

# How to Limit Trusted CAs for Gorouter

This doc is for operators who want to use the new "only trust client CA certs" feature for gorouter to limit the CA certs that gorouter trusts.

10 changes: 8 additions & 2 deletions docs/nats-client.md → docs/03-how-to-use-nats-client.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# NATS Client
---
title: How To Use NATS Client
expires_at: never
tags: [routing-release]
---

# How To Use NATS Client

## What is it?

@@ -109,4 +115,4 @@ There are many scenarios where you may use `nats_client` to debug gorouter issue
- Set up large deployments with hundreds of apps and thousands of routes, without having to actually deploy all of them
- Simulate outages where large numbers of backends no longer respond (e.g. AZ outages)
- Simulate NATS outages where apps have moved elsewhere but gorouter didn't get the proper `router.unregister` message
- etc.
- etc.
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# Session Affinity
---
title: How To Use Session Affinity
expires_at: never
tags: [routing-release]
---

# How To Use Session Affinity

## What is it?

Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# X-CF Headers
---
title: How To Use X-CF Headers
expires_at: never
tags: [routing-release]
---

# How To Use X-CF Headers

| Header | If a client provides this header will that affect routing decisions? | Value | More info |
| -- | -- | -- | -- |
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
---
title: (go1.15) Fixing Bad Transfer-Encoding
expires_at: 2025-11-20
tags: [routing-release,1.15]
---

# (go1.15) Fixing Bad Transfer-Encoding

### Context
To read more about the problem that this doc is fixing, see the [release notes for routing-release 0.209.0](https://github.com/cloudfoundry/routing-release/releases/tag/0.209.0).

@@ -6,7 +14,7 @@ In general, to resolve this issue you need to review applications experiencing t

1. For streaming results from the server to a client, use Spring's built-in support for this. Using a ResponseBodyEmitter or a SseEmitter, you can easily stream content back to your clients and Spring & Tomcat will ensure that the transfer-encoding header is set correctly.

2. If you created a route service based on this example code, https://github.com/nebhale/route-service-example (no longer published due to this bug), you will be impacted as the example code was copying all response headers, including transfer-encoding headers, from the proxied response to the client response (see the 👈 line below).
2. If you created a route service based on the following format, you will be impacted as the example code was copying all response headers, including transfer-encoding headers, from the proxied response to the client response (see the 👈 line below).
```
return this.webClient
.method(request.getMethod())
```
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# golang 1.15 X.509 CommonName deprecation
---
title: (go1.15) X.509 CommonName deprecation
expires_at: 2027-03-07
tags: [routing-release,1.15]
---

# (go1.15) X.509 CommonName deprecation

This doc helps operators understand why certificates used by network Load
Balancers and the gorouter to serve TLS traffic must contain at least one
Original file line number Diff line number Diff line change
@@ -1,4 +1,10 @@
# Known Issue: Multiple Expect 100-continue responses
---
title: (go1.20) Multiple Expect 100-continue responses
expires_at: 2028-06-29
tags: [routing-release,1.20]
---

# (go1.20) Multiple Expect 100-continue responses

## 🐛 Bug 1 Summary
Previously clients that sent a request with the header “Expect: 100-continue”
@@ -61,7 +67,7 @@ However, if the server app takes more than 1 second to send a response with stat
then there is a chance that the client will again get 2 responses with status code 100.

## 📖 RFC Says
[The RFC says that proxies like gorouter must not filter 1XX responses.]([url](https://datatracker.ietf.org/doc/html/rfc7231#section-6.2))
The RFC says that proxies like gorouter must not filter 1XX responses ([RFC 7231, section 6.2](https://datatracker.ietf.org/doc/html/rfc7231#section-6.2)).

> “A proxy MUST forward 1xx responses unless the proxy itself requested
> the generation of the 1xx response. For example, if a proxy adds an
Original file line number Diff line number Diff line change
@@ -1,10 +1,17 @@
# Issue
---
title: (routing-release-0.262.0) Healthy App Route Pruning
expires_at: 2028-05-08
tags: [routing-release,0.262.0]
---

# (routing-release-0.262.0) Healthy App Route Pruning

Cloud Foundry environments may experience many 503 errors with `x_cf_routererror:"no_endpoints"` even though all of the apps appear to be up and functional without error.
There is an entry in the route table for the desired route, but there are no healthy endpoints available.

This is caused by changes introduced in routing-release 0.262.0 to enable Gorouter to retry more types of idempotent requests to failed backends.

# How to detect if your app has experienced this bug
## How to detect if your app has experienced this bug

The following commands can be run on the gorouter log file to check for possible occurrences. Look for a log entry where `data.error` has a value of "context canceled", followed by a `prune-failed-endpoint` error.

@@ -25,6 +32,6 @@ egrep -A5 -Hn 27116dd3-f047-4a35-7873-e9ef7e1d3f71 ./router.d60e75ac-5459-49f8-b
./router.d60e75ac-5459-49f8-b029-543579d74ed0.2023-05-05-18-05-52/gorouter/gorouter.stdout.log-193-{"log_level":3,"timestamp":"2023-05-04T19:38:42.838565797Z","message":"prune-failed-endpoint","source":"vcap.gorouter.registry","data":{"route-endpoint":{"ApplicationId":"d45e4b57-3420-40b3-b13d-9ef0562d58c5",REDACTED,"process_instance_id":"2ea1596c-a745-4fdc-53a4-d885","process_type":"web","source_id":"d45e4b57-3420-40b3-b13d-9ef0562d58c5",REDACTED,"RouteServiceUrl":""}}}
```

# Resolution
## Resolution

To resolve this issue, upgrade routing-release to v0.266.0 or above.
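
The detection step described above can also be scripted. The following is a minimal sketch, not part of routing-release: it assumes each raw gorouter log line is a single JSON object with `message` and `data.error` fields, as in the excerpt above, and it skips lines that do not parse. The function name `find_suspect_prunes` and the `window` parameter (mirroring the `-A5` in the `egrep` command) are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def find_suspect_prunes(lines: Iterable[str], window: int = 5) -> Iterator[Tuple[int, int]]:
    """Yield (cancel_line, prune_line) pairs of 1-based line numbers.

    A pair is reported when a "prune-failed-endpoint" message appears at most
    `window` lines after an entry whose data.error is "context canceled".
    """
    last_cancel = None  # line number of the most recent "context canceled" entry
    for number, line in enumerate(lines, start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: non-JSON lines (e.g. redacted output) are ignored
        if not isinstance(entry, dict):
            continue
        data = entry.get("data") or {}
        if isinstance(data, dict) and data.get("error") == "context canceled":
            last_cancel = number
        elif entry.get("message") == "prune-failed-endpoint":
            if last_cancel is not None and number - last_cancel <= window:
                yield (last_cancel, number)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_prunes.py gorouter.stdout.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for cancel_line, prune_line in find_suspect_prunes(log_file):
            print(f"context canceled at line {cancel_line}, pruned at line {prune_line}")
```

A match only indicates a possible occurrence; confirm against the route and application IDs in the surrounding log entries.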
Original file line number Diff line number Diff line change
@@ -1,8 +1,47 @@
# Known Issue: TCP Router Fails when Port Conflicts with Local Process
---
title: (routing-release-0.277.0) TCP Router Port Conflict
expires_at: 2028-08-17
tags: [routing-release,0.277.0]
---

<!-- vim-markdown-toc GFM -->

* [(routing-release-0.277.0) TCP Router Port Conflict](#routing-release-02770-tcp-router-port-conflict)
* [📑 Context](#-context)
* [🔥 Affected Versions](#-affected-versions)
* [✔️ Operator Checklist](#-operator-checklist)
* [🐛 Bug Variation 1 - TCP Router claims the port first](#-bug-variation-1---tcp-router-claims-the-port-first)
* [Symptoms](#symptoms)
* [Explanation](#explanation)
* [🐞 Bug Variation 2 - Internal component claims the port first](#-bug-variation-2---internal-component-claims-the-port-first)
* [Symptoms](#symptoms-1)
* [Explanation](#explanation-1)
* [🧰 Fix](#-fix)
* [Overview](#overview)
* [New Bosh Properties](#new-bosh-properties)
* [Runtime Check Details](#runtime-check-details)
* [Deploytime Check Details](#deploytime-check-details)
* [🗨️ FAQ](#-faq)
* [📝 <a name="list-of-ports"></a>Appendix A: Default System Component Ports](#-a-namelist-of-portsaappendix-a-default-system-component-ports)

<!-- vim-markdown-toc -->
# (routing-release-0.277.0) TCP Router Port Conflict

## 📑 Context

Each TCP route requires one port on the TCP Router VM. Ports for TCP routes are managed via [router groups](https://github.com/cloudfoundry/routing-api/blob/main/docs/api_docs.md#create-router-groups). Each router group has a list of `reservable_ports`.
The [Cloud Foundry documentation for "Enabling and Configuring TCP Routing"](https://docs.cloudfoundry.org/adminguide/enabling-tcp-routing.html#-modify-tcp-port-reservations) has the following warning and suggestions for valid port ranges:

> Do not enter reservable_ports that conflict with other TCP router instances or ephemeral port ranges. Cloud Foundry recommends using port ranges within 1024-2047 and 18000-32767 on default installations.

These port suggestions do not overlap with any ports used by system components.
However, there is nothing (until now) preventing users from expanding this range into ports that *do* overlap with ports used by system components.

This port conflict can result in two different buggy outcomes.
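
To illustrate the kind of overlap described above, here is a minimal sketch of a conflict check. It is not the check implemented in routing-release. It assumes `reservable_ports` is a comma-separated string of single ports and inclusive ranges (for example `"1024-2047,18000-32767"`), and it uses a handful of ports from Appendix A as sample system component ports. The names `parse_reservable_ports`, `find_conflicts`, and `SAMPLE_SYSTEM_PORTS` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Sample only: a few of the system component ports listed in Appendix A.
SAMPLE_SYSTEM_PORTS = [14823, 14824, 14829, 14830, 14922]


def parse_reservable_ports(spec: str) -> List[Tuple[int, int]]:
    """Parse e.g. "1024-1123,2000" into inclusive (low, high) pairs.

    Assumption: the value is a comma-separated list of ports and port ranges.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low_text, _, high_text = part.partition("-")
        low = int(low_text)
        high = int(high_text) if high_text else low
        if low > high:
            raise ValueError(f"invalid port range: {part!r}")
        ranges.append((low, high))
    return ranges


def find_conflicts(spec: str, system_ports: Iterable[int]) -> List[int]:
    """Return the system ports that fall inside any reservable range, sorted."""
    ranges = parse_reservable_ports(spec)
    return sorted(
        port
        for port in set(system_ports)
        if any(low <= port <= high for low, high in ranges)
    )


if __name__ == "__main__":
    # The recommended ranges do not overlap with the sample system ports.
    print(find_conflicts("1024-2047,18000-32767", SAMPLE_SYSTEM_PORTS))
    # An expanded range that reaches into the system component ports does.
    print(find_conflicts("14000-15000", SAMPLE_SYSTEM_PORTS))
```

A router group whose range produces a non-empty result here is one that could hit either of the bug variations described below.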

## 🔥 Affected Versions

* All versions of routing-release
* All versions of routing-release before 0.277.0

## ✔️ Operator Checklist
* [ ] Read this doc.
@@ -13,17 +52,6 @@
* [ ] Fix invalid router groups. See routing-api documentation [here](https://github.com/cloudfoundry/routing-api/blob/main/docs/api_docs.md#update-router-group).
* [ ] Re-run the check to make sure all router groups are valid. See how [here](#how-to-rerun).

## 📑 Context

Each TCP route requires one port on the TCP Router VM. Ports for TCP routes are managed via [router groups](https://github.com/cloudfoundry/routing-api/blob/main/docs/api_docs.md#create-router-groups). Each router group has a list of `reservable_ports`.
The [Cloud Foundry documentation for "Enabling and Configuring TCP Routing"](https://docs.cloudfoundry.org/adminguide/enabling-tcp-routing.html#-modify-tcp-port-reservations) has the following warning and suggestions for valid port ranges:

> Do not enter reservable_ports that conflict with other TCP router instances or ephemeral port ranges. Cloud Foundry recommends using port ranges within 1024-2047 and 18000-32767 on default installations.

These port suggestions do not overlap with any ports used by system components.
However, there is nothing (until now) preventing users from expanding this range into ports that *do* overlap with ports used by system components.

This port conflict can result in two different buggy outcomes.

## 🐛 Bug Variation 1 - TCP Router claims the port first

@@ -64,7 +92,7 @@ The fix for this issue focuses on preventing the creation of router groups that
* a runtime check for creating and updating router groups
* a deploytime check for existing router groups

These fixes are available in routing release XYZ+ (will update when released). If you cannot update at this time, you can fix your routing groups manually. See [here](#how-to-manually-fix) for instructions.
These fixes are available in routing release v0.277.0+. If you cannot update at this time, you can fix your routing groups manually. See [here](#how-to-manually-fix) for instructions.

### New Bosh Properties

@@ -227,7 +255,7 @@ Some of these ports are configurable and may not match what is running on your d
| 14823 | loggr-forwarder-agent | metrics.port | no | See bosh property [here](https://github.com/cloudfoundry/loggregator-agent-release/blob/acfbb6b015d897c11f715ac9e1a226eb5b96875c/jobs/loggr-forwarder-agent/spec#L51-L53) |
| 14824 | loggregator_agent | metrics.port | no | See bosh property [here](https://github.com/cloudfoundry/loggregator-agent-release/blob/acfbb6b015d897c11f715ac9e1a226eb5b96875c/jobs/loggregator_agent/spec#L78-L80). |
| 14829 | loggr-udp-forwarder | metrics.port | no | See bosh property [here](https://github.com/cloudfoundry/loggregator-agent-release/blob/acfbb6b015d897c11f715ac9e1a226eb5b96875c/jobs/loggr-udp-forwarder/spec#L44-L46). |
| 14830 | otel-collector | TBD | n/a | This port is used for the collector's metrics. This port was previously used by loggr-udp-forwarder, however it was disabled there. See bosh property [here](TBD). See [this issue](https://github.com/cloudfoundry/loggregator-agent-release/issues/44) for more historical information. |
| 14830 | otel-collector | TBD | n/a | This port is used for the collector's metrics. This port was previously used by loggr-udp-forwarder, however it was disabled there. See [this issue](https://github.com/cloudfoundry/loggregator-agent-release/issues/44) for more historical information. |
| 14920* | system-metrics-scraper | metrics_port | no | *This job does not run on TCP router or Gorouter! However, you should not use it for an agent that will be deployed alongside that job. See bosh property [here](https://github.com/cloudfoundry/system-metrics-scraper-release/blob/473caa08af286e617e7391111639a70846d35de0/jobs/loggr-system-metric-scraper/spec#L58-L60). |
| ~14921*~ | ~system-metrics-scraper~ | ~n/a~ | ~n/a~ | *This port was considered for a debug port, but it turns out it's in use by leadership-election which does not run on tcp-router. It is not reserved in TCP Router. See [this issue](https://github.com/cloudfoundry/system-metrics-scraper-release/issues/2) for more information. |
| 14922 | system-metrics-agent | debug_port | no | See bosh property [here](https://github.com/cloudfoundry/system-metrics-release/blob/4e22e11ba4d72c5bd6895b94a75d67c212cfaa22/jobs/loggr-system-metrics-agent/spec#L20) |
13 changes: 13 additions & 0 deletions docs/05-high-availbility-scaling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
title: High Availability & Scaling
expires_at: never
tags: [routing-release]
---

# High Availability & Scaling

The TCP Router and Routing API are stateless and horizontally scalable. The TCP
Routers must be fronted by a load balancer for high availability. The Routing
API depends on a database that can be clustered for high availability. To achieve
high availability, deploy multiple instances of each job, distributed across regions
of your infrastructure.