
Styx Data Path


This document describes the Styx forwarding path for HTTP requests. It is written here for the benefit of new developers.

Specifically we will cover the following:

  • Styx live request/response messages
  • Content body publishers
  • Http Pipeline Handler

Request/Response Messages

Styx is a cut-through proxy (https://en.wikipedia.org/wiki/Cut-through_switching). It begins forwarding a received message as soon as the HTTP headers have arrived; it does not wait for the message body. The body streams through later, as it is received.

The Styx LiveHttpRequest and LiveHttpResponse message types are designed for cut-through proxying. They expose all HTTP headers, but the content is available asynchronously via a ByteStream publisher:

class LiveHttpRequest {
    ..
    public ByteStream body() { ... }
    ..
}

A ByteStream is just a Reactive Streams Publisher:

public class ByteStream implements Publisher<Buffer> { … }

A LiveHttpRequest propagates synchronously through the Styx core. It is passed on the call stack, from handler to handler, and from interceptor to interceptor. In the opposite direction, a LiveHttpResponse comes back as an asynchronous reactive publisher event.

Both the request and the response body always flow through asynchronously via a ByteStream publisher.
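
As an illustration, here is a minimal interceptor sketch, assuming the HttpInterceptor, Chain and Eventual types from the Styx API module. The live request arrives as a plain method argument on the call stack, whereas the live response only becomes available later, as an asynchronous event:

import com.hotels.styx.api.Eventual;
import com.hotels.styx.api.HttpInterceptor;
import com.hotels.styx.api.LiveHttpRequest;
import com.hotels.styx.api.LiveHttpResponse;

// Sketch only: logs the request synchronously and the response asynchronously.
public class LoggingInterceptor implements HttpInterceptor {
    @Override
    public Eventual<LiveHttpResponse> intercept(LiveHttpRequest request, Chain chain) {
        // The headers are available here; the body has not necessarily arrived yet.
        System.out.println("request: " + request.url());

        return chain.proceed(request)
                .map(response -> {
                    // Runs asynchronously, once the response headers come back.
                    System.out.println("response: " + response.status());
                    return response;
                });
    }
}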

A ByteStream must support flow control. Without it, Styx would ingest too much data at once and run out of memory. We discuss flow control in depth later in this document.

ByteStream Content Publishers

A ByteStream is a Reactive Streams Publisher, and as such it supports non-blocking back pressure. There are other good sources of information about the reactive back pressure protocol, so I won't go into detail here. But in a nutshell (a short example follows the list):

  • The Subscriber requests N events.
  • The Publisher emits up to the requested N events, and never more.
  • When the Publisher produces or receives more than N events, it must discard, queue, or otherwise hold back the excess until the subscriber requests more.
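
Using only the standard org.reactivestreams interfaces, a minimal (illustrative) subscriber that honours this contract might look like the following. It consumes buffers in batches of two and asks for the next batch only when the current one has been processed:

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import com.hotels.styx.api.Buffer;

// Illustrative only: requests two buffers at a time, and two more once both
// have been consumed. The publisher must never emit more than was requested.
class BatchingSubscriber implements Subscriber<Buffer> {
    private Subscription subscription;
    private int outstanding;

    @Override
    public void onSubscribe(Subscription subscription) {
        this.subscription = subscription;
        this.outstanding = 2;
        subscription.request(2);        // the publisher may now emit at most 2 buffers
    }

    @Override
    public void onNext(Buffer buffer) {
        // ... consume the buffer ...
        if (--outstanding == 0) {
            outstanding = 2;
            subscription.request(2);    // ask for the next batch
        }
    }

    @Override
    public void onError(Throwable cause) { /* handle the error */ }

    @Override
    public void onComplete() { /* stream finished */ }
}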

Within Styx there are a few ByteStream consumers with different back pressure characteristics. Because a ByteStream is consumable with the standard Reactive Streams APIs, we aim to keep the underlying implementation flexible enough to cater for different use cases. The consumers within Styx are:

  • HttpResponseWriter - Consumes the response body in order to proxy it, that is, to send it across the network to the remote recipient.
  • HttpRequestOperation - As above, but for the request body.
  • HTTP content aggregator - Aggregates the entire request/response body into a single buffer. It therefore disables back pressure and consumes everything at once (see the example after this list).
  • Intermediate handlers and interceptors, although these are usually neutral from a back pressure point of view.
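
For example, an interceptor that needs the full message can aggregate the live request into a single-buffer HttpRequest. The sketch below assumes the aggregate(maxContentBytes), bodyAs(Charset) and stream() methods of the Styx message types; the one-megabyte limit is arbitrary:

import java.nio.charset.StandardCharsets;

import com.hotels.styx.api.Eventual;
import com.hotels.styx.api.HttpInterceptor;
import com.hotels.styx.api.LiveHttpRequest;
import com.hotels.styx.api.LiveHttpResponse;

// Sketch: collapse the streaming request body into one buffer before proceeding.
public class BodyInspectingInterceptor implements HttpInterceptor {
    @Override
    public Eventual<LiveHttpResponse> intercept(LiveHttpRequest request, Chain chain) {
        return request.aggregate(1_000_000)                      // Eventual<HttpRequest>
                .flatMap(fullRequest -> {
                    String body = fullRequest.bodyAs(StandardCharsets.UTF_8);
                    // ... inspect or transform the aggregated body here ...
                    return chain.proceed(fullRequest.stream());  // back to a live request
                });
    }
}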

Flow Control

When a ByteStream is sourced from a Netty channel (or, more generally, from a network socket), it can't discard user data. Instead it queues all excess data until its consumer (the Reactive Streams Subscriber) is able to accept more.

Queuing becomes a problem when one side (upstream or downstream) is significantly faster than the other. Consider a slow client downloading a large file, say 1 GB, from a fast server. The file would end up sitting in the Styx receive queue until the client gradually consumes it. It doesn't take many such clients before Styx runs out of memory:

  • Styx will exhaust Netty direct memory.
  • Or it will exhaust the JVM heap. (Normally less direct memory is needed than heap.)

Styx implements flow control to prevent this problem. Flow control matches the upstream and downstream speeds to minimise the need for queuing. This is how it works (a sketch follows the list):

  • The ByteStream data producer queues all excess data packets.
  • It suspends the Netty channel by setting its autoRead property to false. This stops Netty from issuing further channelRead events until we ask for more. The TCP receive window then fills up, which eventually slows down the remote sender.
  • Reactive request()s from the subscriber dequeue some of the queued content. When the queue is exhausted, Styx asks Netty for more data.
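
In Netty terms, the producer side looks roughly like the sketch below. This is a simplification, not the actual Styx class: the real producer is a state machine that also handles completion, errors and threading, and it only suspends reading when the subscriber is falling behind:

import java.util.ArrayDeque;
import java.util.Queue;

import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;

// Simplified sketch of channel-level flow control.
class ContentQueue {
    private final Channel channel;
    private final Queue<ByteBuf> queued = new ArrayDeque<>();

    ContentQueue(Channel channel) {
        this.channel = channel;
    }

    // Called from the Netty IO thread on every channelRead event.
    void onContent(ByteBuf chunk) {
        queued.add(chunk);
        channel.config().setAutoRead(false);     // stop reading; the TCP window fills up
    }

    // Called when the reactive subscriber request()s more content.
    ByteBuf dequeue() {
        ByteBuf next = queued.poll();
        if (queued.isEmpty()) {
            channel.config().setAutoRead(true);  // resume reading from the socket
        }
        return next;
    }
}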

Ultimately the ByteStream subscriber uses back pressure requests to control the rate at which the publisher produces events.

The Styx HttpResponseWriter (and the request writer) consumes a ByteStream one buffer at a time. It requests the next buffer only after the preceding one has been successfully written to the Netty channel.
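
The pattern is roughly the following sketch; writeToChannel is a hypothetical stand-in for the actual asynchronous Netty write:

import java.util.concurrent.CompletableFuture;

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

import com.hotels.styx.api.Buffer;

// Sketch: requests exactly one buffer, and the next one only after the
// previous write has completed.
class OneAtATimeWriter implements Subscriber<Buffer> {
    private Subscription subscription;

    @Override
    public void onSubscribe(Subscription subscription) {
        this.subscription = subscription;
        subscription.request(1);
    }

    @Override
    public void onNext(Buffer buffer) {
        writeToChannel(buffer)                            // hypothetical async write
                .thenRun(() -> subscription.request(1));  // only then ask for more
    }

    @Override
    public void onError(Throwable cause) { /* tear down the connection */ }

    @Override
    public void onComplete() { /* flush and complete the response */ }

    private CompletableFuture<Void> writeToChannel(Buffer buffer) {
        return CompletableFuture.completedFuture(null);   // placeholder
    }
}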

FlowControllingHttpContentProducer

A ByteStream is agnostic about its underlying data source. You can create a ByteStream from a Java String like so:

ByteStream.from("Hello!", UTF_8);

Or from another Publisher:

Publisher<Buffer> publisher = createPublisher(...);
ByteStream stream = new ByteStream(publisher);

The FlowControllingPublisher and FlowControllingHttpContentProducer classes implement a reactive Publisher to source ByteStream content from a Netty channel. The former is just a thin shim that implements the Publisher interface, while the latter does all the heavy lifting.

The primary purpose of FlowControllingHttpContentProducer is to implement non-blocking back pressure and to enforce the Reactive Streams event ordering. It also mediates between two asynchronous event sources:

  • Netty pipeline - produces content data (Buffer) events on the Netty IO thread.
  • Content subscriber - generates reactive back pressure requests, and runs on a separate thread.

Because it publishes "real time" networking events, there are some additional considerations:

  • It cannot lose any user data. All content chunks are queued until a subscriber subscribes.
  • It allows only one subscription. There is little point in a second subscription joining halfway through the stream.
  • It has to deal with network errors, timeouts, and so on.

We implemented FlowControllingHttpContentProducer using a Finite State Machine (FSM).

An FSM is a great tool for managing complexity in an asynchronous environment. It makes it easy to reason about the state of the content stream, which would otherwise be notoriously difficult. Perhaps most importantly, we can enumerate all possible state/transition pairs to ensure every scenario is covered.
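
For illustration only, the shape of such an FSM is roughly as below. The state names are simplified and do not match the actual implementation; the design page linked at the end of this document describes the real state machine:

// Illustrative only. The real FlowControllingHttpContentProducer has more
// states and transitions; see the linked design document.
enum ProducerState {
    BUFFERING,      // content is arriving but no subscriber has attached yet
    STREAMING,      // a subscriber is attached; emit as much as was requested
    COMPLETED,      // the stream finished normally
    TERMINATED      // the stream was aborted by an error or a timeout
}

class ContentProducerFsm {
    private ProducerState state = ProducerState.BUFFERING;

    // Every external event (content chunk, subscription, request(n), channel
    // closure, timeout) is handled as a (state, event) pair like this one.
    void onSubscribed() {
        switch (state) {
            case BUFFERING:
                state = ProducerState.STREAMING;
                // start draining the queued chunks towards the subscriber
                break;
            default:
                // a second subscription is a protocol violation
                state = ProducerState.TERMINATED;
                break;
        }
    }
}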

Other important scenarios to consider are:

  • Subscriber times out
  • Network times out
  • Threading model and lock-free operation
  • TCP connection closure

The detailed design for the finite state machine is described here: https://github.com/HotelsDotCom/styx/wiki/Flow-Controlling-Content-Producer