Zuul 2.0 supports push messaging, i.e. sending messages from server to client. It supports two protocols for push messages: WebSockets and Server-Sent Events (SSE). The sample app demonstrates how to set up both WebSockets and SSE to enable push messaging with Zuul.
A Zuul Push server must authenticate each incoming push connection. You can plug your own custom authentication into the Zuul Push server by extending the abstract PushAuthHandler class and implementing its doAuth() method. Please refer to SamplePushAuthHandler as an example of how to do this.
Client Registration and Lookup
After successful authentication, Zuul Push registers each authenticated connection against the client or user identity so that it can be looked up later to send a push message to that particular client or user. You decide what goes into this identity by implementing the PushUserAuth interface and returning an instance of it from doAuth() after successful authentication. Please refer to SamplePushUserAuth as an example.
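As a rough sketch of how these two pieces fit together, the classes below use simplified stand-ins for Zuul's actual PushUserAuth and PushAuthHandler types (the real doAuth() receives the Netty HTTP request and the real interfaces have more methods); the `user:<id>` token check is purely illustrative:

```java
// Simplified stand-ins for Zuul's actual types, so this sketch is
// self-contained; the real interfaces have richer signatures.
interface PushUserAuthSketch {
    boolean isSuccess();
    String getClientIdentity(); // identity the connection is registered under
}

abstract class PushAuthHandlerSketch {
    // In Zuul this receives the incoming Netty HTTP request;
    // a bare token stands in for it here.
    protected abstract PushUserAuthSketch doAuth(String authToken);
}

// Hypothetical identity: one user id per connection.
final class SampleUserAuthSketch implements PushUserAuthSketch {
    private final String userId;
    private final boolean success;

    SampleUserAuthSketch(String userId, boolean success) {
        this.userId = userId;
        this.success = success;
    }

    @Override public boolean isSuccess() { return success; }
    @Override public String getClientIdentity() { return userId; }
}

final class SampleAuthHandlerSketch extends PushAuthHandlerSketch {
    // Illustrative check only: accept tokens of the form "user:<id>".
    @Override
    protected PushUserAuthSketch doAuth(String authToken) {
        if (authToken != null && authToken.startsWith("user:")) {
            return new SampleUserAuthSketch(authToken.substring("user:".length()), true);
        }
        return new SampleUserAuthSketch(null, false);
    }
}
```

On success the handler returns an identity the registry can key on; on failure it returns an unsuccessful result and the connection is rejected.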
Each Zuul Push server maintains a local, in-memory registry of all the clients connected to it, using PushConnectionRegistry. For a single-node push cluster this in-memory local registry is sufficient. In a multi-node push cluster, a second-level, off-box global datastore is needed to extend the push registry beyond a single machine. In that case, looking up a particular client follows a two-step process. First, you look up, in the off-box global push registry, the push server to which the specified client is connected. That lookup returns the push server, which can then look up the actual client connection in its local, in-memory push registry.
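The two-step lookup can be sketched with plain maps standing in for the global registry and each server's local PushConnectionRegistry; the names and shapes here are illustrative, not Zuul's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two-step lookup in a multi-node push cluster.
final class PushLookupSketch {
    // Step 1 store: client id -> address of the push server holding the connection.
    static final Map<String, String> globalRegistry = new HashMap<>();

    // Step 2 store, one per server: client id -> live connection
    // (a String stands in for the real connection object).
    static final Map<String, Map<String, String>> localRegistries = new HashMap<>();

    static void register(String clientId, String server, String connection) {
        globalRegistry.put(clientId, server);
        localRegistries.computeIfAbsent(server, s -> new HashMap<>())
                       .put(clientId, connection);
    }

    // Two-step lookup: global registry first, then that server's local registry.
    static String lookupConnection(String clientId) {
        String server = globalRegistry.get(clientId);       // step 1
        if (server == null) return null;                    // client not connected
        return localRegistries.get(server).get(clientId);   // step 2
    }
}
```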
You can integrate an off-box, global push registry with Zuul Push by extending PushRegistrationHandler and overriding its registerClient() method. Zuul Push lets you plug in any datastore of your choice as the global push registry, but for best results the chosen datastore should support the following features:
- Low read latency
- TTL or automatic record expiry of some sort.
Having these features means your push cluster can horizontally scale to millions of push connections if needed. Redis, Cassandra, and Amazon DynamoDB are just a few of the many good choices for the global push registry datastore.
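As a sketch, a registration handler override would write the client-to-server mapping into the external store with a TTL. Everything below — the TtlStore type and the simplified registerClient() signature — is illustrative, not Zuul's actual PushRegistrationHandler API (a real TTL-capable store would be something like Redis SETEX or DynamoDB TTL):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for an off-box store that supports TTL.
final class TtlStore {
    static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();

    void putWithTtl(String key, String value, long ttlSeconds) {
        data.put(key, new Entry(value, System.currentTimeMillis() + ttlSeconds * 1000));
    }

    String get(String key) {
        Entry e = data.get(key);
        if (e == null || System.currentTimeMillis() > e.expiresAtMillis) return null;
        return e.value;
    }
}

// Sketch of a registration handler that records each new connection in the
// global registry; the method name mirrors registerClient() but the
// signature here is simplified.
final class GlobalRegistrationSketch {
    private final TtlStore globalRegistry;
    private final String thisServerAddress;
    private final long registrationTtlSeconds;

    GlobalRegistrationSketch(TtlStore store, String serverAddress, long ttlSeconds) {
        this.globalRegistry = store;
        this.thisServerAddress = serverAddress;
        this.registrationTtlSeconds = ttlSeconds;
    }

    void registerClient(String clientIdentity) {
        // Map client -> this server, expiring automatically so stale records
        // disappear if a server dies without deregistering its clients.
        globalRegistry.putWithTtl(clientIdentity, thisServerAddress, registrationTtlSeconds);
    }
}
```

The automatic expiry is what makes the registry self-healing: a crashed server's registrations simply age out instead of pointing lookups at a dead node.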
Accepting new push connections
SampleWebSocketPushChannelInitializer and SampleSSEPushChannelInitializer demonstrate how to set up the Netty channel pipeline for accepting incoming WebSocket and SSE connections respectively. These classes set up authentication and registration handlers for every new incoming connection, based on the protocol being used.
Load balancers vs WebSockets and SSE
Push connections differ from normal request/response style HTTP connections in that they are persistent and long-lived. Once the connection is made, it is kept open by both client and server even when there are no requests pending. This throws off many popular load balancers, which cut the connection after some period of inactivity. Amazon Elastic Load Balancers (ELB) and older versions of HAProxy and Nginx all have this issue. You basically have two choices to make your push cluster work with your load balancer:
- Either use the latest version of a load balancer that supports WebSocket proxying — such as a recent HAProxy or Nginx, or (in the Amazon cloud) an Application Load Balancer (ALB) instead of an ELB — or
- Run your existing load balancer as a layer 4 TCP load balancer instead of a layer 7 HTTP load balancer. Most load balancers, including ELBs, support a mode in which they act as a TCP load balancer. In this mode they just proxy TCP packets back and forth without trying to parse or interpret any application protocol, which generally fixes the issue.
You probably also need to increase your load balancer's idle timeout, as the default is usually on the order of seconds and almost always insufficient for the typical long-lived, persistent, and mostly idle push connections.
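For example, a recent Nginx can proxy WebSocket upgrades and stream SSE with a raised read timeout; the upstream name and path below are placeholders for your own deployment:

```nginx
# Placeholder upstream and path; adjust to your deployment.
location /push {
    proxy_pass http://zuul_push_cluster;
    proxy_http_version 1.1;                   # required for the WebSocket upgrade
    proxy_set_header Upgrade $http_upgrade;   # pass the Upgrade header through
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;                 # raise idle timeout well above the default
    proxy_buffering off;                      # needed for SSE so events flush immediately
}
```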
| Configuration option | Default value |
| --- | --- |
| Record expiry (TTL) of a client registration record in the global registry | 1800 seconds |
| Randomization window for each client's maximum connection lifetime; helps spread subsequent client reconnects across time | 180 seconds |
| Number of seconds the server waits for the client to close the connection before closing it forcefully from its side | 4 seconds |
If you use Netflix OSS Archaius module, you can change all of the above configuration options at runtime and they will take effect without having to restart the server.