
SignalR core: confusion about the usage of sticky sessions when scaling out the app #11678

Closed
EnricoMassone opened this issue Jun 28, 2019 · 5 comments
Labels
area-signalr Includes: SignalR clients and servers



EnricoMassone commented Jun 28, 2019

Hi,

we are migrating an old ASP.NET MVC application written in .NET Framework 4.7.1 to ASP.NET Core 2.2.

The process has been quite smooth so far, but we are facing some issues porting from the "old" SignalR library for .NET Framework 4.7.1 to the new .NET Core SignalR library. This is a summary of our current environment:

  • ASP.NET Core 2.2 MVC web application
  • Microsoft.AspNetCore.SignalR NuGet package version 1.1.0
  • aspnet-signalr JavaScript library version 1.1.4
  • deployed to a test environment on an Azure App Service
  • tested with Google Chrome version 75.0.3770.100
  • WebSockets enabled in our Azure App Service
  • ARR affinity disabled in our Azure App Service
  • client-side application built with AngularJS version 1.6.5
  • we only need to push notifications from the server to the clients; we never call the server from the client

When the App Service plan is scaled to a single instance, everything works like a charm. The JavaScript client connects to the web application using WebSockets as the transport, server notifications are received, and the client-side application keeps working when the browser page is reloaded. The application doesn't break even if WebSockets are turned off: in that case SignalR falls back to Server-Sent Events (SSE) and everything works as expected. In this configuration (a single instance in the App Service plan), turning ARR affinity on or off has no impact: everything works fine either way.

The trouble begins when we scale out the App Service plan to 2 instances.

We are aware of the issues pointed out in this guide: each instance of the web application only knows about the clients connected to it, and is completely unaware of the clients connected to the other instances. In our case this is not a problem. Each instance receives messages from a service bus, and we have a guarantee that every message is delivered to all existing instances, so every client is notified by the node it is connected to via SignalR. For this reason we use neither the Azure SignalR Service nor the Redis backplane.

The issues we see when scaling out are probably related to the fact that ARR affinity is disabled. Our application is completely stateless, and the old version of SignalR didn't require ARR affinity: we have always run with it disabled and never had any SignalR issue when scaling out the old .NET Framework 4.7.1 app.

The behavior we get when we scale out is the following: sometimes the clients work fine (they connect to the backend over WebSockets and receive notifications as expected), sometimes refreshing the browser page breaks things (after the reload, SignalR stops working), and other times SignalR doesn't work from the very first page load (no reload is needed to break the application). All of this is completely random: sometimes it happens and sometimes it doesn't. There is no clear error pattern.

Interestingly, we get two different types of error:

  • Sometimes the client-side application is unable to use WebSockets, so it falls back to SSE and works over SSE. This is unexpected, because WebSockets are enabled server side and our version of Google Chrome supports them, so we would expect WebSockets to always be used. I'll refer to this case as scenario A.
  • Other times, every attempt to negotiate a transport fails, and the client-side app is unable to receive notifications from the server. I'll refer to this case as scenario B.

Here are the errors we get in the Google Chrome console in scenario A:

[screenshot: Chrome console errors, scenario A]

Here are the errors we get in the Google Chrome console in scenario B:

[screenshot: Chrome console errors, scenario B]

Some guides online (for instance this blog post) seem to state that ARR affinity is not required when WebSockets are enabled, regardless of the number of instances of the web application. Put another way, ARR affinity seems to be required only when transports other than WebSockets are used.

So, here is my question: is ARR affinity always required when the app is scaled out to multiple instances, regardless of the transport used, even when both the server and the client are able to use WebSockets?

Thanks for helping

Enrico

@EnricoMassone EnricoMassone changed the title SignalR: confusion about usage of sticky sessions when scaling out SignalR core: confusion about the usage of sticky sessions when scaling out the app Jun 28, 2019
davidfowl (Member) commented:

> So, here is my question: is ARR affinity always required when the app is scaled out to multiple instances, regardless of the transport used, even when both the server and the client are able to use WebSockets?

Sorta. By default it's required, because SignalR will make at least two requests regardless of the transport: first the negotiate request, to determine which transports the server supports, and second the attempt to connect with the chosen transport. It may make further requests to other transports as it tries to fall back. All of this requires sticky sessions, because SignalR Core requires the transport request to go back to the same server the negotiate request was made on. Without ARR affinity it'll fail with the 404 you're seeing above: SignalR stores local state about a connection on the machine the connection was made to.
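
To make this concrete, here is a rough sketch of those two requests issued by hand with plain browser APIs. The hub path /notificationsHub is illustrative, and the JS client normally performs both steps internally:

    // 1) Negotiate: ask the server for a connection id and its supported transports.
    async function sketchHandshake() {
        const res = await fetch("/notificationsHub/negotiate", { method: "POST" });
        const { connectionId, availableTransports } = await res.json();
        console.log("transports offered by this instance:", availableTransports);

        // 2) Transport connect: this request must reach the SAME instance that
        //    issued connectionId; without sticky sessions another instance will
        //    not recognize the id and responds with the 404 mentioned above.
        const scheme = location.protocol === "https:" ? "wss" : "ws";
        return new WebSocket(`${scheme}://${location.host}/notificationsHub?id=${connectionId}`);
    }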

Furthermore, for non-WebSocket transports, sending from client to server requires sticky sessions (I know you don't care about this scenario). It also matters for long polling: even when only receiving from server to client, long polling makes multiple requests, and those need to go back to the same server where that state is stored.

Now for the more nuanced answer: SignalR Core does support a direct-to-WebSocket connection that avoids the negotiate request, via a client-side option called skipNegotiation. Use it when you know the server supports WebSockets and you don't want to fall back to other transports. You can give it a try and see if it works for your scenario.
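
With the JavaScript client, that option looks roughly like this (a minimal sketch; the hub URL is illustrative):

    // Connect straight over WebSockets and skip the negotiate request entirely.
    // skipNegotiation only works when the transport is pinned to WebSockets.
    const connection = new signalR.HubConnectionBuilder()
        .withUrl("/notificationsHub", {
            skipNegotiation: true,
            transport: signalR.HttpTransportType.WebSockets
        })
        .build();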

PS: The Azure SignalR Service handles this for you, so you can avoid making your web tier more stateful.

EnricoMassone (Author) commented Jun 28, 2019


Hi,

thanks for replying.

So, summarizing, there are three options available:

  1. switch ARR affinity on: this way the app will work regardless of the transport used
  2. leave ARR affinity off and skip negotiation on the client side: this should work, but we lose the ability to negotiate the transport; the only supported transport in this configuration is WebSockets
  3. use the Azure SignalR Service and offload connection handling to it

Unfortunately we can't adopt the third solution, because several of our customers have their own infrastructure or want to run their installation in clouds other than Azure. The only viable solutions for us are the first and the second.

The main concern with the second solution is that, if WebSockets are unavailable in some customer networks for any reason, our app won't work properly, and we have to consider quite a wide range of different scenarios.

I have a question related to the first solution (switching ARR affinity on).

Some of our customers have multi-data-center installations in Azure: we deploy our application in two different data centers and use a traffic manager to route client requests to the appropriate data center.

Based on my understanding, the ARR affinity available in Azure App Service works at the load-balancer level using a cookie, while the traffic manager sits in front of the load balancers of the two data centers. How can I get sticky sessions in such a scenario? Is Azure ARR affinity able to handle it?

@BrennanConroy BrennanConroy added the area-signalr Includes: SignalR clients and servers label Jun 28, 2019
EnricoMassone (Author) commented Jul 2, 2019

Hi,

I tried the solution of skipping negotiation from the JS client, and I confirm that it works fine, even when the web application is scaled out to multiple instances with ARR affinity switched off. That's great for us.

Just as a reference for other readers, the required connection options are the following:

    const options = {
        skipNegotiation: true,
        transport: signalR.HttpTransportType.WebSockets // numeric value 1
    };
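
For completeness, here is a minimal sketch of where these options plug in (the hub URL "/notificationsHub" and the event name "notify" are illustrative placeholders, not our real names):

    // Build the connection with the options above. We only listen for
    // server-to-client pushes; the client never invokes hub methods.
    const connection = new signalR.HubConnectionBuilder()
        .withUrl("/notificationsHub", options)
        .build();

    connection.on("notify", payload => console.log("notification:", payload));
    connection.start().catch(err => console.error("SignalR start failed:", err));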

With regard to my question about the traffic manager scenario, I found this documentation, which seems to confirm that sticky sessions are not available when running behind the Azure traffic manager. Quote from the documentation:

> Therefore, Traffic Manager has no way to track individual clients and cannot implement 'sticky' sessions.

Using sticky sessions instead of skipping negotiation could still be helpful for us in scenarios where, for whatever reason, the WebSocket transport is not available on the users' side.

Can you confirm that there is no way to have sticky sessions in our multi-data-center installations?

Thanks for helping

Enrico

RyanHill-MSFT commented:

@EnricoMassoneDeltatre as far as I know, in multi-data-center installations traffic is routed only at the traffic manager level. And since that routing, as you noted earlier, is purely DNS-based, there is no knowledge of any session state. In cases where your product can't make use of WebSockets, I think you'll have to enable ARR affinity.

Hope that helps, even though I know it's more than likely not the answer you're looking for.

EnricoMassone (Author) commented Jul 3, 2019


Hi @RyanHill-MSFT, thanks for replying.

We will go with the WebSockets-only option (skipping transport negotiation), because we need to support the multi-data-center scenario. The best compromise seems to be making WebSocket support a prerequisite for installing our product.

Unfortunately, this change in SignalR's behavior between the old version and the .NET Core version (the old one was stateless, while the new one is stateful) can be quite painful for cloud applications. To be fair, the same limitation affects similar products such as socket.io, as documented here, so the design seems reasonable.

Today I noticed that the sticky-sessions requirement for scaling out is actually documented here; I hadn't noticed it before opening this issue.

Thanks for the help!

ghost locked as resolved and limited conversation to collaborators Dec 3, 2019