Blazor signalr connection breaks after syncing large dom / dom element #9683

tn-5 · 2019-04-23T14:13:02Z

Describe the bug

On login in our application we load a logo which is displayed. Normally this is quite small but since it is user specific it sometimes can be large. With preview 4 the server connection is broken when this is too large. The client then "freezes" - i.e. does not respond to clicks. The clicks do generate websocket messages but no responses. After a timeout the auto reconnect establishes a new connection and everything is then ok (until the next large dom sync).

This worked fine in preview 3.

To Reproduce

I tried to reproduce with a small update to the template project but could not. Let me know if I should try again. It is 100% reproducible in our application.

I've artificially generated various size images and it seems to break when the image is between 25440 and 25450 bytes. Tracing in chrome the websocket message for this dom sync is between 43772 and 43784 bytes (working / non-working).

Please let me know if there is anything I can do to get more diagnostics on this for the case where it fails.

danroth27 · 2019-04-24T16:24:26Z

It sounds like you are hitting the buffer size limit in SignalR.

@anurse Is there any reason to not remove/increase the limit on the outbound buffer size?

analogrelay · 2019-04-24T17:02:04Z

Is there any reason to not remove/increase the limit on the outbound buffer size?

The main reason for the limit is to ensure that clients can't over-allocate server resources. I think I vaguely remember that this may not be as much of an issue in recent previews because of how Pipelines tracks the limit now (+ @davidfowl)

danroth27 · 2019-04-24T17:07:13Z

The main reason for the limit is to ensure that clients can't over-allocate server resources.

I can understand limiting the inbound buffer size for this reason, but doesn't the server completely control how much data it wants to send to the client?

davidfowl · 2019-04-24T17:49:07Z

Turns out in preview4 we shipped with a 32 MB limit right? Are you sure you're hitting that?

danroth27 · 2019-04-24T19:37:34Z

@tn-5 does anything show up in the server logs to indicate the nature of the error?

analogrelay · 2019-04-25T02:42:25Z

doesn't the server completely control how much data it wants to send to the client?

I'm having deja vu about this conversation. I feel like we talked about this before and removed the outbound limit. Or maybe Blazor configured it away... I can't find the issue though.

Turns out in preview4 we shipped with a 32 MB limit right? Are you sure you're hitting that?

This is true. Due to a typo we actually had a 32 MB limit in preview 4, so it doesn't seem too likely this would be the case.

BrennanConroy · 2019-04-25T03:11:59Z

doesn't the server completely control how much data it wants to send to the client?

Yes, so you can set the limit higher if you'd like.

tn-5 · 2019-04-25T04:22:45Z

After some more testing I'm convinced it is not related to a buffer size per se. At this point I'm thinking it is either a timing issue or something related to the actual dom diff code itself. I've tested very large dom syncs (total dom + individual component) - this all works 100%. In this specific instance the reproducible error happens when I request the data from an async api. I'll investigate further to see if it can be reproduced more reliably.

tn-5 · 2019-04-25T10:33:16Z

I've managed to reproduce this reliably with some small changes to the template project. it seems to be a time related issue linked to a large dom sync. The large dom makes some code take a bit longer and then the signalr messages sequences are different. It happens specifically when navigating from one page to another.

Trace detail

Working case:

trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 75 bytes.
trce: Microsoft.AspNetCore.Components.Browser.Rendering.RemoteRenderer[101]
      Begin remote rendering of components on client bsz_lKfQEEbDuvRBtUAOmw.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 86 bytes.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 123, EndOfMessage: True.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 93, EndOfMessage: True.
dbug: Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher[1]
      Received hub invocation: InvocationMessage { InvocationId: "", Target: "BeginInvokeDotNetFromJS", Arguments: [ 6, Microsoft.AspNetCore.Components.Server, NotifyLocationChanged, 0, ["http://localhost:49993/"] ], StreamIds: [  ] }.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 26, EndOfMessage: True.
dbug: Microsoft.AspNetCore.Components.Server.Circuits.RemoteUriHelper[0]
      Location changed to 'http://localhost:49993/'.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 86 bytes.

Failure case:

trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 75 bytes.
trce: Microsoft.AspNetCore.Components.Browser.Rendering.RemoteRenderer[101]
      Begin remote rendering of components on client S5PSgzO0YLBrGnYHx1rSHQ.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 86 bytes.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 123, EndOfMessage: True.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 93, EndOfMessage: True.
dbug: Microsoft.AspNetCore.SignalR.Internal.DefaultHubDispatcher[1]
      Received hub invocation: InvocationMessage { InvocationId: "", Target: "BeginInvokeDotNetFromJS", Arguments: [ 6, Microsoft.AspNetCore.Components.Server, NotifyLocationChanged, 0, ["http://localhost:49993/"] ], StreamIds: [  ] }.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[9]
      Message received. Type: Binary, size: 26, EndOfMessage: True.
trce: Microsoft.AspNetCore.Http.Connections.Internal.Transports.WebSocketsTransport[11]
      Sending payload: 86 bytes.
dbug: Microsoft.AspNetCore.Components.Server.Circuits.RemoteUriHelper[0]
      Location changed to 'http://localhost:49993/'.

You'll see in the failure case there is a websocketstransport message between the "invoke notifylocationchanged" and the "remoteurihelper - location changed to..." message.

Reproduce

dotnet new blazorserverside
Add large dom element to counter.razor
<img src="data:image/png;base64, @dummydata" alt="[LOGO]" />
Initialise to large data (500k should reliably break it) - functions block:

string dummydata = "";
protected override async Task OnInitAsync()
{
    Random rnd = new Random();
    int i = rnd.Next(32, 127);
    dummydata  = new string((char)(i), 500000); 
}

At this point the application works fine, navigating between different pages works as expected

What seems to then break it is when navigation occurs from another layout (it may also be that there are more different elements to sync). Anyway -, to reproduce:

Create a layout file Layout1.razor (no specific content required):

@inherits LayoutComponentBase
<div>@Body</div>

Change the fetchdata.razor file to use this layout
Add a button on fetchdata to enable navigation back to the counter page (due to the layout change this is required):
<button onclick="@(() => UriHelper.NavigateTo("counter"))">Counter</button>
This also requires: @inject IUriHelper UriHelper

In this scenario the issue reproduces reliably:

Run application
Navigate to "fetchdata" page
Navigate to "counter" page
Clicks are not being processed. After a timeout the connection is reestablished and it works again

javiercn · 2019-04-30T19:49:25Z

@danroth27 I think I saw this the other day when trying out a separate bug.

I think what is happening is that the server takes a long time to process the event/render and the signalr connection gives up along the way.

I will investigate more, but I’m not sure we will be able to do anything reasonable here.

I was rendering 100.000 elements at the time, so I didn’t give it much importance given that I was able to reliably render at least 1000 and that if you render large payloads these things are going to happen eventually.

I also have a proposal based on this to not produce large batch updates but to split them into smaller ones to improve responsiveness.

For the record, I would avoid delivering assets this way.

analogrelay · 2019-04-30T23:54:06Z

<img src="data:image/png;base64, @dummydata" alt="[LOGO]" />

For the record, I would avoid delivering assets this way.

I'd definitely agree with @javiercn here. SignalR isn't really designed with large message payloads in mind. It's designed for frequent small-to-medium size updates. Generally, when users ask about transferring large files "through" SignalR, we strongly recommend hosting the content at some URL (blob storage, CDN, etc.) and sending the URL through SignalR instead of the actual content.

tn-5 · 2019-05-01T04:16:51Z

I agree in general that this should not be done, at least not without a good reason.

In our use case this logo is a client logo in a large multi-tenant installation. The logo is stored in the database and retrieved from an API and then displayed. Normally the logo should be small (< 20k) and retrieved only once (since it is present on the main layout across all pages).

We are designing our Blazor app primarily to be used client side (wasm) which means the code will directly receive this data from the api using our normal authentication etc.

We only found this issue originally since there was a misconfiguration on one tenant (resulting in a very large logo - several 100k).

@javiercn I am concerned about the way this is exhibiting itself. In the failure case all the data is actually transferred correctly and the client side seems ok - it still keeps sending signalr messages to the server. It seems more likely that some state in the signalr handling on the server side is getting confused than a a simple timeout (we are talking total transfer times of several ms here, not seconds). I am guessing that this is rather indicative of a bug related to receiving responses out of an expected sequence.

davidfowl · 2019-05-01T04:22:27Z

@pranavkm didn’t you fix this issue the other day in the message pack implementation?

analogrelay · 2019-05-01T18:35:46Z

In our use case this logo is a client logo in a large multi-tenant installation. The logo is stored in the database and retrieved from an API and then displayed.

Generally we would also recommend not storing the image content in the database (instead storing it on blob storage and storing a link) but I realize you may not want to redesign your entire app :).

I'd still advise you consider using a standard server-side ASP.NET Core middleware to allow you to fetch a tenant's image using standard HTTP given the necessary data (tenant ID, authentication, etc.) and then use that URL in the image tag. data URLs require transferring the entire image on every render which means you can't take advantage of browser caching. If, instead, you fill in a URL like <img src=/api/tenant/{tenantId}/logo />, the browser can cache that image appropriately and save a lot of wasted bandwidth (even for small logos) on full-page refreshes.

This is a little tangential of course, since we do still need to investigate the Blazor issue further. Just throwing my design thoughts out there 😸

tn-5 · 2019-05-02T05:28:06Z

@davidfowl I've tested again with latest nightly (11681) and the problem is still there, so it is not fixed.

When I tested now I realised that it only happens when the navigation is performed by using NavigateTo from the UriHelper.

In the repro example you can copy Mainlayout.razor to Layout1.razor. When navigating using the button click (which uses NavigateTo) it fails. When clicking the link in the navpane it works.

So it seems the issue is somehow related to the UriHelper implementation

davidfowl · 2019-05-02T05:45:08Z

I dunno if the fix is merged but there is an issue that @pranavkm found in the current impl. I dunno if it's related to this but it's a bug in the protocol parsing which might be related.

tn-5 · 2019-05-08T05:56:43Z

I've investigated this in more detail. The root cause is that there is an issue with the lock handling in the signalR HubConnectionContext.

Specifically, the use case I have fails if the following sequence of events happen:

A call to WriteAsync does not complete immediately and it returns a CompleteWriteAsync task
Immediately afterwards another message is sent, no immediate lock is obtained and it returns a CompleteWriteAsync task.

In this scenario task [2] is triggered when task [1] releases the lock (but before the actual return from CompleteWriteAsync). Upon completion of [2] the _writeLock count is 1 (which is correct). Control flows back to CompleteWriteAsync(to return) and at that point the _writeLock count is 0 (incorrect).

At this point the lock seems to be allocated (although it shouldn't be). No further comms is then allowed until the connection is reset.

I am not sure what the root cause of this is, it may be a subtle issue with the ValueTask being awaited (maybe it caches the same one & then awaits it multiple times which is not allowed?).

So to summarise this issue occurs when the CompleteWriteAsync is still busy and another message needs to be sent and ends up in WriteSlowAsync, awaiting the lock.

davidfowl · 2019-05-08T06:14:16Z

@anurse Can we get somebody to look at this? Thanks for looking deeper @tn-5

analogrelay · 2019-05-08T18:19:13Z

@BrennanConroy can you take a look?

ghost · 2019-05-21T15:00:41Z

I have the same error when I try to transfer a html page with a weight of 23,000 characters (no pictures, just html), the application crashes as described in the first post.

analogrelay · 2019-05-21T21:16:23Z

We're continuing to dig through this. Moving out to preview7 because we aren't confident we can have a fix for preview6.

davidfowl · 2019-05-22T05:07:54Z

We've identified the issue and @rynowak is working on a fix

Fixes: #9683 - SignalR connection breaks on large DOM The root cause here is a misbehaving sync context. It's not legal for a Post() implementation to run a callback synchronously. We want that behavior for most of the functionality Blazor calls directly, but Post() should always go async or else various threading primitives are broken.

* Fix RendererSyncContext.Post() Fixes: #9683 - SignalR connection breaks on large DOM The root cause here is a misbehaving sync context. It's not legal for a Post() implementation to run a callback synchronously. We want that behavior for most of the functionality Blazor calls directly, but Post() should always go async or else various threading primitives are broken. * Fix incorrect tests These tests have the assumption that setting the result of a TCS will execution continuations synchronously. This was a bug in our SyncContext, and these tests needed updating to be more resiliant. * Remove a delegate allocation

Eilon added the area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates label Apr 23, 2019

danroth27 added the area-blazor Includes: Blazor, Razor Components label Apr 23, 2019

danroth27 added the bug This issue describes a behavior which is not expected - a bug. label Apr 25, 2019

limefrogyank mentioned this issue Apr 28, 2019

Blazor SignalR message corruption #9808

Closed

danroth27 added the investigate label Apr 30, 2019

danroth27 added this to the 3.0.0-preview6 milestone Apr 30, 2019

danroth27 assigned javiercn Apr 30, 2019

mkArtakMSFT mentioned this issue May 1, 2019

Blazor Preview4 increase buffer size #9570

Closed

mkArtakMSFT removed area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates labels May 9, 2019

mkArtakMSFT added the PRI: 1 - Required label May 17, 2019

BrennanConroy unassigned javiercn May 17, 2019

analogrelay added the blocked The work on this issue is blocked due to some dependency label May 21, 2019

analogrelay modified the milestones: 3.0.0-preview6, 3.0.0-preview7 May 21, 2019

analogrelay pinned this issue May 21, 2019

analogrelay modified the milestones: 3.0.0-preview7, 3.0.0-preview6 May 21, 2019

analogrelay unpinned this issue May 21, 2019

BrennanConroy removed the area-signalr Includes: SignalR clients and servers label May 22, 2019

BrennanConroy assigned rynowak and unassigned BrennanConroy May 22, 2019

BrennanConroy removed the blocked The work on this issue is blocked due to some dependency label May 22, 2019

rynowak mentioned this issue May 22, 2019

Fix RendererSyncContext.Post() #10451

Merged

mkArtakMSFT added 2 - Working and removed investigate labels May 22, 2019

rynowak closed this as completed in #10451 May 23, 2019

rynowak added Done This issue has been fixed and removed 2 - Working labels May 23, 2019

BrennanConroy mentioned this issue May 24, 2019

Server-Side Blazor loses connection randomly without visible errors #10459

Closed

BrennanConroy mentioned this issue Jun 5, 2019

[Blazor server side] Error: Connection disconnected with error 'Error: Server timeout elapsed without receiving a message from the server.'. #10877

Closed

ghost locked as resolved and limited conversation to collaborators Dec 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blazor signalr connection breaks after syncing large dom / dom element #9683

Blazor signalr connection breaks after syncing large dom / dom element #9683

tn-5 commented Apr 23, 2019

danroth27 commented Apr 24, 2019

analogrelay commented Apr 24, 2019

danroth27 commented Apr 24, 2019

davidfowl commented Apr 24, 2019

danroth27 commented Apr 24, 2019

analogrelay commented Apr 25, 2019

BrennanConroy commented Apr 25, 2019

tn-5 commented Apr 25, 2019

tn-5 commented Apr 25, 2019 •

edited by analogrelay

Loading

javiercn commented Apr 30, 2019

analogrelay commented Apr 30, 2019 •

edited

Loading

tn-5 commented May 1, 2019

davidfowl commented May 1, 2019

analogrelay commented May 1, 2019

tn-5 commented May 2, 2019 •

edited

Loading

davidfowl commented May 2, 2019

tn-5 commented May 8, 2019

davidfowl commented May 8, 2019 •

edited

Loading

analogrelay commented May 8, 2019

ghost commented May 21, 2019

analogrelay commented May 21, 2019

davidfowl commented May 22, 2019

Blazor signalr connection breaks after syncing large dom / dom element #9683

Blazor signalr connection breaks after syncing large dom / dom element #9683

Comments

tn-5 commented Apr 23, 2019

Describe the bug

To Reproduce

danroth27 commented Apr 24, 2019

analogrelay commented Apr 24, 2019

danroth27 commented Apr 24, 2019

davidfowl commented Apr 24, 2019

danroth27 commented Apr 24, 2019

analogrelay commented Apr 25, 2019

BrennanConroy commented Apr 25, 2019

tn-5 commented Apr 25, 2019

tn-5 commented Apr 25, 2019 • edited by analogrelay Loading

Trace detail

Reproduce

javiercn commented Apr 30, 2019

analogrelay commented Apr 30, 2019 • edited Loading

tn-5 commented May 1, 2019

davidfowl commented May 1, 2019

analogrelay commented May 1, 2019

tn-5 commented May 2, 2019 • edited Loading

davidfowl commented May 2, 2019

tn-5 commented May 8, 2019

davidfowl commented May 8, 2019 • edited Loading

analogrelay commented May 8, 2019

ghost commented May 21, 2019

analogrelay commented May 21, 2019

davidfowl commented May 22, 2019

tn-5 commented Apr 25, 2019 •

edited by analogrelay

Loading

analogrelay commented Apr 30, 2019 •

edited

Loading

tn-5 commented May 2, 2019 •

edited

Loading

davidfowl commented May 8, 2019 •

edited

Loading