
"Writing is not allowed after writer was completed", with SignalR 1.1 #6701

Closed
viktorgullmark opened this issue Jan 15, 2019 · 26 comments

@viktorgullmark commented Jan 15, 2019

Describe the bug

I'm receiving "Writing is not allowed after writer was completed" on the latest release of SignalR with .NET Core 2.2. We previously used 1.0.0-rc1-final without any issues and started receiving this error as soon as we moved to 1.1 on the client. We also receive a lot of "Task was cancelled" exceptions from the RedisHubLifetimeManager. Screenshots are attached below.

We ran SignalR 1.1 on the backend with 1.0.0-rc1-final on the client without any issues, but when we upgraded our client to 1.1, we started receiving this in our logs.

Another change we made while upgrading was to add "skipNegotiate: true" on the client and to force WebSockets. I'm not sure if this is related. I can't reproduce the error locally, since it doesn't happen often and seems to affect only a few of our clients.
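
For reference, the configuration change amounts to roughly the following (a sketch using the .NET client for illustration; the JavaScript client exposes the same skipNegotiation and transport options, and the URL here is a placeholder):

using Microsoft.AspNetCore.Http.Connections;
using Microsoft.AspNetCore.SignalR.Client;

var connection = new HubConnectionBuilder()
    .WithUrl("https://example.com/ourhub", options =>
    {
        // Skip the negotiate request and go straight to WebSockets.
        options.SkipNegotiation = true;
        options.Transports = HttpTransportType.WebSockets;
    })
    .Build();

await connection.StartAsync();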

Also, the server seems to go down quite often since the upgrade. We can't find a reason for this, but it was stable before we upgraded our clients.

To Reproduce

We can't reproduce the error locally; we only see it in our production environment logs.

Expected behavior

Expected it to be gone with 1.1.0, since I saw previous reports of the problem on RC versions.

Screenshots

Screenshot of the "Writing is not allowed" message: [screenshot]

Screenshot of the "Task was cancelled" exception: [screenshot]

Additional context

dotnet --info output: [screenshot]
This is on our production environment. Locally we have the 2.2 runtime.

EDIT: If this issue is misplaced or improper, feel free to close it. I didn't know where else to turn.

@BrennanConroy (Member) commented Jan 15, 2019

The task canceled exception from Redis occurs when you try to add or remove a connection from a group where the connection either doesn't exist, or is on another server and the other server took too long to respond.
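
For anyone hitting the cancellation: the call that typically surfaces it is a group add/remove made from a hub method. A minimal sketch of tolerating the timeout (the hub and group names here are illustrative, not taken from this issue):

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    public async Task JoinRoom(string roomName)
    {
        try
        {
            // With the Redis backplane this call can be cancelled if the connection is
            // already gone or another server takes too long to acknowledge the change.
            await Groups.AddToGroupAsync(Context.ConnectionId, roomName);
        }
        catch (OperationCanceledException)
        {
            // The connection most likely disconnected mid-call; log at a low level or ignore.
        }
    }
}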

As for the "Writing not allowed" exception, how often are you seeing this? Do the server logs show that a client was disconnected around the same time the exception occurs?

@viktorgullmark (Author) commented Jan 17, 2019

Thanks for the input. We've resolved the Task canceled exception now thanks to your clarification.

We're seeing "Writing is not allowed after writer was completed" a few times every hour, with around 100 users online. We can't find a disconnect at that timestamp in the logs though 🤔

@muratg muratg added the investigate label Jan 23, 2019

@muratg muratg added this to the 3.0.0-preview3 milestone Jan 23, 2019

@BrennanConroy (Member) commented Jan 28, 2019

Could you capture logs around the exception, preferably at Debug level?
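
One way to do that is to raise just the SignalR-related categories to Debug on the host builder (a sketch that assumes the usual Program/Startup layout; the same filters can be set in appsettings.json instead):

using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Logging;

public class Program
{
    public static void Main(string[] args) => CreateWebHostBuilder(args).Build().Run();

    public static IWebHostBuilder CreateWebHostBuilder(string[] args) =>
        WebHost.CreateDefaultBuilder(args)
            .ConfigureLogging(logging =>
            {
                // Only turn up the categories relevant to SignalR and its transports.
                logging.AddFilter("Microsoft.AspNetCore.SignalR", LogLevel.Debug);
                logging.AddFilter("Microsoft.AspNetCore.Http.Connections", LogLevel.Debug);
            })
            .UseStartup<Startup>();
}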

@muratg muratg removed this from the 3.0.0-preview3 milestone Jan 30, 2019

@muratg muratg added the waiting label Jan 30, 2019

@bradygaster (Member) commented Feb 6, 2019

Closing this as we haven't heard back from you. Please feel free to comment if you're able to get the information we're looking for, and we can reopen the issue if we're able to identify the problem.

@bradygaster bradygaster closed this Feb 6, 2019

@imperugo commented Feb 6, 2019

Hi,
I'm getting the same error with this scenario:

.NET Core 2.2, SignalR 1.1, scaling using Redis (2.x) and hosting into Docker.

Unfortunately, I'm unable to reproduce the problem, and the log isn't very helpful ("Writing is not allowed after writer was completed").
Is there anything I can do to provide more information? (Add more logging somewhere, try another build, or anything else?)

I'm also using Groups, so it's probably the same problem as @viktorgullmark's, but I don't see the Task Canceled exception.

Let me know.

@codingsteff commented Feb 7, 2019

We have the same errors, but only in production, inside Docker behind Nginx.
The messages appeared after upgrading to .NET Core 2.2.

It doesn't matter whether we use SignalR 1.1 or 1.0 on the client side.

By the way, we don't use Groups or Redis.

@BrennanConroy BrennanConroy reopened this Feb 7, 2019

@BrennanConroy (Member) commented Feb 7, 2019

@imperugo Enabling Debug level logs would be nice. I'd like to see the logs around the failure if possible, as they can give context on what is happening on the server.

@codingsteff Can you confirm that switching from SignalR 1.0.0 to 1.1.0 on the server causes the issue? And could you also collect logs please?

@imperugo commented Feb 8, 2019

@BrennanConroy I'll switch logging to Debug for "Microsoft.AspNetCore.SignalR.*" only.

More news soon.

@codingsteff commented Feb 8, 2019

I rolled back our server to .NET Core 2.1 and we no longer have any errors in the log.
Currently all our clients use the JavaScript SignalR 1.1.0 package, and the server runs on .NET Core 2.1.1 with Microsoft.AspNetCore.SignalR 1.0.3.

Next week I can switch production to Microsoft.AspNetCore.SignalR 1.1.0, but because of its dependency on Microsoft.AspNetCore.Connections.Abstractions I need to switch to .NET Core 2.2 as well.

I'll send you some log entries.

@imperugo commented Feb 8, 2019

@BrennanConroy here are my logs:

[screenshot of logs, 2019-02-08 14:05]

@codingsteff commented Feb 13, 2019

@BrennanConroy here are the logs from the same ConnectionId:
[screenshot of logs]

And another one:
[screenshot of logs]

There are too many clients, so I'm not sure it's the right ConnectionId.

@fabiano commented Feb 13, 2019

We have the same errors here in two applications.

One is running on .NET Core 2.2.1 (deployed as self-contained). The other is running on .NET 4.7.1.

They are built using the SDK 2.2.103 and runtime 2.2.1.

The Microsoft.AspNetCore.SignalR package version is 1.1.0.
The @aspnet/signalr package version is 1.0.4.

We enabled SignalR logging on our servers, so soon I'll be able to share more data about the connections and errors.

Microsoft.AspNetCore.SignalR.HubConnectionContext -> Failed writing message. Aborting connection.
System.InvalidOperationException: Writing is not allowed after writer was completed.
   at System.IO.Pipelines.ThrowHelper.ThrowInvalidOperationException_NoWritingAllowed()
   at System.IO.Pipelines.Pipe.GetMemory(Int32 sizeHint)
   at System.IO.Pipelines.Pipe.DefaultPipeWriter.GetSpan(Int32 sizeHint)
   at System.IO.Pipelines.PipeWriter.WriteAsync(ReadOnlyMemory`1 source, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.SignalR.HubConnectionContext.<TryWritePingSlowAsync>d__49.MoveNext()
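
For context, the exception in that stack trace comes from System.IO.Pipelines itself: it is thrown whenever something writes to a PipeWriter that has already been completed. A standalone illustration of just that pipeline behavior (not SignalR's actual code):

using System;
using System.IO.Pipelines;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var pipe = new Pipe();

        // Simulate the transport shutting down: the writer gets completed...
        pipe.Writer.Complete();

        try
        {
            // ...and a later write (like the keep-alive ping) now fails.
            await pipe.Writer.WriteAsync(new byte[] { 0x06 });
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message); // "Writing is not allowed after writer was completed."
        }
    }
}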

@bradygaster bradygaster added bug and removed waiting labels Feb 13, 2019

@bradygaster bradygaster added this to the 3.0.0-preview6 milestone Feb 13, 2019

@BrennanConroy (Member) commented Feb 13, 2019

This is a benign error as it looks like the client is already disconnected.

We'll look at making this either not try to write after the client is gone, or at a minimum not be an error log.

@viktorgullmark (Author) commented Feb 13, 2019

@BrennanConroy Alright, thanks for looking into it!

@pashchuk commented Mar 26, 2019

Is there any progress on this one?
Can anyone confirm that this type of error can be safely ignored by our log filter (it constantly fires our alerting system, which is based on the Error log level) and does not indicate any issue in the running service?
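
For what it's worth, if filtering turns out to be the answer, a minimal sketch would be to raise the minimum level for the category shown in the log earlier in the thread (with the obvious downside that other errors from HubConnectionContext are hidden as well):

using Microsoft.Extensions.Logging;

// Inside ConfigureLogging(...) on the web host builder: suppress everything below
// Critical from the category that logs "Failed writing message. Aborting connection."
logging.AddFilter("Microsoft.AspNetCore.SignalR.HubConnectionContext", LogLevel.Critical);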

@pashchuk commented Mar 26, 2019

I can probably try to fix this and submit a PR, if no one on the team is already working on this issue or you're focused on higher-priority tasks.

@rfrancisco commented Apr 12, 2019

This is also affecting us. We have roughly 30k connections on each server, and we don't use any backplane between them. It might be a coincidence, but I noticed an increase in the number of exceptions of this type when I updated the TypeScript clients from 1.0.0-rc-final to 1.1.4. (The servers are running .NET Core 2.2.)

Here is an example of one second of logs from one of the servers:
[screenshot of logs]

Also, what would be your recommendation for handling these exceptions?

@rfrancisco commented Apr 13, 2019

Quoting my earlier comment: "Might be a coincidence but I noticed an increase in the number of exceptions of this type when I updated the TypeScript clients from 1.0.0-rc-final to 1.1.4."

I reverted the client script to the version compiled with 1.0.0-rc1-final and the exceptions disappeared after an hour or so. I moved back to the version compiled with 1.1.4 and the "System.InvalidOperationException: Writing is not allowed after writer was completed." exceptions started to occur again.

I did this 3 times and the results were always the same.

I also noticed a small drop in the number of connected users when I use the version compiled with 1.1.4. It's hard to know for sure because the number of connected users varies constantly, but whenever I published the version that caused the exceptions I would see a drop in connected users after a couple of minutes. In the last test the connections went down from 216K to 200K in 10 minutes, then went back up when I published the version with 1.0.0-rc-final. But again, this could just be a coincidence.

@anurse (Member) commented Apr 30, 2019

To confirm, is anyone seeing a functional problem here, or is it just a spurious exception message in the logs? I just want to make sure I understand whether this is actually causing a functional problem. It looks like it's the writing of the "ping" message that's failing, which is fine (though annoying to see in the logs).

We'll look at ensuring the pings are stopped before the pipeline is closed in preview 6.

@rfrancisco commented Apr 30, 2019

I believe there is a functional problem. We saw a reduction in connected users; that was the main reason we were forced to revert to the version compiled with 1.0.0-rc-final.

@anurse (Member) commented Apr 30, 2019

Hrm, I don't think that would be directly related to the call stack you're seeing, though. The ping logic is very carefully designed to be resilient to failures like this. In fact, if that log message is written it means the exception is not bubbling up, since we catch it and don't re-throw it:

catch (Exception ex)
{
    Log.FailedWritingMessage(_logger, ex);
}

The log message is unfortunately unclear about this. It says "Failed writing message. Aborting connection." because we share the same log message across multiple places, and in the other places where it occurs it does cause the connection to terminate. In this specific call stack, though, the connection is not terminated. We should fix that message when we make this change (cc @halter73).

The "Failed Connection Handshake" messages (referenced in your other issue #9327) are more likely to indicate a problem.

@kofifus commented May 7, 2019

So the message is confusing and unclear; can it please be changed?

@BrennanConroy (Member) commented May 7, 2019

Yes, that is part of what we're doing.

@BrennanConroy (Member) commented May 28, 2019

Acceptance checklist (check one item)

  • We decided not to take this fix.
  • The fix is tests-only.
  • The fix contains product changes (check all items below).
    • Relevant XML documentation comments for new public APIs are present.
    • Narrative docs (docs.microsoft.com) are updated. (check one item below)
      • The change requires a new article. An issue with an outline has been filed here: [ISSUE LINK]
      • The change requires a change to an existing article. A closed docs PR with these changes is linked here: [PR LINK]
      • The change requires no docs changes.
    • Verification has been completed. (check one item below)
      • The change is in the shared framework and was verified against the following versions
        • SDK installer: [3.0.100-preview6-012096]
        • ASP.NET Core Runtime: [3.0.0-preview6-19277-03]
        • .NET Core Runtime: [3.0.0-preview6-27727-02]
      • The change is in an OOB NuGet/NPM/JAR package and was verified against the following version of that package: [PACKAGE ID] [VERSION]
@Veikedo commented May 29, 2019

Hi all, same problem.

This is a benign error as it looks like the client is already disconnected.

No, the client is not disconnected at the time of the error. In our case the client is still waiting for the response and disconnects after the timeout.

Here is the exception from the client:

"lg-exception": "System.TimeoutException: Server timeout (30000.00ms) elapsed without receiving a message from the server.",
"lg-callsite": "Microsoft.AspNetCore.SignalR.Client.HubConnection+d__62.MoveNext:0",

@BrennanConroy (Member) commented May 29, 2019

Hi, we don't track closed issues. If you're seeing a problem, please open a new issue with the details.

We have a nice guide for collecting information that is often useful to include in issues: https://docs.microsoft.com/en-us/aspnet/core/signalr/diagnostics?view=aspnetcore-2.2
