StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: PUBLISH content #1147

Closed
Tadimsky opened this issue May 15, 2019 · 8 comments

Comments

@Tadimsky

We upgraded to v2 recently and have been seeing these exceptions since then. When this happens, we have to restart the service instance in order for it to reconnect.
However, about an hour after the restart, the same issue shows up again.

We're rolling back to the v1 library to mitigate the issue, but would like to find out what could be happening.
We're using the Azure Redis Cache for Pub/Sub only. The Redis instance is hosted in West US 2 and is used by services running in West US 2 and East US. In our case, only East US is running into these issues.
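
For reference, our publish path is roughly the sketch below (simplified; the connection string is a placeholder, and the real code is the RedisChangeNotifier shown in the stack trace):

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class PublishSketch
{
    // One shared multiplexer per process, created lazily.
    private static readonly Lazy<ConnectionMultiplexer> Connection =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(
                "intentcontent-prod-westus2.redis.cache.windows.net:6380,ssl=true,password=<elided>"));

    public static Task PublishAsync(string payload, bool fireAndForget)
    {
        ISubscriber subscriber = Connection.Value.GetSubscriber();
        CommandFlags flags = fireAndForget ? CommandFlags.FireAndForget : CommandFlags.None;

        // "content" matches the channel named in the exception (PUBLISH content).
        return subscriber.PublishAsync("content", payload, flags);
    }
}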

Here is the full exception information:

StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: PUBLISH content; An existing connection was forcibly closed by the remote host; IOCP: (Busy=1,Free=999,Min=4,Max=1000), WORKER: (Busy=0,Free=32767,Min=4,Max=32767), Local-CPU: n/a ---> StackExchange.Redis.RedisConnectionException: SocketFailure on intentcontent-prod-westus2.redis.cache.windows.net:6380/Interactive, Flushing/Faulted, last: PUBLISH, origin: ReadFromPipe, outstanding: 984, last-read: 18s ago, last-write: 12s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 7 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.600.65315 ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
   at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
   at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
   at Pipelines.Sockets.Unofficial.StreamConnection.AsyncStreamPipe.CopyFromStreamToReadPipe() in C:\Code\Pipelines.Sockets.Unofficial\src\Pipelines.Sockets.Unofficial\StreamConnection.AsyncStreamPipe.cs:line 97
   at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
   at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
   at System.IO.Pipelines.Pipe.GetReadAsyncResult()
   at StackExchange.Redis.PhysicalConnection.ReadFromPipe()
   --- End of inner exception stack trace ---
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.PublishMessage(ContentMessage message, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendSmallChangeAsync(String documentId, ReadOnlyMemory`1 payload, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendAsync(String documentId, ReadOnlyMemory`1 payload, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendAsync(String documentId, ReadOnlyMemory`1 payload, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendAsync(String documentId, ReadOnlyMemory`1 payload, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendAsync(String documentId, ReadOnlyMemory`1 payload, Boolean fireAndForget)
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ThrowFailed[T](TaskCompletionSource`1 source, Exception unthrownException) in C:\projects\stackexchange-redis\src\StackExchange.Redis\ConnectionMultiplexer.cs:line 2201
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.PublishMessage(ContentMessage message, Boolean fireAndForget)
   at Microsoft.Intentional.ContentService.Frontend.Storage.RedisChangeNotifier.SendLargeChangeAsync(String documentId, ReadOnlyMemory`1 payload)

Does anyone know whether the info in the exception message is helpful, and what we can do to investigate further?

@Tadimsky
Author

Here's a chart of the number of times the exception was hit. Our mitigation was to restart the services, which you can see where the number of exceptions drops to zero.

However, it then comes right back after an hour or so:

[chart: exception count over time; drops to zero at each service restart, then climbs again]

@Tadimsky
Author

The first exception that happens (before the connection collapses) does seem to be a TimeoutException. Here's an example:

The timeout was reached before the message could be written to the output buffer, and it was not sent, command=PUBLISH, timeout: 5000, outbound: 1KiB, inbound: 379KiB, inst: 0, qu: 1808, qs: 915, aw: True, bw: RecordingTimeout, rs: ReadAsync, ws: Flushing, in: 0, serverEndpoint: Unspecified/intentcontent-prod-westus2.redis.cache.windows.net:6380, mgr: 9 of 10 available, clientName: nt1vm000000, IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=6,Free=32761,Min=4,Max=32767), v: 2.0.600.65315 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

Both qs and qu look like large values, so a lot of commands are waiting to be written (and to be processed by the server).
The read socket looks to be in a good state (rs: ReadAsync), but the write socket is in the Flushing state. This seems to be the same error we see in the connection errors (in the first comment).
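
For what it's worth, the timeouts article linked in that message calls out low thread-pool minimums (both our IOCP and WORKER pools show Min=4), so one thing we can try at startup is raising them. A minimal sketch, with purely illustrative numbers:

using System.Threading;

public static class ThreadPoolConfig
{
    public static void RaiseMinimums()
    {
        // Min=4 in the error text is just the default on a 4-core machine.
        // 200/200 is illustrative only; tune for the actual workload.
        ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
    }
}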

@serhatkiyak

We are having the same issue; here's a sample log:

StackExchange.Redis.RedisConnectionException: No connection is available to service this operation: GET V3.DATASOURCE.OBJECTSTORE.ANID.CACHE.E40676519428E7F77223059FFFFFFFFF.MOCK:FALSE; A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond; IOCP: (Busy=0,Free=1000,Min=8,Max=1000), WORKER: (Busy=5,Free=32762,Min=8,Max=32767), Local-CPU: n/a ---> StackExchange.Redis.RedisConnectionException: SocketFailure on DFSEAP-Prod-BN2-p7.redis.cache.windows.net:6380/Interactive, Flushing/Faulted, last: SETEX, origin: ReadFromPipe, outstanding: 4175, last-read: 30s ago, last-write: 15s ago, unanswered-write: 37s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 7 of 10 available, in: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.0.601.3402 ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Our service uses multiple Azure Redis shards (7 of them) in this environment. We observe that this issue only occurs when the machines are running at high CPU usage and the Redis client loses connectivity to one Redis instance. From then on, the machines cannot reestablish the connection even after CPU usage returns to normal. On the Azure side, the problematic Redis account shows a noticeably higher max connected clients count than the rest of the accounts, which seems to indicate the machines keep creating new connections, none of which manage to reconnect:

[chart: max connected clients for the problematic Redis instance vs. the other shards]

And here are some views of machines that hit 100% CPU usage at some point, showing the correlation with the start time of these errors:
[charts: per-machine CPU usage reaching 100%, correlated with the onset of the connection errors]

Is there any known issue in the client library that may be causing this? We are using the StackExchange.Redis v2.0.601 NuGet package, which is the latest as of today, on .NET Framework 4.7.2.
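
In the meantime, we are considering attaching handlers to the multiplexer's connection events to capture more detail about the failed reconnect attempts. A rough sketch (Console.WriteLine stands in for our real logging):

using System;
using StackExchange.Redis;

public static class RedisDiagnostics
{
    public static void Attach(ConnectionMultiplexer muxer)
    {
        // Fires when an interactive/subscription connection drops.
        muxer.ConnectionFailed += (sender, e) =>
            Console.WriteLine($"Redis connection failed: {e.EndPoint} {e.ConnectionType} {e.FailureType} {e.Exception?.Message}");

        // Fires when the multiplexer manages to reconnect.
        muxer.ConnectionRestored += (sender, e) =>
            Console.WriteLine($"Redis connection restored: {e.EndPoint} {e.ConnectionType}");

        // Library-internal errors that may not surface as command exceptions.
        muxer.InternalError += (sender, e) =>
            Console.WriteLine($"Redis internal error: {e.Origin} {e.Exception?.Message}");
    }
}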

@Tadimsky
Author

We rolled back to 1.2.6 and are no longer seeing the issue in production. I don't think we'll be moving to v2 until this issue is understood and fixed.

I would love to figure out what's going on, though, since I'd like to move to v2 when it's ready.

@mstavrev

We are observing a similar problem. Our application runs on .NET Core 2.2. It is fully asynchronous, and the problem basically happens under load, when a large number of async tasks are being queued by the various parts of the application. Some of these tasks also perform Redis reads and writes.
We have double-checked the entire implementation and there is no mixing of sync and async calls, not only for Redis but in general. The app is fully async for Redis, SQL, and disk I/O.

Reading the documentation for the Redis client, I was under the impression that it had been rewritten to use its own pool of threads for handling Redis operations. Maybe the threads in that pool are starving for CPU due to the overall load from the rest of the application, or are somehow blocked.
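
To confirm or rule out starvation on our side, one option is to periodically log the thread-pool counters next to the Redis errors. A rough diagnostic sketch (the interval and output target are arbitrary):

using System;
using System.Threading;

public static class ThreadPoolMonitor
{
    public static Timer Start(TimeSpan interval)
    {
        return new Timer(_ =>
        {
            ThreadPool.GetAvailableThreads(out int worker, out int iocp);
            ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);

            // Busy counts computed the same way as the WORKER/IOCP figures in the exception text.
            Console.WriteLine($"WORKER busy={maxWorker - worker}, IOCP busy={maxIocp - iocp}");
        }, null, TimeSpan.Zero, interval);
    }
}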

@mts57017

Same issue here as well with v2.0.601. Any resolution in sight, or do I need to roll back?

@mstavrev

It looks like this is the same issue as #1120. Adding @mgravell in case he decides to close this one as a duplicate.

@NickCraver
Collaborator

Collapsing into #1120 indeed - see #1374 for the fix in progress. Thanks all.
