Investigating 'No connection is available to service this operation' #1277
That's really useful additional context, thanks, and it will undoubtedly help
focus things.
I'm aware this needs attention. The problem right now is simply developer
bandwidth. I'm wrapping up the work I've been doing on some other projects,
so hopefully I should be able to find some suitably large block of time to
progress this!
…On Mon, 18 Nov 2019, 09:53 StfnStrn, ***@***.***> wrote:
Hey StackExchange.Redis team,
there are several open issues concerning the error message 'No connection
is available to service this operation'. We face the same issue in our
application. It's an Azure WebApp on containers, using StackExchange.Redis
2.0.601. In our case, the root cause is a SocketDisposedException. It
usually occurs after deployment, when the app is started and the deployment
slots are swapped. Our Redis server is an Azure Redis instance. The problem
first occurred in April, so I planned to investigate, but a few days later it
stopped happening and I dropped the issue. Three weeks ago, the problem
came back.
This time, I started to investigate the issue and cloned the
StackExchange.Redis repository. I wrote a test case that simulates the
error. The test connects to an Azure Redis instance and sets a key. Then
the test calls Dispose() on the underlying Socket object, an operation
only possible after I made the relevant fields in PhysicalBridge internal and
exposed the VirtualSocket in PhysicalConnection via an internal getter.
After disposing the socket, the test tries to read the previously set
string value. That fails due to a SocketDisposedException. So far, so
good.
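(For illustration, the setup described above might look roughly like the sketch below. This is not the reporter's actual test code: DisposeUnderlyingSocket is a hypothetical stand-in for the internal-field access described, and the connection string is a placeholder.)

```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

// Placeholder connection string; the reporter used an Azure Redis instance.
var muxer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = muxer.GetDatabase();
await db.StringSetAsync("repro:key", "value");

// Simulate the failure: kill the socket out from under the client.
DisposeUnderlyingSocket(muxer);

try
{
    await db.StringGetAsync("repro:key"); // first read: fails, socket is gone
}
catch (RedisConnectionException) { /* expected */ }

await Task.Delay(TimeSpan.FromSeconds(1)); // give the bridge a chance to reconnect
var value = await db.StringGetAsync("repro:key"); // second read: succeeds

// Hypothetical helper standing in for the internal getters the reporter
// added to PhysicalBridge/PhysicalConnection to reach the VirtualSocket.
static void DisposeUnderlyingSocket(ConnectionMultiplexer m)
{
    /* reach into the connection internals and call Socket.Dispose(); details omitted */
}
```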
Using that first StringGetAsync call as a trigger for the connection to
reconnect, the first exception was ignored and a second read operation was
added. This one failed occasionally, but sometimes returned the proper
value. So I added a pause, using await Task.Delay(TimeSpan.FromSeconds(1)),
to give the connection a chance to repair itself. This led to the second
call succeeding on every test execution.
This made me curious, so I removed the delay and replaced the two
StringGetAsync calls with a for-loop. It launches *count* StringGetAsync
operations, putting each task in a list, and waits *delay* milliseconds
before starting the next read operation. The result strongly depends on
several factors: the value of *delay*, whether I launch the test in
debug or run mode, and most likely the executing machine. For some
values, some of the tasks return a valid result, so the
connection repairs itself; for others, not a single task
returns. For all combinations, the product of *delay* and *count* is
between one and two seconds.
If I insert a one-second delay after disposing the Socket, before the
for-loop starts, all operations always succeed.
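(The loop variant might be sketched like this; count, delay, the key name, and the connection string are placeholders, with count and delay being the knobs described above.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

// Placeholder connection; the reporter used an Azure Redis instance.
var db = (await ConnectionMultiplexer.ConnectAsync("localhost:6379")).GetDatabase();

// delay * count ≈ 1 second, the window the reporter observed.
int count = 20, delay = 50;

// Fire `count` reads spaced `delay` ms apart, collecting the tasks.
var tasks = new List<Task<RedisValue>>();
for (int i = 0; i < count; i++)
{
    tasks.Add(db.StringGetAsync("repro:key"));
    await Task.Delay(delay);
}

try { await Task.WhenAll(tasks); }
catch (Exception) { /* some reads may fail while the connection repairs itself */ }

int succeeded = tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
Console.WriteLine($"{succeeded}/{count} reads returned a value");
```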
That's how far I've come. If anyone has hints on where to look for the
error, I'm open to suggestions. From my point of view, it looks like the
read requests disturb the internal reconnection operation. There's also the
heartbeat operation, and I haven't checked how the subscription property of
the PhysicalBridge deals with all this. Since I haven't found a fix yet, I
didn't create a pull request. If you'd like to run the test case on your end,
I can clean it up and share the current test code. As long as I
haven't run out of ideas, I'll keep digging into this to find the root cause
of why the self-reconnect logic does not work reliably
and why a disposed socket can lead to a dead ConnectionMultiplexer object.
StackTrace.txt
<https://github.com/StackExchange/StackExchange.Redis/files/3859455/StackTrace.txt>
Going to close this out to fold all discussions together - see #1374 which we hope to vet and get in a 2.1.x release. Watch this week for some progress here. And thanks to everyone for adding more info and context around this - the hang situation is subtle and it's appreciated.