Investigating 'No connection is available to service this operation' #1277

Closed
StfnStrn opened this issue Nov 18, 2019 · 2 comments

Comments

@StfnStrn

Hey StackExchange.Redis team,

There are several open issues concerning the error message 'No connection is available to service this operation'. We face the same issue in our application: an Azure WebApp on containers, using StackExchange.Redis 2.0.601. In our case, the root cause is a SocketDisposedException. It usually occurs after a deployment, when the app is started and the deployment slots are swapped. Our Redis server is an Azure Redis instance. The problem first occurred in April, so I planned to investigate, but a few days later we didn't see it any more and I dropped the issue. For the past three weeks, the problem has been back.

This time I started to investigate the issue and cloned the StackExchange.Redis repository. I wrote a test case which simulates the error. The test connects to an Azure Redis instance and sets a key. Then the test calls Dispose() on the underlying Socket object, an operation that only became possible after I made the relevant fields in PhysicalBridge internal and exposed the VirtualSocket in PhysicalConnection via an internal getter. After disposing the socket, the test tries to read the previously set string value. That fails with a SocketDisposedException. So far, so good.
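For reference, a minimal sketch of what that test does (the connection string is a placeholder, and GetUnderlyingSocket is a hypothetical helper standing in for the internal plumbing I exposed in my local clone):

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;
using StackExchange.Redis;
using Xunit;

public class DisposedSocketRepro
{
    [Fact]
    public async Task ReadFailsAfterSocketDispose()
    {
        // Placeholder connection string for the Azure Redis instance.
        var muxer = await ConnectionMultiplexer.ConnectAsync(
            "myredis.redis.cache.windows.net:6380,ssl=true,password=<secret>");
        var db = muxer.GetDatabase();

        await db.StringSetAsync("repro:key", "expected-value");

        // Dispose the socket underneath the live connection.
        Socket socket = GetUnderlyingSocket(muxer);
        socket.Dispose();

        // The next read fails because the socket is gone.
        await Assert.ThrowsAnyAsync<Exception>(async () => await db.StringGetAsync("repro:key"));
    }

    // Hypothetical helper: stands in for the access path I added locally
    // (PhysicalBridge -> PhysicalConnection.VirtualSocket via internal getters).
    private static Socket GetUnderlyingSocket(ConnectionMultiplexer muxer)
        => throw new NotImplementedException("requires the modified library source");
}
```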

Using that first StringGetAsync call as a trigger for the connection to reconnect, I ignored the first exception and added a second read operation. This one failed occasionally, but sometimes returned the proper value. So I added a pause, using await Task.Delay(TimeSpan.FromSeconds(1)), to give the connection a chance to repair itself. This led to the second call succeeding on every test execution.
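In code, that step looks roughly like this (same db and key as in the sketch above):

```csharp
// First read after the dispose: expected to fail, used only as the trigger
// for the multiplexer to notice the dead socket and reconnect.
try { await db.StringGetAsync("repro:key"); }
catch (Exception) { /* ignored */ }

// Give the connection a chance to repair itself.
await Task.Delay(TimeSpan.FromSeconds(1));

// With the pause in place, this second read succeeds on every run.
var value = await db.StringGetAsync("repro:key");
```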

This made me curious, so I removed the delay and replaced the two StringGetAsync calls with a for-loop. It launches count StringGetAsync operations, putting each task in a list, and waits delay milliseconds before starting the next read operation. The result strongly depends on several factors: the value of delay, whether I launch the test in debug or in run mode, and most likely the executing machine. With some values, some of the tasks return a valid result, so the connection repairs itself. With other values, not a single task returns. For all combinations, the product of delay and count is between one and two seconds.
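Roughly, the loop is shaped like this (again a sketch, reusing db from the first snippet; count and delay are the two knobs I varied):

```csharp
// Launch `count` reads, `delay` milliseconds apart, without awaiting them individually.
static async Task<RedisValue[]> ReadInLoopAsync(IDatabase db, int count, int delay)
{
    var tasks = new List<Task<RedisValue>>();
    for (int i = 0; i < count; i++)
    {
        tasks.Add(db.StringGetAsync("repro:key"));
        await Task.Delay(delay);   // milliseconds between successive reads
    }
    // Depending on count, delay, debug vs. run mode and the machine,
    // some, all or none of these tasks complete with the expected value.
    return await Task.WhenAll(tasks);
}
```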

If I insert a one-second delay after disposing the Socket, before the for-loop starts, all operations always succeed.
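That variant is just one extra line before the loop from the sketch above:

```csharp
socket.Dispose();
await Task.Delay(TimeSpan.FromSeconds(1));              // single pause before any reads are issued
var results = await ReadInLoopAsync(db, count, delay);  // now every read returns the expected value
```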

That's how far I've come. If anyone has hints on where to look for an error, I'm open to suggestions. From my point of view, it looks like the read requests disturb the internal reconnection operation. There is also the heartbeat operation, and I haven't checked how the subscription property of the PhysicalBridge deals with all this. Since I haven't found a fix yet, I didn't create a pull request. If you'd like to run the test case on your end, I can clean it up and share the current test code. As long as I haven't run out of ideas, I'll keep digging into this and try to find the root cause of why the self-reconnect logic does not work reliably and why a disposed socket can lead to a dead ConnectionMultiplexer object.

StackTrace.txt

@mgravell
Collaborator

mgravell commented Nov 18, 2019 via email

@NickCraver
Collaborator

Going to close this out to fold all discussions together - see #1374 which we hope to vet and get in a 2.1.x release. Watch this week for some progress here. And thanks to everyone for adding more info and context around this - the hang situation is subtle and it's appreciated.
