Web Session stuck with OperationTimeout after failover #89

Closed
jbengit opened this issue Jun 12, 2018 · 10 comments

Comments

@jbengit

jbengit commented Jun 12, 2018

Configuration:
ASP.NET 4.6
couchbase-aspnet package: 3.0.0 beta 2
Couchbase Community Edition: 5.0.1 on Windows
3-node cluster, ephemeral bucket with 1 replica.
Default Couchbase client settings.

After simulating a failover on the cluster while an ASP.NET web session was active:

  • remove one node, then rebalance
  • fail over the second one, then rebalance
  • rejoin the two nodes to the cluster, then rebalance

the following client errors occurred in an endless loop until we recycled the app and flushed the bucket:

Debug: Operation for key zkdlhkxibjhyjrynekzwmnhs failed after 24 retries using vb838 from rev43677 and opaque7233. Reason: VBucketBelongsToAnotherServer
Debug: Operation for key zkdlhkxibjhyjrynekzwmnhs failed after 24 retries using vb838 from rev43677 and opaque7233. Reason: The operation has timed out.
Error: Could not retrieve, remove or write key 'zkdlhkxibjhyjrynekzwmnhs' - reason: OperationTimeout

@MikeGoldsmith

@jbengit Thanks for the bug report, we'll look to reproduce and see what's going on.

@jbengit
Author

jbengit commented Jun 13, 2018

Thank you. I reproduced the issue again with the following steps:

  1. Stop, at the same time, the two nodes on which the web sessions were present
    => the page waited indefinitely for a response
  2. Manually fail over the two nodes and rebalance
    => the page responded (no error) and the website was up again
  3. Restart the two missing nodes and rejoin them to the cluster
  4. Rebalance the cluster
    => From this point on, the page responded after a delay of 30-40 seconds, with "Operation Timeout" and "VBucketBelongsToAnotherServer" in the client logs (recycling the app did not help)
    => New web sessions were not affected
    => In the client settings, one of the three nodes was missing from the "servers" section. When I added it, the issue with the existing web session disappeared.

It seems that in some node-failure scenarios, the client needs to be aware of all the nodes in the cluster.
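
For illustration, a "servers" section that lists all three nodes might look like the sketch below. This is a minimal fragment, not the reporter's actual configuration: the host names are hypothetical placeholders, and the element names assume the XML configuration-section form used by CouchbaseNetClient 2.x, so they should be checked against the existing web.config.

```xml
<!-- Sketch only: host names are hypothetical placeholders. -->
<!-- Element names assume the CouchbaseNetClient 2.x config-section layout. -->
<couchbaseClients>
  <couchbase>
    <servers>
      <!-- List every node so the client can still bootstrap if one is down. -->
      <add uri="http://node1.example.com:8091" />
      <add uri="http://node2.example.com:8091" />
      <add uri="http://node3.example.com:8091" />
    </servers>
  </couchbase>
</couchbaseClients>
```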

@MikeGoldsmith

Hi @jbengit - can you let me know what version of the CouchbaseNetClient package you're using? It's the actual library that interacts with the Couchbase cluster and is a dependency of the couchbase-aspnet package. The latest version (as of 12th June) is 2.5.12.

CouchbaseNetClient - Release Notes - Nuget

@jbengit
Author

jbengit commented Jun 17, 2018

We are using the Couchbase.NetClient v2.5.10

@jeffrymorris

@jbengit -

Have you tried Beta 3? The issue may already be resolved there.

-Jeff

@jbengit
Author

jbengit commented Jun 18, 2018

@jeffrymorris
We did not try Beta 3 because it depends on a beta version of the Couchbase SDK client.
Which combination of versions (CouchbaseNetClient + couchbase-aspnet) should be used in a production environment (Couchbase Community Edition 5.0.1 or Enterprise 5.1.1)?

@jeffrymorris

@jbengit -

Can you try Beta 3 in a non-production environment? If there are any issues, we can squeeze bug fixes in before EOM.

v2.0 is probably the most stable, but it uses the CB 4.0 style of authentication (non-RBAC). v3 will be released very soon (around the first week of July 2018) along with SDK 2.6.0; I would suggest waiting for v3 for production if you can.

-Jeff

@jeffrymorris

@jbengit -

v3.0 is up on NuGet with the official Couchbase SDK 2.6.0 release.

@jbengit
Author

jbengit commented Aug 28, 2018

Thank you! I will try it.
Are the additional log4net and/or Newtonsoft.Json dependencies really required, or would it be possible to remove them in a future version?

@jeffrymorris

@jbengit -

I removed the dependencies on log4net and Newtonsoft.Json from the main project. Thanks for reporting.
