Handle changes to Couchbase Server cluster topology at runtime #942

Closed
zgramana opened this issue Jun 23, 2015 · 13 comments

@zgramana
Contributor

Currently, the "server" key of the database config JSON only supports a single string value,
despite the fact that the bucket source's constructor in go-couchbase actually takes a vector value of URLs in the client config.

The proposal is to permit the user to directly provide multiple cluster manager endpoints to use during cluster map download, for failover purposes:

{
  "databases": {
    "foo": {
      "server": ["http://fizz:8091/", "http://buzz:8091/"],
      "bucket": "bar"
    }
  }
}
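A minimal Go sketch of how a config loader could accept both the existing single-string form and the proposed array form of "server". The type names `ServerList`, `DbConfig`, and `Config` and the function `ParseConfig` are hypothetical and are not Sync Gateway's actual config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServerList is a hypothetical type that accepts either a single URL string
// or an array of URL strings for the "server" key.
type ServerList []string

// UnmarshalJSON tries the array form first and falls back to a single string.
func (s *ServerList) UnmarshalJSON(data []byte) error {
	var many []string
	if err := json.Unmarshal(data, &many); err == nil {
		*s = many
		return nil
	}
	var one string
	if err := json.Unmarshal(data, &one); err != nil {
		return fmt.Errorf("server must be a string or an array of strings: %v", err)
	}
	*s = ServerList{one}
	return nil
}

// DbConfig and Config are illustrative only.
type DbConfig struct {
	Server ServerList `json:"server"`
	Bucket string     `json:"bucket"`
}

type Config struct {
	Databases map[string]DbConfig `json:"databases"`
}

// ParseConfig decodes a config document in either the old or the proposed form.
func ParseConfig(raw []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(raw, &c)
	return c, err
}
```

With this shape, existing configs that use a single string would keep working and simply yield a one-element list.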
@zgramana zgramana added this to the 1.1.1 milestone Jun 23, 2015
@adamcfraser
Collaborator

go-couchbase doesn't actually support this natively - we're using ConnectWithAuth to establish the standard bucket connection, which takes a single URL.

https://github.com/couchbase/go-couchbase/blob/master/pools.go#L579

It's cb-datasource (the DCP feed runner that was moved to the go-couchbase repo) that's added the additional layer to support trying multiple server URLs (https://github.com/couchbase/go-couchbase/blob/master/cbdatasource/cbdatasource.go#L481).

I don't think it makes sense for us to reimplement the server URL iteration in SG - it would make more sense as a go-couchbase enhancement to move the handling down from cb-datasource into go-couchbase.

It looks like the functionality is already in place in gocb - another option would be to wait until we move to that library. https://github.com/couchbaselabs/gocb/blob/fab5b631290a984e214702c7190d5833849e8b6e/connspec.go#L25

@zgramana
Contributor Author

Just a general note that this enhancement is intended to make it easier to upgrade CBS.

@adamcfraser
Collaborator

Followup based on the discussion of the upgrade scenario: go-couchbase will try to refresh its bucket definition when it gets an error making a memcached request (see https://github.com/couchbase/go-couchbase/blob/master/client.go#L264 for one example). It's going to use the original server URL to do that refresh. If that's the node that's unavailable (because it's getting upgraded), the refresh fails. Ideally Sync Gateway would provide a list of server URLs to go-couchbase, and go-couchbase would use all of these when attempting to refresh a bucket.

@zgramana
Contributor Author

zgramana commented Jul 1, 2015

This issue is causing the upgrade process to grow more complex, and hence more error prone. @ashvindersingh is going to open up a ticket for go-couchbase, and link back to here, so that we can get Manik's help.

@ashvindersingh

Created ticket: https://issues.couchbase.com/browse/MB-15525

@maniktaneja

So you guys need a ConnectWithAuth()-type API that accepts a list of server URLs and automatically reconnects to a different URL if the one it was previously connected to fails? If that is the case, then it sounds like a wrapper that can be easily implemented in the application.

@adamcfraser
Collaborator

@maniktaneja The main issue is that we need the list of server URLs to be used when go-couchbase does a bucket.refresh(), as mentioned above, which doesn't work with the wrapper approach.

@maniktaneja

@adamcfraser Can you define the exact API spec that you need?

@tleyden
Contributor

tleyden commented Jul 6, 2015

Re-assigning to myself due to this being a pre-requisite to #969 which is assigned to me -- see #969 (comment)

@tleyden tleyden self-assigned this Jul 6, 2015
@adamcfraser
Collaborator

@maniktaneja The key idea is for clients to provide a list of URLs to go-couchbase when initializing a client, and for the client to be able to retry over all entries in that list until it finds one it can connect to successfully.

As mentioned above, the other key point is that the list should get used internally when doing things like a bucket refresh(). The particular use case QE is focusing on at the moment is during upgrade, when the cluster node corresponding to the current single URL is brought down. We need to be able to handle this scenario smoothly and transition to using one of the other server nodes if we need to refresh the bucket (e.g. rebalance, etc).
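A rough Go sketch of the behaviour described here, under stated assumptions: the client keeps the whole URL list, and both the initial connect and any later refresh walk that list starting from the last URL that worked. `ConnectFunc`, `FailoverClient`, `NewFailoverClient`, and `Refresh` are hypothetical names for illustration, not go-couchbase APIs; `ConnectFunc` stands in for a single-URL call such as ConnectWithAuth.

```go
package main

import (
	"errors"
	"fmt"
)

// ConnectFunc is a hypothetical hook for a single-URL connect or refresh call.
// It keeps the sketch independent of go-couchbase.
type ConnectFunc func(url string) error

// FailoverClient remembers the full URL list so that later refreshes can
// fall back to other nodes, not only the one used at startup.
type FailoverClient struct {
	urls    []string
	connect ConnectFunc
	current int // index of the URL that last succeeded
}

// NewFailoverClient connects to the first reachable URL in the list.
func NewFailoverClient(urls []string, connect ConnectFunc) (*FailoverClient, error) {
	if len(urls) == 0 {
		return nil, errors.New("no server URLs provided")
	}
	c := &FailoverClient{urls: urls, connect: connect}
	if _, err := c.Refresh(); err != nil {
		return nil, err
	}
	return c, nil
}

// Refresh tries the last known good URL first, then the remaining URLs in
// order, and returns the URL that worked.
func (c *FailoverClient) Refresh() (string, error) {
	var lastErr error
	for i := 0; i < len(c.urls); i++ {
		idx := (c.current + i) % len(c.urls)
		if err := c.connect(c.urls[idx]); err != nil {
			lastErr = err
			continue // this node is unavailable, try the next one
		}
		c.current = idx
		return c.urls[idx], nil
	}
	return "", fmt.Errorf("all %d server URLs failed, last error: %v", len(c.urls), lastErr)
}
```

In the upgrade scenario, a refresh triggered while the first node is down would move on to the next URL instead of failing outright.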

@tleyden tleyden added ready and removed in progress labels Jul 13, 2015
@tleyden tleyden assigned ashvindersingh and unassigned tleyden Jul 20, 2015
@tleyden
Contributor

tleyden commented Jul 27, 2015

Retry would be needed here:

  • doPOSTApi
  • queryRestApi

in pools.go
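As an illustration only, retry at the REST layer could look roughly like the sketch below: resolve the request path against each configured server and use the first one that answers with a 200. The helpers `endpointURLs` and `queryWithFailover` and the 5 second timeout are assumptions made for this example; they are not the existing doPOSTApi or queryRestApi implementations.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// restClient uses an assumed timeout so a dead node does not block the loop.
var restClient = &http.Client{Timeout: 5 * time.Second}

// endpointURLs resolves a REST path such as "/pools" against every base URL.
func endpointURLs(bases []string, path string) ([]string, error) {
	ref, err := url.Parse(path)
	if err != nil {
		return nil, fmt.Errorf("invalid path %q: %v", path, err)
	}
	out := make([]string, 0, len(bases))
	for _, b := range bases {
		u, err := url.Parse(b)
		if err != nil {
			return nil, fmt.Errorf("invalid server URL %q: %v", b, err)
		}
		out = append(out, u.ResolveReference(ref).String())
	}
	return out, nil
}

// queryWithFailover GETs path from each server in turn and decodes the first
// successful JSON response into out.
func queryWithFailover(bases []string, path string, out interface{}) error {
	urls, err := endpointURLs(bases, path)
	if err != nil {
		return err
	}
	lastErr := fmt.Errorf("no server URLs provided")
	for _, u := range urls {
		resp, err := restClient.Get(u)
		if err != nil {
			lastErr = err
			continue // node unreachable, try the next one
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			lastErr = fmt.Errorf("GET %s: status %d", u, resp.StatusCode)
			continue
		}
		err = json.NewDecoder(resp.Body).Decode(out)
		resp.Body.Close()
		if err == nil {
			return nil
		}
		lastErr = err
	}
	return lastErr
}
```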

@ashvindersingh

Update:

  • I ran the (partially automated) test with SG build feature/issue_1011 and verified that the original issue has been resolved. SG did not go offline and continued to operate.
  • Issue found: Approximately 10% of docs did not have the latest revision number displayed in the changes feed, although the bucket contained all the revisions.
  • Explanation (as per Adam): During upgrade/rebalance, Sync Gateway is dropping the TAP feed from Couchbase Server (possibly only intermittently). This leaves a gap in the in-memory channel cache. Sync Gateway knows that it has a gap, and is communicating that to the clients (docs are being sent with a composite sequence, like 3643::13784).
    The fix for this is planned for the 1.2 release, as part of the online/offline functionality and the TAP feed disconnection issues.
  • Recommendation for the issue: Update each Sync Gateway config file to point to the upgraded node (with the 3.1.x release) and restart the Sync Gateways one at a time. This should reset the SG cache, and the clients will get the full changes feed.

@zgramana
Contributor Author

Note: we should log a warning on cluster configuration change, tracked on issue #1011.

@adamcfraser adamcfraser changed the title from 'Support an array of URLs for the "server" key in config JSON' to 'Handle changes to Couchbase Server cluster topology at runtime' on Aug 14, 2015