DCP Feed does not seed from all nodes #3756

Closed
JFlath opened this Issue Sep 20, 2018 · 0 comments

JFlath commented Sep 20, 2018

In non-import mode, when we start the DCP feed, we first make a vbucket-seqno stats call to each node in the Couchbase Server cluster. This returns the high_seqno (or abs_high_seqno) for each vBucket, which we use as the starting point for the DCP feed so that we receive everything "from now on".

We then process this here:

sync_gateway/base/bucket.go

Lines 287 to 323 in f9e0e3d

func GetStatsVbSeqno(stats map[string]map[string]string, maxVbno uint16, useAbsHighSeqNo bool) (uuids map[uint16]uint64, highSeqnos map[uint16]uint64, seqErr error) {
	// GetStats response is in the form map[serverURI]map[]
	uuids = make(map[uint16]uint64, maxVbno)
	highSeqnos = make(map[uint16]uint64, maxVbno)
	for _, serverMap := range stats {
		for i := uint16(0); i < maxVbno; i++ {
			// stats come map with keys in format:
			//   vb_nn:uuid
			//   vb_nn:high_seqno
			//   vb_nn:abs_high_seqno
			//   vb_nn:purge_seqno
			uuidKey := fmt.Sprintf("vb_%d:uuid", i)
			// workaround for https://github.com/couchbase/sync_gateway/issues/1371
			highSeqnoKey := ""
			if useAbsHighSeqNo {
				highSeqnoKey = fmt.Sprintf("vb_%d:abs_high_seqno", i)
			} else {
				highSeqnoKey = fmt.Sprintf("vb_%d:high_seqno", i)
			}
			highSeqno, err := strconv.ParseUint(serverMap[highSeqnoKey], 10, 64)
			if err == nil && highSeqno > 0 {
				highSeqnos[i] = highSeqno
				uuid, err := strconv.ParseUint(serverMap[uuidKey], 10, 64)
				if err == nil {
					uuids[i] = uuid
				}
			}
		}
		// We're only using a single server, so can break after the first entry in the map.
		break
	}
	return
}

Of particular concern is the break:

sync_gateway/base/bucket.go

Lines 318 to 319 in f9e0e3d

// We're only using a single server, so can break after the first entry in the map.
break
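
As a side note on the loop shape: ranging over a Go map and breaking after the first entry visits exactly one entry, and Go does not define map iteration order, so the server that ends up being used can differ between runs. The following is a minimal sketch of that behaviour only; `firstNodeOnly` and the node URIs are hypothetical names made up for illustration and are not part of Sync Gateway.

```go
package main

import "fmt"

// firstNodeOnly mimics the loop shape in the snippet above: it ranges over a
// map keyed by server URI and breaks after the first entry. Go does not
// define map iteration order, so which server is "first" may vary per run.
func firstNodeOnly(stats map[string]map[string]string) (server string, visited int) {
	for uri := range stats {
		server = uri
		visited++
		break
	}
	return server, visited
}

func main() {
	// Hypothetical three-node cluster; URIs and values are invented.
	stats := map[string]map[string]string{
		"node1:11210": {"vb_0:high_seqno": "10"},
		"node2:11210": {"vb_1:high_seqno": "20"},
		"node3:11210": {"vb_2:high_seqno": "30"},
	}
	server, visited := firstNodeOnly(stats)
	fmt.Printf("visited %d of %d nodes (used %s)\n", visited, len(stats), server)
}
```

Whichever node is picked, the stats of the other nodes are never read.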

The results from a single node only contain sequence numbers for the vBuckets present on that node. That includes both active and replica vBuckets, so we actually get roughly (1024 / num_nodes) * (1 + num_replicas) sequence numbers from it. Technically, the replica values aren't guaranteed to be exact (there is always some lag in an active system), but as long as the UUID is the same this shouldn't cause a functional issue for that fraction.
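
To put rough numbers on that fraction, here is a back-of-the-envelope sketch. It assumes 1024 vBuckets spread evenly across nodes, and that a node holds one active share plus one share per configured replica (never two copies of the same vBucket). `vbucketsOnOneNode` is a hypothetical helper written for this estimate, not a Sync Gateway or Couchbase API.

```go
package main

import "fmt"

const numVbuckets = 1024

// vbucketsOnOneNode estimates how many distinct vBuckets (active + replica)
// a single node reports, assuming an even distribution across the cluster.
func vbucketsOnOneNode(numNodes, numReplicas int) int {
	if numNodes <= 0 {
		return 0
	}
	if numReplicas < 0 {
		numReplicas = 0
	}
	copies := 1 + numReplicas // one active share plus one share per replica
	if copies > numNodes {
		// A node cannot hold two copies of the same vBucket, so one node
		// can cover at most the full set.
		copies = numNodes
	}
	return numVbuckets * copies / numNodes
}

func main() {
	const replicas = 1
	for _, nodes := range []int{2, 4, 8, 16} {
		seeded := vbucketsOnOneNode(nodes, replicas)
		fmt.Printf("nodes=%d replicas=%d seeded=%d unseeded=%d\n",
			nodes, replicas, seeded, numVbuckets-seeded)
	}
}
```

Under these assumptions, a 4-node cluster with 1 replica seeds 512 vBuckets from the single node and leaves the other 512 to start from 0.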

This still leaves some vBucket sequence numbers unseeded, because they were only reported by the skipped nodes, so we start those streams from 0. That creates a massive backlog for Sync Gateway to work through (and cache). The impact grows with the number of nodes in the cluster, because the one node we seed from holds a smaller fraction of the vBuckets.
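
One way to seed from every node could look like the sketch below. This is an illustration under stated assumptions, not the change that was merged for this issue: `mergeVbSeqnos` is a hypothetical helper, it only reads the `high_seqno` key, and it resolves active/replica overlap by keeping the highest sequence number seen for each vBucket together with the UUID reported by that same server. Unlike the original function, it also skips a vBucket entry whose UUID fails to parse.

```go
package main

import (
	"fmt"
	"strconv"
)

// mergeVbSeqnos walks the stats of every server (no early break) and, for
// each vBucket, keeps the highest high_seqno seen plus the UUID that the
// same server reported. Hypothetical helper for illustration only.
func mergeVbSeqnos(stats map[string]map[string]string, maxVbno uint16) (uuids, highSeqnos map[uint16]uint64) {
	uuids = make(map[uint16]uint64, maxVbno)
	highSeqnos = make(map[uint16]uint64, maxVbno)
	for _, serverMap := range stats {
		for i := uint16(0); i < maxVbno; i++ {
			seqStr, ok := serverMap[fmt.Sprintf("vb_%d:high_seqno", i)]
			if !ok {
				continue // this server does not host vBucket i
			}
			seqno, err := strconv.ParseUint(seqStr, 10, 64)
			if err != nil || seqno == 0 {
				continue
			}
			uuid, err := strconv.ParseUint(serverMap[fmt.Sprintf("vb_%d:uuid", i)], 10, 64)
			if err != nil {
				continue
			}
			// Assumption: a lagging replica never reports a higher seqno
			// than the active copy, so the maximum is the best seed.
			if seqno > highSeqnos[i] {
				highSeqnos[i] = seqno
				uuids[i] = uuid
			}
		}
	}
	return uuids, highSeqnos
}

func main() {
	// Invented two-node cluster: each node is active for one vBucket and
	// holds a slightly lagging replica of the other.
	stats := map[string]map[string]string{
		"node1:11210": {
			"vb_0:high_seqno": "100", "vb_0:uuid": "111",
			"vb_1:high_seqno": "48", "vb_1:uuid": "222",
		},
		"node2:11210": {
			"vb_0:high_seqno": "95", "vb_0:uuid": "111",
			"vb_1:high_seqno": "50", "vb_1:uuid": "222",
		},
	}
	uuids, seqnos := mergeVbSeqnos(stats, 2)
	fmt.Println(uuids, seqnos)
}
```

Because the merge takes a per-vBucket maximum, the result does not depend on the order in which the servers are visited.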

@JFlath JFlath self-assigned this Sep 20, 2018

JFlath added a commit that referenced this issue Sep 20, 2018

adamcfraser added a commit that referenced this issue Sep 20, 2018

Seed DCP Feed from all nodes (Fixes #3756) (#3757)
* Seed DCP Feed from all nodes (Fixes #3756)

* Add unit test for GetStatsVbSeqno

* Add test for replica vbuckets

* Split test for lagging and non-lagging replicas

@adamcfraser adamcfraser self-assigned this Sep 20, 2018

tleyden added a commit that referenced this issue Sep 20, 2018

Seed DCP Feed from all nodes (Fixes #3756) (#3757) (#3760)

bbrks added a commit that referenced this issue Oct 1, 2018

Seed DCP Feed from all nodes (Fixes #3756) (#3757)

adamcfraser added a commit that referenced this issue Oct 1, 2018

Seed DCP Feed from all nodes (Fixes #3756) (#3757) (#3772)

@djpongh djpongh added this to the 2.1.1 milestone Oct 8, 2018

@djpongh djpongh added the bug label Oct 8, 2018
