From b322eb4171fcff664872d4f9a2b12936c06885dc Mon Sep 17 00:00:00 2001 From: Steve Yen Date: Tue, 21 Jul 2015 16:53:19 -0700 Subject: [PATCH] more DESIGN.md ideas from rebalance-flow.txt --- DESIGN.md | 159 ++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 106 insertions(+), 53 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index c77183d..5464729 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -7,23 +7,23 @@ and focuses especially on integrating with couchbase's features like Rebalance, Failover, etc. ------------------------------------------------- -Links +# Links References to related documents: -* cbgt design - https://github.com/couchbaselabs/cbgt/blob/master/IDEAS.md +* cbgt design - https://github.com/couchbaselabs/cbgt/blob/master/IDEAS.md (GT) * ns-server documents, especially "rebalance flow"... * https://github.com/couchbase/ns_server/blob/master/doc - * https://github.com/couchbase/ns_server/blob/master/doc/rebalance-flow.txt + * https://github.com/couchbase/ns_server/blob/master/doc/rebalance-flow.txt (rebalance-flow.txt) ------------------------------------------------- -Requirements +# Requirements (NOTE: We're currently missing a formal PRD (product requirements document), so these requirements are based on anticipated PRD requirements.) -# GT-CS1 - Consistent queries during stable topology. +## GT-CS1 - Consistent queries during stable topology. (Requirements with the "GT-" prefix originally come from the cbgt IDEAS.md design document.) @@ -32,17 +32,17 @@ Clients should be able to ask, "I want query results where the full-text-indexes have incorporated at least up to this set of {vbucket to seq-num} pairings." -# GT-CR1 - Consistent queries under datasource rebalance. +## GT-CR1 - Consistent queries under datasource rebalance. Full-text queries should be consistent even as data source (Couchbase Server) nodes are added and removed in a clean rebalance. -# GT-CR2 - Consistent queries under cbgt topology change. 
+## GT-CR2 - Consistent queries under cbgt topology change. Full-text queries should be consistent even as cbgt nodes are added and removed in a clean takeover fashion. -# GT-OC1 - Support optional looser "best effort" options +## GT-OC1 - Support optional looser "best effort" options The options might be along a spectrum from stale=ok to totally consistent. The "best effort" option should probably have lower @@ -51,7 +51,7 @@ latency than a totally consistent CR1 query. For example, perhaps the client may want to just ask for consistency around just one vbucket-seqnum. -# GT-IA1 - Index aliases. +## GT-IA1 - Index aliases. This is a level of indirection to help split data across multiple indexes, but also not change your app all the time. Example: the @@ -61,19 +61,19 @@ search limited to only the most recent quarter index of 'last-quarter-sales' alias to the the newest 'sales-2014Q4' index without any client-side application changes. -# GT-MQ1 - Multi-index query for a single bucket. +## GT-MQ1 - Multi-index query for a single bucket. This is the ability to query multiple indexes in one request for a single bucket, such as the "comments-fti" index and the "description-fti" index. -# GT-MQ2 - Multi-index query across multiple buckets. +## GT-MQ2 - Multi-index query across multiple buckets. This is the ability to query multiple indexes across multiple buckets in a single query, such as "find any docs from the customer, employee, vendor buckets who have an address or comment about 'dallas'". -# GT-NI1 - Resilient to datasource node down scenarios. +## GT-NI1 - Resilient to datasource node down scenarios. If a data source (couchbase cluster server node) goes down, then the subset of a cbgt cluster that was indexing data from the down node @@ -81,7 +81,7 @@ will not be able to make indexing progress. Those cbgt instances should try to automatically reconnect and resume indexing from where they left off. 
-# GT-E1 - The user should be able to see error conditions +## GT-E1 - The user should be able to see error conditions For example yellow or red coloring on node down and other error conditions. @@ -96,100 +96,149 @@ In ES, note the frustrating bouncing between yellow, green, red; ns-server example, not enough CPU & timeouts leads to status bounce-iness. -# GT-NQ1 - Querying still possible if datasource node goes down. +## GT-NQ1 - Querying still possible if datasource node goes down. Querying of a cbgt cluster should be able to continue even if some datasource nodes are down. -# GT-PI1 - Ability to pause/resume indexing. +## GT-PI1 - Ability to pause/resume indexing. -# IPAC - IP Address Changes +## IPAC - IP Address Changes -IP address discovery is "late bound", when couchbase server nodes initially joins to a cluster. A "cluster of one", in particular, only has an erlang node address of "ns_1@127.0.0.1". +IP address discovery is "late bound", when couchbase server nodes +initially join a cluster. A "cluster of one", in particular, only +has an erlang node address of "ns_1@127.0.0.1". -# BUCKETD - Bucket Deletion Cascades to Full-Text Indexes +## BUCKETD - Bucket Deletion Cascades to Full-Text Indexes -If a bucket is deleted, any full-text indexes based on that bucket should be also automatically deleted. +If a bucket is deleted, any full-text indexes based on that bucket +should also be automatically deleted. -There whould be user visible UI warnings on these "cascading deletes" of cbft indexes. +There should be user-visible UI warnings on these "cascading deletes" +of cbft indexes. 
-# RIO - Rebalance Nodes In/Out +## RIO - Rebalance Nodes In/Out -# RP - Rebalance progress estimates/indicator +## RP - Rebalance progress estimates/indicator -# RS - Swap Rebalance +## RS - Swap Rebalance -# FOH - Hard Failover +## FOH - Hard Failover -# FOG - Graceful Failover +## FOG - Graceful Failover Reject any new requests and wait for any inflight requests to finish before failover. -# AB - Add Back Rebalance +## AB - Add Back Rebalance -# DNR - Delta Node Recovery +## DNR - Delta Node Recovery -# RP1 - Rebalance Phase 1 - VBucket Replication Phase -# RP2 - Rebalance Phase 2 - View Indexing Phase -# RP2 - Rebalance Phase 3 - VBucket Takeover Phase +## RP1 - Rebalance Phase 1 - VBucket Replication Phase +## RP2 - Rebalance Phase 2 - View Indexing Phase +## RP3 - Rebalance Phase 3 - VBucket Takeover Phase -# CIUR - Consistent Indexes Under Rebalance +## RSTOP - Ability to Stop Rebalance + +## MDS-RI - Multidimensional Scaling - ability to rebalance Full-Text + indexes independent of other services + +## RRU-EE - Rebalance Resource Utilization More Efficient With + Enterprise Edition + +## CIUR - Consistent Indexes Under Rebalance This is the equivalent of "consistent view indexes under rebalance". -# QUERYR - Querying Replicas +## QUERYR - Querying Replicas + +## QUERYLB - Querying Load Balancing -# QUERYLB - Querying Load Balancing +## QUERYLB-EE - Query Load Balancing To Replicas With Enterprise Edition -# ODS - Out of Disk Space +## ODS - Out of Disk Space -# KP - Killed Processes (linux OOM, etc) +Out of disk space conditions are handled gracefully (not segfaulting). -# RSN - Return of the Shunned Node +## ODSR - Out of Disk Space Repaired -# DLC - Disk Level Copy/Restore of Node +After an administrator fixes the disk space issue (adds more disks; +frees more space) then the full-text system should be able to +automatically continue successfully. 
-This is the scenario when a user "clones" a node via disk/storage level maneuvers, such as incorrect usage of EBS snapshot or tar'ing up a whole dataDir. +## KP - Killed Processes (linux OOM, etc) + +## RSN - Return of the Shunned Node + +## DLC - Disk Level Copy/Restore of Node + +This is the scenario when a user "clones" a node via disk/storage +level maneuvers, such as incorrect usage of EBS snapshot or tar'ing up +a whole dataDir. The issue is that old cbft.uuid files might still (incorrectly) be copied. -# UI - Full-Text tab in Couchbase's web admin UI +## UI - Full-Text tab in Couchbase's web admin UI -# STATS - Stats Integration into Couchbase's web admin UI +## STATS - Stats Integration into Couchbase's web admin UI -# AUTHI - Auth integration with Couchbase for indexing +## AUTHI - Auth integration with Couchbase for indexing cbft should be able to access any bucket for full-text indexing. -# AUTHM - Auth integration with Couchbase for admin/management +## AUTHM - Auth integration with Couchbase for admin/management cbft's administration should be protected. -# AUTHQ - Auth integration with Couchbase for queries +## AUTHQ - Auth integration with Couchbase for queries cbft's queryability should be protected. 
-# TLS - TLS/SSL support +## TLS - TLS/SSL support + +## HC - Health Checks -# HC - Health Checks +## TOOLBR - Tools - Backup/Restore -# TOOLBR - Tools - Backup/Restore +## TOOLCI - Tools - cbcollectinfo -# TOOLCI - Tools - cbcollectinfo +## TOOLM - Tools - mortimer -# TOOLM - Tools - mortimer +## TOOLN - Tools - nutshell -# TOOLN - Tools - nutshell +## UPGRADE - Future readiness for upgrades -# UPGRADE - Future readiness for upgrades +## QUOTAM - Memory Quota per node -# QUOTAM - Memory Quota per node +## QUOTAD - Disk Quota per node -# QUOTAD - Disk Quota per node +## BFLIMIT - Limit Backfills + +ns_single_vbucket_mover has a policy feature that limits the number of +backfills to "1 backfill during rebalance in or out of any node" (see +rebalance-flow.txt). + +## RCOMPACT - Index Compactions Controlled Under Rebalance + +During rebalance, ns_server pauses view index compactions until a +configurable number of vbucket moves have occurred for efficiency (see +rebalance-flow.txt). + +## RPMOVE - Partition Moves Controlled Under Rebalance + +During rebalance, ns_server limits outgoing moves to a single vbucket +per node for efficiency (see rebalance-flow.txt). The same should be the +case for PIndex reassignments. + +## RSPREAD - Rebalance Resource Spreading + +ns_server prioritizes VBucket moves that equalize the spread of +active VBuckets across nodes and also tries to keep indexers busy +across all nodes. PIndex moves should have some equivalent +optimization. ------------------------------------------------- -Random notes +# Random notes / TODO's ip address changes on node joining - node goes from 'ns_1@127.0.0.1' @@ -217,3 +266,7 @@ unknown ok N/A known ok ok index lifecycle + +A vbucket in a view index has "pending", "active", and "cleanup" states +that are especially used during rebalance orchestration. Perhaps +PIndexes need equivalent states? \ No newline at end of file
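
The closing question about PIndex states could be read as a small state machine. The sketch below is illustrative only: `PIndexState`, `ALLOWED`, and `transition` are hypothetical names, and both the meaning attached to each state and the set of allowed transitions are assumptions, not something taken from cbgt or ns_server.

```python
from enum import Enum


class PIndexState(Enum):
    """Hypothetical PIndex states, named after the view-index vbucket states."""
    PENDING = "pending"   # assumed: being prepared on a node, not yet serving
    ACTIVE = "active"     # assumed: serving queries
    CLEANUP = "cleanup"   # assumed: no longer needed on this node, to be removed


# Allowed transitions (an assumption made for illustration).
ALLOWED = {
    PIndexState.PENDING: {PIndexState.ACTIVE, PIndexState.CLEANUP},
    PIndexState.ACTIVE: {PIndexState.CLEANUP},
    PIndexState.CLEANUP: set(),
}


def transition(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"illegal transition {current.value} -> {target.value}"
        )
    return target
```

Under these assumptions, a rebalance orchestrator would move a PIndex from `PENDING` to `ACTIVE` on the destination node and from `ACTIVE` to `CLEANUP` on the source node, and any other move would be rejected early rather than silently applied.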