From b322eb4171fcff664872d4f9a2b12936c06885dc Mon Sep 17 00:00:00 2001
From: Steve Yen <steve.yen@gmail.com>
Date: Tue, 21 Jul 2015 16:53:19 -0700
Subject: [PATCH] more DESIGN.md ideas from rebalance-flow.txt

---
 DESIGN.md | 159 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 106 insertions(+), 53 deletions(-)

diff --git a/DESIGN.md b/DESIGN.md
index c77183d..5464729 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -7,23 +7,23 @@ and focuses especially on integrating with couchbase's features like
 Rebalance, Failover, etc.
 
 -------------------------------------------------
-Links
+# Links
 
 References to related documents:
 
-* cbgt design - https://github.com/couchbaselabs/cbgt/blob/master/IDEAS.md
+* cbgt design - https://github.com/couchbaselabs/cbgt/blob/master/IDEAS.md (GT)
 * ns-server documents, especially "rebalance flow"...
   * https://github.com/couchbase/ns_server/blob/master/doc
-  * https://github.com/couchbase/ns_server/blob/master/doc/rebalance-flow.txt
+  * https://github.com/couchbase/ns_server/blob/master/doc/rebalance-flow.txt (rebalance-flow.txt)
 
 -------------------------------------------------
-Requirements
+# Requirements
 
 (NOTE: We're currently missing a formal PRD (product requirements
 document), so these requirements are based on anticipated PRD
 requirements.)
 
-# GT-CS1 - Consistent queries during stable topology.
+## GT-CS1 - Consistent queries during stable topology.
 
 (Requirements with the "GT-" prefix originally come from the cbgt
 IDEAS.md design document.)
@@ -32,17 +32,17 @@ Clients should be able to ask, "I want query results where the
 full-text-indexes have incorporated at least up to this set of
 {vbucket to seq-num} pairings."
 
-# GT-CR1 - Consistent queries under datasource rebalance.
+## GT-CR1 - Consistent queries under datasource rebalance.
 
 Full-text queries should be consistent even as data source (Couchbase
 Server) nodes are added and removed in a clean rebalance.
 
-# GT-CR2 - Consistent queries under cbgt topology change.
+## GT-CR2 - Consistent queries under cbgt topology change.
 
 Full-text queries should be consistent even as cbgt nodes are added
 and removed in a clean takeover fashion.
 
-# GT-OC1 - Support optional looser "best effort" options
+## GT-OC1 - Support optional looser "best effort" options
 
 The options might be along a spectrum from stale=ok to totally
 consistent.  The "best effort" option should probably have lower
@@ -51,7 +51,7 @@ latency than a totally consistent CR1 query.
 For example, perhaps the client may want to just ask for consistency
 around just one vbucket-seqnum.
 
-# GT-IA1 - Index aliases.
+## GT-IA1 - Index aliases.
 
 This is a level of indirection to help split data across multiple
 indexes, but also not change your app all the time.  Example: the
@@ -61,19 +61,19 @@ search limited to only the most recent quarter index of
 'last-quarter-sales' alias to the the newest 'sales-2014Q4' index
 without any client-side application changes.
 
-# GT-MQ1 - Multi-index query for a single bucket.
+## GT-MQ1 - Multi-index query for a single bucket.
 
 This is the ability to query multiple indexes in one request for a
 single bucket, such as the "comments-fti" index and the
 "description-fti" index.
 
-# GT-MQ2 - Multi-index query across multiple buckets.
+## GT-MQ2 - Multi-index query across multiple buckets.
 
 This is the ability to query multiple indexes across multiple buckets
 in a single query, such as "find any docs from the customer, employee,
 vendor buckets who have an address or comment about 'dallas'".
 
-# GT-NI1 - Resilient to datasource node down scenarios.
+## GT-NI1 - Resilient to datasource node down scenarios.
 
 If a data source (couchbase cluster server node) goes down, then the
 subset of a cbgt cluster that was indexing data from the down node
@@ -81,7 +81,7 @@ will not be able to make indexing progress.  Those cbgt instances
 should try to automatically reconnect and resume indexing from where
 they left off.
 
-# GT-E1 - The user should be able to see error conditions
+## GT-E1 - The user should be able to see error conditions
 
 For example yellow or red coloring on node down and other error
 conditions.
@@ -96,100 +96,149 @@ In ES, note the frustrating bouncing between yellow, green, red;
 ns-server example, not enough CPU & timeouts leads to status
 bounce-iness.
 
-# GT-NQ1 - Querying still possible if datasource node goes down.
+## GT-NQ1 - Querying still possible if datasource node goes down.
 
 Querying of a cbgt cluster should be able to continue even if some
 datasource nodes are down.
 
-# GT-PI1 - Ability to pause/resume indexing.
+## GT-PI1 - Ability to pause/resume indexing.
 
-# IPAC - IP Address Changes
+## IPAC - IP Address Changes
 
-IP address discovery is "late bound", when couchbase server nodes initially joins to a cluster.  A "cluster of one", in particular, only has an erlang node address of "ns_1@127.0.0.1".
+IP address discovery is "late bound", when couchbase server nodes
+initially join a cluster.  A "cluster of one", in particular, only
+has an erlang node address of "ns_1@127.0.0.1".
 
-# BUCKETD - Bucket Deletion Cascades to Full-Text Indexes
+## BUCKETD - Bucket Deletion Cascades to Full-Text Indexes
 
-If a bucket is deleted, any full-text indexes based on that bucket should be also automatically deleted.
+If a bucket is deleted, any full-text indexes based on that bucket
+should also be automatically deleted.
 
-There whould be user visible UI warnings on these "cascading deletes" of cbft indexes.
+There should be user-visible UI warnings on these "cascading deletes"
+of cbft indexes.
 
-# RIO - Rebalance Nodes In/Out
+## RIO - Rebalance Nodes In/Out
 
-# RP - Rebalance progress estimates/indicator
+## RP - Rebalance progress estimates/indicator
 
-# RS - Swap Rebalance
+## RS - Swap Rebalance
 
-# FOH - Hard Failover
+## FOH - Hard Failover
 
-# FOG - Graceful Failover
+## FOG - Graceful Failover
 
 Reject any new requests and wait for any inflight requests to finish
 before failover.
 
-# AB - Add Back Rebalance
+## AB - Add Back Rebalance
 
-# DNR - Delta Node Recovery
+## DNR - Delta Node Recovery
 
-# RP1 - Rebalance Phase 1 - VBucket Replication Phase
-# RP2 - Rebalance Phase 2 - View Indexing Phase
-# RP2 - Rebalance Phase 3 - VBucket Takeover Phase
+## RP1 - Rebalance Phase 1 - VBucket Replication Phase
+## RP2 - Rebalance Phase 2 - View Indexing Phase
+## RP3 - Rebalance Phase 3 - VBucket Takeover Phase
 
-# CIUR - Consistent Indexes Under Rebalance
+## RSTOP - Ability to Stop Rebalance
+
+## MDS-RI - Multidimensional Scaling - ability to rebalance Full-Text
+   indexes independent of other services
+
+## RRU-EE - Rebalance Resource Utilization More Efficient With
+   Enterprise Edition
+
+## CIUR - Consistent Indexes Under Rebalance
 
 This is the equivalent of "consistent view indexes under rebalance".
 
-# QUERYR - Querying Replicas
+## QUERYR - Querying Replicas
+
+## QUERYLB - Querying Load Balancing
 
-# QUERYLB - Querying Load Balancing
+## QUERYLB-EE - Query Load Balancing To Replicas With Enterprise Edition
 
-# ODS - Out of Disk Space
+## ODS - Out of Disk Space
 
-# KP - Killed Processes (linux OOM, etc)
+Out of disk space conditions should be handled gracefully (not segfaulting).
 
-# RSN - Return of the Shunned Node
+## ODSR - Out of Disk Space Repaired
 
-# DLC - Disk Level Copy/Restore of Node
+After an administrator fixes the disk space issue (by adding disks or
+freeing space), the full-text system should be able to automatically
+continue successfully.
 
-This is the scenario when a user "clones" a node via disk/storage level maneuvers, such as incorrect usage of EBS snapshot or tar'ing up a whole dataDir.
+## KP - Killed Processes (linux OOM, etc)
+
+## RSN - Return of the Shunned Node
+
+## DLC - Disk Level Copy/Restore of Node
+
+This is the scenario when a user "clones" a node via disk/storage
+level maneuvers, such as incorrect usage of EBS snapshot or tar'ing up
+a whole dataDir.
 
 The issue is that old cbft.uuid files might still (incorrectly) be copied.
 
-# UI - Full-Text tab in Couchbase's web admin UI
+## UI - Full-Text tab in Couchbase's web admin UI
 
-# STATS - Stats Integration into Couchbase's web admin UI
+## STATS - Stats Integration into Couchbase's web admin UI
 
-# AUTHI - Auth integration with Couchbase for indexing
+## AUTHI - Auth integration with Couchbase for indexing
 
 cbft should be able to access any bucket for full-text indexing.
 
-# AUTHM - Auth integration with Couchbase for admin/management
+## AUTHM - Auth integration with Couchbase for admin/management
 
 cbft's administration should be protected.
 
-# AUTHQ - Auth integration with Couchbase for queries
+## AUTHQ - Auth integration with Couchbase for queries
 
 cbft's queryability should be protected.
 
-# TLS - TLS/SSL support
+## TLS - TLS/SSL support
+
+## HC - Health Checks
 
-# HC - Health Checks
+## TOOLBR - Tools - Backup/Restore
 
-# TOOLBR - Tools - Backup/Restore
+## TOOLCI - Tools - cbcollectinfo
 
-# TOOLCI - Tools - cbcollectinfo
+## TOOLM - Tools - mortimer
 
-# TOOLM - Tools - mortimer
+## TOOLN - Tools - nutshell
 
-# TOOLN - Tools - nutshell
+## UPGRADE - Future readiness for upgrades
 
-# UPGRADE - Future readiness for upgrades
+## QUOTAM - Memory Quota per node
 
-# QUOTAM - Memory Quota per node
+## QUOTAD - Disk Quota per node
 
-# QUOTAD - Disk Quota per node
+## BFLIMIT - Limit Backfills
+
+ns_single_vbucket_mover has a policy feature that limits the number of
+backfills to "1 backfill during rebalance in or out of any node" (see
+rebalance-flow.txt).
+
+## RCOMPACT - Index Compactions Controlled Under Rebalance
+
+During rebalance, ns_server pauses view index compactions until a
+configurable number of vbucket moves have occurred for efficiency (see
+rebalance-flow.txt).
+
+## RPMOVE - Partition Moves Controlled Under Rebalance
+
+During rebalance, ns_server limits outgoing moves to a single vbucket
+per node for efficiency (see rebalance-flow.txt).  Same should be the
+case for PIndex reassignments.
+
+## RSPREAD - Rebalance Resource Spreading
+
+ns_server prioritizes VBucket moves that equalize the spread of
+active VBuckets across nodes and also tries to keep indexers busy
+across all nodes.  PIndex moves should have some equivalent
+optimization.
 
 -------------------------------------------------
-Random notes
+# Random notes / TODOs
 
 ip address changes on node joining
 - node goes from 'ns_1@127.0.0.1'
@@ -217,3 +266,7 @@ unknown    ok         N/A
 known      ok         ok
 
 index lifecycle
+
+A vbucket in a view index has "pending", "active", and "cleanup" states
+that are especially used during rebalance orchestration.  Perhaps
+PIndexes need equivalent states?
\ No newline at end of file