From 0027abe8646e09d87cd0faf8208f7f95f001aa71 Mon Sep 17 00:00:00 2001
From: Steve Yen <steve.yen@gmail.com>
Date: Wed, 22 Jul 2015 18:09:35 -0700
Subject: [PATCH] more DESIGN.md notes

---
 DESIGN.md | 151 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 143 insertions(+), 8 deletions(-)

diff --git a/DESIGN.md b/DESIGN.md
index 3f7edb1..adf5e93 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -23,10 +23,10 @@ References to related documents:
 document), so these requirements are based on anticipated PRD
 requirements.)
 
-## GT-CS1 - Consistent queries during stable topology.
+Requirements with the "GT-" prefix originally come from the cbgt
+IDEAS.md design document.
 
-(Requirements with the "GT-" prefix originally come from the cbgt
-IDEAS.md design document.)
+## GT-CS1 - Consistent queries during stable topology.
 
 Clients should be able to ask, "I want query results where the
 full-text-indexes have incorporated at least up to this set of
@@ -103,12 +103,15 @@ datasource nodes are down.
 
 ## GT-PI1 - Ability to pause/resume indexing.
 
-## IPAC - IP Address Changes.
+## IPADDR - IP Address Changes.
 
 IP address discovery is "late bound", when couchbase server nodes
 initially joins to a cluster.  A "cluster of one", in particular, only
 has an erlang node address of "ns_1@127.0.0.1".
 
+ns-server also has feature where node names might also be manually
+assigned.
+
 ## BUCKETD - Bucket Deletion Cascades to Full-Text Indexes.
 
 If a bucket is deleted, any full-text indexes based on that bucket
@@ -117,14 +120,28 @@ should be also automatically deleted.
 There whould be user visible UI warnings on these "cascading deletes"
 of cbft indexes.
 
+(Perhaps ns-server invokes a synchronous cmd to unregister / delete
+cbft indexes?)
+
+(What about cascade delete (or listing) of index aliases that
+(transitively) point to an index (or to a bucket datasource)?)
+
+## BUCKETR - Bucket Deletion & Recreation with the same name.
+
+## BUCKETF - Bucket Flush.
+
 ## RIO - Rebalance Nodes In/Out.
 
+(Need ability / REST API to quiesce or cool down a cbft process?)
+
 ## RP - Rebalance progress estimates/indicator.
 
 ## RS - Swap Rebalance.
 
 ## FOH - Hard Failover.
 
+Even failover in the midst of cbft rebalance ("put the pencils down").
+
 ## FOG - Graceful Failover.
 
 Reject any new requests and wait for any inflight requests to finish
@@ -152,9 +169,15 @@ This is the equivalent of "consistent view indexes under rebalance".
 
 ## QUERYR - Querying Replicas.
 
-## QUERYLB - Querying Load Balancing.
+## QUERYLB - Query Load Balancing Amongst Replicas.
+
+## QUERYLB-EE - Query Load Balancing Amongst Replicas, But Only With
+   Enterprise Edition.
+
+Perhaps EE needs to be more featureful than simple round-robin or
+random load-balancing, but targets the most up-to-date replica.
 
-## QUERYLB-EE - Query Load Balancing To Replicas With Enterprise Edition.
+Or the least-busy replica.
 
 ## ODS - Out of Disk Space.
 
@@ -194,6 +217,10 @@ cbft's administration should be protected.
 
 cbft's queryability should be protected.
 
+## AUTHPW - Auth credentials/pswd can change (or is reset).
+
+Does this affect cbauth module?
+
 ## TLS - TLS/SSL support.
 
 ## HC - Health Checks.
@@ -214,12 +241,16 @@ During rebalance, ns_server pauses view index compactions until a
 configurable number of vbucket moves have occurred for efficiency (see
 rebalance-flow.txt)
 
+This might prevent huge disk space blowup on rebalance (MB-6799?).
+
 ## RPMOVE - Partition Moves Controlled Under Rebalance.
 
 During rebalance, ns_server limits outgoing moves to a single vbucket
 per node for efficiency (see rebalance-flow.txt).  Same should be the
 case for PIndex reassignments.
 
+See "rebalanceMovesBeforeCompaction".
+
 ## RSPREAD - Rebalance Resource Spreading.
 
 ns_server prioritizes VBuckets moves that equalize the spread of
@@ -227,6 +258,17 @@ active VBuckets across nodes and also tries to keep indexers busy
 across all nodes.  PIndex moves should have some equivalent
 optimization.
 
+## COMPACTF - Ability for force compaction right now.
+
+## COMPACTO - Ability to compact files offline (while server is down?)
+
+Might be useful to recover from out-of-disk space scenarios.
+
+## COMPACTP - Ability to pause/resume automated compactions (not
+   explicitly forced).
+
+## COMPACTC - Ability to configure/reconfigure automated compaction policy.
+
 ## TOOLBR - Tools - Backup/Restore.
 
 ## TOOLCI - Tools - cbcollectinfo.
@@ -237,17 +279,46 @@ optimization.
 
 ## UPGRADE - Future readiness for upgrades.
 
+## UTEST - Unit Testable.
+
+## QTEST - QE Testable / Instrumentable.
+
+## REQTRACE - Ability to trace a request down through the layers.
+
+## REQPILL - Ability to send a fake request "pill" down through the layers.
+
+## REQTIME - Ability to track "where is the time going" on a
+   per-request, individual request basis.
+
+## REQTHRT - Request Throttling
+
+ns-server REST & CAPI support a "restRequestLimit" configuration.
+
+## DCPNAME - DCP stream naming or prefix
+
+Allow for DCP stream prefix to allow for easier diagnosability &
+correlation (e.g., these DCP streams in KV-engine come from cbft due
+to these indexes from these nodes).
+
+## TIMEREW - Handle NTP backward time jumps gracefully
+
 -------------------------------------------------
 # Random notes / TODO's
 
 ip address changes on node joining
-- node goes from 'ns_1@127.0.0.1'
+- node goes from 'ns_1@127.0.0.1' (or 0.0.0.0?)
     to ns_1@REAL_IP_ADDR
 ip address rename
 
 bind-addr needs a REAL_IP_ADDR
 from the very start to be clusterable?
 
+all nodes tart of on node with "wrong" bindHTTP addr (like 127.0.0.1
+or 0.0.0.0), so ideas:
+- fix reliance on bindHTTP
+- allow true node UUID, with "outside" mapping of UUID to contactable http address
+- add command to rename a node's bindHTTP
+
 best effort queries
 - vs return error
 - keep old index entries even if index definition changes
@@ -269,4 +340,68 @@ index lifecycle
 
 A vbucket in a view index has a "pending", "active", "cleanup" states,
 that are especially used during rebalance orchestration.  Perhaps
-PIndexes need equivalent states?
\ No newline at end of file
+PIndexes need equivalent states?
+
+idea: never run planner in cbftint?  But allow ns-server to invoke
+planner on as-needed basis whenever ns-server needs (during a master
+orchestrator / DML change event).  Then EE edition could allow a more
+advanced planner.
+
+add cmd-line param where ns-server can force a default node UUID, for
+better cross-correlation/debuggability of log events.  And, we can
+look for the return of the shunned node.
+
+add cmd-line tools/params for outside systems (like ns-server) to
+add/remove nodes?
+
+From rebalance-flow.txt on concurrency...
+
+%%           VBucket Move Scheduling
+%% Time
+%%
+%%   |   /------------\
+%%   |   | Backfill 0 |                       Backfills cannot happen
+%%   |   \------------/                       concurrently.
+%%   |         |             /------------\
+%%   |   +------------+      | Backfill 1 |
+%%   |   | Index File |      \------------/
+%%   |   |     0      |            |
+%%   |   |            |      +------------+   However, indexing _can_ happen
+%%   |   |            |      | Index File |   concurrently with backfills and
+%%   |   |            |      |     1      |   other indexing.
+%%   |   |            |      |            |
+%%   |   +------------+      |            |
+%%   |         |             |            |
+%%   |         |             +------------+
+%%   |         |                   |
+%%   |         \---------+---------/
+%%   |                   |
+%%   |   /--------------------------------\   Compaction for a set of vbucket moves
+%%   |   |  Compact both source and dest. |   cannot happen concurrently with other
+%%   v   \--------------------------------/   vbucket moves.
+%%
+%%
+
+Policy ideas...
+
+- Do node adds first, before removes?
+  Favor more capacity earlier.
+
+- Move all KV vbuckets before moving any cbft pindexes?
+
+- Favor FT index builds on new nodes, before FT index builds on
+  remaining nodes?  (Favors utilizing empty disks earlier.)
+
+- Favor FT index builds with priority 0 pindexes first.
+
+Two different kinds of rebalance...
+
+- KV vbucket rebalancing
+- cbft pindex rebalancing
+
+Current, easiest cbft pindex rebalance implementation...
+- easy, but bad behavior (CE edition--?)
+- cbft process started on new node, joins the cfg...
+- then instant (re-)planner & janitor-fication...
+- means apparently full-text index downtime and DDoS via tons of
+  concurrent DCP backfills.