

more DESIGN.md ideas from rebalance-flow.txt
steveyen committed Jul 21, 2015
1 parent d64d67f commit b322eb4
1 changed file (DESIGN.md) with 106 additions and 53 deletions (159 changes).

[...]
and focuses especially on integrating with couchbase's features like
Rebalance, Failover, etc.

-------------------------------------------------
# Links

References to related documents:

* cbgt design - https://github.com/couchbaselabs/cbgt/blob/master/IDEAS.md (GT)
* ns-server documents, especially "rebalance flow"...
* https://github.com/couchbase/ns_server/blob/master/doc
* https://github.com/couchbase/ns_server/blob/master/doc/rebalance-flow.txt (rebalance-flow.txt)

-------------------------------------------------
# Requirements

(NOTE: We're currently missing a formal PRD (product requirements
document), so these requirements are based on anticipated PRD
requirements.)

## GT-CS1 - Consistent queries during stable topology.

(Requirements with the "GT-" prefix originally come from the cbgt
IDEAS.md design document.)

Clients should be able to ask, "I want query results where the
full-text-indexes have incorporated at least up to this set of
{vbucket to seq-num} pairings."
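
As a minimal sketch in Go (the language cbgt itself is written in), the check behind GT-CS1 could compare the client's requested {vbucket to seq-num} pairings against the highest seq-num each vbucket has reached in the index. `ConsistencyVector`, `caughtUp`, and `laggingVBuckets` are hypothetical names, not cbgt APIs, and the sketch assumes the index keeps such a per-vbucket map.

```go
package main

// ConsistencyVector is a hypothetical {vbucket ID -> seq-num} map, as a
// client might send it along with a consistent query (GT-CS1).
type ConsistencyVector map[uint16]uint64

// caughtUp reports whether an index that has incorporated the seq-nums in
// indexed has reached at least every seq-num requested in want. A vbucket
// missing from indexed counts as seq-num 0.
func caughtUp(want, indexed ConsistencyVector) bool {
	for vb, seq := range want {
		if indexed[vb] < seq {
			return false
		}
	}
	return true
}

// laggingVBuckets returns the vbuckets whose indexed seq-num is still below
// the requested one; a query would have to wait on (or report) these.
// The order of the result is unspecified (map iteration order).
func laggingVBuckets(want, indexed ConsistencyVector) []uint16 {
	var lagging []uint16
	for vb, seq := range want {
		if indexed[vb] < seq {
			lagging = append(lagging, vb)
		}
	}
	return lagging
}
```

A GT-OC1 style "best effort" request around just one vbucket-seqnum is then simply a vector with a single entry.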

## GT-CR1 - Consistent queries under datasource rebalance.

Full-text queries should be consistent even as data source (Couchbase
Server) nodes are added and removed in a clean rebalance.

## GT-CR2 - Consistent queries under cbgt topology change.

Full-text queries should be consistent even as cbgt nodes are added
and removed in a clean takeover fashion.

## GT-OC1 - Support optional looser "best effort" options

The options might be along a spectrum from stale=ok to totally
consistent. The "best effort" option should probably have lower
latency than a totally consistent CR1 query.

For example, perhaps the client may want to just ask for consistency
around just one vbucket-seqnum.

## GT-IA1 - Index aliases.

This is a level of indirection that helps split data across multiple
indexes without having to change your app all the time. Example: the
[...]
search limited to only the most recent quarter index of
'last-quarter-sales' alias to the newest 'sales-2014Q4' index
without any client-side application changes.
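
A sketch of alias resolution, assuming aliases are stored as a plain map from alias name to target names (the `AliasTable` type and its `Resolve` method are hypothetical). Re-pointing 'last-quarter-sales' is then a single map update, and clients that query by alias see the new target without changes. The depth limit is an assumed guard against alias cycles.

```go
package main

import "fmt"

// maxAliasDepth is an assumed limit that stops alias cycles (a -> b -> a).
const maxAliasDepth = 10

// AliasTable is a hypothetical alias -> target-index-names mapping (GT-IA1).
type AliasTable map[string][]string

// Resolve expands name into concrete index names. A name that is not an
// alias resolves to itself; aliases may point at other aliases.
func (t AliasTable) Resolve(name string) ([]string, error) {
	return t.resolve(name, 0)
}

func (t AliasTable) resolve(name string, depth int) ([]string, error) {
	if depth > maxAliasDepth {
		return nil, fmt.Errorf("alias %q: nested too deeply or cyclic", name)
	}
	targets, ok := t[name]
	if !ok {
		return []string{name}, nil // a concrete index name
	}
	var out []string
	for _, target := range targets {
		r, err := t.resolve(target, depth+1)
		if err != nil {
			return nil, err
		}
		out = append(out, r...)
	}
	return out, nil
}
```

An alias with several targets also doubles as a simple way to name a multi-index query.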

## GT-MQ1 - Multi-index query for a single bucket.

This is the ability to query multiple indexes in one request for a
single bucket, such as the "comments-fti" index and the
"description-fti" index.

## GT-MQ2 - Multi-index query across multiple buckets.

This is the ability to query multiple indexes across multiple buckets
in a single query, such as "find any docs from the customer, employee,
vendor buckets who have an address or comment about 'dallas'".
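
One way to serve GT-MQ1 and GT-MQ2 is scatter/gather: query each index concurrently and merge the hits by score. The sketch assumes a `searchFunc` stand-in for querying a single index. It also assumes that scores from different indexes are directly comparable, which may not hold in practice, because different analyzers or corpus statistics can skew them. `Hit`, `searchFunc`, and `multiIndexQuery` are hypothetical.

```go
package main

import (
	"sort"
	"sync"
)

// Hit is a hypothetical search hit: source index, doc ID, relevance score.
type Hit struct {
	Index string
	DocID string
	Score float64
}

// searchFunc stands in for querying one index; it is an assumption, not a
// cbft API.
type searchFunc func(index, query string) []Hit

// multiIndexQuery fans the query out to every index concurrently, then
// merges all hits by descending score and keeps the top `size`.
func multiIndexQuery(indexes []string, query string, size int, search searchFunc) []Hit {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all []Hit
	)
	for _, idx := range indexes {
		wg.Add(1)
		go func(idx string) {
			defer wg.Done()
			hits := search(idx, query)
			mu.Lock()
			all = append(all, hits...)
			mu.Unlock()
		}(idx)
	}
	wg.Wait()
	sort.Slice(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	if len(all) > size {
		all = all[:size]
	}
	return all
}
```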

## GT-NI1 - Resilient to datasource node down scenarios.

If a data source (couchbase cluster server node) goes down, then the
subset of a cbgt cluster that was indexing data from the down node
will not be able to make indexing progress. Those cbgt instances
should try to automatically reconnect and resume indexing from where
they left off.

## GT-E1 - The user should be able to see error conditions

For example, yellow or red coloring when a node is down and for other
error conditions.

[...]
In ES, note the frustrating bouncing between yellow, green, red;
in an ns-server example, not enough CPU and timeouts lead to status
bounciness.
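
One common remedy for this kind of status bounciness is debouncing. The displayed color changes only after the same new status has been observed several times in a row. This is a swapped-in technique, not something the source prescribes. `debouncer`, `newDebouncer`, and the threshold value are hypothetical.

```go
package main

// Status is a hypothetical health color as shown in a UI.
type Status string

const (
	Green  Status = "green"
	Yellow Status = "yellow"
	Red    Status = "red"
)

// debouncer changes the displayed status only after the same new raw
// status has been seen `threshold` times in a row, so one slow (e.g.
// CPU-starved) health check cannot make the UI flap.
type debouncer struct {
	threshold int
	shown     Status
	candidate Status
	count     int
}

func newDebouncer(initial Status, threshold int) *debouncer {
	return &debouncer{threshold: threshold, shown: initial}
}

// observe feeds one raw health-check result and returns the status to show.
func (d *debouncer) observe(raw Status) Status {
	switch {
	case raw == d.shown:
		d.candidate, d.count = "", 0 // blip over: forget the candidate
	case raw == d.candidate:
		d.count++
	default:
		d.candidate, d.count = raw, 1
	}
	if d.count >= d.threshold {
		d.shown, d.candidate, d.count = raw, "", 0
	}
	return d.shown
}
```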

## GT-NQ1 - Querying still possible if datasource node goes down.

Querying of a cbgt cluster should be able to continue even if some
datasource nodes are down.

## GT-PI1 - Ability to pause/resume indexing.

## IPAC - IP Address Changes

IP address discovery is "late bound": it happens when a couchbase
server node initially joins a cluster. A "cluster of one", in
particular, only has an erlang node address of "ns_1@127.0.0.1".
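
A small sketch of detecting the "late bound" case. It assumes the erlang node name is available as a string of the form `name@host`. A loopback host means the node's real address is not yet known, so any address recorded for it may change when it joins a cluster. `hostOfNodeName` and `addressIsLateBound` are hypothetical helpers.

```go
package main

import (
	"net"
	"strings"
)

// hostOfNodeName extracts the host part of an erlang node name such as
// "ns_1@127.0.0.1"; a name without "@" is returned unchanged.
func hostOfNodeName(node string) string {
	if i := strings.LastIndex(node, "@"); i >= 0 {
		return node[i+1:]
	}
	return node
}

// addressIsLateBound reports whether the node name still carries a loopback
// IP address, as a "cluster of one" does before its real address is known.
// Hostnames (non-IP hosts) are treated as already bound.
func addressIsLateBound(node string) bool {
	ip := net.ParseIP(hostOfNodeName(node))
	return ip != nil && ip.IsLoopback()
}
```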

## BUCKETD - Bucket Deletion Cascades to Full-Text Indexes

If a bucket is deleted, any full-text indexes based on that bucket
should also be automatically deleted.

There should be user-visible UI warnings on these "cascading deletes"
of cbft indexes.
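
A sketch of the cascade, assuming each index definition records the name of its source bucket (the trimmed `IndexDef` here and `cascadeDelete` are hypothetical). The returned names are what a UI warning could list before the delete proceeds. Matching on bucket name alone is a simplification. An implementation would probably also compare a bucket UUID, so that a re-created bucket with the same name is not mistaken for the deleted one.

```go
package main

import "sort"

// IndexDef is a hypothetical, trimmed-down index definition: just the
// index name and the name of the bucket it indexes.
type IndexDef struct {
	Name       string
	SourceName string
}

// cascadeDelete splits index definitions into those that survive deleting
// bucket and the (sorted) names of those that must be deleted with it.
func cascadeDelete(defs []IndexDef, bucket string) (kept []IndexDef, doomed []string) {
	for _, def := range defs {
		if def.SourceName == bucket {
			doomed = append(doomed, def.Name)
		} else {
			kept = append(kept, def)
		}
	}
	sort.Strings(doomed)
	return kept, doomed
}
```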

## RIO - Rebalance Nodes In/Out

## RP - Rebalance progress estimates/indicator

## RS - Swap Rebalance

## FOH - Hard Failover

## FOG - Graceful Failover

Reject any new requests and wait for any inflight requests to finish
before failover.
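
Graceful failover as described (reject new requests, then wait for in-flight ones) maps naturally onto a mutex-guarded flag plus a `sync.WaitGroup`. The `drainGate` type and `ErrDraining` are hypothetical names for this sketch.

```go
package main

import (
	"errors"
	"sync"
)

// ErrDraining is returned to requests that arrive after draining started.
var ErrDraining = errors.New("node is draining for failover")

// drainGate is a hypothetical request gate for graceful failover (FOG):
// once drain is called, new requests are rejected and drain blocks until
// every in-flight request has finished.
type drainGate struct {
	mu       sync.Mutex
	draining bool
	inflight sync.WaitGroup
}

// enter registers a request; on success the caller must call done.
func (g *drainGate) enter() (done func(), err error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.draining {
		return nil, ErrDraining
	}
	g.inflight.Add(1) // under mu, so it can never race with drain's flag
	return g.inflight.Done, nil
}

// drain rejects new requests, then waits for in-flight ones to finish.
func (g *drainGate) drain() {
	g.mu.Lock()
	g.draining = true
	g.mu.Unlock()
	g.inflight.Wait()
}
```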

## AB - Add Back Rebalance

## DNR - Delta Node Recovery

## RP1 - Rebalance Phase 1 - VBucket Replication Phase
## RP2 - Rebalance Phase 2 - View Indexing Phase
## RP3 - Rebalance Phase 3 - VBucket Takeover Phase

## RSTOP - Ability to Stop Rebalance

## MDS-RI - Multidimensional Scaling - ability to rebalance Full-Text indexes independent of other services

## RRU-EE - Rebalance Resource Utilization More Efficient With Enterprise Edition

## CIUR - Consistent Indexes Under Rebalance

This is the equivalent of "consistent view indexes under rebalance".

## QUERYR - Querying Replicas

## QUERYLB - Querying Load Balancing

## QUERYLB-EE - Query Load Balancing To Replicas With Enterprise Edition

## ODS - Out of Disk Space

Out-of-disk-space conditions should be handled gracefully (for
example, without segfaulting).

## ODSR - Out of Disk Space Repaired

After an administrator fixes the disk space issue (adds more disks or
frees more space), the full-text system should be able to continue
automatically.

## KP - Killed Processes (linux OOM, etc)

## RSN - Return of the Shunned Node

## DLC - Disk Level Copy/Restore of Node

This is the scenario when a user "clones" a node via disk/storage
level maneuvers, such as incorrect usage of EBS snapshot or tar'ing up
a whole dataDir.

The issue is that old cbft.uuid files might still (incorrectly) be
copied, so the clone would come up claiming the same node UUID as the
original node.
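
A sketch of how cloned nodes could be detected. It assumes each node reports its address along with the UUID it read from its cbft.uuid file. Any UUID claimed by more than one address points at a disk-level copy. `duplicateUUIDs` is a hypothetical helper. What to do on detection (refuse to start, or generate a fresh UUID) is left open here.

```go
package main

import "sort"

// duplicateUUIDs takes a hypothetical node-address -> UUID map and returns,
// for every UUID claimed by more than one node, the sorted addresses that
// claim it. A cloned dataDir shows up as two addresses sharing one UUID.
func duplicateUUIDs(nodes map[string]string) map[string][]string {
	byUUID := map[string][]string{}
	for addr, uuid := range nodes {
		byUUID[uuid] = append(byUUID[uuid], addr)
	}
	dups := map[string][]string{}
	for uuid, addrs := range byUUID {
		if len(addrs) > 1 {
			sort.Strings(addrs)
			dups[uuid] = addrs
		}
	}
	return dups
}
```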

## UI - Full-Text tab in Couchbase's web admin UI

## STATS - Stats Integration into Couchbase's web admin UI

## AUTHI - Auth integration with Couchbase for indexing

cbft should be able to access any bucket for full-text indexing.

## AUTHM - Auth integration with Couchbase for admin/management

cbft's administration should be protected.

## AUTHQ - Auth integration with Couchbase for queries

cbft's queryability should be protected.

## TLS - TLS/SSL support

## HC - Health Checks

## TOOLBR - Tools - Backup/Restore

## TOOLCI - Tools - cbcollectinfo

## TOOLM - Tools - mortimer

## TOOLN - Tools - nutshell

## UPGRADE - Future readiness for upgrades

## QUOTAM - Memory Quota per node

## QUOTAD - Disk Quota per node

## BFLIMIT - Limit Backfills

ns_single_vbucket_mover has a policy feature that limits the number of
backfills to "1 backfill during rebalance in or out of any node" (see
rebalance-flow.txt).

## RCOMPACT - Index Compactions Controlled Under Rebalance

For efficiency, ns_server pauses view index compactions during
rebalance until a configurable number of vbucket moves have occurred
(see rebalance-flow.txt).
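
A sketch of the compaction policy, assuming a counter of completed partition moves. Compaction is held off until `movesPerCompaction` moves have finished, then allowed once, and the counter restarts. `compactionGate` and its field names are hypothetical. They stand in for ns_server's configurable setting, whose actual name is not given here.

```go
package main

// compactionGate holds off index compaction during rebalance until
// movesPerCompaction partition moves have completed since the last one.
type compactionGate struct {
	movesPerCompaction int
	movesSinceLast     int
}

// moveDone records one finished partition move and reports whether a
// compaction may run now; when it returns true the counter restarts.
func (g *compactionGate) moveDone() bool {
	g.movesSinceLast++
	if g.movesSinceLast >= g.movesPerCompaction {
		g.movesSinceLast = 0
		return true
	}
	return false
}
```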

## RPMOVE - Partition Moves Controlled Under Rebalance

During rebalance, ns_server limits outgoing moves to a single vbucket
per node for efficiency (see rebalance-flow.txt). The same should hold
for PIndex reassignments.
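
The BFLIMIT and RPMOVE policies could be applied to PIndex reassignments by a scheduler that admits a pending move only if neither its source nor its destination node is already at its limit. This is a sketch. `Move`, `nextBatch`, and the `busy` map of already-running moves per node are hypothetical. With `maxPerNode` set to 1, it approximates the one-backfill-in-or-out rule and also caps outgoing moves at one per node.

```go
package main

// Move is a hypothetical PIndex reassignment from one node to another.
type Move struct {
	PIndex   string
	From, To string
}

// nextBatch picks, in order, the pending moves that can start now without
// any node taking part in more than maxPerNode concurrent moves, counting
// both the outgoing (From) and incoming (To) side. busy holds counts of
// moves already running per node; it is not modified.
func nextBatch(pending []Move, busy map[string]int, maxPerNode int) (batch, rest []Move) {
	load := map[string]int{}
	for node, n := range busy {
		load[node] = n
	}
	for _, m := range pending {
		if load[m.From] < maxPerNode && load[m.To] < maxPerNode {
			load[m.From]++
			load[m.To]++
			batch = append(batch, m)
		} else {
			rest = append(rest, m) // retry once a running move finishes
		}
	}
	return batch, rest
}
```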

## RSPREAD - Rebalance Resource Spreading

ns_server prioritizes VBucket moves that equalize the spread of
active VBuckets across nodes and also tries to keep indexers busy
across all nodes. PIndex moves should have some equivalent
optimization.
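
A greedy sketch of the spreading idea: among candidate moves of one active partition, pick the one that leaves the smallest gap between the most- and least-loaded nodes. It models only the equalization part, not keeping indexers busy. `Candidate`, `spread`, and `bestMove` are hypothetical.

```go
package main

// Candidate is a hypothetical move of one active partition between nodes.
type Candidate struct{ From, To string }

// spread returns the gap between the most- and least-loaded nodes.
func spread(counts map[string]int) int {
	first := true
	var lo, hi int
	for _, c := range counts {
		if first || c < lo {
			lo = c
		}
		if first || c > hi {
			hi = c
		}
		first = false
	}
	return hi - lo
}

// bestMove returns the index of the candidate that would leave the
// smallest spread (earlier candidates win ties), or -1 if there are none.
// counts is not modified; each candidate is evaluated on a copy.
func bestMove(counts map[string]int, cands []Candidate) int {
	best, bestSpread := -1, 0
	for i, c := range cands {
		after := make(map[string]int, len(counts))
		for node, n := range counts {
			after[node] = n
		}
		after[c.From]--
		after[c.To]++
		if s := spread(after); best == -1 || s < bestSpread {
			best, bestSpread = i, s
		}
	}
	return best
}
```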

-------------------------------------------------
# Random notes / TODO's

ip address changes on node joining
- node goes from 'ns_1@127.0.0.1'
[...]
unknown ok N/A
known ok ok

index lifecycle

A vbucket in a view index can be in a "pending", "active", or
"cleanup" state; these states are used especially during rebalance
orchestration. Perhaps PIndexes need equivalent states?
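
If PIndexes did adopt such states, a small explicit state machine would keep transitions honest during rebalance. The state names mirror the ones above. Their meanings in the comments, the set of allowed transitions, and the names `PIndexState` and `transition` are assumptions for this sketch.

```go
package main

import "fmt"

// PIndexState is a hypothetical lifecycle state for a PIndex.
type PIndexState int

const (
	Pending PIndexState = iota // assumed: being built on its new node, not yet queried
	Active                     // assumed: caught up and serving queries
	Cleanup                    // assumed: moved away, data awaiting removal
)

func (s PIndexState) String() string {
	return [...]string{"pending", "active", "cleanup"}[s]
}

// allowed lists the transitions this sketch permits.
var allowed = map[PIndexState][]PIndexState{
	Pending: {Active, Cleanup}, // promoted, or an aborted move is cleaned up
	Active:  {Cleanup},         // handed off to another node
}

// transition returns the new state, or the old state plus an error for an
// illegal transition.
func transition(from, to PIndexState) (PIndexState, error) {
	for _, s := range allowed[from] {
		if s == to {
			return to, nil
		}
	}
	return from, fmt.Errorf("illegal PIndex transition %v -> %v", from, to)
}
```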
