
CBG-5226: Opt-in use of _system._mobile for databases and bootstrap #8309

Merged
bbrks merged 6 commits into main from CBG-5226
May 28, 2026

Member

@bbrks bbrks commented May 28, 2026

CBG-5226

  • Adds support for the dual-collection MetadataStore wrapper on DatabaseContext in SG based on opt-in configuration - as well as similar fallback read logic in the bootstrap code.
  • Adds support for _system._mobile storage for all bootstrap metadata (registry, dbconfigs) for:
    • New deployments that have opted in at the bootstrap config level (Capella), or have their first database opting in to use the system metadata collection.
    • Existing deployments that have had all databases in the bucket fully migrated with the opt-in.
  • Uses a migration status tracking doc _sync:metadata_migration_status in _system._mobile to track per-bucket and per-database migration state. Used to bypass the dual metadata store wrapper post-migration, and also used to determine the last migrated database for bootstrap/registry migration.
  • Does not perform the DatabaseContext migration itself; that lands in an upcoming PR under CBG-5228. After opt-in, all writes are already routed to the new location, and reads fall back to the fallback datastore even before the migration has run.

Diagrams

Logic to determine which collection to use for bootstrap metadata. This is not gated behind any opt-in; the opt-in applies only to databases. For a new deployment we optimistically write to _system. For an existing deployment we write to _default, and a migration job moves the bootstrap-level documents over once the last database has been fully migrated.

  flowchart TD
      A[First bootstrap-doc op on bucket] --> C[Read _sync:registry]
      C --> D{Where is it?}
      D -- _system._mobile --> E[Target: _system._mobile]
      D -- _default._default --> F[Target: _default._default]
      D -- not found --> G{Cluster flag<br/>OR per-DB opt-in?}
      G -- yes --> E
      G -- no --> F
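
The flowchart above can be read as a small pure function. The sketch below is an illustration only: `registryLocation`, `bootstrapTarget`, and the parameter names are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// registryLocation is a hypothetical result of probing for _sync:registry.
type registryLocation int

const (
	registryNotFound registryLocation = iota
	registryInSystemMobile
	registryInDefault
)

const (
	targetSystemMobile = "_system._mobile"
	targetDefault      = "_default._default"
)

// bootstrapTarget mirrors the flowchart: an existing registry pins the
// target to wherever it lives; otherwise the cluster flag or any per-DB
// opt-in selects _system._mobile.
func bootstrapTarget(loc registryLocation, clusterFlag, perDBOptIn bool) string {
	switch loc {
	case registryInSystemMobile:
		return targetSystemMobile
	case registryInDefault:
		return targetDefault
	default:
		if clusterFlag || perDBOptIn {
			return targetSystemMobile
		}
		return targetDefault
	}
}

func main() {
	// New deployment with a per-DB opt-in.
	fmt.Println(bootstrapTarget(registryNotFound, false, true))
	// Existing deployment: the registry location wins over the flags.
	fmt.Println(bootstrapTarget(registryInDefault, true, true))
}
```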

Logic to determine whether a db is new and can bypass metadata migration by writing directly to _system, or whether the migration needs to run.

  flowchart TD
      A[Create or update db] --> B{resolveUseSystemMetadataCollection<br/>per-DB wins over cluster flag}
      B -- no --> J[MetadataStore = _default._default<br/>no wrapper]
      B -- yes --> C[Wrap MetadataStore<br/>base.NewMetadataStore]
      C --> D[probeLegacyPerDBMetadata]
      D --> E{_sync:seq<br/>or _sync:m_id:seq<br/>exists in _default._default?}
      E -- found --> F[Arm migration:<br/>not_started entry in<br/>migration_status]
      E -- none --> G[SetMigrationComplete<br/>on wrapper immediately]
      G --> H[shouldRunMetadataMigration<br/>short-circuits — no manager arm]
      F --> I[MetadataMigrationManager<br/>processes entry]
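
As a rough illustration of the per-database decision above, the following sketch reduces the flow to its inputs and outcomes. The names `resolveOptIn`, `dbPlan`, and `planMetadataStore` are hypothetical, and the legacy probe is modelled as a plain boolean rather than a real document lookup.

```go
package main

import "fmt"

// resolveOptIn applies the "per-DB wins over cluster flag" rule.
// A nil perDB means the database config does not set the option.
func resolveOptIn(perDB *bool, clusterFlag bool) bool {
	if perDB != nil {
		return *perDB
	}
	return clusterFlag
}

// dbPlan is a hypothetical summary of what the flowchart decides.
type dbPlan struct {
	wrapped           bool // MetadataStore is wrapped for dual-collection use
	migrationArmed    bool // a not_started entry is written for the manager
	migrationComplete bool // wrapper is marked complete immediately
}

// planMetadataStore follows the flowchart: no opt-in means no wrapper;
// with opt-in, a legacy sequence doc arms migration, otherwise the
// database is treated as new and migration is marked complete up front.
func planMetadataStore(perDB *bool, clusterFlag, legacySeqFound bool) dbPlan {
	if !resolveOptIn(perDB, clusterFlag) {
		return dbPlan{}
	}
	if legacySeqFound {
		return dbPlan{wrapped: true, migrationArmed: true}
	}
	return dbPlan{wrapped: true, migrationComplete: true}
}

func main() {
	optOut := false
	fmt.Printf("%+v\n", planMetadataStore(&optOut, true, true)) // per-DB opt-out wins
	fmt.Printf("%+v\n", planMetadataStore(nil, true, false))    // new DB fast path
	fmt.Printf("%+v\n", planMetadataStore(nil, true, true))     // legacy data found
}
```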

Integration Tests

…ad fallback

- Adds the dual-collection MetadataStore wrapper on DatabaseContext,
  opt-in via per-DB use_system_metadata_collection (overrides cluster
  default).
- Routes bootstrap metadata (registry, dbconfigs, cbgt cfg) into
  _system._mobile on a per-bucket basis, decided from a registry-location
  probe plus the cluster flag / per-DB opt-in. Reads fall back to
  _default._default until migration completes; writes pin to the owning
  collection so CAS stays consistent across retries.
- Tracks migration lifecycle in _sync:metadata_migration_status (born in
  _system._mobile): per-DB state map plus a bootstrap-copy phase that
  runs after every DB completes, copies bootstrap docs into
  _system._mobile, then disables fallback reads.
- New-DB fast path (probeLegacyPerDBMetadata) skips arming a migration
  when there is no legacy _sync:seq to migrate from.
- Does not perform the per-DatabaseContext data copy yet — landing as
  CBG-5228. Writes already route to the new location after opt-in, and
  reads fall back so existing data remains accessible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
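
The ordering described in this commit message (per-DB states first, then a bootstrap-copy phase, then fallback reads disabled) can be sketched as follows. The struct layout, field names, and the rule that an empty map is never ready are assumptions made for illustration; the actual shape of the _sync:metadata_migration_status document is not shown in this conversation.

```go
package main

import "fmt"

const (
	stateNotStarted = "not_started"
	stateComplete   = "complete"
)

// migrationStatus is a hypothetical model of the bucket-level status doc:
// a per-database state map plus a flag for the bootstrap-copy phase.
type migrationStatus struct {
	Databases         map[string]string // db name -> state
	BootstrapComplete bool
}

// readyForBootstrapCopy reports whether every tracked database has
// completed, which is the trigger for copying bootstrap docs.
func (s migrationStatus) readyForBootstrapCopy() bool {
	if s.BootstrapComplete || len(s.Databases) == 0 {
		return false
	}
	for _, state := range s.Databases {
		if state != stateComplete {
			return false
		}
	}
	return true
}

// fallbackReadsEnabled stays true until the bootstrap copy has finished.
func (s migrationStatus) fallbackReadsEnabled() bool {
	return !s.BootstrapComplete
}

func main() {
	s := migrationStatus{Databases: map[string]string{
		"db1": stateComplete,
		"db2": stateNotStarted,
	}}
	fmt.Println(s.readyForBootstrapCopy(), s.fallbackReadsEnabled())

	s.Databases["db2"] = stateComplete
	fmt.Println(s.readyForBootstrapCopy(), s.fallbackReadsEnabled())

	s.BootstrapComplete = true
	fmt.Println(s.readyForBootstrapCopy(), s.fallbackReadsEnabled())
}
```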
@bbrks bbrks requested a review from gregns1 May 28, 2026 13:11
@bbrks bbrks self-assigned this May 28, 2026
Copilot AI review requested due to automatic review settings May 28, 2026 13:11
Contributor

Copilot AI left a comment

Pull request overview

Implements CBG-5226 by adding an opt-in path for storing Sync Gateway bootstrap metadata and per-database metadata in _system._mobile, with read-fallback to legacy _default._default during the migration window. This wires migration status tracking via a bucket-level _sync:metadata_migration_status document and adds rosmar + Couchbase bootstrap-connection support for dual-collection reads/writes.

Changes:

  • Add per-DB base.MetadataStore wrapping to target _system._mobile when opted in, including a “new DB” fast path and per-bucket bootstrap-migration completion trigger wiring.
  • Extend base.BootstrapConnection (CouchbaseCluster + RosmarCluster) to support dual-collection bootstrap metadata operations, migration status doc CRUD, and bucket-target caching.
  • Add/adjust tests to exercise dual-collection bootstrap semantics and metadata migration gating behavior.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rest/server_context.go | Routes DB metadata to _system._mobile via MetadataStore, adds migration-status stamping/refresh logic, and adds bucket-level bootstrap migration completion flow. |
| rest/metadatamigrationtest/metadata_migration_test.go | Adjusts test setup to seed legacy metadata so migration gating behavior can be exercised. |
| rest/main.go | Plumbs startup flag into bootstrap connection creation to enable bootstrap dual-collection behavior. |
| rest/admin_api.go | Adds per-bucket bootstrap-target hinting prior to initial registry/dbconfig writes for new deployments. |
| db/database.go | Wires new migration status hooks into DatabaseContext and refines metadata migration arming behavior. |
| db/background_mgr_metadata_migration.go | Implements status-doc updates for per-DB migration lifecycle (stubbed copy step for follow-up PR). |
| base/bootstrap.go | Extends BootstrapConnection interface and CouchbaseCluster implementation for dual-collection bootstrap metadata + migration status handling. |
| base/rosmar_cluster.go | Implements the new dual-collection bootstrap behavior and migration-status APIs for rosmar. |
| base/metadata_migration_status.go | Introduces the bucket-level metadata migration status document model and helpers. |
| base/bootstrap_test.go | Adds dual-collection bootstrap tests (insert/write/touch/delete fallback semantics) and updates cluster constructors. |
| go.mod | Bumps rosmar dependency to a newer pseudo-version. |
| go.sum | Updates checksums for the rosmar bump. |

Comment thread rest/server_context.go Outdated
Comment thread rest/server_context.go
…edge

The CAS claim that flipped bootstrap.state not_started → in_progress was
unsafe in two ways: (1) UpdateMetadataMigrationStatus re-invokes the
mutator on CAS retries, so a stale `claimed = true` from an earlier
iteration could let a non-claimant proceed alongside the real winner;
(2) if a node crashed or MigrateBootstrapDocs returned an error after
the in_progress write, the bucket was permanently wedged — the
not_started guard prevented any peer from re-entering the claim path.

MigrateBootstrapDocs is already idempotent under concurrent execution
(primary Insert tolerates ErrDocumentExists, fallback Remove uses
observed-CAS), so the bucket-level claim was buying very little while
introducing the wedge. Now the function runs the copy step directly and
the only CAS-guarded write is the final not_started → complete
transition, which short-circuits if a peer has already completed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
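
The first hazard named in this commit message, a mutator that is re-invoked on CAS retries while a captured flag keeps its value from an earlier iteration, is easy to reproduce in miniature. The sketch below uses a hypothetical in-memory `fakeBucket` and `update` helper in place of the real UpdateMetadataMigrationStatus, whose signature is not shown here. It demonstrates the safe pattern: reset the captured result at the top of every mutator invocation and make the only guarded write the final transition to complete.

```go
package main

import (
	"errors"
	"fmt"
)

var errCASMismatch = errors.New("cas mismatch")

type statusDoc struct {
	state string
	cas   uint64
}

// fakeBucket is a hypothetical single-document store with CAS semantics.
// interfere, if set, runs once before the first write to simulate a peer.
type fakeBucket struct {
	doc       statusDoc
	interfere func(*statusDoc)
}

func (b *fakeBucket) read() statusDoc { return b.doc }

func (b *fakeBucket) write(state string, cas uint64) error {
	if b.interfere != nil {
		b.interfere(&b.doc)
		b.interfere = nil
	}
	if b.doc.cas != cas {
		return errCASMismatch
	}
	b.doc = statusDoc{state: state, cas: cas + 1}
	return nil
}

// update re-reads the doc and re-invokes mutate after every CAS mismatch.
func update(b *fakeBucket, mutate func(current string) (next string, write bool)) error {
	for {
		cur := b.read()
		next, write := mutate(cur.state)
		if !write {
			return nil
		}
		if err := b.write(next, cur.cas); !errors.Is(err, errCASMismatch) {
			return err
		}
	}
}

// markComplete performs the single guarded transition to "complete".
// transitioned is reset on every invocation so a retry cannot inherit a
// stale true from an attempt that lost the CAS race.
func markComplete(b *fakeBucket) (transitioned bool, err error) {
	err = update(b, func(current string) (string, bool) {
		transitioned = false
		if current == "complete" {
			return "", false // a peer already completed: short-circuit
		}
		transitioned = true
		return "complete", true
	})
	return transitioned, err
}

func main() {
	b := &fakeBucket{doc: statusDoc{state: "not_started", cas: 1}}
	// A peer completes the migration between our read and our write.
	b.interfere = func(d *statusDoc) { d.state = "complete"; d.cas++ }

	won, err := markComplete(b)
	fmt.Println(won, err, b.doc.state) // prints "false <nil> complete"
}
```

Without the reset line, the first invocation would leave `transitioned` set to true and the retry would short-circuit without clearing it, so the caller would wrongly believe it had won the transition.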
@bbrks bbrks removed their assignment May 28, 2026
bbrks and others added 2 commits May 28, 2026 15:42
…bility

The bootstrap-block state machine is now two-valued (pending → complete)
since the in_progress claim was removed. The name "not_started" is
misleading in that context — multiple nodes can have attempted the
migration, so the doc's status of "not_started" doesn't mean no work
has happened. Renamed to "pending", with a comment clarifying the
two-valued machine.

Added LastAttemptedAt and Attempts to BootstrapMigrationStatus as soft
observability fields, written on each entry into the migration loop.
These do not gate any behaviour — they're for operators trying to tell
"no one has tried this bucket yet" from "we've been retrying for an
hour." Encoding the latter as a state value would invite a future
reader to add a gate against it and reintroduce the wedge condition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gregns1 previously approved these changes May 28, 2026
Contributor

@gregns1 gregns1 left a comment

Happy for these to be done in a follow-up PR/ticket, so approving; everything else looks good to me.

Comment thread base/bootstrap_test.go Outdated
Comment thread base/bootstrap_test.go Outdated
bbrks and others added 2 commits May 28, 2026 17:11
seedLegacyBootstrapDoc previously branched on TestUseXattrs(), which
conflates two unrelated concerns: SG's general application-metadata
xattr mode and the bootstrap-config persistence mode (the
useXattrConfig argument to NewCouchbaseCluster). The bootstrap
persistence mode is its own setting and should be exercised explicitly
in both states.

Threaded an explicit useXattrs parameter through seedLegacyBootstrapDoc
and the dual-collection bootstrap test fixture, and converted the five
tests built on that fixture to subtests covering both modes. Rosmar's
bootstrap path has no xattr-mode variant, so the bootstrap_xattr=true
subtest skips when running against Rosmar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
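
The fixture change described in this commit, an explicit useXattrs parameter exercised in both states with a skip where the backend has no xattr variant, might look roughly like the sketch below. It does not reproduce the code in base/bootstrap_test.go: `bootstrapMode`, `bootstrapModes`, `runInBothModes`, and the `usingRosmar` parameter are hypothetical, and in a real test file the helper would be called from a `TestXxx` function.

```go
package main

import (
	"fmt"
	"testing"
)

// bootstrapMode describes one subtest of the dual-collection fixture.
type bootstrapMode struct {
	name      string
	useXattrs bool
	skip      bool
}

// bootstrapModes lists both persistence modes explicitly. The xattr mode
// is flagged for skipping when the backend has no such variant.
func bootstrapModes(usingRosmar bool) []bootstrapMode {
	return []bootstrapMode{
		{name: "bootstrap_xattr=false", useXattrs: false},
		{name: "bootstrap_xattr=true", useXattrs: true, skip: usingRosmar},
	}
}

// runInBothModes runs fn as one subtest per bootstrap persistence mode.
func runInBothModes(t *testing.T, usingRosmar bool, fn func(t *testing.T, useXattrs bool)) {
	for _, m := range bootstrapModes(usingRosmar) {
		t.Run(m.name, func(t *testing.T) {
			if m.skip {
				t.Skip("backend has no xattr bootstrap mode")
			}
			fn(t, m.useXattrs)
		})
	}
}

func main() {
	for _, m := range bootstrapModes(true) {
		fmt.Printf("%s skip=%v\n", m.name, m.skip)
	}
}
```

Passing the mode in as a parameter keeps the bootstrap persistence setting independent of any global test-wide xattr toggle, which is the separation of concerns the commit message argues for.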
@bbrks bbrks merged commit 1b34a6a into main May 28, 2026
48 checks passed
@bbrks bbrks deleted the CBG-5226 branch May 28, 2026 18:20

3 participants