storage: add WAL failover configuration #120509

Merged 2 commits on Mar 18, 2024

Commits on Mar 18, 2024

  1. storage/fs: add reference counting to Envs

    Add reference counting to fs.Envs, so that multiple Engines can take on
    references to an Env. The underlying Env's resources won't be released until
    all references are released. This will be used by WAL failover to ensure that
    an fs.Env used as a failover destination isn't closed prematurely.
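
The reference-counting idea described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `refCountedEnv`, `newRefCountedEnv`, `Ref`, and `Unref` are hypothetical names, and the real fs.Env API may differ.

```go
package main

import (
	"errors"
	"sync"
)

// refCountedEnv is a hypothetical stand-in for an fs.Env. It wraps resources
// that must be released exactly once, after the last reference is dropped.
type refCountedEnv struct {
	mu      sync.Mutex
	refs    int
	closeFn func() error // releases the underlying resources
}

// newRefCountedEnv returns an env that holds one initial reference.
func newRefCountedEnv(closeFn func() error) *refCountedEnv {
	return &refCountedEnv{refs: 1, closeFn: closeFn}
}

// Ref acquires an additional reference. It fails if the env has already
// been fully released, because its resources are gone at that point.
func (e *refCountedEnv) Ref() error {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refs <= 0 {
		return errors.New("env already closed")
	}
	e.refs++
	return nil
}

// Unref drops one reference and releases the underlying resources when the
// count reaches zero.
func (e *refCountedEnv) Unref() error {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refs <= 0 {
		return errors.New("unref of closed env")
	}
	e.refs--
	if e.refs == 0 {
		return e.closeFn()
	}
	return nil
}
```

Under this sketch, an Engine that uses another store's Env as its failover destination would call `Ref` when it opens and `Unref` when it closes, so the Env outlives every Engine that still depends on it.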
    
    Epic: none
    Release note: none
    jbowens committed Mar 18, 2024
    52c410b
  2. storage: add WAL failover

    Introduce support for configuring a multi-store CockroachDB node to fail over
    a store's write-ahead log (WAL) to another store's data directory. Failing
    over the WAL may allow some operations against a store to continue to
    complete despite temporary unavailability of the underlying storage.
    
    Customers must opt into WAL failover by passing `--wal-failover=among-stores`
    to `cockroach start` or by setting the env var
    `COCKROACH_WAL_FAILOVER=among-stores`. On start, CockroachDB assigns each
    store another store to be its failover destination and begins monitoring
    the latency of all WAL writes. If WAL write latency exceeds the value of the
    `storage.wal_failover.unhealthy_op_threshold` cluster setting, CockroachDB
    attempts to write WAL entries to the secondary store's volume.
    
    To disable WAL failover, a user must restart the node with
    `--wal-failover=disabled`.
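
As a rough illustration of the two modes named above, the value passed to `--wal-failover` (or `COCKROACH_WAL_FAILOVER`) could be validated along these lines. The type and function names are hypothetical, and treating an empty value as disabled is an assumption based on the feature being opt-in; other values, if any exist, are not covered.

```go
package main

import "fmt"

// walFailoverMode is a hypothetical enum for the two values this PR mentions.
type walFailoverMode int

const (
	walFailoverDisabled walFailoverMode = iota
	walFailoverAmongStores
)

// parseWALFailoverMode maps the flag or env var string to a mode.
// An empty string is treated as disabled (an assumption of this sketch).
func parseWALFailoverMode(s string) (walFailoverMode, error) {
	switch s {
	case "", "disabled":
		return walFailoverDisabled, nil
	case "among-stores":
		return walFailoverAmongStores, nil
	default:
		return walFailoverDisabled, fmt.Errorf("unknown WAL failover mode %q", s)
	}
}
```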
    
    Close cockroachdb#119418.
    Informs cockroachdb/pebble#3230
    Epic: CRDB-35401
    
    Release note (ops change): Introduces a new start option (--wal-failover or
    COCKROACH_WAL_FAILOVER env var) to opt into failing over WALs between stores in
    multi-store nodes. Introduces a new storage.wal_failover.unhealthy_op_threshold
    cluster setting for configuring the latency threshold at which a WAL write is
    considered unhealthy.
    jbowens committed Mar 18, 2024
    1245a7e