Release note 4.11 note
### Motivation
Adding Release-note for 4.11.0

Please review commit 98dbb91

*Note:* This PR is based on #2360 . I will rebase this PR once #2360 is merged.

Reviewers: Enrico Olivelli <>

This closes #2361 from rdhabalia/release_note_4.11_note
rdhabalia committed Jul 8, 2020
1 parent b6ed41f commit 20eb8d9897b1f41dbd42a10533751dbf5a8f2c24
Showing 67 changed files with 7,786 additions and 6 deletions.
@@ -8,6 +8,8 @@ destination: local-generated

- "4.11.1"
- "4.11.0"
- "4.10.0"
- "4.9.2"
- "4.9.1"
@@ -37,8 +39,8 @@ archived_versions:
- "4.2.0"
- "4.1.0"
- "4.0.0"
latest_version: "4.11.0-SNAPSHOT"
latest_release: "4.10.0"
latest_version: "4.12.0-SNAPSHOT"
latest_release: "4.11.1"
stable_release: "4.9.2"
- "4.7.3"
@@ -0,0 +1,125 @@
title: Using AutoRecovery

When a {% pop bookie %} crashes, all {% pop ledgers %} on that bookie become under-replicated. In order to bring all ledgers in your BookKeeper cluster back to full replication, you'll need to *recover* the data from any offline bookies. There are two ways to recover bookies' data:

1. Using [manual recovery](#manual-recovery)
1. Automatically, using [*AutoRecovery*](#autorecovery)

## Manual recovery

You can manually recover failed bookies using the [`bookkeeper`](../../reference/cli) command-line tool. You need to specify:

* the `shell recover` option
* the IP and port for the failed bookie

Here's an example:

$ bin/bookkeeper shell recover \
    <failed-bookie-ip>:<port>    # IP and port for the failed bookie

If you wish, you can also specify which ledgers you'd like to recover. Here's an example:

$ bin/bookkeeper shell recover \
    <failed-bookie-ip>:<port> \  # IP and port for the failed bookie
    --ledger <ledgerID>          # ledgerID which you want to recover

### The manual recovery process

When you initiate a manual recovery process, the following happens:

1. The client (the process running the `bookkeeper shell recover` command) reads the metadata of active ledgers from ZooKeeper.
1. The ledgers that contain fragments from the failed bookie in their ensemble are selected.
1. A recovery process is initiated for each ledger in this list and the rereplication process is run for each ledger.
1. Once all the ledgers are marked as fully replicated, bookie recovery is finished.
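The steps above can be sketched in a few lines of Python. This is purely illustrative: the real client reads ledger metadata from ZooKeeper, not from an in-memory dict, and `rereplicate` stands in for the actual per-ledger recovery process.

```python
# Illustrative sketch of the manual recovery flow; data structures
# and names are stand-ins, not the real BookKeeper client API.

def rereplicate(ledger_id):
    pass  # stand-in for the per-ledger rereplication process

def recover_bookie(failed_bookie, ledger_metadata):
    """Return the IDs of the ledgers that needed rereplication."""
    # 1. Read the metadata of active ledgers (here: a plain dict of
    #    ledger ID -> ensemble).
    # 2. Select ledgers whose ensemble contains the failed bookie.
    affected = [
        ledger_id
        for ledger_id, ensemble in ledger_metadata.items()
        if failed_bookie in ensemble
    ]
    # 3. Run the rereplication process for each selected ledger.
    for ledger_id in affected:
        rereplicate(ledger_id)
    # 4. Once every ledger is fully replicated, recovery is finished.
    return affected

metadata = {1: ["bk1:3181", "bk2:3181"], 2: ["bk2:3181", "bk3:3181"]}
print(recover_bookie("bk1:3181", metadata))  # → [1]
```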

## AutoRecovery

AutoRecovery is a process that:

* automatically detects when a {% pop bookie %} in your BookKeeper cluster has become unavailable and then
* rereplicates all the {% pop ledgers %} that were stored on that bookie.

AutoRecovery can be run in two ways:

1. On dedicated nodes in your BookKeeper cluster
1. On the same machines on which your bookies are running

## Running AutoRecovery

You can start up AutoRecovery using the [`autorecovery`](../../reference/cli#bookkeeper-autorecovery) command of the [`bookkeeper`](../../reference/cli) CLI tool.

$ bin/bookkeeper autorecovery

> The most important thing to ensure when starting up AutoRecovery is that the ZooKeeper connection string specified by the [`zkServers`](../../reference/config#zkServers) parameter points to the right ZooKeeper cluster.

If you start up AutoRecovery on a machine that is already running a bookie, then the AutoRecovery process will run alongside the bookie on a separate thread.

You can also start up AutoRecovery on a fresh machine if you'd like to create a dedicated cluster of AutoRecovery nodes.

## Configuration

There are a handful of AutoRecovery-related configs in the [`bk_server.conf`](../../reference/config) configuration file. For a listing of those configs, see [AutoRecovery settings](../../reference/config#autorecovery-settings).
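For instance, you might enable the AutoRecovery daemon inside the bookie process and tune the grace period discussed later in this doc. The snippet below is a sketch: `autoRecoveryDaemonEnabled` is a commonly used BookKeeper setting, but verify the exact names and defaults against the configuration reference for your version.

```
# Run the AutoRecovery daemon in the same JVM as the bookie
autoRecoveryDaemonEnabled=true

# Milliseconds to wait before rereplicating the final fragment
# of a ledger that is still being written to
openLedgerRereplicationGracePeriod=30000
```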

## Disable AutoRecovery

You can disable AutoRecovery at any time, for example during maintenance. Disabling AutoRecovery ensures that bookies' data isn't unnecessarily rereplicated when the bookie is only taken down for a short period of time, for example when the bookie is being updated or its configuration is being changed.

You can disable AutoRecovery using the [`bookkeeper`](../../reference/cli#bookkeeper-shell-autorecovery) CLI tool:

$ bin/bookkeeper shell autorecovery -disable

Once disabled, you can reenable AutoRecovery using the [`enable`](../../reference/cli#bookkeeper-shell-autorecovery) shell command:

$ bin/bookkeeper shell autorecovery -enable

## AutoRecovery architecture

AutoRecovery has two components:

1. The [**auditor**](#auditor) (see the [`Auditor`](../../api/javadoc/org/apache/bookkeeper/replication/Auditor.html) class) is a singleton node that watches bookies to see if they fail and creates rereplication tasks for the ledgers on failed bookies.
1. The [**replication worker**](#replication-worker) (see the [`ReplicationWorker`](../../api/javadoc/org/apache/bookkeeper/replication/ReplicationWorker.html) class) runs on each bookie and executes rereplication tasks provided by the auditor.

Both of these components run as threads in the [`AutoRecoveryMain`](../../api/javadoc/org/apache/bookkeeper/replication/AutoRecoveryMain) process, which runs on each bookie in the cluster. All recovery nodes participate in leader election---using ZooKeeper---to decide which node becomes the auditor. Nodes that fail to become the auditor watch the elected auditor and run an election process again if they see that the auditor node has failed.
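The election can be pictured with a small in-memory model. In the real implementation the "membership" below is backed by ZooKeeper ephemeral sequential znodes; the class here is purely illustrative.

```python
# Minimal in-memory sketch of the auditor election described above.
# Real AutoRecovery uses ZooKeeper ephemeral znodes, not a Python dict.

class Election:
    def __init__(self):
        self._seq = 0
        self._members = {}  # sequence number -> node name

    def join(self, node):
        """A recovery node registers; analogous to creating an
        ephemeral sequential znode under the election path."""
        self._seq += 1
        self._members[self._seq] = node
        return self._seq

    def auditor(self):
        """The node with the lowest sequence number is the auditor."""
        return self._members[min(self._members)]

    def fail(self, node):
        """When a node dies its (ephemeral) entry disappears and the
        remaining nodes effectively re-run the election."""
        self._members = {s: n for s, n in self._members.items() if n != node}

e = Election()
for n in ["bookie-1", "bookie-2", "bookie-3"]:
    e.join(n)
print(e.auditor())  # → bookie-1
e.fail("bookie-1")
print(e.auditor())  # → bookie-2
```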

### Auditor

The auditor watches all bookies in the cluster that are registered with ZooKeeper. Bookies register with ZooKeeper at startup. If the bookie crashes or is killed, the bookie's registration in ZooKeeper disappears and the auditor is notified of the change in the list of registered bookies.

When the auditor sees that a bookie has disappeared, it immediately scans the complete {% pop ledger %} list to find ledgers that have data stored on the failed bookie. Once it has a list of ledgers for that bookie, the auditor will publish a rereplication task for each ledger under the `/underreplicated/` znode in ZooKeeper.

### Replication Worker

Each replication worker watches for tasks being published by the auditor on the `/underreplicated/` znode in ZooKeeper. When a new task appears, the replication worker will try to get a lock on it. If it cannot acquire the lock, it will try the next entry. The locks are implemented using ZooKeeper ephemeral znodes.
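The lock-acquisition loop might be sketched like this, with a plain dict standing in for the ephemeral-znode locks (all names are illustrative):

```python
# Sketch of the replication worker's try-lock loop. Real workers take
# locks with ZooKeeper ephemeral znodes under /underreplicated/.

def claim_task(tasks, locks, worker):
    """Try each published task in turn; return the first one whose
    lock this worker manages to acquire, or None if all are held."""
    for task in tasks:
        if task not in locks:     # lock is free
            locks[task] = worker  # "ephemeral znode" for this worker
            return task
        # lock held by another worker: try the next entry
    return None

locks = {}
tasks = ["ledger-7", "ledger-9"]
print(claim_task(tasks, locks, "worker-a"))  # → ledger-7
print(claim_task(tasks, locks, "worker-b"))  # → ledger-9
```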

The replication worker will scan through the rereplication task's ledger for fragments of which its local bookie is not a member. When it finds fragments matching this criterion, it will replicate the entries of that fragment to the local bookie. If, after this process, the ledger is fully replicated, the ledger's entry under `/underreplicated/` is deleted and the lock is released. If there is a problem replicating, or there are fragments in the ledger which are still underreplicated (because the local bookie is already part of the ensemble for the fragment), then the lock is simply released.

If the replication worker finds a fragment which needs rereplication, but does not have a defined endpoint (i.e. the final fragment of a ledger currently being written to), it will wait for a grace period before attempting rereplication. If the fragment needing rereplication still does not have a defined endpoint, the ledger is fenced and rereplication then takes place.

This avoids the situation in which a client is writing to a ledger and one of the bookies goes down, but the client has not written an entry to that bookie before rereplication takes place. The client could continue writing to the old fragment, even though the ensemble for the fragment had changed. This could lead to data loss. Fencing prevents this scenario from happening. In the normal case, the client will try to write to the failed bookie within the grace period, and will have started a new fragment before rereplication starts.

You can configure this grace period using the [`openLedgerRereplicationGracePeriod`](../../reference/config#openLedgerRereplicationGracePeriod) parameter.

### The rereplication process

The ledger rereplication process happens in these steps:

1. The client goes through all ledger fragments in the ledger, selecting those that contain the failed bookie.
1. A recovery process is initiated for each ledger fragment in this list.
1. The client selects a bookie to which all entries in the ledger fragment will be replicated. In the case of AutoRecovery, this will always be the local bookie.
1. The client reads entries that belong to the ledger fragment from other bookies in the ensemble and writes them to the selected bookie.
1. Once all entries have been replicated, the ZooKeeper metadata for the fragment is updated to reflect the new ensemble.
1. The fragment is marked as fully replicated in the recovery tool.
1. Once all ledger fragments are marked as fully replicated, the ledger is marked as fully replicated.
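The copy-and-update steps above can be sketched per fragment. The `read_entry` and `write_entry` callables below are stand-ins for real bookie reads and writes, not the BookKeeper client API:

```python
# Illustrative per-fragment rereplication: copy entries to the target
# bookie, then compute the fragment's new ensemble for the metadata
# update. All names and structures are stand-ins.

def rereplicate_fragment(fragment, failed_bookie, target_bookie,
                         read_entry, write_entry):
    """Copy a fragment's entries to target_bookie and return the
    fragment's new ensemble (to be written back to metadata)."""
    # Read from any surviving member of the fragment's ensemble.
    source = [b for b in fragment["ensemble"] if b != failed_bookie]
    for entry_id in range(fragment["first"], fragment["last"] + 1):
        write_entry(target_bookie, entry_id, read_entry(source[0], entry_id))
    # The failed bookie is replaced by the target in the ensemble.
    return [target_bookie if b == failed_bookie else b
            for b in fragment["ensemble"]]

store = {"bk2:3181": {0: b"a", 1: b"b"}}
copied = {}
fragment = {"ensemble": ["bk1:3181", "bk2:3181"], "first": 0, "last": 1}
new_ensemble = rereplicate_fragment(
    fragment, "bk1:3181", "bk4:3181",
    lambda bk, e: store[bk][e],
    lambda bk, e, d: copied.setdefault(bk, {}).__setitem__(e, d),
)
print(new_ensemble)  # → ['bk4:3181', 'bk2:3181']
```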

@@ -0,0 +1,174 @@
title: BookKeeper administration
subtitle: A guide to deploying and administering BookKeeper

This document is a guide to deploying, administering, and maintaining BookKeeper. It also discusses [best practices](#best-practices) and [common problems](#common-problems).

## Requirements

A typical BookKeeper installation consists of an ensemble of {% pop bookies %} and a ZooKeeper quorum. The exact number of bookies depends on the quorum mode that you choose, desired throughput, and the number of clients using the installation simultaneously.

The minimum number of bookies depends on the type of installation:

* For *self-verifying* entries you should run at least three bookies. In this mode, clients store a message authentication code along with each {% pop entry %}.
* For *generic* entries you should run at least four bookies.

There is no upper limit on the number of bookies that you can run in a single ensemble.

### Performance

To achieve optimal performance, BookKeeper requires each server to have at least two disks. It's possible to run a bookie with a single disk but performance will be significantly degraded.

### ZooKeeper

There is no constraint on the number of ZooKeeper nodes you can run with BookKeeper. A single machine running ZooKeeper in standalone mode is sufficient for BookKeeper, although for the sake of higher resilience we recommend running ZooKeeper in quorum mode with multiple servers.

## Starting and stopping bookies

You can run bookies either in the foreground or in the background, using nohup. You can also run [local bookies](#local-bookies) for development purposes.

To start a bookie in the foreground, use the [`bookie`](../../reference/cli#bookkeeper-bookie) command of the [`bookkeeper`](../../reference/cli#bookkeeper) CLI tool:

$ bin/bookkeeper bookie

To start a bookie in the background, use the `` script and run `start bookie`:

$ bin/ start bookie

### Local bookies

The instructions above showed you how to run bookies intended for production use. If you'd like to experiment with ensembles of bookies locally, you can use the [`localbookie`](../../reference/cli#bookkeeper-localbookie) command of the `bookkeeper` CLI tool and specify the number of bookies you'd like to run.

This would spin up a local ensemble of 6 bookies:

$ bin/bookkeeper localbookie 6

> When you run a local bookie ensemble, all bookies run in a single JVM process.

## Configuring bookies

There's a wide variety of parameters that you can set in the bookie configuration file in `bookkeeper-server/conf/bk_server.conf` of your [BookKeeper installation](../../reference/config). A full listing can be found in [Bookie configuration](../../reference/config).

Some of the more important parameters to be aware of:

Parameter | Description | Default
:---------|:------------|:-------
`bookiePort` | The TCP port that the bookie listens on | `3181`
`zkServers` | A comma-separated list of ZooKeeper servers in `hostname:port` format | `localhost:2181`
`journalDirectory` | The directory where the [log device](../../getting-started/concepts#log-device) stores the bookie's write-ahead log (WAL) | `/tmp/bk-txn`
`ledgerDirectories` | The directories where the [ledger device](../../getting-started/concepts#ledger-device) stores the bookie's ledger entries (as a comma-separated list) | `/tmp/bk-data`
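Putting the defaults above together, a minimal `bk_server.conf` fragment might look like this (values are the defaults listed in the table):

```
bookiePort=3181
zkServers=localhost:2181
journalDirectory=/tmp/bk-txn
ledgerDirectories=/tmp/bk-data
```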

> Ideally, the directories specified by `journalDirectory` and `ledgerDirectories` should be on different devices.

## Logging

BookKeeper uses slf4j for logging, with log4j bindings enabled by default.

To enable logging for a bookie, create a `` file and point the `BOOKIE_LOG_CONF` environment variable to the configuration file. Here's an example:

$ export BOOKIE_LOG_CONF=/some/path/
$ bin/bookkeeper bookie
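
As a sketch of what that `` might contain, assuming the standard log4j 1.x properties format (the appender name and pattern below are illustrative, not BookKeeper defaults):

```
# Log INFO and above to the console (names are illustrative)
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p - [%t:%C{1}@%L] - %m%n
```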

## Upgrading

From time to time you may need to make changes to the filesystem layout of bookies---changes that are incompatible with previous versions of BookKeeper and require that directories used with previous versions are upgraded. If a filesystem upgrade is required when updating BookKeeper, the bookie will fail to start and return an error like this:

2017-05-25 10:41:50,494 - ERROR - [main:Bookie@246] - Directory layout version is less than 3, upgrade needed

BookKeeper provides a utility for upgrading the filesystem. You can perform an upgrade using the [`upgrade`](../../reference/cli#bookkeeper-upgrade) command of the `bookkeeper` CLI tool. When running `bookkeeper upgrade` you need to specify one of three flags:

Flag | Action
:----|:------
`--upgrade` | Performs an upgrade
`--rollback` | Performs a rollback to the initial filesystem version
`--finalize` | Marks the upgrade as complete

### Upgrade pattern

A standard upgrade pattern is to run an upgrade...

$ bin/bookkeeper upgrade --upgrade

...then check that everything is working normally, then kill the bookie. If everything is okay, finalize the upgrade...

$ bin/bookkeeper upgrade --finalize

...and then restart the server:

$ bin/bookkeeper bookie

If something has gone wrong, you can always perform a rollback:

$ bin/bookkeeper upgrade --rollback

## Formatting

You can format bookie metadata in ZooKeeper using the [`metaformat`](../../reference/cli#bookkeeper-shell-metaformat) command of the [BookKeeper shell](../../reference/cli#the-bookkeeper-shell).

By default, formatting is done in interactive mode, which prompts you to confirm the format operation if old data exists. You can disable confirmation using the `-nonInteractive` flag. If old data does exist, the format operation will abort *unless* you set the `-force` flag. Here's an example:

$ bin/bookkeeper shell metaformat

You can format the local filesystem data on a bookie using the [`bookieformat`](../../reference/cli#bookkeeper-shell-bookieformat) command on each bookie. Here's an example:

$ bin/bookkeeper shell bookieformat

> The `-force` and `-nonInteractive` flags are also available for the `bookieformat` command.

## AutoRecovery

For a guide to AutoRecovery in BookKeeper, see [this doc](../autorecovery).

## Missing disks or directories

Accidentally replacing disks or removing directories can cause a bookie to fail while trying to read a ledger fragment that, according to the ledger metadata, exists on the bookie. For this reason, when a bookie is started for the first time, its disk configuration is fixed for the lifetime of that bookie. Any change to its disk configuration, such as a crashed disk or an accidental configuration change, will result in the bookie being unable to start. That will throw an error like this:

2017-05-29 18:19:13,790 - ERROR - [main:BookieServer@314] - Exception running bookie server:
org.apache.bookkeeper.bookie.BookieException$InvalidCookieException
      at org.apache.bookkeeper.bookie.Cookie.verify(
      at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(
      at org.apache.bookkeeper.bookie.Bookie.<init>(

If the change was the result of an accidental configuration change, the change can be reverted and the bookie can be restarted. However, if the change *cannot* be reverted, such as is the case when you want to add a new disk or replace a disk, the bookie must be wiped and then all its data re-replicated onto it.

1. Increment the [`bookiePort`](../../reference/config#bookiePort) parameter in the [`bk_server.conf`](../../reference/config) configuration file.
1. Ensure that all directories specified by [`journalDirectory`](../../reference/config#journalDirectory) and [`ledgerDirectories`](../../reference/config#ledgerDirectories) are empty.
1. [Start the bookie](#starting-and-stopping-bookies).
1. Run the following command to re-replicate the data:

$ bin/bookkeeper shell recover <oldbookie>

The ZooKeeper server, old bookie, and new bookie are all identified by their external IP and `bookiePort` (3181 by default). Here's an example:

$ bin/bookkeeper shell recover

See the [AutoRecovery](../autorecovery) documentation for more info on the re-replication process.
@@ -0,0 +1,41 @@
title: Decommission Bookies

If you want to decommission a bookie, the following process will help you verify that the decommissioning was done safely.

### Before we decommission
1. Ensure that the state of your cluster can support decommissioning the target bookie.
   Check whether `EnsembleSize >= Write Quorum >= Ack Quorum` still holds with one less bookie.

2. Ensure that the target bookie shows up in the `listbookies` command.

3. Ensure that no other process (such as an upgrade) is ongoing.
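
The quorum check in the first step above can be sketched as a small helper (an illustrative function, not part of the BookKeeper tooling):

```python
# Pre-decommission sanity check: does the quorum invariant still hold,
# and do enough bookies remain to form a full ensemble?

def can_decommission(num_bookies, ensemble_size, write_quorum, ack_quorum):
    """Return True if EnsembleSize >= WriteQuorum >= AckQuorum holds
    and a full ensemble can still be formed with one fewer bookie."""
    remaining = num_bookies - 1
    return (ensemble_size >= write_quorum >= ack_quorum
            and remaining >= ensemble_size)

print(can_decommission(4, 3, 2, 2))  # → True
print(can_decommission(3, 3, 2, 2))  # → False
```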

### Process of Decommissioning
1. Log on to the bookie node, check if there are underreplicated ledgers.

If there are, the decommission command will force them to be replicated.
`$ bin/bookkeeper shell listunderreplicated`

2. Stop the bookie
`$ bin/ stop bookie`

3. Run the decommission command.
If you have logged onto the node you wish to decommission, you don't need to provide `-bookieid`.
If you are running the decommission command for the target bookie from another bookie node, pass the target bookie's ID with the `-bookieid` argument.
`$ bin/bookkeeper shell decommissionbookie`
`$ bin/bookkeeper shell decommissionbookie -bookieid <target bookieid>`

4. Validate that there are no ledgers on the decommissioned bookie.
`$ bin/bookkeeper shell listledgers -bookieid <target bookieid>`

As a final verification step, run the following commands to check that the bookie you decommissioned no longer shows up in the list of bookies:

$ bin/bookkeeper shell listbookies -rw -h
$ bin/bookkeeper shell listbookies -ro -h
@@ -0,0 +1,22 @@
title: Geo-replication
subtitle: Replicate data across BookKeeper clusters

*Geo-replication* is the replication of data across BookKeeper clusters. In order to enable geo-replication for a group of BookKeeper clusters, you need to set up a global ZooKeeper quorum that all of the clusters can use.

## Global ZooKeeper

Setting up a global ZooKeeper quorum is a lot like setting up a cluster-specific quorum. The crucial difference is that the global quorum must serve bookies across all of the participating clusters.

### Geo-replication across three clusters

Let's say that you want to set up geo-replication across clusters in regions A, B, and C. First, the BookKeeper clusters in each region must have their own local (cluster-specific) ZooKeeper quorum.

> BookKeeper clusters use global ZooKeeper only for metadata storage. Traffic from bookies to ZooKeeper should thus be fairly light in general.

The crucial difference between using cluster-specific ZooKeeper and global ZooKeeper is that you need to point all {% pop bookies %} to the global ZooKeeper setup.
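Concretely, that means every bookie's `zkServers` setting points at the global quorum rather than a local one (the hostnames below are illustrative):

```
# Point every bookie, in every region, at the global ZooKeeper quorum
zkServers=zk-global-1:2181,zk-global-2:2181,zk-global-3:2181
```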

## Region-aware placement policy

## Autorecovery
