Title: BookKeeper Metadata Management
Notice: Licensed under the Apache License, Version 2.0 (the “License”);
you may not use this file except in compliance with the License. You may
obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an “AS IS”
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.
BookKeeper manages two kinds of metadata. The first is the list of available bookies, which is used to track server availability (a task ZooKeeper is naturally suited to). The second is ledger metadata, which can be handled efficiently by various key/value stores that provide CAS (Compare And Set) semantics.
Ledger metadata is handled by LedgerManager, which can be backed by different pluggable storage media.
Ledger Metadata Management
The operations on the metadata of a ledger are quite straightforward. They are:
createLedger: create a new entry to store the given ledger metadata. A unique id should be generated as the ledger id for the new ledger.
removeLedgerMetadata: remove the entry of a ledger from the metadata store. A Version object is provided for conditional removal. If the given Version does not match the current Version in the metadata store, MetadataVersionException should be thrown to indicate a version conflict. NoSuchLedgerExistsException should be returned if the ledger metadata entry does not exist.
readLedgerMetadata: read the metadata of a ledger from the metadata store. The version read from the store should be set on the returned LedgerMetadata object. NoSuchLedgerExistsException should be returned if the ledger metadata entry does not exist.
writeLedgerMetadata: update the metadata of a ledger matching the given Version. The update should be rejected and MetadataVersionException should be returned when the given Version does not match the current Version in the metadata store. NoSuchLedgerExistsException should be returned if the ledger metadata entry does not exist. The version of the LedgerMetadata object should be set to the new Version generated by applying this update.
asyncProcessLedgers: loop through all existing ledgers in the metadata store and apply the provided Processor to each one. If a failure occurs during iteration, the iteration should be terminated and the final callback triggered with a failure. Otherwise, the final callback is triggered after all ledgers have been processed. Implementations of this interface need not provide any ordering or transactional guarantees.
getLedgerRanges: return a list of ranges for the ledgers in the metadata store. The ledger metadata itself does not need to be fetched; only the ledger ids are needed. No ordering is required, but ledger ranges must not overlap, and each ledger range must contain all the ledgers in the metadata store that fall between its endpoints (i.e., for a ledger range [x, y], every ledger id greater than or equal to x and less than or equal to y should appear only in this range). getLedgerRanges is used in the ScanAndCompare gc algorithm.
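The conditional semantics of the create, read, write, and remove operations above can be illustrated with a minimal in-memory sketch. This is not BookKeeper's LedgerManager implementation: the class InMemoryLedgerStore, the VersionedMetadata holder, the use of a plain long as the version, and the unchecked exception classes are all assumptions made for illustration, with the exceptions merely named after the ones in the text.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins named after the exceptions in the text.
// They are unchecked here only to keep the sketch short.
class NoSuchLedgerExistsException extends RuntimeException {
    NoSuchLedgerExistsException(long ledgerId) {
        super("no such ledger: " + ledgerId);
    }
}

class MetadataVersionException extends RuntimeException {
    MetadataVersionException(long expected, long actual) {
        super("expected version " + expected + " but store has " + actual);
    }
}

// Hypothetical holder: metadata bytes plus the version the store assigned.
final class VersionedMetadata {
    final byte[] data;
    final long version;

    VersionedMetadata(byte[] data, long version) {
        this.data = data;
        this.version = version;
    }
}

// Toy in-memory store (an assumption, not a real backend) showing CAS-style updates.
class InMemoryLedgerStore {
    private final Map<Long, VersionedMetadata> entries = new ConcurrentHashMap<>();
    private final AtomicLong nextLedgerId = new AtomicLong(0);

    // createLedger: generate a unique ledger id and store the metadata at version 0.
    long createLedger(byte[] metadata) {
        long ledgerId = nextLedgerId.getAndIncrement();
        entries.put(ledgerId, new VersionedMetadata(metadata, 0L));
        return ledgerId;
    }

    // readLedgerMetadata: return the metadata together with its current version.
    VersionedMetadata readLedgerMetadata(long ledgerId) {
        VersionedMetadata current = entries.get(ledgerId);
        if (current == null) {
            throw new NoSuchLedgerExistsException(ledgerId);
        }
        return current;
    }

    // writeLedgerMetadata: apply the update only if the caller's version is current,
    // and return the metadata carrying the newly generated version.
    synchronized VersionedMetadata writeLedgerMetadata(long ledgerId, byte[] metadata, long expectedVersion) {
        VersionedMetadata current = readLedgerMetadata(ledgerId);
        if (current.version != expectedVersion) {
            throw new MetadataVersionException(expectedVersion, current.version);
        }
        VersionedMetadata updated = new VersionedMetadata(metadata, current.version + 1);
        entries.put(ledgerId, updated);
        return updated;
    }

    // removeLedgerMetadata: remove the entry only if the caller's version is current.
    synchronized void removeLedgerMetadata(long ledgerId, long expectedVersion) {
        VersionedMetadata current = readLedgerMetadata(ledgerId);
        if (current.version != expectedVersion) {
            throw new MetadataVersionException(expectedVersion, current.version);
        }
        entries.remove(ledgerId);
    }
}
```

With a store of this shape, a caller would typically read the metadata, modify it, and write it back with the version it read, then re-read and retry if MetadataVersionException signals that another writer got there first.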
How to Choose a Metadata Storage Medium for BookKeeper
Based on this interface, a metadata storage medium for BookKeeper needs to meet several requirements:
Compare And Set (CAS): the ability to perform a strict update that succeeds only if a specific condition holds, e.g., a specific version (ZooKeeper) or unchanged content (HBase).
Optimized for Writes: the metadata access pattern for BookKeeper is an initial read followed by continuous updates.
Optimized for Scans: scans are required by the ScanAndCompare gc algorithm.
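To make the scan requirement more concrete, the following sketch shows one way a store could expose ledger ids as non-overlapping ranges that satisfy the getLedgerRanges contract described earlier. The LedgerRange and LedgerRanges classes and the fixed-width bucketing scheme are assumptions chosen for illustration; a real backend may partition its ledger id space differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical closed range [start, end] together with the ledger ids that fall inside it.
final class LedgerRange {
    final long start;
    final long end;
    final SortedSet<Long> ledgerIds;

    LedgerRange(long start, long end, SortedSet<Long> ledgerIds) {
        this.start = start;
        this.end = end;
        this.ledgerIds = ledgerIds;
    }
}

final class LedgerRanges {
    // Partition ledger ids into fixed-width buckets [k * width, (k + 1) * width - 1].
    // Buckets never overlap, and every id inside a bucket's bounds lands in that
    // bucket only, which is the property required of ledger ranges.
    static List<LedgerRange> partition(Iterable<Long> allLedgerIds, long width) {
        TreeMap<Long, SortedSet<Long>> buckets = new TreeMap<>();
        for (long id : allLedgerIds) {
            // floorDiv keeps the bucketing consistent even for negative ids.
            buckets.computeIfAbsent(Math.floorDiv(id, width), k -> new TreeSet<>()).add(id);
        }
        List<LedgerRange> ranges = new ArrayList<>();
        for (Map.Entry<Long, SortedSet<Long>> bucket : buckets.entrySet()) {
            long start = bucket.getKey() * width;
            ranges.add(new LedgerRange(start, start + width - 1, bucket.getValue()));
        }
        return ranges;
    }
}
```

Only ledger ids are touched here, never the metadata itself, so a scan built on such ranges can stay cheap even when the metadata entries are large.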
ZooKeeper is the default implementation for BookKeeper metadata management. ZooKeeper holds data in memory, provides a filesystem-like namespace, and meets all of the above requirements, so it is sufficient for most BookKeeper use cases. However, if your application needs to manage millions of ledgers, HBase would be a more scalable solution; it also meets the above requirements, but it is more complicated to set up.