Description
Status
ToDo
Motivation
Accelerate Kafka node recovery from unclean shutdowns.
Goal
Recover from unclean shutdown within 1 minute for a single partition.
Proposed Changes
Brief
The change involves three structures: the time index, the transaction index, and the Producer snapshot. They fall into two categories:
- Can be omitted: time index. Missing it only results in longer index lookup times.
- Cannot be omitted: transaction index and Producer snapshot.
Basic strategy: when recovering from an unclean shutdown, keep the existing data in these three structures and append only what is missing, instead of discarding it. This also reduces the range of records that must be played back.
Details
Time index, transaction index, and Producer snapshot each have checkpoint information. Record playback and recovery are based on these three checkpoints. The starting offset for recovery is:
Starting offset = min(txnindex checkpoint, Producer snapshot checkpoint)
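The formula above can be sketched as follows. This is only an illustration: the function name and the sample offsets are hypothetical and do not come from the codebase. Playback starts from the older of the two checkpoints so that both structures see every record they have not yet covered.

```python
def recovery_start_offset(txn_index_checkpoint: int,
                          producer_snapshot_checkpoint: int) -> int:
    """Return the offset from which record playback should begin.

    Hypothetical helper: takes the smaller of the two checkpoints,
    as stated in the formula above.
    """
    return min(txn_index_checkpoint, producer_snapshot_checkpoint)


# Hypothetical checkpoint values, for illustration only.
start = recovery_start_offset(txn_index_checkpoint=1200,
                              producer_snapshot_checkpoint=950)
print(start)
```

Note that the formula as written takes only the two checkpoints that cannot be omitted; the time index checkpoint does not appear in it.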
Time index
The time index checkpoint is the offset of the last index record.
During recovery from an unclean shutdown, preserve the data already in the time index stream. During record playback, append a new index entry only when the record's offset is greater than the checkpoint.
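A minimal sketch of this supplementary recovery, under stated assumptions: `TimeIndex` and `replay_into_time_index` are hypothetical names, the index is modelled as an in-memory list of `(timestamp_ms, offset)` pairs, and one entry is appended per replayed record for simplicity (a real time index is sparse).

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class TimeIndex:
    # Hypothetical in-memory model: (timestamp_ms, offset) pairs, ascending by offset.
    entries: List[Tuple[int, int]] = field(default_factory=list)

    def checkpoint(self) -> int:
        # The checkpoint is the offset of the last index entry; -1 if the index is empty.
        return self.entries[-1][1] if self.entries else -1


def replay_into_time_index(index: TimeIndex,
                           records: Iterable[Tuple[int, int]]) -> TimeIndex:
    """Keep existing entries and append only entries beyond the checkpoint."""
    checkpoint = index.checkpoint()
    for timestamp_ms, offset in records:
        if offset > checkpoint:
            # Offsets at or below the checkpoint are already covered, so skip them.
            index.entries.append((timestamp_ms, offset))
    return index
```

For example, an index ending at offset 20 that replays records at offsets 15, 20, 25, and 30 keeps its existing entries and gains entries only for 25 and 30.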
Transaction index
The transaction index is written only when a transaction is interrupted (aborted), so the offset of its last entry can lag far behind the end of the log. An additional checkpoint therefore needs to be recorded in the partition's metadata, written by a periodic task.
The checkpoint used for recovery is:
checkpoint_fin = max(periodically refreshed checkpoint, offset of the last entry)
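The recovery strategy is similar to that of the time index: preserve the existing entries and, during playback, append entries only for offsets greater than checkpoint_fin.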
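The checkpoint selection for the transaction index could look like the sketch below. The function and parameter names are hypothetical, and treating an empty index as "no last entry" is an assumption made for the example.

```python
from typing import Optional


def txn_index_recovery_checkpoint(periodic_checkpoint: int,
                                  last_entry_offset: Optional[int]) -> int:
    """Compute checkpoint_fin for the transaction index.

    periodic_checkpoint: value written to the partition's metadata by the periodic task.
    last_entry_offset: offset of the last index entry, or None if the index is empty
                       (assumption: an empty index contributes nothing).
    """
    if last_entry_offset is None:
        return periodic_checkpoint
    return max(periodic_checkpoint, last_entry_offset)


# Hypothetical values: the periodic task ran after the last entry was written.
print(txn_index_recovery_checkpoint(periodic_checkpoint=5000, last_entry_offset=3200))
```

Taking the maximum means recovery uses whichever source is more recent, so a long stretch without index writes does not force a long playback.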
Producer snapshot
The checkpoint of the Producer snapshot is the offset it already carries (the key of the snapshot).
Increase the refresh frequency of the Producer snapshot. Recovery strategy is the same as above.
The snapshot recycling (cleanup) strategy does not need to change after the refresh frequency is increased: the latest Producer snapshot, whether generated by the original logic or by the added logic, is always retained.