[FLINK-9491] Implement timer data structure based on RocksDB #6227
What is the purpose of the change
This PR is another step towards integrating the timer state with the keyed state backends.
First, the PR generalizes the data structure InternalTimerHeap to InternalPriorityQueue, so that the functionality of a heap-set-organized state is decoupled from storing timers. The main reason for this is that state/backend related code lives in flink-runtime, while timers are a concept from flink-streaming.

Second, the PR also introduces an implementation of
InternalPriorityQueue with set semantics (i.e. the data structure we require to manage timers) that is based on RocksDB. State in RocksDB is always partitioned into key-groups, so the general idea is to organize the implementation as a heap-of-heaps, where each sub-heap represents the elements of exactly one key-group and the structure merges by priority across the key-group boundaries. The implementation reuses the in-memory implementation of InternalPriorityQueue (without set properties) as the super-heap that holds the sub-heaps. Furthermore, each sub-heap is an instance of CachingInternalPriorityQueueSet, consisting of a "fast", "small" cache (OrderedSetCache) and a "slow", "unbounded" store (OrderedSetStore), currently with simple write-through synchronization between cache and store. In the current implementation, the cache is based on an AVL tree and restricted in capacity; the store is backed by a RocksDB column family. We use caching to reduce read accesses to RocksDB.

Please note that the RocksDB implementation is currently not yet integrated with the timer service or the backend. This will happen in the next steps.
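To make the write-through idea concrete, here is a minimal, hypothetical sketch of a cache-in-front-of-store ordered set. The class and field names are illustrative, not Flink's; TreeSets stand in for both the AVL-tree cache and the RocksDB-backed store (TreeSet also provides the set semantics for free).

```java
import java.util.TreeSet;

// Hypothetical sketch (not Flink's actual code) of the write-through idea in
// CachingInternalPriorityQueueSet: a small, bounded, ordered cache in front of
// an unbounded ordered store.
class CachedOrderedSet {
    private final int cacheCapacity;
    private final TreeSet<Long> cache = new TreeSet<>(); // "fast", "small"
    private final TreeSet<Long> store = new TreeSet<>(); // "slow", "unbounded" stand-in

    CachedOrderedSet(int cacheCapacity) {
        this.cacheCapacity = cacheCapacity;
    }

    /** Write-through add: every element goes to the store; the cache keeps only the smallest. */
    void add(long element) {
        store.add(element);
        cache.add(element);
        if (cache.size() > cacheCapacity) {
            cache.pollLast(); // evict the largest cached element
        }
    }

    /** The cache always holds the smallest elements, so it can answer peek() without a store read. */
    Long peek() {
        if (!cache.isEmpty()) {
            return cache.first();
        }
        return store.isEmpty() ? null : store.first();
    }

    /** Write-through removal of the head; refill the cache from the store when it drains. */
    Long poll() {
        Long head = peek();
        if (head == null) {
            return null;
        }
        cache.remove(head);
        store.remove(head);
        if (cache.isEmpty()) {
            for (Long e : store) { // ascending order: pull the smallest back into the cache
                if (cache.size() >= cacheCapacity) {
                    break;
                }
                cache.add(e);
            }
        }
        return head;
    }
}
```

The point of keeping the cache a strict "lower set" of the store is that peek() and most poll() calls never touch the slow store, which is exactly why caching reduces RocksDB read accesses.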
Brief change log
- Generalized InternalTimerHeap to decouple it from timers and moved the data structures from flink-streaming to flink-runtime (-> InternalPriorityQueue).
- Split the in-memory implementation into a basic heap (HeapPriorityQueue) and a heap extended with set semantics (HeapPriorityQueueSet).
- Introduced a RocksDB-based implementation of InternalPriorityQueue with set semantics. Starting point is KeyGroupPartitionedPriorityQueue. This class uses a HeapPriorityQueue of CachingInternalPriorityQueueSet elements, each of which contains the elements of exactly one key-group (heap-of-heaps). For RocksDB, we configure each CachingInternalPriorityQueueSet to use a TreeOrderedSetCache and a RocksDBOrderedStore.

Verifying this change
I added dedicated tests for all data structures.
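As an illustration of the central invariant such tests assert, a heap-of-heaps must return elements in global priority order no matter which key-group sub-heap they live in. The sketch below is a simplified, hypothetical stand-in, not the actual KeyGroupPartitionedPriorityQueue; the names and the key-group assignment are invented for the example.

```java
import java.util.PriorityQueue;

// Simplified, hypothetical stand-in for the heap-of-heaps structure: one
// sub-queue per key-group, plus a super-heap that orders the non-empty
// sub-queues by the priority of their current head element.
class PartitionedQueue {
    private final PriorityQueue<Long>[] subQueues;
    private final PriorityQueue<PriorityQueue<Long>> superHeap;
    private final int numKeyGroups;

    @SuppressWarnings("unchecked")
    PartitionedQueue(int numKeyGroups) {
        this.numKeyGroups = numKeyGroups;
        this.subQueues = new PriorityQueue[numKeyGroups];
        for (int i = 0; i < numKeyGroups; i++) {
            subQueues[i] = new PriorityQueue<>();
        }
        // empty sub-queues are never inserted into the super-heap, so peek() is safe here
        this.superHeap = new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
    }

    /** Inserts the element into its key-group's sub-queue (assignment is simplified). */
    void add(long element) {
        PriorityQueue<Long> sub = subQueues[(int) (element % numKeyGroups)];
        superHeap.remove(sub); // the sub-queue's head may change, so re-position it
        sub.add(element);
        superHeap.add(sub);
    }

    /** Removes and returns the globally smallest element across all key-groups. */
    Long poll() {
        PriorityQueue<Long> first = superHeap.poll();
        if (first == null) {
            return null;
        }
        Long head = first.poll();
        if (!first.isEmpty()) {
            superHeap.add(first); // re-insert with its new head element
        }
        return head;
    }
}
```

A test against such a structure would insert elements that land in different key-groups and assert that repeated poll() calls yield a globally sorted sequence.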
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): (no)

Documentation