Convert RocksDB kvstores to coroutines#13186
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
…tore-rocksdb-coroutines
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
tclinkenbeard-oai
left a comment
There was a problem hiding this comment.
Generated by Codex.
What is it trying to do?
This PR converts the RocksDB and sharded RocksDB key-value store implementations from Flow actor syntax to standard coroutines, then extends race() so ThreadFutureStream can participate directly. The latter is used to preserve the old choose { delay(...) / waitNext(...) } behavior in refreshReadIteratorPool() without reintroducing the single-waiter violation described in the PR. It also adds focused coroutine coverage for ready, successful, and loser-cleanup ThreadFutureStream race paths.
Is it correct?
I think so. I inspected the coroutine rewrites in both RocksDB kvstores, the ThreadFutureStream additions in the coroutine race plumbing, and the new tests. The race() change is consistent with the existing FutureStream handling: ThreadFutureStream is classified as a stream type, uses the same single-callback registration path, and therefore gets the same callback removal behavior on losing branches. That is exactly the property refreshReadIteratorPool() needs.
I also checked the more failure-prone migration areas: close/dispose flows remain intentionally fire-and-forget via Uncancellable, semaphore-backed read helpers still retain their releasers across awaits, commit paths preserve post/wait ordering, and the follow-up checkpoint/test changes eliminate the visible awaited brace-init temporary lifetime hazards introduced in the first version.
I did not run builds or tests myself. The latest visible PR status rollup is green across the FoundationDB PR builder, clang, clang-arm, clang-ide, macOS, macOS M1, and cluster-test checks.
Are there bugs?
I did not find any correctness bugs.
Are there omissions?
None that I think block this.
A direct race(ThreadFutureStream, ...) error-path test would tighten the new specialization a bit further, but the current coverage already exercises the important new behavior: ready consumption, successful async wakeup, and loser cleanup/reuse of the same stream after another race wins.
Are there better ways of doing things?
The direct race() support for ThreadFutureStream looks like the better design than keeping a local adapter future in refreshReadIteratorPool(). Centralizing the ownership and loser-cleanup behavior in the shared race implementation is simpler and more robust than preserving a one-off wrapper at the call site.
Should this CL be LGTMd?
Yes, LGTM.
I reviewed the coroutine race implementation, the ThreadFutureStream integration and tests, and the RocksDB/sharded RocksDB coroutine conversions around close, commit, read, iterator refresh, and checkpoint-related awaits. The main residual risk is the usual one for a broad actor-to-coroutine migration: a subtle lifetime or semantic mismatch hiding in a mechanically converted path. I did not find one here, and the targeted follow-up fixes/tests address the sharpest surfaced edge case.
|
Will you please be able to test performance with mako that you did for other critical changes(like storage server) |
|
@neethuhaneesha I did detect some regression, working on fixing that |

Summary
Convert the RocksDB and sharded RocksDB key-value store implementations from Flow actor syntax to standard coroutines.
As part of the migration, extend coroutine
race()support to handleThreadFutureStreamdirectly and add focused coverage for that path.Details
The bulk of this change is a mechanical migration of:
KeyValueStoreRocksDB.actor.cpp→KeyValueStoreRocksDB.cppKeyValueStoreShardedRocksDB.actor.cpp→KeyValueStoreShardedRocksDB.cppDuring validation, the initial coroutine rewrite exposed an important behavioral difference in
refreshReadIteratorPool(). The actor version usedchoose { ... }over a timer and aThreadFutureStream. The first coroutine version wrapped the stream in a helperFutureso it could participate inrace(), but on timer wins that allowed the next loop iteration to register a second waiter on the sameThreadFutureStreambefore the losing waiter had been torn down. That violates the stream’s single-waiter invariant and reproduced as:SingleCallback<T>::next == thisRather than preserve the helper workaround, this change teaches the coroutine machinery to understand
ThreadFutureStreamas a first-class stream input torace().refreshReadIteratorPool()can then race the stream directly, which keeps loser cleanup inside the shared race implementation and matches the actor-era ownership model more closely.This also adds focused coroutine tests for
ThreadFutureStreamparticipation inrace().Validation
package_tests_u/flow/coro/raceThreadFutureStreamReady/flow/coro/raceThreadFutureStreamSuccesstests/rare/FailoverWithSSLag.toml1160450224263431510952634448320260509-204651-joshua-proxy-35ee7283f8e63538