Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-32345][state] Improve parallel download of RocksDB incremental state. #22788

Conversation

StefanRRichter
Copy link
Contributor

@StefanRRichter StefanRRichter commented Jun 15, 2023

What is the purpose of the change

This commit improves RocksDBStateDownloader to support parallelized state download across multiple state types and across multiple state handles. This can improve our download times for scale-in.

Brief change log

  • Introduced StateHandleDownloadSpec to combine incremental remote handles with their download paths.
  • Modified RocksDBStateDownloader to accept a list of StateHandleDownloadSpec that can combine files from multiple handles and sub-states (shared, private).
  • Modified RocksDBIncrementalRestoreOperation to first assemble a complete list of StateHandleDownloadSpec before invoking the download process.
  • Adapted and added unit tests in RocksDBStateDownloaderTest.

Verifying this change

This change is already covered by existing tests, such as

  • RocksDBStateDownloaderTest. I added more tests there.
  • EmbeddedRocksDBStateBackendTest
  • EmbeddedRocksDBStateBackendMigrationTest
  • RescalingBenchmarkTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 15, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@StefanRRichter
Copy link
Contributor Author

@flinkbot run azure

@StefanRRichter StefanRRichter force-pushed the srichter-FLINK-32345-parallel-download branch 2 times, most recently from 7553cf6 to a10fcd4 Compare June 16, 2023 07:54
@StefanRRichter
Copy link
Contributor Author

@flinkbot run azure

This commit improves RocksDBStateDownloader to support parallelized state download across multiple state types and across multiple state handles. This can improve our download times for scale-in.
@StefanRRichter StefanRRichter force-pushed the srichter-FLINK-32345-parallel-download branch from a10fcd4 to e697f77 Compare June 16, 2023 12:52
@StefanRRichter
Copy link
Contributor Author

@flinkbot run azure

StateHandleDownloadSpec downloadRequest =
new StateHandleDownloadSpec(
(IncrementalRemoteKeyedStateHandle) stateHandle,
absolutInstanceBasePath.resolve(UUID.randomUUID().toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, @zoltar9264 is changing how the local path is derived in #22669.

Comment on lines +122 to +137
entry -> {
StateHandleID stateHandleID = entry.getKey();
StreamStateHandle remoteFileHandle =
entry.getValue();
Path downloadDest =
downloadRequest
.getDownloadDestination()
.resolve(
stateHandleID
.toString());
return ThrowingRunnable.unchecked(
() ->
downloadDataForStateHandle(
downloadDest,
remoteFileHandle,
closeableRegistry));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about extracting a method createDownloadRunnables(k, v, dst)?

@StefanRRichter
Copy link
Contributor Author

@rkhachatryan I addressed all remaining comments, please take another look to approve the PR.

Copy link
Contributor

@rkhachatryan rkhachatryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR @StefanRRichter ,
LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants