Skip to content

Conversation

dchristle
Copy link
Contributor

@dchristle dchristle commented Jan 19, 2023

What is the purpose of the change

This PR corrects and improves the readability of the documentation for RocksDB predefined option sets. References to disableDataSync, which is not set and has been removed from RocksDB for some time, are removed.

While fixing this error, I also noticed other confusing aspects of this documentation. The options are not sorted, which makes it harder for a reader to find a particular one, and for maintainers to ensure the code & documentation match. I found an error of this kind in SPINNING_DISK_OPTIMIZED_HIGH_MEM, where Bloom filter usage is enabled but not documented. The unsorted documentation/code may have played a role in this accidental omission. I kept the Bloom filter setting, so that users' job performance does not change unexpectedly between versions, and I added the Bloom filter setting to the documentation along with a description of where its input parameters are set.

References to Flink not needing to sync changes to the file system due to its checkpointing ability are also present, but it is confusing to a reader because this point isn't relevant to any of the listed options in the Java documentation. setUseFsync(false) is documented for each option set in the Python documentation, but isn't in the Java documentation. setUseFsync(false) is not set in the predefined options, so we should not emphasize it. Instead, I added a passing reference to it at the top of both Java and Python introductions.

Brief change log

  • Remove references to disableDataSync in RocksDB documentation. This is not set, and has been removed from RocksDB for some time.
  • Sort options in both Java and Python documentation. Sort the code setting configuration to match the sorted-order documentation.
  • Make Python documentation consistent with Java documentation by removing setUseFsync references on each option.
  • Change use of setIncreaseParallelism in Python documentation to setMaxBackgroundJobs, which matches the Java usage.
  • Add missing Bloom filter option to the SPINNING_DISK_OPTIMIZED_HIGH_MEM documentation, and describe where its parameters are set.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 19, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@MartijnVisser MartijnVisser requested a review from Myasuka January 19, 2023 08:27
Copy link
Member

@Myasuka Myasuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

I think this PR should be split into two commits:

  1. Remove references to disableDataSync in RocksDB documentation.
  2. Align pyflink rocksdb options with java side.

@dchristle
Copy link
Contributor Author

dchristle commented Jan 19, 2023

Hi @Myasuka,

Thanks for reviewing this PR! I split the changes into separate commits so that each change is separated & easier to review. The first commit removes disableDataSync, and the last commit aligns the PyFlink RocksDB documentation with its Java equivalent, as you requested.

@dchristle dchristle requested a review from Myasuka January 19, 2023 17:52
@dchristle
Copy link
Contributor Author

Hi @Myasuka Is there any chance we can merge this PR?

@Myasuka
Copy link
Member

Myasuka commented May 31, 2023

@dchristle Thanks for the reminder, I will take a look recently.

@Myasuka Myasuka self-assigned this May 31, 2023
Copy link
Member

@Myasuka Myasuka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @dchristle could you rebase it with latest master branch so that we can get a green CI.

dchristle added 5 commits June 4, 2023 13:28
…ftover from mentioning that setUseFsync is false. But this is true for all predefined options, as mentioned at the top of the doc. We should remove this for clarity, and can mention it elsewhere a single time.
… code for improved readability and maintainability.
…, irrespective of the PredefinedOptions setting. Add the explanation of why syncing is not necessary in Flink to the introduction, where it has better context, and remove the leftover reference to syncing from the DEFAULT.
@dchristle dchristle force-pushed the dchristle/FLINK-30751 branch from 38f4da5 to f557045 Compare June 4, 2023 20:31
@dchristle
Copy link
Contributor Author

@Myasuka Got it - I just rebased on the latest master branch.

@Myasuka Myasuka closed this in b232a7e Jun 5, 2023
Myasuka pushed a commit that referenced this pull request Jun 5, 2023
Myasuka pushed a commit that referenced this pull request Jun 5, 2023
@dchristle
Copy link
Contributor Author

@Myasuka Thanks for your review & help here!

gpkc pushed a commit to aiven/flink that referenced this pull request Mar 6, 2024
gpkc pushed a commit to aiven/flink that referenced this pull request Mar 6, 2024
RocMarshal pushed a commit to RocMarshal/flink that referenced this pull request May 9, 2024
@dchristle dchristle deleted the dchristle/FLINK-30751 branch January 6, 2025 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants