@NicoK
Contributor

@NicoK NicoK commented Dec 5, 2019

What is the purpose of the change

With Flink's default settings for RocksDB, RocksDB writes a LOG file (not the WAL, but plain logging statements) into its data folder. Besides periodic statistics, it logs compaction attempts, new memtable creations, flushes, etc. This file grows indefinitely and may fill the disk, even though the log is never actually used anywhere (it is deleted together with the job anyway).

With this PR, we effectively disable the log by default. Anyone who wants to retain it can re-configure it by providing their own OptionsFactory.

Brief change log

  • change the default RocksDB configuration for all PredefinedOptions
    so that they use log level HEADER_LEVEL
  • disable periodic statistics dumps to the LOG file
  • hotfix the description of the DefaultConfigurableOptionsFactory
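The first two change-log items can be pictured with the following sketch. It is a hypothetical, self-contained model, not Flink's code: `DbOptionsModel`, `applyQuietLogging` and `createDbOptions` are illustrative names, and the starting values (INFO_LEVEL, a 600 s stats dump period) are assumed RocksDB defaults. In the real code the corresponding RocksJava setters are presumably `DBOptions#setInfoLogLevel` and `DBOptions#setStatsDumpPeriodSec`; only the profile names are taken from Flink's `PredefinedOptions`.

```java
import java.util.EnumMap;
import java.util.Map;

public class QuietPredefinedProfiles {

    /** Same constant names and order as RocksDB's InfoLogLevel (sentinel omitted). */
    enum InfoLogLevel { DEBUG_LEVEL, INFO_LEVEL, WARN_LEVEL, ERROR_LEVEL, FATAL_LEVEL, HEADER_LEVEL }

    /** Constant names as in Flink's PredefinedOptions. */
    enum Profile { DEFAULT, SPINNING_DISK_OPTIMIZED, SPINNING_DISK_OPTIMIZED_HIGH_MEM, FLASH_SSD_OPTIMIZED }

    /** Hypothetical stand-in for org.rocksdb.DBOptions with only the two settings changed here. */
    static final class DbOptionsModel {
        InfoLogLevel infoLogLevel = InfoLogLevel.INFO_LEVEL; // assumed RocksDB default
        int statsDumpPeriodSec = 600;                         // assumed RocksDB default (10 minutes)
    }

    /** The logging settings that, after this change, every profile applies. */
    static DbOptionsModel applyQuietLogging(DbOptionsModel options) {
        options.infoLogLevel = InfoLogLevel.HEADER_LEVEL; // only header lines reach the LOG file
        options.statsDumpPeriodSec = 0;                   // 0 disables periodic statistics dumps
        return options;
    }

    /** Hypothetical per-profile factory: shared logging settings first, then profile tuning. */
    static DbOptionsModel createDbOptions(Profile profile) {
        DbOptionsModel options = applyQuietLogging(new DbOptionsModel());
        // Profile-specific tuning (compaction, block cache, ...) would follow here; omitted.
        return options;
    }

    public static void main(String[] args) {
        Map<Profile, DbOptionsModel> byProfile = new EnumMap<>(Profile.class);
        for (Profile profile : Profile.values()) {
            byProfile.put(profile, createDbOptions(profile));
        }
        byProfile.forEach((profile, options) -> System.out.println(
                profile + ": " + options.infoLogLevel + ", statsDumpPeriodSec=" + options.statsDumpPeriodSec));
    }
}
```

Putting the shared settings in one helper that every profile calls first mirrors the point of the PR: no profile, including DEFAULT, can leave the LOG file at its verbose default.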

Verifying this change

I ran a Flink cluster with these changes and the LOG file now only contains some headers and is then never written to again. Normal behaviour is otherwise covered by existing tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? JavaDocs


NicoK added 2 commits December 5, 2019 10:34
This commit changes the default RocksDB configuration for all PredefinedOptions
so that they use log level HEADER_LEVEL and disable periodic statistics dumps
to the LOG file.
Please note that there is no need to also change
DefaultConfigurableOptionsFactory, since it is only applied after the
PredefinedOptions, and there is always one of those (at least PredefinedOptions#DEFAULT).

The problem with this file is that it will grow indefinitely until it is deleted
when the job is cancelled/restarted, since it lives in RocksDB's local directory.
Therefore, it is not available for troubleshooting errors. For performance
analysis, metrics are probably the better tool in the first place.

Note: In theory, we could even set the log level to NUM_INFO_LOG_LEVELS,
which also removes (most of) the headers. Although that works, it is
effectively an invalid value for the log level and would be a bit hacky.
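Why HEADER_LEVEL silences the file, and why NUM_INFO_LOG_LEVELS is a sentinel rather than a real level, can be illustrated with a simplified threshold model. This is an assumption about the filtering semantics (a message is written only if its level is at or above the configured one), not RocksDB's actual logger code; the enum only mirrors the constant names and order of RocksDB's `InfoLogLevel`. Unlike the real library, where the note above says some headers still appear with NUM_INFO_LOG_LEVELS, this model drops everything at that setting.

```java
import java.util.ArrayList;
import java.util.List;

public class InfoLogThreshold {

    /** Mirrors the constant names and order of RocksDB's InfoLogLevel. */
    enum InfoLogLevel {
        DEBUG_LEVEL, INFO_LEVEL, WARN_LEVEL, ERROR_LEVEL, FATAL_LEVEL, HEADER_LEVEL,
        NUM_INFO_LOG_LEVELS // counts the levels above; no message is ever logged at this "level"
    }

    /** Enum.compareTo follows declaration order, which matches severity order here. */
    static boolean isWritten(InfoLogLevel messageLevel, InfoLogLevel configuredLevel) {
        return messageLevel.compareTo(configuredLevel) >= 0;
    }

    /** Message levels that would still reach the LOG file for a given configured level. */
    static List<InfoLogLevel> writtenLevels(InfoLogLevel configuredLevel) {
        List<InfoLogLevel> written = new ArrayList<>();
        for (InfoLogLevel level : InfoLogLevel.values()) {
            // Skip the sentinel itself: it is a count, not a message level.
            if (level != InfoLogLevel.NUM_INFO_LOG_LEVELS && isWritten(level, configuredLevel)) {
                written.add(level);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        // Assumed RocksDB default, i.e. before this change:
        System.out.println(writtenLevels(InfoLogLevel.INFO_LEVEL));
        // prints [INFO_LEVEL, WARN_LEVEL, ERROR_LEVEL, FATAL_LEVEL, HEADER_LEVEL]
        System.out.println(writtenLevels(InfoLogLevel.HEADER_LEVEL));        // prints [HEADER_LEVEL]
        System.out.println(writtenLevels(InfoLogLevel.NUM_INFO_LOG_LEVELS)); // prints []
    }
}
```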
@NicoK NicoK requested a review from azagrebin December 5, 2019 11:57
@flinkbot
Collaborator

flinkbot commented Dec 5, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 50314f6 (Thu Dec 05 12:01:26 UTC 2019)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@StephanEwen
Contributor

Do we still want to put all options in the "predefined options"?

I would assume that from 1.10 on, the predefined options would not be used by default any more, because we configure the memory out of the box differently.

@NicoK
Contributor Author

NicoK commented Dec 5, 2019

From the order that I saw in current master:

  • first, any PredefinedOptions (PredefinedOptions#DEFAULT if none is set), then (afterwards)
  • any OptionsFactory (default: DefaultConfigurableOptionsFactory)

Where does the memory management come in? This seems to be the same question as in #10416.
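The order described above (predefined profile first, options factory second) can be modelled as two functions applied in sequence. This is a hypothetical sketch, not Flink's API: `DbOptionsModel`, `PREDEFINED_DEFAULT`, `configurableFactory`, `VERBOSE_USER_FACTORY`, `build` and the config key `max-open-files` are all illustrative names. It shows why the later factory leaves the quiet logging in place when it only sets explicitly configured keys, and how a user-provided factory can still re-enable the LOG file.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

public class OptionsApplicationOrder {

    enum InfoLogLevel { DEBUG_LEVEL, INFO_LEVEL, WARN_LEVEL, ERROR_LEVEL, FATAL_LEVEL, HEADER_LEVEL }

    /** Hypothetical stand-in for org.rocksdb.DBOptions. */
    static final class DbOptionsModel {
        InfoLogLevel infoLogLevel = InfoLogLevel.INFO_LEVEL; // assumed RocksDB default
        int statsDumpPeriodSec = 600;                         // assumed RocksDB default
        int maxOpenFiles = -1;                                // -1 meaning "unlimited" in this model
    }

    /** Step 1: the predefined profile, which after this change silences the LOG file. */
    static final UnaryOperator<DbOptionsModel> PREDEFINED_DEFAULT = o -> {
        o.infoLogLevel = InfoLogLevel.HEADER_LEVEL;
        o.statsDumpPeriodSec = 0;
        return o;
    };

    /** Step 2a: a configurable factory that only touches keys the user explicitly set. */
    static UnaryOperator<DbOptionsModel> configurableFactory(Map<String, String> userConfig) {
        return o -> {
            String maxOpenFiles = userConfig.get("max-open-files"); // hypothetical key
            if (maxOpenFiles != null) {
                o.maxOpenFiles = Integer.parseInt(maxOpenFiles);
            }
            return o; // log level and stats period are left as the profile set them
        };
    }

    /** Step 2b: a user-provided factory that deliberately re-enables the LOG file. */
    static final UnaryOperator<DbOptionsModel> VERBOSE_USER_FACTORY = o -> {
        o.infoLogLevel = InfoLogLevel.INFO_LEVEL;
        o.statsDumpPeriodSec = 600; // e.g. dump statistics every 10 minutes again
        return o;
    };

    /** The factory runs last, so it only wins for the settings it actually changes. */
    static DbOptionsModel build(UnaryOperator<DbOptionsModel> predefined, UnaryOperator<DbOptionsModel> factory) {
        return factory.apply(predefined.apply(new DbOptionsModel()));
    }

    public static void main(String[] args) {
        DbOptionsModel quiet = build(PREDEFINED_DEFAULT, configurableFactory(Map.of("max-open-files", "1024")));
        DbOptionsModel verbose = build(PREDEFINED_DEFAULT, VERBOSE_USER_FACTORY);
        System.out.println(quiet.infoLogLevel + " " + quiet.maxOpenFiles);           // prints HEADER_LEVEL 1024
        System.out.println(verbose.infoLogLevel + " " + verbose.statsDumpPeriodSec); // prints INFO_LEVEL 600
    }
}
```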

@flinkbot
Collaborator

flinkbot commented Dec 5, 2019

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@StephanEwen
Contributor

There is a discussion about adjusting this order, but for now, the change makes sense as it is.

+1, will merge this...

StephanEwen pushed a commit to StephanEwen/flink that referenced this pull request Dec 6, 2019
This closes apache#10437
@asfgit asfgit closed this in 798b22c Dec 6, 2019