Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-9681] [table] Make sure difference between minRetentionTime and maxRetentionTime at least 5 minutes #6255

Closed
wants to merge 2 commits into from

Conversation

hequn8128
Copy link
Contributor

What is the purpose of the change

This PR aims to make sure difference between minRetentionTime and maxRetentionTime at least 5 minutes. Currently, for a group by(or other operators), if minRetentionTime equals to maxRetentionTime, the group by operator will register a timer for each record coming at different time which cause performance problem. The reasoning for having two parameters is that we can avoid to register many timers if we have more freedom when to discard state. As min equals to max cause performance problem it is better to make sure these two parameters are not same.

Brief change log

  • Throw exception when difference between minRetentionTime and maxRetentionTime smaller than 5 minutes
  • Add a QueryConfigTest class extends from StreamQueryConfig. QueryConfigTest don't have the 5min limitation which makes test more convenient.
  • Adapt tests.

Verifying this change

This change added tests and can be verified as follows:

  • Added test that validates that min and max retention time are set correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

Copy link
Contributor

@fhueske fhueske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hequn8128, the PR looks good. I left two minor comments.

Best, Fabian


{% endhighlight %}
</div>
</div>

Configuring different minimum and maximum idle state retention times is more efficient because it reduces the internal book-keeping of a query for when to remove state.
Configuring different minimum and maximum idle state retention times is more efficient because it reduces the internal book-keeping of a query for when to remove state. Difference between minTime and maxTime shoud be at least 5 minutes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "... more efficient ..." does not apply anymore. Maybe rephrase to

Cleaning up state requires additional bookkeeping which becomes less expensive for larger differences of minTime and maxTime. The difference between minTime and maxTime must be at least 5 minutes.

/**
* Test class used to test min and max retention time.
*/
class StreamQueryConfigTest(min: Time, max: Time) extends StreamQueryConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename the class to TestStreamQueryConfig because the Test at the end suggests that this class is testing something instead of being a util for a test.

@hequn8128
Copy link
Contributor Author

@fhueske Comments have been addressed. Thanks for your review.

@fhueske
Copy link
Contributor

fhueske commented Jul 5, 2018

Thanks for the update @hequn8128.

I'll merge this

@asfgit asfgit closed this in cfd0206 Jul 5, 2018
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants