Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-13034] Introduce isEmpty method for MapState #9255

Closed
wants to merge 3 commits into from

Conversation

Myasuka
Copy link
Member

@Myasuka Myasuka commented Jul 29, 2019

What is the purpose of the change

Introduce #isEmpty() method for MapState, this aims to improve the performance when we use MapState.iterator().hasNext() to judge whether the map is empty for RocksDB.

Brief change log

  • Introduce #isEmpty() method for MapState.
  • Introduce #isEmpty() method for MapView.
  • Replace previous usage of MapState.iterator().hasNext() to MapState.isEmpty().

Verifying this change

This change added tests and can be verified as follows:

  • Add test for StateBackendTestBase#testMapState().

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes, improve
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: ** no**
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 29, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 3a929e8 (Wed Dec 04 14:59:46 UTC 2019)

✅no warnings

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❗ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 29, 2019

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

@Myasuka
Copy link
Member Author

Myasuka commented Jul 29, 2019

@flinkbot attention @aljoscha @wuchong as I also changed the MapView.

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request @Myasuka . The flink-table side looks good to me.

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, but could you add a micro benchmark for checking whether a map state is empty in our MapStateBenchmark and post the comparison result before and after the change here @Myasuka ? Thanks.

@Myasuka
Copy link
Member Author

Myasuka commented Sep 9, 2019

@carp84 thanks for your review. Actually, I have created FLINK-13846 to track this. If we want to know the improvement after introducing isEmpty() compared to previous solution to use iterator.hasNext(), I think I need to use a temp implementation of our micro benchmark.

@carp84
Copy link
Member

carp84 commented Sep 10, 2019

That's fine @Myasuka , just attach the patch here if anyone would like to try reproducing the result. Since this one claims to be a performance enhancement, I believe it's better to show some detailed data.

@Myasuka
Copy link
Member Author

Myasuka commented Sep 15, 2019

@carp84, I have compare the new API isEmpty() with previous usage iterator().hasNext() in my forked flink-benchmarks. I tested this on my local mac, and as you can see, the performance improvement is significant.

map state benchmark iteratorHasNext isEmpty improvement
Throughput 326.689 ± 11.680 ops/ms 471.627 ± 6.119 ops/ms 44.37%

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark data looks good, thanks for the testing @Myasuka .

Regarding the commit, my main concern is about unit test. The existing added tests are scattered and I think it's better to add a formal test case in StateBackendTestBase.testMapState.

What's more, since this adds a method in MapState interface which is PublicEvolving, we will need a release note.

@Myasuka
Copy link
Member Author

Myasuka commented Sep 22, 2019

There have been some tests for MapState.isEmpty in StateBackendTestBase.testMapState, do you actually mean other tests for MapState.isEmpty could be ignored? @carp84

@carp84
Copy link
Member

carp84 commented Sep 25, 2019

There have been some tests for MapState.isEmpty in StateBackendTestBase.testMapState, do you actually mean other tests for MapState.isEmpty could be ignored? @carp84

I meant to add an explicit testMapStateIsEmpty case like testListStateAddNull in StateBackendTestBase instead of squeezing everything in testMapState. Similarly, add an explicit testIsEmpty method in ImmutableMapStateTest instead of scattered statements in existing cases.

@carp84
Copy link
Member

carp84 commented Sep 25, 2019

What's more, I think we should split the StateBackendTestBase. testMapState case to explicitly test each MapState operation interface, but this is out of the scope for this PR/JIRA and worth a dedicated one.

@Myasuka
Copy link
Member Author

Myasuka commented Sep 25, 2019

@flinkbot run travis

Copy link
Member

@carp84 carp84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change LGTM, +1.

Checked the CI build, the failure on python should be irrelative, but we still need a green build.

@carp84
Copy link
Member

carp84 commented Sep 26, 2019

What's more, since this adds a method in MapState interface which is PublicEvolving, we will need a release note.

And don't forget about the release note.

@StephanEwen
Copy link
Contributor

@flinkbot run travis

@StephanEwen
Copy link
Contributor

Thanks for addressing the comments.
Looks good, +1 to merge this now, pending the Travis run.

@Myasuka
Copy link
Member Author

Myasuka commented Oct 14, 2019

@StephanEwen It seems @flinkbot run travis not working ...

@Myasuka
Copy link
Member Author

Myasuka commented Oct 17, 2019

Rebase with latest master branch to re-trigger new CI build.

@Myasuka
Copy link
Member Author

Myasuka commented Oct 18, 2019

Failed due to FLINK-14445

@flinkbot run travis

@Myasuka
Copy link
Member Author

Myasuka commented Oct 22, 2019

It seems run travis command not working very well, rebase with latest code to trigger CI again.

@Myasuka
Copy link
Member Author

Myasuka commented Oct 25, 2019

CI failed due to known issue FLINK-14370 , but the CI which triggered on my branch succeed. @StephanEwen

StephanEwen pushed a commit to StephanEwen/flink that referenced this pull request Nov 6, 2019
@asfgit asfgit closed this in 051692a Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants