@KurtYoung
Contributor

What is the purpose of the change

We can optimize further when we know the join key fits in a long: either a single long field or two integer fields qualifies. For example, we can combine the hash code and the actual key into one field. Because there are then no hash collisions to resolve, a lot of unnecessary logic can be skipped.
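
As a rough illustration of the "two integer fields" case, the sketch below packs two ints into one long and unpacks them again. The class and method names are hypothetical and are not taken from this PR; it only shows why such a key can stand in for both the hash input and the key itself.

```java
// Hypothetical helper, not part of the PR: packs two int join-key fields into one long.
final class LongKeyPacking {
    private LongKeyPacking() {}

    // The first field occupies the upper 32 bits, the second the lower 32 bits.
    static long pack(int first, int second) {
        // Mask the second field so its sign bit does not spill into the upper half.
        return ((long) first << 32) | (second & 0xFFFFFFFFL);
    }

    // Recovers the first field from the upper 32 bits.
    static int first(long key) {
        return (int) (key >>> 32);
    }

    // Recovers the second field from the lower 32 bits.
    static int second(long key) {
        return (int) key;
    }
}
```

Because the packing is reversible, two distinct field pairs always produce distinct long keys, so comparing the long values is equivalent to comparing the original fields.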

Brief change log

  • Introduce a new data structure: LongHybridHashTable
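
To make the idea concrete, here is a minimal sketch of a table keyed directly on a long. It is not the LongHybridHashTable from this PR (which also has to deal with memory segments and spilling); the class name, the sentinel value, and the int payload are all assumptions made for illustration. The point is that a lookup compares the stored long with the probe key directly, with no separate hash-code field and no comparison of serialized key bytes.

```java
import java.util.Arrays;

// Hypothetical sketch, not the PR's LongHybridHashTable: open addressing keyed on a long.
final class LongKeyTableSketch {
    // Assumption: Long.MIN_VALUE is reserved as the "empty slot" marker and never used as a key.
    private static final long EMPTY = Long.MIN_VALUE;

    private final long[] keys;
    private final int[] values;
    private final int mask;

    // Assumption: capacity is a power of two and the table is never filled completely.
    LongKeyTableSketch(int capacityPowerOfTwo) {
        keys = new long[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    // Spreads the key bits; the result is only used to pick a starting slot.
    private static int slotHash(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32);
    }

    void put(long key, int value) {
        int i = slotHash(key) & mask;
        // Linear probing: stop at an empty slot or at the slot already holding this key.
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) & mask;
        }
        keys[i] = key;
        values[i] = value;
    }

    int get(long key, int missing) {
        int i = slotHash(key) & mask;
        while (keys[i] != EMPTY) {
            // A single primitive comparison decides the match.
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return missing;
    }
}
```

Two keys can still land in the same slot, which the probing handles, but a match never needs more than one long comparison.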

Verifying this change

This change added some unit tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (yes)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

…mprove performance when join key fits in long
@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@KurtYoung
Contributor Author

cc @JingsongLi

Contributor

@JingsongLi JingsongLi left a comment

Thanks for opening a PR for this, @KurtYoung. LGTM +1; I left some minor comments.

if (curSegRemain > 0) {
int copySize = Math.min(curSegRemain, sizeInBytes);

byte[] bytes = new byte[copySize];
Contributor

Get the bytes from allocateReuseBytes, or add a TODO?

Contributor Author

I think using allocateReuseBytes makes sense.

}
}

// public void append(long key, BinaryRow row) throws IOException {
Contributor

delete it?

Contributor Author

ok

@KurtYoung
Contributor Author

@JingsongLi Thanks for reviewing. I will address the comments and merge this later.

@KurtYoung KurtYoung closed this in 299747f Mar 19, 2019
@KurtYoung KurtYoung deleted the long branch March 20, 2019 01:42
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Mar 20, 2019
…mprove performance when join key fits in long

This closes apache#7996
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Apr 22, 2019
…mprove performance when join key fits in long

This closes apache#7996
sunhaibotb pushed a commit to sunhaibotb/flink that referenced this pull request May 8, 2019
…mprove performance when join key fits in long

This closes apache#7996
4 participants