
Conversation


@gmzz123 gmzz123 commented Nov 19, 2021

What is the purpose of the change

This pull request fixes StreamElementSerializer#deserialize(StreamElement reuse, DataInputView source), which did not handle tag == TAG_STREAM_STATUS.

Brief change log

  1. deserialize(StreamElement reuse, DataInputView source) now returns a WatermarkStatus when tag == TAG_STREAM_STATUS (a sketch of the idea follows this list).

  2. copy(StreamElement from, StreamElement reuse) no longer needs to check that reuse is a StreamRecord.

  3. In copy(DataInputView source, DataOutputView target) there is no need to deserialize and then re-serialize: for each type of StreamElement the binary length is fixed, so a raw binary copy can be used to speed it up.
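For illustration, here is a minimal sketch of the fix in item 1, written against plain java.io types rather than Flink's DataInputView/StreamElement classes; the tag constants and the simplified Watermark/WatermarkStatus records are stand-ins, not Flink's actual values or API.

import java.io.DataInputStream;
import java.io.IOException;

final class DeserializeSketch {
    // illustrative tag values, not Flink's actual constants
    private static final int TAG_WATERMARK = 2;
    private static final int TAG_STREAM_STATUS = 3;

    // simplified stand-ins for Flink's Watermark and WatermarkStatus elements
    record Watermark(long timestamp) {}
    record WatermarkStatus(int status) {}

    // With object reuse, the tag dispatch must cover every element type; the bug was
    // that no TAG_STREAM_STATUS branch existed. The reuse argument is unused here
    // because watermarks and statuses are small values that are created fresh.
    static Object deserialize(Object reuse, DataInputStream source) throws IOException {
        int tag = source.readByte();
        if (tag == TAG_WATERMARK) {
            return new Watermark(source.readLong());
        } else if (tag == TAG_STREAM_STATUS) {
            // the previously missing branch: read the status value and return it
            return new WatermarkStatus(source.readInt());
        }
        throw new IOException("Corrupt stream, unknown tag: " + tag);
    }
}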

Verifying this change

This change added tests and can be verified as follows:

  1. Added a test for copy (with and without object reuse); this copy does not go through DataInputView/DataOutputView.

  2. Added a test for the StreamElement type WatermarkStatus.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (yes)
  • The runtime per-record code paths (performance sensitive): (yes)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

…use, source) forgets to handle tag == TAG_STREAM_STATUS.

flinkbot commented Nov 19, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@flinkbot

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 6658dc7 (Fri Nov 19 12:20:46 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier


gmzz123 commented Nov 23, 2021

@AHeise
Could you help review it? Thanks!


@AHeise AHeise left a comment


Thank you very much for your patience and your contribution. The change looks good to me and I also like the secondary changes.

Could you please extract some changes into separate commits in this PR? So we should have:
[hotfix][DataStream] Use diamond operator in StreamElementSerializer(Test)
[FLINK-24915][DataStream] Improve StreamElementSerializer#copy
[FLINK-24915][DataStream] Fix StreamElementSerializer#deserialize for StreamStatus

If the Azure pipeline is green, could you please also prepare backport PRs against Flink 1.14, 1.13, and maybe 1.12 if trivial (we don't usually release a new 1.12 build, but sometimes we do).

Comment on lines +175 to +187
StreamRecord<String> withoutTimestamp = new StreamRecord<>("test 1 2 分享基督耶穌的愛給們,開拓雙贏!");
assertEquals(
withoutTimestamp,
serializeAndDeserializeWithReuse(withoutTimestamp, serializer, reuse));

StreamRecord<String> withTimestamp = new StreamRecord<>("one more test 拓 們 分", 77L);
assertEquals(
withTimestamp, serializeAndDeserializeWithReuse(withTimestamp, serializer, reuse));

StreamRecord<String> negativeTimestamp = new StreamRecord<>("他", Long.MIN_VALUE);
assertEquals(
negativeTimestamp,
serializeAndDeserializeWithReuse(negativeTimestamp, serializer, reuse));

I'd also like to see an assert that reuse is actually returned.

You could switch to assertj-style assertions to make it more beautiful:

assertThat(serializeAndDeserializeWithReuse(negativeTimestamp, serializer, reuse))
  .isEqualTo(negativeTimestamp)
  .isSameAs(reuse);

Comment on lines -139 to +150
-            target.writeLong(source.readLong());
+            target.write(source, 8);
             typeSerializer.copy(source, target);
         } else if (tag == TAG_REC_WITHOUT_TIMESTAMP) {
             typeSerializer.copy(source, target);
         } else if (tag == TAG_WATERMARK) {
-            target.writeLong(source.readLong());
+            target.write(source, 8);
         } else if (tag == TAG_STREAM_STATUS) {
-            target.writeInt(source.readInt());
+            target.write(source, 4);
         } else if (tag == TAG_LATENCY_MARKER) {
-            target.writeLong(source.readLong());
-            target.writeLong(source.readLong());
-            target.writeLong(source.readLong());
-            target.writeInt(source.readInt());
+            target.write(source, 28);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extract these performance optimizations into a separate commit.
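For context, here is a self-contained sketch of the fixed-length binary copy idea from the diff above, using plain java.io streams instead of Flink's DataInputView/DataOutputView; the tag values, the payload-size comments, and the copyBytes helper are illustrative stand-ins, not Flink's actual code.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class BinaryCopySketch {
    // illustrative tag values, not Flink's actual constants
    private static final int TAG_WATERMARK = 2;      // payload: one long (8 bytes)
    private static final int TAG_STREAM_STATUS = 3;  // payload: one int (4 bytes)
    private static final int TAG_LATENCY_MARKER = 4; // payload: 3 longs + 1 int (28 bytes)

    // Because each non-record element type has a fixed binary length, the payload can
    // be moved as raw bytes instead of being decoded field by field and re-encoded.
    static void copy(DataInputStream source, DataOutputStream target) throws IOException {
        int tag = source.readByte();
        target.writeByte(tag);
        if (tag == TAG_WATERMARK) {
            copyBytes(source, target, 8);   // instead of target.writeLong(source.readLong())
        } else if (tag == TAG_STREAM_STATUS) {
            copyBytes(source, target, 4);
        } else if (tag == TAG_LATENCY_MARKER) {
            copyBytes(source, target, 28);  // one bulk copy instead of three longs and an int
        } else {
            throw new IOException("Corrupt stream, unknown tag: " + tag);
        }
    }

    // stand-in for the bulk target.write(source, numBytes) call used in the diff above
    private static void copyBytes(DataInputStream in, DataOutputStream out, int len) throws IOException {
        byte[] buf = new byte[len];
        in.readFully(buf);
        out.write(buf);
    }
}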

Comment on lines -117 to +119
-        if (from.isRecord() && reuse.isRecord()) {
+        // need not check reuse is really a StreamRecord, otherwise reuse.asRecord() will throw
+        // ClassCastException, similar as cannot copy does.
+        if (from.isRecord()) {

Please extract these performance optimizations into a separate commit.
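To make the intent concrete, here is a small stand-alone sketch (stand-in types, not Flink's classes) of the contract the new code relies on: when reuse is not a record, asRecord() fails with a ClassCastException rather than being silently skipped by the old double check.

// stand-in element hierarchy for illustration only
abstract class ElementSketch {
    boolean isRecord() { return this instanceof RecordSketch; }
    RecordSketch asRecord() { return (RecordSketch) this; } // throws ClassCastException otherwise
}

final class RecordSketch extends ElementSketch {
    Object value;
}

final class WatermarkSketch extends ElementSketch {
    long timestamp;
}

final class CopyWithReuseSketch {
    // The old guard "from.isRecord() && reuse.isRecord()" silently skipped the record
    // branch when reuse had the wrong type; checking only from.isRecord() lets
    // reuse.asRecord() fail fast instead.
    static ElementSketch copy(ElementSketch from, ElementSketch reuse) {
        if (from.isRecord()) {
            RecordSketch target = reuse.asRecord(); // may throw for a mismatched reuse element
            target.value = from.asRecord().value;
            return target;
        }
        // non-record elements are returned as-is in this sketch
        return from;
    }
}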

     public StreamElementSerializer<T> duplicate() {
         TypeSerializer<T> copy = typeSerializer.duplicate();
-        return (copy == typeSerializer) ? this : new StreamElementSerializer<T>(copy);
+        return (copy == typeSerializer) ? this : new StreamElementSerializer<>(copy);

Please extract these warning fixes to a separate commit.

@github-actions

This PR is being marked as stale since it has not had any activity in the last 180 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the [community](https://flink.apache.org/what-is-flink/community/).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.

@github-actions

This PR has been closed since it has not had any activity in 120 days.
If you feel like this was a mistake, or you would like to continue working on it,
please feel free to re-open the PR and ask for a review.

