[FLINK-24915] [DataStream] fix StreamElementSerializer#deserialize(reuse, source) forgets to handle tag == TAG_STREAM_STATUS. #17838
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot.
Automated checks: last check on commit 6658dc7 (Fri Nov 19 12:20:46 UTC 2021).
Mention the bot in a comment to re-run the automated checks.
Review progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot tracks the review progress through labels, which are applied in the order of the review items. Consensus requires approval by a Flink committer or PMC member.
@AHeise
AHeise
left a comment
Thank you very much for your patience and your contribution. The change looks good to me and I also like the secondary changes.
Could you please extract some of the changes into separate commits in this PR? We should then have:
[hotfix][DataStream] Use diamond operator in StreamElementSerializer(Test)
[FLINK-24915][DataStream] Improve StreamElementSerializer#copy
[FLINK-24915][DataStream] Fix StreamElementSerializer#deserialize for StreamStatus
If Azure pipeline is green, could you please also prepare backport PRs against Flink 1.14, 1.13 and maybe 1.12 if trivial (we don't usually release a new 1.12 build but sometimes we do).
StreamRecord<String> withoutTimestamp = new StreamRecord<>("test 1 2 分享基督耶穌的愛給們,開拓雙贏!");
assertEquals(
        withoutTimestamp,
        serializeAndDeserializeWithReuse(withoutTimestamp, serializer, reuse));

StreamRecord<String> withTimestamp = new StreamRecord<>("one more test 拓 們 分", 77L);
assertEquals(
        withTimestamp, serializeAndDeserializeWithReuse(withTimestamp, serializer, reuse));

StreamRecord<String> negativeTimestamp = new StreamRecord<>("他", Long.MIN_VALUE);
assertEquals(
        negativeTimestamp,
        serializeAndDeserializeWithReuse(negativeTimestamp, serializer, reuse));
I'd also like to see an assert that reuse is actually returned.
You could switch to assertj-style assertions to make it more beautiful:
assertThat(serializeAndDeserializeWithReuse(negativeTimestamp, serializer, reuse))
        .isEqualTo(negativeTimestamp)
        .isSameAs(reuse);
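The distinction behind this suggestion is that `isEqualTo` compares with `equals`, while `isSameAs` compares references. A minimal plain-Java sketch of why both checks matter follows; `Element`, `copyWithReuse`, and `copyIgnoringReuse` are hypothetical names invented for illustration and are not part of Flink or AssertJ.

```java
import java.util.Objects;

public class ReuseIdentitySketch {

    // Hypothetical mutable element, standing in for a reusable record object.
    static final class Element {
        String value;
        long timestamp;

        Element(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Element)) {
                return false;
            }
            Element other = (Element) o;
            return timestamp == other.timestamp && Objects.equals(value, other.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(value, timestamp);
        }
    }

    // Honors object reuse: fills the passed-in instance and returns it.
    static Element copyWithReuse(Element from, Element reuse) {
        reuse.value = from.value;
        reuse.timestamp = from.timestamp;
        return reuse;
    }

    // Ignores object reuse: the result is equal to 'from' but is a new object.
    static Element copyIgnoringReuse(Element from, Element reuse) {
        return new Element(from.value, from.timestamp);
    }

    public static void main(String[] args) {
        Element from = new Element("他", Long.MIN_VALUE);
        Element reuse = new Element(null, 0L);

        Element good = copyWithReuse(from, reuse);
        Element bad = copyIgnoringReuse(from, reuse);

        // An equality-only assertion cannot tell the two variants apart.
        System.out.println("good equals from: " + good.equals(from));
        System.out.println("bad equals from:  " + bad.equals(from));
        // Only the identity check exposes that 'reuse' was ignored.
        System.out.println("good is reuse: " + (good == reuse));
        System.out.println("bad is reuse:  " + (bad == reuse));
    }
}
```

An equality-only assertion passes for both variants, so a test that wants to confirm the reuse path also needs the identity check.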
-    target.writeLong(source.readLong());
+    target.write(source, 8);
     typeSerializer.copy(source, target);
 } else if (tag == TAG_REC_WITHOUT_TIMESTAMP) {
     typeSerializer.copy(source, target);
 } else if (tag == TAG_WATERMARK) {
-    target.writeLong(source.readLong());
+    target.write(source, 8);
 } else if (tag == TAG_STREAM_STATUS) {
-    target.writeInt(source.readInt());
+    target.write(source, 4);
 } else if (tag == TAG_LATENCY_MARKER) {
-    target.writeLong(source.readLong());
-    target.writeLong(source.readLong());
-    target.writeLong(source.readLong());
-    target.writeInt(source.readInt());
+    target.write(source, 28);
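The byte counts in this hunk follow from the field widths: a long is 8 bytes, an int is 4, and three longs plus one int come to 28. The sketch below checks that a raw copy of those 28 bytes produces the same output as decoding and re-encoding each field. It uses `java.io` streams as a stand-in for Flink's `DataInputView` and `DataOutputView`, and the class name, method names, and sample values are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class FixedWidthCopySketch {

    // Three longs and one int, as in the latency-marker branch of the hunk.
    static final int MARKER_BYTES = 3 * Long.BYTES + Integer.BYTES;

    // Field-wise copy: decode each value and encode it again.
    static void copyFieldWise(DataInputStream source, DataOutputStream target) throws IOException {
        target.writeLong(source.readLong());
        target.writeLong(source.readLong());
        target.writeLong(source.readLong());
        target.writeInt(source.readInt());
    }

    // Raw copy: move the same bytes without interpreting them.
    static void copyRaw(DataInputStream source, DataOutputStream target) throws IOException {
        byte[] buffer = new byte[MARKER_BYTES];
        source.readFully(buffer);
        target.write(buffer);
    }

    // Arbitrary sample payload with the same layout (long, long, long, int).
    static byte[] sampleMarker() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(1637324446000L);
        out.writeLong(42L);
        out.writeLong(-7L);
        out.writeInt(3);
        return bytes.toByteArray();
    }

    // True if both strategies emit identical bytes of the expected length.
    static boolean copiesAgree() {
        try {
            byte[] input = sampleMarker();
            ByteArrayOutputStream fieldWise = new ByteArrayOutputStream();
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            copyFieldWise(
                    new DataInputStream(new ByteArrayInputStream(input)),
                    new DataOutputStream(fieldWise));
            copyRaw(
                    new DataInputStream(new ByteArrayInputStream(input)),
                    new DataOutputStream(raw));
            return raw.size() == MARKER_BYTES
                    && Arrays.equals(fieldWise.toByteArray(), raw.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("marker bytes: " + MARKER_BYTES);
        System.out.println("copies agree: " + copiesAgree());
    }
}
```

The raw copy is only safe because every field in these branches has a fixed width; the record payload is variable-length, which is why the hunk still delegates it to `typeSerializer.copy`.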
Please extract these performance optimizations into a separate commit.
-        if (from.isRecord() && reuse.isRecord()) {
+        // need not check reuse is really a StreamRecord, otherwise reuse.asRecord() will throw
+        // ClassCastException, similar as cannot copy does.
+        if (from.isRecord()) {
Please extract these performance optimizations into a separate commit.
 public StreamElementSerializer<T> duplicate() {
     TypeSerializer<T> copy = typeSerializer.duplicate();
-    return (copy == typeSerializer) ? this : new StreamElementSerializer<T>(copy);
+    return (copy == typeSerializer) ? this : new StreamElementSerializer<>(copy);
Please extract these warning fixes to a separate commit.
This PR is being marked as stale since it has not had any activity in the last 180 days. If you are having difficulty finding a reviewer, please reach out to the [community](https://flink.apache.org/what-is-flink/community/). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.
This PR has been closed since it has not had any activity in 120 days.
What is the purpose of the change
This pull request fixes StreamElementSerializer#deserialize(StreamElement reuse, DataInputView source), which does not handle tag == TAG_STREAM_STATUS.
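To illustrate the class of bug, here is a minimal sketch of a tag-dispatching deserializer with and without a branch for a status tag. It is not Flink's code: the tag values, method names, and the use of `java.io` streams are assumptions for this sketch, and it assumes that an unrecognized tag is rejected with an exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TagDispatchSketch {

    // Hypothetical tag values, chosen for this sketch only.
    static final int TAG_RECORD = 0;
    static final int TAG_WATERMARK = 1;
    static final int TAG_STATUS = 2;

    // Variant with the missing branch: a status tag falls through to the error case.
    static Object deserializeWithoutStatus(DataInputStream source) throws IOException {
        int tag = source.readByte();
        if (tag == TAG_RECORD) {
            return source.readUTF();
        } else if (tag == TAG_WATERMARK) {
            return source.readLong();
        } else {
            throw new IOException("Unexpected tag: " + tag);
        }
    }

    // Variant with the status branch added.
    static Object deserializeWithStatus(DataInputStream source) throws IOException {
        int tag = source.readByte();
        if (tag == TAG_RECORD) {
            return source.readUTF();
        } else if (tag == TAG_WATERMARK) {
            return source.readLong();
        } else if (tag == TAG_STATUS) {
            return source.readInt();
        } else {
            throw new IOException("Unexpected tag: " + tag);
        }
    }

    // Serializes a status element as one tag byte followed by one int.
    static byte[] statusBytes(int status) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(TAG_STATUS);
            out.writeInt(status);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static DataInputStream input(byte[] data) {
        return new DataInputStream(new ByteArrayInputStream(data));
    }

    // True if the variant without the status branch rejects a status element.
    static boolean oldVariantRejectsStatus(int status) {
        try {
            deserializeWithoutStatus(input(statusBytes(status)));
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    // Round-trips a status element through the variant that has the branch.
    static Object readStatus(int status) {
        try {
            return deserializeWithStatus(input(statusBytes(status)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("old variant rejects status: " + oldVariantRejectsStatus(5));
        System.out.println("fixed variant returns: " + readStatus(5));
    }
}
```

A round-trip test per element type, as the PR adds for WatermarkStatus, is what catches a branch that exists in one overload but not in the other.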
Brief change log
deserialize(StreamElement reuse, DataInputView source) now returns the WatermarkStatus when tag == TAG_STREAM_STATUS.
copy(StreamElement from, StreamElement reuse) no longer checks that reuse is a StreamRecord.
copy(DataInputView source, DataOutputView target) no longer deserializes and then re-serializes the fixed-width fields. The timestamp, watermark, status, and latency-marker fields each have a fixed binary length, so they are copied as raw bytes to speed things up; the record payload is still copied by the type serializer.
Verifying this change
This change added tests and can be verified as follows:
Added a test for copy (with and without object reuse). This copy does not go through DataInputView/DataOutputView.
Added a test for the StreamElement type WatermarkStatus.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation