[FLINK-11820][serialization] SimpleStringSchema handle message record which value is null #8583
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community …
Automated Checks: last check on commit bebe5fb (Fri Sep 06 09:08:42 UTC 2019). Warnings: …
Mention the bot in a comment to re-run the automated checks.
Review Progress: please see the Pull Request Review Guide for a full explanation of the review process. The bot tracks review progress through labels, which are applied in the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands: the @flinkbot bot supports the following commands: …
Yes, I think this should work. +1
Scratch that, I don't think we can do this. Our Kafka consumer silently swallows null values (lines 407 to 410 in 0499942).
… null values. The fact that StringSerializer does is more of an anomaly. (Also, thanks to @GJL for pointing this out to me 😃)
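To make "silently swallows" concrete, here is a simplified, hypothetical model of a fetch loop. It is not Flink's actual fetcher code, and `SwallowNullsSketch`, `processBatch`, and `BatchResult` are illustrative names. The loop forwards only non-null deserialization results but still advances the offset past every record, so a dropped record leaves no trace downstream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of a consumer fetch loop that drops null deserialization results. */
public class SwallowNullsSketch {

    /** Values forwarded downstream plus the offset that would be committed next. */
    public static final class BatchResult {
        public final List<String> emitted;
        public final long nextOffset;

        BatchResult(List<String> emitted, long nextOffset) {
            this.emitted = emitted;
            this.nextOffset = nextOffset;
        }
    }

    /** Deserializes each raw value; null results are skipped, but the offset still advances past them. */
    public static BatchResult processBatch(List<byte[]> rawValues, long startOffset,
                                           Function<byte[], String> deserializer) {
        List<String> emitted = new ArrayList<>();
        long offset = startOffset;
        for (byte[] raw : rawValues) {
            String record = deserializer.apply(raw);
            if (record != null) {
                emitted.add(record); // only non-null results reach downstream operators
            }
            offset++; // the skipped record is still acknowledged, so nothing signals its loss
        }
        return new BatchResult(emitted, offset);
    }

    public static void main(String[] args) {
        // A null-tolerant deserializer (the behaviour this PR proposes) returns null for a null value.
        Function<byte[], String> nullTolerant =
                v -> v == null ? null : new String(v, StandardCharsets.UTF_8);
        List<byte[]> batch = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8), null, "b".getBytes(StandardCharsets.UTF_8));
        BatchResult result = processBatch(batch, 100L, nullTolerant);
        System.out.println(result.emitted + " next offset " + result.nextOffset); // [a, b] next offset 103
    }
}
```

In this model, combining a null-returning schema with a skip-on-null loop means the null record is consumed and committed without any downstream operator ever seeing it.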
Hi, thanks for your comment @aljoscha. Here is the detailed stack trace (Flink version: 1.6.3):
From the NPE stack trace, we can see that the NPE happens before …
Hi, @aljoscha @GJL, from
@lamber-ken The contract of
That is, returning …
Hi @GJL, I don't fully understand your comment. From my side, with a message queue like Kafka, consumers and producers are decoupled. Consumers don't know whether producers have written valid data or not; producers may write a … The Flink job goes into an endless cycle of failures because of the NPE, so this PR just aims to fix the NPE. This PR doesn't address whether null needs to be sent to downstream operators. If downstream operators need to consume null values, we may need to modify …
I wrote this message to the ML: https://lists.apache.org/thread.html/2991b6b3c520380a9172588bc1f6d7e6d632c3d421458a1b44c71c01@%3Cdev.flink.apache.org%3E. Regarding this PR and the Jira issue, I think it is wrong to return …
@aljoscha Thanks for bringing up the discussion; I think it's a worthwhile one.
Hi @aljoscha, your point is right. My two cents:
1. If I understand correctly, users can use …
2. But the DISCUSS Connectors and NULL handling …
I think …
@ajaybhat Thanks for your comment. I think this PR aims only to fix the NPE; the …
As you commented, the Kafka consumer silently swallows null values, not …
Yes, the Kafka consumer swallows them silently, but if you change …
Thanks for your comment, but I have a slightly different view. In many scenarios users may not care about null values, and if the message queue contains a null value, the consumer cannot consume any further records because of the NPE, and the Kafka offset can no longer be committed. Users can use …
Users may or may not care about nulls. If nulls have a meaning for a use case, then dropping null records can be fatal or, in the worst case, lead to unexpected behavior. According to the ML thread, Flink currently does not handle null records consistently, and changing that would probably be a bigger effort. All in all, I think it is better to accept that …
I will close this issue. Feel free to re-open it if you think otherwise.
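For use cases where nulls do carry meaning, one way to avoid both the NPE and silent dropping is to deserialize into a type that represents a missing value explicitly. The sketch below is a hypothetical plain-Java illustration, assuming UTF-8 payloads; `NullableString` is not a Flink class, and a real `DeserializationSchema` would additionally have to expose `TypeInformation` for such a type, which is omitted here.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical wrapper: a record whose Kafka value was null becomes an explicit
 * NullableString.absent() marker instead of a null result that a consumer might drop.
 */
public final class NullableString {
    private static final NullableString ABSENT = new NullableString(null);

    private final String value; // null only for the ABSENT marker

    private NullableString(String value) {
        this.value = value;
    }

    public static NullableString absent() {
        return ABSENT;
    }

    /** Never returns null, so a skip-on-null consumer still forwards this record. Assumes UTF-8. */
    public static NullableString deserialize(byte[] message) {
        return message == null ? ABSENT : new NullableString(new String(message, StandardCharsets.UTF_8));
    }

    public boolean isAbsent() {
        return value == null;
    }

    /** Returns the decoded value, failing loudly if the original record value was null. */
    public String get() {
        if (value == null) {
            throw new IllegalStateException("record value was null");
        }
        return value;
    }

    @Override
    public String toString() {
        return isAbsent() ? "NullableString.absent" : "NullableString[" + value + "]";
    }
}
```

Because the deserializer never returns null, the record is no longer hidden by skip-on-null behaviour, and downstream operators decide what a missing value means for their use case.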
OK, no problem. By the way, we have already modified it in our production environment.
My branch was unexpectedly deleted, so I opened this one. For more detail, please see #7987.
What is the purpose of the change
When the Kafka message queue contains records whose value is null,
SimpleStringSchema
can't process these records. For example, consider a message queue like the one below.
Normally, SimpleStringSchema is used to process the message queue data,
but it throws a NullPointerException.
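To make the failure concrete, here is a minimal, dependency-free sketch. `NullSafeStringSchemaSketch` is a hypothetical class, not Flink's `SimpleStringSchema`. It assumes the NPE comes from calling `new String(message, charset)` on a `null` byte array, and shows the kind of null guard this PR proposes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a string deserialization schema, with and without a null guard. */
public class NullSafeStringSchemaSketch {

    private final Charset charset;

    public NullSafeStringSchemaSketch(Charset charset) {
        this.charset = charset;
    }

    /** Unguarded: new String(null, charset) throws NullPointerException for a null record value. */
    public String deserializeUnguarded(byte[] message) {
        return new String(message, charset);
    }

    /** Guarded: a null record value (e.g. a Kafka tombstone) maps to null instead of throwing. */
    public String deserialize(byte[] message) {
        return message == null ? null : new String(message, charset);
    }

    public static void main(String[] args) {
        NullSafeStringSchemaSketch schema = new NullSafeStringSchemaSketch(StandardCharsets.UTF_8);
        byte[][] queue = {
            "a".getBytes(StandardCharsets.UTF_8), null, "b".getBytes(StandardCharsets.UTF_8)
        };
        for (byte[] value : queue) {
            System.out.println(schema.deserialize(value)); // prints a, null, b
        }
        try {
            schema.deserializeUnguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded deserialize threw NPE");
        }
    }
}
```

As the review discussion points out, the guarded variant only moves the problem: a `null` result may then be dropped silently by the consumer, so it fixes the crash without deciding what a null value should mean.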