Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#12912 Fix KafkaEmitter not emitting queryType for a native query #12915

Conversation

BartMiki
Copy link
Contributor

@BartMiki BartMiki commented Aug 17, 2022

Fixes #12912.

Description

Fixes KafkaEmitter not emitting queryType for a native query. The Event to JSON serialization was extracted to the external class: EventToJsonSerializer. This was done to simplify the testing logic for the serialization as well as extract the responsibility of serialization to the separate class.

The logic builds ObjectNode incrementally based on the event .toMap method. Parsing each entry individually ensures that the Jackson polymorphic annotations are respected. Not respecting these annotation caused the missing of the queryType from output event.


Key changed/added classes in this PR
  • KafkaEmitter - modified
  • EventToJsonSerializer - added

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

@FrankChen021
Copy link
Member

toMap is a public interface that could be used anywhere, seems that all callers have to use the same way as EventToJsonSerializer to serialize the map object.

I don't know if there's any better idea that callers should not care about handling the serialization in a special way.

Another point is this test case DefaultRequestLogEventTest#testDefaultrequestLogEventSerde should cover the serialization of toMap method.

@BartMiki
Copy link
Contributor Author

@FrankChen021 Thank you for the comment!

Yes, the serialization of the event is very simple, all implementations are Jackson aware, so serialization of Event to JSON is as simple as in this test or in the HttpPostEmitter:

objectMapper.writeValueAsString(event)

There are tests for checking the .toMap, but they only cover .toMap and compare it against other map (they do not serialize it further).

However, the problem is that the original KafkaEmitter requires to pass additional property to the output event, which is clusterName (if present). That is why we cannot serialize using simple POJO:

objectMapper.writeValueAsString(event)

The original author created new map based on the .toMap and this optional clusterName property.
This is the problem, as there are no polymorphism-aware maps and lists (see Jackson ticket).

The only other solution that comes to my mind is to do:

  1. If additional properties map is empty, then just return objectMapper.writeValueAsString(event)
  2. If not empty, then:
    a) serialize event to ObjectNode using objectMapper.valueToTree(event) (serialize whole object in one go)
    b) add all KV pairs from additional parameters to the ObjectNode
    c) return serialized ObjectNode as a string

We won't use .toMap at any stage as we will be using the same simple implementation of Event -> JSON as other Emitters.
What do you think @FrankChen021 ?

@FrankChen021
Copy link
Member

@BartMiki Thank you for the updated information.

This is the problem, as there are no polymorphism-aware maps and lists (see FasterXML/jackson-databind#699 (comment)).

Yes, this is the root cause. and seems that there's no way to deal with this problem without user-awareness.

Since all changes in this PR are in Kafka extension, it makes senses to me.

@BartMiki
Copy link
Contributor Author

@FrankChen021 I will update the logic to use these steps (I think it is a bit more elegant):

  1. If additional properties map is empty, then just return objectMapper.writeValueAsString(event)
  2. If not empty, then:
    a) serialize event to ObjectNode using objectMapper.valueToTree(event) (serialize whole object in one go)
    b) add all KV pairs from additional parameters to the ObjectNode
    c) return serialized ObjectNode as a string

@BartMiki
Copy link
Contributor Author

@FrankChen021 I have new findings and I would like your comment on this. If we serialize the map as I've implemented it before, then the output is the same as the previous KafkaEmitter (of course with the queryType). So for native query parameters sql and sqlQueryCtx are absent from the output JSON.

If we serialize the whole event to JSON first then for the native query the sql and sqlQueryCtx are set to null and {} respectively in the output JSON. Now the question is about backward compatibility.

If I use the second implementation then the output will be different from the original implementation. This may lead to errors in the consumers of the emitted stream. If consumer code distinguishes the SQL from native queries solely by the presence of native or sql, then it will no longer work because both entries are present. Of course, corresponding fields are null, but previously they were totally absent from the output JSON. This is because .toMap removes null entries of "query" and "sql" from the resulting map.

Example consumer in Python:

event = json.loads(rawEvent)
if "query" in event:
    # do stuff with Native Query
elif "sql" in event:
    # do stuff with SQL Query

If I leave my implementation as is right now, then the above code would work correctly (backward compatible with the original KafkaEmitter logic).
However, if I update the code as described here: #12915 (comment) then the above snippet will always execute first if branch as "query" is always present.

Should I leave the fixed implementation as is right now to maintain this compatibility or should I go with (the perhaps braking) approach of serializing the whole event at once? My take is to leave fix as is to have backward compatibility. What do you think @FrankChen021 ?

@FrankChen021
Copy link
Member

I think you can keep the implementation as is right now, because in current case, the additional property map is always not empty, the 2nd implementation seems no gain.

@FrankChen021
Copy link
Member

Failed CI task has been re-trigger. I believe it has nothing to do with changes in this PR

@BartMiki
Copy link
Contributor Author

Thank you! Ultimately I can rebase from master and push if the check keeps failing.

I have also another question @FrankChen021 : Can I backport this fix to the stable branch? How can I do it? What is the best way to address backporting?

@FrankChen021
Copy link
Member

Usually issues, such as severe security issues, will be backported. And as I know, currently there's no plan for version like 0.23.x

@BartMiki BartMiki force-pushed the feature-fix-kafka-emitter-missing-query-type branch from 868d968 to 56237ef Compare August 22, 2022 10:11
@FrankChen021
Copy link
Member

@BartMiki Thank you for the updating. The change generally LGTM with some minor comments.

@FrankChen021
Copy link
Member

@rohangarg Would you like to review this change?

@FrankChen021
Copy link
Member

@BartMiki The checkstyle task in CI fails to run. Could you fix it?

Copy link
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @BartMiki !
LGTM % comments

@rohangarg
Copy link
Member

@rohangarg Would you like to review this change?

@FrankChen021 : Yes, apologies for the delayed response - got caught up in something else meanwhile. I have posted a small review, should be good from my side once that is done. Thanks for the reminder! 👍

@BartMiki
Copy link
Contributor Author

@BartMiki The checkstyle task in CI fails to run. Could you fix it?

Sure, I'm on it

Copy link
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the CI passes.
Thanks a lot for the changes @BartMiki

@BartMiki
Copy link
Contributor Author

Thank you @FrankChen021 and @rohangarg for your support and engaging discussion! It was a pleasure to work on this with you 👍

@abhishekagarwal87 abhishekagarwal87 merged commit 0bc9f9f into apache:master Aug 24, 2022
@abhishekagarwal87
Copy link
Contributor

Thank you @BartMiki for your contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KafkaEmitter for request emitting does not emit "queryType" for native queries
4 participants