[FLINK-9650] [formats] add support for protobuf objects #7865
flink-protobuf
This library adds support to Flink for running SQL against protobuf objects. As of now, Flink supports only Avro, and JSON backed by JsonSchema. To support SQL, Flink needs to know the TypeInformation of the records, and this library provides TypeInformation for protobuf objects.
It uses the protobuf APIs to retrieve the fields and types of a protobuf object and then provides each field's name and type to Flink as a `PojoField`.
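The following stdlib-only sketch shows why the field names end in an underscore. It does not reproduce the PR's code. The PR walks the protobuf descriptor APIs, while this sketch substitutes plain reflection over a hand-written class. `LogEventLike` is hypothetical and is laid out the way protoc-generated Java code typically is. The helper `protoBackingFields` is also hypothetical. Flink's `PojoField` wraps a `java.lang.reflect.Field`, so the name SQL sees is the Java member name (`loggedAt_`), not the proto name (`logged_at`).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class GeneratedFieldScan {

    // Hypothetical stand-in for what protoc generates for
    //   message LogEvent { int64 logged_at = 1; string user_id = 2; }
    // Generated Java classes store each proto field in a private member
    // named after its camelCase name plus a trailing underscore.
    static final class LogEventLike {
        private static final long serialVersionUID = 0L; // static: skipped
        private int bitField0_;                  // presence bits, not a proto field
        private long loggedAt_;                  // backs logged_at
        private volatile Object userId_;         // backs user_id (String or ByteString)
        private byte memoizedIsInitialized = -1; // no trailing '_': skipped
    }

    // Collects the instance fields that look like proto field storage, mapped
    // to their Java types. A reflect.Field like these is what a PojoField wraps.
    static Map<String, Class<?>> protoBackingFields(Class<?> clazz) {
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            String name = f.getName();
            boolean isStatic = Modifier.isStatic(f.getModifiers());
            if (!isStatic && name.endsWith("_") && !name.startsWith("bitField")) {
                result.put(name, f.getType());
            }
        }
        return result;
    }
}
```

The `bitField` and static filters are heuristics for this stand-in only. Real generated classes contain further bookkeeping members, which is one reason the descriptor APIs are the more reliable source of field information.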
Current limitations:
- In a protobuf object, field names have an underscore at the end (e.g. `loggedAt_`), so in SQL a field must be referred to as `loggedAt_` instead of `logged_at`. This should be fixable in the Flink APIs, but it would need some digging around in the code. Whitelisting `Message` classes in `PojoField` should help.
- Some field types, such as `Enum`, are not supported yet, but support should be trivial to add.

With this, it is possible to run a query like the following on a stream of events:
Note: I have been a bit hasty in getting this out. It had been sitting in our internal repo for a while and I haven't had time to clean it up and make it Flink-ready, but I wanted to publish the code so that anyone who wants to work on this can build on it rather than start from scratch. We have been using it in production for close to a year. Due to other commitments I may not get to coding-style or review comments immediately, so I wouldn't mind if someone wants to improve this before it is merged. For example, there are pending TODO items such as enum support and a change in `PojoField` to make the SQL nicer (no underscore). (Apologies for not conforming to the coding style and the rest of the guidelines yet. I hope it is still useful as a beta-version patch.)
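One possible direction for the "no underscore" TODO is to translate the generated member name back to the proto field name before exposing it to SQL. The sketch below makes assumptions throughout, and `ProtoFieldNames.toProtoName` is a hypothetical helper, not part of Flink or protobuf. The reverse mapping is only a heuristic, because protoc's camelCase conversion is lossy. Names containing digits or capital letters may not round-trip (for example `HTTPCode`). Where a descriptor is at hand, `Descriptors.FieldDescriptor.getName()` already returns the original proto name and is the safer source.

```java
final class ProtoFieldNames {
    private ProtoFieldNames() {}

    // Hypothetical helper: "loggedAt_" -> "logged_at".
    // Drops the trailing underscore, then turns each upper-case letter into
    // "_" followed by its lower-case form.
    static String toProtoName(String javaFieldName) {
        String base = javaFieldName.endsWith("_")
                ? javaFieldName.substring(0, javaFieldName.length() - 1)
                : javaFieldName;
        StringBuilder sb = new StringBuilder(base.length() + 4);
        for (int i = 0; i < base.length(); i++) {
            char c = base.charAt(i);
            if (Character.isUpperCase(c)) {
                if (sb.length() > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `toProtoName("loggedAt_")` yields `logged_at`, so the query could refer to the field by its proto name.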