New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-3226] Translation from and to POJOs for CodeGenerator #1624
Conversation
Hi, I thought about using POJOs as native types within Table/SQL operators. IMO, the gains are very little compared to the added code complexity. Given a POJO input, we can preserve the input type only for very few operations such as a Filter. For most other operations, we need to generate a new output type (Tuple or Row). I am a bit skeptical about adding a lot of codeGen code with special cases for POJOs (such as the field index mapping) which is very seldom used. Moreover, POJO field accesses (for operations and serialization) go through reflection and are not very efficient. So even the performance gain for those few cases where POJOs can be used is not clear. I do not question the native type support in general. Tuples and primitives should definitely be supported, but I don't think we need to support POJOs within Table / SQL operators. Instead, I would convert POJO datasets into Row tables during table scan. Most of the code in this PR can be used to implement a codeGen'd converter Map function. What do you think @twalthr? |
Hi @fhueske, I completely share your opinion. This PR does exactly what you described. As you can read in |
fff25df
to
e742826
Compare
Ah, OK :-) I'll start reviewing the changes. |
/** | ||
* Converts the given [[org.apache.flink.api.table.Table]] to | ||
* a DataSet. The given type must have exactly the same fields as the | ||
* [[org.apache.flink.api.table.Table]]. That is, the names of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make name equivalence only required for POJOs and generic composite types types.
Rows and tuples can be matched by position. Otherwise, fields would need to be renamed to f0
, f1
, etc. for tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the comment is copied. But should be changed, no?
Thanks @twalthr, the PR looks pretty good. |
Thanks for reviewing @fhueske! |
Thanks @twalthr. I'll merge this. |
You can close the PR @twalthr. It's merged (but doesn't automatically close). |
This PR implements full POJO support as input and output type of the Table API. It is now possible to convert from an arbitrary type to an other arbitrary type. I fixed the failling AsITCase and implemented additional tests from/to tuples, from/to POJOs and from/to Case classes.
@vasia and @fhueske feel free to review.