Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-3226] Translation from and to POJOs for CodeGenerator #1624

Closed
wants to merge 2 commits into from

Conversation

twalthr
Copy link
Contributor

@twalthr twalthr commented Feb 11, 2016

This PR implements full POJO support as input and output type of the Table API. It is now possible to convert from an arbitrary type to an other arbitrary type. I fixed the failling AsITCase and implemented additional tests from/to tuples, from/to POJOs and from/to Case classes.

@vasia and @fhueske feel free to review.

@fhueske
Copy link
Contributor

fhueske commented Feb 12, 2016

Hi, I thought about using POJOs as native types within Table/SQL operators. IMO, the gains are very little compared to the added code complexity. Given a POJO input, we can preserve the input type only for very few operations such as a Filter. For most other operations, we need to generate a new output type (Tuple or Row). I am a bit skeptical about adding a lot of codeGen code with special cases for POJOs (such as the field index mapping) which is very seldom used. Moreover, POJO field accesses (for operations and serialization) go through reflection and are not very efficient. So even the performance gain for those few cases where POJOs can be used is not clear.

I do not question the native type support in general. Tuples and primitives should definitely be supported, but I don't think we need to support POJOs within Table / SQL operators. Instead, I would convert POJO datasets into Row tables during table scan. Most of the code in this PR can be used to implement a codeGen'd converter Map function.

What do you think @twalthr?

@twalthr
Copy link
Contributor Author

twalthr commented Feb 12, 2016

Hi @fhueske,

I completely share your opinion. This PR does exactly what you described. As you can read in DataSetSource: "It ensures that types without deterministic field order (e.g. POJOs) are not part of the plan translation." What I have implemented is proper reading of POJOs (into Table API) and writing of POJOs (out of Table API). Most of the code I have added is to provide codeGen'd converter Map functions for the source and the sink. POJOs are not used for any operatation within operations.

@fhueske
Copy link
Contributor

fhueske commented Feb 12, 2016

Ah, OK :-)
I have to admit that I didn't look at the changes in detail but noticed that they go into several places of the code gen. Hence, I was wondering if it would be possible to extract the special POJO case from the CodeGenerator, but I guess that would lead to code duplication.

I'll start reviewing the changes.

/**
* Converts the given [[org.apache.flink.api.table.Table]] to
* a DataSet. The given type must have exactly the same fields as the
* [[org.apache.flink.api.table.Table]]. That is, the names of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make name equivalence only required for POJOs and generic composite types types.
Rows and tuples can be matched by position. Otherwise, fields would need to be renamed to f0, f1, etc. for tuples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see the comment is copied. But should be changed, no?

@fhueske
Copy link
Contributor

fhueske commented Feb 12, 2016

Thanks @twalthr, the PR looks pretty good.

@twalthr
Copy link
Contributor Author

twalthr commented Feb 13, 2016

Thanks for reviewing @fhueske!
I updated my PR and rebased it.
Feel free to merge ;-)

@vasia
Copy link
Contributor

vasia commented Feb 15, 2016

Thanks @twalthr. I'll merge this.

asfgit pushed a commit that referenced this pull request Feb 15, 2016
@vasia
Copy link
Contributor

vasia commented Feb 15, 2016

You can close the PR @twalthr. It's merged (but doesn't automatically close).

@twalthr twalthr closed this Feb 15, 2016
vasia pushed a commit to vasia/flink that referenced this pull request Mar 16, 2016
asfgit pushed a commit that referenced this pull request Mar 18, 2016
fijolekProjects pushed a commit to fijolekProjects/flink that referenced this pull request May 1, 2016
hequn8128 pushed a commit to hequn8128/flink that referenced this pull request Jun 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants