[FLINK-1919] add HCatOutputFormat #1079
Conversation
- Java API and Scala API
- Fix Scala HCatInputFormat bug for complex types
- Moved hcatalog module to hadoop1 profile
- Modify the surefire configuration for hcatalog tests
- Addressed review comments from the first PR
- Remove unused import
From this JIRA, it seems that HCatalog is deployed with Hadoop 2. Are you sure that the vanilla version of HCatalog only works with Hadoop 1?

@chiwanpark

Any updates from the review process?

Oh, sorry for the delay. I'm reviewing this.
HCatInputOutputFormatITCase would be a better name.
Unnecessary new line. (Only this line; the other lines are okay.)
Hi @jamescao, I just reviewed your PR. There are some issues to resolve before merging.
As @twalthr said, I also think that we should split the current test cases into test cases for …
@chiwanpark
@jamescao Yes, I think the part (including tests) related to …
@jamescao Good observation! The fix of #1111 should be added to the HCatOutputFormatBase as well.

@chiwanpark
Any update on this?
This PR has not found an owner among committers for quite some time, and I'm not sure whether it is still mergeable. Most of the functionality will also be implemented as part of FLINK-10556. I will close the PR for now; feel free to reopen it if you think it still makes sense to keep the contribution.
[FLINK-1919]
https://issues.apache.org/jira/browse/FLINK-1919

Add HCatOutputFormat for Tuple data types for the Java and Scala APIs; also fix a bug in the Scala API's HCatInputFormat for Hive complex types.

The Java API includes a check whether the schema of the HCatalog table and the Flink tuples match if the user provides a TypeInformation in the constructor. For data types other than tuples, the OutputFormat requires a preceding Map function that converts to HCatRecords.

The Scala API includes a check whether the schema of the HCatalog table and the Scala tuples match. For data types other than Scala tuples, the OutputFormat requires a preceding Map function that converts to HCatRecords. The Scala API requires the user to import org.apache.flink.api.scala._ so that the type can be captured by the Scala macro.

The HCatalog jar in Maven Central is compiled against Hadoop 1, which is not compatible with the Hive jars used for testing, so a Cloudera HCatalog jar is pulled into the pom for testing purposes. It can be removed if not required.

Java List and Map cannot be cast to Scala List and Map, so JavaConverters is used to fix a bug in the HCatInputFormat Scala API. @chiwanpark @rmetzger
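As a rough illustration of the "preceding Map function" the description mentions, the sketch below converts a two-field tuple into a positional record. Note this is an assumption-laden sketch: the nested `HCatRecord` class here is a simplified stand-in for `org.apache.hive.hcatalog.data.HCatRecord` (to keep the example self-contained), and the `map` method only mirrors the shape of a Flink `MapFunction`, not the PR's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleToHCatRecordSketch {

    // Simplified stand-in for org.apache.hive.hcatalog.data.HCatRecord:
    // a positional record whose field order must match the HCatalog table schema.
    static class HCatRecord {
        private final List<Object> fields = new ArrayList<>();

        void set(int pos, Object value) {
            while (fields.size() <= pos) {
                fields.add(null);
            }
            fields.set(pos, value);
        }

        Object get(int pos) {
            return fields.get(pos);
        }
    }

    // Mirrors the shape of a Flink MapFunction<Tuple2<String, Integer>, HCatRecord>:
    // each tuple field is written to the record position matching the table column.
    static HCatRecord map(String name, int count) {
        HCatRecord record = new HCatRecord();
        record.set(0, name);
        record.set(1, count);
        return record;
    }

    public static void main(String[] args) {
        HCatRecord record = map("flink", 42);
        System.out.println(record.get(0) + "," + record.get(1));
    }
}
```

In the real pipeline such a mapper would run immediately before the HCatOutputFormat sink, so that non-tuple element types are converted to HCatRecords first.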
I have changed the hcatalog jar to the Apache version. That required moving the hcatalog module to the hadoop1 profile.
@chiwanpark
I have made changes for most of your comments, except the one regarding the verification of exceptions in the tests. I feel it's better to verify the exception at the point where it's expected to be thrown. If we use a method-wide annotation, we cannot be sure where in the test method the exception is thrown; this is not safe, especially for common exception types such as IOException. I did remove the tests' dependency on the exception error message.
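The pattern being argued for above can be sketched as follows. This is a minimal illustration, not code from the PR: `mightThrow` is a hypothetical method standing in for the call under test, and the comparison is with a method-wide annotation such as JUnit's `@Test(expected = IOException.class)`, which would also pass if any other statement in the test threw an IOException.

```java
import java.io.IOException;

public class ExceptionPointCheck {

    // Hypothetical stand-in for the call that is expected to throw.
    static void mightThrow(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("schema mismatch");
        }
    }

    // Verifies the exception at the exact call site where it is expected,
    // so an IOException thrown anywhere else in the test would not be masked.
    static boolean throwsAtExpectedPoint() {
        try {
            mightThrow(true);   // only this call is expected to throw
            return false;       // in JUnit this would be fail("expected IOException")
        } catch (IOException expected) {
            return true;        // the expected exception, at the expected point
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsAtExpectedPoint());
    }
}
```

The try/fail/catch shape pinpoints the throwing statement, which is the safety property the comment describes for common exception types like IOException.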