
Conversation

@jamescao

[FLINK-1919]
https://issues.apache.org/jira/browse/FLINK-1919
Add HCatOutputFormat for Tuple data types for the Java and Scala APIs; also fix a bug in the Scala API's HCatInputFormat for Hive complex types.
The Java API includes a check for whether the schema of the HCatalog table and the Flink tuples match if the user provides a TypeInformation in the constructor. For data types other than tuples, the OutputFormat requires a preceding Map function that converts them to HCatRecords.
The Scala API includes a check whether the schema of the HCatalog table and the Scala tuples match. For data types other than Scala tuples, the OutputFormat requires a preceding Map function that converts them to HCatRecords. The Scala API requires the user to import org.apache.flink.api.scala._ so that the type can be captured by the Scala macro.
The HCatalog jar in Maven Central is compiled against Hadoop 1, which is not compatible with the Hive jars used for testing, so a Cloudera HCatalog jar is pulled into the pom for testing purposes. It can be removed if not required.
Java List and Map cannot be cast to Scala List and Map; JavaConverters is used to fix a bug in the HCatInputFormat Scala API.
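As a rough, self-contained sketch of the "preceding Map function" requirement described above: `SimpleRecord` and `MapFunction` below are hypothetical stand-ins for HCatalog's `HCatRecord` and Flink's `MapFunction`, so the example runs without the real dependencies and is not the actual Flink/HCatalog API.

```java
import java.util.ArrayList;
import java.util.List;

public class MapToRecordSketch {

    // Hypothetical stand-in for org.apache.hive.hcatalog.data.HCatRecord.
    static class SimpleRecord {
        final List<Object> fields = new ArrayList<>();
    }

    // Hypothetical stand-in for Flink's MapFunction<IN, OUT>.
    interface MapFunction<IN, OUT> {
        OUT map(IN value);
    }

    // A POJO that is not a Tuple, so it needs an explicit conversion
    // step before it can be handed to the output format.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        MapFunction<Person, SimpleRecord> toRecord = p -> {
            SimpleRecord r = new SimpleRecord();
            r.fields.add(p.name);  // field order must match the table schema
            r.fields.add(p.age);
            return r;
        };

        SimpleRecord rec = toRecord.map(new Person("alice", 30));
        System.out.println(rec.fields);  // [alice, 30]
    }
}
```

In an actual job, such a conversion function would sit between the non-tuple DataSet and the sink that writes via the HCatalog output format.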

@chiwanpark @rmetzger
I have changed the HCatalog jar to the Apache version. That required moving the hcatalog module to the hadoop1 profile.
@chiwanpark
I have addressed most of your comments, except the one regarding the verification of exceptions in the tests. I feel it's better to verify an exception at the exact point where it's expected to be thrown. With a method-wide annotation, we cannot be sure where in the test method the exception was thrown, which is unsafe, especially for common exception types such as IOException. I did remove the tests' dependency on the exception error message.
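The point about exception scope can be illustrated in plain Java (hypothetical method names, no JUnit dependency): asserting at the exact call site distinguishes the expected failure from an IOException thrown by an earlier, unrelated line, which a method-wide expected-exception annotation cannot do.

```java
import java.io.IOException;

public class ExceptionScopeSketch {

    // Hypothetical stand-in for the code under test.
    static void writeRecord(Object record) throws IOException {
        if (record == null) {
            throw new IOException("record must not be null");
        }
    }

    public static void main(String[] args) throws IOException {
        // Setup runs outside the guarded block: an unexpected IOException
        // here fails the test instead of silently satisfying it.
        writeRecord("valid record");

        boolean thrown = false;
        try {
            writeRecord(null);     // the one call expected to throw
        } catch (IOException expected) {
            thrown = true;         // note: no dependency on the message text
        }
        System.out.println("expected exception thrown: " + thrown);
    }
}
```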

Java API and Scala API
fix Scala HCatInputFormat bug for complex types
moved hcatalog module to hadoop1 profile
modified the surefire configuration for hcatalog tests
addressed review comments from the first PR
removed unused import
@chiwanpark
Member

From this JIRA, it seems that HCatalog is deployed with Hadoop 2. Are you sure that the vanilla version of HCatalog only works with Hadoop 1?

@jamescao
Author

@chiwanpark
Yes, I'm sure that the Apache version of HCatalog is compiled against Hadoop 1. Using this jar in a Hadoop 2 environment leads to binary incompatibility errors. I think the Hive community is aware of this problem:
https://issues.apache.org/jira/browse/HIVE-7349
On the other hand, the Cloudera HCatalog jar is compiled against Hadoop 2 and therefore works in a Hadoop 2 environment, but using the Cloudera jar in a Hadoop 1 environment likewise leads to binary incompatibility errors. I guess one way to solve this is to add a vendor-specific hcatalog module for users who want to use HCatalog with Hadoop 2, or we wait until the Hive community publishes two artifacts in Maven, one for Hadoop 1 and one for Hadoop 2.
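Tying the module to the Hadoop 1 build, as discussed here, could look roughly like the following in the parent pom. This is an illustrative sketch only: the profile id and module path are assumptions, not taken from the actual Flink build files.

```xml
<!-- Illustrative only: activate the hcatalog module solely under the
     hadoop-1 profile, since the HCatalog artifact in Maven Central is
     compiled against Hadoop 1. -->
<profile>
	<id>hadoop-1</id>
	<modules>
		<module>flink-hcatalog</module>
	</modules>
</profile>
```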

@jamescao
Author

jamescao commented Sep 8, 2015

Any updates from the review process?

@chiwanpark
Member

Oh, sorry for the delay. I'm reviewing this.

Member


HCatInputOutputFormatITCase would be a better name.

Member


Unnecessary new line. (Only this line; the other lines are okay.)

@chiwanpark
Member

Hi @jamescao, I just reviewed your PR. There are some issues to fix before merging.

  • HcatInputOutputFormatITest.java must use tab characters for indentation.
  • There are some formatting issues, which I commented on.
  • Some of the Scala code doesn't follow Scala style.

As @twalthr said, I also think we should split the current test cases into separate cases for HCatInputFormat and HCatOutputFormat. This issue and PR should cover the HCatalog output format. We can open another JIRA issue and send a PR for that.

@jamescao
Author

@chiwanpark
Thanks for all your comments! I will fix them in subsequent commits. There is also a bug fix for HCatInputFormat in this PR; do you think that part should also be removed?
@fhueske
It seems there is a fix for Hadoop concurrency in #1111.
I will incorporate the content of that PR into this one.

@chiwanpark
Member

@jamescao Yes, I think the parts (including tests) related to HCatInputFormat should be removed.

@fhueske
Contributor

fhueske commented Sep 10, 2015

@jamescao good observation! The fix from #1111 should be added to HCatOutputFormatBase as well.
There was another fix for the HadoopIOFormats (FLINK-2555, PR #1038) that is also relevant for this PR.

@jamescao
Author

@chiwanpark
The PR is now updated. I pulled out the code related to HCatInputFormat and incorporated FLINK-2555 and FLINK-2617. I also changed the test environment from CollectionEnvironment to LocalEnvironment to cover testing of the de-/serialization of the HCatOutputFormat.
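The switch to LocalEnvironment matters because the output format is serialized and shipped to the runtime, which a CollectionEnvironment run would not exercise. Below is a minimal, self-contained sketch of that kind of round trip using plain java.io serialization; `DummyOutputFormat` is a hypothetical stand-in, and Flink's actual serialization path differs.

```java
import java.io.*;

public class SerializationRoundTripSketch {

    // Hypothetical stand-in for an OutputFormat carrying configuration state.
    static class DummyOutputFormat implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        DummyOutputFormat(String tableName) { this.tableName = tableName; }
    }

    public static void main(String[] args) throws Exception {
        DummyOutputFormat original = new DummyOutputFormat("mytable");

        // Serialize, then deserialize, as shipping the format to a
        // worker would; a non-serializable field would fail here.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(original);
        }
        DummyOutputFormat copy;
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            copy = (DummyOutputFormat) ois.readObject();
        }
        System.out.println("table after round trip: " + copy.tableName);
    }
}
```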

@fpompermaier
Contributor

Any update on this?

@twalthr
Contributor

twalthr commented Jan 30, 2019

This PR has not found ownership by committers for quite some time. I'm not sure if it is still mergeable. Most of the functionality will also be implemented as part of FLINK-10556. I will close the PR for now. Feel free to reopen it if you think it still makes sense to keep the contribution.

@twalthr twalthr closed this Jan 30, 2019