Conversation

@jamescao

[FLINK-1919]
Add an HCatOutputFormat for Tuple data types to the Java and Scala APIs, and fix a bug in the Scala API's HCatInputFormat for Hive complex types.
The Java API includes a check that the schema of the HCatalog table matches the Flink tuples if the user provides a TypeInformation in the constructor. For data types other than tuples, the OutputFormat requires a preceding Map function that converts them to HCatRecords.
The Scala API includes a check that the schema of the HCatalog table matches the Scala tuples. For data types other than Scala tuples, the OutputFormat requires a preceding Map function that converts them to HCatRecords. The Scala API requires the user to import org.apache.flink.api.scala._ so that the type can be captured by the Scala macro; see the sketch below.
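A rough usage sketch of the API described above (not taken from this PR; the HCatOutputFormat constructor and the Person case class are assumptions for illustration):

```scala
import org.apache.flink.api.scala._ // required so the Scala macro can capture the tuple type
import org.apache.hive.hcatalog.data.DefaultHCatRecord

object HCatWriteSketch {
  case class Person(id: Int, name: String) // hypothetical non-tuple type

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Tuples can go to the OutputFormat directly; the format checks them
    // against the HCatalog table schema.
    val tuples: DataSet[(Int, String)] = env.fromElements((1, "a"), (2, "b"))
    tuples.output(new HCatOutputFormat[(Int, String)]("mydb", "mytable")) // constructor assumed

    // Non-tuple types need a preceding Map function that emits HCatRecords
    // before they can be handed to the OutputFormat.
    val records: DataSet[DefaultHCatRecord] =
      env.fromElements(Person(1, "a"), Person(2, "b")).map { p =>
        val rec = new DefaultHCatRecord(2)
        rec.set(0, p.id)   // Scala Int is autoboxed to java.lang.Integer
        rec.set(1, p.name)
        rec
      }

    env.execute("HCatOutputFormat sketch")
  }
}
```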
The HCatalog jar in Maven Central is compiled against hadoop1, which is not compatible with the Hive jars used for testing, so a Cloudera HCatalog jar is pulled into the pom for testing purposes. It can be removed if not required.
A Java List or Map cannot be cast to a Scala List or Map; JavaConverters is used to fix this bug in the HCatInputFormat Scala API (see the snippet below).
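For reference, the conversion pattern behind that fix (a minimal standalone sketch; HCatInputFormat itself applies it to the fields of each record):

```scala
import scala.collection.JavaConverters._

object ConversionSketch extends App {
  // HCatalog hands back java.util.List / java.util.Map for Hive ARRAY / MAP
  // columns; casting them to Scala collections throws ClassCastException.
  val javaList: java.util.List[String] = java.util.Arrays.asList("a", "b")
  val javaMap = new java.util.HashMap[String, Int]()
  javaMap.put("a", 1)

  // asScala wraps the Java collection; toList / toMap then materialize
  // immutable Scala collections.
  val scalaList: List[String] = javaList.asScala.toList
  val scalaMap: Map[String, Int] = javaMap.asScala.toMap

  println(scalaList) // List(a, b)
  println(scalaMap)  // Map(a -> 1)
}
```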

java api and scala api
fix scala HCatInputFormat bug for complex type
pull in cloudera Hcatalog jar for end to end test
Member

I'm not an HCatalog expert, but I'm not sure that this third-party repository is needed.

Contributor

We should not depend on vendor-specific repositories/versions in the normal builds.
In the parent pom, there is a profile to enable vendor repositories.

@chiwanpark
Member

Hi @jamescao, thanks for your pull request!
I reviewed it roughly and will review it in more detail in a few days.

Regarding the version of the HCatalog release, it would be better to use the vanilla release.

@jamescao
Author

@chiwanpark @rmetzger
Thanks for all your comments; I will work to improve it. The reason I have to use a Cloudera pom is that the HCatalog jar in Maven Central is compiled against hadoop1, which makes it incompatible with the Hive testing utilities. It seems that the Hive test environment has some issues in Travis CI that do not show up on my Mac; I will have a look and try to resolve them soon.

@jamescao
Author

I need to work offline to debug the Travis builds, so I am closing the PR for now. Thanks for all your time and comments! I will reopen it once all the tests are fixed.

@jamescao jamescao closed this Aug 29, 2015
@twalthr
Contributor

twalthr commented Sep 2, 2015

@jamescao: It seems that you also wrote tests for the HCatInputFormat, right? Is it possible to split the PR into an OutputFormat part and open a separate PR for the HCatInputFormat tests? I'm still working on FLINK-2167 and need an HCatalog testing infrastructure; otherwise I have to write it on my own. Anyway, I wonder why none of the HCat I/O format classes have tests so far...

@jamescao
Author

@twalthr: Sorry I missed your message; this PR has been reopened as #1079. I didn't check this closed page. I should have continued working on this one instead of closing and reopening.
I will split the testing code and the bug fix in HCatInputFormat into a standalone PR. Meanwhile, you can refer to the code in the existing PR #1079. The code has passed the tests in Travis CI, and the Hive test environment is quite simple to set up. One catch is that the Hive JUnit tests are not thread-safe, so you may need to tweak the Surefire configuration to control test concurrency; a sketch is shown below. The other catch is that the HCatalog jar from Maven Central only works in the hadoop1 profile; please see my discussion with @chiwanpark on that page (#1079).
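A minimal Surefire sketch of what that tweak could look like (not from this PR; the exact placement and plugin version depend on the module's pom):

```xml
<!-- Sketch: run tests in a single fork and do not reuse it, so the
     non-thread-safe Hive JUnit tests never run concurrently. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>1</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```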
