Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address hive 3.x related test issues #125

Merged
merged 4 commits into from
Aug 27, 2020

Conversation

marton-bod
Copy link

@marton-bod marton-bod commented Aug 25, 2020

Addressing the issues outlined in #120:

  1. InsertIntoTableIntegrationTest.insertDataIntoTablePrimitiveParsedStrings - in Hive 3, date/timestamp types are cast to the Hadoop-specific date/timestamp types, which failed since it attempted to cast a java.sql.Date to the Hadoop type. Introduced new converters to solve the issue.

  2. PreV200HiveShellHiveCliEmulationTest - I think we can drop these tests, as this CLI is no longer supported. Uncommented the Ignore for now, but could also be deleted in due time.

  3. Warning messages: "Caused by: java.sql.SQLSyntaxErrorException: Table/View 'TBLS' does not exist." - in Hive 3, if hive.in.test=false (which is the default value), it's expected that the metastore DB has already been initialised with tables (e.g. TBLS), therefore some initial test queries are run against them. But in our tests, these tables don't exist in derby, hence the warning messages. Setting hive.in.test=true by default in all unit tests ensures the tables are first initialized, circumventing the problem.

  4. Warning messages: "WARN DataNucleus.MetaData:96 - Metadata has jdbc-type of null yet this is not valid. Ignored" - Couldn't find the exact root cause, but seems to be a commonplace issue based on some research, without breaking functionality. Added a log throttler to limit it to a single warn log per test.

Copy link
Collaborator

@massdosage massdosage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, thanks for the updates and the detailed explanations. I think we can remove the broken test case, otherwise all the rest LGTM.

* with some basic tables and will try to run initial test queries against them.
* This results in multiple warning stacktraces if the rdbms has not actually been initialized.
*/
config.getHiveConfSystemOverride().put("hive.in.test", "true");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a feeling there must be something like this to control it, fantastic!

@@ -29,7 +30,7 @@
import com.klarna.hiverunner.config.HiveRunnerConfig;
import com.klarna.hiverunner.sql.cli.hive.PreV200HiveCliEmulator;

//@Ignore TODO: some of these tests fail with Hive 3, should we fix or remove this functionality?
@Ignore("Disabling the test due to failures when upgrading to Hive 3. This CLI has been deprecated.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just delete this class, I don't think it has any value for Hive 3.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!
I've deleted the test class

Copy link
Collaborator

@patduin patduin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks!

@massdosage massdosage merged commit a1eb1b1 into HiveRunner:hive-3.x Aug 27, 2020
@marton-bod
Copy link
Author

Thanks for the reviews @massdosage @patduin!
Do you have any indication when this Hive3-compatible version could be released?

@massdosage
Copy link
Collaborator

No problem, I just noticed we need to use HIVE_IN_TEST.getVarname() instead of HIVE_IN_TEST.name(). I'll make that change now and push it shortly.

In terms of doing a release, I'll send a mail out to the HiveRunner list suggesting a major release for Hive 3 and that this will be the default version going forward with some suggestions what we do for Hive 2 in case there are changes in future. If there are no objections we can probably get a release done next week.

@marton-bod
Copy link
Author

Thanks for the correction. And great news regarding the release, looking forward to it 👍

massdosage added a commit that referenced this pull request Sep 2, 2020
Changes made include:
- update pom dependencies for newer Hive/Hadoop versions
- alter packaging to create a shaded uber-jar (Hadoop2 and its
  transitive deps are still widely used, and lag behind Hadoop3's
  transitive deps quite a bit.  Shading our use of Hadoop3 and some deps
  eases integration with other projects.
- alter 'HiveConf' initialization to account for Metastore configuration
  changes in Hive 3
- Address hive 3.x related test issues (#125)
- Fix ClassCastExceptions for Date and Timestamp in Hive3
- Set hive.in.test=true by default to avoid warmup sql exceptions in hive 3; use log rate limiting for datanucleus
- Delete outdated Hive CLI test

Co-authored-by: Marton Bod <mbod@cloudera.com>
Co-authored-by: Nicola Vitucci <nicola.vitucci@gmail.com>
Co-authored-by: Jason Gerlowski <gerlowskija@apache.org>
Co-authored-by: Patrick Duin <pduin@hotels.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants