Upgrade Hive from 2.3.7 to 3.1.2 #126

Merged: massdosage merged 12 commits into master from hive-3.x on Sep 2, 2020

massdosage
Collaborator

This PR moves HiveRunner to Hive 3.1.2 as the default supported Hive version. Hive 2 can still be used with HiveRunner 5.2.1. Going forward, if we need to do maintenance releases for Hive 2.x, I'd suggest releasing them off a branch from 5.2.1 and keeping their version numbers in the 5.x range, with version numbers >= 6.0.0 reserved for Hive 3.
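
The versioning proposal above can be sketched as a small lookup. This is only an illustration of the convention as stated in this description: `HiveRunnerVersionLine` and `hiveMajorFor` are hypothetical names, not part of HiveRunner, and releases below 5.x are deliberately left out because this PR says nothing about them.

```java
// Hypothetical helper, not part of HiveRunner. It only encodes the
// versioning convention proposed in this PR description.
final class HiveRunnerVersionLine {

    private HiveRunnerVersionLine() {}

    /** Returns the Hive major version a HiveRunner release would target under the proposal. */
    static int hiveMajorFor(String hiveRunnerVersion) {
        // Extract the major component, e.g. "5.2.1" -> 5.
        int dot = hiveRunnerVersion.indexOf('.');
        String majorPart = dot < 0 ? hiveRunnerVersion : hiveRunnerVersion.substring(0, dot);
        int major = Integer.parseInt(majorPart);

        if (major >= 6) {
            return 3; // versions >= 6.0.0 are proposed for Hive 3
        }
        if (major == 5) {
            return 2; // the 5.x range is proposed for Hive 2 maintenance releases
        }
        // Assumption: older release lines are outside the scope of this convention.
        throw new IllegalArgumentException(
                "Version not covered by the proposed convention: " + hiveRunnerVersion);
    }

    public static void main(String[] args) {
        System.out.println(hiveMajorFor("5.2.1")); // prints 2
        System.out.println(hiveMajorFor("6.0.0")); // prints 3
    }
}
```

A maintenance release cut from a 5.2.1 branch (say, a hypothetical 5.2.2) would still map to Hive 2 here, which is the point of keeping the two release lines in separate major ranges.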

gerlowskija and others added 10 commits August 21, 2019 12:00
This commit produces a version of HiveRunner that works with Hive 3.0.0
All tests run and pass, with the sole exception of
PreV200HiveShellHiveCliEmulationTest.

This commit is a starting point for 3.0.0 support, but it takes a
pretty blunt approach in updating HiveRunner.  I've only made those
changes strictly necessary to compile and pass tests on the updated Hive
version, and I did so without tons of context on Hive 3 itself.  There
might be changes needed, or new Hive3 features that should be tested,
or...

Changes made include:
- update pom dependencies for newer Hive/Hadoop versions
- alter packaging to create a shaded uber-jar (Hadoop2 and its
  transitive deps are still widely used, and lag behind Hadoop3's
  transitive deps quite a bit; shading our use of Hadoop3 and some deps
  eases integration with other projects)
- alter 'HiveConf' initialization to account for Metastore configuration
  changes in Hive 3
* Fix ClassCastExceptions for Date and Timestamp in Hive3
* Set hive.in.test=true by default to avoid warmup SQL exceptions in Hive 3; use log rate limiting for DataNucleus
* Delete outdated Hive CLI test

Co-authored-by: Marton Bod <mbod@cloudera.com>
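
To illustrate what "set hive.in.test=true by default" could mean in practice, here is a minimal sketch using plain `java.util.Properties` as a stand-in. HiveRunner itself configures a `HiveConf`; the class `HiveTestDefaults`, the method `withOverrides`, and the idea that a caller-supplied value wins over the default are assumptions made for this example only.

```java
import java.util.Properties;

// Hypothetical stand-in: HiveRunner itself works with HiveConf, not Properties.
final class HiveTestDefaults {

    private HiveTestDefaults() {}

    /** Layers caller-supplied settings over a default of hive.in.test=true. */
    static Properties withOverrides(Properties overrides) {
        Properties defaults = new Properties();
        defaults.setProperty("hive.in.test", "true");

        // A Properties built on 'defaults' falls back to them for keys not set directly.
        Properties effective = new Properties(defaults);
        for (String key : overrides.stringPropertyNames()) {
            effective.setProperty(key, overrides.getProperty(key));
        }
        return effective;
    }

    public static void main(String[] args) {
        Properties none = new Properties();
        System.out.println(withOverrides(none).getProperty("hive.in.test")); // prints true

        Properties custom = new Properties();
        custom.setProperty("hive.in.test", "false");
        System.out.println(withOverrides(custom).getProperty("hive.in.test")); // prints false
    }
}
```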
@massdosage
Collaborator Author

Fixes #120

conf.setVar(var, newFolder(basedir, folder).toAbsolutePath().toString());
}

protected void configureMrExecutionEngine(HiveConf conf) {
Collaborator

This might not be necessary anymore since Hive 3 dropped M/R support. Not a blocker, though; happy to merge this in for now.

Collaborator Author

Yeah, good point. I think we leave it for now; we could have a follow-on PR that goes through the code and removes everything to do with MR, as there's probably more than just this.

Collaborator

patduin left a comment

Looks good. Just one comment, but feel free to ignore it, or we can perhaps revisit it at a later date; it's very minor.

Contributor

nvitucci left a comment

Looks good, just a couple of questions/suggestions.

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.1</version>
Contributor

The most recent version is 3.2.4. Any specific reason to use 3.1.1?

Collaborator Author

This was probably just the most recent version when the original change was made. There are quite a few other plugins that probably have older versions too. I'd rather leave it like this since we know it works; we can always have a separate PR to update various versions later.

Co-authored-by: Nicola Vitucci <nicola.vitucci@gmail.com>
massdosage merged commit 9f362d1 into master on Sep 2, 2020
massdosage deleted the hive-3.x branch on September 2, 2020 08:38
@massdosage
Collaborator Author

I've merged this into master and will hopefully get a release out in the next few days. Thanks for all your contributions!

5 participants