[SPARK-37069][SQL] Properly fallback when Hive.getWithoutRegisterFns is not available #34360

Closed

@sunchao sunchao commented Oct 21, 2021

What changes were proposed in this pull request?

Properly fall back to Hive.get when Hive.getWithoutRegisterFns is not available in the Hive version in use.

Does this PR introduce any user-facing change?

In SPARK-35321 we switched to the new method Hive.getWithoutRegisterFns introduced by HIVE-21563. That code path is supposed to be active only for Hive versions >= 2.3.9. However, because of how HiveVersion is initialized in IsolatedClientLoader, if users set spark.sql.hive.metastore.version to 2.3.8, Spark still converts it to "2.3.9" and subsequently fails with NoSuchMethodError.
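To illustrate why a configured 2.3.8 ends up being treated as 2.3.9, here is a minimal sketch. It assumes the version lookup keys on major.minor only and maps each release line to a single supported version. `HiveVersionLike`, `supported`, and `resolve` are hypothetical names used for illustration, not Spark's actual API.

```scala
import scala.util.Try

object VersionMappingSketch {
  // Hypothetical, simplified model: one supported release per major.minor line.
  final case class HiveVersionLike(fullVersion: String)

  // Assumption for illustration: the 2.3 line is represented by 2.3.9.
  private val supported: Map[(Int, Int), HiveVersionLike] = Map(
    (2, 3) -> HiveVersionLike("2.3.9")
  )

  // Resolve a user-supplied version string by major.minor only,
  // ignoring the patch level entirely.
  def resolve(version: String): Option[HiveVersionLike] =
    version.split('.').toList match {
      case major :: minor :: _ =>
        for {
          ma <- Try(major.toInt).toOption
          mi <- Try(minor.toInt).toOption
          v  <- supported.get((ma, mi))
        } yield v
      case _ => None
    }
}
```

Under this model, `VersionMappingSketch.resolve("2.3.8")` and `VersionMappingSketch.resolve("2.3.9")` return the same value, so a check such as "version >= 2.3.9" cannot tell the two apart.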

This fixes it by always falling back to Hive.get on NoSuchMethodError. As a side effect, we can also support other Hive versions that implement Hive.getWithoutRegisterFns.
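The fallback idea can be sketched without Hive on the classpath. The example below is an illustration under stated assumptions, not the code from this PR: `OldHiveLike` and `NewHiveLike` are hypothetical stand-ins for Hive clients, and the lookup uses Java reflection, which signals a missing method with `NoSuchMethodException`. The PR itself falls back on `NoSuchMethodError`, which is what the JVM raises when a directly compiled call hits a jar that lacks the method.

```scala
object FallbackSketch {
  // Hypothetical stand-in for an older Hive client: only `get` exists.
  class OldHiveLike {
    def get(conf: String): String = s"get($conf)"
  }

  // Hypothetical stand-in for a newer client that also offers the preferred entry point.
  class NewHiveLike extends OldHiveLike {
    def getWithoutRegisterFns(conf: String): String = s"getWithoutRegisterFns($conf)"
  }

  // Prefer `getWithoutRegisterFns`; fall back to `get` when the method is absent,
  // regardless of what version the client claims to be.
  def getClient(hive: AnyRef, conf: String): String = {
    val cls = hive.getClass
    val method =
      try cls.getMethod("getWithoutRegisterFns", classOf[String])
      catch {
        case _: NoSuchMethodException => cls.getMethod("get", classOf[String])
      }
    method.invoke(hive, conf).asInstanceOf[String]
  }
}
```

Probing for the method rather than trusting a version number is what makes the approach robust to the 2.3.8 versus 2.3.9 mix-up described above.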

How was this patch tested?

I tested this manually by launching a Spark session with a custom Hive version:

$SPARK_HOME/bin/spark-shell --conf spark.sql.hive.metastore.version=2.3.8 --conf spark.sql.hive.metastore.jars="/tmp/apache-hive-2.3.8-bin/lib/*"

And then tried this command:

import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "2.3.8")
  .config("spark.sql.hive.metastore.jars", "/tmp/apache-hive-2.3.8-bin/lib/*")
  .getOrCreate()
spark.sql("show tables").show

The command fails before this PR and succeeds with it.

@github-actions github-actions bot added the SQL label Oct 21, 2021

SparkQA commented Oct 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48987/


SparkQA commented Oct 21, 2021

Test build #144516 has finished for PR 34360 at commit 84ccdfd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48987/

Member Author

sunchao commented Oct 21, 2021

retest this please


SparkQA commented Oct 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48989/


SparkQA commented Oct 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48989/


SparkQA commented Oct 21, 2021

Test build #144518 has finished for PR 34360 at commit 84ccdfd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Oct 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48994/


SparkQA commented Oct 22, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48994/


SparkQA commented Oct 22, 2021

Test build #144523 has finished for PR 34360 at commit b5c7054.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master and branch-3.2.

HyukjinKwon pushed a commit that referenced this pull request Oct 22, 2021
…is not available

Properly fall back to `Hive.get` when `Hive.getWithoutRegisterFns` is not available in the Hive version in use.

In SPARK-35321 we switched to the new method `Hive.getWithoutRegisterFns` introduced by [HIVE-21563](https://issues.apache.org/jira/browse/HIVE-21563). That code path is supposed to be active only for Hive versions >= 2.3.9. However, because of how `HiveVersion` is initialized in `IsolatedClientLoader`, if users set `spark.sql.hive.metastore.version` to `2.3.8`, Spark still converts it to "2.3.9" and subsequently fails with `NoSuchMethodError`.

This fixes it by always falling back to `Hive.get` on `NoSuchMethodError`. As a side effect, we can also support other Hive versions that implement `Hive.getWithoutRegisterFns`.

I tested this manually by launching a Spark session with a custom Hive version:

```
$SPARK_HOME/bin/spark-shell --conf spark.sql.hive.metastore.version=2.3.8 --conf spark.sql.hive.metastore.jars="/tmp/apache-hive-2.3.8-bin/lib/*"
```

And then tried this command:
```
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "2.3.8")
  .config("spark.sql.hive.metastore.jars", "/tmp/apache-hive-2.3.8-bin/lib/*")
  .getOrCreate()
spark.sql("show tables").show
```

The command fails before this PR and succeeds with it.

Closes #34360 from sunchao/SPARK-37069.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 39a0c22)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@dongjoon-hyun
Member

+1, LGTM. Thank you, @sunchao and all.

@sunchao sunchao deleted the SPARK-37069 branch October 22, 2021 16:41
sunchao added a commit to sunchao/spark that referenced this pull request Dec 8, 2021
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022