
[SPARK-37121][HIVE][TEST] Fix Python version detection bug in TestUtils used by HiveExternalCatalogVersionsSuite #34395

Closed
xkrogen wants to merge 3 commits into apache:master from xkrogen:xkrogen-SPARK-37121-testutils-python38-fix

Conversation

@xkrogen (Contributor) commented Oct 26, 2021

What changes were proposed in this pull request?

Fix a bug in TestUtils.isPythonVersionAtLeast38 so that HiveExternalCatalogVersionsSuite can test against Spark 2.x releases in environments with Python <= 3.7.

Why are the changes needed?

The logic in TestUtils.isPythonVersionAtLeast38 was added in #30044 to prevent Spark 2.4 from being run in an environment where the installed python3 is version 3.8 or newer, which is not compatible with Spark 2.4. However, the method always returns true, so only Spark 3.x versions are ever included in the version set for HiveExternalCatalogVersionsSuite, regardless of the system-installed version of Python.

The problem is here:

```scala
def isPythonVersionAtLeast38(): Boolean = {
  val attempt = if (Utils.isWindows) {
    Try(Process(Seq("cmd.exe", "/C", "python3 --version"))
      .run(ProcessLogger(s => s.startsWith("Python 3.8") || s.startsWith("Python 3.9")))
      .exitValue())
  } else {
    Try(Process(Seq("sh", "-c", "python3 --version"))
      .run(ProcessLogger(s => s.startsWith("Python 3.8") || s.startsWith("Python 3.9")))
      .exitValue())
  }
  attempt.isSuccess && attempt.get == 0
}
```

It tries to evaluate the Python version using a ProcessLogger, but the logger accepts a String => Unit function, i.e., it does not use the return value in any way (it is meant for logging). So the results of the startsWith checks are thrown away, and attempt.isSuccess && attempt.get == 0 is always true as long as the system has a python3 binary of any version.
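For reference, here is a minimal sketch of how the check could capture and compare the version string instead of discarding it. This is illustrative only and not the actual change made in this PR; the method name, the direct invocation of python3 (rather than going through cmd.exe/sh as above), and the parsing details are assumptions.

```scala
import scala.sys.process._
import scala.util.Try

// Illustrative sketch only, not the PR's actual fix: capture the output of
// `python3 --version` and compare the parsed version numerically, rather than
// passing a predicate to ProcessLogger, whose return value is ignored.
def pythonVersionAtLeast(major: Int, minor: Int): Boolean = {
  // `!!` returns the command's stdout as a String, e.g. "Python 3.9.7"
  // (Python 3.4+ prints the version to stdout).
  val output = Try(Seq("python3", "--version").!!.trim)
  output.toOption.exists { line =>
    // Expect a line of the form "Python <major>.<minor>[.<patch>]".
    line.split("\\s+").lift(1).exists { version =>
      Try(version.split("\\.").take(2).map(_.toInt)).toOption.exists {
        case Array(maj, min) => maj > major || (maj == major && min >= minor)
        case _               => false
      }
    }
  }
}
```

A numeric comparison like pythonVersionAtLeast(3, 8) also handles Python 3.10 and later correctly, which the startsWith("Python 3.8") / startsWith("Python 3.9") approach above would not.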

Does this PR introduce any user-facing change?

No, test changes only.

How was this patch tested?

Confirmed by checking that HiveExternalCatalogVersionsSuite downloads binary distros for Spark 2.x lines as well as 3.x when I symlink my python3 to Python 3.7, and only downloads distros for the 3.x lines when I symlink my python3 to Python 3.9.

```bash
brew link --force python@3.7
# run HiveExternalCatalogVersionsSuite and validate that 2.x and 3.x tests get executed
brew unlink python@3.7
brew link --force python@3.9
# run HiveExternalCatalogVersionsSuite and validate that only 3.x tests get executed
```

github-actions bot added the CORE label Oct 26, 2021
@xkrogen (Contributor, Author) commented Oct 26, 2021

cc @dongjoon-hyun @HyukjinKwon

@SparkQA commented Oct 26, 2021

Test build #144627 has started for PR 34395 at commit 09769f9.

@SparkQA commented Oct 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49097/

@SparkQA commented Oct 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49097/

@dongjoon-hyun (Member) commented:

Thank you for pinging me, @xkrogen.

@dongjoon-hyun (Member) left a comment:

+1, LGTM.

@SparkQA commented Oct 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49101/

@SparkQA commented Oct 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49101/

@SparkQA commented Oct 27, 2021

Test build #144631 has finished for PR 34395 at commit bfb9883.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49138/

@SparkQA commented Oct 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49138/

@SparkQA commented Oct 27, 2021

Test build #144670 has finished for PR 34395 at commit 535fe44.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon (Member) commented:

Thanks @xkrogen.

Merged to master and branch-3.2.

HyukjinKwon pushed a commit that referenced this pull request Oct 28, 2021
…ls used by HiveExternalCatalogVersionsSuite

### What changes were proposed in this pull request?
Fix a bug in `TestUtils.isPythonVersionAtLeast38` to allow for `HiveExternalCatalogVersionsSuite` to test against Spark 2.x releases in environments with Python <= 3.7.

### Why are the changes needed?
The logic in `TestUtils.isPythonVersionAtLeast38` was added in #30044 to prevent Spark 2.4 from being run in an environment where the Python3 version installed was >= Python 3.8, which is not compatible with Spark 2.4. However, this method always returns true, so only Spark 3.x versions will ever be included in the version set for `HiveExternalCatalogVersionsSuite`, regardless of the system-installed version of Python.

The problem is here:
https://github.com/apache/spark/blob/951efb80856e2a92ba3690886c95643567dae9d0/core/src/main/scala/org/apache/spark/TestUtils.scala#L280-L291
It's trying to evaluate the version of Python using a `ProcessLogger`, but the logger accepts a `String => Unit` function, i.e., it does not make use of the return value in any way (since it's meant for logging). So the result of the `startsWith` checks are thrown away, and `attempt.isSuccess && attempt.get == 0` will always be true as long as your system has a `python3` binary (of any version).

### Does this PR introduce _any_ user-facing change?
No, test changes only.

### How was this patch tested?
Confirmed by checking that `HiveExternalCatalogVersionsSuite` downloads binary distros for Spark 2.x lines as well as 3.x when I symlink my `python3` to Python 3.7, and only downloads distros for the 3.x lines when I symlink my `python3` to Python 3.9.

```bash
brew link --force python3.7
# run HiveExternalCatalogVersionsSuite and validate that 2.x and 3.x tests get executed
brew unlink python3.7
brew link --force python3.9
# run HiveExternalCatalogVersionsSuite and validate that only 3.x tests get executed
```

Closes #34395 from xkrogen/xkrogen-SPARK-37121-testutils-python38-fix.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 30e1261)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
xkrogen deleted the xkrogen-SPARK-37121-testutils-python38-fix branch November 1, 2021 21:04
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
…ls used by HiveExternalCatalogVersionsSuite

(cherry picked from commit 31af87c)
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…ls used by HiveExternalCatalogVersionsSuite

catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…ls used by HiveExternalCatalogVersionsSuite
