[SPARK-37121][HIVE][TEST] Fix Python version detection bug in TestUtils used by HiveExternalCatalogVersionsSuite (#34395)
Conversation
Test build #144627 has started for PR 34395 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Thank you for pinging me, @xkrogen.

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #144631 has finished for PR 34395 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #144670 has finished for PR 34395 at commit

Thanks @xkrogen. Merged to master and branch-3.2.
### What changes were proposed in this pull request?

Fix a bug in `TestUtils.isPythonVersionAtLeast38` to allow `HiveExternalCatalogVersionsSuite` to test against Spark 2.x releases in environments with Python <= 3.7.

### Why are the changes needed?

The logic in `TestUtils.isPythonVersionAtLeast38` was added in #30044 to prevent Spark 2.4 from being run in an environment where the installed Python 3 version was >= 3.8, which is not compatible with Spark 2.4. However, this method always returns true, so only Spark 3.x versions will ever be included in the version set for `HiveExternalCatalogVersionsSuite`, regardless of the system-installed version of Python.

The problem is here:
https://github.com/apache/spark/blob/951efb80856e2a92ba3690886c95643567dae9d0/core/src/main/scala/org/apache/spark/TestUtils.scala#L280-L291

It tries to evaluate the Python version using a `ProcessLogger`, but the logger accepts a `String => Unit` function, i.e., it does not make use of the return value in any way (since it is meant for logging). So the results of the `startsWith` checks are thrown away, and `attempt.isSuccess && attempt.get == 0` will always be true as long as the system has a `python3` binary (of any version).

### Does this PR introduce _any_ user-facing change?

No, test changes only.

### How was this patch tested?

Confirmed by checking that `HiveExternalCatalogVersionsSuite` downloads binary distros for Spark 2.x lines as well as 3.x when my `python3` is symlinked to Python 3.7, and only downloads distros for the 3.x lines when my `python3` is symlinked to Python 3.9.

```bash
brew link --force python3.7
# run HiveExternalCatalogVersionsSuite and validate that 2.x and 3.x tests get executed
brew unlink python3.7
brew link --force python3.9
# run HiveExternalCatalogVersionsSuite and validate that only 3.x tests get executed
```

Closes #34395 from xkrogen/xkrogen-SPARK-37121-testutils-python38-fix.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 30e1261)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
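To make the failure mode concrete, here is a minimal Scala sketch of the pattern described above. The buggy variant mirrors the shape of the original code (a `ProcessLogger` whose callback's `Boolean` result is silently discarded because the logger expects `String => Unit`); the fixed variant is one illustrative way to capture and parse the output instead — it is not necessarily the exact patch applied in this PR.

```scala
import scala.sys.process._
import scala.util.Try

// Buggy pattern: ProcessLogger takes a String => Unit callback, so the
// Boolean from startsWith is discarded via value discarding. The `!`
// operator returns only the process exit code.
def isAtLeast38Buggy(): Boolean = {
  val attempt = Try {
    Process(Seq("python3", "--version")).!(ProcessLogger(line =>
      line.startsWith("Python 3.8") // result thrown away; logger returns Unit
    ))
  }
  // True whenever a `python3` binary exists, regardless of its version.
  attempt.isSuccess && attempt.get == 0
}

// One possible fix (illustrative): capture the output line with `!!`
// and compare the parsed major/minor version numbers.
def isAtLeast38Fixed(): Boolean = Try {
  val out = Process(Seq("python3", "--version")).!!.trim // e.g. "Python 3.9.7"
  val parts = out.stripPrefix("Python ").split("\\.")
  parts(0).toInt > 3 || (parts(0).toInt == 3 && parts(1).toInt >= 8)
}.getOrElse(false)
```

The key point is that `ProcessBuilder.!` reports the exit status, not anything computed inside the logger, so any version check routed through the logger callback can never influence the return value.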