@dongjoon-hyun dongjoon-hyun commented Nov 21, 2025

What changes were proposed in this pull request?

This PR aims to skip test_perf_profiler_data_source if pyarrow is absent.

Why are the changes needed?

To recover the failed PyPy CIs.

```
======================================================================
ERROR: test_perf_profiler_data_source (pyspark.sql.tests.test_udf_profiler.UDFProfiler2Tests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_udf_profiler.py", line 609, in test_perf_profiler_data_source
    self.spark.read.format("TestDataSource").load().collect()
  File "/__w/spark/spark/python/pyspark/sql/classic/dataframe.py", line 469, in collect
    sock_info = self._jdf.collectToPython()
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1362, in __call__
    return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 263, in deco
    return f(*a, **kw)
  File "/__w/spark/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/protocol.py", line 327, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o235.collectToPython.
: org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 199, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 37, in <module>
    File "/usr/local/pypy/pypy3.10/lib/pypy3.10/importlib/__init__.py", line 126, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
    File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    File "<builtin>/frozen importlib._bootstrap_external", line 897, in exec_module
    File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 21, in <module>
      import pyarrow as pa
  ModuleNotFoundError: No module named 'pyarrow'
```
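For illustration, the usual way to guard such a test is `unittest.skipIf` keyed on an optional-dependency flag. The sketch below mirrors the pattern PySpark's test utilities use; the names `have_pyarrow` and `pyarrow_requirement_message` are modeled on those utilities here, not imported from them, and the test body is a trivial stand-in:

```python
import unittest

# Probe for the optional dependency once at import time
# (hypothetical mirror of PySpark's have_pyarrow flag).
try:
    import pyarrow  # noqa: F401
    have_pyarrow = True
except ImportError:
    have_pyarrow = False

pyarrow_requirement_message = "pyarrow is required for this test"


class ProfilerTests(unittest.TestCase):
    # Skipped (not errored) on interpreters without pyarrow, e.g. PyPy CI.
    @unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
    def test_perf_profiler_data_source(self):
        # Placeholder body; the real test reads a Python data source.
        self.assertTrue(True)
```

With this guard, the PyPy job reports the test as skipped instead of failing with `ModuleNotFoundError` in the worker.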

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-54153][PYTHON][TESTS] Skip test_perf_profiler_data_source if pyarrow is absent [SPARK-54153][PYTHON][TESTS][FOLLOWUP] Skip test_perf_profiler_data_source if pyarrow is absent Nov 21, 2025

dongjoon-hyun commented Nov 21, 2025

Could you review this too when you are here, please, @sunchao ? 😄

@sunchao sunchao left a comment


LGTM

@dongjoon-hyun

Thank you so much, @sunchao ! Have a nice Thanksgiving Holiday~

dongjoon-hyun added a commit that referenced this pull request Nov 21, 2025
…source` if `pyarrow` is absent

Closes #53162 from dongjoon-hyun/SPARK-54153.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 9b0b1ce)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

sunchao commented Nov 21, 2025

@dongjoon-hyun Ha thanks, you too!

@dongjoon-hyun dongjoon-hyun deleted the SPARK-54153 branch November 21, 2025 22:59
@dongjoon-hyun

For the record, this recovers PyPy CI.

(Screenshot, 2025-11-21 16:30:09: passing PyPy CI run)
