@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Dec 6, 2022

What changes were proposed in this pull request?

This PR fixes test_connect_function so that PandasOnSparkTestCase is imported only when pandas is available. With the import guarded, the test cases are skipped cleanly when pandas is missing, because should_test_connect requires have_pandas:

should_test_connect = connect_requirement_message is None and have_pandas
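To make the gating rule concrete, here is a minimal sketch of the same boolean expression wrapped in a function. The function name `should_run` and the sample message are hypothetical and used only for illustration; only the expression itself comes from the line quoted above.

```python
from typing import Optional


def should_run(requirement_message: Optional[str], have_pandas: bool) -> bool:
    """Hypothetical helper mirroring the quoted expression.

    Tests run only when no requirement message was recorded
    AND pandas is available.
    """
    return requirement_message is None and have_pandas


if __name__ == "__main__":
    # Every combination except (None, True) disables the tests.
    for message in (None, "some requirement is missing"):
        for pandas_flag in (True, False):
            print(message, pandas_flag, should_run(message, pandas_flag))
```

Because have_pandas is part of the condition, a missing pandas installation alone is enough to turn the flag off, provided the module can be imported far enough to evaluate it.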

Why are the changes needed?

SPARK-41346 imported PandasOnSparkTestCase outside of the if have_pandas: guard, so the import runs even when pandas is not installed.

if have_pandas:
from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
from pyspark.sql.dataframe import DataFrame
from pyspark.testing.connectutils import should_test_connect, connect_requirement_message
from pyspark.testing.pandasutils import PandasOnSparkTestCase
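The general pattern behind this kind of fix can be sketched without PySpark. This is not the actual patch: the module name `hypothetical_optional_dep_for_demo`, the `HelperTestCase` class, and the test class are all made up for illustration, with the hypothetical module standing in for pandas.

```python
import importlib.util
import unittest

# Hypothetical stand-in for pandas, chosen so that it is not installed.
OPTIONAL_DEP = "hypothetical_optional_dep_for_demo"
have_dep = importlib.util.find_spec(OPTIONAL_DEP) is not None

if have_dep:
    # Only executed when the dependency is importable.
    from hypothetical_optional_dep_for_demo import HelperTestCase as BaseCase
else:
    # Fallback base class so the test class below can still be defined.
    BaseCase = unittest.TestCase


@unittest.skipIf(not have_dep, f"{OPTIONAL_DEP} is not installed")
class FunctionTests(BaseCase):
    def test_uses_dependency(self):
        self.assertTrue(have_dep)


def run_demo() -> unittest.TestResult:
    """Load and run FunctionTests, returning the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FunctionTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_demo()
    print("skipped:", len(outcome.skipped), "errors:", len(outcome.errors))
```

Keeping every dependency-specific import under the guard lets the module load, and only then can the skip decorator take effect.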

Does this PR introduce any user-facing change?

No.

How was this patch tested?

BEFORE

$ python/run-tests --testnames pyspark.sql.tests.connect.test_connect_function
...
ModuleNotFoundError: No module named 'pandas'

AFTER

$ python/run-tests --testnames pyspark.sql.tests.connect.test_connect_function
...
Skipped tests in pyspark.sql.tests.connect.test_connect_function with python3.9:
      test_aggregation_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.004s)
      test_math_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.004s)
      test_normal_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.002s)
      test_sort_with_nulls_order (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.001s)
      test_sorting_functions_with_column (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.001s)
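The BEFORE failure happens at import time, before any skip logic can run. The sketch below reproduces that kind of error in isolation; the module name is hypothetical and stands in for pandas on a machine where it is absent.

```python
import importlib
from typing import Optional


def try_import(module_name: str) -> Optional[str]:
    """Return the error text if the module is missing, else None."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical module name that is assumed not to be installed.
    print(try_import("hypothetical_missing_dep_for_demo"))
    # A standard-library module imports fine, so no error text is returned.
    print(try_import("json"))
```

An unguarded module-level import raises the same exception while the test module is being loaded, which is why the run aborts instead of reporting skipped tests.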

@dongjoon-hyun dongjoon-hyun changed the title to [SPARK-41346][CONNECT][TESTS][FOLLOWUP] Fix test_connect_function to import PandasOnSparkTestCase properly on Dec 6, 2022
@dongjoon-hyun
Member Author

cc @zhengruifeng , @HyukjinKwon

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon !

@dongjoon-hyun
Member Author

This is a change to a single test file, test_connect_function.py, and I verified it manually. Merged to master.

@zhengruifeng
Contributor

@dongjoon-hyun Thank you for doing this!

@amaliujia
Contributor

Thanks for continuing to drive this!

@dongjoon-hyun dongjoon-hyun deleted the SPARK-41346 branch December 6, 2022 18:21
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
…o import `PandasOnSparkTestCase` properly

### What changes were proposed in this pull request?

This PR fixes `test_connect_function` so that `PandasOnSparkTestCase` is imported only when pandas is available. With the `import` guarded, the test cases are skipped cleanly when pandas is missing, because `should_test_connect` requires `have_pandas`:

https://github.com/apache/spark/blob/97976a5cc915597fd2606602d18c52c075a03bf6/python/pyspark/testing/connectutils.py#L49

### Why are the changes needed?

SPARK-41346 imported `PandasOnSparkTestCase` outside of the `if have_pandas:` guard, so the import runs even when pandas is not installed.

https://github.com/apache/spark/blob/97976a5cc915597fd2606602d18c52c075a03bf6/python/pyspark/sql/tests/connect/test_connect_function.py#L25-L29

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

**BEFORE**
```
$ python/run-tests --testnames pyspark.sql.tests.connect.test_connect_function
...
ModuleNotFoundError: No module named 'pandas'
```

**AFTER**
```
$ python/run-tests --testnames pyspark.sql.tests.connect.test_connect_function
...
Skipped tests in pyspark.sql.tests.connect.test_connect_function with python3.9:
      test_aggregation_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.004s)
      test_math_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.004s)
      test_normal_functions (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.002s)
      test_sort_with_nulls_order (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.001s)
      test_sorting_functions_with_column (pyspark.sql.tests.connect.test_connect_function.SparkConnectFunctionTests) ... skip (0.001s)
```

Closes apache#38929 from dongjoon-hyun/SPARK-41346.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>