
[SPARK-29188][PYTHON][FOLLOW-UP] Explicitly disable Arrow execution for the test of toPandas empty types #27247

Closed
HyukjinKwon wants to merge 1 commit into master from SPARK-29188-followup

Conversation

HyukjinKwon (Member)

What changes were proposed in this pull request?

This PR proposes to explicitly disable Arrow execution for the test of toPandas empty types. When spark.sql.execution.arrow.pyspark.enabled is enabled by default, this test alone fails as below:

======================================================================
ERROR [0.205s]: test_to_pandas_from_empty_dataframe (pyspark.sql.tests.test_dataframe.DataFrameTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../pyspark/sql/tests/test_dataframe.py", line 568, in test_to_pandas_from_empty_dataframe
    self.assertTrue(np.all(dtypes_when_empty_df == dtypes_when_nonempty_df))
AssertionError: False is not true
----------------------------------------------------------------------

It is best to explicitly disable Arrow execution for this test, since it only passes when Arrow is disabled.
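
As a rough standalone sketch (not the PR's test code; the local session setup and the column expression below are assumptions for illustration), the dependence on the conf can be seen by toggling it around a toPandas() call on an empty DataFrame:

# Minimal sketch, not the PR's test: toggle the Arrow conf around toPandas()
# on an empty DataFrame and compare the resulting dtypes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
empty_df = spark.range(1).selectExpr("CAST(id AS string) AS s", "id").filter("false")

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
print(empty_df.toPandas().dtypes)  # non-Arrow path; dtypes fixed by SPARK-29188

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
print(empty_df.toPandas().dtypes)  # may differ here, hence pinning the conf in the test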

Why are the changes needed?

To make the test independent of the configuration's default value.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested, and Jenkins should also test it.

# SPARK-29188 test that toPandas() on an empty dataframe has the correct dtypes
with self.sql_conf({"spark.sql.execution.arrow.pyspark.enabled": False}):
    dtypes_when_empty_df = self.spark.sql(sql).filter("False").toPandas().dtypes
    self.assertTrue(np.all(dtypes_when_empty_df == dtypes_when_nonempty_df))
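
For context, self.sql_conf here is the PySpark test utilities' context manager that temporarily applies the given SQL confs and restores the previous values on exit, so only this test is pinned to the non-Arrow path while the rest of the suite keeps the session default.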
Member:
Oh, do we need to re-open SPARK-29188 then because we know that toPandas() will fail when spark.sql.execution.arrow.pyspark.enabled=True?

Member Author (HyukjinKwon):
hmmm .. I will just open another one just to make the management simpler.

Member:
+1.

@SparkQA commented Jan 17, 2020

Test build #116888 has finished for PR 27247 at commit 507b625.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment:

+1, LGTM. Merged to master. Thank you, @HyukjinKwon.

@HyukjinKwon (Member Author):

Thank you, @dongjoon-hyun!

@HyukjinKwon deleted the SPARK-29188-followup branch on March 3, 2020 01:16