[SPARK-32161] [PYTHON] Removing JVM logs from SparkUpgradeException #34275

Closed
pralabhkumar wants to merge 2 commits into apache:master from pralabhkumar:rk_spark_upgrade_exception

@pralabhkumar
Contributor

What changes were proposed in this pull request?

Hide the JVM traceback for SparkUpgradeException.
After this PR, the following code produces only the Python-side traceback shown below it, without the JVM stack trace:

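from pyspark.sql.functions import to_date, unix_timestamp, from_unixtime
# Assumes an active SparkSession named `spark`; the sample row is the one used in the unit test.
df = spark.createDataFrame([("2014-31-12",)], ['date_str'])
df2 = df.select('date_str', to_date(from_unixtime(unix_timestamp('date_str', 'yyyy-dd-aa'))))
df2.show(1, False)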


 raise converted from None

pyspark.sql.utils.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'yyyy-dd-aa' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Why are the changes needed?

This change removes the JVM traceback from SparkUpgradeException in PySpark, so users see a more Pythonic stack trace.
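The effect of `raise converted from None` can be illustrated without Spark. The following is a minimal sketch, not the PySpark implementation: `JvmError`, `UpgradeError`, `call_jvm`, `run`, and `format_failure` are hypothetical stand-ins. It only shows that `from None` suppresses the implicit exception chain, so the formatted traceback no longer includes the original low-level error.

```python
import traceback


class JvmError(Exception):
    """Hypothetical stand-in for a low-level JVM-side error."""


class UpgradeError(Exception):
    """Hypothetical stand-in for a converted, user-facing exception."""


def call_jvm():
    # Simulates a call that fails on the JVM side.
    raise JvmError("simulated JVM stack trace")


def run(hide_jvm_context):
    try:
        call_jvm()
    except JvmError:
        converted = UpgradeError("Fail to recognize 'yyyy-dd-aa' pattern")
        if hide_jvm_context:
            # "from None" suppresses the implicit exception chain, so the
            # traceback does not print the original JvmError.
            raise converted from None
        # Without "from None", Python chains the JvmError as context.
        raise converted


def format_failure(hide_jvm_context):
    """Return the full formatted traceback of the failure as a string."""
    try:
        run(hide_jvm_context)
    except UpgradeError as e:
        return "".join(traceback.format_exception(type(e), e, e.__traceback__))
```

Calling `format_failure(False)` yields a chained traceback that mentions both exceptions, while `format_failure(True)` mentions only the converted one.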

Does this PR introduce any user-facing change?

Yes. Users will see only the Python stack trace.

How was this patch tested?

Unit tests.

@HyukjinKwon
Member

OK to test.


try:
    import xmlrunner  # type: ignore[import]

Member

Suggested change

    raise converted from None
else:
    raise

Member

Suggested change

        except AnalysisException as e:
            self.assertRegex(str(e), "Column '`中文字段`' does not exist")

    def test_spark_upgrade_exception(self):

Member

Let's add a short comment like "# SPARK-XXXXX: ....". See also the pull request section of https://spark.apache.org/contributing.html

        df = self.spark.createDataFrame([("2014-31-12",)], ['date_str'])
        df2 = df.select('date_str',
                        to_date(from_unixtime(unix_timestamp('date_str', 'yyyy-dd-aa'))))
        self.assertRaises(SparkUpgradeException, lambda: df2.show(1, False))

Member

Suggested change
- self.assertRaises(SparkUpgradeException, lambda: df2.show(1, False))
+ self.assertRaises(SparkUpgradeException, df2.collect)
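
The suggestion above relies on `assertRaises` accepting any callable, so a zero-argument bound method can be passed directly instead of being wrapped in a lambda. A minimal sketch of the equivalent styles, assuming a hypothetical `FakeDataFrame` stand-in rather than a real Spark DataFrame:

```python
import io
import unittest


class FakeDataFrame:
    """Hypothetical stand-in for a DataFrame whose actions always fail."""

    def collect(self):
        raise ValueError("simulated failure")

    def show(self, n, truncate):
        raise ValueError("simulated failure")


class AssertRaisesStyles(unittest.TestCase):
    def setUp(self):
        self.df = FakeDataFrame()

    def test_lambda(self):
        # A lambda wraps a call that needs inline arguments.
        self.assertRaises(ValueError, lambda: self.df.show(1, False))

    def test_bound_method(self):
        # A zero-argument bound method can be passed as-is.
        self.assertRaises(ValueError, self.df.collect)

    def test_forwarded_args(self):
        # assertRaises forwards extra positional arguments to the callable.
        self.assertRaises(ValueError, self.df.show, 1, False)


def run_suite():
    """Run the three tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertRaisesStyles)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

All three forms assert the same thing; passing the bound method is simply the shortest when the action takes no arguments.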

            self.assertRegex(str(e), "Column '`中文字段`' does not exist")

    def test_spark_upgrade_exception(self):
        from pyspark.sql.functions import to_date, unix_timestamp, from_unixtime

Member

Feel free to import this at the top of this file.

@SparkQA

SparkQA commented Oct 14, 2021

Test build #144219 has finished for PR 34275 at commit 2c38415.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class SparkUpgradeException(CapturedException):

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48698/

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48698/

@SparkQA

SparkQA commented Oct 14, 2021

Test build #144239 has finished for PR 34275 at commit e57cd52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48719/

@SparkQA

SparkQA commented Oct 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48719/

@HyukjinKwon
Member

Merged to master.

@pralabhkumar
Contributor Author

Thanks @HyukjinKwon for reviewing and merging it.
