
[SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization #40372

Closed

Conversation

@gerashegalov (Contributor)

Ignore SQLConf initialization exceptions during Python exception creation.

Otherwise there are no diagnostics for the issue in the following scenario:

  1. Download a standard "Hadoop Free" build.
  2. Start the PySpark REPL with Hive support:
SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) \
  ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
  3. Execute any simple dataframe operation:
>>> spark.range(100).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
  4. In fact, just spark.conf already exhibits the issue:
>>> spark.conf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 347, in conf
    self._conf = RuntimeConfig(self._jsparkSession.conf())
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: <exception str() failed>
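The `<exception str() failed>` placeholder appears because the interpreter's traceback printer cannot render an exception whose `__str__` itself raises. A minimal illustration of the failure mode (a hypothetical class, not actual PySpark source):

```python
# Hypothetical illustration, not PySpark code: if __str__ raises (e.g.
# because it consults a broken SQLConf over Py4J), the interpreter cannot
# render the message and shows "<exception str() failed>" in the traceback
# instead of the real cause.
class BrokenStrException(Exception):
    def __str__(self) -> str:
        # Simulates the diagnostics lookup itself failing.
        raise RuntimeError("SQLConf lookup failed")

e = BrokenStrException("the real cause, hidden from the user")
try:
    str(e)
    str_failed = False
except RuntimeError:
    str_failed = True
print("str() raised:", str_failed)  # repr(e) would still work
```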

There are probably two issues here:

  1. Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
  2. At the very least, the user should be able to see the exception to understand the issue and take action.
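For issue (1), graceful degradation could look roughly like the sketch below. The function name, the `class_exists` probe, and the fallback behavior are illustrative assumptions, not the actual Spark logic:

```python
# Illustrative class name taken from the stacktrace in this PR.
HIVE_BUILDER = "org.apache.spark.sql.hive.HiveSessionStateBuilder"

def resolve_catalog_implementation(requested: str, class_exists) -> str:
    """Fall back to the in-memory catalog when Hive classes are absent.

    `class_exists` stands in for a JVM-side Class.forName probe.
    """
    if requested == "hive" and not class_exists(HIVE_BUILDER):
        # Warn and degrade instead of failing later with an
        # unprintable exception.
        print(f"WARN: {HIVE_BUILDER} not found; falling back to in-memory catalog")
        return "in-memory"
    return requested
```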

What changes were proposed in this pull request?

Ignore exceptions raised during CapturedException creation.
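A simplified sketch of the pattern (the actual change lives in python/pyspark/errors/exceptions/captured.py; the class and the `conf_reader` callable here are illustrative stand-ins):

```python
class CapturedExceptionSketch(Exception):
    """Simplified stand-in for PySpark's CapturedException."""

    def __init__(self, desc: str, stacktrace: str, conf_reader):
        self.desc = desc
        self.stacktrace = stacktrace
        # conf_reader is a callable that may raise, standing in for the
        # Py4J call that reads SQLConf from the JVM.
        self._conf_reader = conf_reader

    def _jvm_stacktrace_enabled(self) -> bool:
        try:
            return self._conf_reader("spark.sql.pyspark.jvmStacktrace.enabled") == "true"
        except Exception:
            # SQLConf may be broken (e.g. session state failed to build);
            # swallow the failure so __str__ can still report the cause.
            return False

    def __str__(self) -> str:
        if self._jvm_stacktrace_enabled():
            return f"{self.desc}\n\nJVM stacktrace:\n{self.stacktrace}"
        return self.desc
```

With a conf reader that raises, `str()` now degrades to the captured description instead of failing.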

Why are the changes needed?

To make the cause visible to the user:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/gits/apache/spark/python/pyspark/sql/session.py", line 679, in conf
    self._conf = RuntimeConfig(self._jsparkSession.conf())
  File "/home/user/gits/apache/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
  File "/home/user/gits/apache/spark/python/pyspark/errors/exceptions/captured.py", line 166, in deco
    raise converted from None
pyspark.errors.exceptions.captured.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':

JVM stacktrace:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1237)
        at org.apache.spark.sql.SparkSession.$anonfun$sessionState$2(SparkSession.scala:162)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:160)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:157)
        at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:185)
        at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:185)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveSessionStateBuilder
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1232)
        ... 18 more

Does this PR introduce any user-facing change?

The only semantic change is that the conf spark.sql.pyspark.jvmStacktrace.enabled is ignored if the SQLConf is broken.

How was this patch tested?

Manual testing using the repro steps above.

@zhengruifeng (Contributor):

cc @itholic @HyukjinKwon

@itholic (Contributor) left a comment:

One nit, looks good otherwise.

(Review comment on python/pyspark/errors/exceptions/captured.py, resolved.)
@srowen (Member) commented Mar 14, 2023:

Merged to master

@srowen srowen closed this in b2a7f14 Mar 14, 2023
@gerashegalov gerashegalov deleted the SPARK-42752 branch March 14, 2023 16:13
@HyukjinKwon (Member) left a comment:

LGTM2

@gerashegalov (Contributor, Author):

Thanks for the reviews and merging.
