[SPARK-39319][CORE][SQL] Make query contexts as a part of SparkThrowable #37209

Closed · wants to merge 34 commits
Changes from 22 commits

Commits (34):
2ddaf26  Add QueryContext (MaxGekk, Jul 17, 2022)
9575a41  Make the compiler happy (MaxGekk, Jul 17, 2022)
51e7813  queryContext -> _queryContext (MaxGekk, Jul 17, 2022)
48cc9eb  Add SqlQueryContext (MaxGekk, Jul 17, 2022)
758ecde  Re-gen sql.out (MaxGekk, Jul 17, 2022)
a6ab09f  Context for DIVIDE_BY_ZERO (MaxGekk, Jul 17, 2022)
2dc083a  Context for arithmetic overflow (MaxGekk, Jul 18, 2022)
c421925  Context for CANNOT_CHANGE_DECIMAL_PRECISION (MaxGekk, Jul 18, 2022)
0b26f54  Context for INVALID_ARRAY_INDEX (MaxGekk, Jul 18, 2022)
d4431f6  Context for MAP_KEY_DOES_NOT_EXIST (MaxGekk, Jul 18, 2022)
cfc1ad2  Context for ARITHMETIC_OVERFLOW (MaxGekk, Jul 18, 2022)
d77a7f6  Context for CAST_INVALID_INPUT of boolean (MaxGekk, Jul 18, 2022)
508cea4  Context for CAST_INVALID_INPUT (MaxGekk, Jul 19, 2022)
29e9832  Remove _context (MaxGekk, Jul 19, 2022)
27df456  Remove _context from TreeNode (MaxGekk, Jul 19, 2022)
ee98642  Add comments (MaxGekk, Jul 19, 2022)
694c272  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 19, 2022)
c6d3a5a  Fix DecimalDivideWithOverflowCheck (MaxGekk, Jul 19, 2022)
d0542c0  Refactoring: rename to context (MaxGekk, Jul 21, 2022)
36a2fbf  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 21, 2022)
4338354  Override def getQueryContext (MaxGekk, Jul 21, 2022)
18f767f  Split QueryContext (MaxGekk, Jul 22, 2022)
d02cdeb  Use org.apache.spark._ (MaxGekk, Jul 22, 2022)
4d74636  Check all context fields in TreeNodeSuite (MaxGekk, Jul 22, 2022)
a93e2d4  Revert error context code (MaxGekk, Jul 22, 2022)
d1f9276  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 22, 2022)
59a284e  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 23, 2022)
9159969  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 24, 2022)
0c92987  Rename SqlQueryContext to SQLQueryContext (MaxGekk, Jul 25, 2022)
8dd5b60  lazy val context -> val context (MaxGekk, Jul 25, 2022)
113c562  def summary -> lazy val summary (MaxGekk, Jul 25, 2022)
248a2fa  Make QueryContextSummary as a private interface (MaxGekk, Jul 25, 2022)
c533267  Remove QueryContextSummary (MaxGekk, Jul 26, 2022)
7095af5  Merge remote-tracking branch 'origin/master' into query-context-in-sp… (MaxGekk, Jul 26, 2022)
48 changes: 48 additions & 0 deletions core/src/main/java/org/apache/spark/QueryContext.java
@@ -0,0 +1,48 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark;

import org.apache.spark.annotation.Evolving;

/**
 * Query context of a {@link SparkThrowable}. It helps users understand where errors occur
* while executing queries.
*
* @since 3.4.0
*/
@Evolving
public interface QueryContext {
// The object type of the query which throws the exception.
// If the exception is directly from the main query, it should be an empty string.
// Otherwise, it should be the exact object type in upper case. For example, a "VIEW".
[Contributor] not related to this PR, but we should use javadoc for public APIs, which will show up in our API doc. The same for SparkThrowable. We can fix them all in a follow-up.
String objectType();

// The object name of the query which throws the exception.
// If the exception is directly from the main query, it should be an empty string.
// Otherwise, it should be the object name. For example, a view name "V1".
String objectName();

// The starting index in the query text which throws the exception. The index starts from 0.
int startIndex();

// The stopping index in the query which throws the exception. The index starts from 0.
int stopIndex();

// The corresponding fragment of the query which throws the exception.
String fragment();
}
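For illustration, a minimal Scala sketch of a class satisfying this contract; ExampleContext and all values below are hypothetical, not part of this PR:

    import org.apache.spark.QueryContext

    // Scala vals implement the interface's nullary Java methods.
    case class ExampleContext(
        objectType: String,  // "" when the error comes from the main query
        objectName: String,  // e.g. a view name such as "v1"
        startIndex: Int,     // 0-based start offset into the query text
        stopIndex: Int,      // 0-based stop offset into the query text
        fragment: String)    // the query fragment that raised the error
      extends QueryContext

    val ctx: QueryContext = ExampleContext("VIEW", "v1", 7, 11, "1 / 0")
    println(s"${ctx.objectType()} ${ctx.objectName()}: ${ctx.fragment()}")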
31 changes: 31 additions & 0 deletions core/src/main/java/org/apache/spark/QueryContextSummary.java
@@ -0,0 +1,31 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark;

import org.apache.spark.annotation.Evolving;

/**
 * Builds a textual summary from a query context.
*
* @since 3.4.0
*/
@Evolving
public interface QueryContextSummary extends QueryContext {
[Contributor] does this need to be a public API?

[Member Author] Made it a private one.

[Contributor] shall we use private[spark]? to make sure it won't show up in the API doc

[Member Author] This is Java. Does such syntax work here?

[Contributor] Since it's for internal use only, we can write it in Scala.

[Member] +1, I don't think we need this interface. Having SQLQueryContext is enough.

[Member Author] Let me try to remove it ...

// Textual summary of the query context
String summary();
}
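For readers following the thread above, a small sketch of the Scala-only private[spark] access modifier being discussed; the trait name is invented:

    package org.apache.spark.sql.catalyst.trees

    // Visible to any code under org.apache.spark, but excluded from the
    // public API surface and hence from the generated API docs.
    private[spark] trait ExampleSummary {
      def summary(): String
    }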
2 changes: 2 additions & 0 deletions core/src/main/java/org/apache/spark/SparkThrowable.java
@@ -59,4 +59,6 @@ default String[] getMessageParameters() {
default String[] getParameterNames() {
return SparkThrowableHelper.getParameterNames(this.getErrorClass(), this.getErrorSubClass());
}

default QueryContext[] getQueryContext() { return new QueryContext[0]; }
}
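A hedged sketch of how client code might consume the new default method; runQuery is a hypothetical stand-in for anything that executes a statement and may throw:

    import org.apache.spark.SparkThrowable

    def runQuery(sql: String): Unit = ???  // hypothetical

    try runQuery("SELECT 1 / 0") catch {
      case e: SparkThrowable =>
        // Safe even for exceptions without a context: the default
        // implementation returns an empty array.
        e.getQueryContext.foreach { ctx =>
          println(s"${ctx.objectType()} ${ctx.objectName()} " +
            s"[${ctx.startIndex()}..${ctx.stopIndex()}]: ${ctx.fragment()}")
        }
    }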
@@ -39,8 +39,7 @@ public SparkOutOfMemoryError(OutOfMemoryError e) {
}

public SparkOutOfMemoryError(String errorClass, String[] messageParameters) {
super(SparkThrowableHelper.getMessage(errorClass, null,
messageParameters, ""));
super(SparkThrowableHelper.getMessage(errorClass, null, messageParameters));
this.errorClass = errorClass;
this.messageParameters = messageParameters;
}
16 changes: 10 additions & 6 deletions core/src/main/scala/org/apache/spark/ErrorInfo.scala
@@ -71,11 +71,18 @@ private[spark] object SparkThrowableHelper {
mapper.readValue(errorClassesUrl, new TypeReference[SortedMap[String, ErrorInfo]]() {})
}

def getMessage(
errorClass: String,
errorSubClass: String,
messageParameters: Array[String]): String = {
getMessage(errorClass, errorSubClass, messageParameters, None)
}

def getMessage(
errorClass: String,
errorSubClass: String,
messageParameters: Array[String],
queryContext: String = ""): String = {
context: Option[QueryContextSummary]): String = {
val errorInfo = errorClassToInfoMap.getOrElse(errorClass,
throw new IllegalArgumentException(s"Cannot find error class '$errorClass'"))
val (displayClass, displayMessageParameters, displayFormat) = if (errorInfo.subClass.isEmpty) {
@@ -93,11 +100,8 @@
val displayMessage = String.format(
displayFormat.replaceAll("<[a-zA-Z0-9_-]+>", "%s"),
displayMessageParameters : _*)
val displayQueryContext = if (queryContext.isEmpty) {
""
} else {
s"\n$queryContext"
}
val displayQueryContext = context.map(q => "\n" + q.summary()).getOrElse("")

s"[$displayClass] $displayMessage$displayQueryContext"
}

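To make the two getMessage shapes concrete, a hedged sketch; the error class and parameter value are illustrative only:

    val someContext: Option[QueryContextSummary] = None  // or Some(ctx) built elsewhere

    // No context: yields "[DIVIDE_BY_ZERO] <formatted message>".
    SparkThrowableHelper.getMessage(
      "DIVIDE_BY_ZERO", null, Array("\"spark.sql.ansi.enabled\""))

    // With a context: its summary() is appended after a newline;
    // None degrades to the plain message.
    SparkThrowableHelper.getMessage(
      "DIVIDE_BY_ZERO", null, Array("\"spark.sql.ansi.enabled\""), someContext)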
52 changes: 27 additions & 25 deletions core/src/main/scala/org/apache/spark/SparkException.scala
@@ -119,15 +119,16 @@ private[spark] class SparkArithmeticException(
errorClass: String,
errorSubClass: Option[String] = None,
messageParameters: Array[String],
queryContext: String = "")
context: Option[QueryContextSummary] = None)
extends ArithmeticException(
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull,
messageParameters, queryContext))
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context))
with SparkThrowable {

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull}
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}
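The remaining exception classes below follow the same pattern; a hedged sketch of raising one with an optional context and reading it back (only callable from code inside org.apache.spark packages, since the class is private[spark]; the values are illustrative):

    val e = new SparkArithmeticException(
      errorClass = "DIVIDE_BY_ZERO",
      messageParameters = Array("\"spark.sql.ansi.enabled\""),
      context = None)  // or Some(summary) captured from an expression's Origin

    e.getQueryContext  // empty here; a one-element array when context is defined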

/**
* Unsupported operation exception thrown from Spark with an error class.
@@ -193,15 +194,16 @@ private[spark] class SparkDateTimeException(
errorClass: String,
errorSubClass: Option[String] = None,
messageParameters: Array[String],
queryContext: String = "")
context: Option[QueryContextSummary] = None)
extends DateTimeException(
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull,
messageParameters, queryContext))
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context))
with SparkThrowable {

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull}
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}

/**
* Hadoop file already exists exception thrown from Spark with an error class.
@@ -240,15 +242,16 @@ private[spark] class SparkNumberFormatException(
errorClass: String,
errorSubClass: Option[String] = None,
messageParameters: Array[String],
queryContext: String)
context: Option[QueryContextSummary])
extends NumberFormatException(
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull,
messageParameters, queryContext))
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context))
with SparkThrowable {

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull}
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}

/**
* No such method exception thrown from Spark with an error class.
@@ -317,23 +320,22 @@ private[spark] class SparkRuntimeException(
errorSubClass: Option[String] = None,
messageParameters: Array[String],
cause: Throwable = null,
queryContext: String = "")
context: Option[QueryContextSummary] = None)
extends RuntimeException(
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull,
messageParameters, queryContext),
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context),
cause)
with SparkThrowable {

def this(errorClass: String,
errorSubClass: String,
messageParameters: Array[String],
cause: Throwable,
queryContext: String)
context: Option[QueryContextSummary])
= this(errorClass = errorClass,
errorSubClass = Some(errorSubClass),
messageParameters = messageParameters,
cause = cause,
queryContext = queryContext)
context = context)

def this(errorClass: String,
errorSubClass: String,
@@ -342,11 +344,12 @@ private[spark] class SparkRuntimeException(
errorSubClass = Some(errorSubClass),
messageParameters = messageParameters,
cause = null,
queryContext = "")
context = None)

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}

/**
@@ -372,16 +375,15 @@ private[spark] class SparkArrayIndexOutOfBoundsException(
errorClass: String,
errorSubClass: Option[String] = None,
messageParameters: Array[String],
queryContext: String)
context: Option[QueryContextSummary])
extends ArrayIndexOutOfBoundsException(
// scalastyle:off line.size.limit
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, queryContext))
// scalastyle:on line.size.limit
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context))
with SparkThrowable {

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}

/**
@@ -413,15 +415,15 @@ private[spark] class SparkNoSuchElementException(
errorClass: String,
errorSubClass: Option[String] = None,
messageParameters: Array[String],
queryContext: String)
context: Option[QueryContextSummary])
extends NoSuchElementException(
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull,
messageParameters, queryContext))
SparkThrowableHelper.getMessage(errorClass, errorSubClass.orNull, messageParameters, context))
with SparkThrowable {

override def getMessageParameters: Array[String] = messageParameters
override def getErrorClass: String = errorClass
override def getErrorSubClass: String = errorSubClass.orNull
override def getQueryContext: Array[QueryContext] = context.toArray
}

/**
@@ -25,7 +25,7 @@ import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion}
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.catalyst.trees.TreeNodeTag
import org.apache.spark.sql.catalyst.trees.{SqlQueryContext, TreeNodeTag}
import org.apache.spark.sql.catalyst.trees.TreePattern._
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
@@ -481,10 +481,10 @@ case class Cast(

override def nullable: Boolean = child.nullable || Cast.forceNullable(child.dataType, dataType)

override def initQueryContext(): String = if (ansiEnabled) {
origin.context
override def initQueryContext(): Option[SqlQueryContext] = if (ansiEnabled) {
Some(origin.context)
} else {
""
None
}

// When this cast involves TimeZone, it's only resolved if the timeZoneId is set;
@@ -995,9 +995,12 @@ case class Cast(
* If overflow occurs, if `spark.sql.ansi.enabled` is false, null is returned;
* otherwise, an `ArithmeticException` is thrown.
*/
private[this] def toPrecision(value: Decimal, decimalType: DecimalType): Decimal =
private[this] def toPrecision(
value: Decimal,
decimalType: DecimalType,
context: Option[SqlQueryContext]): Decimal =
value.toPrecision(
decimalType.precision, decimalType.scale, Decimal.ROUND_HALF_UP, !ansiEnabled)
decimalType.precision, decimalType.scale, Decimal.ROUND_HALF_UP, !ansiEnabled, context)


private[this] def castToDecimal(from: DataType, target: DecimalType): Any => Any = from match {
@@ -1010,14 +1013,15 @@
buildCast[UTF8String](_,
s => changePrecision(Decimal.fromStringANSI(s, target, queryContext), target))
case BooleanType =>
buildCast[Boolean](_, b => toPrecision(if (b) Decimal.ONE else Decimal.ZERO, target))
buildCast[Boolean](_,
b => toPrecision(if (b) Decimal.ONE else Decimal.ZERO, target, queryContext))
case DateType =>
buildCast[Int](_, d => null) // date can't cast to decimal in Hive
case TimestampType =>
// Note that we lose precision here.
buildCast[Long](_, t => changePrecision(Decimal(timestampToDouble(t)), target))
case dt: DecimalType =>
b => toPrecision(b.asInstanceOf[Decimal], target)
b => toPrecision(b.asInstanceOf[Decimal], target, queryContext)
case t: IntegralType =>
b => changePrecision(Decimal(t.integral.asInstanceOf[Integral[Any]].toLong(b)), target)
case x: FractionalType =>
@@ -24,7 +24,7 @@ import org.apache.spark.sql.catalyst.analysis.{FunctionRegistry, TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction
import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.expressions.codegen.Block._
import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, QuaternaryLike, TernaryLike, TreeNode, UnaryLike}
import org.apache.spark.sql.catalyst.trees.{BinaryLike, LeafLike, QuaternaryLike, SqlQueryContext, TernaryLike, TreeNode, UnaryLike}
import org.apache.spark.sql.catalyst.trees.TreePattern.{RUNTIME_REPLACEABLE, TreePattern}
import org.apache.spark.sql.catalyst.util.truncatedString
import org.apache.spark.sql.errors.QueryExecutionErrors
@@ -593,9 +593,9 @@ abstract class UnaryExpression extends Expression with UnaryLike[Expression] {
* to executors. It will also be kept after rule transforms.
*/
trait SupportQueryContext extends Expression with Serializable {
protected var queryContext: String = initQueryContext()
protected var queryContext: Option[SqlQueryContext] = initQueryContext()

def initQueryContext(): String
def initQueryContext(): Option[SqlQueryContext]

// Note: Even though query contexts are serialized to executors, it will be regenerated from an
// empty "Origin" during rule transforms since "Origin"s are not serialized to executors
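The trait's capture-once behavior can be mirrored in a self-contained toy, sketched below; every name is an invented stand-in, not a Spark API:

    // Stand-in for SqlQueryContext.
    case class ToyContext(fragment: String)

    // Mirrors SupportQueryContext: the context is computed exactly once,
    // when the node is constructed on the driver, and then travels with
    // the serialized object.
    trait ToySupportQueryContext extends Serializable {
      protected var queryContext: Option[ToyContext] = initQueryContext()
      def initQueryContext(): Option[ToyContext]
    }

    class ToyCast(ansiEnabled: Boolean, origin: Option[ToyContext])
        extends ToySupportQueryContext {
      // Like Cast above: only capture a context when ANSI mode can fail.
      override def initQueryContext(): Option[ToyContext] =
        if (ansiEnabled) origin else None
    }

    val c = new ToyCast(ansiEnabled = true, origin = Some(ToyContext("CAST('x' AS INT)")))
    assert(c.queryContext.exists(_.fragment == "CAST('x' AS INT)"))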