[SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for interval type #26418
Changes from 18 commits
@@ -409,6 +409,7 @@ case class Literal (value: Any, dataType: DataType) extends LeafExpression {
        DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone))
      s"TIMESTAMP('${formatter.format(v)}')"
    case (v: Array[Byte], BinaryType) => s"X'${DatatypeConverter.printHexBinary(v)}'"
    case (v: CalendarInterval, CalendarIntervalType) => IntervalUtils.toMultiUnitsString(v)

Review thread:
- Sorry if this is already asked above, but why didn't we change this?
- We have not supported parsing intervals from the ISO/SQL standard formats yet.
- Why did we not support the ISO/SQL standard format here together?

    case _ => value.toString
  }
}
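For reference, a small sketch (not part of the diff) of what the new `Literal.sql` branch produces; the three-argument `CalendarInterval(months, days, microseconds)` constructor and the sample values are only illustrative:

```scala
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types.CalendarIntervalType
import org.apache.spark.unsafe.types.CalendarInterval

// 14 months = 1 year 2 months; rendered through IntervalUtils.toMultiUnitsString
val iv = new CalendarInterval(14, 0, 0L)
Literal(iv, CalendarIntervalType).sql // "1 years 2 months"
```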
@@ -26,6 +26,8 @@ import com.fasterxml.jackson.core.{JsonFactory, JsonParser}
import org.apache.spark.internal.Logging
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.IntervalStyle
import org.apache.spark.sql.internal.SQLConf.IntervalStyle.IntervalStyle

Review thread:
- Unused import.

/**
 * Options for parsing JSON data into Spark SQL rows.
@@ -119,6 +119,10 @@ private[sql] class JacksonGenerator(
      (row: SpecializedGetters, ordinal: Int) =>
        gen.writeNumber(row.getDouble(ordinal))

    case CalendarIntervalType =>

Review thread:
- Let's deal with JSON in another PR.
- I guess this comment #26102 (comment) is valid for the JSON datasource as well.
- Why wasn't this comment addressed?

      (row: SpecializedGetters, ordinal: Int) =>
        gen.writeString(IntervalUtils.toMultiUnitsString(row.getInterval(ordinal)))

    case StringType =>
      (row: SpecializedGetters, ordinal: Int) =>
        gen.writeString(row.getUTF8String(ordinal).toString)
@@ -214,10 +218,15 @@ private[sql] class JacksonGenerator(
  private def writeMapData(
      map: MapData, mapType: MapType, fieldWriter: ValueWriter): Unit = {
    val keyArray = map.keyArray()
    val keyString = mapType.keyType match {
      case CalendarIntervalType =>

Review thread:
- This can't happen actually. We don't allow writing out interval values. Do you have an example that can hit this code path?
- …
- ah
- @yaooqinn, how about …

        (i: Int) => IntervalUtils.toMultiUnitsString(keyArray.getInterval(i))
      case _ => (i: Int) => keyArray.get(i, mapType.keyType).toString

Review thread:
- It's fragile to rely on …
- Ah, I am sorry I missed this cc. In JSON the key should be a string. We should either make it a string always or explicitly disallow it.
- cc @viirya I think we talked about this before.
- Yea, I think currently the map key is not very useful for some types. To make human-readable map keys, we need to do specific serialization for some map key types. Maybe I'll create a JIRA ticket to follow it up?
- Yeah, +1!
- Created https://issues.apache.org/jira/browse/SPARK-29946 to follow it up.
- This code path shouldn't be here per each map, BTW.

    }
    val valueArray = map.valueArray()
    var i = 0
    while (i < map.numElements()) {
-     gen.writeFieldName(keyArray.get(i, mapType.keyType).toString)
+     gen.writeFieldName(keyString(i))
      if (!valueArray.isNullAt(i)) {
        fieldWriter.apply(valueArray, i)
      } else {
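To make the key handling concrete, a quick illustration (not part of the diff; the sample interval and the three-argument `CalendarInterval` constructor are assumptions) of the string that `writeFieldName(keyString(i))` would emit for an interval map key:

```scala
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.unsafe.types.CalendarInterval

// an interval map key of 10 days is written as the JSON field name "10 days"
val key = IntervalUtils.toMultiUnitsString(new CalendarInterval(0, 10, 0L))
// key == "10 days", instead of the default CalendarInterval.toString used before this change
```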
@@ -17,6 +17,7 @@

package org.apache.spark.sql.catalyst.util

import java.math.BigDecimal
import java.util.concurrent.TimeUnit

import scala.util.control.NonFatal
@@ -424,6 +425,111 @@ object IntervalUtils {
    fromDoubles(interval.months / num, interval.days / num, interval.microseconds / num)
  }

  def toMultiUnitsString(interval: CalendarInterval): String = {
    if (interval.months == 0 && interval.days == 0 && interval.microseconds == 0) {
      return "0 seconds"
    }
    val sb = new StringBuilder
    if (interval.months != 0) {
      appendUnit(sb, interval.months / 12, "years")
      appendUnit(sb, interval.months % 12, "months")
    }
    appendUnit(sb, interval.days, "days")
    if (interval.microseconds != 0) {
      var rest = interval.microseconds
      appendUnit(sb, rest / MICROS_PER_HOUR, "hours")
      rest %= MICROS_PER_HOUR
      appendUnit(sb, rest / MICROS_PER_MINUTE, "minutes")
      rest %= MICROS_PER_MINUTE
      if (rest != 0) {
        val s = BigDecimal.valueOf(rest, 6).stripTrailingZeros.toPlainString
        sb.append(s).append(" seconds ")
      }
    }
    sb.setLength(sb.length - 1)
    sb.toString
  }

  private def appendUnit(sb: StringBuilder, value: Long, unit: String): Unit = {
    if (value != 0) sb.append(value).append(' ').append(unit).append(' ')
  }
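For reference, an illustrative call of the formatter above (not part of the diff; values are arbitrary and the three-argument `CalendarInterval(months, days, microseconds)` constructor is assumed):

```scala
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.unsafe.types.CalendarInterval

// 14 months, 10 days, 1.5 seconds
val iv = new CalendarInterval(14, 10, 1500000L)
IntervalUtils.toMultiUnitsString(iv)                              // "1 years 2 months 10 days 1.5 seconds"
IntervalUtils.toMultiUnitsString(new CalendarInterval(0, 0, 0L))  // "0 seconds"
```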
  def toSqlStandardString(interval: CalendarInterval): String = {
    val yearMonthPart = if (interval.months < 0) {
      val ma = math.abs(interval.months)
      "-" + ma / 12 + "-" + ma % 12
    } else if (interval.months > 0) {
      "+" + interval.months / 12 + "-" + interval.months % 12
    } else {
      ""
    }

    val dayPart = if (interval.days < 0) {
      interval.days.toString

Review thread:
- Shouldn't we add …
- Yes, it is likely.

    } else if (interval.days > 0) {
      "+" + interval.days
    } else {
      ""
    }

    val timePart = if (interval.microseconds != 0) {
      val sign = if (interval.microseconds > 0) "+" else "-"
      val sb = new StringBuilder(sign)
      var rest = math.abs(interval.microseconds)
      sb.append(rest / MICROS_PER_HOUR)
      sb.append(':')
      rest %= MICROS_PER_HOUR
      val minutes = rest / MICROS_PER_MINUTE
      if (minutes < 10) {
        sb.append(0)
      }
      sb.append(minutes)
      sb.append(':')
      rest %= MICROS_PER_MINUTE
      val bd = BigDecimal.valueOf(rest, 6)
      if (bd.compareTo(new BigDecimal(10)) < 0) {
        sb.append(0)
      }
      val s = bd.stripTrailingZeros().toPlainString
      sb.append(s)
      sb.toString()
    } else {
      ""
    }

    val intervalList = Seq(yearMonthPart, dayPart, timePart).filter(_.nonEmpty)
    if (intervalList.nonEmpty) intervalList.mkString(" ") else "0"

Review thread:
- Wow, a single …
- (reply, quoting psql:)

```
postgres=# set IntervalStyle=sql_standard;
SET
postgres=# select interval '0';
 interval
----------
 0
(1 row)

postgres=# set IntervalStyle=postgres;
SET
postgres=# select interval '0';
 interval
----------
 00:00:00
(1 row)
```

  }
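An illustrative output for the SQL standard style (again not part of the diff; the values and the `CalendarInterval` constructor are assumptions, and the exact padding follows the code above):

```scala
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.unsafe.types.CalendarInterval

// 3 years 2 months, 10 days, 11 hours 30 minutes
val iv = new CalendarInterval(38, 10, 11L * 3600000000L + 30L * 60000000L)
IntervalUtils.toSqlStandardString(iv)                              // "+3-2 +10 +11:30:00"

// an all-zero interval falls through to the single "0" discussed above
IntervalUtils.toSqlStandardString(new CalendarInterval(0, 0, 0L))  // "0"
```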
  def toIso8601String(interval: CalendarInterval): String = {
    val sb = new StringBuilder("P")

    val year = interval.months / 12
    if (year != 0) sb.append(year + "Y")
    val month = interval.months % 12
    if (month != 0) sb.append(month + "M")

    if (interval.days != 0) sb.append(interval.days + "D")

    if (interval.microseconds != 0) {
      sb.append('T')
      var rest = interval.microseconds
      val hour = rest / MICROS_PER_HOUR
      if (hour != 0) sb.append(hour + "H")
      rest %= MICROS_PER_HOUR
      val minute = rest / MICROS_PER_MINUTE
      if (minute != 0) sb.append(minute + "M")
      rest %= MICROS_PER_MINUTE
      if (rest != 0) {
        val bd = BigDecimal.valueOf(rest, 6)
        sb.append(bd.stripTrailingZeros().toPlainString + "S")
      }
    } else if (interval.days == 0 && interval.months == 0) {
      sb.append("T0S")
    }
    sb.toString()
  }
  private object ParseState extends Enumeration {
    type ParseState = Value
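And a quick check of the ISO 8601 formatter above (illustrative only; it reproduces the 'P3Y2M10DT-1S' example used in the config documentation further down):

```scala
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.unsafe.types.CalendarInterval

// 3 years 2 months, 10 days, -1 second
val iv = new CalendarInterval(38, 10, -1000000L)
IntervalUtils.toIso8601String(iv)                              // "P3Y2M10DT-1S"
IntervalUtils.toIso8601String(new CalendarInterval(0, 0, 0L))  // "PT0S"
```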
@@ -37,7 +37,6 @@ import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
import org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator
import org.apache.spark.sql.catalyst.plans.logical.HintErrorHandler
import org.apache.spark.sql.connector.catalog.CatalogManager.SESSION_CATALOG_NAME
import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy
import org.apache.spark.unsafe.array.ByteArrayMethods
import org.apache.spark.util.Utils
@@ -1784,6 +1783,23 @@
    .booleanConf
    .createWithDefault(false)

  object IntervalStyle extends Enumeration {
    type IntervalStyle = Value
    val SQL_STANDARD, ISO_8601, MULTI_UNITS = Value
  }

  val INTERVAL_STYLE = buildConf("spark.sql.intervalOutputStyle")

Review thread:
- FYI: we might need to move this config into …
- I don't know, but the behavior of this config is beyond the meaning of one …
.doc("When converting interval values to strings (i.e. for display), this config decides the" + | ||
" interval string format. The value SQL_STANDARD will produce output matching SQL standard" + | ||
" interval literals (i.e. '+3-2 +10 -00:00:01'). The value ISO_8601 will produce output" + | ||
" matching the ISO 8601 standard (i.e. 'P3Y2M10DT-1S'). The value MULTI_UNITS (which is the" + | ||
" default) will produce output in form of value unit pairs, (i.e. '3 year 2 months 10 days" + | ||
" -1 seconds'") | ||
.stringConf | ||
.transform(_.toUpperCase(Locale.ROOT)) | ||
.checkValues(IntervalStyle.values.map(_.toString)) | ||
.createWithDefault(IntervalStyle.MULTI_UNITS.toString) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I personally think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, I guess some users may already rely on the output string |

  val SORT_BEFORE_REPARTITION =
    buildConf("spark.sql.execution.sortBeforeRepartition")
      .internal()

@@ -2512,6 +2528,8 @@ class SQLConf extends Serializable with Logging {
  def storeAssignmentPolicy: StoreAssignmentPolicy.Value =
    StoreAssignmentPolicy.withName(getConf(STORE_ASSIGNMENT_POLICY))

  def intervalOutputStyle: IntervalStyle.Value = IntervalStyle.withName(getConf(INTERVAL_STYLE))

  def ansiEnabled: Boolean = getConf(ANSI_ENABLED)

  def usePostgreSQLDialect: Boolean = getConf(DIALECT) == Dialect.POSTGRESQL.toString()
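For context, a minimal sketch (not part of this diff, and not necessarily where the PR actually performs the dispatch) of how a caller could pick a formatter based on the new `intervalOutputStyle` accessor above:

```scala
import org.apache.spark.sql.catalyst.util.IntervalUtils
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.internal.SQLConf.IntervalStyle
import org.apache.spark.unsafe.types.CalendarInterval

// choose the interval-to-string formatter according to spark.sql.intervalOutputStyle
def toStyledString(iv: CalendarInterval): String = SQLConf.get.intervalOutputStyle match {
  case IntervalStyle.SQL_STANDARD => IntervalUtils.toSqlStandardString(iv)
  case IntervalStyle.ISO_8601     => IntervalUtils.toIso8601String(iv)
  case IntervalStyle.MULTI_UNITS  => IntervalUtils.toMultiUnitsString(iv)
}
```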
Review comment:
Why did we use such a string representation now? Was it in order to put the same logic into IntervalUtils? If that's the case, we didn't have to move it, but could use the toString of this class until this case becomes completely exposed.