
Conversation

Contributor

@docete commented Nov 6, 2019

What is the purpose of the change

FLINK-14080 introduced an internal representation (SqlTimestamp) of TimestampType with precision. This subtask replaces the current long representation with SqlTimestamp and lets the blink planner support the precision of TimestampType.
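As a rough illustration (a sketch only, not the actual FLINK-14080 class, whose fields may differ), the idea is that a timestamp with precision greater than 3 no longer fits into a single long, so it is carried as epoch milliseconds plus a nano-of-millisecond component; the method names below mirror the ones referenced later in this thread (fromEpochMillis, getMillisecond, isCompact):

// Hypothetical simplification of the SqlTimestamp idea from FLINK-14080.
public final class SqlTimestampSketch {
    private final long millisecond;       // milliseconds since epoch (1970-01-01 00:00:00 UTC)
    private final int nanoOfMillisecond;  // 0..999_999, carries the sub-millisecond digits

    private SqlTimestampSketch(long millisecond, int nanoOfMillisecond) {
        this.millisecond = millisecond;
        this.nanoOfMillisecond = nanoOfMillisecond;
    }

    // Precision 0..3 needs only the millisecond part (the "compact" case).
    public static boolean isCompact(int precision) {
        return precision <= 3;
    }

    public static SqlTimestampSketch fromEpochMillis(long millisecond) {
        return new SqlTimestampSketch(millisecond, 0);
    }

    public long getMillisecond() {
        return millisecond;
    }
}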

Note:

  • Tables defined by DDL are not supported in this PR, since de/serializing DataTypes loses the precision information. A follow-up ticket will be opened to cover this.
  • Built-in time functions/operators that use long for computation are also not supported in this PR. A follow-up ticket will be opened to cover this.

Brief change log

  • The first commit (ee687db) introduces setTimestamp/getTimestamp interfaces to TypeGetterSetters/VectorizedColumnBatch and a writeTimestamp interface to BinaryWriter
  • be1beee to 0b57ee3 replace the current long with SqlTimestamp and fix all existing failure cases
  • 915064c introduces LegacyTimestampTypeInfo and LegacyLocalDateTimeTypeInfo to represent Types.SQL_TIMESTAMP and Types.LOCAL_DATE_TIME with precision
  • 3d1de92 supports precision of the Timestamp type in table sources
  • 8751385 supports precision of the Timestamp type for timestamp literals
  • 90f3316 makes the default precision of the Timestamp type 6
  • b659001 makes the interpreted string of the Timestamp type conform to the definition of a timestamp literal as defined in the SQL standard

Verifying this change

This change is already covered by existing tests and new tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Nov 6, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 212e234 (Wed Apr 15 11:35:06 UTC 2020)

Warnings:

  • 1 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Nov 6, 2019

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build

Contributor

@KurtYoung left a comment


Haven't finished yet...


val STRING_UTIL: String = className[BinaryStringUtil]

val SQL_TIMESTAMP_TERM: String = className[SqlTimestamp]
Contributor

Change it to SQL_TIMESTAMP; "term" is normally used for variable names.

Contributor Author

OK


// declaration
reusableMemberStatements.add(s"private long $fieldTerm;")
reusableMemberStatements.add(s"private $SQL_TIMESTAMP_TERM $fieldTerm;")
Contributor

Why use SqlTimestamp when we want a ReusableLocalDateTime?

Contributor Author

All java.sql.Timestamp and java.time.LocalDateTime in our engine should be represented as SqlTimestamp now.
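For illustration, a minimal sketch of what that means at the conversion boundary; everything except the java.time/java.sql APIs is an assumption here, not the PR's actual converter code:

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch: both external classes boil down to (millisecond, nanoOfMillisecond).
final class TimestampConversionSketch {
    static long millisOf(LocalDateTime ldt) {
        return ldt.toEpochSecond(ZoneOffset.UTC) * 1000 + ldt.getNano() / 1_000_000;
    }

    static int nanoOfMilliOf(LocalDateTime ldt) {
        return ldt.getNano() % 1_000_000;
    }

    static long millisOf(Timestamp ts) {
        return ts.getTime(); // already includes the millisecond part of the nanos
    }

    static int nanoOfMilliOf(Timestamp ts) {
        return ts.getNanos() % 1_000_000;
    }
}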

case SqlTypeName.TIMESTAMP =>
val reducedValue = reduced.getField(reducedIdx)
val value = if (reducedValue != null) {
Long.box(reducedValue.asInstanceOf[SqlTimestamp].getMillisecond)
Contributor

Why do we reduce the timestamp to Long rather than SqlTimestamp?

Contributor Author

@docete Nov 7, 2019

I just didn't want to break the original SQL timestamp literal logic. I think a separate commit should address this and let the SQL timestamp literal support precision.

case TIMESTAMP_WITHOUT_TIME_ZONE =>
// TODO: support Timestamp(3) now
val fieldTerm = newName("timestamp")
val millis = literalValue.asInstanceOf[Long]
Contributor

Could you explain why, when we generate a literal for the TIMESTAMP type, the value is passed in as a Long?

Contributor Author

ditto

| $contextTerm.timerService().currentProcessingTime());
|""".stripMargin.trim
// the proctime has been materialized, so it's TIMESTAMP now, not PROCTIME_INDICATOR
GeneratedExpression(resultTerm, NEVER_NULL, resultCode, new TimestampType(3))
Contributor

new TimestampType(3) -> resultType

Contributor Author

OK

Contributor

@KurtYoung left a comment

Thanks for this nice work! It brings Timestamp closer to the SQL standard. I left some comments.

@JingsongLi Please also take a look at this PR, especially code generation and blink-runtime part.

datetime = valueLiteral.getValueAs(Timestamp.class)
.orElseThrow(() -> new TableException("Invalid literal.")).toLocalDateTime();
} else {
throw new TableException("Invalid literal.");
Contributor

record the illegal class here?

Contributor Author

OK

if (precision > 9 || precision < 0) {
throw new TableException(
s"TIMESTAMP precision is not supported: ${relDataType.getPrecision}")
s"TIMESTAMP precision is not supported: ${precision}")
Contributor

don't need {}

Contributor Author

OK

case TIMESTAMP =>
if (relDataType.getPrecision > 3) {
val precision = relDataType.getPrecision
if (precision > 9 || precision < 0) {
Contributor

change 9 to TimestampType.MAX_PRECISION?

Contributor Author

OK

Int.MaxValue

// The maximal precision of TIMESTAMP is 3, change it to 9 to support nanoseconds precision
case SqlTypeName.TIMESTAMP => 9
Contributor

change 9 to TimestampType.MAX_PRECISION?

Contributor

BTW, please also check line #46, I remember our default precision for TIMESTAMP is 6?

Contributor Author

Yes. But the default precision of Timestamp in Calcite is 3.
Should we change this behavior?

Contributor

What is the impact of changing it to 9?

Contributor Author

Changed it to 6 and fixed the failures in 90f3316.

}

// TODO: we copied the logical of TimestampString::getMillisSinceEpoch since the copied
// DateTimeUtils.ymdToJulian is wrong.
Contributor

From the comment in DateTimeUtils, I think we are already aware of CALCITE-1884. Do you mean we didn't fix it in the copied DateTimeUtils?

Contributor Author

Yes. CALCITE-1884 is not fixed in our copied DateTimeUtils, and it's the root cause of some delicate cases, such as:
SELECT TIMESTAMP '1500-04-30 00:00:00.123456789' FROM docs;
SELECT CAST('1500-04-30 00:00:00.123456789' AS DATETIME(9)) FROM docs;
which should return
1500-05-10T00:00:00.123456789

Contributor

So how about fixing it in our copied DateTimeUtils?

Contributor Author

Two reasons not to fix our copied DateTimeUtils in this PR:

  1. the two copies of DateTimeUtils should remain the same in the legacy planner and the blink planner;
  2. the impact would have to be evaluated for both planners.

Contributor Author

FLINK-11935 should do this, and after that this copied code could be removed.

Contributor

I would like to create another issue for this, since I don't think the author of FLINK-11935 will be aware of this code and remove it.

LocalTimeConverter.INSTANCE.toExternal(v)

case TIMESTAMP_WITHOUT_TIME_ZONE =>
val v = literal.getValueAs(classOf[java.lang.Long])
Contributor

Why is the literal for TIMESTAMP_WITHOUT_TIME_ZONE a long? I remember that somewhere else you assume the literal is a TimestampString.

Contributor Author

Good catch. It should be TimestampString and preserve the nanosecond precision.


testSqlApi(
"ARRAY[TIMESTAMP '1985-04-11 14:15:16.1234567', TIMESTAMP '2018-07-26 17:18:19.1234567']",
"[1985-04-11T14:15:16.123456700, 2018-07-26T17:18:19.123456700]")
Contributor

Why isn't the result 1985-04-11T14:15:16.1234567?

Contributor Author

We use the LocalDateTime.toString style to represent string-format TimestampType when precision > 3. It follows one of the following ISO-8601 formats:

uuuu-MM-dd'T'HH:mm:ss.SSSSSS
uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSS
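As an illustration of where the trailing zeros come from, plain java.time prints the fractional part in groups of three digits (3, 6, or 9), which matches the expected strings in the test above:

import java.time.LocalDateTime;

public class LocalDateTimeToStringDemo {
    public static void main(String[] args) {
        // 0.1234567 seconds = 123_456_700 nanos -> rendered with 9 digits
        System.out.println(LocalDateTime.of(1985, 4, 11, 14, 15, 16, 123_456_700));
        // 1985-04-11T14:15:16.123456700

        // 0.1234 seconds = 123_400_000 nanos -> rendered with 6 digits
        System.out.println(LocalDateTime.of(1985, 4, 11, 14, 15, 16, 123_400_000));
        // 1985-04-11T14:15:16.123400
    }
}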

Contributor

Can you check whether this fits the SQL standard?

Contributor Author

The interpreted string should conform to the definition of a timestamp literal (SQL:2011 Part 2, Section 6.13, General Rules (11)(d)). I will fix it in the next commit.


testSqlApi(
"ARRAY[TIMESTAMP '1985-04-11 14:15:16.1234', TIMESTAMP '2018-07-26 17:18:19.1234']",
"[1985-04-11T14:15:16.123400, 2018-07-26T17:18:19.123400]")
Contributor

Why isn't the result 1985-04-11T14:15:16.1234?

Contributor Author

ditto

import java.time.{Instant, ZoneId}
import java.util.{Locale, TimeZone}

class TemporalTypesTest extends ExpressionTestBase {
Contributor

Could you extract all Timestamp-related tests into a dedicated class? TemporalTypesTest seems weird.

Contributor Author

So divide it into TimestampTypeTest/DateTypeTest/TimeTypeTest?

Contributor

That would be great.

Contributor Author

I double-checked TemporalTypesTest and found that some datetime functions can handle multiple datetime types (e.g. extract/floor). Splitting the class would cause tests for a single function to live in different places, so I would rename it to DateTimeTypesTest to avoid confusion.

assertIndexIsValid(pos);

if (SqlTimestamp.isCompact(precision)) {
return SqlTimestamp.fromEpochMillis(segments[0].getLong(getElementOffset(pos, 8)));
Contributor

I think always reading from segments[0] is wrong

Contributor Author

Good catch, it should be SegmentsUtil.getLong(segments, getElementOffset(pos, 8)).

@JingsongLi
Contributor

Thanks @docete, I will review this in the next two days.

@JingsongLi
Contributor

Can we put nanos of SqlTimestamp to the fixed-length part in BinaryRow? In this way:
offsetAndNanos in fixed-length part and milliseconds in variable-length part.

@docete
Contributor Author

docete commented Nov 13, 2019

Can we put nanos of SqlTimestamp to the fixed-length part in BinaryRow? In this way:
offsetAndNanos in fixed-length part and milliseconds in variable-length part.

OK. It can save 4 bytes per high precision timestamp object.
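A sketch of the proposed layout, purely illustrative (the real BinaryRow writer/reader code is not shown in this thread): the 8-byte fixed-length slot holds the offset to the variable-length part together with the nanoOfMillisecond, and the 8-byte milliseconds value goes into the variable-length part.

// Packing sketch for the fixed-length slot of a non-compact timestamp.
final class TimestampSlotPackingSketch {
    static long packOffsetAndNanos(int offsetToVariablePart, int nanoOfMillisecond) {
        return ((long) offsetToVariablePart << 32) | (nanoOfMillisecond & 0xFFFFFFFFL);
    }

    static int unpackOffset(long offsetAndNanos) {
        return (int) (offsetAndNanos >>> 32);
    }

    static int unpackNanoOfMillisecond(long offsetAndNanos) {
        return (int) (offsetAndNanos & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long slot = packOffsetAndNanos(40, 123_456);
        System.out.println(unpackOffset(slot));             // 40
        System.out.println(unpackNanoOfMillisecond(slot));  // 123456
    }
}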

import java.io.File
import java.util.TimeZone

import org.apache.calcite.util.TimestampString
Contributor

put this together with other calcite imports

import java.math.{BigDecimal => JBigDecimal}
import java.lang.{Integer => JInteger}

import org.apache.flink.table.runtime.functions.SqlDateTimeUtils
Contributor

put this together with other flink imports

val v = literal.getValueAs(classOf[java.lang.Long])
LocalDateTimeConverter.INSTANCE.toExternal(v)
val timestampString = literal.getValueAs(classOf[TimestampString])
val millis = SqlDateTimeUtils.getMillisSinceEpoch(timestampString.toString)
Contributor

Reuse the result of timestampString.toString

"CAST(f0 AS TIMESTAMP)",
"1990-10-14 00:00:00.000")
"CAST(f0 AS TIMESTAMP(3))",
"1990-10-14 00:00:00")
Contributor

Shouldn't TIMESTAMP(3) be printed as 1990-10-14 00:00:00.000?

Contributor Author

@docete Nov 13, 2019

SQL:2011 defines the interpreted string value as the shortest character string that conforms to the definition of the literal. I also checked other DBMSs:

MySQL 5.6: 'SELECT CAST(TIMESTAMP '1970-01-01 11:22:33' AS DATETIME(3))' => '1970-01-01T11:22:33Z'

PG 9.6: 'SELECT CAST('1970-01-01 11:22:33' AS TIMESTAMP(3))' => '1970-01-01T11:22:33Z'

ORACLE 11g R2: select CAST(TIMESTAMP '1970-01-01 11:22:33' AS TIMESTAMP(3)) from log_table => '1970-01-01 11:22:33.0'

MSSQL 2017: SELECT CAST({ts '1970-01-01 11:22:33.000'} AS DATETIME), CAST({ts '1970-01-01 11:22:33.000'} AS DATETIME2) => '1970-01-01T11:22:33Z', '1970-01-01 11:22:33.0000000'

Only MSSQL's DATETIME2 preserves the trailing '0's, and its precision cannot be specified. IMO we shouldn't preserve the trailing '0's.

Contributor

IMO 1990-10-14 00:00:00.000 is more intuitive, right?

Contributor Author

Ignoring the trailing '0's is more standard-compliant, since it's the shortest way to represent the literal?

Contributor

I don't know. Does the SQL standard say anything about this topic?

Contributor Author

No. It just describes the cast specification from datetime types to char/varchar types:
If SD is a datetime data type or an interval data type then let Y be the shortest character string that conforms to the definition of <literal> in Subclause 5.3, “<literal>”, and such that the interpreted value of Y is SV and the interpreted precision of Y is the precision of SD. If SV is a negative interval, then <sign> shall be specified within <unquoted interval string> in the literal Y.

<!-- Use TimestampString to cover CAST TIMESTAMP to STRING -->
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
Contributor

I would prefer not to introduce a calcite dependency for the table-runtime module.

Contributor Author

Is there anything harmful about introducing the calcite dependency?
Calcite's TimestampString provides standard-compliant processing of timestamp literals that we can depend on.

Contributor

I recall that depending on calcite in different places introduced some packaging problems during 1.9.

Contributor

+1 to not introduce calcite to runtime.

Member

+1 for not introducing calcite. We are aiming for a Calcite-free runtime to support different users' Calcite versions in the long term (shading Calcite is not easy).

case TIMESTAMP_WITHOUT_TIME_ZONE =>
generateOperatorIfNotNull(ctx, new TimestampType(), left, right) {
(l, r) => s"($l * ${MILLIS_PER_DAY}L) $op $r"
generateOperatorIfNotNull(ctx, new TimestampType(3), left, right) {
Contributor

what do you think about this?

case (TIMESTAMP_WITHOUT_TIME_ZONE, BIGINT) =>
generateUnaryOperatorIfNotNull(ctx, targetType, operand) {
operandTerm => s"((long) ($operandTerm / 1000))"
operandTerm => s"((long) ($operandTerm.getMillisecond() / 1000))"
Contributor

what do you think about this?

@KurtYoung
Contributor

Please rebase to master, we should let travis verify all the tests since you modified a lot of them.

@docete
Contributor Author

docete commented Nov 13, 2019

@KurtYoung For the Date plus Day-Time Interval case and the Cast Timestamp to BigInt case, I prefer to leave them to the next PR. The semantics of the arithmetic and cast operators for datetime types need to be double-checked, and mixing the change of only one or two cases into this PR would be incomplete.

By the way, for the semantics of TIMESTAMPADD, it seems OK for DATE +/- INTERVAL_DAY_TIME (hours, minutes and seconds) to return a TIMESTAMP. What do you think?

…rface to TypeGetterSetters/VectorizedColumnBatch and writeTimestamp interface to BinaryWriter
…sting Timestamp to Timestamp with local time zone
@docete
Contributor Author

docete commented Nov 13, 2019

Rebased to master.

Contributor

@JingsongLi left a comment


Thanks @docete.
Haven't finished yet...

return relBuilder.getRexBuilder().makeTimestampLiteral(TimestampString.fromCalendarFields(
valueAsCalendar(extractValue(valueLiteral, java.sql.Timestamp.class))), 3);
TimestampType timestampType = (TimestampType) type;
Class<?> clazz = valueLiteral.getOutputDataType().getConversionClass();
Contributor

  • First, this logic should be in extractValue.
  • Second, we should just use valueLiteral.getValueAs.

Member

We can just use valueLiteral.getValueAs(LocalDateTime.class).

} else {
throw new TableException(String.format("Invalid literal of %s.", clazz.getCanonicalName()));
}
return relBuilder.getRexBuilder().makeTimestampLiteral(
Contributor

We could have a util in SqlDateTimeUtils or another class: toTimestampString(LocalDateTime).

val value = if (reducedValue != null) {
val ts = reducedValue.asInstanceOf[SqlTimestamp]
val milliseconds = ts.getMillisecond
val nanoseconds = ts.toLocalDateTime.getNano
Contributor

Why not just convert the LocalDateTime to a TimestampString?

Contributor Author

It's a little tricky here. Take CAST('1500-04-30 00:00:00' AS TIMESTAMP(3)) as an example: we use SqlDateTimeUtils.toTimestamp [1] to cast the string to a SqlTimestamp and use the ExpressionReducer [2] to make a Timestamp literal.

Step [1] considers Gregorian cutovers, and step [2] uses TimestampString.fromMillisSinceEpoch, which calls our copied DateTimeUtils.julianToString and considers Gregorian cutovers too. Two wrongs make a correct result.

I will change toTimestamp and this ExpressionReducer to ignore Gregorian cutovers in the next PR, and that will be the final one.

Contributor

I think you can fix it, and it is very worthwhile to fix. Otherwise it will bring a lot of strange code.

generateNonNullLiteral(literalType, millis + "L", millis)
val fieldTerm = newName("timestamp")
val timestampString = literalValue.asInstanceOf[TimestampString].toString
val millis = getMillisSinceEpoch(timestampString)
Contributor

You should just convert the TimestampString to a LocalDateTime.

Contributor Author

How can we get a LocalDateTime from a TimestampString?

Contributor

In getMillisSinceEpoch, you already have the year, month, etc. Why not construct a LocalDateTime?

Contributor Author

getMillisSinceEpoch is a util function of TimestampString; we copied it because our copied DateTimeUtils causes Gregorian cutover issues. We should use TimestampString.getMillisSinceEpoch directly once we remove our copied DateTimeUtils. Do you mean we should not depend on TimestampString and should parse the timestamp string ourselves?
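As a hedged sketch of the reviewer's suggestion (the helper below is hypothetical, and it assumes the Calcite TimestampString text format "yyyy-MM-dd HH:mm:ss[.fraction]"), one could build a LocalDateTime straight from the text instead of going through millis since epoch:

import java.time.LocalDateTime;

final class TimestampStringParsingSketch {
    // Hypothetical helper: the ISO_LOCAL_DATE_TIME format only differs by the 'T' separator.
    static LocalDateTime toLocalDateTime(String timestampStringText) {
        return LocalDateTime.parse(timestampStringText.replace(' ', 'T'));
    }

    public static void main(String[] args) {
        System.out.println(toLocalDateTime("1500-04-30 12:00:00.123456789"));
        // 1500-04-30T12:00:00.123456789
    }
}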

val resultTerm = ctx.addReusableLocalVariable(resultTypeTerm, "result")
val evalResult = s"$functionReference.eval(${parameters.map(_.resultTerm).mkString(", ")})"
val evalResult =
if (returnType.getTypeRoot == LogicalTypeRoot.TIMESTAMP_WITHOUT_TIME_ZONE) {
Contributor

"Types.TIMESTAMP can be represented as long" comes from a very old document. Now TIMESTAMP_WITHOUT_TIME_ZONE cannot be represented as long; you can take a look at TimestampType.INPUT_OUTPUT_CONVERSION.

Contributor Author

I have no idea whether we should prevent a UDX with a long parameter from accepting a Timestamp. What do you think, @KurtYoung?


parameterClasses.zipWithIndex.zip(operands).map { case ((paramClass, i), operandExpr) =>
var newOperandExpr = operandExpr
if (operandExpr.resultType.getTypeRoot == LogicalTypeRoot.TIMESTAMP_WITHOUT_TIME_ZONE
Contributor

ditto

case TIMESTAMP_WITHOUT_TIME_ZONE =>
generateOperatorIfNotNull(ctx, new TimestampType(), left, right) {
(l, r) => s"($l * ${MILLIS_PER_DAY}L) $op $r"
generateOperatorIfNotNull(ctx, new TimestampType(3), left, right) {
Contributor

Agree with @KurtYoung. It's not a clean way.
BTW, you can split this PR in other ways, such as format-related changes and so on.

val paraInternalType = fromDataTypeToLogicalType(parameterType)
if (isAny(internal) && isAny(paraInternalType)) {
getDefaultExternalClassForType(internal) == getDefaultExternalClassForType(paraInternalType)
} else if ((isTimestamp(internal) && isLong(paraInternalType))
Contributor

Ditto, the logical type already defines the conversions.

assert cursor > 0 : "invalid cursor " + cursor;

// zero-out the bytes
SegmentsUtil.setLong(segments, offset + cursor, 0L);
Contributor

Move this setLong into if (value == null). This is not decimal; the timestamp is fixed-length.
The same applies to the other formats.

<!-- Use TimestampString to cover CAST TIMESTAMP to STRING -->
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
Contributor

+1 to not introduce calcite to runtime.

@JingsongLi
Contributor

Hi @docete, I think we should try our best to avoid merging code in and fixing it in the next PR.
A better way is to split out the PR's dependency work as much as possible, so that the PR can be smaller.

Member

@wuchong left a comment


Thanks @docete. I left some comments but didn't go through the PR in detail.

} else if (typeInfo instanceof BigDecimalTypeInfo) {
BigDecimalTypeInfo decimalType = (BigDecimalTypeInfo) typeInfo;
return new DecimalType(decimalType.precision(), decimalType.scale());
} else if (typeInfo instanceof LegacyLocalDateTimeTypeInfo) {
Member

The newly introduced LegacyLocalDateTimeTypeInfo is weird. In theory, we don't need such things unless somewhere converts a DataType to TypeInformation and back again. I think we should find out the root cause of why we need this, create a JIRA to fix the root cause, and add comments on these classes with the JIRA id.

Otherwise, we won't know how to remove this temporary code in the future.

}

/**
* Obtains an instance of {@code SqlTimestamp} from a millisecond.
Member

minor: I would suggest using {@link SqlTimestamp}.

<!-- Use TimestampString to cover CAST TIMESTAMP to STRING -->
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
Member

+1 for not introducing calcite. We are aiming for a Calcite-free runtime to support different users' Calcite versions in the long term (shading Calcite is not easy).

case SqlTypeName.VARCHAR | SqlTypeName.CHAR | SqlTypeName.VARBINARY | SqlTypeName.BINARY =>
Int.MaxValue

// The maximal precision of TIMESTAMP is 3, change it to 9 to support nanoseconds precision
Member

What does "the maximal precision of TIMESTAMP is 3" mean? Is that a typo?

case TIMESTAMP_WITHOUT_TIME_ZONE => DataTypes.BIGINT
case TIMESTAMP_WITHOUT_TIME_ZONE =>
val dt = argTypes(0).asInstanceOf[TimestampType]
DataTypes.TIMESTAMP(dt.getPrecision).bridgedTo(classOf[SqlTimestamp])
Member

This is a little performance sensitive, because it relates to de/serialization.
If the precision is less than 3, we can use BIGINT to get better performance (see the sketch below).

Could you improve this a bit? Or create an issue and a TODO for it.
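A minimal sketch of that suggestion, under the assumption that the compact case (precision that fits into milliseconds) can stay on the long/BIGINT path; this is hypothetical helper code, not the actual planner change:

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

// Hypothetical: choose the cheaper representation when the precision fits into millis.
final class RowtimeReturnTypeSketch {
    static DataType rowtimeReturnType(int precision, Class<?> internalTimestampClass) {
        if (precision <= 3) {
            return DataTypes.BIGINT();
        }
        return DataTypes.TIMESTAMP(precision).bridgedTo(internalTimestampClass);
    }
}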


testSqlApi(
"TIMESTAMP '1500-04-30 12:00:00.123456789'",
"1500-04-30 12:00:00.123456789")
Member

indent.

return relBuilder.getRexBuilder().makeTimestampLiteral(TimestampString.fromCalendarFields(
valueAsCalendar(extractValue(valueLiteral, java.sql.Timestamp.class))), 3);
TimestampType timestampType = (TimestampType) type;
Class<?> clazz = valueLiteral.getOutputDataType().getConversionClass();
Member

We can just use valueLiteral.getValueAs(LocalDateTime.class).

}

def generateRowtimeAccess(
def generateTimestampAccess(
Member

Why change the method name? I think the original name describes the logic more accurately.
At first glance at the new method name, I thought it was used to access a timestamp field from a row or record.

@docete
Contributor Author

docete commented Nov 20, 2019

Closing this since it has been split into two PRs.

@docete closed this Nov 20, 2019
@docete deleted the FLINK-14599 branch November 20, 2019 10:58