Skip to content

Conversation

@dawidwys
Copy link
Contributor

What is the purpose of the change

Ports PROCTIME & ROWTIME functions to the new inference stack.

Verifying this change

Added tests for the input type strategy

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Oct 31, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

Comment on lines +663 to +669
TestSpec.forStrategy(
"ROWTIME type strategy on proctime indicator",
SpecificInputTypeStrategies.windowTimeIndicator(
TimestampKind.ROWTIME))
.calledWithArgumentTypes(timeIndicatorType(TimestampKind.PROCTIME))
.expectErrorMessage(
"A proctime window cannot provide a rowtime attribute."),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should we add a test to cover the other direction? E.g. using the proctime strategy on a rowtime indicator? (Such a test passes. I'll admit that I don't quite understand how to mix and match the two.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I meant by "timeindicators are tricky".

The thing is the validation in the API layer is not perfect. It has already lots of false positives at different locations. By false positives I mean we can not fully trust if a column is or is not a proctime. That's some general background.

Now to the ROWTIME/PROCTIME. Rowtime strictly requires Watermarks to be working. Watermarks are only present if you do a rowtime window. PROCTIME on the other hand tells the time when a record was processed. This theoretically can always be returned. It could be returned both for rowtime and proctime windows.

I can add a test for that.


if (!LogicalTypeChecks.canBeTimeAttributeType(type) && !type.is(LogicalTypeRoot.BIGINT)) {
return callContext.fail(
throwOnFailure, "Reference to a rowtime or proctime window required.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case which covers this branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added tests for that

TimeIndicatorTypeInfo.ROWTIME_INDICATOR
case WindowReference(_, Some(tpe)) if tpe == Types.LONG || tpe == Types.SQL_TIMESTAMP =>
// batch time window
Types.SQL_TIMESTAMP
Copy link
Contributor

@jnh5y jnh5y Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test need to return an SQL_TIMESTAMP?

                TestSpec.forStrategy(
                                "ROWTIME type strategy on long in batch mode",
                                SpecificInputTypeStrategies.windowTimeIndicator(
                                        TimestampKind.ROWTIME))
                        .calledWithArgumentTypes(DataTypes.BIGINT())
                        .expectArgumentTypes(DataTypes.BIGINT()));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test you posted covers the input strategy not output. However you're right, it should return TIMESTAMP(3) (SQL_TIMESTAMP is a legacy type). I updated the output strategy for ROWTIME

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to check my understanding, we do not need to update the output strategy for proctime since its type is constant, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do need to update it for PROCTIME, because we always need to return PROCTIME time indicator.

I thought having timestampKind == TimestampKind.PROCTIME && !LogicalTypeChecks.isTimeAttribute(type) for input is enough, but it passes also for ROWTIME time indicator.

I'll update the output type strategy for PROCTIME

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Glad I asked, and good job finding the corner/edge case.

@dawidwys dawidwys force-pushed the flink33419 branch 2 times, most recently from 0e1e0a9 to 71a13af Compare November 9, 2023 15:34
Copy link
Contributor

@jnh5y jnh5y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments have been addressed! LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants