Skip to content

Conversation

@fhueske
Copy link
Contributor

@fhueske fhueske commented Sep 23, 2017

What is the purpose of the change

Changes the contract of the DefinedRowtimeAttribute interface. The rowtime attribute is no longer appended to the schema of the row but marks an existing field in the input that will be handled as event time attribute. The specified field must be of type Long or Timestamp. The watermarks of DataStream must be aligned to the specified field.

Brief change log

  • rowtime fields are no longer appended but an existing Long or Timestamp field is marked as event time field
  • Add checks that projections are not pushed to TableSources that implement DefinedRowtimeAttribute or DefinedProctimeAttribute
  • Added several tests for table sources with rowtime or proctime attributes
  • Updated the documenation

Verifying this change

  • Check the added test cases

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no

Documentation

  • Documentation is updated according to the changes of the PR.

@haohui
Copy link

haohui commented Sep 24, 2017

LGTM overall +1.

One question: since we now cast ROWTIME / PROCTIME directly to LONG, I wonder, do we want to revisit the decision that creates dedicated types for ROWTIME / PROCTIME?

Copy link
Contributor

@twalthr twalthr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @fhueske. Looks very good! I had just minor comments. One last thing: Could could you also add a combined TableSource that implements batch and stream table source + defined time attributes?

#### During DataStream-to-Table Conversion

The event time attribute is defined with the `.rowtime` property during schema definition.
The event time attribute is defined with the `.rowtime` property during schema definition. Timestamps and watermarks must have been assigned in the `DataStream` that is converted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a link to dev/event_timestamps_watermarks again?

case idx =>
// regular attribute. Access attribute in input data type.
generateInputAccess(input1, input1Term, idx)
// get type of result field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that this is needed for TableSource?

// Change output type to rowtime indicator
if (FlinkTypeFactory.isRowtimeIndicatorType(outType) &&
(inputAccess.resultType == Types.LONG || inputAccess.resultType == Types.SQL_TIMESTAMP)) {
// Hard cast possible because LONG, TIMESTAMP, and ROW_TIMEINDICATOR are internally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROWTIME_INDICATOR

tableSource match {
case s: StreamTableSource[_] =>
StreamTableSourceTable.deriveRowTypeOfTableSource(s, flinkTypeFactory)
case b: BatchTableSource[_] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace b with _ to remove IDE warning.

val scan: FlinkLogicalTableSourceScan = call.rel(1).asInstanceOf[FlinkLogicalTableSourceScan]
scan.tableSource match {
// projection pushdown is not supported for sources that provide time indicators
case r: DefinedRowtimeAttribute if r.getRowtimeAttribute != null => false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a follow-up issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes will do, but it is not easy to solve IMO.

val original = TableEnvironment.getFieldIndices(tableSource)

// append rowtime marker
val withRowtime = if (rowtime.isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused rowtime

* Defines a name of the event-time attribute that represents Flink's
* event-time. Null if no rowtime should be available.
* Defines a name of the event-time attribute that represents Flink's event-time, i.e., an
* attribute that is aligned with the watermarks of the table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is aligned with the watermarks of the underlying DataStream?

* The method should return null if no rowtime attribute is defined.
*
* @return The name of the field that represents the event-time field and which is aligned
* with the watermarks of the table. The field must be present in the schema of the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some here.

trait DefinedRowtimeAttribute {

/**
* Defines a name of the event-time attribute that represents Flink's
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also update the docs of the class?

}

@Test
def testProjectableRowTimeTableSource(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does filter push down work? With rowtime and proctime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that should not be a problem. Projection push-down is not possible because the schema of the table is partially constructed inside of the Table API (e.g., appending the proctime attribute).

@twalthr
Copy link
Contributor

twalthr commented Sep 25, 2017

@haohui We only cast ROWTIME / PROCTIME directly to LONG during runtime, the special types are needed during pre-flight phase and validation. We could not come up with a better solution that ensures that watermarks stay aligned with the rowtime.

@fhueske
Copy link
Contributor Author

fhueske commented Sep 26, 2017

Thanks for the review @twalthr.
I've updated the PR.

@haohui: This PR preserves the current logic that time attributes are exposed as TIMESTAMP. I agree that support for time indicators that expose themselves as Long is desirable. However, this requires quite a few changes as we need to extend several functions (incl. TUMBLE, HOP, etc.) and validation logic in some operators (over windows, joins, etc.). So this is not a lightweight change and should be done as a separate issue, IMO.

@wuchong
Copy link
Member

wuchong commented Sep 27, 2017

Looks good to me.

+1

@fhueske
Copy link
Contributor Author

fhueske commented Oct 4, 2017

Thanks @wuchong
I will merge this PR tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants