[FLINK-9344] [table] Support INTERSECT and INTERSECT ALL for streaming #5998

Closed
wants to merge 1 commit into from

Contributor

@Xpray Xpray commented May 12, 2018

[FLINK-9344] [TableAPI & SQL] Support INTERSECT and INTERSECT ALL for streaming

What is the purpose of the change

Support Intersect and Intersect All for Streaming SQL and TableAPI

Brief change log

  • implemented NonWindowIntersect

Verifying this change

This change adds tests and can be verified by running the intersect cases in both
org.apache.flink.table.runtime.stream.sql.SetOperatorsITCase and
org.apache.flink.table.runtime.stream.table.SetOperatorsITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not yet; it will be documented in a follow-up issue

Contributor

@walterddr walterddr left a comment

Thanks @Xpray for the contribution. It looks pretty good! I just left a few comments and questions.

I am confused by the JIRA ticket description, as it doesn't specify whether you are supporting unbounded intersect, windowed intersect, or both.
A brief description would be very helpful here, for other reviewers as well.

--
Rong

}

override def toString: String = {
s"Intersect$intersectType(intersect$intersectType: ($intersectSelectionToString))"
Contributor

s"Intersect$intersectType(intersect: ($intersectSelectionToString))"
I don't think you need to include the type twice.

with DataStreamRel {

private lazy val intersectType = if (all) {
"All"
Contributor

" All" might be better formatting since you only attached this to the explainTerm and toString method

import org.apache.flink.types.Row
import org.apache.flink.util.Collector

class StreamIntersectCoProcessFunction(
Contributor

Missing JavaDoc

resultType: TypeInformation[Row],
queryConfig: StreamQueryConfig,
all: Boolean)
extends CoProcessFunction[CRow, CRow, CRow]
Contributor

I guess I am confused here:

There's CoGroupedStream with a custom CoGroupFunction, which is already supported in the DataStream API. It seems that if we operate on a windowed stream, we can apply the intersect as a CoGroupFunction. Is this function solely targeting the non-windowed intersect case? If so, can we rename the function? (This also adds to my point: please add JavaDoc.)

Contributor

I think it makes sense to have two implementations of this operator.

  1. For tables with a time attribute. This implementation works without retraction and can automatically clean up the state.
  2. For tables without time attributes. This implementation needs to clean up state based on retention time and produces retractions.

This PR seems to address both cases, which is fine for now. We can improve case 1 later on. Both cases should be implemented as a CoProcessFunction. We should try to be independent of the DataStream window operators, IMO.
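
A minimal sketch of the second case (no time attributes, retractions produced), written as a plain-Scala model rather than Flink code. The names `Change`, `IntersectModel`, `onLeft`, and `onRight` are hypothetical, and the per-row counting is an assumption about how such an operator could track its state, not the implementation in this PR. State cleanup is left out here.

```scala
import scala.collection.mutable

// Hypothetical change message: add = true accumulates a row, add = false retracts it.
final case class Change[A](row: A, add: Boolean)

// Plain-Scala model of a non-windowed INTERSECT / INTERSECT ALL; not Flink code.
final class IntersectModel[A](all: Boolean) {
  // Per-row counts for each input, analogous to keyed state in an operator.
  private val left = mutable.Map.empty[A, Long].withDefaultValue(0L)
  private val right = mutable.Map.empty[A, Long].withDefaultValue(0L)

  // How many copies of `row` the result currently contains.
  private def multiplicity(row: A): Long = {
    val m = math.min(left(row), right(row))
    if (all) m else math.min(m, 1L)
  }

  private def update(side: mutable.Map[A, Long], c: Change[A]): Seq[Change[A]] = {
    val before = multiplicity(c.row)
    val next = math.max(0L, side(c.row) + (if (c.add) 1L else -1L))
    if (next == 0L) side.remove(c.row) else side.update(c.row, next)
    val delta = multiplicity(c.row) - before
    // Positive delta yields accumulate messages, negative delta yields retractions.
    Seq.fill(math.abs(delta).toInt)(Change(c.row, delta > 0))
  }

  def onLeft(c: Change[A]): Seq[Change[A]] = update(left, c)
  def onRight(c: Change[A]): Seq[Change[A]] = update(right, c)
}
```

With `all = false` a row is emitted once, when it first appears on both sides, and retracted when either side's count drops to zero. With `all = true` the result holds the minimum of the two counts.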

Contributor Author

Thanks for the review, @walterddr and @fhueske. This PR intends to support NonWindow intersect, just like NonWindow innerJoin.

}
}

private def expireOutTimeRow(
Contributor

I don't believe any of your tests trigger this code path.

Contributor

Perhaps consider overriding the queryConfig to trigger it.
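
One way to picture the expiry path is a test that uses a very short retention time and then advances time past it. The sketch below is a plain-Scala model with hypothetical names (`ExpiringCounts`, `increment`, `expire`). It assumes state is dropped once a key has been idle for at least the retention time, and it does not reproduce the PR's `expireOutTimeRow` or Flink's timer mechanics.

```scala
import scala.collection.mutable

// Hypothetical model of idle-state retention; names are illustrative, not Flink's.
final class ExpiringCounts[K](retentionMillis: Long) {
  // key -> (count, timestamp of last access)
  private val state = mutable.Map.empty[K, (Long, Long)]

  // Records one more occurrence of `key` at time `now` and returns the new count.
  def increment(key: K, now: Long): Long = {
    val count = state.get(key).map(_._1).getOrElse(0L) + 1L
    state.update(key, (count, now))
    count
  }

  // Stand-in for a cleanup timer firing at time `now`:
  // drops every key idle for at least the retention time and returns those keys.
  def expire(now: Long): Set[K] = {
    val idle = state.collect {
      case (k, (_, last)) if now - last >= retentionMillis => k
    }.toSet
    idle.foreach(k => state.remove(k))
    idle
  }

  def contains(key: K): Boolean = state.contains(key)
}
```

In an actual test, the equivalent step would be to configure a short idle-state retention on the query config so that the cleanup runs within the test's lifetime.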


validateEqualsHashCode("intersect", resultType)

// state to hold left stream element
Contributor

The description is misleading; you are not actually holding the "row" of the stream element, if I understand correctly.

@Xpray Xpray force-pushed the FLINK-9344 branch 4 times, most recently from bb72574 to 492f7b6 Compare May 25, 2018 15:55
Contributor Author

Xpray commented May 25, 2018

Thanks for the review, @walterddr @fhueske. I've updated the PR.

Contributor Author

Xpray commented Jul 30, 2018

@fhueske, I would like to support MINUS / MINUS ALL after this issue. Would you give some suggestions on this issue?

@twalthr twalthr changed the title from "[FLINK-9344] [TableAPI & SQL] Support INTERSECT and INTERSECT ALL for streaming" to "[FLINK-9344] [table] Support INTERSECT and INTERSECT ALL for streaming" on Jul 31, 2018
@KurtYoung KurtYoung closed this Jan 30, 2019