Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-11975][table-planner-blink] Support to run a sql query : 'SELECT * FROM (VALUES (1, 2, 3)) T(a, b, c)' #8035

Merged
merged 3 commits into from Mar 26, 2019

Conversation

beyond1920
Copy link
Contributor

What is the purpose of the change

Support to run a sql query : 'SELECT * FROM (VALUES (1, 2, 3)) T(a, b, c)'

Brief change log

  • add writeToSink, translateNodeDag in batch and stream tableEnv
  • introduce SinkRules for batch and stream
  • Introduce subclass of TableSink, including BaseUpsertStreamTableSink, BatchTableSink, CollectTableSink, DataStreamTableSink
  • StreamExecSink/BatchExecSink implements ExecNode interface
  • StramExecValues/BatchExecValues implements ExecNode interface, add CodeGen for Values.
  • add Itcase test infrastructure, add Itcase to run SELECT * FROM (VALUES (1, 2, 3)) T(a, b, c)' for batch and stream

Verifying this change

ITCase

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (JavaDocs)

@flinkbot
Copy link
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beyond1920 for your pull request, I left some comments.

*
* @tparam T Type of records that this [[TableSink]] expects and supports.
*/
trait BaseUpsertStreamTableSink[T] extends StreamTableSink[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modify it to java Interface?

*
* @tparam T Type of [[DataStream]] that this [[TableSink]] expects and supports.
*/
trait BatchTableSink[T] extends TableSink[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java interface?

*
* @tparam T Type of [[DataStream]] that this [[TableSink]] expects and supports.
*/
trait AppendStreamTableSink[T] extends StreamTableSink[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java interface?

Copy link
Contributor

@KurtYoung KurtYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR! I think the codes need some more refinements, left some comments

/**
* A {@link TimeConvertUtils} is used to convert Time Type.
*/
public class TimeConvertUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could keep this in table-planner-blink. And a further question is, do we really need this? What's the differences between this utility and DateTimeUtils?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeConvertUtils is to fix TimeZone problems, which is refactor in FLINK-11898, So temporally to throw an Exception here.

/**
* Returns [[DamBehavior]] of this node.
*/
override def getDamBehavior: DamBehavior = DamBehavior.PIPELINED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PIPELINED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?? I don't catch your meaning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition of PIPELINED in dam behavior is that the operator will not cache any data and all records will pass through in a pipelined fashion. For sink operator, the records will not pass through it, right? To me, the behavior is more like to be FULL_DAM, except the sink operator will not return or output any record after it consumes all the input data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DamBehavior of sink is never used actually, so it's value will not effect schedule or deadlock breaking. However, Full_DAM is a more appropriate choice in this point of view, I will update it.

@beyond1920
Copy link
Contributor Author

@JingsongLi thanks for your comments. All done, except one minor question, TableSink interfaces still in Scala instead of Java, could do it later with all sinks and sources.

@beyond1920
Copy link
Contributor Author

@KurtYoung Thanks for your comments. All done, except two minor question:

  1. BatchCompatibleStreamTableSink could be introduced later when we really need it?
  2. override def getDamBehavior: DamBehavior = DamBehavior.PIPELINED I don't understand your comment here.

@beyond1920 beyond1920 force-pushed the flink-11975 branch 3 times, most recently from 0b90cfc to b2f2f20 Compare March 25, 2019 11:06
@beyond1920 beyond1920 force-pushed the flink-11975 branch 5 times, most recently from cb3f443 to 32a1fb3 Compare March 26, 2019 05:32
@JingsongLi
Copy link
Contributor

+1 LGTM

2. Rename BaseUpsertStreamTableSink, BaseRetractStreamTableSink
3. Update codeStyle
4. other minor update based on comments
@beyond1920 beyond1920 force-pushed the flink-11975 branch 2 times, most recently from ea18552 to c6ce3fe Compare March 26, 2019 09:09
2. Move valuesInputFormat to values subpackage
3. Minor update in StreamExecSink and BatchExecSink
@KurtYoung KurtYoung merged commit 6333fc7 into apache:master Mar 26, 2019
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Apr 22, 2019
sunhaibotb pushed a commit to sunhaibotb/flink that referenced this pull request May 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants