
[FLINK-8868] [table] Support Table Function as Table Source for Stream Sql #6574

Closed
wants to merge 1 commit into from

Conversation

Xpray
Contributor

@Xpray Xpray commented Aug 17, 2018

What is the purpose of the change

Support Table Function as Table source for Stream Sql

A TableFunction might produce infinite records, so support for batch SQL should be discussed separately.
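
The unbounded nature of a table-function source can be pictured with a small, Flink-free sketch (all names here are illustrative, not Flink API): an `eval` that conceptually never terminates. A streaming runtime can consume such output incrementally, while a batch job, which must see the complete input, could never finish.

```scala
// Hypothetical sketch: a table function modelled as an unbounded iterator.
// None of these names are Flink API; this only illustrates the semantics.
object UnboundedUdtfSketch {
  // An "eval" that conceptually never terminates: an infinite stream of rows.
  def eval(start: Long): Iterator[Long] = Iterator.iterate(start)(_ + 1)

  def main(args: Array[String]): Unit = {
    // A streaming consumer only ever materializes a finite prefix.
    val firstThree = eval(0L).take(3).toList
    println(firstThree) // List(0, 1, 2)
  }
}
```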

Brief change log

  • Add new DataStreamTableFunctionScan

Verifying this change

This change added tests and can be verified as follows:
new test cases in:
org.apache.flink.table.runtime.stream.sql.SqlITCase

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? no

@Xpray Xpray changed the title [FLINK-8688] [table] Support Table Function as Table for Stream Sql [FLINK-8688] [table] Support Table Function as Table Source for Stream Sql Aug 17, 2018
@Xpray Xpray force-pushed the FLINK-8688 branch 2 times, most recently from 998834d to feaf163 Compare August 17, 2018 06:13
@alpinegizmo
Contributor

FLINK-8688 is "Enable distinct aggregation for data stream on Table/SQL API", which doesn't seem related to this PR.

@Xpray Xpray changed the title [FLINK-8688] [table] Support Table Function as Table Source for Stream Sql [FLINK-8868] [table] Support Table Function as Table Source for Stream Sql Aug 17, 2018
@Xpray
Contributor Author

Xpray commented Aug 17, 2018

It should be FLINK-8868, I'll fix this.

@Xpray Xpray force-pushed the FLINK-8688 branch 2 times, most recently from e6b2fcf to 94f7da2 Compare August 17, 2018 14:50
Contributor

@pnowojski pnowojski left a comment


Thanks for the contribution! I left a couple of comments in the code.

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
StreamITCase.clear
tEnv.registerFunction("udtf", new TableFunc2WithBase)
Contributor

Do we need this TableFunc2WithBase here? Couldn't we use TableFunc2? What does TableFunc2WithBase give us?

@@ -897,6 +897,45 @@ class SqlITCase extends StreamingWithStateTestBase {

assertEquals(List(expected.toString()), StreamITCase.testResults.sorted)
}

@Test
def tableFunctionAsSource(): Unit = {
Contributor

I think this file is already too big, and at least these new tests should be put into something like TableFunctionITCase.

val scan: FlinkLogicalTableFunctionScan = rel.asInstanceOf[FlinkLogicalTableFunctionScan]
val traitSet = rel.getTraitSet.replace(FlinkConventions.DATASTREAM)

new DataStreamTableFunctionScan(
Contributor

Should this conversion always happen? How does it play along with, for example, LogicalCorrelate nodes, DataStreamCorrelateRule, and org.apache.flink.table.api.stream.sql.CorrelateTest?

Does it work because, if the Volcano planner picks this rule in the case of a LogicalCorrelate, it is later unable to convert the LogicalCorrelate to DataStreamCorrelate and thus retracts from DataStreamTableFunctionScanRule?

Contributor Author

@Xpray Xpray Aug 20, 2018


There are two cases where a TableFunction acts like a source: one is `SELECT * FROM LATERAL TABLE(udtf())`; the other is when the left table does not correlate with the right table, so there is no LogicalCorrelate node.
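
The two query shapes being discussed can be sketched as plain query strings (table and function names here are hypothetical, only to illustrate the shapes):

```scala
// Illustrative only: the two query shapes where a table function ends up as
// the sole (or uncorrelated) input, so no LogicalCorrelate node is produced.
object TableFunctionAsSourceQueries {
  // Case 1: the table function is queried directly, with no left-side table.
  val direct = "SELECT * FROM LATERAL TABLE(udtf())"

  // Case 2: a cross join where the table function call does not reference
  // any column of the left table, so the two inputs are uncorrelated.
  val uncorrelated = "SELECT * FROM t, LATERAL TABLE(udtf())"
}
```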

@@ -897,6 +897,45 @@ class SqlITCase extends StreamingWithStateTestBase {

assertEquals(List(expected.toString()), StreamITCase.testResults.sorted)
}

@Test
Contributor

Please add tests for table API as well. I would expect there to have this error:

org.apache.flink.table.api.ValidationException: Cannot translate a query with an unbounded table function call.

	at org.apache.flink.table.api.Table.getRelNode(table.scala:94)
	at org.apache.flink.table.utils.StreamTableTestUtil.printTable(TableTestBase.scala:321)

Contributor Author

I added a test case for the Table API; a ValidationException with "TableFunction can only be used in join and leftOuterJoin" is thrown.

Contributor

Sorry, I meant the other way around. We should try to fix this for the Table API. By saying:

I would expect there to have this error:

I didn't mean that "I would like to have test asserting this validation exception", but "I think you missed testing this feature on table API and it probably will fail there with validation exception"

Contributor Author

@pnowojski, I think it's better to support SQL only for now; the Table API needs more effort.

Contributor

What's the problem with Table API here? I had a suspicion that this ValidationException:

  def getRelNode: RelNode = if (containsUnboundedUDTFCall(logicalPlan)) {
    throw new ValidationException("Cannot translate a query with an unbounded table function call.")
  } else {
    logicalPlan.toRelNode(relBuilder)
  }

is being thrown mostly as a precaution, since previously there was no execution code to support it. Now (with this PR) that will not be the case anymore. What would happen if we simply removed it?

|
| }
|
| class ${collectorName} implements ${classOf[Collector[_]].getCanonicalName} {
Contributor

Isn't this duplicated with org.apache.flink.table.plan.nodes.CommonCorrelate#generateCollector?


val functionCode =
s"""
|public class $funcName extends ${classOf[RichSourceFunction[_]].getCanonicalName} {
Contributor

And doesn't this share a lot of code with org.apache.flink.table.plan.nodes.CommonCorrelate#generateFunction?

Contributor Author

I'll do some refactoring here.

outputType: TypeInformation[T])
extends CodeGenerator(config, false, new RowTypeInfo(), None, None) {

def generateSourceFunction(
Contributor

Sorry for asking a maybe stupid question. Why do we need to generate so much code? Shouldn't most of the code here (maybe even all of it) live in standard Scala/Java classes, with the only generated piece being the RexCall as an implementation of some interface, used by some concrete Java/Scala class?

In other words, generating an implementation of RichSourceFunction seems like a bit of an overkill.

Contributor Author

I think the UDTF might produce infinite records and can do some initialization, which makes it look like a real source.
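
The "looks like a real source" argument can be illustrated with a minimal, Flink-free sketch of the pattern, assuming the usual source life cycle of one-time initialization followed by emission into a collector (the `Collector` trait and all class names here are illustrative stand-ins, not Flink API):

```scala
// Flink-free sketch: why a UDTF acting as a source resembles a source function.
// It may need one-time setup (open) and then emit records, potentially forever
// (run). All names here are illustrative, not Flink API.
trait Collector[T] { def collect(record: T): Unit }

class UdtfAsSourceSketch(limit: Long) {
  private var initialized = false

  // One-time initialization, e.g. opening connections or loading state.
  def open(): Unit = { initialized = true }

  // Emission loop; a real unbounded source could loop indefinitely,
  // here it is bounded by `limit` so the sketch terminates.
  def run(out: Collector[Long]): Unit = {
    require(initialized, "open() must be called before run()")
    var i = 0L
    while (i < limit) { out.collect(i); i += 1 }
  }
}

object UdtfAsSourceSketchDemo {
  def main(args: Array[String]): Unit = {
    val buf = scala.collection.mutable.ListBuffer.empty[Long]
    val src = new UdtfAsSourceSketch(3L)
    src.open()
    src.run(new Collector[Long] { def collect(r: Long): Unit = buf += r })
    println(buf.toList) // List(0, 1, 2)
  }
}
```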

@pnowojski
Contributor

@twalthr could you take a look at this one? Especially at the code generation parts since I have little to no experience in this regard.
