Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-4565] [table] Support for SQL IN operator #4404

Closed
wants to merge 4 commits into from

Conversation

twalthr
Copy link
Contributor

@twalthr twalthr commented Jul 26, 2017

What is the purpose of the change

This PR adds support for the IN operator in both SQL and Table API. Both for testing in a set of elements f0.in(1, 2, 3) as well as in sub-query f0.in(table)

Brief change log

Various changes in plan translation, code generation, and API.

Verifying this change

This change added tests and can be verified as follows:

  • Unit tests: org.apache.flink.table.expressions.ScalarOperatorsTest
  • Unit tests: org.apache.flink.table.expressions.validation.ScalarOperatorsValidationTest
  • Integration tests: org.apache.flink.table.runtime.batch.sql.SetOperatorsITCase
  • Integration tests: org.apache.flink.table.runtime.batch.table.SetOperatorsITCase

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? not documented

Copy link
Contributor

@fhueske fhueske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @twalthr, the change itself looks very good. I had just one minor comment about TableReference.

I would change the tests from runtime tests to plan tests though. The change does not add a new operator, just expressions (which are separately tested) and translation to regular (tested) joins. I think we also need tests to validate the equivalence of expression and string Table API.

It would be good to extend the documentation as well.

What do you think?
Fabian

@@ -137,6 +133,72 @@ class ScalarOperatorsTest extends ScalarOperatorsTestBase {
}

@Test
def testIn(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add tests for expressions that result in false?

@@ -138,6 +138,25 @@ case class WindowReference(name: String, tpe: Option[TypeInformation[_]] = None)
override def toString: String = s"'$name"
}

case class TableReference(name: String, table: Table) extends Attribute {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableReference shouldn't extend Attribute IMO.
We can make it extend LeafExpression with NamedExpression.

@@ -225,4 +227,62 @@ class SetOperatorsITCase(
val expected = "2,1,Hi\n" + "3,2,Hello\n" + "4,2,Hello world\n"
TestBaseUtils.compareResultAsText(results.asJava, expected)
}

@Test
def testInWithFilter(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add stringexpr tests to validate equivalence of Scala expression and Java String Table API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These runtime tests could also be replaced by plan tests, IMO.

@@ -260,4 +260,108 @@ class SetOperatorsITCase(
val results = result.toDataSet[Row].collect()
TestBaseUtils.compareResultAsText(results.asJava, expected)
}

@Test
def testInWithFilter(): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate the feature with plan tests (TableTestBase) rather than runtime tests?
The feature does not require new operators, just expressions which are tested by dedicated tests and joins which are also tested.

@@ -17,9 +17,9 @@
*/
package org.apache.flink.api.scala.util

import java.sql.{Date, Time, Timestamp}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unused imports

import scala.collection.mutable
import scala.util.Random

object BatchTestData {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed if we switch to plan tests.

@twalthr
Copy link
Contributor Author

twalthr commented Aug 4, 2017

Thanks for reviewing @fhueske. I addressed your feedback and added documentation. I will merge this now...

@asfgit asfgit closed this in 4f776a2 Aug 4, 2017
<span class="label label-primary">Batch</span>
</td>
<td>
Returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the syntax be enhanced so that user can specify one column in the table with multiple columns ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "enhanced"? We cannot change the SQL standard. The user can nest another query and reduce the table to one column.

}
(expression.resultType, tableOutput.head.resultType) match {
case (lType, rType) if isNumeric(lType) && isNumeric(rType) => ValidationSuccess
case (lType, rType) if lType == rType => ValidationSuccess
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved ahead of the isNumeric() check since this check is cheaper ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yes we could do it. Feel free to submit a patch for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I open a new JIRA ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a hotfix is enough. This part is not performance critical anyway, because it happens before runtime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #4493

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twalthr:
Can you take a look at my PR ?

@tedyu
Copy link
Contributor

tedyu commented Aug 9, 2017

@fhueske
Can you take a look at #4493 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants