
[FLINK-17936][table] Introduce new type inference for AS #12331

Closed
wants to merge 4 commits

Conversation

Contributor

@twalthr twalthr commented May 26, 2020

What is the purpose of the change

Implements a new type inference for AS. The PR contains the last missing pieces required to start porting the other expressions for a consistent API behavior.

Brief change log

See commit messages.

Verifying this change

This change is already covered by existing tests; additional tests have been added to InputTypeStrategiesTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? JavaDocs

@twalthr twalthr requested a review from dawidwys May 26, 2020 06:38
Collaborator

flinkbot commented May 26, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 48b24f4 (Fri Oct 16 10:56:06 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Collaborator

flinkbot commented May 26, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

Contributor

@dawidwys dawidwys left a comment


Thank you for the PR. Nice to see more PlannerExpressions being removed. I added some comments.

.map(newDataType -> {
	final Class<?> clazz = actualDataType.getConversionClass();
	final LogicalType newType = newDataType.getLogicalType();
	if (newType.supportsInputConversion(clazz) || newType.supportsOutputConversion(clazz)) {
Contributor

Shouldn't we check only supportsOutputConversion? UDF arguments are kind of "outputs", not "inputs".
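A minimal sketch of what that check could look like, reusing the lambda from the snippet above (the bridgedTo fallback is my assumption, not necessarily what the PR ends up doing):

.map(newDataType -> {
	final Class<?> clazz = actualDataType.getConversionClass();
	final LogicalType newType = newDataType.getLogicalType();
	// UDF arguments leave the table ecosystem and enter user code,
	// so only the output conversion to the user's class must hold
	if (newType.supportsOutputConversion(clazz)) {
		return newDataType.bridgedTo(clazz);
	}
	return newDataType;
})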

	// we don't know where the precision occurs (before or after the dot)
	return DataTypes.DECIMAL(precision * 2, precision);
}
return DataTypes.DECIMAL(DecimalType.MIN_PRECISION, DecimalType.MIN_SCALE);
Contributor

Why not default precision and scale?

Contributor Author

I improved the comment to add an explanation.
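To illustrate the doubling with concrete numbers (my example, not from the PR): a value known only to have precision 3 could be 123 or 0.123, so DECIMAL(6, 3) covers both extremes, whereas DECIMAL(3, 0) or DECIMAL(3, 3) would each reject one of them.

// assuming precision = 3: DECIMAL(6, 3) can hold both 123.000 and 0.123
DataTypes.DECIMAL(3 * 2, 3);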

@@ -33,6 +33,8 @@
@PublicEvolving
public final class DoubleType extends LogicalType {

	public static final int PRECISION = 15;
Contributor

Shouldn't it be 16? https://en.wikipedia.org/wiki/Double-precision_floating-point_format

With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 · log₁₀(2) ≈ 15.955).
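A quick way to see the 15-vs-16 boundary (my sketch): every 15-digit integer is below 2^53 and therefore exactly representable as a double, while some 16-digit values already collapse onto a neighbor:

// 2^53 = 9007199254740992; all 15-digit integers (< 10^15) fit exactly
System.out.println((double) 9007199254740992L); // 9.007199254740992E15
System.out.println((double) 9007199254740993L); // 9.007199254740992E15 (16 digits, rounded)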

static {
	// commonly used type roots for families
	familyToRoot.put(LogicalTypeFamily.NUMERIC, LogicalTypeRoot.INTEGER);
	familyToRoot.put(LogicalTypeFamily.EXACT_NUMERIC, LogicalTypeRoot.INTEGER);
Contributor

Why did you decide on INTEGER for NUMERIC/EXACT_NUMERIC and BINARY for BINARY_STRING? Shouldn't we use the type with the highest precision? That's what Calcite does. Calcite uses:

  • DECIMAL(MAX_PRECISION, MAX_SCALE) for NUMERIC, EXACT_NUMERIC
  • VARBINARY for BINARY_STRING

See org.apache.calcite.sql.type.SqlTypeFamily#getDefaultConcreteType
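For comparison, a hypothetical Calcite-style mapping (familyToType, the bounds, and DECIMAL(38, 18) as a stand-in for the maximum precision are all illustrative, not from the PR):

familyToType.put(LogicalTypeFamily.EXACT_NUMERIC, DataTypes.DECIMAL(38, 18)); // widest exact numeric
familyToType.put(LogicalTypeFamily.BINARY_STRING, DataTypes.VARBINARY(Integer.MAX_VALUE)); // widest binary string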

*
* <p>This method is shared with {@link FamilyArgumentTypeStrategy}.
*/
static Optional<DataType> findDataType(
Contributor

nit: Could we move that to a helper class in the strategies package? It looks a bit counter-intuitive that an unrelated class (FamilyArgumentTypeStrategy does not extend from this class) uses this method.

DataTypes.VARCHAR(1),
DataTypes.DECIMAL(10, 0),
DataTypes.DECIMAL(30, 15),
DataTypes.BOOLEAN(),
Contributor

How about we preserve the nullability? Right now there is no way to say that we accept both. Therefore, e.g., we lose the nullability info in such a case:

.inputTypeStrategy(logical(LogicalTypeRoot.BOOLEAN))
.outputTypeStrategy(TypeStrategies.argument(0))

I would suggest either:

  • if the expectedNullability in FamilyArgumentTypeStrategy/RootArgumentTypeStrategy is false, forward the nullability of the input argument
  • introduce a three-state value -> expect nullable, not null, or both (see the sketch below)
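A hypothetical shape for the three-state option (all names are mine):

/** Hypothetical three-state nullability expectation for argument type strategies. */
enum ArgumentNullability {
	NULLABLE, // argument must be nullable
	NOT_NULL, // argument must be NOT NULL
	EITHER    // accept both and forward the input argument's nullability
}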

"Unsupported argument type. Expected nullable type of root 'VARCHAR' but actual type was 'VARCHAR(5)'."),

TestSpec
	.forStrategy(
Contributor

Can we add tests for invalid cases? E.g. using FLOAT with EXACT_NUMERIC.
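For instance, a sketch of such a case (mirroring the TestSpec style used elsewhere in this thread; the description string and the exact error message are my guesses):

TestSpec
	.forStrategy(
		"Invalid approximate numeric for an exact numeric family",
		logical(LogicalTypeFamily.EXACT_NUMERIC))
	.calledWithArgumentTypes(DataTypes.FLOAT())
	.expectErrorMessage("Unsupported argument type. Expected type of family 'EXACT_NUMERIC' but actual type was 'FLOAT'."),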

sequence(
	logical(LogicalTypeFamily.CHARACTER_STRING),
	logical(LogicalTypeFamily.EXACT_NUMERIC),
	logical(LogicalTypeFamily.APPROXIMATE_NUMERIC),
Contributor

Can we add a case with e.g. BIGINT used with the APPROXIMATE_NUMERIC family? This should work due to the implicit casts, right?
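Something like this sketch (again in the TestSpec style of this thread; the description string is mine):

TestSpec
	.forStrategy(
		"Implicit cast into the approximate numeric family",
		logical(LogicalTypeFamily.APPROXIMATE_NUMERIC))
	.calledWithArgumentTypes(DataTypes.BIGINT())
	.expectArgumentTypes(DataTypes.DOUBLE()), // implicit cast BIGINT -> DOUBLE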

@@ -102,14 +102,6 @@ class CalcValidationTest extends TableTestBase {
case _: ValidationException => //ignore
}

try {
	util.addTable[(Int, Long, String)]("Table2")
		.select('_1 as '*, '_2 as 'b, '_1 as 'c)
Contributor

Why did you remove this test? Is the field reference a problem here? Can we just change it to strings?

.select('_1 as "*", '_2 as "b", '_1 as "c")

Contributor Author

It is a very unlikely case that is not worth having a separate input type strategy for.

.map(newDataType -> {
	final Class<?> clazz = actualDataType.getConversionClass();
	final LogicalType newType = newDataType.getLogicalType();
	if (newType.supportsInputConversion(clazz)) {
Contributor

Shouldn't it be OutputConversionClass? Arguments of a UDF are "outputs" of the table ecosystem.

Contributor Author

Absolutely!

} else if (Objects.equals(expectedNullability, Boolean.FALSE)) {
	return newDataType.notNull();
}
return newDataType;
Contributor

Use the nullability of the actualDataType here? Otherwise expectedNullability = null is equivalent to expectedNullability = true.

Contributor

You can see the problem with this test case:

TestSpec
	.forStrategy(
		"...",
		logical(LogicalTypeFamily.APPROXIMATE_NUMERIC))
	.calledWithArgumentTypes(
		DataTypes.BIGINT().notNull())
	.expectArgumentTypes(
		DataTypes.DOUBLE() // should be NOT NULL, it is nullable for now
	),
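A minimal sketch of the suggested fix, extending the else-if from the earlier snippet (my interpretation: a null expectation forwards the actual argument's nullability):

} else if (Objects.equals(expectedNullability, Boolean.FALSE)) {
	return newDataType.notNull();
} else if (expectedNullability == null && !actualDataType.getLogicalType().isNullable()) {
	// no explicit expectation: forward the nullability of the actual type
	return newDataType.notNull();
}
return newDataType;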

Contributor

@dawidwys dawidwys left a comment

Thanks for the update. LGTM

@twalthr twalthr closed this in b2db9ec May 26, 2020
twalthr added a commit that referenced this pull request May 26, 2020
Introduces the last missing pieces required to start porting the
other expressions for a consistent API behavior.

This closes #12331.
zhangjun0x01 pushed a commit to zhangjun0x01/flink that referenced this pull request Jul 8, 2020
Introduces the last missing pieces required to start porting the
other expressions for a consistent API behavior.

This closes apache#12331.