-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-17936][table] Introduce new type inference for AS #12331
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 48b24f4 (Fri Oct 16 10:56:06 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR. Nice to see more PlannerExpressions
being removed. I added some comments.
.map(newDataType -> { | ||
final Class<?> clazz = actualDataType.getConversionClass(); | ||
final LogicalType newType = newDataType.getLogicalType(); | ||
if (newType.supportsInputConversion(clazz) || newType.supportsOutputConversion(clazz)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we check only the supportsOutputConversion
? UDF arguments are kind of "outputs" not "inputs".
// we don't know where the precision occurs (before or after the dot) | ||
return DataTypes.DECIMAL(precision * 2, precision); | ||
} | ||
return DataTypes.DECIMAL(DecimalType.MIN_PRECISION, DecimalType.MIN_SCALE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not default precision and scale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improved the comment for an explanation.
@@ -33,6 +33,8 @@ | |||
@PublicEvolving | |||
public final class DoubleType extends LogicalType { | |||
|
|||
public static final int PRECISION = 15; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be 16? https://en.wikipedia.org/wiki/Double-precision_floating-point_format
With the 52 bits of the fraction (F) significand appearing in the memory format, the total precision is therefore 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955).
static { | ||
// commonly used type roots for families | ||
familyToRoot.put(LogicalTypeFamily.NUMERIC, LogicalTypeRoot.INTEGER); | ||
familyToRoot.put(LogicalTypeFamily.EXACT_NUMERIC, LogicalTypeRoot.INTEGER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you decide for INTEGER
for NUMERIC
, EXACT_NUMERIC
and BINARY
for BINARY_STRING
? Shouldn't we use the type with the highest precision? That's what Calcite does. Calcite uses:
DECIMAL(MAX_PRECISION, MAX_SCALE)
forNUMERIC
,EXACT_NUMERIC
VARBINARY
forBINARY_STRING
See org.apache.calcite.sql.type.SqlTypeFamily#getDefaultConcreteType
* | ||
* <p>This method is shared with {@link FamilyArgumentTypeStrategy}. | ||
*/ | ||
static Optional<DataType> findDataType( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Could we moved that to a helper class in the strategies
package? It looks a bit counter-intuitive that an unrelated(FamilyArgumentTypeStrategy
does not extend from this class) class uses this method.
DataTypes.VARCHAR(1), | ||
DataTypes.DECIMAL(10, 0), | ||
DataTypes.DECIMAL(30, 15), | ||
DataTypes.BOOLEAN(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we preserve the nullability? Right now there is no way to say that we accept both. Therefore e.g. we loose the nullability info in such case:
.inputTypeStrategy(logical(LogicalTypeRoot.BOOLEAN))
.outputTypeStrategy(TypeStrategies.argument(0))
I would suggest either:
- if the
expectedNullability
inFamilyArgumentTypeStrategy
/RootArgumentTypeStrategy
is false forward the nullability of the input argument - introduce three state value -> expect nullable, not null, both
"Unsupported argument type. Expected nullable type of root 'VARCHAR' but actual type was 'VARCHAR(5)'."), | ||
|
||
TestSpec | ||
.forStrategy( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add tests for invalid cases? E.g. Using FLOAT
with EXACT_NUMERIC
.
sequence( | ||
logical(LogicalTypeFamily.CHARACTER_STRING), | ||
logical(LogicalTypeFamily.EXACT_NUMERIC), | ||
logical(LogicalTypeFamily.APPROXIMATE_NUMERIC), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a case with e.g. BIGINT
used with APPROXIMATE_NUMERIC
family? This should work due to the implicit casts right?
@@ -102,14 +102,6 @@ class CalcValidationTest extends TableTestBase { | |||
case _: ValidationException => //ignore | |||
} | |||
|
|||
try { | |||
util.addTable[(Int, Long, String)]("Table2") | |||
.select('_1 as '*, '_2 as 'b, '_1 as 'c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove this test? Is the field reference a problem here? Can we just change it to strings?
.select('_1 as "*", '_2 as "b", '_1 as "c")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is very unlikely case that is not worth to have a separate input type strategy for.
.map(newDataType -> { | ||
final Class<?> clazz = actualDataType.getConversionClass(); | ||
final LogicalType newType = newDataType.getLogicalType(); | ||
if (newType.supportsInputConversion(clazz)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be OutputConversionClass
? Arguments of a UDF are "outputs` of the table ecosystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!
} else if (Objects.equals(expectedNullability, Boolean.FALSE)) { | ||
return newDataType.notNull(); | ||
} | ||
return newDataType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the nullability of the actualDataType
here? Otherwise expectedNullability = null
is equivalent to expectedNullability = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the problem with this test case:
TestSpec
.forStrategy(
"...",
logical(LogicalTypeFamily.APPROXIMATE_NUMERIC))
.calledWithArgumentTypes(
DataTypes.BIGINT().notNull())
.expectArgumentTypes(
DataTypes.DOUBLE() // should be NOT NULL, it is nullable for now
),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. LGTM
Introduces the last missing pieces required to start porting the other expressions for a consistent API behavior. This closes #12331.
Introduces the last missing pieces required to start porting the other expressions for a consistent API behavior. This closes apache#12331.
What is the purpose of the change
Implements a new type inference for AS. The PR contains the last missing pieces required to start porting the other expressions for a consistent API behavior.
Brief change log
See commit messages.
Verifying this change
This change is already covered by existing tests. Tests have been added to
InputTypeStrategiesTest
.Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: noDocumentation