New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-17321][table] Add support casting of map to map and multiset to multiset #18287
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit a65fb5e (Thu Jan 06 13:06:01 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick scan, and it looks good. Will do a proper review later.
Just one question, can you add one single test case for map to map and one for multiset to multiset in CastFunctionITCase
, as it tests that it works through the stack? Because I'm pretty sure it doesn't, as you need to change LogicalTypeCasts
as well.
.../org/apache/flink/table/planner/functions/casting/MapToMapAndMultisetToMultisetCastRule.java
Outdated
Show resolved
Hide resolved
I added test for map and multiset to
It seems the reason is not Line 190 in c5581b8
I will have a look at it later today |
Hey @snuyanzin have you checked this issue? There might be something helpful for you in the patch i posted there https://issues.apache.org/jira/browse/FLINK-25428. If the multiset test is a blocker for you, and there's no easy fix for that, you can ignore it now, as it's still not well supported by the whole stack, so perhaps you can just leave it the test there commented. |
@slinkydeveloper thanks for pointing the issue, I had a look however it seems it does not help with current multiset issue. |
@snuyanzin I rather prefer that the multiset e2e casting gets worked out in a separate issue, as it requires significant more testing to check that it works throughout the stack, and as you have shown in the commits, it also requires new function definitions to expose it to the user. That's something we should rather address in a separate context. As a goal for this task i would say we want:
|
ok, I removed multiset related fix from this PR and created a separate issue for that https://issues.apache.org/jira/browse/FLINK-25567 Now it should match the goal, please let me know if not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're good, but there's a mystery here worth to be investigated before merging 😄
CastTestSpecBuilder.testCastTo(MAP(STRING(), STRING())) | ||
.fromCase(MAP(FLOAT(), DOUBLE()), null, null) | ||
.fromCase( | ||
MAP(INT(), INT()), | ||
Collections.singletonMap(1, 2), | ||
Collections.singletonMap("1", "2")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I'm surprised that this works without changing anything in LogicalTypeCasts
... Can you add a test case for failure? For example:
CastTestSpecBuilder
.testCastTo(MAP(STRING(), STRING()))
.fail(MAP(STRING(), ROW(INT())), whateverValue)
Same for ROW to ROW and ARRAY to ARRAY. Just take whatever inner invalid tuple and see what exception you get. If it fails within the single CastRule
implementations, then something is wrong with LogicalTypeCasts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get, should it fail?
SELECT CAST(MAP['a', row(1)] AS MAP<STRING, STRING>);
I thought it should return something like that
{a=(1)}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not fail, the goal of this issue is to allow such casting. The thing is that I'm actually surprised it doesn't fail without touching LogicalTypeCasts
, which is the class that checks valid/invalid casts.
EDIT: I've done some debugging, LogicalTypeCasts#supportsConstructedCasting
does the job and just checks for the equality of the type root and for the castability of the children. So it works 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for jumping in so late.
index, valueArrayTerm, innerInputValueType), | ||
"false", | ||
// Null check is done at the array access level | ||
innerInputValueType.copy(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slinkydeveloper didn't we change this behavior recently and use a new method that deals with NullType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, perhaps can you fix it when merging?
final String value = newName("value"); | ||
|
||
return new CastRuleUtils.CodeWriter() | ||
.declStmt(className(Map.class), map, constructorCall(HashMap.class)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if a HashMap
is safe to use because not all internal data structures have proper hashCode/equals
. By looking at org.apache.flink.table.planner.codegen.calls.ScalarOperatorGens#generateMap
, I also don't see any HashMap
. A safer solution sounds like reusing the array to array casting logic and apply it keyArray and valueArray? In general, we don't guarantee unique keys for maps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is safe at this level, as the input map is assumed to be valid and should not have multiple values for the same key anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A cast to MAP<CHAR(1), X>
could produce multiple keys. But after a offline discussion, I think it is safe to use a HashMap here and the problem is rather in the ScalarOperatorGens#generateMap
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will apply the remaining issue while merging. Thanks @snuyanzin.
case MAP_VALUE_CONSTRUCTOR | MULTISET_VALUE => | ||
generateMapOrMultiset(ctx, resultType, operands) | ||
// maps | ||
case MAP_VALUE_CONSTRUCTOR => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snuyanzin Looking forward to a PR for this commit. Maybe you can also address the comment about the value constructor in this PR (using HashMap for deduplication of key).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @twalthr , thanks for highlighting this
yes I will try to handle this
…o multiset This closes apache#18287.
What is the purpose of the change
This PR adds support of casting maps to maps and multisets to multisets
Brief change log
org.apache.flink.table.planner.functions.casting.MapToMapAndMultisetToMultisetCastRule
org.apache.flink.table.planner.functions.casting.CastRulesTest
Verifying this change
This change added tests and can be verified as follows:
CastRulesTest
that validate casting of maps to maps, multisets to multisetsDoes this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation