[FLINK-12200] [Table-planner] Support UNNEST for MAP types #8179
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community review your pull request.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Guys! Could you please review this pull request?
@KurtYoung @JingsongLi Could you please review this pull request, or connect me with whoever is responsible for this component? Thanks in advance!
KurtYoung
left a comment
Thanks for working on this, I left some comments. BTW, I noticed there are some format issues and I'm not sure I pointed out all of them, please check.
case map: MapRelDataType =>
  val keyTypeInfo = FlinkTypeFactory.toTypeInfo(map.keyType)
  val valueTypeInfo = FlinkTypeFactory.toTypeInfo(map.valueType)
  val componentTypeInfo = createTuple2TypeInformation(keyTypeInfo,valueTypeInfo)
space after comma
Is it more appropriate for us to return RowType here?
> space after comma
Fixed
> Is it more appropriate for us to return RowType here?
I need to return the typeInfo for the exploded type, which will be a field in the resulting Row. I'm going to transform Map[K,V] into a sequence of tuples [(K,V), ...]. I need this in order to reuse the SQL syntax for mapping the result onto a tuple. I mean I want to support the following SQL query:
SELECT k, v FROM t1 , UNNEST(t1.c) as A (k,v)
This is the reason why I return TupleTypeInfo.
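For illustration, a minimal sketch of the tuple-based transformation described above (the class name MapToTupleExplodeFunc is hypothetical; the PR's actual function is quoted in the review hunks further down):

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2}
import org.apache.flink.table.functions.TableFunction

// Hypothetical sketch: every entry of the map becomes one (key, value) tuple,
// i.e. one row of the UNNEST result.
class MapToTupleExplodeFunc extends TableFunction[JTuple2[Object, Object]] {

  def eval(map: JMap[Object, Object]): Unit = {
    map.asScala.foreach { case (key, value) =>
      collect(new JTuple2[Object, Object](key, value))
    }
  }
}
```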
I checked other systems' behavior; it seems more common to unnest a map field into two columns. In your case, it would be:
SELECT k, v FROM t1 , UNNEST(t1.c) as (k,v)
@KurtYoung I'm sorry, but when I use SQL like in your example, I get an org.apache.flink.table.api.SqlParserException in the method org.apache.flink.table.calcite.FlinkPlannerImpl.parse.
Sorry for giving a wrong example. What I want to say is that we should make built-in table functions look simpler and more explicit to the framework. In this particular example, you relied on some implicit functionality which Flink currently supports. First, you relied on Flink being able to implicitly convert a Tuple to a Row; second, you don't provide any valuable type-related information in your table function, and all the type information is only provided in the logical rule.
I'm not saying this is wrong, but I think there is another way to make all of this more accurate and explicit. For example, you can explicitly tell the framework that your table function returns the Row type, and give more type-related information to the framework. It will make this code more robust and less likely to break in the future. Can you give this a try? If it would involve lots of changes, I'm also OK with the current version.
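A hedged sketch of the Row-returning variant suggested here; the class name MapExplodeToRowFunc and its constructor are assumptions for illustration, not the PR's actual code:

```scala
import java.util.{Map => JMap}
import scala.collection.JavaConverters._

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.table.functions.TableFunction
import org.apache.flink.types.Row

// Hypothetical sketch: the table function collects Row values and reports the
// concrete row type to the framework, instead of relying on an implicit
// Tuple-to-Row conversion. The row type would be built by the caller, e.g.
// from the MAP column's key and value types.
class MapExplodeToRowFunc(rowType: TypeInformation[Row]) extends TableFunction[Row] {

  def eval(map: JMap[Object, Object]): Unit = {
    map.asScala.foreach { case (key, value) =>
      collect(Row.of(key, value))
    }
  }

  // Explicit result type information, so the framework does not have to guess.
  override def getResultType: TypeInformation[Row] = rowType
}
```

This mirrors the direction the discussion below converges on: returning ROW instead of a tuple.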
@KurtYoung I got it, thank you a lot for the explanation. I now return the ROW type instead of Tuple as before.
val valueTypeInfo = FlinkTypeFactory.toTypeInfo(map.valueType)
val componentTypeInfo = createTuple2TypeInformation(keyTypeInfo,valueTypeInfo)
val componentType = cluster.getTypeFactory.asInstanceOf[FlinkTypeFactory]
  .createTypeFromTypeInfo(componentTypeInfo,true)
space after comma
fixed
  .createTypeFromTypeInfo(componentTypeInfo,true)

val explodeFunction = ExplodeFunctionUtil.explodeTableFuncFromType(map.typeInfo)
(componentType , explodeFunction)
extra space before comma
fixed
  }
}

class MapExplodeTableFunc extends TableFunction[Object]{
space before brace
I'm not sure that making this table function return Object, and letting the framework rely on information from somewhere else to behave correctly, is a good choice. This will make the code fragile.
> space before brace
fixed
> I'm not sure that making this table function return Object, and letting the framework rely on information from somewhere else to behave correctly, is a good choice. This will make the code fragile.
The UNNEST function should be able to explode any key and value types. This is the reason why I'm using the Object type here. Basically, I use the same approach that is used for Array types (for arrays, ObjectExplodeTableFunc is used).
ObjectExplodeTableFunc is not directly used by all arrays that are to be unnested. It's only used by arrays with an object element type, e.g. Array(Object[]).
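For context, a rough and hypothetical sketch of that per-element-type pattern (the real functions live in Flink's ExplodeFunctionUtil and may differ in detail):

```scala
import org.apache.flink.table.functions.TableFunction

// Hypothetical sketch: one explode function per element type, so the emitted
// type is known to the framework. An Object-based function only covers
// Object[] arrays; primitive arrays get their own specialized functions.
class ObjectArrayExplodeFunc extends TableFunction[Object] {
  def eval(arr: Array[Object]): Unit = arr.foreach(collect)
}

class IntArrayExplodeFunc extends TableFunction[Int] {
  def eval(arr: Array[Int]): Unit = arr.foreach(collect)
}
```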
Thanks for your comments, I've fixed the TableFunction and now it returns Row
JingsongLi
left a comment
Thanks for working on this, basically LGTM. Just some format issues.
class MapExplodeTableFunc extends TableFunction[Object] {

  def eval(map: util.Map[Object, Object]): Unit = {
    map.asScala.foreach{ case (key,value) =>
map.asScala.foreach { case (key, value)
space needed here
fixed
  def eval(map: util.Map[Object, Object]): Unit = {
    map.asScala.foreach{ case (key,value) =>
      collect((key,value))
collect((key, value))
fixed
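For reference, applying the spacing nits from this review to the quoted hunk gives roughly the following shape (this reflects the tuple-collecting revision under review; as discussed above, a later revision of the PR returns Row instead):

```scala
import java.util
import scala.collection.JavaConverters._

import org.apache.flink.table.functions.TableFunction

class MapExplodeTableFunc extends TableFunction[Object] {

  def eval(map: util.Map[Object, Object]): Unit = {
    map.asScala.foreach { case (key, value) =>
      collect((key, value))
    }
  }
}
```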
The PR component tag should be [table-planner] instead of [Table API]?
Thank you for your note, I've fixed the tag in the title.
@KurtYoung @JingsongLi Thank you guys for your feedback. I've fixed all the format issues and added some answers to your questions.
LGTM +1 |
What is the purpose of the change
This pull request adds support for the UNNEST operator for MAP types.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
SELECT a,b,v FROM src CROSS JOIN UNNEST(c) as f (k,v)
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation
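For illustration, the expansion exercised by the verification query above behaves like the following plain-Scala sketch (no Flink APIs involved; purely illustrative, with hypothetical sample data):

```scala
// Each entry of the MAP column `c` in a row (a, b, c) produces one output row.
object UnnestSemanticsSketch extends App {
  // Rows of a hypothetical table src(a, b, c) where c is a map column.
  val src = Seq(
    (1, "x", Map("k1" -> "v1", "k2" -> "v2")),
    (2, "y", Map("k3" -> "v3"))
  )

  // Equivalent of: SELECT a, b, v FROM src CROSS JOIN UNNEST(c) as f (k, v)
  val unnested = for {
    (a, b, c) <- src
    (k, v)    <- c
  } yield (a, b, v)

  println(unnested) // List((1,x,v1), (1,x,v2), (2,y,v3))
}
```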