Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-6097][table] Guaranteed the order of the extracted field refer… #3560

Closed
wants to merge 1 commit into from

Conversation

sunjincheng121
Copy link
Member

…ences as input order.

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the How To Contribute guide.
In addition to going through the list, please provide a meaningful description of your changes.

  • General

    • The pull request references the related JIRA issue ("[FLINK-6097] Guaranteed the order of the extracted field references as input order.")
    • The pull request addresses only one issue
    • Each commit in the PR has a meaningful commit message (including the JIRA id)
  • Documentation

    • Documentation has been added for new functionality
    • Old documentation affected by the pull request has been updated
    • JavaDoc for public methods has been added
  • [x ] Build

    • Functionality added by the pull request is covered by tests
    • mvn clean verify has been executed successfully locally or a Travis build has passed

}

private def identifyFieldReferences(
expr: Expression,
fieldReferences: Set[NamedExpression]): Set[NamedExpression] = expr match {
fieldReferences: List[NamedExpression]): List[NamedExpression] = expr match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using Set here gave us better intuition that we actually want to count uniq fields, any reason why you want to keep the order? Especially the order is not defined very well, it will change easily if we modify the code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain more about Especially the order is not defined very well, it will change easily if we modify the code.?

Copy link
Contributor

@KurtYoung KurtYoung Mar 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order is really depending on how we extract fields from all kinds of expressions. Like BinaryExpression, we first extract left child, and then right child. And for Funtion Calls, we extract the field from parameter with left to right order. More complex example will be over, imagine there is an aggregate on a partitioned window. Should the fields appeared in the aggregate or the field which partitioned on should be considered first?

IMO this kind of order is hard to define and hard to stay consistency, it will change easily when we modifying the codes. We should not rely anything on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also use a LinkedHashSet which preserves the order in which elements are inserted.

Copy link
Contributor

@KurtYoung KurtYoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sunjincheng121 , i'm curious what's purpose of this change. Can you give me an example what the order is used for?

@sunjincheng121
Copy link
Member Author

HI, @KurtYoung Thanks for your attention to this PR. Good question, Here I glad share why I notice this method:
When we try to implement OVER window TableAPI, The first version of the prototype to achieve,we do not consider the table field will be out of order when we implement translateToPlan method,then we set outputRow field from inputRow according to the Initial order of the table field index.
At the beginning, the projections in the select statement less than 5 columns It works well.But Unfortunately when the count of projections bigger than 4 (>=5), we got the random result. Then we debug the code, we find that ProjectionTranslator # identifyFieldReferences method uses theSet temporary save field, when the number of elements in the Set is less than 5, the Set takes the Se1, Se2, Se3, Se4 data structures. When the number of elements is greater than or equal to 5, the Set takes HashSet # HashTrieSet and which will cause the data to be out of order. So we thought 2 approach to solve this problem:

Let ProjectionTranslator # identifyFieldReferences method guaranteed the order of the extracted field references same as input order.
We add the input and output field mapping.
At last we using approach#2 solve the problem. This change is not necessary for the problem i have faced. But I feel it is better to let the output of this method in the same order as the input, it may be very helpful for other cases, though I am currently not aware of any. I am ok with not making this change, but we should add a comment instead to highlight that the potential output of the current output. Otherwise, some people may not pay attention to this and assume it is in order.

Thanks,
SunJincheng

@fhueske
Copy link
Contributor

fhueske commented Mar 20, 2017

Hi @sunjincheng121, thanks for explaining the issue, but I did not completely understand the problem. The ProjectionTranslator is working on Table API expressions and LogicalNodes which are translated into Calcite RelNodes and RexNodes. The RelBuilder works with names not positions, so shouldn't field positions and layout be handled by Calcite?

@fhueske
Copy link
Contributor

fhueske commented Apr 28, 2017

Can you close this PR as well @sunjincheng121?
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants