-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-17135][python][tests] Fix the test testPandasFunctionMixedWithGeneralPythonFunction to make it more stable #11771
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit d5143e5 (Thu Apr 16 09:16:39 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
The blink planner tests have passed in my personal azure pipelines: https://dev.azure.com/dianfu/58b46dfa-b96b-4b3f-a321-ce7644c4503b/_apis/build/builds/29/logs/108 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any comments to the changed code. I don't see though how it can fix the described issue. Can you describe how it fixes anything? I am sure this will be helpful for anyone trying to understand the change in the future. I think either in the commit message or even here in the PR would be fine. Thanks. |
It's a good idea to add description in the PR commit message and will do that. For this issue itself, I have already explained in the JIRA. Let me try to explain it in the PR again: |
I understood the underlying issue. I don't get though how this PR works the issue around. How do the adjustments make it less probable? |
There are two caches in RelDataTypeFactoryImpl: KEY2TYPE_CACHE and DATATYPE_CACHE. KEY2TYPE_CACHE caches the mapping of Key(consists of field names and field types, etc) to RelDataType and can be used for the canonization of row types per my understanding. DATATYPE_CACHE caches the RelDataType instances. PythonCalcSplitRule will split a Calc RelNode which contains both non-vectorized Python UDF and vectorized Python UDF into two Calc RelNodes. For the failure test case, the output type of the bottom Calc consists of two fields (f0: INTEGER, f1: INTEGER), let's call it row_type_0. This row type is already available in the cache (generated by other test cases, it's held in variable KEY2TYPE_CACHE) and so it will hit the cache when constructing this row type. However, during debugging, I found that the INTEGER type referenced by row_type_0 is already cleaned up from the cache DATATYPE_CACHE. Then when constructing the RexProgram for the top Calc, it creates another INTEGER type and failure happens. To work around this problem, we adjust the test case a bit to make the output row type of the bottom Calc consisting of three fields instead of two fields to make the cache hit fail. It seems a little hack, however, it did could solve this problem. I'm glad to try if there is more elegant way to address this problem wholly in Flink which could avoid this problem thoroughly. Do you have any suggestions? Glad to hear! |
May be we can revert the change of calcite-3149 in Flink? Does that makes sense for you? |
Thanks for the explanation @dianfu. Really helpful. Now I understand the change. Personally I don't have a better idea from the top of my head. Let's see what @danny0405 comes up with. +1 from my side. |
…GeneralPythonFunction to make it more stable There are two caches in RelDataTypeFactoryImpl: KEY2TYPE_CACHE and DATATYPE_CACHE. KEY2TYPE_CACHE caches the mapping of Key(consists of field names and field types, etc) to RelDataType and can be used for the canonization of row types. DATATYPE_CACHE caches the RelDataType instances. PythonCalcSplitRule will split a Calc RelNode which contains both non-vectorized Python UDF and vectorized Python UDF into two Calc RelNodes. For the test case testPandasFunctionMixedWithGeneralPythonFunction, the output type of the bottom Calc consists of two fields (f0: INTEGER, f1: INTEGER), let's call it row_type_0. This row type is already available in the cache (generated by other test cases, held in variable KEY2TYPE_CACHE) and so it will hit the cache when constructing this row type for the bottle Calc. However, during debugging, I found that the INTEGER type referenced by row_type_0 is already cleaned up from the cache DATATYPE_CACHE. Then when constructing the RexProgram for the top Calc, it creates another INTEGER type and failure happens. To work around this problem, we adjust the test case a bit to make the output row type of the bottom Calc consisting of three fields instead of two fields to make the cache hit fail.
Not sure why the tests still not triggered in PR. It has already passed in my private travis and azure pipelines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the issue is fixed if there is another CALC
with same type of the changed CALC
, maybe we should LOG an TODO and issue ID here and revert the change if we fix that from the CALCITE side.
@danny0405 Thanks a lot for your comments. Personally I think there is no need to revert this change even after the issue is fixed in calcite. The changed test case is to test the functionality of PythonCalcSplitRule and it's only updated a bit to work around the calcite bug, however, the test case itself is still a valid test case for its test purpose. |
Thanks, i think it is ready for merge. |
@dawidwys @danny0405 Thanks a lot for your double-check on this. Merging... |
What is the purpose of the change
This pull request changes the test testPandasFunctionMixedWithGeneralPythonFunction a bit to work around the bug introduced in CALCITE-3149.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation