-
Notifications
You must be signed in to change notification settings - Fork 703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag… #3621
Conversation
…e flow Why is this PR needed? Flink write files to obs with setting the filename in order of time, when loading stages files to carbondata, files are list and loaded with the parameter "maxcount" which will return "maxcount" files with smaller filename, in the other words, "maxcount" files with earliest generation time. but in same senario, it is possible that the filename of stage files is not in order of time, a switch is used here to judge whether to list files with specify batch size, or just list all files in the stage dir. What changes were proposed in this PR? A swith "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added. Does this PR introduce any user interface change? No Is any new testcase added? No
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/296/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2000/ |
// an approximate value. For local files, as can it can we not guarantee the order, we | ||
// just list all. | ||
val allFiles = dir.listFiles(false, batchSize * 2) | ||
// It is possible that the filename of stage files is not in order of time, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain in which scenario it won't be ordered of time ?
val allFiles = dir.listFiles(false, batchSize * 2) | ||
// It is possible that the filename of stage files is not in order of time, | ||
// A switch is used here to judge whether to list files with specify batch size | ||
val CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME = CarbonProperties.getInstance().getProperty( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need validation, toBoolean may fail if user sets a non-boolean value say 123 by mistake.
val allFiles = dir.listFiles(false, batchSize * 2) | ||
// It is possible that the filename of stage files is not in order of time, | ||
// A switch is used here to judge whether to list files with specify batch size | ||
val CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME = CarbonProperties.getInstance().getProperty( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change variable name to isSortedFileNames
@@ -1593,6 +1593,14 @@ private CarbonCommonConstants() { | |||
|
|||
public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false"; | |||
|
|||
// Whether the filename of stage files is in order of time | |||
@CarbonProperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update document also for any new carbon property added
@@ -1593,6 +1593,14 @@ private CarbonCommonConstants() { | |||
|
|||
public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false"; | |||
|
|||
// Whether the filename of stage files is in order of time | |||
@CarbonProperty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make it dynamic configurable
@marchpure Please rebase and fix the comments |
…e flow
Why is this PR needed?
Flink writes files to obs with setting the filename in order of time, when loading stages files to carbondata, files are list and loaded with the parameter "maxcount" which will return "maxcount" files with smaller filename, in the other words, "maxcount" files with earliest generation time. but in same senario, it is possible that the filename of stage files is not in order of time, a switch is used here to judge whether to list files with specify batch size, or just list all files in the stage dir.
What changes were proposed in this PR?
A swith "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added.
Does this PR introduce any user interface change?
Is any new testcase added?