Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HOTFIX] Support both listfile() and listfile(maxCount) in InsertStag… #3621

Closed
wants to merge 1 commit into from

Conversation

marchpure
Copy link
Contributor

…e flow

Why is this PR needed?

Flink writes files to obs with setting the filename in order of time, when loading stages files to carbondata, files are list and loaded with the parameter "maxcount" which will return "maxcount" files with smaller filename, in the other words, "maxcount" files with earliest generation time. but in same senario, it is possible that the filename of stage files is not in order of time, a switch is used here to judge whether to list files with specify batch size, or just list all files in the stage dir.

What changes were proposed in this PR?

A swith "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added.

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No

…e flow

Why is this PR needed?
Flink write files to obs with setting the filename in order of time, when loading stages files to carbondata, files are list and loaded with the parameter "maxcount" which will return "maxcount" files with smaller filename, in the other words, "maxcount" files with earliest generation time. but in same senario, it is possible that the filename of stage files is not in order of time, a switch is used here to judge whether to list files with specify batch size, or just list all files in the stage dir.

What changes were proposed in this PR?
A swith "CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME" is added.

Does this PR introduce any user interface change?
No

Is any new testcase added?
No
@CarbonDataQA1
Copy link

Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.4/296/

@CarbonDataQA1
Copy link

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2000/

// an approximate value. For local files, as can it can we not guarantee the order, we
// just list all.
val allFiles = dir.listFiles(false, batchSize * 2)
// It is possible that the filename of stage files is not in order of time,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in which scenario it won't be ordered of time ?

val allFiles = dir.listFiles(false, batchSize * 2)
// It is possible that the filename of stage files is not in order of time,
// A switch is used here to judge whether to list files with specify batch size
val CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME = CarbonProperties.getInstance().getProperty(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need validation, toBoolean may fail if user sets a non-boolean value say 123 by mistake.

val allFiles = dir.listFiles(false, batchSize * 2)
// It is possible that the filename of stage files is not in order of time,
// A switch is used here to judge whether to list files with specify batch size
val CARBON_STAGE_FILENAME_IS_IN_ORDER_OF_TIME = CarbonProperties.getInstance().getProperty(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change variable name to isSortedFileNames

@@ -1593,6 +1593,14 @@ private CarbonCommonConstants() {

public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false";

// Whether the filename of stage files is in order of time
@CarbonProperty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update document also for any new carbon property added

@@ -1593,6 +1593,14 @@ private CarbonCommonConstants() {

public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false";

// Whether the filename of stage files is in order of time
@CarbonProperty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it dynamic configurable

@kunal642
Copy link
Contributor

@marchpure Please rebase and fix the comments

@marchpure marchpure closed this Jun 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants