HDDS-10568. When the ldb command is executed, it is output by line #6420
jianghuazhu wants to merge 5 commits into apache:master
Conversation
ci: Can you help review this PR, @adoroszlai @xichen01.
xichen01
left a comment
Thanks for working on this, few comments to handle
```java
return displayTable(iterator, dbColumnFamilyDef, out, schemaV3);
while (iterator.get().isValid()) {
try (PrintWriter out = new PrintWriter(new BufferedWriter(
    new PrintWriter(fileName + fileSuffix, UTF_8.name())))) {
```
If preFileRecords is not specified, we'd better make the filename the same as the previous filename (without fileSuffix)
```java
batch = new ArrayList<>(batchSize);
sequenceId++;
}
if ((preFileRecords > -1) && (count >= preFileRecords)) {
```
Seems like ldb will generate unlimited empty files if preFileRecords is zero.
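One way to close that hole is to validate the option before the write loop starts. A minimal sketch (the helper and its name are hypothetical, not part of the patch) assuming -1 is the documented "no per-file limit" default from the quoted `@CommandLine.Option`:

```java
// Hypothetical guard: with a value of 0, the rollover condition
// (count >= preFileRecords) is already true before any record is
// written, so the scan would spin forever creating empty files.
// Rejecting non-positive values up front avoids that.
class RecordsPerFileGuard {
  static void validate(long recordsPerFile) {
    // -1 is assumed to mean "no per-file limit" (the option default).
    if (recordsPerFile != -1L && recordsPerFile <= 0) {
      throw new IllegalArgumentException(
          "--max-records-per-file must be positive, got: " + recordsPerFile);
    }
  }
}
```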
```java
@CommandLine.Option(names = {"--pre-file-records"},
    description = "The number of print records per file.",
    defaultValue = "-1")
private long preFileRecords;
```
Nit: suggest --max-records-per-file
Thanks for your comment and review, @xichen01 .
I will update soon.
Please also rename preFileRecords to recordsPerFile.
(pre means "before")
Can you help review this PR again, @xichen01?
Can you help review this PR, @kerneltime @errose28.
```java
}
fileSuffix++;
}
} else {
```
Perhaps we can simplify this if...else, like:
```java
// ...
String fileNameXXX = preFileRecords > 0 ? fileName + fileSuffix++ : fileName;
// ...
new PrintWriter(fileNameXXX, UTF_8.name())
```
errose28
left a comment
Thanks for the improvement @jianghuazhu. I think the idea is solid since just using split on a stdout stream may produce individual files that are not valid json. Let's add some tests to TestLDBCli to make sure we have all the corner cases around various flag combinations working.
```java
private int threadCount;

@CommandLine.Option(names = {"--max-records-per-file"},
    description = "The number of print records per file.",
```
```diff
-    description = "The number of print records per file.",
+    description = "The number of records to print per file.",
```
```java
if ((preFileRecords > 0) && (count >= preFileRecords)) {
  break;
}
```
What's the expected behavior when this new --max-records-per-file flag is used without --out? Right now it looks like the choice is that stdout is considered "one file", and so this flag overrides the --length option:
```shell
# The DB here has many more than 3 entries
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l3 --max-records-per-file=2 | jq '.[].keyName' | wc -l
2
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l2 --max-records-per-file=3 | jq '.[].keyName' | wc -l
2
```
Maybe we should disallow --max-records-per-file without --out.
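That rule could be enforced with a cross-option check before the scan starts. A sketch under assumed names (the real option fields in the command class may differ):

```java
// Hypothetical pre-scan validation: a per-file record limit only
// makes sense when output goes to files, so reject the combination
// of --max-records-per-file with stdout output (no --out given).
class ScanOptionCheck {
  static void check(String out, long recordsPerFile) {
    // 'out' is null when --out was not specified on the command line.
    if (recordsPerFile > 0 && out == null) {
      throw new IllegalArgumentException(
          "--max-records-per-file requires --out to be specified");
    }
  }
}
```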
-l is also broken with this new option and I got a bit of a surprise trying to test this 😄 I would have expected 5 files here, not 57 thousand.
```shell
$ ./ozone debug ldb --db=om.db scan --column_family=fileTable -l10 --max-records-per-file=2 --out=foo
^C
$ ls -l | grep foo | wc -l
57343
```
Thank you for your comment and review.
I will update soon.
> -l is also broken with this new option and I got a bit of a surprise trying to test this 😄 I would have expected 5 files here, not 57 thousand.
When --out is not set, all records are output to stdout.
When --max-records-per-file and -l are both set, --max-records-per-file prevails.
/pending
Sorry, I had some other work some time ago.
https://github.com/apache/ozone/actions/runs/8849501009/job/24349282946?pr=6420
ci: https://github.com/jianghuazhu/ozone/actions/runs/8876637070
/ready
Blocking review request is removed.
```diff
 private boolean withinLimit(long i) {
-  return limit == -1L || i < limit;
+  return recordsPerFile > 0 || limit == -1L || i < limit;
```
If recordsPerFile > 0 is true, the subsequent checks are short-circuited, including i < limit, so the limit is invalidated. This is not the expected behavior.
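A sketch of keeping the two limits independent (field names are assumed from the diff, not taken from the final patch): `withinLimit` enforces only --limit, and file rollover is decided by a separate check in the write loop.

```java
// Minimal sketch: the total-record cap and the per-file cap are
// orthogonal, so neither check should mention the other's field.
class ScanLimits {
  long limit = -1L;          // --limit / -l, -1 means unlimited
  long recordsPerFile = -1L; // --max-records-per-file, -1 means off

  // Bounds the total record count by --limit alone.
  boolean withinLimit(long i) {
    return limit == -1L || i < limit;
  }

  // Decides file rollover by --max-records-per-file alone.
  boolean shouldRollOver(long countInCurrentFile) {
    return recordsPerFile > 0 && countInCurrentFile >= recordsPerFile;
  }
}
```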
Thanks @xichen01 for the comment and review.
When recordsPerFile>0, it means that --max-records-per-file has taken effect, and --limit should be ignored at this time.
The --limit is used to limit the total count of records, while --max-records-per-file limits the maximum number of records in each specific file.
For example:
```shell
ozone debug ldb ... --limit 10 --max-records-per-file 1 --out result.txt
```
This command should generate 10 files, like result.txt0, result.txt1, ..., each containing 1 record.
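That intended interaction can be pinned down as a pure splitting function. This is a hypothetical sketch of the semantics only, not the patch's implementation; each returned chunk corresponds to one suffixed output file.

```java
import java.util.ArrayList;
import java.util.List;

class RecordSplitter {
  // Caps the total at 'limit' (-1 = unlimited) and splits the kept
  // records into chunks of at most 'recordsPerFile' records each
  // (values <= 0 mean "no per-file split").
  static List<List<String>> split(List<String> records, long limit,
      long recordsPerFile) {
    List<List<String>> files = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long written = 0;
    for (String r : records) {
      if (limit != -1L && written >= limit) {
        break; // --limit bounds the total record count
      }
      current.add(r);
      written++;
      if (recordsPerFile > 0 && current.size() >= recordsPerFile) {
        files.add(current); // roll over to the next suffixed file
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      files.add(current);
    }
    return files;
  }
}
```

With --limit 10 and --max-records-per-file 1, this yields 10 chunks of 1 record each, matching the result.txt0 ... result.txt9 expectation above.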
I'll update later.
/pending "I'll update later"
Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author. It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time. It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs. If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel.
Continued in #7467.


What changes were proposed in this pull request?
When executing the ldb command, if the data is very large, a very large file will be generated, which is unwieldy. This PR adds a new function that controls the maximum number of records allowed to be saved in each file.
Details:
HDDS-10568
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-10568
How was this patch tested?