-
Notifications
You must be signed in to change notification settings - Fork 478
deprecates table.file.max #4098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ctubbsii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This property is used in a lot of places in the tests. The build is failing due to deprecation warnings about it.
In non-test code, I think you can/should inline the AccumuloConfiguration.getMaxFilesPerTablet() method into the deprecated compaction code, so it goes away when that stuff goes away.
As for the rest of the test code, I took a stab at replacing some of these, but it got overwhelming and I decided to give up on it for the time being. Maybe I can come back to it and help out later. The main problem here is that it's not always easy to tell if the property was used arbitrarily, just to test setting a property (in which case it can be substituted for another property) or if it was an actually relevant setting for the test. In some cases, it might have once been relevant, but no longer functions the way it used to, so it can just be deleted. In other cases, you probably have to set the tserver property instead... and it's not clear if the tserver needs to bounce or if it can be picked up from the system config dynamically (which is not usually the case for tserver.* properties).
Anyway, there's a lot of tedious work to discontinue using the deprecated property throughout the test code.
|
I would love if instead of deprecating this we actually made it an imperative to compact. |
I have been thinking about how this could be achieved efficiently. The way the old code worked for max files was so inefficient. The compaction ratio efficiently reduces the number of files, but it is hard to reason about the number of files it will produce. One problem with the compaction ratio is that the total number of files you end up with in a tablet depends on the smallest expected files For example if really small files are arriving in a tablet, like 1000 bytes. Then you will end up with 3x1K files, 3x3K files, 3x9K files, etc. So starting at 1K with a compaction ratio of 3, you end up with 20 levels or 60 total files before getting to 1G. So with the current compaction ratio the number of levels and total number files you end up with depends s on the smallest file arriving. For larger files we want a higher compaction ratio, but maybe we don't want that for smaller files. This led me to wonder about making the compaction ratio a function of the file size. Something like the following that more aggressively reduces the number of small files, while still doing logarithmic work. // function that given a file size computes a compaction ratio
Function<Long,Double> ratioFunction = size -> {
if(size < 1_000) {
return 1.0;
} else if(size < 10_000) {
return 1.25;
} else if(size < 100_000) {
return 1.5;
}else if(size < 1_000_000) {
return 1.75;
} else if(size < 10_000_000) {
return 2.0;
} else if(size < 100_000_000) {
return 2.25;
} else {
return 3.0;
}
};Tying to figure out if we could generate the above function given a target number of max files and expected max size that accomodates any minimum file size. So a user does not specify the above function, but one is generated based on desired max files and expected max file size. If this could be done, it could be implemented as another compaction planner for 2.x. |
|
|
|
Decided to not deprecate the table.file.max property in favor of #4127. However not going to close this PR for now, the changes to make all the test stop using the table.file.max prop in this PR still seem like good changes to make. So plan to make this no longer deprecate the prop, but still remove the unnecessary use of the prop in all of the test. |
The description update to document how it works with the default strategy would be good to keep from this as well, if that's not already done as part of 4127. |
No description provided.