Skip to content

Conversation

@keith-turner
Copy link
Contributor

No description provided.

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is used in a lot of places in the tests. The build is failing due to deprecation warnings about it.

In non-test code, I think you can/should inline the AccumuloConfiguration.getMaxFilesPerTablet() method into the deprecated compaction code, so it goes away when that stuff goes away.

As for the rest of the test code, I took a stab at replacing some of these, but it got overwhelming and I decided to give up on it for the time being. Maybe I can come back to it and help out later. The main problem here is that it's not always easy to tell if the property was used arbitrarily, just to test setting a property (in which case it can be substituted for another property) or if it was an actually relevant setting for the test. In some cases, it might have once been relevant, but no longer functions the way it used to, so it can just be deleted. In other cases, you probably have to set the tserver property instead... and it's not clear if the tserver needs to bounce or if it can be picked up from the system config dynamically (which is not usually the case for tserver.* properties).

Anyway, there's a lot of tedious work to discontinue using the deprecated property throughout the test code.

@FineAndDandy
Copy link
Contributor

I would love if instead of deprecating this we actually made it an imperative to compact.

@keith-turner
Copy link
Contributor Author

I would love if instead of deprecating this we actually made it an imperative to compact.

I have been thinking about how this could be achieved efficiently. The way the old code worked for max files was so inefficient. The compaction ratio efficiently reduces the number of files, but it is hard to reason about the number of files it will produce.

One problem with the compaction ratio is that the total number of files you end up with in a tablet depends on the smallest expected files For example if really small files are arriving in a tablet, like 1000 bytes. Then you will end up with 3x1K files, 3x3K files, 3x9K files, etc. So starting at 1K with a compaction ratio of 3, you end up with 20 levels or 60 total files before getting to 1G. So with the current compaction ratio the number of levels and total number files you end up with depends s on the smallest file arriving.

For larger files we want a higher compaction ratio, but maybe we don't want that for smaller files. This led me to wonder about making the compaction ratio a function of the file size. Something like the following that more aggressively reduces the number of small files, while still doing logarithmic work.

      // function that given a file size computes a compaction ratio
      Function<Long,Double> ratioFunction = size -> {
       if(size < 1_000) {
            return 1.0;
        } else if(size < 10_000) {
          return 1.25;
        } else if(size < 100_000) {
          return 1.5;
        }else if(size < 1_000_000) {
          return 1.75;
        } else if(size < 10_000_000) {
          return 2.0;
        } else if(size < 100_000_000) {
          return 2.25;
        } else {
          return 3.0;
        }
      };

Tying to figure out if we could generate the above function given a target number of max files and expected max size that accomodates any minimum file size. So a user does not specify the above function, but one is generated based on desired max files and expected max file size. If this could be done, it could be implemented as another compaction planner for 2.x.

@ctubbsii
Copy link
Member

ctubbsii commented Dec 24, 2023

Math.max(1,Math.min(3,1+Math.floor(Math.log(size)/Math.log(10)-2)*.25))

@keith-turner
Copy link
Contributor Author

Decided to not deprecate the table.file.max property in favor of #4127. However not going to close this PR for now, the changes to make all the test stop using the table.file.max prop in this PR still seem like good changes to make. So plan to make this no longer deprecate the prop, but still remove the unnecessary use of the prop in all of the test.

@ctubbsii
Copy link
Member

Decided to not deprecate the table.file.max property in favor of #4127. However not going to close this PR for now, the changes to make all the test stop using the table.file.max prop in this PR still seem like good changes to make. So plan to make this no longer deprecate the prop, but still remove the unnecessary use of the prop in all of the test.

The description update to document how it works with the default strategy would be good to keep from this as well, if that's not already done as part of 4127.

@ctubbsii ctubbsii modified the milestones: 3.1.0, 2.1.3 Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants