Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compaction property redesign #4273

Open
ctubbsii opened this issue Feb 15, 2024 · 4 comments
Open

Compaction property redesign #4273

ctubbsii opened this issue Feb 15, 2024 · 4 comments
Assignees
Milestone

Comments

@ctubbsii
Copy link
Member

Had a discussion with @keith-turner and @ddanielr about some of the compaction configuration and SPI design, and we came up with some of these ideas that could be implemented:

One major unknown is what to call it. We went back and forth a bit on whether it should be CompactionService* or CompactionPlanner*

  1. Remove user-facing SPI for CompactionPlanner and related config
  2. Replace with CompactionServiceFactory:
    a. compaction.service.factory holds the class name (singleton ref in ServerContext?)
    b. compaction.service.factory.config holds a single string of all the factory's config
  3. Suggested API (names can change later, this is just an idea)
interface CompactionServiceFactory {
   void init(PluginEnvironment env);
   CompactionService forName(String service);
}

// configuration for a bounded named group / queue
class GroupConfig {
  String name;
  int maxQueueSize;
}

interface CompactionService {
  Set<GroupConfig> getGroups();
  List<Job> plan(tabletPlanningInformation);
}
  1. init can validate the config, or re-validate periodically (implementation dependent)

Our default configuration for the compaction.service.factory.config could be (not pretty-printed, but can be copied/pasted into the property description in an HTML-friendly pretty-printed way):

[
    {
        "meta": {
            "maxOpenFilesPerJob": "30",
            "groups": [
                {
                    "name": "accumulo_meta_small",
                    "maxSize": "128M",
                    "maxJobs": "1000"
                },
                {
                    "name": "accumulo_meta_large",
                    "maxJobs": "1000"
                }
            ]
        },
        "default": {
            "maxOpenFilesPerJob": "30",
            "groups": [
                {
                    "name": "user_small",
                    "maxSize": "128M",
                    "maxJobs": "1000"
                },
                {
                    "name": "user_large",
                    "maxJobs": "1000"
                }
            ]
        }
    }
]

One of the main benefits of this is having simplified configuration for user's configuration file. Also, we'll be able to set a default value in the DefaultConfiguration that actually lives in one place, and is easy for users to view for reference that represents the actual default behavior and override if they want to.

Re: #3981, #4034 , #4061

@ctubbsii ctubbsii added the bug This issue has been verified to be a bug. label Feb 15, 2024
@keith-turner
Copy link
Contributor

For naming, could have the following property names.

compaction.services.planner.factory
compaction.services.planner.factory.config

And the following SPI with Planner in the name and changed forName to forService.

interface CompactionPlannerFactory {
   void init(PluginEnvironment env);
   CompactionPlanner forService(String serviceName);
}

interface CompactionPlanner {
  Set<GroupConfig> getGroups();
  List<Job> plan(tabletPlanningInformation);
}

@dlmarion
Copy link
Contributor

Thinking about property validation, something that I just added to the Upgrader in 2.1.3, having properties defined in the SPI classes and not in Property.java makes validation incomplete. It might be good for the service provider interfaces to extend a common base interface that has a method which returns a property validator object. That would let the SPI implementer use whatever properties they like, but we still have a way to validate that the configuration is correct.

@dlmarion
Copy link
Contributor

Had a discussion with @ddanielr about this. Taking into account #3546 I think the default configuration should be simple. As in:

[
  {
    "cs1": {
      "maxOpenFilesPerJob": "30",
      "groups": [
          {
            "name": "default",
          }
      ]
    }
  }
]

@ctubbsii ctubbsii modified the milestones: 4.0.0, 3.1.0 Jul 12, 2024
@dlmarion dlmarion removed the bug This issue has been verified to be a bug. label Aug 19, 2024
@dlmarion
Copy link
Contributor

I removed the bug label as this issue does not reflect a defect in the current implementation. I also moved this from from correctness iteration in the Elasticity project to the scaling iteration. These changes are not required for the correct operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

4 participants