External compaction design #266

Closed
dlmarion wants to merge 8 commits into apache:next-release from dlmarion:external-compaction-design

Conversation


@dlmarion dlmarion commented Mar 2, 2021

No description provided.

@milleruntime milleruntime left a comment

Looks good overall. I provided some thoughts for consideration.

Contributor

Suggested change

```diff
- * `tserver.compaction.major.service.<service>.planner.opts.executors`: a json array where each object in the array has the fields name, maxSize, and numThreads. For example:
+ * `tserver.compaction.major.service.default.planner.opts.executors`: a json array where each object in the array has the fields name, maxSize, and numThreads. For example:
```

The default name should be "default", right?

Contributor Author

If you can modify the behavior of the default compaction planner by using these properties, then yes.
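
For reference, the value of such an executors property is a JSON array along these lines (the executor names and sizes here are illustrative, not taken from the design doc):

```json
[
  {"name": "small", "maxSize": "32M", "numThreads": 2},
  {"name": "medium", "maxSize": "128M", "numThreads": 2},
  {"name": "large", "numThreads": 2}
]
```

An executor without a maxSize would catch compactions too large for the other executors.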

Contributor

Suggested change

```diff
- * `tserver.compaction.major.service.<service>.planner.opts.maxOpen`: number of files that will be included in a single compaction
+ * `tserver.compaction.major.service.default.planner.opts.maxOpen`: number of files that will be included in a single compaction
```

The default name should be "default", right?

Contributor Author

If you can modify the behavior of the default compaction planner by using these properties, then yes.

Contributor

I feel like at least one of these methods should include a timeout; I am not sure where that would be determined, though.

Contributor

Would it be better to have the tservers push the summary information when it changes? That way we wouldn't burden a tserver with polling while it is busy.

Contributor

> Would it be better to have the tservers push the summary information when it changes?

I am uncertain which is better, but I am leaning a bit towards pushing. One advantage to tservers pushing is that tservers w/ nothing in their external queues would not push anything. With polling, the coordinator would keep polling tservers that have no tablets in their external queues.

I was thinking polling is better in the case when the coordinator starts up. It could do something like the following:

  • poll all tservers
  • start accepting requests for work from compactors

With push, I suppose it could do something like the following:

  • wait X minutes for tservers to report external compaction summary info
  • start accepting requests for work from compactors

So we need to figure out the coordinator startup behavior w/ push.

Contributor

What about only polling at startup so compactors can get started right away? I suppose a hybrid approach would then require 2 different RPC endpoints though. Maybe that's not such a big deal?

Contributor

The hybrid approach may be best, I was hoping to think of something simpler.

Contributor

If the tserver push frequency is known, like tservers push at least every 30 secs, then the coordinator could wait 2x that, and I think it would be close to optimal. If the coordinator misses a few updates from tservers before handing out work, then it may start some lower priority compactions running before higher priority ones. That is probably ok.
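
As a minimal sketch of that startup rule (the class and method names here are hypothetical, and the push interval is assumed):

```java
// Sketch of the startup rule with a push model, assuming a known push
// interval; class and method names are hypothetical.
public class CompactionCoordinator {
    // Assumed: tservers push their external queue summaries at least this often.
    private static final long PUSH_INTERVAL_MS = 30_000;

    public void start() throws InterruptedException {
        // Wait ~2x the push interval so nearly every tserver has reported once.
        Thread.sleep(2 * PUSH_INTERVAL_MS);
        // Start handing out work; a few missed updates only risks running
        // lower priority compactions before higher priority ones.
        acceptCompactorRequests();
    }

    private void acceptCompactorRequests() {
        // hypothetical: serve "get work" requests from compactor processes
    }
}
```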

Contributor

Would the priority be system generated (like age + size)?

Contributor

The default planner currently uses the compaction type and the number of files to construct a priority. The following are links to the code that currently creates the priority.

https://github.com/apache/accumulo/blob/b12d1103d788473168616f28ae87a65bb76aa880/core/src/main/java/org/apache/accumulo/core/spi/compaction/DefaultCompactionPlanner.java#L236

https://github.com/apache/accumulo/blob/b12d1103d788473168616f28ae87a65bb76aa880/core/src/main/java/org/apache/accumulo/core/util/compaction/CompactionJobPrioritizer.java#L33

@dlmarion and I were talking and concluded that a long is probably too high cardinality for priority. Maybe a 16-bit or 8-bit integer would be better. Then maybe the prio could be type:log2(numFiles).
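
As a sketch, packing the type into the high byte and log2 of the file count into the low byte of a 16-bit priority might look like this (the names and bit layout are invented for illustration, not from the PR):

```java
// Illustrative only: pack compaction type and ~log2(numFiles) into a
// 16-bit priority, where a higher value means a higher priority.
public static short createPriority(boolean userInitiated, int numFiles) {
    // High byte: compaction type, ranking USER above SYSTEM.
    int type = userInitiated ? 1 : 0;
    // Low byte: bit length of numFiles, i.e. floor(log2(numFiles)) + 1.
    int log2Files = 32 - Integer.numberOfLeadingZeros(Math.max(numFiles, 1));
    return (short) ((type << 8) | Math.min(log2Files, 0xFF));
}
```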

Contributor

I think it would be better to call cancel on any compactions if a table is deleted.

Contributor

Maybe it would be good to have different configurable options for handling this scenario. Option 1 would be to cancel any compactions on the arrival of new files. Option 2 would be to allow the compaction to finish (not cancel on arrival). This would allow users to tune their cluster for faster scans or for cleaning up old data. This might also be complicated by whether the compaction is a USER or SYSTEM type. For example, we may want to allow user compactions to finish vs. cancelling a system one.
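
Hypothetically, those two options could be expressed as a policy setting along these lines (all names here are invented for illustration):

```java
// Hypothetical sketch of the two options as a configurable policy;
// the class, enum, and method names are invented for illustration.
class NewFileHandling {
    enum Policy {
        CANCEL, // Option 1: cancel compactions when new files arrive
        FINISH  // Option 2: allow the compaction to finish
    }

    // The choice could also depend on the compaction kind, e.g. let USER
    // compactions finish but cancel SYSTEM ones.
    static Policy policyFor(boolean isUserCompaction) {
        return isUserCompaction ? Policy.FINISH : Policy.CANCEL;
    }
}
```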

Contributor

Also, I am not sure if we want to cancel compactions with the arrival of new data. I don't think we currently do that at all.

Contributor

The wording needs to be updated. The compaction is not yet running, and cancel just means it's canceled in the queue. The current code never cancels a running compaction when the plan changes, only queued compactions.

Contributor

Ah OK. So something more like "EE2 is removed from the queue"

@keith-turner keith-turner Mar 3, 2021

"Removed from the queue" would be better wording. In the code it actually just cancels it, possibly leaving it in the queue for efficiency. Something canceled in the queue would never run. However those are just implementation details, its doing lazy removal.

Below is where only queued compactions are canceled

https://github.com/apache/accumulo/blob/b12d1103d788473168616f28ae87a65bb76aa880/server/tserver/src/main/java/org/apache/accumulo/tserver/compactions/CompactionService.java#L185

Below is where a task is atomically transitioned to canceled, but only if it's currently in the queued state.

https://github.com/apache/accumulo/blob/b12d1103d788473168616f28ae87a65bb76aa880/server/tserver/src/main/java/org/apache/accumulo/tserver/compactions/CompactionExecutor.java#L112

Below is where a task atomically transitions from the queued state to running. If the task is in the canceled state, then the transition will fail.

https://github.com/apache/accumulo/blob/b12d1103d788473168616f28ae87a65bb76aa880/server/tserver/src/main/java/org/apache/accumulo/tserver/compactions/CompactionExecutor.java#L88
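
Condensed, the lazy-cancel pattern described above looks roughly like this (a sketch, not the actual Accumulo code):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of lazy cancellation: a canceled task may remain in the queue,
// but its queued -> running transition will fail, so it never runs.
class CompactionTask {
    enum State { QUEUED, RUNNING, CANCELED }

    private final AtomicReference<State> state = new AtomicReference<>(State.QUEUED);

    // Plan changed: cancel the task, but only if it is still queued.
    boolean cancel() {
        return state.compareAndSet(State.QUEUED, State.CANCELED);
    }

    // An executor thread picked the task up: start only if still queued.
    boolean tryStart() {
        return state.compareAndSet(State.QUEUED, State.RUNNING);
    }
}
```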

Contributor

I know this design is strictly for online compactions, but we could make offline compactions configurable. Maybe even have a separate queue for handling offline compactions.

@dlmarion dlmarion changed the base branch from main to next-release April 20, 2021 17:09
@dlmarion dlmarion closed this May 11, 2021
@dlmarion dlmarion deleted the external-compaction-design branch May 11, 2021 13:41