CASSANDRA-18802: Parallelized UCS compactions #3688
blambov wants to merge 2 commits into apache:trunk from
Conversation
/**
 * force a major compaction of this column family
 *
 * @param permittedParallelism the maximum number of compaction threads that can be used by the operation
Nit: wouldn't hurt to say that permittedParallelism <= 0 is supported and treated specially.
Good call. Verifying that it is indeed applied as described uncovered problems.
In fact, the strategy itself is the wrong place to apply this limit, because we may have other tasks (e.g. from other repair state or other disks).
Moved all the limit application to CompactionStrategyManager.
    }
}

public ScannerList getScanners(Collection<SSTableReader> toCompact)
This method can go away; it does the same thing as the default implementation of the interface.
 * - perform a full (maximum possible) compaction if requested by the user
 */
- public abstract class AbstractCompactionStrategy
+ public abstract class AbstractCompactionStrategy implements ScannerFactory
Nit: I find it slightly convoluted to implement ScannerFactory when only a handful of calls to getScanner on the strategy remain. I'd have added a protected method instead (for LCS to override):

protected ScannerFactory scannerFactory()
{
    return ScannerFactory.DEFAULT;
}

 * @param cfStore
 * @param compacting we take the drop-candidates from this set, it is usually the sstables included in the compaction
- * @param overlapping the sstables that overlap the ones in compacting.
+ * @param overlappingSupplier function used to get the sstables that overlap the ones in compacting.
Nit: I'd rephrase to something like "called on the compacting sstables to compute the set of sstables that overlap with them if needed". This both specifies what the function argument is and suggests that the reason this is a function is to avoid the computation when it's not needed (which I assume is the reason?).
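The avoid-if-unneeded behavior suggested here can be sketched with a memoizing supplier. This is a generic illustration only; `Lazy` is a hypothetical helper, not a class from the Cassandra codebase:

```java
import java.util.function.Supplier;

// Minimal sketch of a memoizing supplier: the (possibly expensive) overlap
// computation runs only if and when the result is actually requested, and
// at most once.
final class Lazy<T> implements Supplier<T>
{
    private final Supplier<T> compute;
    private T value;
    private boolean done;

    Lazy(Supplier<T> compute)
    {
        this.compute = compute;
    }

    @Override
    public synchronized T get()
    {
        if (!done)
        {
            value = compute.get();
            done = true;
        }
        return value;
    }
}
```

Callers that never ask for the overlapping set pay nothing; callers that ask repeatedly pay the computation cost once.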
    task.execute(active);
    ranCompaction = true;
}
else
Nit: I personally find it distracting/bad to mix branches with and without brackets, and I've not seen others do it (though admittedly my experience with this particular code base is outdated of late). I could have sworn this was explicitly prohibited by the coding style, but it's admittedly not clearly stated in its current form, so be it. Still, this one trips me up every time, so I figured I'd voice my opinion once, just so I feel better having voiced it. It's quite a personal preference, though, so feel free to ignore it (and I won't bring it up again).
public TimeUUID register(PartialLifecycleTransaction part)
{
    int index = partsToCommitOrAbort.incrementAndGet();
    return mainTransaction.opId().withSequence(index);
Nit: in practice, the idea of having linked operation IDs between the parent and child tasks makes sense. But this does kind of assume that the mainTransaction ID was generated with a 0 sequence (nothing would strongly break if that's not the case, but it's still somewhat assumed here), and that's not documented/easy to miss.
Added comment to the constructor.
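The parent/child ID linkage discussed here can be illustrated with a toy model. All names below are hypothetical stand-ins for the real TimeUUID and transaction classes, not the actual Cassandra API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy operation ID: same timestamp, varying sequence component.
final class OpId
{
    final long timestamp;
    final int sequence;

    OpId(long timestamp, int sequence)
    {
        this.timestamp = timestamp;
        this.sequence = sequence;
    }

    OpId withSequence(int seq)
    {
        return new OpId(timestamp, seq);
    }
}

// Toy composite transaction. Note the implicit assumption the reviewer
// points out: the parent ID is created with sequence 0, so child
// sequences 1..n never collide with it.
final class CompositeTxn
{
    private final OpId mainId = new OpId(System.currentTimeMillis(), 0);
    private final AtomicInteger parts = new AtomicInteger();

    OpId register()
    {
        int index = parts.incrementAndGet();
        return mainId.withSequence(index);
    }
}
```

Each registered part thus shares the parent's timestamp but carries its own part index, making composite tasks recognizable from the ID alone.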
{
    partsCount = partsToCommitOrAbort.get();
    initializationComplete = true;
    // TODO: Switch to trace before merging.
Flagging this so it doesn't get forgotten.
long gcBefore = dontPurgeTombstones ? Long.MIN_VALUE : getDefaultGcBefore(cfs, nowInSec);
controller = new ValidationCompactionController(cfs, gcBefore);
- scanners = cfs.getCompactionStrategyManager().getScanners(sstables, ranges);
+ scanners = ScannerFactory.DEFAULT.getScanners(sstables, ranges);
Not super fussed about it, but isn't it a slight regression that this is not allowed to rely on the compaction strategy to optimize the scanners?
This is correct, that optimization will no longer apply here.
We have three options on this:
- Leave as is (special scanner applies only to compactions, no cost to get list)
- Roll back b249598 and associated follow-ups (special scanner applies everywhere, all operations pay a collection cost when getting the scanner list)
- Drop the special LCS scanner altogether (LCS pays a cost of up to 1 extra key comparison per partition in exchange for lower call polymorphism).
If you are not happy with the first option, I can run some benchmarks to see if the special scanner actually saves anything to choose between the other two.
In principle, I love simplifying code and would rather we get rid of the special case, especially since LCS is in my mind kind of deprecated (in favor of UCS). But I'm sure plenty of users still use LCS, and the truth is that I have no good intuition on the impact of this. What's the concrete impact of removing that LCS scanner for compaction? And for validation/anti-compaction? I feel the right decision mostly depends on the answers to those questions (unless we've officially deprecated LCS, but I don't think that's the case).
I do understand that running benchmarks might be time consuming, and I just feel I'm lacking proper data to have an opinion at the moment. My only concern is making sure LCS users don't experience a clearly noticeable performance regression because of this; I'm not attached at all to that special scanner otherwise.
Reverted the change, opening a separate ticket to deal with the ScannerList complexity.
name = {"-j", "--jobs"},
description = "Use -j to specify the maximum number of threads to use for parallel compaction. " +
              "If not set, up to half the compaction threads will be used. " +
              "If set to 0, the major compaction will use all threads and will not permit other compactions to run until it completes (use with caution).")
I don't think this comment is up-to-date. Afaict, unset and 0 behave the same way.
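Whatever the final flag semantics, the limiting mechanism the PR describes (grouping the major compaction's tasks so that at most a fixed number of threads is used, with each group executed serially) can be sketched as follows. This is illustrative only; `TaskGrouping` is not the actual CompositeCompactionTasks code, and treating a non-positive parallelism as "unlimited" is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split n tasks into at most `parallelism` groups. Each group is
// then executed serially on one compaction thread, so at most
// `parallelism` threads are busy at once. parallelism <= 0 is treated as
// "no limit" here (one task per group).
final class TaskGrouping
{
    static <T> List<List<T>> group(List<T> tasks, int parallelism)
    {
        if (parallelism <= 0 || parallelism >= tasks.size())
        {
            List<List<T>> all = new ArrayList<>();
            for (T task : tasks)
            {
                List<T> single = new ArrayList<>();
                single.add(task);
                all.add(single);
            }
            return all;
        }

        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < parallelism; i++)
            groups.add(new ArrayList<>());
        // Round-robin assignment keeps group sizes balanced.
        for (int i = 0; i < tasks.size(); i++)
            groups.get(i % parallelism).add(tasks.get(i));
        return groups;
    }
}
```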
}

/**
 * Returns a new TimeUUID with the same timestamp as this one, but with the provided sequence value.
Let's call out that this method should be used with care, as it removes the "uniqueness" guarantees of the returned TimeUUID (same as we warn for minAtUnixMillis, for instance).
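The uniqueness concern is easy to demonstrate with a toy model (illustrative only; the real TimeUUID implementation differs): two distinct IDs sharing a timestamp collapse into equal values once their sequence components are overwritten.

```java
import java.util.Objects;

// Toy TimeUUID: uniqueness comes from the (timestamp, sequence) pair.
// Overwriting the sequence can make two previously distinct IDs equal.
final class ToyTimeUUID
{
    final long timestamp;
    final int sequence;

    ToyTimeUUID(long timestamp, int sequence)
    {
        this.timestamp = timestamp;
        this.sequence = sequence;
    }

    ToyTimeUUID withSequence(int seq)
    {
        return new ToyTimeUUID(timestamp, seq);
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof ToyTimeUUID
               && ((ToyTimeUUID) o).timestamp == timestamp
               && ((ToyTimeUUID) o).sequence == sequence;
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(timestamp, sequence);
    }
}
```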
patch by Branimir Lambov, reviewed by Sylvain Lebresne for CASSANDRA-18802
Committed as 54e4688.
CASSANDRA-18802
Implements parallelization of UCS compactions that would split the output into multiple sstables starting at boundaries known in advance. To do this, it introduces a CompositeLifecycleTransaction that consists of multiple PartialLifecycleTransaction parts and only commits after all individual parts have completed, and creates multiple compaction tasks under the same composite transaction. Each individual task has a separate operation UUID, but to make composite tasks recognizable their UUIDs carry a part index as their sequence component (visible as "800n" in the second-to-last component of the UUID string), while non-parallelized ones have 0.
Major compaction is also changed to take advantage of parallelization. To make sure that it can be carried out on an active node, the number of threads used by a major compaction can be limited, to half the compaction threads by default. This is implemented by creating CompositeCompactionTasks that execute multiple tasks serially and grouping the major compaction tasks into a limited number of composites.
Because we do not currently support arbitrary filtering of the ranges of an sstable, parallelized compactions cannot use early open. Despite this, they are able to achieve comparable or better performance.
The first two commits bring in CASSANDRA-20092, which is needed for correct calculation of total compaction sizes. The next commit introduces some utilities that are helpful but not ultimately necessary for this patch (it can easily be adjusted to not use them; it is likely that SAI changes will bring these in independently). The final commit simplifies the method of creating the compaction-strategy-specific scanner list variation and is also optional.
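The commit-after-all-parts behavior described above can be sketched as a minimal model. This is not the actual Cassandra code; the class and method names are assumptions based on the description:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of a composite transaction: parts register up front, and
// the whole transaction commits only once initialization is complete and
// every registered part has finished.
final class CompositeTransaction
{
    private final AtomicInteger registered = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private volatile boolean initializationComplete;
    private volatile boolean committed;

    // Each parallel compaction task registers one part and gets its index.
    int registerPart()
    {
        return registered.incrementAndGet();
    }

    // Called once all parts have been created.
    void completeInitialization()
    {
        initializationComplete = true;
        maybeCommit();
    }

    // Called by each part when it finishes successfully.
    void partCompleted()
    {
        completed.incrementAndGet();
        maybeCommit();
    }

    private void maybeCommit()
    {
        // In the real code this would commit the shared lifecycle
        // transaction; here we only record that the condition was reached.
        if (initializationComplete && completed.get() == registered.get())
            committed = true;
    }

    boolean isCommitted()
    {
        return committed;
    }
}
```

Tracking initialization separately matters: without it, the transaction could commit prematurely after the first part finishes but before all parts have been registered.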