
CASSANDRA-18802: Parallelized UCS compactions#3688

Closed
blambov wants to merge 2 commits into apache:trunk from blambov:CASSANDRA-18802

Conversation

@blambov (Contributor) commented Nov 14, 2024

CASSANDRA-18802

Implements parallelization of UCS compactions that would split their output into multiple sstables at boundaries known in advance. To do this, it introduces a CompositeLifecycleTransaction that consists of multiple PartialLifecycleTransaction parts and only commits after all individual parts have completed, and creates multiple compaction tasks under the same composite transaction. Each individual task has a separate operation UUID, but to make composite tasks recognizable, their UUIDs carry the part index as their sequence component (visible as "800n" in the second-to-last group of the UUID string), while non-parallelized ones have 0.
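The commit-after-all-parts behavior described above can be sketched roughly as follows. This is a simplified stand-in with a fixed part count; the real CompositeLifecycleTransaction registers parts dynamically and commits an actual LifecycleTransaction, so the class and method names below are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch: the composite "commits" only once every part has completed.
class CompositeTransactionSketch
{
    private final AtomicInteger partsRemaining;
    private volatile boolean committed = false;

    CompositeTransactionSketch(int parts)
    {
        this.partsRemaining = new AtomicInteger(parts);
    }

    // Called by each part (PartialLifecycleTransaction in the patch) when it finishes.
    void partCompleted()
    {
        if (partsRemaining.decrementAndGet() == 0)
            committed = true; // the real patch commits the shared transaction here
    }

    boolean isCommitted()
    {
        return committed;
    }
}
```

Until the last part reports completion, the shared state stays uncommitted, which is what lets each parallel task produce its slice of the output independently.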

Major compaction is also changed to take advantage of parallelization. To make sure that it can be carried out on an active node, the number of threads used by a major compaction can be limited (to half the compaction threads by default). This is implemented by creating CompositeCompactionTasks that execute multiple tasks serially and grouping the major compaction tasks into a limited number of such composites.
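The grouping into a limited number of serially-executing composites might look roughly like this (illustrative only; the class and method names here are not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

class TaskGrouping
{
    // Distribute tasks round-robin into at most `permittedParallelism` buckets;
    // each bucket would then run its members one after another on a single thread.
    static <T> List<List<T>> group(List<T> tasks, int permittedParallelism)
    {
        int buckets = Math.min(permittedParallelism, tasks.size());
        List<List<T>> composites = new ArrayList<>();
        for (int i = 0; i < buckets; ++i)
            composites.add(new ArrayList<>());
        for (int i = 0; i < tasks.size(); ++i)
            composites.get(i % buckets).add(tasks.get(i));
        return composites;
    }
}
```

With this shape, a major compaction of many shards uses at most `permittedParallelism` compaction threads at any time while still eventually executing every task.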

Because we do not currently support arbitrary filtering of the ranges of an sstable, parallelized compactions cannot use early open. Despite this, they are able to achieve comparable or better performance.

The first two commits bring in CASSANDRA-20092, which is needed for correct calculation of total compaction sizes. The next commit introduces some utilities that are helpful but not ultimately necessary for this patch (it can be easily adjusted to not use them; it is likely that SAI changes will bring these in independently). The final commit simplifies the method of creating the compaction-strategy-specific scanner list variation and is also optional.

@blambov blambov force-pushed the CASSANDRA-18802 branch 2 times, most recently from 1c51472 to aca9e20 Compare November 21, 2024 10:19
@blambov blambov marked this pull request as ready for review November 21, 2024 10:29
/**
* force a major compaction of this column family
*
* @param permittedParallelism the maximum number of compaction threads that can be used by the operation
Contributor:

Nit: wouldn't hurt to say that permittedParallelism <= 0 is supported and treated specially.

Contributor Author:

Good call. Verifying that it is indeed applied as described uncovered problems.

In fact, the strategy itself is the wrong place to apply this limit, because we may have other tasks (e.g. from other repair state or other disks).

Moved all the limit application to CompactionStrategyManager

}
}

public ScannerList getScanners(Collection<SSTableReader> toCompact)
Contributor:

This method can go away; it does the same thing as the default implementation of the interface.

Contributor Author:

Done

* - perform a full (maximum possible) compaction if requested by the user
*/
- public abstract class AbstractCompactionStrategy
+ public abstract class AbstractCompactionStrategy implements ScannerFactory
Contributor:

Nit: I find it slightly convoluted to implement ScannerFactory when only a handful of calls to getScanner on the strategy remain. I'd have added a protected method instead (for LCS to override):

protected ScannerFactory scannerFactory()
{
   return ScannerFactory.DEFAULT;
}

Contributor Author:

Done

* @param cfStore
* @param compacting we take the drop-candidates from this set, it is usually the sstables included in the compaction
- * @param overlapping the sstables that overlap the ones in compacting.
+ * @param overlappingSupplier function used to get the sstables that overlap the ones in compacting.
Contributor:

Nit: I'd rephrase to something like "called on the compacting sstables to compute the set of sstables that overlap with them, if needed" (both to specify what the function argument is, and to suggest that the reason this is a function is to avoid the computation when it's not needed, which I assume is the reason?).

Contributor Author:

Done

task.execute(active);
ranCompaction = true;
}
else
Contributor:

Nit: I personally find `if`s that mix branches with and without brackets distracting (and I've not seen others use them, though admittedly my personal experience with this particular code base is outdated of late). I could have sworn this was explicitly prohibited by the coding style, but it's admittedly not clearly stated in its current form, so be it. This one trips me up every time, though, so I figured I'd voice my opinion once, just so I feel better having voiced it; it's quite a personal preference, so feel free to ignore it (and I won't bring it up again).

Contributor Author:

Changed

public TimeUUID register(PartialLifecycleTransaction part)
{
int index = partsToCommitOrAbort.incrementAndGet();
return mainTransaction.opId().withSequence(index);
Contributor:

Nit: in practice, the idea of having linked operation IDs between the parent and child tasks makes sense. But this kind of assumes that the mainTransaction ID was generated with a 0 sequence (nothing would break badly if that's not the case, but it's still somewhat assumed here), and that's undocumented and easy to miss.

Contributor Author:

Added comment to the constructor.
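As an aside, the "800n" rendering mentioned in the PR description falls out of the RFC 4122 variant bits occupying the top two bits of the UUID clock-sequence field, so a 14-bit sequence value n prints as 0x8000 | n. A minimal illustration (the class and method here are hypothetical, not part of the patch):

```java
class SequenceFieldSketch
{
    // Render the clock-sequence field as it appears in the UUID string:
    // RFC 4122 variant bits (0b10) in the top bits, 14-bit sequence below them.
    static String clockSeqString(int sequence)
    {
        int field = 0x8000 | (sequence & 0x3FFF);
        return String.format("%04x", field);
    }
}
```

This is why part index 1 shows up as "8001" in the second-to-last group, and a non-parallelized task (sequence 0) as "8000".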

{
partsCount = partsToCommitOrAbort.get();
initializationComplete = true;
// TODO: Switch to trace before merging.
Contributor:

Flagging this so it doesn't get forgotten.

Contributor Author:

Changed

long gcBefore = dontPurgeTombstones ? Long.MIN_VALUE : getDefaultGcBefore(cfs, nowInSec);
controller = new ValidationCompactionController(cfs, gcBefore);
- scanners = cfs.getCompactionStrategyManager().getScanners(sstables, ranges);
+ scanners = ScannerFactory.DEFAULT.getScanners(sstables, ranges);
Contributor:

Not super fussed about it, but isn't it a slight regression that this is not allowed to rely on the compaction strategy to optimize the scanners?

Contributor Author:

This is correct, that optimization will no longer apply here.

We have three options on this:

  • Leave as is (special scanner applies only to compactions, no cost to get list)
  • Roll back b249598 and associated follow-ups (special scanner applies everywhere, all operations pay a collection cost when getting the scanner list)
  • Drop the special LCS scanner altogether (LCS pays a cost of up to 1 extra key comparison per partition in exchange for lower call polymorphism).

If you are not happy with the first option, I can run some benchmarks to see if the special scanner actually saves anything to choose between the other two.

Contributor:

In principle, I love simplifying code and would rather we get rid of the special case, especially since LCS is in my mind kind of deprecated (in favor of UCS). But I'm sure plenty of users still use LCS, and the truth is that I have no good intuition on the impact of this. What's the concrete impact of removing that LCS scanner for compaction? And for validation/anti-compaction? I feel the right decision mostly depends on the answers to those questions (unless we've officially deprecated LCS, but I don't think that's the case).

I do understand that benchmarking might be time-consuming, though. I just feel I'm lacking proper data to have an opinion at the moment. My only concern is making sure LCS users don't experience a clearly noticeable performance regression because of this; I'm not attached at all to that special scanner otherwise.

Contributor Author:

Reverted the change, opening a separate ticket to deal with the ScannerList complexity.


name = {"-j", "--jobs"},
description = "Use -j to specify the maximum number of threads to use for parallel compaction. " +
"If not set, up to half the compaction threads will be used. " +
"If set to 0, the major compaction will use all threads and will not permit other compactions to run until it completes (use with caution).")
Contributor:

I don't think this comment is up-to-date. Afaict, unset and 0 behave the same way.

Contributor Author:

This is now fixed.
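With the corrected help text, the flag would be used along these lines (the keyspace and table names below are placeholders, not from the patch):

```shell
# cap a major compaction at 4 compaction threads
nodetool compact -j 4 my_keyspace my_table

# use all compaction threads; other compactions wait until this one completes
nodetool compact -j 0 my_keyspace my_table
```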

}

/**
* Returns a new TimeUUID with the same timestamp as this one, but with the provided sequence value.
Contributor:

Let's call out that this method should be used with care as it removes "uniqueness" guarantees of the returned TimeUUID (same as we warn on minAtUnixMillis for instance).

Contributor Author:

Added

@blambov blambov force-pushed the CASSANDRA-18802 branch 2 times, most recently from c1ac9de to c505d67 Compare December 11, 2024 08:52
@pcmanus (Contributor) left a comment:

Lgtm, awesome work.

patch by Branimir Lambov, reviewed by Sylvain Lebresne for CASSANDRA-18802
@blambov (Contributor Author) commented Dec 11, 2024

Committed as 54e4688.

@blambov blambov closed this Dec 11, 2024