CASSANDRA-18802: Parallelized UCS compactions #3688
blambov wants to merge 2 commits into apache:trunk from
Conversation
/**
 * force a major compaction of this column family
 *
 * @param permittedParallelism the maximum number of compaction threads that can be used by the operation
Nit: wouldn't hurt to say that permittedParallelism <= 0 is supported and treated specially.
Good call. Verifying that it is indeed applied as described uncovered problems.
In fact, the strategy itself is the wrong place to apply this limit, because we may have other tasks (e.g. from other repair state or other disks).
Moved all the limit application to CompactionStrategyManager.
    }
}

public ScannerList getScanners(Collection<SSTableReader> toCompact)
This method can go away; it does the same thing as the default implementation of the interface.
 * - perform a full (maximum possible) compaction if requested by the user
 */
- public abstract class AbstractCompactionStrategy
+ public abstract class AbstractCompactionStrategy implements ScannerFactory
Nit: I find it slightly convoluted to implement ScannerFactory when only a handful of calls to getScanner on the strategy remain. I'd have added a protected method instead (for LCS to override):

protected ScannerFactory scannerFactory()
{
    return ScannerFactory.DEFAULT;
}

 * @param cfStore
 * @param compacting we take the drop-candidates from this set, it is usually the sstables included in the compaction
- * @param overlapping the sstables that overlap the ones in compacting.
+ * @param overlappingSupplier function used to get the sstables that overlap the ones in compacting.
Nit: I'd rephrase to something like "called on the compacting sstables to compute the set of sstables that overlap with them if needed". This both specifies what the function argument is and suggests that the reason this is a function is to avoid the computation when it's not needed (which I assume is the reason?).
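The avoid-if-unneeded behavior suggested here can be sketched with a memoizing supplier. This is a generic illustration only; `Lazy` is a hypothetical helper, not a class from the Cassandra codebase:

```java
import java.util.function.Supplier;

// Minimal sketch of a memoizing supplier: the (possibly expensive) overlap
// computation runs only if and when the result is actually requested, and
// at most once.
final class Lazy<T> implements Supplier<T>
{
    private final Supplier<T> compute;
    private T value;
    private boolean done;

    Lazy(Supplier<T> compute)
    {
        this.compute = compute;
    }

    @Override
    public synchronized T get()
    {
        if (!done)
        {
            value = compute.get();
            done = true;
        }
        return value;
    }
}
```

Callers that never ask for the overlapping set pay nothing; callers that ask repeatedly pay the computation cost once.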
    task.execute(active);
    ranCompaction = true;
}
else
Nit: I personally find it distracting/bad to mix branches with and without brackets, and I've not seen others do it (though admittedly my experience with this particular code base is outdated of late). I could have sworn this was explicitly prohibited by the coding style, but it's admittedly not clearly stated in its current form, so be it. Still, this one trips me up every time, so I figured I'd voice my opinion once, just so I feel better having voiced it. It's quite a personal preference, though, so feel free to ignore it (and I won't bring it up again).
public TimeUUID register(PartialLifecycleTransaction part)
{
    int index = partsToCommitOrAbort.incrementAndGet();
    return mainTransaction.opId().withSequence(index);
Nit: in practice, the idea of having linked operation IDs between the parent and child tasks makes sense. But this does kind of assume that the mainTransaction ID was generated with a 0 sequence (nothing would strongly break if that's not the case, but it's still somewhat assumed here), and that's not documented/easy to miss.
Added comment to the constructor.
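The parent/child ID linkage discussed here can be illustrated with a toy model. All names below are hypothetical stand-ins for the real TimeUUID and transaction classes, not the actual Cassandra API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy operation ID: same timestamp, varying sequence component.
final class OpId
{
    final long timestamp;
    final int sequence;

    OpId(long timestamp, int sequence)
    {
        this.timestamp = timestamp;
        this.sequence = sequence;
    }

    OpId withSequence(int seq)
    {
        return new OpId(timestamp, seq);
    }
}

// Toy composite transaction. Note the implicit assumption the reviewer
// points out: the parent ID is created with sequence 0, so child
// sequences 1..n never collide with it.
final class CompositeTxn
{
    private final OpId mainId = new OpId(System.currentTimeMillis(), 0);
    private final AtomicInteger parts = new AtomicInteger();

    OpId register()
    {
        int index = parts.incrementAndGet();
        return mainId.withSequence(index);
    }
}
```

Each registered part thus shares the parent's timestamp but carries its own part index, making composite tasks recognizable from the ID alone.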
{
    partsCount = partsToCommitOrAbort.get();
    initializationComplete = true;
    // TODO: Switch to trace before merging.
Flagging this so it doesn't get forgotten.
long gcBefore = dontPurgeTombstones ? Long.MIN_VALUE : getDefaultGcBefore(cfs, nowInSec);
controller = new ValidationCompactionController(cfs, gcBefore);
- scanners = cfs.getCompactionStrategyManager().getScanners(sstables, ranges);
+ scanners = ScannerFactory.DEFAULT.getScanners(sstables, ranges);
Not super fussed about it, but isn't it a slight regression that this is not allowed to rely on the compaction strategy to optimize the scanners?
This is correct, that optimization will no longer apply here.
We have three options on this:
- Leave as is (special scanner applies only to compactions, no cost to get list)
- Roll back b249598 and associated follow-ups (special scanner applies everywhere, all operations pay a collection cost when getting the scanner list)
- Drop the special LCS scanner altogether (LCS pays a cost of up to 1 extra key comparison per partition in exchange for lower call polymorphism).
If you are not happy with the first option, I can run some benchmarks to see if the special scanner actually saves anything to choose between the other two.
In principle, I love simplifying code and would rather we get rid of the special case, especially since LCS is in my mind kind of deprecated (in favor of UCS). But I'm sure plenty of users still use LCS, and the truth is that I have no good intuition on the impact of this. What's the concrete impact of removing that LCS scanner for compaction? And for validation/anti-compaction? I feel the right decision mostly depends on the answers to those questions (unless we've officially deprecated LCS, but I don't think that's the case).
I do understand that running benchmarks might be time consuming, and I just feel I'm lacking proper data to have an opinion at the moment. My only concern is making sure LCS users don't experience a clearly noticeable performance regression because of this; I'm not attached at all to that special scanner otherwise.
Reverted the change, opening a separate ticket to deal with the ScannerList complexity.
name = {"-j", "--jobs"},
description = "Use -j to specify the maximum number of threads to use for parallel compaction. " +
              "If not set, up to half the compaction threads will be used. " +
              "If set to 0, the major compaction will use all threads and will not permit other compactions to run until it completes (use with caution).")
I don't think this comment is up-to-date. Afaict, unset and 0 behave the same way.
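Whatever the final flag semantics, the limiting mechanism the PR describes (grouping the major compaction's tasks so that at most a fixed number of threads is used, with each group executed serially) can be sketched as follows. This is illustrative only; `TaskGrouping` is not the actual CompositeCompactionTasks code, and treating a non-positive parallelism as "unlimited" is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split n tasks into at most `parallelism` groups. Each group is
// then executed serially on one compaction thread, so at most
// `parallelism` threads are busy at once. parallelism <= 0 is treated as
// "no limit" here (one task per group).
final class TaskGrouping
{
    static <T> List<List<T>> group(List<T> tasks, int parallelism)
    {
        if (parallelism <= 0 || parallelism >= tasks.size())
        {
            List<List<T>> all = new ArrayList<>();
            for (T task : tasks)
            {
                List<T> single = new ArrayList<>();
                single.add(task);
                all.add(single);
            }
            return all;
        }

        List<List<T>> groups = new ArrayList<>();
        for (int i = 0; i < parallelism; i++)
            groups.add(new ArrayList<>());
        // Round-robin assignment keeps group sizes balanced.
        for (int i = 0; i < tasks.size(); i++)
            groups.get(i % parallelism).add(tasks.get(i));
        return groups;
    }
}
```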
}

/**
 * Returns a new TimeUUID with the same timestamp as this one, but with the provided sequence value.
Let's call out that this method should be used with care, as it removes the "uniqueness" guarantees of the returned TimeUUID (same as we warn for minAtUnixMillis, for instance).
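The uniqueness concern is easy to demonstrate with a toy model (illustrative only; the real TimeUUID implementation differs): two distinct IDs sharing a timestamp collapse into equal values once their sequence components are overwritten.

```java
import java.util.Objects;

// Toy TimeUUID: uniqueness comes from the (timestamp, sequence) pair.
// Overwriting the sequence can make two previously distinct IDs equal.
final class ToyTimeUUID
{
    final long timestamp;
    final int sequence;

    ToyTimeUUID(long timestamp, int sequence)
    {
        this.timestamp = timestamp;
        this.sequence = sequence;
    }

    ToyTimeUUID withSequence(int seq)
    {
        return new ToyTimeUUID(timestamp, seq);
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof ToyTimeUUID
               && ((ToyTimeUUID) o).timestamp == timestamp
               && ((ToyTimeUUID) o).sequence == sequence;
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(timestamp, sequence);
    }
}
```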
patch by Branimir Lambov, reviewed by Sylvain Lebresne for CASSANDRA-18802
Committed as 54e4688.
CASSANDRA-18802
Implements parallelization of UCS compactions that would split the output into multiple sstables starting at boundaries known in advance. To do this, it introduces a CompositeLifecycleTransaction that consists of multiple PartialLifecycleTransaction parts and only commits after all individual parts have completed, and creates multiple compaction tasks under the same composite transaction. Each individual task has a separate operation UUID, but to make composite tasks recognizable their UUIDs carry a part index as their sequence component (visible as "800n" in the second-to-last component of the UUID string), while non-parallelized ones have 0.
Major compaction is also changed to take advantage of parallelization. To make sure that it can be carried out on an active node, the number of threads used by a major compaction can be limited, to half the compaction threads by default. This is implemented by creating CompositeCompactionTasks that execute multiple tasks serially and grouping the major compaction tasks into a limited number of composites.
Because we do not currently support arbitrary filtering of the ranges of an sstable, parallelized compactions cannot use early open. Despite this, they are able to achieve comparable or better performance.
The first two commits bring in CASSANDRA-20092, which is needed for correct calculation of total compaction sizes. The next commit introduces some utilities that are helpful but not ultimately necessary for this patch (it can easily be adjusted to not use them; it is likely that SAI changes will bring these in independently). The final commit simplifies the method of creating the compaction-strategy-specific scanner list variation and is also optional.
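The commit-after-all-parts behavior described above can be sketched as a minimal model. This is not the actual Cassandra code; the class and method names are assumptions based on the description:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model of a composite transaction: parts register up front, and
// the whole transaction commits only once initialization is complete and
// every registered part has finished.
final class CompositeTransaction
{
    private final AtomicInteger registered = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private volatile boolean initializationComplete;
    private volatile boolean committed;

    // Each parallel compaction task registers one part and gets its index.
    int registerPart()
    {
        return registered.incrementAndGet();
    }

    // Called once all parts have been created.
    void completeInitialization()
    {
        initializationComplete = true;
        maybeCommit();
    }

    // Called by each part when it finishes successfully.
    void partCompleted()
    {
        completed.incrementAndGet();
        maybeCommit();
    }

    private void maybeCommit()
    {
        // In the real code this would commit the shared lifecycle
        // transaction; here we only record that the condition was reached.
        if (initializationComplete && completed.get() == registered.get())
            committed = true;
    }

    boolean isCommitted()
    {
        return committed;
    }
}
```

Tracking initialization separately matters: without it, the transaction could commit prematurely after the first part finishes but before all parts have been registered.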