CASSANDRA-17034: Memtable API #1295

blambov · 2021-10-28T15:38:19Z

Provides a memtable API as described in CEP-11.

adelapena · 2022-04-04T15:55:16Z

src/java/org/apache/cassandra/db/memtable/SkipListMemtable.java

+            for (AtomicBTreePartition partition : toFlush.values())
+            {
+                keySize += partition.partitionKey().getKey().remaining();
+                if (trackContention && partition.useLock())


I think trackContention is always true

Looking a bit more closely at this, it seems like the basic idea of getFlushSet() was separating writeSortedContents() into general logic and the skiplist-specific contention tracking stuff. That makes sense, but my only worry is that it looks like it also entails iterating over all the partitions twice. Would we be able to avoid that if the FlushCollection contract included something like a beforeAppend() callback that could do the contention tracking and logging?

I don't understand why trackContention would always be true (for one, running junit tests, e.g. SimpleQueryTest enters the path below and never this one).

Also, we previously iterated twice as well: once to gather keySize in the FlushRunnable constructor, and once to write the partitions in writeSortedContents. The new code moves the contended row count collection from the second loop (which now isn't memtable-specific) to the first (which is).

I don't understand why trackContention would always be true

I think what he means is that trackContention isn't always true, but if it's true on line 264 (in the outer if statement) then it's also true on line 272 (in the inner if statement).

we previously iterated twice

My bad. I didn't see the keySize bit in the FlushRunnable constructor.

Oh, I managed to miss that it's in that particular if too... sorry, fixed now.
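
For readers following this thread, here is a minimal sketch of the shape being discussed: one memtable-specific pass that gathers the key size and, only when contention tracking is on, the contended partition count. The `Partition` and `Stats` types and the `collect` method are hypothetical stand-ins, not the classes in this patch.

```java
import java.util.List;

final class FlushStatsSketch
{
    /** Hypothetical stand-in for AtomicBTreePartition: only the two facts the loop needs. */
    static final class Partition
    {
        final int keyBytes;     // size of the partition key in bytes
        final boolean usedLock; // whether writers fell back to locking (contention)

        Partition(int keyBytes, boolean usedLock)
        {
            this.keyBytes = keyBytes;
            this.usedLock = usedLock;
        }
    }

    /** Result of the single pass over the partitions to flush. */
    static final class Stats
    {
        final long keySize;
        final int contendedPartitions;

        Stats(long keySize, int contendedPartitions)
        {
            this.keySize = keySize;
            this.contendedPartitions = contendedPartitions;
        }
    }

    static Stats collect(List<Partition> toFlush, boolean trackContention)
    {
        long keySize = 0;
        int contended = 0;
        if (trackContention)
        {
            for (Partition p : toFlush)
            {
                keySize += p.keyBytes;
                if (p.usedLock) // trackContention is already known to be true here
                    contended++;
            }
        }
        else
        {
            for (Partition p : toFlush)
                keySize += p.keyBytes;
        }
        return new Stats(keySize, contended);
    }
}
```

The flag is tested once outside the loop, so the inner condition only needs to look at the partition itself.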

adelapena · 2022-04-04T15:56:30Z

src/java/org/apache/cassandra/db/memtable/SkipListMemtable.java

+    {
+        private final TableMetadata metadata;
+        private final Iterator<Map.Entry<PartitionPosition, AtomicBTreePartition>> iter;
+        private final Map<PartitionPosition, AtomicBTreePartition> source;


source is only used by the constructor to get the iterator, we don't need the attribute
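
A small sketch of what this suggestion amounts to, assuming the map is needed only to obtain the iterator. The class name and its methods are hypothetical and simplified from the code under review.

```java
import java.util.Iterator;
import java.util.Map;

final class KeyCollectionSketch<K, V>
{
    // Only the iterator is kept; the source map is a constructor argument, not a field.
    private final Iterator<Map.Entry<K, V>> iter;

    KeyCollectionSketch(Map<K, V> source)
    {
        this.iter = source.entrySet().iterator();
    }

    boolean hasNext()
    {
        return iter.hasNext();
    }

    K nextKey()
    {
        return iter.next().getKey();
    }
}
```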

adelapena · 2022-04-04T15:57:50Z

src/java/org/apache/cassandra/db/ColumnFamilyStore.java

@@ -2382,7 +2500,7 @@ public void run()
    /**
     * Drops current memtable without flushing to disk. This should only be called when truncating a column family which is not durable.
     */
-    public Future<CommitLogPosition> dumpMemtable()
+    public Future<CommitLogPosition> dumpMemtable(FlushReason reason)


The reason argument is not used

Probably we should remove the argument if we are not going to use it.

Checked pmem code, it's not used there either. Removed now.

adelapena · 2022-04-04T16:00:00Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+ * Memtable interface. This defines the operations the ColumnFamilyStore can perform with memtables.
+ * They are of several types:
+ * - construction factory interface
+ * - write and read operations: put, getPartition and makePartitionIterator


The mentioned read methods (getPartition and makePartitionIterator) don't exist on UnfilteredSource.

I think there's a compile error around this in org.apache.cassandra.simulator.paxos.Ballots

Changed.

The error is fixed in a separate commit.

adelapena · 2022-04-04T16:03:01Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+    }
+
+    /**
+     * Interface for providing signals back to the owner.


Just to ease reading, I would add some details about what the owner is, like in the description of the owner parameter of Memtable.Factory#create.

adelapena · 2022-04-04T16:07:24Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+     * - SNAPSHOT will be followed by performSnapshot().
+     * - STREAMING/REPAIR will be followed by creating a FlushSet for the streamed/repaired ranges. This data will be
+     *   used to create sstables, which will be streamed and then deleted.
+     * This will not be called if the sstable is switched because of truncation or drop.


Indeed this isn't called from ColumnFamilyStore#dumpMemtable, but I think it can be called with TRUNCATE from ColumnFamilyStore#truncateBlocking.

Changed the comment somewhat, but I'm going to revisit this after looking at what Intel had to do for pmem.

adelapena · 2022-04-04T16:09:50Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+     * Called when the known ranges have been updated and owner.localRangeSplits() may return different values.
+     * This will not be called if shouldSwitch(OWNED_RANGES_CHANGE) returns true, the memtable will be swapped out
+     * instead.
+     * TODO: Implement call.


This is actually called from ColumnFamilyStore#invalidateLocalRanges, so which call does the TODO refer to? Is it the missing call to owner.localRangeSplits in the implementation of the method?

adelapena · 2022-04-04T16:12:52Z

src/java/org/apache/cassandra/streaming/StreamSession.java

@@ -33,6 +33,7 @@
 import io.netty.util.concurrent.Future;
 import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.io.sstable.SSTableMultiWriter;


Nit: unused import

adelapena · 2022-04-04T16:14:23Z

src/java/org/apache/cassandra/schema/TableParams.java

@@ -178,6 +182,9 @@ public void validate()

        if (memtableFlushPeriodInMs < 0)
            fail("%s must be greater than or equal to 0 (got %s)", Option.MEMTABLE_FLUSH_PERIOD_IN_MS, memtableFlushPeriodInMs);
+
+        if (cdc && memtable.factory.writesShouldSkipCommitLog())
+            fail("CDC cannot work if writes skip the commit log. Check your memtable configuration.");


Suggested change

-    fail("CDC cannot work if writes skip the commit log. Check your memtable configuration.");
+    fail("CDC cannot work if memtable writes skip the commit log. Check your memtable configuration.");

I believe the original version is better -- the qualification is unnecessary because all writes go to the memtable, and it could cause unnecessary confusion.

adelapena · 2022-04-04T16:17:25Z

src/java/org/apache/cassandra/schema/MemtableParams.java

+    {
+        Map<String, String> copy = new HashMap<>(options);
+        String className = copy.remove(Option.CLASS.toString());
+        if (className.isEmpty() || className == null)


We should check first if className is null
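
To illustrate why the order matters: with `className.isEmpty() || className == null`, a missing option throws a NullPointerException before the null test is ever reached. A minimal sketch of the corrected order follows; the `extractClassName` helper, the literal `"class"` key and the error message are assumptions made for illustration, not the code in MemtableParams.

```java
import java.util.HashMap;
import java.util.Map;

final class ClassNameOptionSketch
{
    static String extractClassName(Map<String, String> options)
    {
        Map<String, String> copy = new HashMap<>(options);
        String className = copy.remove("class"); // hypothetical key name
        // Test for null first: || short-circuits, so isEmpty() is never called on null.
        if (className == null || className.isEmpty())
            throw new IllegalArgumentException("The 'class' option must be specified");
        return className;
    }
}
```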

adelapena · 2022-04-05T09:51:55Z

test/unit/org/apache/cassandra/cql3/MemtableQuickTest.java

+    static String keyspace;
+    String table;
+    ColumnFamilyStore cfs;


These can be local variables

adelapena · 2022-04-05T09:52:22Z

test/unit/org/apache/cassandra/cql3/MemtableQuickTest.java

+    int partitions = 50_000;
+    int rowsPerPartition = 4;
+
+    int deletedPartitionsStart = 20_000;
+    int deletedPartitionsEnd = deletedPartitionsStart + 10_000;
+
+    int deletedRowsStart = 40_000;
+    int deletedRowsEnd = deletedRowsStart + 5_000;


These can be private static final

adelapena · 2022-04-05T09:59:03Z

test/unit/org/apache/cassandra/cql3/MemtableQuickTest.java

+        CQLTester.setUpClass();
+        CQLTester.prepareServer();
+        CQLTester.disablePreparedReuseForTest();
+        System.err.println("setupClass done.");


Suggested change

-    System.err.println("setupClass done.");
+    System.out.println("setupClass done.");

adelapena · 2022-04-05T09:59:26Z

test/unit/org/apache/cassandra/cql3/MemtableQuickTest.java

+
+

Nit: double blank line

adelapena · 2022-04-05T10:01:26Z

test/unit/org/apache/cassandra/cql3/MemtableSizeTest.java

+            table = createTable(keyspace, "CREATE TABLE %s ( userid bigint, picid bigint, commentid bigint, PRIMARY KEY(userid, picid))" +
+                                      " with compression = {'enabled': false}" +
+                                      " and memtable = { 'class': '" + memtableClass + "'}");


Nit: parameter alignment

adelapena · 2022-04-05T11:05:58Z

test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java

+        assertRows(execute(format("SELECT memtable FROM %s.%s WHERE keyspace_name = ? and table_name = ?;",
+                                  SchemaConstants.SCHEMA_KEYSPACE_NAME,
+                                  SchemaKeyspaceTables.TABLES),
+                           KEYSPACE,
+                           currentTable()),
+                   row(map()));


This block appears seven times across the test, only changing the map arguments. We could reduce duplication with a utility method like:

private void assertMemtableOptions(Object... options) throws Throwable
{
    assertRows(execute(format("SELECT memtable FROM %s.%s WHERE keyspace_name = ? and table_name = ?",
                              SchemaConstants.SCHEMA_KEYSPACE_NAME,
                              SchemaKeyspaceTables.TABLES),
                       KEYSPACE,
                       currentTable()),
               row(map(options)));
}

adelapena · 2022-04-05T11:09:35Z

test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java

+        assertRows(execute(format("SELECT memtable FROM %s.%s WHERE keyspace_name = ? and table_name = ?;",
+                                  SchemaConstants.SCHEMA_KEYSPACE_NAME,
+                                  SchemaKeyspaceTables.TABLES),
+                           KEYSPACE,
+                           currentTable()),
+                   row(map()));


This block appears six times across the test, only changing the map arguments. We could reduce duplication with a utility method like:

private void assertMemtableOptions(Object... options) throws Throwable
{
    assertRows(execute(format("SELECT memtable FROM %s.%s WHERE keyspace_name = ? and table_name = ?",
                              SchemaConstants.SCHEMA_KEYSPACE_NAME,
                              SchemaKeyspaceTables.TABLES),
                       KEYSPACE,
                       currentTable()),
               row(map(options)));
}

adelapena · 2022-04-05T11:10:04Z

test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java

@@ -568,6 +567,105 @@ public void testDoubleWith() throws Throwable
        }
    }

+


Nit: unneeded blank line

adelapena · 2022-04-05T11:14:52Z

test/unit/org/apache/cassandra/index/StubIndex.java

@@ -23,6 +23,7 @@
 import java.util.function.BiFunction;

 import org.apache.cassandra.Util;
+import org.apache.cassandra.db.memtable.Memtable;


Nit: unused import

adelapena · 2022-04-05T11:25:59Z

src/java/org/apache/cassandra/db/SystemKeyspace.java

@@ -141,6 +141,7 @@ private SystemKeyspace()
              + "version int,"
              + "PRIMARY KEY ((id)))")
              .partitioner(new LocalPartitioner(TimeUUIDType.instance))
+              .memtable(MemtableParams.DEFAULT)


Do we need this? MemtableParams.DEFAULT is already the default

adelapena · 2022-04-05T13:43:15Z

src/java/org/apache/cassandra/config/DatabaseDescriptor.java

+        if (conf == null)
+            return null;


Can conf be null when this is called?

Apparently some tests can end up calling this without initializing the configuration.

adelapena · 2022-04-05T15:30:36Z

src/java/org/apache/cassandra/db/memtable/SkipListMemtable.java

+
+            public ShardBoundaries localRangeSplits(int shardCount)
+            {
+                return null; // not implemented


Maybe we should throw UnsupportedOperationException

The only callsite actually had access to the CFS, this is not necessary any more.
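
Had the stub been kept, the suggestion would look roughly like the sketch below. The `Owner` interface here is a simplified, hypothetical stand-in (it returns `int[]` rather than ShardBoundaries) so that the example is self-contained.

```java
final class OwnerStubSketch
{
    /** Hypothetical, simplified version of the owner callback interface. */
    interface Owner
    {
        int[] localRangeSplits(int shardCount);
    }

    // Failing loudly makes an accidental call visible at the call site,
    // instead of surfacing later as a NullPointerException on the returned null.
    static final Owner TEST_OWNER = shardCount -> {
        throw new UnsupportedOperationException("localRangeSplits is not implemented for this owner");
    };
}
```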

adelapena · 2022-04-05T15:59:12Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+         * splitting the owned space evenly. It is up to the memtable to use this information.
+         * Any changes in the ring structure (e.g. added or removed nodes) will invalidate the splits; in such a case
+         * the memtable will be sent a shouldSwitch(OWNED_RANGES_CHANGE) and, should that return false, a
+         * localRangesChanged() call.


Suggested change

-     * localRangesChanged() call.
+     * {@link #localRangesUpdated()} call.

adelapena · 2022-04-05T16:13:30Z

src/java/org/apache/cassandra/db/ColumnFamilyStore.java

+            shardBoundaries = new ShardBoundaries(boundaries.subList(0, boundaries.size() - 1),
+                                                  versionedLocalRanges.ringVersion);
+            cachedShardBoundaries = shardBoundaries;
+            logger.info("Memtable shard boundaries for {}.{}: {}", keyspace.getName(), getTableName(), boundaries);


Should we use DEBUG level? IIUC DiskBoundariesManager uses DEBUG level for similar messages.

adelapena · 2022-04-06T13:33:15Z

src/java/org/apache/cassandra/db/rows/UnfilteredSource.java

+     * @param reversed true if the content should be returned in reverse order
+     * @param listener a listener used to handle internal read events
+     */
+    UnfilteredRowIterator iterator(DecoratedKey key,


Nit: I think that the two UnfilteredSource#iterator methods would be better named UnfilteredSource#rowIterator, complementing UnfilteredSource#partitionIterator. That would also apply to SSTableReader#iterator(FileDataInput, DecoratedKey, RowIndexEntry, Slices, ColumnFilter, boolean).

adelapena · 2022-04-06T13:34:10Z

src/java/org/apache/cassandra/db/memtable/ShardBoundaries.java

+ * In practice, each keyspace has its associated boundaries, see {@link Keyspace}.
+ * <p>
+ * Technically, if we use {@code n} shards, this is a list of {@code n-1} tokens and each token {@code tk} gets assigned
+ * to the core ID corresponding to the slot of the smallest token in the list that is greater to {@code tk}, or {@code n}


Suggested change

- * to the core ID corresponding to the slot of the smallest token in the list that is greater to {@code tk}, or {@code n}
+ * to the shard id corresponding to the slot of the smallest token in the list that is greater to {@code tk}, or {@code n}
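
The mapping described in that Javadoc can be sketched as a binary search over the boundary list. This is only an illustration under stated assumptions: tokens are plain `long` values, boundaries are distinct, shard ids are 0-based (so the last shard is `n - 1`), and the class and method names are hypothetical rather than those of ShardBoundaries.

```java
import java.util.Arrays;

final class ShardBoundariesSketch
{
    // Sorted, distinct boundary tokens: n - 1 entries for n shards.
    private final long[] boundaries;

    ShardBoundariesSketch(long... boundaries)
    {
        this.boundaries = boundaries.clone();
        Arrays.sort(this.boundaries);
    }

    int shardCount()
    {
        return boundaries.length + 1;
    }

    /** Returns the slot of the smallest boundary strictly greater than the token, or the last shard if none is. */
    int shardFor(long token)
    {
        int pos = Arrays.binarySearch(boundaries, token);
        // Found at pos: that boundary equals the token, so the next slot holds the first greater one.
        // Not found: binarySearch returns -(insertionPoint) - 1, and the insertion point is the answer.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

With boundaries 10 and 20 there are three shards: tokens below 10 map to shard 0, tokens from 10 up to but not including 20 map to shard 1, and everything from 20 upwards maps to shard 2.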

adelapena · 2022-04-06T14:15:02Z

src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java

+        public long estimateSize(SSTableWriter.SSTableSizeParameters parameters)
+        {
+            return (long) ((parameters.partitionCount() // index entries
+                            + parameters.partitionCount() // keys in data file


Shouldn't this be parameters.partitionKeySize(), which isn't called anywhere? Also, I would call that method partitionKeysSize, in plural, so one can't think that this somehow refers to the size of some single partition.

Changed both calls. Also changed partitionCount to be collected during the key size collection pass to avoid potentially walking the flush set again to collect it.

maedhroz · 2022-04-08T20:33:40Z

test/conf/cassandra.yaml

+    test_invalid_factory_method:
+        class: org.apache.cassandra.cql3.validation.operations.CreateTest$InvalidMemtableFactoryMethod
+    test_invalid_factory_field:
+        class: org.apache.cassandra.cql3.validation.operations.CreateTest$InvalidMemtableFactoryField


With CASSANDRA-17292 on the horizon, it might be a good time to think about using a structure for memtable configuration that would be compatible w/ the main proposal there: maedhroz@450b920

Given we're moving toward implementation-specific configuration, we could even replace the existing ungrouped memtable YAML items with a new top-level memtable element. (There are a few ways to handle compatibility, including just using the items under the new memtable element if one is actually specified, pulling values from the old top-level options if they aren't specified in the new format, etc.)

ex.

memtable:
    configuration: skiplist
    configurations:
        skiplist:
            class: SkipListMemtable
        trie:
            class: TrieMemtable
            shards: 16

I wanted to avoid having an entry for selecting the default configuration, relying on modifying the "default" one instead (and if you want to copy a config for the default, you can do that by extending). Unless you have a strong feeling about this, I prefer to keep that for now.

The configuration is now changed to nested format, and I changed it to use ParameterizedClass-like format. One of the reasons for this was the YAML type reinterpretation, as I could not find a way to make snakeyaml read Map<String, Map<String, String>> correctly.

I will also refrain from moving the existing properties in this ticket, but that's something that does have to be done eventually.

Looks good, thanks!

Just a couple of minor things to clean up, e.g. CLASS_OPTION and EXTENDS_OPTION are now unused in MemtableParams.

Removed the two properties and added format documentation.
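
As a rough illustration of how configurations that extend one another could be resolved, here is a sketch that merges a named configuration with its parents. It assumes a flat string map per configuration and the key names `class` and `extends` seen in the test YAML quoted in this review; the resolver class and its methods are hypothetical, and the format that was finally committed may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MemtableConfigResolverSketch
{
    /** Returns the options of the named configuration with inherited options filled in. */
    static Map<String, String> resolve(String name, Map<String, Map<String, String>> configurations)
    {
        return resolve(name, configurations, new HashSet<>());
    }

    private static Map<String, String> resolve(String name,
                                               Map<String, Map<String, String>> configurations,
                                               Set<String> visiting)
    {
        Map<String, String> own = configurations.get(name);
        if (own == null)
            throw new IllegalArgumentException("Unknown memtable configuration: " + name);
        if (!visiting.add(name))
            throw new IllegalArgumentException("Cyclic 'extends' chain at: " + name);

        Map<String, String> resolved = new HashMap<>();
        String parent = own.get("extends");
        if (parent != null)
            resolved.putAll(resolve(parent, configurations, visiting)); // inherited options first
        resolved.putAll(own);                                           // own options override them
        resolved.remove("extends");                                     // not an option of the memtable itself
        return resolved;
    }
}
```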

maedhroz · 2022-04-08T20:55:15Z

src/java/org/apache/cassandra/db/ColumnFamilyStore.java

+        REPAIR,
+        SCHEMA_CHANGE,
+        OWNED_RANGES_CHANGE,
+        UNIT_TESTS; // explicitly requested flush needed for a test


Suggested change

-        UNIT_TESTS; // explicitly requested flush needed for a test
+        UNIT_TESTS // explicitly requested flush needed for a test

maedhroz · 2022-04-08T21:46:37Z

test/microbench/org/apache/cassandra/test/microbench/CacheLoaderBench.java

@@ -97,7 +97,7 @@ public void setup() throws Throwable
                RowUpdateBuilder rowBuilder = new RowUpdateBuilder(cfs.metadata(), System.currentTimeMillis() + random.nextInt(), "key");
                rowBuilder.add(colDef, "val1");
                rowBuilder.build().apply();
-                cfs.forceBlockingFlush();
+                cfs.forceBlockingFlush(ColumnFamilyStore.FlushReason.UNIT_TESTS);


Not super important, but I notice we're using UNIT_TESTS in some jmh tests and USER_FORCED in others?

I generally chose to go with USER_FORCED whenever it's not an actual unit test; this one is now corrected.

maedhroz · 2022-04-08T21:50:57Z

test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java

@@ -31,6 +31,7 @@
 import org.apache.cassandra.cache.KeyCacheKey;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.ColumnFamilyStore;


nit: duplicate import

maedhroz · 2022-04-08T21:51:04Z

test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java

@@ -31,6 +31,7 @@
 import org.apache.cassandra.cache.KeyCacheKey;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.index.Index;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;


nit: unused import

maedhroz · 2022-04-08T21:59:57Z

test/unit/org/apache/cassandra/cql3/CQLTester.java

@@ -635,7 +635,7 @@ public void flush(String keyspace)
    {
        ColumnFamilyStore store = getCurrentColumnFamilyStore(keyspace);
        if (store != null)
-            store.forceBlockingFlush();
+            store.forceBlockingFlush(ColumnFamilyStore.FlushReason.UNIT_TESTS);


I guess we're going to have to change a ton of tests either way, so what do you think about adding a couple utility flush methods to CQLTester that hide the details of the flush reason?

ex.

List<Future<?>> flush(Keyspace keyspace) { return keyspace.flush(ColumnFamilyStore.FlushReason.UNIT_TESTS); }

...probably also one for forceBlockingFlush()

Added static methods in Util and changed all tests to use them.
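
The idea of hiding the flush reason behind a test helper can be sketched as follows. The `Store` interface and `FlushReason` enum below are simplified, hypothetical stand-ins for ColumnFamilyStore and its enum, and the helper is not the actual method added to Util.

```java
final class TestFlushSketch
{
    /** Hypothetical subset of the flush reasons discussed in this review. */
    enum FlushReason
    {
        USER_FORCED,
        UNIT_TESTS
    }

    /** Hypothetical stand-in for the part of ColumnFamilyStore the helper needs. */
    interface Store
    {
        void forceBlockingFlush(FlushReason reason);
    }

    /** Test-only helper: callers never have to spell out the reason. */
    static void flush(Store store)
    {
        store.forceBlockingFlush(FlushReason.UNIT_TESTS);
    }
}
```

Tests then call `flush(store)`, and a later change to the reason enum touches one helper rather than every test.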

maedhroz · 2022-04-08T22:08:21Z

test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java

+    {
+        try
+        {
+            cfs.switchMemtableIfCurrent(current, ColumnFamilyStore.FlushReason.UNIT_TESTS).get();


Good catch!

I wonder if this is connected to any known test failure?

The commit log test is flaky, this may be part of the reason.

maedhroz · 2022-04-08T22:32:27Z

test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java

+        catch (InterruptedException|ExecutionException e)
+        {
+            throw Throwables.propagate(e);
+        }


Would it make sense to use FBUtilities.waitOnFuture() here?

Yes, changed.
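
For context, the pattern being replaced is "wait on a future and rethrow checked exceptions as unchecked". A generic sketch of such a helper is below; `waitOn` is a hypothetical name and this is not claimed to match the behavior of FBUtilities.waitOnFuture.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class FutureWaitSketch
{
    /** Blocks until the future completes and converts checked failures into unchecked ones. */
    static <T> T waitOn(Future<T> future)
    {
        try
        {
            return future.get();
        }
        catch (ExecutionException e)
        {
            // Surface the underlying failure rather than the wrapper.
            throw new RuntimeException(e.getCause());
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // preserve the interrupt status for callers
            throw new RuntimeException(e);
        }
    }
}
```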

maedhroz · 2022-04-08T22:42:21Z

src/java/org/apache/cassandra/db/ColumnFamilyStore.java


-        logger.info("Enqueuing flush of {}: {}",
+        logger.info("Enqueuing flush of {} ({}): {}",


1.) Would it be helpful to make this a fully qualified table name (i.e. throw the keyspace prefix on there)?

2.) nit, ignore if you don't like: Suggestion for format string:

Enqueueing flush of {}, Reason: {}, Usage: {}

maedhroz · 2022-04-09T03:07:59Z

src/java/org/apache/cassandra/db/repair/CassandraValidationIterator.java

@@ -195,12 +195,19 @@ public CassandraValidationIterator(ColumnFamilyStore cfs, Collection<Range<Token
            if (!isIncremental)
            {
                // flush first so everyone is validating data that is as similar as possible
-                StorageService.instance.forceKeyspaceFlush(cfs.keyspace.getName(), cfs.name);
+                cfs.forceBlockingFlush(ColumnFamilyStore.FlushReason.REPAIR);


It's hard for me to know whether it would actually be useful, but we could possibly make REPAIR more granular (ex. VALIDATION, ANTICOMPACTION, etc.)

Might be helpful in the logs; done.

maedhroz · 2022-04-09T03:10:46Z

src/java/org/apache/cassandra/index/SecondaryIndexManager.java

-import org.apache.cassandra.db.partitions.PartitionUpdate;
-import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
+import org.apache.cassandra.db.memtable.Memtable;
+import org.apache.cassandra.db.partitions.*;


nit: Was the switch to a wildcard import intended?

No idea why Intellij decided to do that... fixed.

maedhroz · 2022-04-09T03:14:18Z

src/java/org/apache/cassandra/service/StorageService.java

@@ -4179,7 +4185,7 @@ public void forceKeyspaceFlush(String keyspaceName, String... tableNames) throws
        for (ColumnFamilyStore cfStore : getValidColumnFamilies(true, false, keyspaceName, tableNames))
        {
            logger.debug("Forcing flush on keyspace {}, CF {}", keyspaceName, cfStore.name);
-            cfStore.forceBlockingFlush();
+            cfStore.forceBlockingFlush(ColumnFamilyStore.FlushReason.USER_FORCED);


I think there are also some cases, like in RepairTest, where we call this but could make an argument the reason should be UNIT_TESTS.

Added a reason-specifying version and changed callsites.

maedhroz · 2022-04-09T03:19:57Z

src/java/org/apache/cassandra/db/ColumnFamilyStore.java

+        INTERNALLY_FORCED,  // explicitly requested flush, necessary for the operation of an internal table
+        USER_FORCED, // flush explicitly requested by the user (e.g. nodetool flush)
+        STARTUP,
+        SHUTDOWN,


nit: Would it make sense to use DRAIN here, given we can do that via nodetool?

maedhroz · 2022-04-11T19:33:21Z

test/microbench/org/apache/cassandra/test/microbench/instance/ReadTest.java

+                return performReadSerial(readStatement, supplier);
+            else
+                return performReadThreads(readStatement, supplier);
+        }


If we want to control read concurrency, would it be good enough to just delegate that to the @Threads annotation on the top-level class here?

I wanted to measure the minimum time per batch, which is different from what JMH threads can be used to measure (more suitable for max throughput).

maedhroz · 2022-04-11T19:36:54Z

test/microbench/org/apache/cassandra/test/microbench/instance/ReadTest.java

+        for (Future<Integer> f : futures)
+            done += f.get();
+        assert count == done;
+    }


Were writes just taking too long and making it hard to iterate quickly on optimizations? (AFAICT, the benchmark only explicitly measures reads...)

nit: There are some common bits of write logic between WriteTest and ReadTest, to the extent that you might be able to factor out a "writer" class.

It was helpful to run just the read test and get a basic idea of the write performance too (and also data size).

Refactored the benchmarks to extract the shared code.

maedhroz · 2022-04-11T19:49:02Z

test/microbench/org/apache/cassandra/test/microbench/instance/WriteTest.java

+    }
+
+    @Benchmark
+    public void writeTable() throws Throwable


Is the idea here to do as many of these write/flush cycles as we can do in 1 second for each iteration? @Measurement seems to have support for batching, so just curious. Would it make sense to benchmark the writes and flushes separately?

Repeatedly running the same write(+truncate+flush), for at least one second per measurement iteration (if the count is the default 1m, it takes more than a second).
Separating the two is not that easy, but if we measure both TRUNCATE and FLUSH we can look at the difference.

maedhroz · 2022-04-11T20:13:08Z

test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java

+                                  SchemaKeyspaceTables.TABLES),
+                           KEYSPACE,
+                           currentTable()),
+                   row("skiplist"));


Does the concept of a default exist mainly as a mechanism to have a new implementation picked up automatically on startup after simply changing the YAML at the node level, rather than across the whole cluster?

According to the CEP-19 discussion, users want to try out new settings on a subset of nodes at a time, and prefer not to have settings that cannot be overridden per node.

This design addresses these concerns and covers all the scenarios I could think of: we can gradually switch all tables to a different implementation by changing the node's default; we can assign a specific implementation to some tables and, to analyze or work around problems, switch only some nodes to a different one; we can create targeted per-node settings in heterogeneous deployments.

maedhroz · 2022-04-11T20:16:44Z

test/conf/cassandra.yaml

+    skiplist:
+        extends: default
+        class: SkipListMemtable
+    skiplist_remapped:


Do we use skiplist_remapped anywhere?

Added in CreateTest.

maedhroz · 2022-04-11T20:33:55Z

test/unit/org/apache/cassandra/cql3/MemtableQuickTest.java

+import java.util.List;
+
+import com.google.common.collect.ImmutableList;
+import org.junit.Assert;


unused: import org.junit.Assert;

maedhroz · 2022-04-12T02:24:11Z

test/unit/org/apache/cassandra/db/memtable/TestMemtable.java

+    }
+
+    public static Memtable.Factory FACTORY =
+        (commitLogLowerBound, metadaRef, owner) -> new SkipListMemtable(commitLogLowerBound, metadaRef, owner);


Could also just be...

public static Memtable.Factory FACTORY = SkipListMemtable::new;
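
The two forms are equivalent whenever the functional interface's parameters line up with a constructor. A self-contained sketch, with a hypothetical `Factory` interface and `Table` class standing in for Memtable.Factory and SkipListMemtable:

```java
final class FactoryReferenceSketch
{
    /** Hypothetical three-argument factory, mirroring the shape of the one in the diff. */
    interface Factory
    {
        Table create(long lowerBound, String metadata, String owner);
    }

    /** Hypothetical product type whose constructor matches Factory#create. */
    static final class Table
    {
        final long lowerBound;
        final String metadata;
        final String owner;

        Table(long lowerBound, String metadata, String owner)
        {
            this.lowerBound = lowerBound;
            this.metadata = metadata;
            this.owner = owner;
        }
    }

    // Explicit lambda that only forwards its arguments...
    static final Factory LAMBDA = (lowerBound, metadata, owner) -> new Table(lowerBound, metadata, owner);

    // ...and the equivalent constructor reference.
    static final Factory REFERENCE = Table::new;
}
```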

maedhroz · 2022-04-12T02:57:33Z

src/java/org/apache/cassandra/cql3/statements/schema/TableAttributes.java

+        if (hasOption(Option.MEMTABLE))
+            builder.memtable(MemtableParams.get(getString(Option.MEMTABLE)));
+        else
+            builder.memtable(MemtableParams.DEFAULT);


Could also be...

builder.memtable(MemtableParams.get(getSimple(Option.MEMTABLE.toString())));

...given how get() handles nulls.

Actually, the else branch does not follow the pattern of the rest of the code. In theory one could call this twice, and resetting to the default when the memtable is not specified the second time would be unexpected. Changed.
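
A minimal sketch of the behavior described in this reply, assuming a builder that starts at the default and is only touched when the option is present. The builder class, the `"memtable"` key and the `"default"` value are hypothetical simplifications of TableAttributes and TableParams.Builder.

```java
import java.util.Map;

final class TableParamsBuilderSketch
{
    private String memtable = "default"; // the builder starts out at the default

    TableParamsBuilderSketch memtable(String name)
    {
        this.memtable = name;
        return this;
    }

    String memtable()
    {
        return memtable;
    }

    /** Applies only the options that are present; absent options leave the builder untouched. */
    static void applyOptions(TableParamsBuilderSketch builder, Map<String, String> options)
    {
        String memtable = options.get("memtable");
        if (memtable != null)
            builder.memtable(memtable);
        // Deliberately no else branch: a second call without the option must not reset an earlier choice.
    }
}
```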

maedhroz · 2022-04-12T03:04:13Z

src/java/org/apache/cassandra/db/memtable/Memtable.java

+            return memtable().metadata();
+        }
+
+        long partitionCount();


nit: Is this necessary, given we extend SSTableWriter.SSTableSizeParameters?

Table configuration can now only select a configuration defined in cassandra.yaml, to permit per-node configuration that cannot be overridden by table configuration. Also adds support for configurations to extend from others, to permit easy remapping of memtable configurations.

This adds a "snapshot_commitlog_position" field to "commitlog_archiving.properties", which overrides the commit log intervals to be replayed. This should be used to specify the time that a persistent memtable snapshot was taken (or started) to correctly replay commit log segments. If the value is not specified, and a persistent memtable is in use, all present segments will be replayed to support a mode where older segments are deleted when a snapshot is created.

blambov force-pushed the CASSANDRA-17034 branch 2 times, most recently from e3e30f7 to 18909fa on October 29, 2021 14:00

blambov force-pushed the CASSANDRA-17034 branch from 18909fa to be837f0 on January 18, 2022 14:43

adelapena reviewed Apr 4, 2022

adelapena reviewed Apr 5, 2022

adelapena reviewed Apr 6, 2022

blambov force-pushed the CASSANDRA-17034 branch from be837f0 to 1122ddd on April 8, 2022 09:27

maedhroz reviewed Apr 8, 2022

maedhroz reviewed Apr 9, 2022

maedhroz reviewed Apr 11, 2022

maedhroz reviewed Apr 12, 2022

blambov added 28 commits April 29, 2022 11:51

Memtable templates in cassandra.yaml

1931aa3

also make sure default memtable options are not stored in table config.

Correct compilation error

8f0c394

First batch of review comments

a0ba85e

Second batch of review comments

1d72ab9

Change configuration to ParameterizedClass-based nested format

fd215cb

The move to ParameterizedClass is to avoid deviating from what is used elsewhere, but also to ensure parameters are correctly interpreted as strings.

Fix initialization, more review comments

2d364bd

Refactor benchmarks

bb5dd55

Use logger

c454beb

Add flushing methods in test Util and change tests to use them

5dd3422

Review bits

88d538f

More review bits

598fa13

Fix Util.flush

43a9d34

Fix cqlshlib test expectation

52b461c

Remove reason in dropMemtable

5235e14

Fix test failures around writing memtable config to schema

1b59570

Fixing test failures around writing memtable config to schema, take two

b1e540b

Fixing test failures around writing memtable config to schema, take three

3831e84

Add documentation

4679ed8

Fix license check on doc

492a679

Fix configurations order

d7c4a15

Add sharded skip-list memtable

3618270

Doc corrections

ec5a98b

Review comments on ShardedSkipListMemtable

b6cfab8

Relax MemtableSizeTest for sharded skip list

9f835bb

Fix rebase

ec3ad4e

CircleCi:DO NOT MERGE

9627194

blambov force-pushed the CASSANDRA-17034 branch from 09f4e05 to 9627194 on April 29, 2022 08:52

smiklosovic closed this Apr 29, 2022
