CASSANDRA-16896: Add soft/hard limits to local reads to protect against reading too much data in a single query #1180

dcapwell · 2021-08-31T17:38:29Z

No description provided.

conf/cassandra.yaml

src/java/org/apache/cassandra/config/Config.java

...ed/org/apache/cassandra/distributed/test/trackwarnings/RowIndexEntryTooLargeWarningTest.java

src/java/org/apache/cassandra/utils/ObjectSizes.java

maedhroz · 2021-09-02T16:44:22Z

src/java/org/apache/cassandra/utils/ObjectSizes.java

@@ -32,7 +32,6 @@
 public class ObjectSizes
 {
    private static final MemoryMeter meter = new MemoryMeter()
-                                             .omitSharedBufferOverhead()


Looking through the usages of measureDeep(), it doesn't seem like any of those cases involve wrapping existing BBs, so this configuration option was indeed not necessary, but it would be good to get confirmation of that...

test/unit/org/apache/cassandra/utils/ObjectSizesTest.java

src/java/org/apache/cassandra/metrics/KeyspaceMetrics.java

src/java/org/apache/cassandra/exceptions/TombstoneAbortException.java

src/java/org/apache/cassandra/service/reads/ReadCallback.java

src/java/org/apache/cassandra/io/sstable/IndexInfo.java

maedhroz · 2021-09-02T20:21:45Z

src/java/org/apache/cassandra/db/rows/BTreeRow.java

+        long heapSize = EMPTY_SIZE
+                        + clustering.unsharedHeapSize()
+                        + primaryKeyLivenessInfo.unsharedHeapSize()
+                        + deletion.unsharedHeapSize()


Aside: Part of me wonders if having something like an Accountable interface for unsharedHeapSize() would make it possible to factor out some utilities for adding up the sizes of "Accountables". No action required, but I remember this being an interesting concept when I worked on Lucene in a past life.

EDIT: ...and I just realized we have IMeasurableMemory :)

no interface exists for unsharedHeapSizeExceptData though!

src/java/org/apache/cassandra/cql3/statements/SelectStatement.java

src/java/org/apache/cassandra/service/StorageService.java

src/java/org/apache/cassandra/service/reads/ReadCallback.java

maedhroz · 2021-09-03T17:10:41Z

src/java/org/apache/cassandra/db/ReadCommand.java

@@ -86,6 +88,10 @@
    protected static final Logger logger = LoggerFactory.getLogger(ReadCommand.class);
    public static final IVersionedSerializer<ReadCommand> serializer = new Serializer();

+    // Expose the active command running so transitive calls can lookup this command.
+    // This is useful for a few reasons, but mainly because the CQL query is here.
+    private static final FastThreadLocal<ReadCommand> COMMAND = new FastThreadLocal<>();


My initial reaction to seeing this is to push to avoid the thread-local. It seems like there are two pieces of information this transmits: the metadata and CQL. If we're willing to downgrade to knowledge of the table and key (which I'd say might be good enough for a warning message) we can just pass both of those directly to IndexSerializer#deserialize(). Doing that would also implicitly eliminate the problem of constantly setting and removing the current ReadCommand for local reads that don't care.

IndexSerializer isn't really an external interface, so I don't think those changes would be disruptive.

passing the information through is a huge change as it needs to be passed all over the place. We also need to handle the cases called outside of a ReadCommand....

maedhroz · 2021-09-03T17:17:56Z

src/java/org/apache/cassandra/db/ReadCommand.java

+            protected void onClose()
+            {
+                ColumnFamilyStore cfs = Schema.instance.getColumnFamilyStoreInstance(metadata().id);
+                if (cfs != null)


nit: Is there a case where the cfs should be null?

two cases

unit tests; I think this is why I handled this in the other code path which update metrics

race condition with drop table

its mostly to be defensive as null is a valid response =)

It seems like a case where a "no-spammed" WARN level message might be useful, but I won't push too hard for it. (ex. Someone is still issuing queries against a table that's being dropped. It's not clearly wrong, but it isn't something you'd want to be doing either...)

I think we have issues with dropping tables concurrently with queries, so you will hit an issue anyways. I wouldn't want to handle here as this is a systemic issue, and these checks are opt-in so can't rely on those log checks

src/java/org/apache/cassandra/config/TrackWarnings.java

src/java/org/apache/cassandra/db/ReadCommand.java

maedhroz · 2021-09-03T17:46:52Z

src/java/org/apache/cassandra/db/RowIndexEntry.java

+            else if (warnThreshold != 0 && estimatedMemory > warnThreshold)
+            {
+                // use addIfLarger rather than add as a previous partition may be larger than this one
+                MessageParams.addIfLarger(ParamType.ROW_INDEX_ENTRY_TOO_LARGE_WARNING, estimatedMemory);


This is sort of along the same lines as https://github.com/apache/cassandra/pull/1180/files#r702060997, but I want to make sure there aren't any edge cases (ex. local read and short-read protection read on different threads) where operations against the local node/replica will accumulate information on different thread-local map in MessageParams. (I think requests to remote replicas should be safe, as they'll accumulate these and serialize a response on the same thread.)

your link just loads all files, so not sure if you were pointing to a line.

where operations against the local node/replica will accumulate information on different thread-local map in MessageParams

🤷 This feature is blocked by a thread containing a ReadCommand, so it has to have a ReadCommand (aka this thread is doing a read). If we hit a point where we are not single-threaded then I don't think this will work correctly, and the only data collected will be the one on the thread running the ReadCommand

src/java/org/apache/cassandra/db/RowIndexEntry.java

src/java/org/apache/cassandra/db/ReadCommand.java

conf/cassandra.yaml

…d-repaired

dcapwell · 2021-09-17T21:40:33Z

@krummas thanks for the review; updated based off your recommendation. I made one change between your patch and my version; I kept the context lazy to only the cases where this feature is present, this is done by checking if WarningContext.isSupported(params.keySet()) is true, if so then it loads and updates.

maedhroz · 2021-09-21T17:49:58Z

src/java/org/apache/cassandra/exceptions/TombstoneAbortException.java

@@ -23,14 +23,14 @@
 import org.apache.cassandra.db.ConsistencyLevel;
 import org.apache.cassandra.locator.InetAddressAndPort;

-import static org.apache.cassandra.service.reads.ReadCallback.tombstoneAbortMessage;
+import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage;


nit: alternate package naming

Suggested change

import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage;

import static org.apache.cassandra.service.reads.warnings.WarningsSnapshot.tombstoneAbortMessage;

maedhroz reviewed Sep 1, 2021

View reviewed changes

conf/cassandra.yaml Outdated Show resolved Hide resolved

maedhroz reviewed Sep 1, 2021

View reviewed changes

src/java/org/apache/cassandra/config/Config.java Outdated Show resolved Hide resolved

maedhroz reviewed Sep 1, 2021

View reviewed changes

...ed/org/apache/cassandra/distributed/test/trackwarnings/RowIndexEntryTooLargeWarningTest.java Outdated Show resolved Hide resolved