Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSANDRA-16896: Add soft/hard limits to local reads to protect against reading too much data in a single query #1180

Closed
wants to merge 72 commits into from

Conversation

dcapwell
Copy link
Contributor

No description provided.

conf/cassandra.yaml Outdated Show resolved Hide resolved
@@ -32,7 +32,6 @@
public class ObjectSizes
{
private static final MemoryMeter meter = new MemoryMeter()
.omitSharedBufferOverhead()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the usages of measureDeep(), it doesn't seem like any of those cases involve wrapping existing BBs, so this configuration option was indeed not necessary, but it would be good to get confirmation of that...

long heapSize = EMPTY_SIZE
+ clustering.unsharedHeapSize()
+ primaryKeyLivenessInfo.unsharedHeapSize()
+ deletion.unsharedHeapSize()
Copy link
Contributor

@maedhroz maedhroz Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: Part of me wonders if having something like an Accountable interface for unsharedHeapSize() would make it possible to factor out some utilities for adding up the sizes of "Accountables". No action required, but I remember this being an interesting concept when I worked on Lucene in a past life.

EDIT: ...and I just realized we have IMeasurableMemory :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^_^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no interface exists for unsharedHeapSizeExceptData though!

@@ -86,6 +88,10 @@
protected static final Logger logger = LoggerFactory.getLogger(ReadCommand.class);
public static final IVersionedSerializer<ReadCommand> serializer = new Serializer();

// Expose the active command running so transitive calls can lookup this command.
// This is useful for a few reasons, but mainly because the CQL query is here.
private static final FastThreadLocal<ReadCommand> COMMAND = new FastThreadLocal<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial reaction to seeing this is to push to avoid the thread-local. It seems like there are two pieces of information this transmits: the metadata and CQL. If we're willing to downgrade to knowledge of the table and key (which I'd say might be good enough for a warning message) we can just pass both of those directly to IndexSerializer#deserialize(). Doing that would also implicitly eliminate the problem of constantly setting and removing the current ReadCommand for local reads that don't care.

IndexSerializer isn't really an external interface, so I don't think those changes would be disruptive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing the information through is a huge change as it needs to be passed all over the place. We also need to handle the cases called outside of a ReadCommand....

protected void onClose()
{
ColumnFamilyStore cfs = Schema.instance.getColumnFamilyStoreInstance(metadata().id);
if (cfs != null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is there a case where the cfs should be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two cases

  1. unit tests; I think this is why I handled this in the other code path which update metrics
  2. race condition with drop table

its mostly to be defensive as null is a valid response =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a case where a "no-spammed" WARN level message might be useful, but I won't push too hard for it. (ex. Someone is still issuing queries against a table that's being dropped. It's not clearly wrong, but it isn't something you'd want to be doing either...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have issues with dropping tables concurrently with queries, so you will hit an issue anyways. I wouldn't want to handle here as this is a systemic issue, and these checks are opt-in so can't rely on those log checks

else if (warnThreshold != 0 && estimatedMemory > warnThreshold)
{
// use addIfLarger rather than add as a previous partition may be larger than this one
MessageParams.addIfLarger(ParamType.ROW_INDEX_ENTRY_TOO_LARGE_WARNING, estimatedMemory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of along the same lines as https://github.com/apache/cassandra/pull/1180/files#r702060997, but I want to make sure there aren't any edge cases (ex. local read and short-read protection read on different threads) where operations against the local node/replica will accumulate information on different thread-local map in MessageParams. (I think requests to remote replicas should be safe, as they'll accumulate these and serialize a response on the same thread.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your link just loads all files, so not sure if you were pointing to a line.

where operations against the local node/replica will accumulate information on different thread-local map in MessageParams

🤷 This feature is blocked by a thread containing a ReadCommand, so it has to have a ReadCommand (aka this thread is doing a read). If we hit a point where we are not single-threaded then I don't think this will work correctly, and the only data collected will be the one on the thread running the ReadCommand

conf/cassandra.yaml Outdated Show resolved Hide resolved
@dcapwell
Copy link
Contributor Author

@krummas thanks for the review; updated based off your recommendation. I made one change between your patch and my version; I kept the context lazy to only the cases where this feature is present, this is done by checking if WarningContext.isSupported(params.keySet()) is true, if so then it loads and updates.

@@ -23,14 +23,14 @@
import org.apache.cassandra.db.ConsistencyLevel;
import org.apache.cassandra.locator.InetAddressAndPort;

import static org.apache.cassandra.service.reads.ReadCallback.tombstoneAbortMessage;
import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: alternate package naming

Suggested change
import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage;
import static org.apache.cassandra.service.reads.warnings.WarningsSnapshot.tombstoneAbortMessage;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants