New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSANDRA-16896: Add soft/hard limits to local reads to protect against reading too much data in a single query #1180
Conversation
...ed/org/apache/cassandra/distributed/test/trackwarnings/RowIndexEntryTooLargeWarningTest.java
Outdated
Show resolved
Hide resolved
@@ -32,7 +32,6 @@ | |||
public class ObjectSizes | |||
{ | |||
private static final MemoryMeter meter = new MemoryMeter() | |||
.omitSharedBufferOverhead() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the usages of measureDeep()
, it doesn't seem like any of those cases involve wrapping existing BBs, so this configuration option was indeed not necessary, but it would be good to get confirmation of that...
src/java/org/apache/cassandra/exceptions/TombstoneAbortException.java
Outdated
Show resolved
Hide resolved
9ae7bc6
to
69b1f4d
Compare
long heapSize = EMPTY_SIZE | ||
+ clustering.unsharedHeapSize() | ||
+ primaryKeyLivenessInfo.unsharedHeapSize() | ||
+ deletion.unsharedHeapSize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: Part of me wonders if having something like an Accountable
interface for unsharedHeapSize()
would make it possible to factor out some utilities for adding up the sizes of "Accountables". No action required, but I remember this being an interesting concept when I worked on Lucene in a past life.
EDIT: ...and I just realized we have IMeasurableMemory
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^_^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no interface exists for unsharedHeapSizeExceptData
though!
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
Outdated
Show resolved
Hide resolved
@@ -86,6 +88,10 @@ | |||
protected static final Logger logger = LoggerFactory.getLogger(ReadCommand.class); | |||
public static final IVersionedSerializer<ReadCommand> serializer = new Serializer(); | |||
|
|||
// Expose the active command running so transitive calls can lookup this command. | |||
// This is useful for a few reasons, but mainly because the CQL query is here. | |||
private static final FastThreadLocal<ReadCommand> COMMAND = new FastThreadLocal<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial reaction to seeing this is to push to avoid the thread-local. It seems like there are two pieces of information this transmits: the metadata and CQL. If we're willing to downgrade to knowledge of the table and key (which I'd say might be good enough for a warning message) we can just pass both of those directly to IndexSerializer#deserialize()
. Doing that would also implicitly eliminate the problem of constantly setting and removing the current ReadCommand
for local reads that don't care.
IndexSerializer
isn't really an external interface, so I don't think those changes would be disruptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing the information through is a huge change as it needs to be passed all over the place. We also need to handle the cases called outside of a ReadCommand....
protected void onClose() | ||
{ | ||
ColumnFamilyStore cfs = Schema.instance.getColumnFamilyStoreInstance(metadata().id); | ||
if (cfs != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Is there a case where the cfs
should be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two cases
- unit tests; I think this is why I handled this in the other code path which update metrics
- race condition with drop table
its mostly to be defensive as null is a valid response =)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a case where a "no-spammed" WARN level message might be useful, but I won't push too hard for it. (ex. Someone is still issuing queries against a table that's being dropped. It's not clearly wrong, but it isn't something you'd want to be doing either...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have issues with dropping tables concurrently with queries, so you will hit an issue anyways. I wouldn't want to handle here as this is a systemic issue, and these checks are opt-in so can't rely on those log checks
else if (warnThreshold != 0 && estimatedMemory > warnThreshold) | ||
{ | ||
// use addIfLarger rather than add as a previous partition may be larger than this one | ||
MessageParams.addIfLarger(ParamType.ROW_INDEX_ENTRY_TOO_LARGE_WARNING, estimatedMemory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of along the same lines as https://github.com/apache/cassandra/pull/1180/files#r702060997, but I want to make sure there aren't any edge cases (ex. local read and short-read protection read on different threads) where operations against the local node/replica will accumulate information on different thread-local map in MessageParams
. (I think requests to remote replicas should be safe, as they'll accumulate these and serialize a response on the same thread.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your link just loads all files, so not sure if you were pointing to a line.
where operations against the local node/replica will accumulate information on different thread-local map in MessageParams
🤷 This feature is blocked by a thread containing a ReadCommand, so it has to have a ReadCommand (aka this thread is doing a read). If we hit a point where we are not single-threaded then I don't think this will work correctly, and the only data collected will be the one on the thread running the ReadCommand
1135008
to
2a94614
Compare
@krummas thanks for the review; updated based off your recommendation. I made one change between your patch and my version; I kept the context lazy to only the cases where this feature is present, this is done by checking if |
@@ -23,14 +23,14 @@ | |||
import org.apache.cassandra.db.ConsistencyLevel; | |||
import org.apache.cassandra.locator.InetAddressAndPort; | |||
|
|||
import static org.apache.cassandra.service.reads.ReadCallback.tombstoneAbortMessage; | |||
import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: alternate package naming
import static org.apache.cassandra.service.reads.trackwarnings.WarningsSnapshot.tombstoneAbortMessage; | |
import static org.apache.cassandra.service.reads.warnings.WarningsSnapshot.tombstoneAbortMessage; |
No description provided.