-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CASSANDRA-18732 Baseline Diagnostic vtables for Accord #3305
Conversation
src/java/org/apache/cassandra/db/virtual/AccordVirtualTables.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/virtual/AccordVirtualTables.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the migration states table, one of the reasons for using the list/map collections is also better documentation. text
is flexible for us but once people start parsing (tools) then we can't actually change things... so we are stuck with the format...
src/java/org/apache/cassandra/db/virtual/AccordVirtualTables.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/db/virtual/AccordVirtualTables.java
Outdated
Show resolved
Hide resolved
{ | ||
private MigrationStates(String keyspace) | ||
{ | ||
super(parse(keyspace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do I know the migration is done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a range move from "migrating" -> "migrated"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know it's done when migrated_ranges
is the whole ring and should have a single entry. Right now once a migration state has been created for a table it never goes away.
{ | ||
private CommandStoreCache(String keyspace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be useful to include instance as well, either as its own table or a column here... having a low hit rate doesn't let you know what's going on, need to see which instances are contributing to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there even any reason to populate this w/ the global Stats
? Would it make more sense to expose the Stats
from commandCache
, timestampsForKeyCache
, and commandsForKeyCache
by store ID instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the global one is def nice to have, but if you are trying to debug an issue you kinda need to know which "instance" is having the issues... if command cache is 99% cache hit and the global miss rate is 50%, you know something is going on with the CFK caches (but which one?!?!)
The system_metrics
vtables also expose the JMX metrics, which is the JVM level (includes each store)
TL;DR - I personally feel that cmd/cfk cache level is more useful than global and a global is nice but not required (slight duplicate with JMX table)
{ | ||
private MigrationStates(String keyspace) | ||
{ | ||
super(parse(keyspace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -696,6 +705,27 @@ private static void assertMigrationState(String tableName, ConsensusMigrationTar | |||
}); | |||
} | |||
|
|||
private static void assertVtableMigrationState(String tableName, List<Range<Token>> migratedRanges, List<Range<Token>> migratingRanges) throws JsonProcessingException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: need to get rid of JsonProcessingException
assertTrue(vtableResult.hasNext()); | ||
Row tableState = vtableResult.next(); | ||
List<String> vtableMigratedRanges = tableState.getList("migrated_ranges"); | ||
assertEquals(migratedRanges, vtableMigratedRanges.stream().map(Range::fromString).collect(Collectors.toList())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aweisberg It looks like this assertion is a bit flakey. I haven't been able to reproduce locally, but a couple times in CI...
junit.framework.AssertionFailedError: expected:<[]> but was:<[(-1,9223372036854775807]]>
at org.apache.cassandra.distributed.test.accord.AccordMigrationTest.assertVtableMigrationState(AccordMigrationTest.java:715)
at org.apache.cassandra.distributed.test.accord.AccordMigrationTest.assertMigrationState(AccordMigrationTest.java:633)
at org.apache.cassandra.distributed.test.accord.AccordMigrationTest.lambda$testPaxosToAccordCAS$13(AccordMigrationTest.java:404)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how that range has moved to "migrated", unless I'm looking at the wrong keyspace/table...or there was something I'm waiting for that I shouldn't (as this looks a lot like I'm getting a later state than expected)
List<AccordStateCache.ImmutableStats> commandStats = | ||
AsyncChains.getBlockingAndRethrow(stores.map(store -> ((AccordCommandStore) store.commandStore()).commandCache().statsSnapshot())); | ||
for (int storeID : stores.ids()) | ||
addRow(commandStats.get(storeID), result, storeID, "command"); | ||
|
||
List<AccordStateCache.ImmutableStats> commandsForKeyStats = | ||
AsyncChains.getBlockingAndRethrow(stores.map(store -> ((AccordCommandStore) store.commandStore()).commandsForKeyCache().statsSnapshot())); | ||
for (int storeID : stores.ids()) | ||
addRow(commandsForKeyStats.get(storeID), result, storeID, "commands_for_key"); | ||
|
||
List<AccordStateCache.ImmutableStats> timestampsForKeyStats = | ||
AsyncChains.getBlockingAndRethrow(stores.map(store -> ((AccordCommandStore) store.commandStore()).timestampsForKeyCache().statsSnapshot())); | ||
for (int storeID : stores.ids()) | ||
addRow(timestampsForKeyStats.get(storeID), result, storeID, "timestamps_for_key"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why schedule on the same stores 3 times when you could do this once? this would be 3x faster to do
patch by Caleb Rackliffe; reviewed by David Capwell and Ariel Weisberg for CASSANDRA-18732
Accord PR: apache/cassandra-accord#89