CASSANDRA-17819: Fix resetting the schema #1804
Branch updated from f703959 to 3987d38
ekaterinadimitrova2 left a comment
Left some comments, mostly questions. I need to look at it more tomorrow. It seems there are a few things going on, and I do not want to change any behavior.
How about some Javadoc?
Shall we add some logging in case no update has happened?
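A minimal sketch of the logging suggested above, assuming the handler can compare the schema version before and after an update. The class SchemaUpdateLogging, the method applyUpdate and the use of java.util.logging are hypothetical choices for illustration, not Cassandra's actual code.

```java
import java.util.UUID;
import java.util.logging.Logger;

// Hypothetical sketch: SchemaUpdateLogging and applyUpdate are illustrative names,
// not Cassandra's DefaultSchemaUpdateHandler API.
class SchemaUpdateLogging
{
    private static final Logger logger = Logger.getLogger(SchemaUpdateLogging.class.getName());

    private UUID currentVersion;

    SchemaUpdateLogging(UUID initialVersion)
    {
        this.currentVersion = initialVersion;
    }

    /**
     * Records the schema version produced by an update.
     *
     * @return true if the version changed, false if the update was a no-op
     */
    boolean applyUpdate(UUID newVersion)
    {
        if (newVersion.equals(currentVersion))
        {
            // The case asked about in the review: make a no-op update visible in the logs.
            logger.fine(() -> "Schema update did not change the schema version " + currentVersion);
            return false;
        }
        UUID previous = currentVersion;
        currentVersion = newVersion;
        logger.fine(() -> "Schema version changed from " + previous + " to " + newVersion);
        return true;
    }

    public static void main(String[] args)
    {
        UUID v1 = UUID.randomUUID();
        SchemaUpdateLogging handler = new SchemaUpdateLogging(v1);
        System.out.println(handler.applyUpdate(v1));                // false
        System.out.println(handler.applyUpdate(UUID.randomUUID())); // true
    }
}
```

Logging at a fine (debug-like) level keeps the no-op case visible without adding noise at the default level.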
src/java/org/apache/cassandra/schema/DefaultSchemaUpdateHandler.java
Outdated
Why was this switched to trace?
I'll revert
But actually, I considered this message to describe a temporary situation: once the messaging version is known, either the schema is pulled or the incompatible messaging version is logged (at debug level).
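A minimal sketch of the decision described in the comment above, assuming that "compatible" means the peer's messaging version equals the local one. SchemaPullDecision, Action and CURRENT_VERSION (including its value) are hypothetical names for illustration, not Cassandra's code.

```java
import java.util.OptionalInt;

// Hypothetical sketch: SchemaPullDecision, Action and CURRENT_VERSION are illustrative;
// it assumes "compatible" means the peer's messaging version equals the local one.
class SchemaPullDecision
{
    enum Action { WAIT_FOR_VERSION, PULL_SCHEMA, SKIP_INCOMPATIBLE }

    // Assumed local messaging version, for illustration only.
    static final int CURRENT_VERSION = 12;

    static Action decide(OptionalInt peerMessagingVersion)
    {
        // Version not known yet: a temporary state, re-evaluated once it becomes known.
        if (!peerMessagingVersion.isPresent())
            return Action.WAIT_FOR_VERSION;
        // Known but different: skip the pull (the comment says this case is logged at debug level).
        if (peerMessagingVersion.getAsInt() != CURRENT_VERSION)
            return Action.SKIP_INCOMPATIBLE;
        return Action.PULL_SCHEMA;
    }

    public static void main(String[] args)
    {
        System.out.println(decide(OptionalInt.empty()));                 // WAIT_FOR_VERSION
        System.out.println(decide(OptionalInt.of(CURRENT_VERSION)));     // PULL_SCHEMA
        System.out.println(decide(OptionalInt.of(CURRENT_VERSION - 1))); // SKIP_INCOMPATIBLE
    }
}
```

Only the WAIT_FOR_VERSION outcome is transient, which is the argument for not logging it at a higher level.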
I think I mentioned this in the previous version, but for completeness: I think this TODO actually deserves to be documented.
OK, but where do you want it documented? I think we should rather fix it.
Then how about a follow-up ticket to fix it, with the ticket number added to this comment? WDYT?
test/distributed/org/apache/cassandra/distributed/test/SchemaTest.java
Outdated
Why did you decide to move it up?
I have to think a bit about this TODO.
Still trying to understand why it's needed, but a quick attempt at removing it definitely breaks the tests.
The same logic is in pullComplete.
It seems to me that we call markReceived before we deal with requestQueue in maybePullSchema, but I am still not sure why. Looking into it again.
Actually, it hasn't necessarily been marked as received, has it? I think it's only marked as received if it equals the local schema.
Yes, reading it back, that is right. But then it is confusing why, in those cases, we add the endpoint to requestQueue, which seems to be a queue of endpoints from which we are going to fetch. Do we still need to fetch? :-)
At that point we do not remove the endpoint from the queue; we seem to remove it the next time we call maybePullSchema and iterate over the queue. I'm not sure why that approach was taken rather than simply not adding the endpoint when we call markReceived. I will experiment.
So we add it to requestQueue because we might need to fetch it, unless it has been marked as received. I guess the simplified version would add the endpoint to the request queue only if it hasn't been marked, something like:
VersionInfo info = versionInfo.computeIfAbsent(version, VersionInfo::new);
info.endpoints.add(endpoint);
logger.trace("Added endpoint {} to schema {}: {}", endpoint, info.version, info);
if (Objects.equals(schemaVersion.get(), version))
{
    info.markReceived();
    logger.trace("Schema {} from {} has been marked as received because it is equal to the local schema", version, endpoint);
}
else
{
    info.requestQueue.addFirst(endpoint);
}
Yes, that was my point, but I have to test whether it has other side effects that we might miss.
At least the MigrationCoordinatorTest and SchemaTest (both in-jvm and unit tests) pass for me locally with this change
Exactly, that is what I commented: "...given we've just marked this schema version as received..." :-)
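The simplified logic proposed in this thread can be modelled in isolation. This is a hypothetical, single-threaded sketch: endpoints are plain strings, VersionInfo only keeps the fields used in the snippet, and reportEndpointVersion is an illustrative name rather than a MigrationCoordinator method.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the snippet discussed above; not Cassandra's
// MigrationCoordinator. Not thread-safe: it only illustrates the control flow.
class SimplifiedCoordinator
{
    static class VersionInfo
    {
        final UUID version;
        final Set<String> endpoints = new HashSet<>();
        final Deque<String> requestQueue = new ArrayDeque<>();
        private boolean received;

        VersionInfo(UUID version) { this.version = version; }

        void markReceived() { received = true; }
        boolean wasReceived() { return received; }
    }

    final AtomicReference<UUID> schemaVersion = new AtomicReference<>();
    final Map<UUID, VersionInfo> versionInfo = new HashMap<>();

    /** Records that the endpoint advertises the version; queues it for a pull only if still needed. */
    VersionInfo reportEndpointVersion(String endpoint, UUID version)
    {
        VersionInfo info = versionInfo.computeIfAbsent(version, VersionInfo::new);
        info.endpoints.add(endpoint);
        if (Objects.equals(schemaVersion.get(), version))
            info.markReceived();                  // equals the local schema: nothing to fetch
        else
            info.requestQueue.addFirst(endpoint); // candidate source for a later schema pull
        return info;
    }

    public static void main(String[] args)
    {
        SimplifiedCoordinator c = new SimplifiedCoordinator();
        UUID local = UUID.randomUUID();
        c.schemaVersion.set(local);
        VersionInfo same = c.reportEndpointVersion("127.0.0.2", local);
        VersionInfo other = c.reportEndpointVersion("127.0.0.3", UUID.randomUUID());
        System.out.println(same.wasReceived() + " " + same.requestQueue);   // true []
        System.out.println(other.wasReceived() + " " + other.requestQueue); // false [127.0.0.3]
    }
}
```

With this shape, an endpoint that advertises the local schema version never enters the request queue, so there is nothing to discard on the next maybePullSchema pass.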
nit: endpoints
Branch updated from 3987d38 to 6761ff6
ekaterinadimitrova2 left a comment
Left some small comments. I want to do one more pass and check MigrationCoordinatorTest further, as I only skimmed through it; I want to be sure I didn't miss anything. I also want to look further into 4.0. This is not an area of the code where I have a lot of experience.
I think we need to make it print SCHEMA_PULL_INTERVAL, as it is mutable and no longer necessarily 1 minute.
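A sketch of this suggestion, assuming the interval lives in a mutable field: format the current value into the message instead of hard-coding "1 minute". The field schemaPullIntervalMs and the message wording are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: schemaPullIntervalMs and the message text are illustrative,
// standing in for the mutable SCHEMA_PULL_INTERVAL discussed in the review.
class SchemaPullIntervalMessage
{
    // Mutable interval (assumption: may be changed at runtime, e.g. by tests).
    static volatile long schemaPullIntervalMs = TimeUnit.MINUTES.toMillis(1);

    static String nextPullMessage(String endpoint)
    {
        // Read the current value each time so the message never goes stale.
        return String.format("Will retry schema pull from %s in %d ms", endpoint, schemaPullIntervalMs);
    }

    public static void main(String[] args)
    {
        System.out.println(nextPullMessage("127.0.0.2")); // Will retry schema pull from 127.0.0.2 in 60000 ms
        schemaPullIntervalMs = 5000;
        System.out.println(nextPullMessage("127.0.0.2")); // Will retry schema pull from 127.0.0.2 in 5000 ms
    }
}
```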
immediately
This one too.
Not that I am super opinionated, but it feels like it would improve readability.
I think this can be split into a few lines, as it is getting too long.
I will, but with the current formatting rules I can see no way to format it nicely
This line can also be split into a few lines so we don't have to scroll right.
Again, as far as I know, the formatting rules do not allow formatting it nicely.
This line also seems too long
Unused imports
Unused import
I'd add a brief JavaDoc line saying what the property does.
Probably we should include the unit in the name, like in SCHEMA_PULL_INTERVAL_MS.
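Combining the two suggestions above (a brief Javadoc line plus the unit in the name) might look like the following sketch. The system property example.schema_reset_timeout_ms and the method resetTimeoutMs are hypothetical placeholders, not real Cassandra settings.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: the property name and method are placeholders that only
// illustrate documenting a property and putting its unit in the name.
class SchemaProperties
{
    static final String RESET_TIMEOUT_MS_PROPERTY = "example.schema_reset_timeout_ms";

    /**
     * Maximum time, in milliseconds, to wait for a schema reset to complete.
     * Read from the {@code example.schema_reset_timeout_ms} system property; defaults to 60 seconds.
     */
    static long resetTimeoutMs()
    {
        return Long.getLong(RESET_TIMEOUT_MS_PROPERTY, TimeUnit.SECONDS.toMillis(60));
    }

    public static void main(String[] args)
    {
        System.out.println(resetTimeoutMs()); // 60000 unless the property is set
        System.setProperty(RESET_TIMEOUT_MS_PROPERTY, "5000");
        System.out.println(resetTimeoutMs()); // 5000
        System.clearProperty(RESET_TIMEOUT_MS_PROPERTY);
    }
}
```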
Suggested change:
- } catch (RuntimeException ex) {
+ }
+ catch (RuntimeException ex)
+ {
Nit: unused import
Suggested change:
- * As long as the certain version is advertised by some node, it is being tracked. As long as a version is tracked,
+ * As long as a certain version is advertised by some node, it is being tracked. As long as a version is tracked,
Maybe it should be waitQueueSize?
Suggested change:
- ", waitQueue=" + waitQueue.getWaiting() +
+ ", waitQueueSize=" + waitQueue.getWaiting() +
Suggested change:
- if (!clearCompletion.await(StorageService.SCHEMA_DELAY_MILLIS, TimeUnit.MILLISECONDS)) {
+ if (!clearCompletion.await(StorageService.SCHEMA_DELAY_MILLIS, TimeUnit.MILLISECONDS))
+ {
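The suggested brace style in the context of a timed wait, as a sketch: a java.util.concurrent.CountDownLatch stands in for clearCompletion, and SCHEMA_DELAY_MILLIS is a local constant with an assumed value rather than the one in StorageService. Throwing TimeoutException on expiry is likewise an illustrative choice.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: CountDownLatch replaces clearCompletion and the delay value is assumed.
class AwaitWithTimeout
{
    static final long SCHEMA_DELAY_MILLIS = 200; // assumed value for the example

    static void awaitCompletion(CountDownLatch completion) throws InterruptedException, TimeoutException
    {
        // await(long, TimeUnit) returns false if the timeout elapses before the count reaches zero.
        if (!completion.await(SCHEMA_DELAY_MILLIS, TimeUnit.MILLISECONDS))
        {
            throw new TimeoutException("Timed out after " + SCHEMA_DELAY_MILLIS + " ms waiting for completion");
        }
    }

    public static void main(String[] args) throws Exception
    {
        CountDownLatch done = new CountDownLatch(1);
        done.countDown();          // completion already signalled
        awaitCompletion(done);     // returns immediately
        System.out.println("completed");

        try
        {
            awaitCompletion(new CountDownLatch(1)); // never counted down
        }
        catch (TimeoutException e)
        {
            System.out.println("timed out");
        }
    }
}
```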
I think we should either add some description to the @return tag or remove it.
OK, and by the way, I suppose more adequate names for those methods would now be reset -> sync and clear -> refresh.
test/distributed/org/apache/cassandra/distributed/test/SchemaTest.java
Outdated
Thank you for the thorough review, I've applied your suggestions.
Branch updated from 061eb71 to 214ded9
typo: syncs
I would probably use names like tbl_one here.
I would probably change this name to checkTablesPropagated or something
a definition
Patch by Jacek Lewandowski, reviewed by Andrés de la Peña and Ekaterina Dimitrova for CASSANDRA-17819
Branch updated from a8842d6 to 228bf5b
Merged to cassandra-4.1 in d8bbeb9
No description provided.