[CASSANDRA-15355] Schema push/pull race on continuous schema changes #364
In https://issues.apache.org/jira/browse/CASSANDRA-5025, pull-based schema updates were scheduled to run 1 minute after the schema change was first visible, so as to prefer the push codepath as much as possible.
Unfortunately, this does not handle the case where many schema changes happen in quick succession. Imagine a scenario where we create a table every 5 seconds for 2 minutes: the first pull tasks execute 60 seconds in, while changes are still being made, so the schemas may well be out of sync between nodes at that point.
In this case, there is some chance that when a task runs, the schemas are out of sync only because a subsequent schema update has occurred, and so the same push/pull race happens again.
A fix is to modify the codepath so that a scheduled task runs only if the other node's schema version is still the same as it was when the task was scheduled. Otherwise, a different (later-scheduled) task should run instead.
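As a rough illustration of the guard described above, here is a minimal, self-contained sketch. It is not the code in this patch or in Cassandra's migration path: the class `GuardedSchemaPull`, its methods, the `String` endpoint key, and the `pullSchemaFrom` callback are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch of a version-guarded delayed pull; all names are assumptions.
final class GuardedSchemaPull {
    // Most recent schema version announced by each peer (endpoint -> version).
    private final Map<String, UUID> latestVersion = new ConcurrentHashMap<>();
    private final ScheduledExecutorService executor;
    private final Consumer<String> pullSchemaFrom; // stands in for the real pull request
    private final long delayMillis;

    GuardedSchemaPull(ScheduledExecutorService executor,
                      Consumer<String> pullSchemaFrom,
                      long delayMillis) {
        this.executor = executor;
        this.pullSchemaFrom = pullSchemaFrom;
        this.delayMillis = delayMillis;
    }

    // Called whenever a peer announces a schema version: remember it and
    // schedule a delayed pull that is tied to exactly this version.
    ScheduledFuture<?> onVersionSeen(String endpoint, UUID version) {
        latestVersion.put(endpoint, version);
        return executor.schedule(guardedTask(endpoint, version),
                                 delayMillis, TimeUnit.MILLISECONDS);
    }

    // The task pulls only if the peer's version is unchanged since scheduling.
    // If it changed, a later-scheduled task owns the pull and this one does nothing.
    Runnable guardedTask(String endpoint, UUID versionAtSchedule) {
        return () -> {
            UUID current = latestVersion.get(endpoint);
            if (!versionAtSchedule.equals(current)) {
                return; // stale task: a newer schema version has been seen since
            }
            pullSchemaFrom.accept(endpoint);
        };
    }
}
```

With this shape, a burst of schema changes leaves at most one task per peer that actually pulls: the one scheduled for the most recently seen version. A real implementation would presumably also skip the pull when the local schema version already matches the peer's; that check is left out here to keep the sketch short.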
What we see in practice is that, when there is a reasonably large number of changes, a few schema changes can have the unfortunate outcome of causing our nodes to run out of memory and crash. This change stops that.