[CASSANDRA-15355] Schema push/pull race on continuous schema changes #364

Open: 1 commit to be merged into trunk

j-baker commented Oct 14, 2019

In https://issues.apache.org/jira/browse/CASSANDRA-5025, pull-based schema updates were scheduled to run 1 minute after a schema change first became visible, so as to prefer the push code path as much as possible.

Unfortunately, this does not handle the case where many schema changes happen in quick succession. Imagine a scenario where we create a table every 5 seconds for 2 minutes: the first pull tasks execute 60 seconds in, while changes are still arriving, so the schemas may well be out of sync between nodes at that point.

In this case, there is some chance that when the task runs, the schemas are out of sync only because a subsequent schema update has occurred, and so the same push/pull race happens again.

A fix is to modify the code path so that the scheduled task only runs if the other node's schema version is still the same as when the task was scheduled. Otherwise, a different (later-scheduled) task should handle the pull.
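
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `GuardedSchemaPull`, the method `onVersionObserved`, and the `pulls` list are hypothetical names, and the "pull" is simulated by recording a string rather than contacting a peer. Each task captures the schema version that was visible when it was created and does nothing if the peer has since advertised a different one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these names do not mirror Cassandra's actual classes.
class GuardedSchemaPull {
    // Latest schema version each peer has advertised (assumed to arrive via gossip).
    private final Map<String, UUID> latestVersionByPeer = new ConcurrentHashMap<>();
    // Simulated pulls, recorded as "peer@version" so the behaviour can be inspected.
    private final List<String> pulls = new ArrayList<>();

    /**
     * Records the version a peer just advertised and returns the task that
     * would be scheduled to run after the delay (1 minute in the description).
     */
    Runnable onVersionObserved(String peer, UUID versionAtScheduling) {
        latestVersionByPeer.put(peer, versionAtScheduling);
        return () -> {
            UUID current = latestVersionByPeer.get(peer);
            if (!versionAtScheduling.equals(current)) {
                // The peer has moved on since this task was created, so this
                // task is stale; the task created for the newer version will
                // decide whether a pull is needed.
                return;
            }
            synchronized (pulls) {
                pulls.add(peer + "@" + versionAtScheduling);
            }
        };
    }

    /** Returns a snapshot of the simulated pulls performed so far. */
    List<String> pulls() {
        synchronized (pulls) {
            return new ArrayList<>(pulls);
        }
    }
}
```

In a real node the returned `Runnable` would presumably be handed to something like `ScheduledExecutorService.schedule(task, 1, TimeUnit.MINUTES)`. In the "table every 5 seconds" scenario, every task except the one created for the most recently advertised version would then return without pulling.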

For us, what we see is that when there is a reasonably large number of changes, a few schema changes can cause our nodes to run out of memory and crash. This change stops that.

blambov pushed a commit to blambov/cassandra that referenced this pull request Mar 21, 2022
Port ReadCoordinationMetrics to add metrics that count when the coordinator is either not a replica or a preferred replica for a read request.
blambov pushed a commit to blambov/cassandra that referenced this pull request Jun 13, 2022
Port ReadCoordinationMetrics to add metrics that count when the coordinator is either not a replica or a preferred replica for a read request.

(cherry picked from commit 3967db9)
blambov pushed a commit to blambov/cassandra that referenced this pull request Nov 24, 2022
Port ReadCoordinationMetrics to add metrics that count when the coordinator is either not a replica or a preferred replica for a read request.

(cherry picked from commit 3967db9)
(cherry picked from commit 9532979)
adelapena pushed a commit to adelapena/cassandra that referenced this pull request Sep 26, 2023
Port ReadCoordinationMetrics to add metrics that count when the coordinator is either not a replica or a preferred replica for a read request.

(cherry picked from commit 3967db9)
(cherry picked from commit 9532979)
(cherry picked from commit cd56ad2)
(cherry picked from commit 926c403)