SOLR-15399: IndexFetcher should not issue a local commit for PULL replicas #133
Thanks for looking into this, Tim. LGTM. I'm guessing the side effect is that the PULL replica won't really clean up its index (i.e. disk will still contain all index files) until a real, non-gen-zero replication happens from the leader, right? That index won't be searchable though. Even if there is a restart, since the PULL replica will go through the recovery process, the index on disk won't be searchable again IIUC. This should also solve SOLR-10751 and SOLR-12100.
Thanks for the review @tflobbe ... let me add a test case for the restart while the PULL replica is in this side-effect state, just so we can be sure it's ok and recovers properly.
solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java
… to verify recovery
Impressive work to track this down!
Description
When doing the work for ae1ac22, I discovered that the TestPullReplica class was disabled via an AwaitsFix. I enabled it and it passed fairly consistently, but then saw flaky failures on Jenkins. This PR fixes those flaky tests ...
IndexFetcher was triggering a local commit for a PULL replica core (see snippet below), which caused a race condition where the PULL replica would think its index generation / version were the same as the leader's (see the log I posted in JIRA).
From IndexFetcher, line 479:
(hitting the else block is what causes the tests to fail intermittently)
This would cause the PULL replica to not sync index updates with the leader in the tests (since they index one doc at a time).
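As a rough illustration of the failure mode described above, here is a toy model in Java. It is not the IndexFetcher code and uses no Solr or Lucene APIs: `ToyIndex`, `onLeaderVersionZero`, `looksInSync`, and `replicaMissesUpdate` are hypothetical names, and the "in sync" check is assumed, purely for the sake of the sketch, to be a plain comparison of generation counters. Only the flag name `skipCommitOnLeaderVersionZero` comes from this PR.

```java
public class PullReplicaCommitSketch {

  /** Toy commit point: just a generation counter (hypothetical, not a Lucene/Solr type). */
  static final class ToyIndex {
    long generation;

    ToyIndex(long generation) {
      this.generation = generation;
    }

    void commit() {
      generation++;
    }
  }

  /**
   * Toy version of the "leader reports version zero" branch: the replica either
   * commits locally or leaves its commit point alone, depending on the flag.
   */
  static void onLeaderVersionZero(ToyIndex replica, boolean skipCommitOnLeaderVersionZero) {
    if (!skipCommitOnLeaderVersionZero) {
      // Local commit: the replica's generation advances independently of the leader.
      replica.commit();
    }
    // When the commit is skipped, the generation only changes via a real fetch.
  }

  /** Assumed, simplified sync check: equal generations mean "nothing to fetch". */
  static boolean looksInSync(ToyIndex replica, ToyIndex leader) {
    return replica.generation == leader.generation;
  }

  /** Returns true if the replica would wrongly skip fetching the leader's new commit. */
  public static boolean replicaMissesUpdate(boolean skipCommitOnLeaderVersionZero) {
    ToyIndex leader = new ToyIndex(1);
    ToyIndex replica = new ToyIndex(1);
    onLeaderVersionZero(replica, skipCommitOnLeaderVersionZero);
    leader.commit(); // the leader indexes one document and commits
    return looksInSync(replica, leader);
  }

  public static void main(String[] args) {
    System.out.println("local commit -> update missed: " + replicaMissesUpdate(false));
    System.out.println("skip commit  -> update missed: " + replicaMissesUpdate(true));
  }
}
```

In this model the local commit moves the replica's generation forward by itself, so after the leader's next single-document commit the two counters happen to match and the replica sees nothing to fetch. With the commit skipped, the counters differ and the replica would fetch from the leader. The real check in IndexFetcher and its timing are more involved than this.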
Solution
I don't think PULL replicas should ever commit locally; they should always get the index from the leader. So this PR ensures that skipCommitOnLeaderVersionZero is true for PULL replicas, which avoids this race condition.
Tests
Removing @BadApple from several flaky tests that pass consistently now: 200/200 beasts (tests would fail within about 40 beast runs w/o this fix).
Checklist
Please review the following and check all that apply:
main branch
./gradlew check