
Fix cluster index and process cleanup #5986

Open
nickva wants to merge 1 commit into main from fix-clustered-index-cleanup



@nickva
Contributor

@nickva nickva commented Apr 28, 2026

Previously, as described in #5980, we didn't perform a thorough index cleanup when ddocs changed. We only cleaned up on the nodes where the design docs were located. That covers every node for an n=3 db in a 3-node cluster, but not in general: in a larger cluster, shard copies of the db (and their index files) also live on nodes that don't hold a copy of the design doc.
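To make the gap concrete, here is a small Python sketch (an illustration, not the Erlang change itself) using made-up shard maps. It assumes a hypothetical `shard_map` from hash range to the nodes holding a copy of that range. The design doc hashes into exactly one range, so only that range's nodes saw the old cleanup, while every node hosting any shard copy may hold index files.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shard map: hash range -> nodes holding a copy of that range.
ShardMap = Dict[Tuple[int, int], List[str]]


def nodes_with_ddoc(shard_map: ShardMap, ddoc_range: Tuple[int, int]) -> Set[str]:
    """Nodes that hold the shard range the design doc hashes into."""
    return set(shard_map[ddoc_range])


def nodes_with_index_files(shard_map: ShardMap) -> Set[str]:
    """Every node with any shard copy may have index files for the db."""
    return {node for nodes in shard_map.values() for node in nodes}


def missed_by_old_cleanup(shard_map: ShardMap, ddoc_range: Tuple[int, int]) -> Set[str]:
    """Nodes that were never asked to clean up under the old behavior."""
    return nodes_with_index_files(shard_map) - nodes_with_ddoc(shard_map, ddoc_range)


# q=2, n=3 on a 3-node cluster: every node has every range, nothing is missed.
small = {(0x00000000, 0x7FFFFFFF): ["n1", "n2", "n3"],
         (0x80000000, 0xFFFFFFFF): ["n1", "n2", "n3"]}
# q=2, n=3 on a 6-node cluster: the second range lives on other nodes.
large = {(0x00000000, 0x7FFFFFFF): ["n1", "n2", "n3"],
         (0x80000000, 0xFFFFFFFF): ["n4", "n5", "n6"]}

print(missed_by_old_cleanup(small, (0x00000000, 0x7FFFFFFF)))          # set()
print(sorted(missed_by_old_cleanup(large, (0x00000000, 0x7FFFFFFF))))  # ['n4', 'n5', 'n6']
```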

To fix the issue, run a small gen_server responsible for performing cluster index cleanup. To avoid spawning Q*N jobs, deduplicate the requests by delaying for up to 30 seconds per clustered db. For the cleanup itself, reuse the already existing fabric index file cleanup machinery. That accomplishes two things:

  • Starts index file cleanup sooner. Previously we only did this during smoosh compaction runs, so view files could linger until smoosh triggered compaction.

  • Cleaning up search index files also stops the indexes on the (Java) search side, so index file cleanup does "double duty", so to speak, when it comes to shutting down search indexes.
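The deduplication step can be pictured with a minimal Python sketch (a stand-in for the gen_server, not its actual code). `CleanupDebouncer`, `request`, and `due` are hypothetical names. The first cleanup request for a db schedules a run `delay` seconds out, and further requests for that db in the meantime are absorbed, so one trigger per shard copy collapses into a single cleanup per clustered db.

```python
from typing import Dict, List


class CleanupDebouncer:
    """Coalesce cleanup requests for the same db into one scheduled run."""

    def __init__(self, delay: float = 30.0) -> None:
        self.delay = delay
        self.pending: Dict[str, float] = {}  # db name -> scheduled run time

    def request(self, db: str, now: float) -> None:
        # Only the first request in a window sets the deadline; later ones
        # for the same db are absorbed into the already scheduled run.
        self.pending.setdefault(db, now + self.delay)

    def due(self, now: float) -> List[str]:
        # Return (and forget) every db whose deadline has been reached.
        ready = sorted(db for db, at in self.pending.items() if at <= now)
        for db in ready:
            del self.pending[db]
        return ready


d = CleanupDebouncer(delay=30.0)
# e.g. q=8, n=3: 24 shard copies each asking for cleanup within a few seconds
for i in range(24):
    d.request("mydb", now=i * 0.1)
print(d.due(now=29.0))  # []
print(d.due(now=30.0))  # ['mydb']
print(d.due(now=31.0))  # []
```

Keeping the first deadline, rather than resetting it on every request, bounds the wait at `delay`, which matches the "up to 30 seconds" description; a reset-on-each-request debounce could keep postponing cleanup under a steady stream of ddoc updates.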

Fix #5980

@willholley
Member

Does this change mean that users who inadvertently trigger a full index rebuild by modifying a design document will have a much smaller window during which to "undo" the change (revert to the previous ddoc content and pick up those index files on disk again)? We've definitely used the delay in index cleanup to recover from that kind of situation, particularly when an index takes hours/days to rebuild.

@nickva
Contributor Author

nickva commented Apr 28, 2026

@willholley yeah, this would shorten that window. However, when compaction runs is unpredictable in general, so we shouldn't rely on that behavior. Previously, a design doc update could "kick" smoosh, which could start an immediate cleanup and delete the files. Also, the configurable file deletion options still apply (both before and with this change), so if those are set up, the view files may still be recoverable with some file moves and copies.

@willholley
Member

@nickva yep - not a blocker from me, just an operational change we'll need to be aware of.

@janl
Member

janl commented Apr 28, 2026

revert to the previous ddoc content and pick up those index files on disk again

it feels like we should make this an explicit feature rather than leave it as an incidental property of the implementation. It might be too big a scope for this PR, but I think we should consider it going forward.

Something along the lines of renaming the file to include the deletion timestamp in the filename so we can purge it later (or relying on last-modified times, if they are reliable), and scooping these files up before starting a fresh index build.
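A rough Python sketch of that idea, purely hypothetical: the `.deleted.<unix seconds>` suffix, the `retention` window, and the helper names are made up for illustration and are not an existing CouchDB scheme. Instead of deleting an index file outright, it is renamed with its deletion time, and a later sweep purges renamed files older than the retention window.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical suffix: "<index file>.deleted.<unix seconds>"
DELETED_RE = re.compile(r"^(?P<orig>.+)\.deleted\.(?P<ts>\d+)$")


def deleted_name(filename: str, deleted_at: int) -> str:
    """Name to rename an index file to instead of deleting it."""
    return f"{filename}.deleted.{deleted_at}"


def parse_deleted(name: str) -> Optional[Tuple[str, int]]:
    """Return (original name, deletion time), or None for live files."""
    m = DELETED_RE.match(name)
    if m is None:
        return None
    return m.group("orig"), int(m.group("ts"))


def files_to_purge(names: List[str], now: int, retention: int) -> List[str]:
    """Renamed files whose deletion time is at least `retention` seconds old."""
    purge = []
    for name in names:
        parsed = parse_deleted(name)
        if parsed is not None and now - parsed[1] >= retention:
            purge.append(name)
    return purge


names = [deleted_name("abc123.view", 1000), deleted_name("def456.view", 5000), "live.view"]
print(names[0])                                         # abc123.view.deleted.1000
print(files_to_purge(names, now=6000, retention=3600))  # ['abc123.view.deleted.1000']
```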

Comment thread src/couch_index/src/couch_index_cleanup.erl
@nickva
Contributor Author

nickva commented Apr 28, 2026


Oh, that could be interesting. When we open an index, we could look through the deleted view files to see if we have something recent (by time and/or update seq) that we can re-create it from.
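Assuming the same kind of timestamp-suffixed naming proposed earlier in the thread (a hypothetical `.deleted.<unix seconds>` suffix, not an existing CouchDB feature), the lookup could be sketched in Python as below. It also assumes the index file name is derived from the index signature, so reverting a ddoc maps back to the same name. `find_recoverable` and `max_age` are illustrative names; it picks the newest renamed copy deleted within `max_age` seconds. Checking the update seq would require reading the file header and is left out.

```python
import re
from typing import List, Optional

# Hypothetical suffix: "<index file>.deleted.<unix seconds>"
DELETED_RE = re.compile(r"^(?P<orig>.+)\.deleted\.(?P<ts>\d+)$")


def find_recoverable(index_file: str, names: List[str], now: int, max_age: int) -> Optional[str]:
    """Newest renamed copy of `index_file` deleted at most `max_age` seconds ago."""
    best: Optional[str] = None
    best_ts = -1
    for name in names:
        m = DELETED_RE.match(name)
        if m is None or m.group("orig") != index_file:
            continue  # live file, or a deleted copy of some other index
        ts = int(m.group("ts"))
        if now - ts <= max_age and ts > best_ts:
            best, best_ts = name, ts
    return best


names = ["abc123.view.deleted.1000", "abc123.view.deleted.5000", "other.view.deleted.5500"]
print(find_recoverable("abc123.view", names, now=6000, max_age=3600))  # abc123.view.deleted.5000
print(find_recoverable("abc123.view", names, now=9000, max_age=3600))  # None
```

If nothing qualifies, the caller would fall back to a fresh index build, as today.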

@nickva nickva force-pushed the fix-clustered-index-cleanup branch from c670a11 to 5768a2b on April 29, 2026 21:23
Previously, as described in #5980, we didn't perform a thorough index cleanup
when ddocs changed. We only cleaned up on nodes where the design docs were
located. That was true for an n=3 db in a 3-node cluster, but may not be true
in general in a cluster.

To fix the issue, run a small gen_server responsible for performing cluster
index cleanup. To avoid spawning Q*N jobs, deduplicate the requests by delaying
for up to 30 seconds per clustered db. For cleanup, reuse and call the already
existing fabric index file cleanup machinery. That accomplishes two things:

 - Starts a quicker index file cleanup. Previously we only did this during
   smoosh compaction runs. The view files could linger for a while until
   compaction in smoosh would be triggered.

 - Cleaning search index files also stops indexes on their (Java) side, so
   index file clean-up does "double duty", so to speak, when it comes to index
   shutdown.

Fix #5980
@nickva nickva force-pushed the fix-clustered-index-cleanup branch from 5768a2b to 578d1b5 on April 30, 2026 19:11
@nickva nickva requested a review from rnewson May 1, 2026 03:03


Development

Successfully merging this pull request may close these issues.

[Bug]: Index processes are not shutdown on every node when ddoc changes

3 participants