
Incremental snapshot not working correctly after running forcemerge. #102395

Open
merajblueshift opened this issue Nov 21, 2023 · 2 comments
Labels
>bug :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Meta label for distributed team

merajblueshift commented Nov 21, 2023

Elasticsearch Version

Version: 8.7.1, Build: rpm/f229ed3f893a515d590d0f39b05f68913e2d9b53/2023-04-27T04:33:42.127815583Z, JVM: 20.0.1

Installed Plugins

discovery-ec2

Java Version

bundled

OS Version

Linux 6.1.59-84.139.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Oct 24 20:57:25 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

We had an index containing a large number of deleted documents (docs.deleted). To clean it up, the index was restored onto a cluster where no reads or writes were happening on it. We then ran a _forcemerge with only_expunge_deletes=true on the index to optimize it. Before the force merge started, SLM took one snapshot of the index to S3.
The force merge optimized the index and reduced the number of segments from 5k to 1.5k. However, when we took a fresh snapshot of the index and restored it, the restored index did not reflect the post-force-merge state; it was still in the state from before the force merge. On checking the snapshot details we also observed that the incremental snapshot taken after the force merge completed in 800 ms, despite the fact that all underlying segment files had changed.
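
For reference, the snapshot details and segment counts can be inspected with something like the following (repository, snapshot, and index names here are placeholders):

# Per-shard incremental file counts and sizes for the post-force-merge snapshot
curl -X GET "localhost:9200/_snapshot/my_repo/snap-after-forcemerge/_status?pretty"

# Segment count per shard, and docs.count / docs.deleted on the index
curl -X GET "localhost:9200/_cat/segments/my-index?v"
curl -X GET "localhost:9200/_cat/indices/my-index?v&h=index,docs.count,docs.deleted"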

Steps to Reproduce

  1. Create an index, then add and delete documents on it until the docs.deleted count reaches a very high number.
  2. Stop all reads and writes to the index.
  3. Change the index.merge.policy.expunge_deletes_allowed setting to 1 (steps 3–8 are sketched as curl calls after this list).
  4. Take a snapshot of the index to S3.
  5. Run a _forcemerge operation with only_expunge_deletes set to true.
  6. The docs.deleted count and the number of segments on the index should drop after the _forcemerge.
  7. Take a new snapshot of the index.
  8. Restore the index from the snapshot taken in step 7.
  9. The restored index still has the state of the index from before the _forcemerge operation.
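
For convenience, here is a rough curl sketch of steps 3–8, using placeholder index, repository, and snapshot names:

# 3. Lower the expunge-deletes threshold so the merge actually rewrites segments
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d '{"index.merge.policy.expunge_deletes_allowed": 1}'

# 4. First snapshot (assumes a repository named my_repo is already registered)
curl -X PUT "localhost:9200/_snapshot/my_repo/snap-1?wait_for_completion=true"

# 5. Force-merge, expunging deletes only
curl -X POST "localhost:9200/my-index/_forcemerge?only_expunge_deletes=true"

# 7. Second snapshot, taken after the force merge
curl -X PUT "localhost:9200/_snapshot/my_repo/snap-2?wait_for_completion=true"

# 8. Restore the index from the second snapshot under a new name
curl -X POST "localhost:9200/_snapshot/my_repo/snap-2/_restore" -H 'Content-Type: application/json' -d '{"indices": "my-index", "rename_pattern": "(.+)", "rename_replacement": "restored-$1"}'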

Logs (if relevant)

No response

merajblueshift added the >bug and needs:triage (Requires assignment of a team area label) labels on Nov 21, 2023
DaveCTurner (Contributor) commented Nov 21, 2023

The reason for this behaviour is that we skip shard snapshots where the shard contents have not changed meaningfully since the last snapshot, where "meaningfully" is defined here:

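// Shard-state id used for snapshot no-op detection: history UUID + force-merge UUID + max seq no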
return userCommitData.get(Engine.HISTORY_UUID_KEY)
    + "-"
    + userCommitData.getOrDefault(Engine.FORCE_MERGE_UUID_KEY, "na")
    + "-"
    + maxSeqNo;

Note that this includes some consideration of force-merges, but that doesn't include the ?only_expunge_deletes case:

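// Note: forceMergeUUID is recorded only in the final branch; the only_expunge_deletes path never sets it.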
if (onlyExpungeDeletes) {
    indexWriter.forceMergeDeletes(true /* blocks and waits for merges */);
} else if (maxNumSegments <= 0) {
    indexWriter.maybeMerge();
} else {
    indexWriter.forceMerge(maxNumSegments, true /* blocks and waits for merges */);
    this.forceMergeUUID = forceMergeUUID;
}

As a workaround I suspect that another force-merge with ?max_num_segments=10000 would set the force-merge ID without actually merging anything, bypassing the no-op detection on the snapshot. But in the long run I do agree that we should probably consider an ?only_expunge_deletes force-merge as a meaningful change here too.
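
As a concrete sketch of that suggested workaround (index name is a placeholder, and whether it really refreshes the force-merge UUID without rewriting any segments would need verifying):

curl -X POST "localhost:9200/my-index/_forcemerge?max_num_segments=10000"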

DaveCTurner reopened this on Nov 21, 2023
DaveCTurner added the :Distributed/Engine label and removed the needs:triage label on Nov 21, 2023
elasticsearchmachine added the Team:Distributed label on Nov 21, 2023
elasticsearchmachine (Collaborator) commented

Pinging @elastic/es-distributed (Team:Distributed)
