@rahulgoswami
Contributor

https://issues.apache.org/jira/browse/SOLR-17725

Description

Please provide a short description of the changes you're making with this pull request.

Solution

Please provide a short description of the approach taken to implement your solution.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide
  • I have added a changelog entry for my change

@rahulgoswami rahulgoswami marked this pull request as draft November 20, 2025 08:42
@kotman12
Contributor

Thank you for sharing this! 🥳

Contributor

@dsmiley dsmiley left a comment

If no merge is found, perhaps it should seek a possible merge of the previous major version? Or just leave that as a future improvement possibility, I suppose.

I could see this being contributed to Lucene.

* Only allows latest version segments to be considered for merges. That way a snapshot of older
* segments can remain consistent
*/
public class LatestVersionFilterMergePolicy extends MergePolicy {
Contributor

Have you seen FilterMergePolicy for delegation?

Contributor Author

Great suggestion. Much better for delegation.

import org.apache.lucene.util.Version;

/**
* Only allows latest version segments to be considered for merges. That way a snapshot of older
Contributor

I suggest this wording instead:

Prevents merging segments with differing Lucene major version number origins. This assists in upgrading to a future Lucene major version.

@Override
public MergeSpecification findMerges(
MergeTrigger mergeTrigger, SegmentInfos infos, MergeContext mergeContext) throws IOException {
/*we don't want to remove from the original SegmentInfos, else the segments may not carry forward upon a commit.
Contributor

I think you could merely say that input SegmentInfos should be treated as immutable, and that modifying it is dangerous (ideally would be impossible). An FYI. Your text had me thinking for a moment that the concern was specific to this MP but it's a general statement across Lucene, I see. When I see other usages in Lucene, I don't see dire warnings.

Contributor Author

Rephrased to not sound alarming
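
The point about treating the input as immutable can be sketched in plain Java. This is a simplified model, not Lucene's API: `Segment`, `LATEST_MAJOR`, and `eligibleSegments` are hypothetical stand-ins, and comparing a single `major` field is an assumption about which version the predicate inspects. Only the `allowSegmentForMerge` name comes from this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class FilteredViewSketch {
  /** Hypothetical stand-in for a segment; only the major version matters here. */
  static final class Segment {
    final String name;
    final int major;

    Segment(String name, int major) {
      this.name = name;
      this.major = major;
    }
  }

  /** Assumed "latest" major version, chosen only for this example. */
  static final int LATEST_MAJOR = 10;

  /** Predicate deciding whether a segment may take part in a merge. */
  static boolean allowSegmentForMerge(Segment segment) {
    return segment.major == LATEST_MAJOR;
  }

  /** Builds a new list of eligible segments; the caller's list is never modified. */
  static List<Segment> eligibleSegments(List<Segment> input) {
    List<Segment> eligible = new ArrayList<>();
    for (Segment segment : input) {
      if (allowSegmentForMerge(segment)) {
        eligible.add(segment);
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    // List.of returns an unmodifiable list, so any attempt to mutate the input would throw.
    List<Segment> input = List.of(new Segment("_a", 9), new Segment("_b", 10));
    List<Segment> eligible = eligibleSegments(input);
    System.out.println(eligible.size() + " of " + input.size() + " segments eligible");
  }
}
```

Passing an unmodifiable list in `main` is a cheap way to confirm that the filter never writes to its input.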

Map<SegmentCommitInfo, Boolean> segmentsToMerge,
MergeContext mergeContext)
throws IOException {
return delegatePolicy.findForcedMerges(
Contributor

The version constraint isn't applied here, but shouldn't it be?

@Override
public MergeSpecification findForcedDeletesMerges(
SegmentInfos segmentInfos, MergeContext mergeContext) throws IOException {
return delegatePolicy.findForcedDeletesMerges(segmentInfos, mergeContext);
Contributor

The version constraint isn't applied here, but shouldn't it be?

@rahulgoswami
Contributor Author

Appreciate the review @dsmiley ! Still working through some of the details of this draft and will revise soon. Thanks.

@rahulgoswami
Contributor Author

rahulgoswami commented Nov 26, 2025

> If no merge is found, perhaps it should seek a possible merge of the previous major version? Or just leave that as a future improvement possibility, I suppose.

Including the previous major version could allow an older-version segment to merge with a current-version segment. Then, when we upgrade to the next Lucene version, the index may not open: Lucene now checks SegmentCommitInfo.minVersion (the oldest version among the segments that merged to form a given segment), and that value may fall below the N-1 version compatibility window.
That said, I have kept the allowSegmentForMerge() method 'protected' so that a subclass can supply a different predicate.
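
To make the compatibility concern concrete, here is a minimal sketch in plain Java rather than Lucene's API. `mergedMinMajor` and `opensUnder` are hypothetical names, the version numbers are illustrative, and the N-1 rule is modeled as described in this comment, not copied from Lucene's actual check.

```java
import java.util.List;

public class MinVersionSketch {
  /** A merged segment inherits the oldest minimum major version among its inputs. */
  static int mergedMinMajor(List<Integer> inputMinMajors) {
    if (inputMinMajors.isEmpty()) {
      throw new IllegalArgumentException("a merge needs at least one input segment");
    }
    int min = Integer.MAX_VALUE;
    for (int major : inputMinMajors) {
      min = Math.min(min, major);
    }
    return min;
  }

  /** Assumed N-1 rule: a reader at readerMajor accepts a minimum major of readerMajor - 1 or newer. */
  static boolean opensUnder(int readerMajor, int segmentMinMajor) {
    return segmentMinMajor >= readerMajor - 1;
  }

  public static void main(String[] args) {
    // Mixing a 9.x-origin segment with a 10.x segment yields a merged minimum of 9.
    int mixed = mergedMinMajor(List.of(9, 10));
    // Merging only 10.x segments keeps the minimum at 10.
    int pure = mergedMinMajor(List.of(10, 10));
    System.out.println("mixed merge opens under 11: " + opensUnder(11, mixed));
    System.out.println("pure merge opens under 11: " + opensUnder(11, pure));
  }
}
```

Under these assumptions, the mixed merge produces a segment that a major-11 reader would reject, while the same-version merge stays within the window.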

> I could see this being contributed to Lucene.

Makes sense. I'll try asking on the dev list.

Contributor

@dsmiley dsmiley left a comment

When I said this:

> If no merge is found, perhaps it should seek a possible merge of the previous major version? Or just leave that as a future improvement possibility, I suppose.

I wasn't clear, based on your response. I mean, if there is no current version merge, then perhaps we could look to merge segments of the previous version. I definitely don't mean merging segments across major versions, since that's precisely what we want to avoid.

But on second thought, it's not so simple, given the way Lucene works. Once Lucene decides that the previous version's segments don't need another merge (i.e. there aren't too many of them), that decision is never going to change. But maybe this strategy makes sense for a force merge (i.e. optimize).
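
One possible reading of that fallback idea, sketched in plain Java and not taken from the PR: group segments by major version, then pick the newest group that still has something to merge, so that no candidate set ever mixes major versions. `groupByMajor` and `pickMergeCandidates` are hypothetical names, and "at least two segments" is an assumed stand-in for "the delegate found a merge".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VersionGroupSketch {
  /** Groups segment names by major version, with the newest major version first. */
  static TreeMap<Integer, List<String>> groupByMajor(Map<String, Integer> segmentMajors) {
    TreeMap<Integer, List<String>> groups = new TreeMap<>(Comparator.reverseOrder());
    for (Map.Entry<String, Integer> entry : segmentMajors.entrySet()) {
      groups.computeIfAbsent(entry.getValue(), major -> new ArrayList<>()).add(entry.getKey());
    }
    return groups;
  }

  /** Returns the newest single-version group holding two or more segments, or an empty list. */
  static List<String> pickMergeCandidates(Map<String, Integer> segmentMajors) {
    for (List<String> group : groupByMajor(segmentMajors).values()) {
      if (group.size() >= 2) {
        return group;
      }
    }
    return List.of();
  }

  public static void main(String[] args) {
    // Only one 10.x segment exists, so the sketch falls back to the two 9.x segments.
    Map<String, Integer> segments = Map.of("_a", 9, "_b", 9, "_c", 10);
    System.out.println("candidates: " + pickMergeCandidates(segments));
  }
}
```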

return in.findFullFlushMerges(mergeTrigger, getFilteredInfosClone(infos), mergeContext);
}

private SegmentInfos getFilteredInfosClone(SegmentInfos infos) {
Contributor

nit: "clone" is an implementation detail that doesn't belong in the method name. For example, maybe we find that rebuilding this thing is expensive-ish and decide to first check whether all segments are current, in which case we skip the clone. Or, if no segments are up to date, we return an empty one.
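
The two shortcuts mentioned here could look roughly like this in plain Java. This is a sketch under assumptions, not the PR's code: segments are modeled as bare major-version integers, and `filterSegments` and `LATEST_MAJOR` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class FastPathFilterSketch {
  /** Assumed "latest" major version, chosen only for this example. */
  static final int LATEST_MAJOR = 10;

  /** Returns the segments eligible for merging, copying only when a copy is needed. */
  static List<Integer> filterSegments(List<Integer> segmentMajors) {
    int currentCount = 0;
    for (int major : segmentMajors) {
      if (major == LATEST_MAJOR) {
        currentCount++;
      }
    }
    if (currentCount == segmentMajors.size()) {
      return segmentMajors; // every segment is current: hand back the input as-is
    }
    if (currentCount == 0) {
      return List.of(); // nothing is current: shared empty list, no allocation
    }
    List<Integer> filtered = new ArrayList<>(currentCount);
    for (int major : segmentMajors) {
      if (major == LATEST_MAJOR) {
        filtered.add(major);
      }
    }
    return filtered;
  }

  public static void main(String[] args) {
    List<Integer> allCurrent = List.of(10, 10);
    System.out.println("same instance reused: " + (filterSegments(allCurrent) == allCurrent));
    System.out.println("mixed input filtered to: " + filterSegments(List.of(9, 10, 9)));
  }
}
```

Because the method name says nothing about cloning, callers are unaffected if the fast paths are added or removed later.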

4 participants