Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-10078: Enable merge-on-refresh by default. #921

Merged
merged 10 commits into from
Jun 7, 2022

Conversation

jpountz
Copy link
Contributor

@jpountz jpountz commented May 23, 2022

This gives implementations of findFullFlushMerges to LogMergePolicy and
TieredMergePolicy and enables merge-on-refresh with a default timeout of
500ms.

The idea behind the 500ms default is that it felt both high-enough to have time
to run merges of small segments, and low enough that the freshness of the data
wouldn't look badly affected for users who have high refresh rates (e.g.
refreshing every second).

In both cases, findFullFlushMerges delegates to findMerges and filters
merges whose segments are all below the min/floor size.

This gives implementations of `findFullFlushMerges` to `LogMergePolicy` and
`TieredMergePolicy` and enables merge-on-refresh with a default timeout of
500ms.

The idea behind the 500ms default is that it felt both high-enough to have time
to run merges of small segments, and low enough that the freshness of the data
wouldn't look badly affected for users who have high refresh rates (e.g.
refreshing every second).

For `findFullFlushMerges`, `LogMergePolicy` looks at tail segments to see if it
can find at least `mergeFactor` flush segments below the min segment size, and
`TieredMergePolicy` looks for a merge that has at least `segmentsPerTier`
segments where the largest segment of the merge is a flush segment and below
the floor size.
Copy link
Member

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM I do wonder if we should document that if SerialMS is used that commits will always wait for the merges to finish and the timeout is not respected.

@jpountz
Copy link
Contributor Author

jpountz commented Jun 7, 2022

Good call, I added a warning to javadocs on IndexWriterConfig#setMaxFullFlushMergeWaitMillis.

@jpountz jpountz merged commit b5795db into apache:main Jun 7, 2022
@jpountz jpountz deleted the lucene10078 branch June 7, 2022 14:59
jpountz added a commit that referenced this pull request Jun 7, 2022
This gives implementations of `findFullFlushMerges` to `LogMergePolicy` and
`TieredMergePolicy` and enables merge-on-refresh with a default timeout of
500ms.

The idea behind the 500ms default is that it felt both high-enough to have time
to run merges of small segments, and low enough that the freshness of the data
wouldn't look badly affected for users who have high refresh rates (e.g.
refreshing every second).

In both cases, `findFullFlushMerges` delegates to `findMerges` and filters
merges whose segments are all below the min/floor size.
shaie pushed a commit to mdmarshmallow/lucene that referenced this pull request Jun 22, 2022
This gives implementations of `findFullFlushMerges` to `LogMergePolicy` and
`TieredMergePolicy` and enables merge-on-refresh with a default timeout of
500ms.

The idea behind the 500ms default is that it felt both high-enough to have time
to run merges of small segments, and low enough that the freshness of the data
wouldn't look badly affected for users who have high refresh rates (e.g.
refreshing every second).

In both cases, `findFullFlushMerges` delegates to `findMerges` and filters
merges whose segments are all below the min/floor size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants