
LUCENE-9827: avoid wasteful recompression for small segments #28

Merged · 3 commits · Apr 6, 2021

Commits on Mar 21, 2021

  1. LUCENE-9827: avoid wasteful recompression for small segments

    Require that a segment have enough dirty documents to fill a clean
    chunk before recompressing during merge: there must be at least
    maxChunkSize of them.
    
    This prevents wasteful recompression with small flushes (e.g. one per
    document): we ensure recompression achieves some "permanent" progress.
    
    Also expose maxDocsPerChunk as a parameter for term vectors, matching the
    stored fields format. This allows for easy testing.
    rmuir committed Mar 21, 2021 · f598e4e
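The guard described in this commit message can be sketched as a simple predicate. This is an illustrative sketch only; the class and method names here are hypothetical and not Lucene's actual API:

```java
// Hypothetical sketch of the merge-time recompression guard: a segment is
// only recompressed if its dirty documents alone could fill at least one
// complete ("clean") chunk; otherwise it is bulk-copied as-is, so small
// flushes cannot trigger recompression that makes no permanent progress.
public class RecompressionPolicy {

    /**
     * @param numDirtyDocs    dirty documents accumulated in the segment
     * @param maxDocsPerChunk maximum number of documents per chunk
     * @return true if recompressing would produce at least one clean chunk
     */
    public static boolean shouldRecompress(long numDirtyDocs, int maxDocsPerChunk) {
        // Require enough dirty docs to fill one full chunk before doing
        // any recompression work during the merge.
        return numDirtyDocs >= maxDocsPerChunk;
    }
}
```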

Commits on Mar 22, 2021

  1. LUCENE-9827: increment numDirtyDocs for partially optimized merges

    If segment N needs recompression, we have to flush any buffered docs
    before bulk-copying segment N+1. Don't just increment numDirtyChunks;
    make sure numDirtyDocs is incremented, too.
    
    This has no performance impact and is unrelated to the tooDirty()
    improvements, but it is easier to reason about things when the
    statistics in the index are correct.
    rmuir committed Mar 22, 2021 · 0a369d8
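The bookkeeping fix in this commit might look roughly like the following. The class and field names are illustrative assumptions, not the actual Lucene code:

```java
// Illustrative sketch of the statistics fix: when buffered documents are
// flushed early (so the next segment can be bulk-copied), the resulting
// chunk is incomplete, i.e. "dirty". Both counters must be updated --
// not only numDirtyChunks, but numDirtyDocs as well.
public class MergeStats {
    long numDirtyChunks;
    long numDirtyDocs;

    /** Flush pending buffered docs as one dirty (incomplete) chunk. */
    void flushBufferedDocs(int bufferedDocCount) {
        if (bufferedDocCount > 0) {
            numDirtyChunks++;                 // the chunk was cut short, hence dirty
            numDirtyDocs += bufferedDocCount; // the fix: count its docs, too
        }
    }
}
```

Keeping both counters in sync is what makes a later dirtiness check (dirty docs per dirty chunk) trustworthy.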

Commits on Mar 23, 2021

  1. 4856b6f