Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Record if block API has been used in SegmentInfo #12685

Merged
merged 11 commits into from
Oct 23, 2023

Conversation

s1monw
Copy link
Member

@s1monw s1monw commented Oct 16, 2023

If the add/updateDocuments(List<>) API is used, lucene guarantees that all documents are indexed in the same segment with consecutive document IDs. This enables features like nested documents etc. This change records the usage of this API in SegmentInfo and preserves this property across merges.

Relates to #12665

If the add/updateDocuments(List<>) API is used, lucene guarantees that
all documents are indexed in the same segment with consecutive document IDs.
This enables features like nested documents etc. This change records the usage
of this API in SegmentsInfo and preserves this property across merges.

Relates to apache#12665
if (docCount < 0) {
throw new CorruptIndexException("invalid docCount: " + docCount, input);
}
final boolean isCompoundFile = input.readByte() == SegmentInfo.YES;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing a byte each time here and change index format we could also write a bitset to mark features like this. it might be easier down the road. not sure how often it happens.

@s1monw s1monw changed the title Record if block API has been used in SegmentsInfo Record if block API has been used in SegmentInfo Oct 16, 2023
@s1monw
Copy link
Member Author

s1monw commented Oct 18, 2023

@jpountz I pushed new commits, wanna take a new look

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

lucene/CHANGES.txt Outdated Show resolved Hide resolved
@s1monw s1monw merged commit 6677109 into apache:main Oct 23, 2023
4 checks passed
@s1monw s1monw deleted the record_block_use branch October 23, 2023 07:46
s1monw added a commit that referenced this pull request Oct 23, 2023
If the add/updateDocuments(List<>) API is used, lucene guarantees that
all documents are indexed in the same segment with consecutive document IDs.
This enables features like nested documents etc. This change records the usage
of this API in SegmentsInfo and preserves this property across merges.

Relates to #12665
@jpountz
Copy link
Contributor

jpountz commented Oct 25, 2023

FYI we've seen failures on TestIndexWriter recently, which are reproducible (e.g. https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.x/720/). I ran git bisect and it pointed to this change. I'm unsure why though.

@s1monw
Copy link
Member Author

s1monw commented Oct 25, 2023

I pushed fixes... thanks @jpountz

javanna added a commit to elastic/elasticsearch that referenced this pull request Oct 26, 2023
javanna added a commit to elastic/elasticsearch that referenced this pull request Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants