Deduplicate min and max term in single-term FieldReader by original-brownbear · Pull Request #13618 · apache/lucene

original-brownbear · 2024-07-30T11:22:47Z

I noticed that single-term readers are an edge case but not that uncommon in Elasticsearch heap dumps. It seems quite common to have a constant value for some field across a complete segment (e.g. a version value that is repeated endlessly in logs).
Seems simple enough to deduplicate here to save a couple MB of heap, though it's admittedly an edge case that mostly matters for large segment counts.
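
For illustration, a minimal sketch of the idea, not the actual Lucene change: `FieldStats` is a hypothetical stand-in for a per-field reader, and plain `byte[]` is used in place of Lucene's term type. When the min and max terms hold the same bytes, both fields point at a single array.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-field reader; not Lucene's FieldReader.
final class FieldStats {
    final byte[] minTerm;
    final byte[] maxTerm;

    FieldStats(byte[] minTerm, byte[] maxTerm) {
        this.minTerm = minTerm;
        // Single-term case: min and max have identical content, so reuse the
        // min array and let the duplicate become garbage.
        this.maxTerm = Arrays.equals(minTerm, maxTerm) ? minTerm : maxTerm;
    }
}
```

The saving per reader is one small array, so it only adds up when many segments and fields hit the single-term case.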

jpountz

It feels like a corner case, but the change is simple enough. LGTM.

original-brownbear · 2024-07-31T18:38:16Z

Thanks Adrien!

Found this case in Elasticsearch heap dumps; it's the same as apache#13618 but for BKDReader instances. Admittedly, as with terms, this is a corner case, but in observed use cases where the distribution of these values is heavily weighted towards a single value, this change would save up to O(5M) of heap and some cache pressure.

jpountz approved these changes Jul 30, 2024

original-brownbear merged commit 47650a4 into apache:main Jul 31, 2024

original-brownbear added this to the 9.12.0 milestone Jul 31, 2024

original-brownbear deleted the save-heap-single-term-tree branch July 31, 2024 18:38

original-brownbear mentioned this pull request Mar 10, 2025

Make single value BKDReader instances lighter #14337

Merged

original-brownbear merged 1 commit into apache:main from original-brownbear:save-heap-single-term-tree
