Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not limit terms size when determining stream ids contained in index. #6957

Merged

Conversation

@dennisoelkers
Copy link
Member

dennisoelkers commented Dec 9, 2019

Description

Motivation and Context

When rotating an index, the following details are calculated and stored
in an index range:

  • the smallest timestamp (oldest document)
  • the largest timestamp (youngest document)
  • the set of stream ids contained in documents in this index

For the latter, a terms query is used. Unfortunately, no specific size
was set for it, so it used the default size of 10. This meant that all
index ranges calculated before did not contain more than 10 stream ids.

In the current search, this did not lead to imminent problems. The logic
used to determine the indices which are used for the search was lenient
enough to only consider either the set of stream ids containing the
current stream id or the index being managed by the index set of the
stream.

The new search is stricter and expects the set of stream ids to contain
the current stream being searched for. It only checks if the index set
of the stream manages the index if it is the deflector.

This change is now calculating the stream id set for a give index
correctly, by specifying a size parameter for the terms aggregation
that is hopefully high enough (Integer.MAX_VALUE) to handle all cases.

After applying the patch, it is required to recalculate all index ranges
once to benefit from its effectes.

Fixes #6828.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
When rotating an index, the following details are calculated and stored
in an index range:

  * the smallest timestamp (oldest document)
  * the largest timestamp (youngest document)
  * the set of stream ids contained in documents in this index

For the latter, a terms query is used. Unfortunately, no specific size
was set for it, so it used the default size of 10. This meant that all
index ranges calculated before did not contain more than 10 stream ids.

In the current search, this did not lead to imminent problems. The logic
used to determine the indices which are used for the search was lenient
enough to only consider either the set of stream ids containing the
current stream id or the index being managed by the index set of the
stream.

The new search is stricter and expects the set of stream ids to contain
the current stream being searched for. It only checks if the index set
of the stream manages the index if it is the deflector.

This change is now calculating the stream id set for a give index
correctly, by specifying a `size` parameter for the terms aggregation
that is hopefully high enough (`Integer.MAX_VALUE`) to handle all cases.

Fixes #6828.
@dennisoelkers dennisoelkers added this to the 3.2.0 milestone Dec 9, 2019
@dennisoelkers dennisoelkers requested a review from bernd Dec 9, 2019
@bernd bernd self-assigned this Dec 10, 2019
@bernd
bernd approved these changes Dec 10, 2019
Copy link
Member

bernd left a comment

Good catch! Thank you! 👍

@bernd bernd merged commit 8e7560b into master Dec 10, 2019
4 checks passed
4 checks passed
ci-web-linter Jenkins build graylog-pr-linter-check 5420 has succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
graylog-project/pr Jenkins build graylog-project-pr-snapshot 7199 has succeeded
Details
license/cla Contributor License Agreement is signed.
Details
@bernd bernd deleted the do-not-limit-terms-size-when-determining-streams-in-index branch Dec 10, 2019
bernd added a commit that referenced this pull request Dec 10, 2019
…x. (#6957)

When rotating an index, the following details are calculated and stored
in an index range:

  * the smallest timestamp (oldest document)
  * the largest timestamp (youngest document)
  * the set of stream ids contained in documents in this index

For the latter, a terms query is used. Unfortunately, no specific size
was set for it, so it used the default size of 10. This meant that all
index ranges calculated before did not contain more than 10 stream ids.

In the current search, this did not lead to imminent problems. The logic
used to determine the indices which are used for the search was lenient
enough to only consider either the set of stream ids containing the
current stream id or the index being managed by the index set of the
stream.

The new search is stricter and expects the set of stream ids to contain
the current stream being searched for. It only checks if the index set
of the stream manages the index if it is the deflector.

This change is now calculating the stream id set for a give index
correctly, by specifying a `size` parameter for the terms aggregation
that is hopefully high enough (`Integer.MAX_VALUE`) to handle all cases.

Fixes #6828.

(cherry picked from commit 8e7560b)
bernd added a commit that referenced this pull request Dec 10, 2019
…x. (#6957) (#6967)

When rotating an index, the following details are calculated and stored
in an index range:

  * the smallest timestamp (oldest document)
  * the largest timestamp (youngest document)
  * the set of stream ids contained in documents in this index

For the latter, a terms query is used. Unfortunately, no specific size
was set for it, so it used the default size of 10. This meant that all
index ranges calculated before did not contain more than 10 stream ids.

In the current search, this did not lead to imminent problems. The logic
used to determine the indices which are used for the search was lenient
enough to only consider either the set of stream ids containing the
current stream id or the index being managed by the index set of the
stream.

The new search is stricter and expects the set of stream ids to contain
the current stream being searched for. It only checks if the index set
of the stream manages the index if it is the deflector.

This change is now calculating the stream id set for a give index
correctly, by specifying a `size` parameter for the terms aggregation
that is hopefully high enough (`Integer.MAX_VALUE`) to handle all cases.

Fixes #6828.

(cherry picked from commit 8e7560b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.