Return index set wildcard instead of individual indices in Searches#determineAffectedIndices() #4062
@joschi Thanks for the PR! One thing I don't understand yet: why do we need the terms query with the index list? We are already using a time range filter in the query, so how does the terms query improve things?
Since users can have multiple index sets that contain distinct data sets, specifying the time range alone isn't sufficient. Just think of a search query covering firewall and network appliance logs in a number of index sets, which shouldn't include application logs from the same time frame.
Sorry, I forgot to mention that we also filter on streams. Index sets are bound to streams, so as far as I can see this is already handled. Currently, I can search in either one stream or all streams. If I search in one stream, we have a stream filter in addition to the time range. If I search in all streams, it's okay to look into all index sets. I'm not sure if there is another scenario where selecting the indices is important. If not, we can greatly simplify the code (as far as I can see) and save some CPU cycles processing the query.
@bernd Agreed.
…termineAffectedIndices()

If the number of indices determined in `Searches#determineAffectedIndices()` is larger than `Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective index sets instead of the individual index names.

Fixes #4054
@bernd I've implemented a variant returning the write index aliases instead of using a terms filter.
```java
.map(IndexRange::indexName)
.map(indexSetRegistry::getForIndex)
.flatMap(o -> o.map(java.util.stream.Stream::of).orElseGet(java.util.stream.Stream::empty))
.map(IndexSet::getWriteIndexAlias)
```
I guess you wanted to use `IndexSet#getIndexWildcard()` here? Using `graylog_deflector` doesn't make sense here, AFAIK.
✅ You're right. That was bollocks.
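To make the fallback from the commit message and this review concrete, here is a minimal Java sketch of the selection logic, using the index wildcard as settled in this thread. It is not Graylog's code: `AffectedIndices`, `resolve()` and the `wildcardLookup` function are hypothetical stand-ins for `Searches`, `determineAffectedIndices()` and the combination of `indexSetRegistry::getForIndex` with `IndexSet#getIndexWildcard()`, and the limit of 3 is chosen only for the demo.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AffectedIndices {
    // Hypothetical stand-in for Searches.MAX_INDICES_PER_QUERY; the real value is not shown in this thread.
    static final int MAX_INDICES_PER_QUERY = 3;

    /**
     * Returns the concrete index names if there are few enough of them,
     * otherwise the deduplicated index wildcards of the owning index sets.
     *
     * @param wildcardLookup hypothetical lookup: index name -> wildcard of its index set, if any
     */
    static Set<String> resolve(List<String> indexNames,
                               Function<String, Optional<String>> wildcardLookup,
                               int maxIndices) {
        if (indexNames.size() <= maxIndices) {
            return new LinkedHashSet<>(indexNames);
        }
        return indexNames.stream()
                .map(wildcardLookup)
                // Drops indices without a known index set (Optional::stream needs Java 9+).
                .flatMap(Optional::stream)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        // Hypothetical index-to-wildcard mapping; the prefixes are made up for the example.
        Map<String, String> wildcards = Map.of(
                "graylog_0", "graylog_*", "graylog_1", "graylog_*",
                "firewall_0", "firewall_*", "firewall_1", "firewall_*");
        Function<String, Optional<String>> lookup = name -> Optional.ofNullable(wildcards.get(name));

        // Below the limit: concrete names are kept.
        System.out.println(resolve(List.of("graylog_0", "firewall_0"), lookup, MAX_INDICES_PER_QUERY));
        // Above the limit: one wildcard per index set.
        System.out.println(resolve(List.of("graylog_0", "graylog_1", "firewall_0", "firewall_1"),
                lookup, MAX_INDICES_PER_QUERY));
    }
}
```

Running `main` prints `[graylog_0, firewall_0]` and then `[graylog_*, firewall_*]`. Like the `flatMap` over `Optional` in the reviewed snippet, indices that don't belong to any known index set are silently dropped in the wildcard branch; the snippet uses the Java 8 `map(Stream::of).orElseGet(Stream::empty)` idiom where this sketch uses `Optional::stream`.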
```diff
@@ -607,7 +607,7 @@ public void determineAffectedIndicesOnlyReturnsAliasesIfTooManyIndicesAreFound()
 final TimeRange absoluteRange = AbsoluteRange.create(now, now.plusDays(numberOfIndices + 1));
 
-assertThat(searches.determineAffectedIndices(absoluteRange, null)).containsExactly(indexSet.getWriteIndexAlias());
+assertThat(searches.determineAffectedIndices(absoluteRange, null)).containsExactly(indexSet.getIndexWildcard());
```
Good practice is to use constant values for expectations instead of building them with the actual code.
This is a great example of why ;)
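The reviewer's point can be shown with a small, self-contained Java sketch. The nested `IndexSet` record and `buggyResolve()` are hypothetical simplifications, not Graylog's classes; `buggyResolve()` reproduces the first version of the change, which returned the write index alias (`graylog_deflector`) instead of the wildcard.

```java
import java.util.Set;

final class LiteralExpectationDemo {
    // Hypothetical, simplified index set exposing only the two names discussed in the review.
    record IndexSet(String prefix) {
        String getWriteIndexAlias() { return prefix + "_deflector"; }
        String getIndexWildcard()   { return prefix + "_*"; }
    }

    // Mirrors the buggy first version: returns the write alias instead of the wildcard.
    static Set<String> buggyResolve(IndexSet indexSet) {
        return Set.of(indexSet.getWriteIndexAlias());
    }

    public static void main(String[] args) {
        IndexSet indexSet = new IndexSet("graylog");
        Set<String> actual = buggyResolve(indexSet);

        // Expectation derived from production code: it repeats the bug, so the check passes.
        System.out.println(actual.equals(Set.of(indexSet.getWriteIndexAlias()))); // true
        // Expectation written as a constant: the bug is caught.
        System.out.println(actual.equals(Set.of("graylog_*")));                  // false
    }
}
```

An expectation computed through the same accessor as the implementation can only confirm that the code agrees with itself; the literal `"graylog_*"` states what the behavior should be.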
I just tested the changes and noticed that `Searches#search(SearchesConfig)` is using `determineAffectedIndicesWithRanges()` and then collects the index list manually. That means it's still using the full index list instead of the wildcards if the limit is crossed.
graylog2-server/graylog2-server/src/main/java/org/graylog2/indexer/searches/Searches.java, lines 258 to 259 in 2fd85fd:

```java
final Set<IndexRange> indexRanges = determineAffectedIndicesWithRanges(config.range(), config.filter());
final Set<String> indices = indexRanges.stream().map(IndexRange::indexName).collect(Collectors.toSet());
```
I guess the search method should use the regular `determineAffectedIndices()` method here; I'm not sure if there was a reason not to use it, though.
Also, please check whether there are other search methods that are not using the correct index list. Thanks! 😃
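A hedged sketch of the fix requested here: compute the ranges once (since `search()` needs them anyway) and route the index names through the same limiting logic instead of collecting them manually. `SearchIndexSelection`, `indicesFor()`, the `IndexRange` record and the name-based `wildcardOf` function are hypothetical; the thread does not show how the final commit ("Use correct indices in Searches#search(SearchesConfig)") implements this.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class SearchIndexSelection {
    // Hypothetical minimal stand-in for Graylog's IndexRange; only the index name matters here.
    record IndexRange(String indexName) {}

    // Hypothetical shared helper: applies the limit to ranges that were already determined,
    // so a caller that also needs the ranges does not have to determine them twice.
    static Set<String> indicesFor(Set<IndexRange> ranges, Function<String, String> wildcardOf, int maxIndices) {
        Set<String> names = ranges.stream()
                .map(IndexRange::indexName)
                .collect(Collectors.toCollection(LinkedHashSet::new));
        if (names.size() <= maxIndices) {
            return names;
        }
        return names.stream()
                .map(wildcardOf)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        Set<IndexRange> ranges = new LinkedHashSet<>(List.of(
                new IndexRange("graylog_0"), new IndexRange("graylog_1"), new IndexRange("graylog_2")));
        // Assumption for the demo: index names look like "<prefix>_<number>", wildcards like "<prefix>_*".
        Function<String, String> wildcardOf = name -> name.substring(0, name.lastIndexOf('_')) + "_*";

        // The pattern criticized above: plain names, no limit applied.
        Set<String> raw = ranges.stream().map(IndexRange::indexName).collect(Collectors.toSet());
        // The same ranges routed through the limiting helper (limit of 2 chosen for the demo).
        Set<String> limited = indicesFor(ranges, wildcardOf, 2);

        System.out.println(raw.size()); // 3
        System.out.println(limited);    // [graylog_*]
    }
}
```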
The idea was not to duplicate the work done in
I noticed another place where we are not using the alias: `indicesContainingField()` returns a list of indices.

graylog2-server/graylog2-server/src/main/java/org/graylog2/indexer/searches/Searches.java, line 467 in 3a99825:

```java
final Set<String> affectedIndices = indicesContainingField(determineAffectedIndices(range, filter), field);
```
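The problem arises because `indicesContainingField()` turns its input, which may now contain wildcards, back into concrete index names, so the limit has to be applied again afterwards (the final commit mentions "Reduce number of indices in Searches#fieldStats() if necessary"). The following sketch models this with hypothetical helpers: `fieldsByIndex` stands in for the index mappings, and the `<prefix>_<number>` naming is an assumption for the example, not Graylog's API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class FieldStatsIndices {
    // Hypothetical stand-in for indicesContainingField(): keeps the concrete indices that match one
    // of the requested names or trailing-"*" wildcards AND whose mapping contains the field.
    static Set<String> indicesContainingField(Set<String> requested, Map<String, Set<String>> fieldsByIndex, String field) {
        return fieldsByIndex.entrySet().stream()
                .filter(e -> e.getValue().contains(field))
                .map(Map.Entry::getKey)
                .filter(index -> requested.stream().anyMatch(pattern -> matches(pattern, index)))
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Supports exact names and the trailing "*" wildcards used by index sets.
    static boolean matches(String pattern, String index) {
        return pattern.endsWith("*")
                ? index.startsWith(pattern.substring(0, pattern.length() - 1))
                : index.equals(pattern);
    }

    // Applies the limit a second time, since the wildcards were expanded into concrete names again.
    // Assumption: index names follow "<prefix>_<number>" and the wildcard is "<prefix>_*".
    static Set<String> limit(Set<String> indices, int maxIndices) {
        if (indices.size() <= maxIndices) {
            return indices;
        }
        return indices.stream()
                .map(index -> index.substring(0, index.lastIndexOf('_')) + "_*")
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fieldsByIndex = Map.of(
                "graylog_0", Set.of("source", "action"),
                "graylog_1", Set.of("source", "action"),
                "graylog_2", Set.of("source"),
                "other_0", Set.of("action"));
        Set<String> requested = Set.of("graylog_*"); // e.g. what determineAffectedIndices() returned

        Set<String> concrete = indicesContainingField(requested, fieldsByIndex, "action");
        System.out.println(concrete);           // [graylog_0, graylog_1]
        System.out.println(limit(concrete, 1)); // [graylog_*]
    }
}
```

Falling back to `graylog_*` in the last step brings `graylog_2` back into the request even though it lacks the field; that is the trade-off for keeping the request short.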
@bernd The
@joschi Yes, but
Thanks for working through it! LGTM 👍
… is too long (#4062)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()
  If the number of indices determined in `Searches#determineAffectedIndices()` is larger than `Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective index sets instead of the individual index names. Fixes #4054
* Use index wildcard instead of write index alias (duh!)
* Use correct indices in Searches#search(SearchesConfig)
* Reduce number of indices in Searches#fieldStats() if necessary

(cherry picked from commit 3dc4ca1)

… is too long (#4062) (#4078)

* Return write index alias instead of individual indices in Searches#determineAffectedIndices()
  If the number of indices determined in `Searches#determineAffectedIndices()` is larger than `Searches.MAX_INDICES_PER_QUERY`, return the covering write index aliases of the respective index sets instead of the individual index names. Fixes #4054
* Use index wildcard instead of write index alias (duh!)
* Use correct indices in Searches#search(SearchesConfig)
* Reduce number of indices in Searches#fieldStats() if necessary

(cherry picked from commit 3dc4ca1)
This PR changes the behavior of the `Searches` class to use a terms query filtering on the `_index` field, instead of listing all indices in the URI path, once a defined number of indices (see `Searches.MAX_INDICES_PER_QUERY`) has been reached.

While this puts more load on Elasticsearch (because all index shards have to be visited instead of just those of the specified indices), it circumvents the problem of the URI being larger than 4 KB, the default maximum length of an HTTP URL in Elasticsearch (`http.max_initial_line_length`).
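The terms-query approach described above can be sketched as follows: below the threshold the indices go into the request path, above it the path targets all indices and the selection moves into a `terms` filter on `_index` in the request body. `IndexSelectionRequest`, the threshold of 20 and the `timestamp` range filter are hypothetical values for illustration, and the JSON is built by hand only to keep the example dependency-free.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexSelectionRequest {
    // Hypothetical threshold; the real Searches.MAX_INDICES_PER_QUERY value is not given here.
    static final int MAX_INDICES_PER_QUERY = 20;

    // Request path: lists the indices, or searches all indices once the threshold is exceeded.
    static String path(List<String> indices) {
        return indices.size() <= MAX_INDICES_PER_QUERY
                ? "/" + String.join(",", indices) + "/_search"
                : "/_search";
    }

    // Query body: adds a terms filter on _index only when the indices are not in the path.
    static String body(List<String> indices, String rangeFilterJson) {
        if (indices.size() <= MAX_INDICES_PER_QUERY) {
            return "{\"query\":{\"bool\":{\"filter\":[" + rangeFilterJson + "]}}}";
        }
        // Assumes index names need no JSON escaping (true for names like "graylog_42").
        String terms = indices.stream().map(i -> "\"" + i + "\"").collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"filter\":[" + rangeFilterJson
                + ",{\"terms\":{\"_index\":[" + terms + "]}}]}}}";
    }

    public static void main(String[] args) {
        List<String> indices = IntStream.range(0, 500)
                .mapToObj(i -> "graylog_" + i)
                .collect(Collectors.toList());
        String range = "{\"range\":{\"timestamp\":{\"gte\":\"now-30d\"}}}";

        String listed = "/" + String.join(",", indices) + "/_search";
        System.out.println(listed.length() > 4096);                      // true
        System.out.println(path(indices));                               // /_search
        System.out.println(body(indices, range).contains("\"_index\"")); // true
    }
}
```

With 500 indices named `graylog_0` to `graylog_499`, the comma-separated path is 5,898 characters long, well past 4 KB, whereas the request body has no such limit.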
Using partitioned parallel queries when the maximum URI size has been reached would unfortunately only be feasible for a subset of the queries currently supported by the `Searches` class. 😞

Fixes #4054
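To illustrate why partitioning only helps for some queries, here is a sketch of a partitioned, parallel count: the index list is split into chunks of at most N entries and the per-chunk results are summed. `PartitionedCount` and the `countQuery` function are hypothetical; a real implementation would issue one Elasticsearch request per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.ToLongFunction;

final class PartitionedCount {
    // Splits the list into consecutive chunks of at most `size` entries.
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            chunks.add(items.subList(from, Math.min(from + size, items.size())));
        }
        return chunks;
    }

    // Runs a hypothetical per-chunk count query in parallel and sums the results.
    // This composes because counts over disjoint sets of indices simply add up.
    static long count(List<String> indices, int size, ToLongFunction<List<String>> countQuery) {
        List<CompletableFuture<Long>> futures = partition(indices, size).stream()
                .map(chunk -> CompletableFuture.supplyAsync(() -> countQuery.applyAsLong(chunk)))
                .toList();
        return futures.stream().mapToLong(f -> f.join()).sum();
    }

    public static void main(String[] args) {
        List<String> indices = List.of("graylog_0", "graylog_1", "graylog_2", "graylog_3", "graylog_4");
        // Fake count query for illustration: pretends every index holds 100 messages.
        ToLongFunction<List<String>> fakeCount = chunk -> 100L * chunk.size();

        System.out.println(partition(indices, 2));        // [[graylog_0, graylog_1], [graylog_2, graylog_3], [graylog_4]]
        System.out.println(count(indices, 2, fakeCount)); // 500
    }
}
```

Counts merge by addition, but a search sorted by timestamp with paging, for example, would need every partition to return `from + size` hits followed by a global merge, which is one plausible reason the approach only fits a subset of queries.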