Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make text index query cache a configurable option #5176

Merged
merged 4 commits into from
Apr 6, 2020

Conversation

siddharthteotia
Copy link
Contributor

During design of text index feature, we had done some heap overhead experiments. As part of that we had determined that while Lucene's internal query cache helps with performance significantly for repeatable queries, it adds heap overhead. Therefore, it was disabled during index loading.

It is worthwhile to make this configurable on per-index basis for users to enable it. Of course, they need to know the downside of more heap overhead (which could potentially negate the perf improvements due to more GC).

As part of ongoing internal user acceptance testing, we learned that most of the queries are repeatable. The UI will more or less keep the text search filter constant and tweak the other filters. For such cases, it is good to see the performance improvements by enabling the query cache and if the heap overhead is not significantly high, user might want to keep it enabled.

Note: By default it is still disabled. So nothing really changed in the existing behavior.

@codecov-io
Copy link

codecov-io commented Mar 24, 2020

Codecov Report

Merging #5176 into master will decrease coverage by 21.44%.
The diff coverage is 58.55%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #5176       +/-   ##
=============================================
- Coverage     65.95%   44.50%   -21.45%     
=============================================
  Files          1056     1057        +1     
  Lines         54131    54067       -64     
  Branches       8050     8050               
=============================================
- Hits          35701    24065    -11636     
- Misses        15787    27957    +12170     
+ Partials       2643     2045      -598     
Impacted Files Coverage Δ Complexity Δ
...core/operator/dociditerators/AndDocIdIterator.java 88.23% <ø> (+7.20%) 0.00 <0.00> (ø)
...e/operator/dociditerators/SVScanDocIdIterator.java 69.51% <ø> (+3.99%) 0.00 <0.00> (ø)
...perator/docidsets/ScanBasedMultiValueDocIdSet.java 93.33% <ø> (ø) 0.00 <0.00> (ø)
...erator/docidsets/ScanBasedSingleValueDocIdSet.java 94.44% <ø> (ø) 0.00 <0.00> (ø)
...edicate/BaseDictionaryBasedPredicateEvaluator.java 33.33% <ø> (-27.39%) 0.00 <0.00> (ø)
...predicate/BaseRawValueBasedPredicateEvaluator.java 6.06% <ø> (-82.52%) 0.00 <0.00> (ø)
...ent/index/column/PhysicalColumnIndexContainer.java 70.12% <0.00%> (-23.12%) 0.00 <0.00> (ø)
...ment/index/readers/text/LuceneTextIndexReader.java 0.00% <0.00%> (-78.19%) 0.00 <0.00> (ø)
.../java/org/apache/pinot/spi/config/FieldConfig.java 0.00% <0.00%> (-94.74%) 0.00 <0.00> (ø)
.../core/segment/index/loader/IndexLoadingConfig.java 66.07% <22.22%> (-10.12%) 0.00 <0.00> (ø)
... and 584 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 221e73a...1e5f276. Read the comment docs.

_indexSearcher.setQueryCache(null);
if (textIndexProperties == null ||
textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE) == null ||
textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE).equalsIgnoreCase("false")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll suggest using
!textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE).equalsIgnoreCase("true") instead of false, in case someone mis-spells the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public static String VAR_LENGTH_DICTIONARY_COLUMN_KEY = "field.config.var.length.dictionary";

// Lucene index properties
public static String LUCENE_TEXT_INDEX_REALTIME_READER_REFRESH_KEY = "field.config.text.index.realtime.reader.refresh";
public static String LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE = "field.config.text.index.enable.query.cache";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put a comment above this line on how this cached will be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -160,7 +162,11 @@ private void createSegment()
private void loadSegment()
throws Exception {
IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
indexLoadingConfig.setTextIndexColumns(new HashSet<>(textIndexColumns));
Map<String, Map<String, String>> textIndexColumnsWithProperties = new HashMap<>();
for (String column : textIndexColumns) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putIfAbsent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change not needed anymore

public static String VAR_LENGTH_DICTIONARY_COLUMN_KEY = "field.config.var.length.dictionary";

// Lucene index properties
public static String LUCENE_TEXT_INDEX_REALTIME_READER_REFRESH_KEY = "field.config.text.index.realtime.reader.refresh";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this but can we remove the field.config. prefix from all these keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -45,7 +45,7 @@
private ReadMode _readMode = ReadMode.DEFAULT_MODE;
private List<String> _sortedColumns = Collections.emptyList();
private Set<String> _invertedIndexColumns = new HashSet<>();
private Set<String> _textIndexColumns = new HashSet<>();
private Map<String, Map<String, String>> _textIndexColumns = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put the whole map here. The map contains all the properties, not for text column only. I think you can keep this config unchanged, but check the field config when loading the text index.

Copy link
Contributor Author

@siddharthteotia siddharthteotia Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Btw, FieldConfig contains a properties map Map<String, String> per column

@kishoreg
Copy link
Member

kishoreg commented Apr 3, 2020

Do we still need this?

@siddharthteotia
Copy link
Contributor Author

siddharthteotia commented Apr 3, 2020

Do we still need this?

Yes, by default nothing is changed. But if the user is interested, they can enable it. May not be as impactful as the optimizations implemented in #5177 and #5199 but still good to have a knob to turn it on

@siddharthteotia
Copy link
Contributor Author

@Jackie-Jiang @jackjlli , I have addressed the review comments. Please take a look at it again

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

// Disable Lucene query result cache. While it helps a lot with performance for
// repeated queries, on the downside it cause heap issues.
_indexSearcher.setQueryCache(null);
if (textIndexProperties == null || textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE) == null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (textIndexProperties == null || textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE) == null
if (textIndexProperties == null || !Boolean.parseBoolean(textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE))) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -60,6 +60,9 @@
private boolean _isDirectRealtimeOffheapAllocation;
private boolean _enableSplitCommitEndWithMetadata;

// constructed from FieldConfig
private Map<String, Map<String, String>> _columnsWithProperties;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming it to _columnProperties. Also don't set it inside the extractTextIndexColumnsFromTableConfig(), set it in extractFromTableConfig() so that other index type can also access it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Siddharth Teotia added 2 commits April 6, 2020 10:36
@siddharthteotia siddharthteotia merged commit 0d5b5e2 into apache:master Apr 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants