Make text index query cache a configurable option #5176

siddharthteotia · 2020-03-24T05:57:57Z

During design of text index feature, we had done some heap overhead experiments. As part of that we had determined that while Lucene's internal query cache helps with performance significantly for repeatable queries, it adds heap overhead. Therefore, it was disabled during index loading.

It is worthwhile to make this configurable on per-index basis for users to enable it. Of course, they need to know the downside of more heap overhead (which could potentially negate the perf improvements due to more GC).

As part of ongoing internal user acceptance testing, we learned that most of the queries are repeatable. The UI will more or less keep the text search filter constant and tweak the other filters. For such cases, it is good to see the performance improvements by enabling the query cache and if the heap overhead is not significantly high, user might want to keep it enabled.

Note: By default it is still disabled. So nothing really changed in the existing behavior.

codecov-io · 2020-03-24T06:37:12Z

Codecov Report

Merging #5176 into master will decrease coverage by 21.44%.
The diff coverage is 58.55%.

@@              Coverage Diff              @@
##             master    #5176       +/-   ##
=============================================
- Coverage     65.95%   44.50%   -21.45%     
=============================================
  Files          1056     1057        +1     
  Lines         54131    54067       -64     
  Branches       8050     8050               
=============================================
- Hits          35701    24065    -11636     
- Misses        15787    27957    +12170     
+ Partials       2643     2045      -598

Impacted Files	Coverage Δ	Complexity Δ
...core/operator/dociditerators/AndDocIdIterator.java	`88.23% <ø> (+7.20%)`	`0.00 <0.00> (ø)`
...e/operator/dociditerators/SVScanDocIdIterator.java	`69.51% <ø> (+3.99%)`	`0.00 <0.00> (ø)`
...perator/docidsets/ScanBasedMultiValueDocIdSet.java	`93.33% <ø> (ø)`	`0.00 <0.00> (ø)`
...erator/docidsets/ScanBasedSingleValueDocIdSet.java	`94.44% <ø> (ø)`	`0.00 <0.00> (ø)`
...edicate/BaseDictionaryBasedPredicateEvaluator.java	`33.33% <ø> (-27.39%)`	`0.00 <0.00> (ø)`
...predicate/BaseRawValueBasedPredicateEvaluator.java	`6.06% <ø> (-82.52%)`	`0.00 <0.00> (ø)`
...ent/index/column/PhysicalColumnIndexContainer.java	`70.12% <0.00%> (-23.12%)`	`0.00 <0.00> (ø)`
...ment/index/readers/text/LuceneTextIndexReader.java	`0.00% <0.00%> (-78.19%)`	`0.00 <0.00> (ø)`
.../java/org/apache/pinot/spi/config/FieldConfig.java	`0.00% <0.00%> (-94.74%)`	`0.00 <0.00> (ø)`
.../core/segment/index/loader/IndexLoadingConfig.java	`66.07% <22.22%> (-10.12%)`	`0.00 <0.00> (ø)`
... and 584 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 221e73a...1e5f276. Read the comment docs.

jackjlli · 2020-03-24T17:53:39Z

...re/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java

-      _indexSearcher.setQueryCache(null);
+      if (textIndexProperties == null ||
+          textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE) == null ||
+          textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE).equalsIgnoreCase("false")) {


I'll suggest using
!textIndexProperties.get(FieldConfig.LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE).equalsIgnoreCase("true") instead of false, in case someone mis-spells the value.

jackjlli · 2020-03-24T17:54:39Z

pinot-common/src/main/java/org/apache/pinot/common/config/FieldConfig.java

  public static String VAR_LENGTH_DICTIONARY_COLUMN_KEY = "field.config.var.length.dictionary";

+  // Lucene index properties
+  public static String LUCENE_TEXT_INDEX_REALTIME_READER_REFRESH_KEY = "field.config.text.index.realtime.reader.refresh";
+  public static String LUCENE_TEXT_INDEX_ENABLE_QUERY_CACHE = "field.config.text.index.enable.query.cache";


Put a comment above this line on how this cached will be used.

jackjlli · 2020-03-24T17:55:49Z

pinot-core/src/test/java/org/apache/pinot/queries/TestTextSearchQueries.java

@@ -160,7 +162,11 @@ private void createSegment()
  private void loadSegment()
      throws Exception {
    IndexLoadingConfig indexLoadingConfig = new IndexLoadingConfig();
-    indexLoadingConfig.setTextIndexColumns(new HashSet<>(textIndexColumns));
+    Map<String, Map<String, String>> textIndexColumnsWithProperties = new HashMap<>();
+    for (String column : textIndexColumns) {


putIfAbsent

this change not needed anymore

Jackie-Jiang · 2020-03-24T18:41:56Z

pinot-common/src/main/java/org/apache/pinot/common/config/FieldConfig.java

  public static String VAR_LENGTH_DICTIONARY_COLUMN_KEY = "field.config.var.length.dictionary";

+  // Lucene index properties
+  public static String LUCENE_TEXT_INDEX_REALTIME_READER_REFRESH_KEY = "field.config.text.index.realtime.reader.refresh";


Not related to this but can we remove the field.config. prefix from all these keys?

Jackie-Jiang · 2020-03-24T18:47:01Z

pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/IndexLoadingConfig.java

@@ -45,7 +45,7 @@
  private ReadMode _readMode = ReadMode.DEFAULT_MODE;
  private List<String> _sortedColumns = Collections.emptyList();
  private Set<String> _invertedIndexColumns = new HashSet<>();
-  private Set<String> _textIndexColumns = new HashSet<>();
+  private Map<String, Map<String, String>> _textIndexColumns = new HashMap<>();


Don't put the whole map here. The map contains all the properties, not for text column only. I think you can keep this config unchanged, but check the field config when loading the text index.

Done. Btw, FieldConfig contains a properties map Map<String, String> per column

kishoreg · 2020-04-03T06:06:34Z

Do we still need this?

siddharthteotia · 2020-04-03T17:38:42Z

Do we still need this?

Yes, by default nothing is changed. But if the user is interested, they can enable it. May not be as impactful as the optimizations implemented in #5177 and #5199 but still good to have a knob to turn it on

siddharthteotia · 2020-04-03T17:42:10Z

@Jackie-Jiang @jackjlli , I have addressed the review comments. Please take a look at it again

Jackie-Jiang

LGTM otherwise

Jackie-Jiang · 2020-04-04T02:11:29Z

...re/src/main/java/org/apache/pinot/core/segment/index/readers/text/LuceneTextIndexReader.java

-      // Disable Lucene query result cache. While it helps a lot with performance for
-      // repeated queries, on the downside it cause heap issues.
-      _indexSearcher.setQueryCache(null);
+      if (textIndexProperties == null || textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE) == null


Suggested change

if (textIndexProperties == null || textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE) == null

if (textIndexProperties == null || !Boolean.parseBoolean(textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE))) {

Jackie-Jiang · 2020-04-04T02:19:31Z

pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/IndexLoadingConfig.java

@@ -60,6 +60,9 @@
  private boolean _isDirectRealtimeOffheapAllocation;
  private boolean _enableSplitCommitEndWithMetadata;

+  // constructed from FieldConfig
+  private Map<String, Map<String, String>> _columnsWithProperties;


Suggest renaming it to _columnProperties. Also don't set it inside the extractTextIndexColumnsFromTableConfig(), set it in extractFromTableConfig() so that other index type can also access it.

jackjlli reviewed Mar 24, 2020

View reviewed changes

Jackie-Jiang reviewed Mar 24, 2020

View reviewed changes

Make text index query cache a configurable option

ec311dc

siddharthteotia force-pushed the query_cache branch from 6dcae80 to ec311dc Compare April 3, 2020 17:37

Fix doc

96a8d01

Jackie-Jiang approved these changes Apr 4, 2020

View reviewed changes

Siddharth Teotia added 2 commits April 6, 2020 10:36

Fix tests

0a5c8a6

format

1e5f276

siddharthteotia merged commit 0d5b5e2 into apache:master Apr 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make text index query cache a configurable option #5176

Make text index query cache a configurable option #5176

siddharthteotia commented Mar 24, 2020

codecov-io commented Mar 24, 2020 •

edited

Loading

jackjlli Mar 24, 2020

siddharthteotia Apr 3, 2020

jackjlli Mar 24, 2020

siddharthteotia Apr 3, 2020

jackjlli Mar 24, 2020

siddharthteotia Apr 3, 2020

Jackie-Jiang Mar 24, 2020

siddharthteotia Apr 3, 2020

Jackie-Jiang Mar 24, 2020

siddharthteotia Apr 3, 2020 •

edited

Loading

kishoreg commented Apr 3, 2020

siddharthteotia commented Apr 3, 2020 •

edited

Loading

siddharthteotia commented Apr 3, 2020

Jackie-Jiang left a comment

Jackie-Jiang Apr 4, 2020

siddharthteotia Apr 6, 2020

Jackie-Jiang Apr 4, 2020

siddharthteotia Apr 6, 2020

	if (textIndexProperties == null \|\| textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE) == null
	if (textIndexProperties == null \|\| !Boolean.parseBoolean(textIndexProperties.get(FieldConfig.TEXT_INDEX_ENABLE_QUERY_CACHE))) {

Make text index query cache a configurable option #5176

Make text index query cache a configurable option #5176

Conversation

siddharthteotia commented Mar 24, 2020

codecov-io commented Mar 24, 2020 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

siddharthteotia Apr 3, 2020 • edited Loading

Choose a reason for hiding this comment

kishoreg commented Apr 3, 2020

siddharthteotia commented Apr 3, 2020 • edited Loading

siddharthteotia commented Apr 3, 2020

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Mar 24, 2020 •

edited

Loading

siddharthteotia Apr 3, 2020 •

edited

Loading

siddharthteotia commented Apr 3, 2020 •

edited

Loading