
LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache #815

Merged: 11 commits, Sep 28, 2019

Conversation

@atris (Contributor) commented Jul 31, 2019

No description provided.

@atris atris force-pushed the LUCENE-8213 branch 4 times, most recently from 08080fc to 819acd2 on August 2, 2019 at 13:52
@atris (Contributor, Author) commented Aug 5, 2019

I ran luceneutil for wikipedia 10M with the concurrent searching and latency calculation patch applied.

https://gist.github.com/atris/e0fa10e79fb5ef62bd571406acf98433

There was no significant degradation in QPS, and the P999 and P100 latencies generally saw an improvement.

@msokolov (Contributor) commented Aug 7, 2019

It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change. Could you post the latency stats in a more readable table here? It looks as if you have markdown there: I think github will accept that

@atris (Contributor, Author) commented Aug 13, 2019

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change.

I don't think that's true, since each run is its own JVM?

@atris (Contributor, Author) commented Sep 2, 2019

Another set of runs on wikimedium, all with concurrent searching enabled. Columns are the task, baseline QPS (stddev), patched QPS (stddev), and the percentage difference:

              Fuzzy1       47.29      (7.1%)       45.06     (11.1%)   -4.7% ( -21% -   14%)
        OrHighNotMed      405.86      (3.4%)      392.55      (2.2%)   -3.3% (  -8% -    2%)
       OrNotHighHigh      386.16      (4.7%)      373.54      (4.1%)   -3.3% ( -11% -    5%)
       BrowseDayOfYearTaxoFacets     6003.62      (2.6%)     5808.73      (2.2%)   -3.2% (  -7% -    1%)
             Prefix3      176.87     (10.1%)      172.28      (8.9%)   -2.6% ( -19% -   18%)
       BrowseMonthTaxoFacets     6190.97      (3.8%)     6044.46      (4.9%)   -2.4% ( -10% -    6%)
           MedPhrase       40.97      (5.1%)       40.06      (5.5%)   -2.2% ( -12% -    8%)
        OrNotHighMed      383.00      (3.3%)      374.82      (4.7%)   -2.1% (  -9% -    6%)
          AndHighLow      191.05      (3.4%)      187.88      (3.2%)   -1.7% (  -7% -    5%)
        OrHighNotLow      416.92      (4.4%)      411.50      (4.3%)   -1.3% (  -9% -    7%)
          AndHighMed       39.58      (2.2%)       39.17      (1.9%)   -1.0% (  -5% -    3%)
            Wildcard       24.72      (7.9%)       24.49      (6.6%)   -0.9% ( -14% -   14%)
          HighPhrase       52.11      (5.6%)       51.63      (4.8%)   -0.9% ( -10% -   10%)
           LowPhrase       13.43      (2.7%)       13.33      (2.5%)   -0.8% (  -5% -    4%)
         AndHighHigh       12.68      (3.9%)       12.58      (3.2%)   -0.8% (  -7% -    6%)
            HighTerm      717.09      (4.8%)      712.98      (5.2%)   -0.6% ( -10% -    9%)
            PKLookup       91.70      (2.8%)       91.27      (3.8%)   -0.5% (  -6% -    6%)
              IntNRQ       21.92     (17.9%)       21.83     (18.0%)   -0.4% ( -30% -   43%)
             Respell       34.38      (3.3%)       34.24      (2.3%)   -0.4% (  -5% -    5%)
        HighTermDayOfYearSort       27.44      (3.2%)       27.33      (1.6%)   -0.4% (  -5% -    4%)
       OrHighNotHigh      463.40      (5.4%)      461.74      (4.3%)   -0.4% (  -9% -    9%)
       BrowseDateTaxoFacets        0.69      (0.4%)        0.69      (0.5%)   -0.2% (  -1% -    0%)
       BrowseMonthSSDVFacets        2.63      (1.6%)        2.63      (1.1%)   -0.1% (  -2% -    2%)
             MedTerm      885.64      (5.0%)      885.56      (3.6%)   -0.0% (  -8% -    8%)
              Fuzzy2       39.47      (7.1%)       39.47      (9.4%)   -0.0% ( -15% -   17%)
       BrowseDayOfYearSSDVFacets        2.41      (0.4%)        2.41      (0.3%)   -0.0% (   0% -    0%)
        HighSpanNear        6.53      (1.0%)        6.53      (1.3%)    0.0% (  -2% -    2%)
        MedSloppyPhrase       31.76      (2.0%)       31.79      (1.6%)    0.1% (  -3% -    3%)
       HighIntervalsOrdered        6.10      (1.6%)        6.11      (2.1%)    0.1% (  -3% -    3%)
           OrHighLow      177.27      (2.2%)      177.50      (2.5%)    0.1% (  -4% -    5%)
     LowSloppyPhrase       30.62      (1.9%)       30.67      (1.7%)    0.2% (  -3% -    3%)
         MedSpanNear        7.63      (1.7%)        7.64      (2.0%)    0.2% (  -3% -    3%)
         LowSpanNear        8.34      (1.3%)        8.37      (1.8%)    0.3% (  -2% -    3%)
        OrNotHighLow      308.14      (2.0%)      309.75      (4.9%)    0.5% (  -6% -    7%)
             LowTerm      861.94      (4.6%)      870.93      (2.8%)    1.0% (  -6% -    8%)
    HighSloppyPhrase        5.58      (3.2%)        5.64      (2.8%)    1.1% (  -4% -    7%)
           OrHighMed       10.81      (2.4%)       10.95      (2.6%)    1.4% (  -3% -    6%)
          OrHighHigh       10.28      (2.5%)       10.48      (3.2%)    1.9% (  -3% -    7%)

Seems there is no degradation?

@atris (Contributor, Author) commented Sep 10, 2019

Rebased with master.

Any thoughts on this one? Seems like a useful change with no degradation in the happy path?

@mikemccand (Member) commented:

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change.

> I don't think that's true, since each run is its own JVM?

It is true -- luceneutil runs multiple JVMs to try to sample the noise due to hotspot mis-compilation, and then each iteration reports the cumulative results of all prior iterations so far. So reporting the last result is good.

@mikemccand (Member) commented:

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change. Could you post the latency stats in a more readable table here? It looks as if you have markdown there: I think github will accept that

+1 to inline the latency results in a readable way here.

@atris (Contributor, Author) commented Sep 11, 2019

@mikemccand Thanks for the inputs, updated the PR. Please let me know your comments.

@atris (Contributor, Author) commented Sep 13, 2019

Any further thoughts on this one?

@atris (Contributor, Author) commented Sep 16, 2019

@mikemccand Thanks for reviewing -- updated per comments. Please see and let me know your thoughts.

@atris atris self-assigned this Sep 19, 2019
@atris (Contributor, Author) commented Sep 19, 2019

@mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

List<Query> inFlightQueries() {
  lock.lock();
  try {
    return new ArrayList<>(inFlightAsyncLoadQueries);
  } finally {
    lock.unlock();
  }
}
Member review comment:

I'm still confused about lock -- do we always hold the lock when checking whether a query is already in the map? If so, we don't need a ConcurrentHashMap? If not, why do we even have the lock, since it is a ConcurrentHashMap?

@atris (Contributor, Author) replied:

Not all places which access inFlightAsyncLoadQueries take a lock -- the main one being cacheAsynchronously.

@atris (Contributor, Author) replied:

This specific access site does not need a lock -- thanks for highlighting that, fixed!
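
For readers following this review thread, here is a minimal illustrative sketch of the two consistent options being discussed. This is not the PR's actual code: markInFlight and clearInFlight are assumed helper names, and only inFlightAsyncLoadQueries mirrors the field mentioned above. Either every access holds the same lock around a plain HashSet, or the lock is dropped entirely and a concurrent set is used, as sketched here:

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.search.Query;

class InFlightTrackingSketch {
  // Concurrent set: no explicit lock is needed at any access site.
  // (The alternative is a plain HashSet with every access holding one ReentrantLock.)
  private final Set<Query> inFlightAsyncLoadQueries = ConcurrentHashMap.newKeySet();

  /** Snapshot of queries whose cache entries are currently being computed. */
  List<Query> inFlightQueries() {
    return new ArrayList<>(inFlightAsyncLoadQueries);
  }

  /** Returns true only for the first caller, i.e. the thread that should do the caching. */
  boolean markInFlight(Query query) {
    return inFlightAsyncLoadQueries.add(query);
  }

  void clearInFlight(Query query) {
    inFlightAsyncLoadQueries.remove(query);
  }
}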

@atris (Contributor, Author) commented Sep 20, 2019

@mikemccand Thanks for your inputs, updated accordingly.

@mikemccand (Member) commented:

> @mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

Ahh that is a nice side effect!

@atris (Contributor, Author) commented Sep 20, 2019

> @mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

> Ahh that is a nice side effect!

Indeed!

Does the latest iteration look ready? Anything that sticks out?

  return in.scorerSupplier(context);
} else {
  docIdSet = cache(context);
Member review comment:

I think this means async caching can be more efficient, because with single-threaded caching, multiple threads could do the work to try to cache the same Query, with only one of them winning in the end, but with async caching, we ensure only one search thread does the caching? So e.g. red-line QPS (capacity) could be a bit higher with async, if queries are often duplicated at once?

@atris (Contributor, Author) replied:

+1, that is a great observation, thanks for highlighting it!
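
To make that observation concrete, here is a hypothetical sketch of the de-duplication idea. It is not the PR's implementation: the real cacheAsynchronously has a different signature, and buildCacheEntry stands in for the actual work of building and storing the DocIdSet. Only the thread whose add() succeeds submits the caching task, so duplicate in-flight queries cost nothing extra; if the executor rejects the task, one reasonable fallback is to cache on the calling thread:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

import org.apache.lucene.search.Query;

class AsyncCachingSketch {
  private final Set<Query> inFlight = ConcurrentHashMap.newKeySet();
  private final Executor executor;

  AsyncCachingSketch(Executor executor) {
    this.executor = executor;
  }

  /** Returns true if this call took responsibility for caching the query. */
  boolean cacheAsynchronously(Query query, Runnable buildCacheEntry) {
    if (inFlight.add(query) == false) {
      return false; // another search thread is already caching this query
    }
    Runnable task = () -> {
      try {
        buildCacheEntry.run();
      } finally {
        inFlight.remove(query); // allow the query to be re-cached if evicted later
      }
    };
    try {
      executor.execute(task);
    } catch (RejectedExecutionException e) {
      task.run(); // executor is saturated or shut down: cache on the current thread
    }
    return true;
  }
}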

@atris (Contributor, Author) commented Sep 25, 2019

@mikemccand Updated the PR, please see and let me know.

@atris (Contributor, Author) commented Sep 25, 2019

@mikemccand Updated, please take a look.

@mikemccand (Member) left a review:

I left some minor comments ... I think this is ready after that! Thanks @atris! This is an exciting change, especially because it means in some cases (same query in flight in multiple query threads), if you pass an Executor to IndexSearcher, it's more efficient than the single threaded case.
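
For anyone who wants to try this out, a minimal usage sketch follows. The index path and cache sizes are placeholders, and the explicit LRUQueryCache is optional since IndexSearcher installs a default one; the relevant part is constructing the IndexSearcher with an executor, which is what enables the concurrent and asynchronous code paths referred to above:

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.LRUQueryCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class AsyncCacheSearchExample {
  public static void main(String[] args) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try (Directory dir = FSDirectory.open(Paths.get("/path/to/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      // Passing an executor lets IndexSearcher search leaf slices concurrently,
      // and with this change the query cache can also be populated off the
      // calling thread.
      IndexSearcher searcher = new IndexSearcher(reader, executor);
      // Optional: a per-searcher cache (up to 1000 queries / 32 MB here).
      searcher.setQueryCache(new LRUQueryCache(1000, 32 * 1024 * 1024));
      // ... run searcher.search(...) as usual ...
    } finally {
      executor.shutdown();
    }
  }
}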

@atris (Contributor, Author) commented Sep 28, 2019

Thanks @mikemccand! This was an extensive review -- thank you for spending the time on it!

@atris (Contributor, Author) commented Sep 28, 2019

Ran the Lucene test suite on the latest iteration -- came in clean.

@atris atris merged commit 0dfbf55 into apache:master Sep 28, 2019
atris added a commit that referenced this pull request Oct 2, 2019
atris added a commit that referenced this pull request Oct 2, 2019