
LUCENE-8213: Introduce Asynchronous Caching in LRUQueryCache #815

Merged
merged 11 commits into from Sep 28, 2019

Conversation

@atris (Contributor) commented Jul 31, 2019

No description provided.

@atris atris force-pushed the atris:LUCENE-8213 branch 4 times, most recently from 08080fc to 819acd2 Aug 1, 2019
@atris (Contributor, Author) commented Aug 5, 2019

I ran luceneutil for wikipedia 10M with the concurrent searching and latency calculation patch applied.

https://gist.github.com/atris/e0fa10e79fb5ef62bd571406acf98433

There was no significant degradation to QPS, and the P999 and P100 latencies generally saw an improvement.

@msokolov (Contributor) commented Aug 7, 2019

It should be enough to report the stats after the last iteration -- it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change. Could you post the latency stats in a more readable table here? It looks as if you have markdown there; I think GitHub will accept that.

@atris (Contributor, Author) commented Aug 13, 2019

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change.

I don't think that's true, since each run is its own JVM?

@atris (Contributor, Author) commented Sep 2, 2019

Another set of runs on wikimedium all with concurrent searching enabled:

                      Fuzzy1       47.29      (7.1%)       45.06     (11.1%)   -4.7% ( -21% -   14%)
                OrHighNotMed      405.86      (3.4%)      392.55      (2.2%)   -3.3% (  -8% -    2%)
               OrNotHighHigh      386.16      (4.7%)      373.54      (4.1%)   -3.3% ( -11% -    5%)
   BrowseDayOfYearTaxoFacets     6003.62      (2.6%)     5808.73      (2.2%)   -3.2% (  -7% -    1%)
                     Prefix3      176.87     (10.1%)      172.28      (8.9%)   -2.6% ( -19% -   18%)
       BrowseMonthTaxoFacets     6190.97      (3.8%)     6044.46      (4.9%)   -2.4% ( -10% -    6%)
                   MedPhrase       40.97      (5.1%)       40.06      (5.5%)   -2.2% ( -12% -    8%)
                OrNotHighMed      383.00      (3.3%)      374.82      (4.7%)   -2.1% (  -9% -    6%)
                  AndHighLow      191.05      (3.4%)      187.88      (3.2%)   -1.7% (  -7% -    5%)
                OrHighNotLow      416.92      (4.4%)      411.50      (4.3%)   -1.3% (  -9% -    7%)
                  AndHighMed       39.58      (2.2%)       39.17      (1.9%)   -1.0% (  -5% -    3%)
                    Wildcard       24.72      (7.9%)       24.49      (6.6%)   -0.9% ( -14% -   14%)
                  HighPhrase       52.11      (5.6%)       51.63      (4.8%)   -0.9% ( -10% -   10%)
                   LowPhrase       13.43      (2.7%)       13.33      (2.5%)   -0.8% (  -5% -    4%)
                 AndHighHigh       12.68      (3.9%)       12.58      (3.2%)   -0.8% (  -7% -    6%)
                    HighTerm      717.09      (4.8%)      712.98      (5.2%)   -0.6% ( -10% -    9%)
                    PKLookup       91.70      (2.8%)       91.27      (3.8%)   -0.5% (  -6% -    6%)
                      IntNRQ       21.92     (17.9%)       21.83     (18.0%)   -0.4% ( -30% -   43%)
                     Respell       34.38      (3.3%)       34.24      (2.3%)   -0.4% (  -5% -    5%)
       HighTermDayOfYearSort       27.44      (3.2%)       27.33      (1.6%)   -0.4% (  -5% -    4%)
               OrHighNotHigh      463.40      (5.4%)      461.74      (4.3%)   -0.4% (  -9% -    9%)
        BrowseDateTaxoFacets        0.69      (0.4%)        0.69      (0.5%)   -0.2% (  -1% -    0%)
       BrowseMonthSSDVFacets        2.63      (1.6%)        2.63      (1.1%)   -0.1% (  -2% -    2%)
                     MedTerm      885.64      (5.0%)      885.56      (3.6%)   -0.0% (  -8% -    8%)
                      Fuzzy2       39.47      (7.1%)       39.47      (9.4%)   -0.0% ( -15% -   17%)
   BrowseDayOfYearSSDVFacets        2.41      (0.4%)        2.41      (0.3%)   -0.0% (   0% -    0%)
                HighSpanNear        6.53      (1.0%)        6.53      (1.3%)    0.0% (  -2% -    2%)
             MedSloppyPhrase       31.76      (2.0%)       31.79      (1.6%)    0.1% (  -3% -    3%)
        HighIntervalsOrdered        6.10      (1.6%)        6.11      (2.1%)    0.1% (  -3% -    3%)
                   OrHighLow      177.27      (2.2%)      177.50      (2.5%)    0.1% (  -4% -    5%)
             LowSloppyPhrase       30.62      (1.9%)       30.67      (1.7%)    0.2% (  -3% -    3%)
                 MedSpanNear        7.63      (1.7%)        7.64      (2.0%)    0.2% (  -3% -    3%)
                 LowSpanNear        8.34      (1.3%)        8.37      (1.8%)    0.3% (  -2% -    3%)
                OrNotHighLow      308.14      (2.0%)      309.75      (4.9%)    0.5% (  -6% -    7%)
                     LowTerm      861.94      (4.6%)      870.93      (2.8%)    1.0% (  -6% -    8%)
            HighSloppyPhrase        5.58      (3.2%)        5.64      (2.8%)    1.1% (  -4% -    7%)
                   OrHighMed       10.81      (2.4%)       10.95      (2.6%)    1.4% (  -3% -    6%)
                  OrHighHigh       10.28      (2.5%)       10.48      (3.2%)    1.9% (  -3% -    7%)

Seems there is no degradation?

@atris atris force-pushed the atris:LUCENE-8213 branch from 819acd2 to 0bcae3e Sep 9, 2019
@atris (Contributor, Author) commented Sep 10, 2019

Rebased with master.

Any thoughts on this one? Seems like a useful change with no degradation in the happy path?

@mikemccand (Member) commented Sep 10, 2019

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change.

> I don't think that's true, since each run is its own JVM?

It is true -- luceneutil runs multiple JVMs to try to sample the noise due to hotspot mis-compilation, and then each iteration reports the cumulative results of all prior iterations so far. So reporting the last result is good.
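[Editor's note] The cumulative-reporting point above can be illustrated with a small sketch (illustrative names only, not luceneutil's actual code): if iteration i reports the mean over all samples collected so far, the final report already subsumes every earlier one.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of cumulative reporting: each call reports the mean of all
// samples so far, so only the last report needs to be kept.
class CumulativeStats {
    private final List<Double> samples = new ArrayList<>();

    double addAndReport(double qps) {
        samples.add(qps);
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```

With samples 10, 20, 30, the three reports are 10.0, 15.0, and 20.0; the last value is the mean over the whole run.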

@mikemccand (Member) commented Sep 10, 2019

> It should be enough to report the stats after the last iteration - it is cumulative, so the previous ones just add noise? I agree QPS looks pretty noisy, probably no real change. Could you post the latency stats in a more readable table here? It looks as if you have markdown there: I think github will accept that

+1 to inline the latency results in a readable way here.

@atris (Contributor, Author) commented Sep 11, 2019

@mikemccand Thanks for the inputs, updated the PR. Please let me know your comments.

@atris (Contributor, Author) commented Sep 13, 2019

Any further thoughts on this one?

@atris (Contributor, Author) commented Sep 16, 2019

@mikemccand Thanks for reviewing -- updated per comments. Please see and let me know your thoughts.

@atris atris self-assigned this Sep 19, 2019
@atris (Contributor, Author) commented Sep 19, 2019

@mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

    List<Query> inFlightQueries() {
      lock.lock();
      try {
        return new ArrayList<>(inFlightAsyncLoadQueries);
      } finally {
        lock.unlock();
      }
    }

@mikemccand (Member) Sep 20, 2019

I'm still confused about the lock -- do we always hold the lock when checking whether the query is already in the map? If so, we don't need a ConcurrentHashMap. If not, why do we even have the lock, since it is a ConcurrentHashMap?

@atris (Author, Contributor) Sep 20, 2019

Not all places which access inFlightAsyncLoadQueries take a lock -- the main one being cacheAsynchronously.

@atris (Author, Contributor) Sep 20, 2019

This specific access site does not take a lock -- thanks for highlighting that, fixed!
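[Editor's note] The alternative raised in the review -- dropping the explicit lock entirely and routing every access through a concurrent collection -- can be sketched as follows. All names here are illustrative, not Lucene's actual LRUQueryCache code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: if every access goes through a concurrent set, no external
// lock is needed, and there is no inconsistently-guarded access site.
class InFlightRegistry {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller registering this query,
    // giving "single-flight" semantics without an explicit lock.
    boolean tryRegister(String query) {
        return inFlight.add(query);
    }

    void complete(String query) {
        inFlight.remove(query);
    }

    // Weakly consistent snapshot of the queries currently loading.
    List<String> snapshot() {
        return new ArrayList<>(inFlight);
    }
}
```

The design point in the thread is exactly this either/or: hold the lock at every access site (in which case a plain HashMap suffices), or use a concurrent structure alone; mixing the two guards nothing.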

@atris (Contributor, Author) commented Sep 20, 2019

@mikemccand Thanks for your inputs, updated the same.

@mikemccand (Member) commented Sep 20, 2019

> @mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

Ahh that is a nice side effect!

@atris (Contributor, Author) commented Sep 20, 2019

> @mikemccand Thanks, fixed. Interestingly, moving the asynchronous load check to cacheAsynchronously also removed the need for the new exception. Please see the latest and share your comments.

> Ahh that is a nice side effect!

Indeed!

Does the latest iteration look ready? Anything that sticks out?

lucene/CHANGES.txt (review comment outdated, resolved)
      return in.scorerSupplier(context);
    } else {
      docIdSet = cache(context);

@mikemccand (Member) Sep 24, 2019

I think this means async caching can be more efficient: with single-threaded caching, multiple threads could each do the work to cache the same Query, with only one of them winning in the end, but with async caching we ensure only one search thread does the caching. So e.g. red-line QPS (capacity) could be a bit higher with async, if the same queries often arrive concurrently?

@atris (Author, Contributor) Sep 24, 2019

+1, that is a great observation, thanks for highlighting it!
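[Editor's note] The deduplication effect described above -- many search threads asking for the same query, but only one doing the caching work -- is commonly achieved with ConcurrentHashMap.computeIfAbsent over futures. A minimal sketch with illustrative names (not the PR's actual implementation):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of single-flight async caching: concurrent requests for the
// same key share one caching computation instead of duplicating it.
class AsyncCacheSketch {
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts actual cache builds

    CompletableFuture<String> get(String key) {
        // computeIfAbsent guarantees the loader is installed at most once
        // per key, even when many search threads ask for it at once.
        return cache.computeIfAbsent(key, k ->
            CompletableFuture.supplyAsync(() -> {
                loads.incrementAndGet(); // the expensive DocIdSet build would go here
                return "cached(" + k + ")";
            }));
    }
}
```

Two callers requesting the same key receive the same future, so the expensive load runs once; this is the mechanism behind the "red-line QPS could be a bit higher" observation.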

@atris (Contributor, Author) commented Sep 25, 2019

@mikemccand Updated the PR, please see and let me know.

@atris (Contributor, Author) commented Sep 25, 2019

@mikemccand Updated, please see

@mikemccand (Member) left a comment

I left some minor comments ... I think this is ready after that! Thanks @atris! This is an exciting change, especially because it means in some cases (same query in flight in multiple query threads), if you pass an Executor to IndexSearcher, it's more efficient than the single threaded case.

@atris atris force-pushed the atris:LUCENE-8213 branch from f15b3c2 to f2d230d Sep 28, 2019
@atris (Contributor, Author) commented Sep 28, 2019

Thanks @mikemccand! This was an extensive review -- thank you for spending the time on it!

@atris (Contributor, Author) commented Sep 28, 2019

Ran the Lucene test suite on the latest iteration -- came in clean.

@atris merged commit 0dfbf55 into apache:master Sep 28, 2019

1 check passed: ant precommit w/ Java 11
atris added a commit that referenced this pull request Oct 2, 2019
…815)" (#914)

This reverts commit 0dfbf55.