Performance improvements to use read lock to access LRUQueryCache #13306

Open · wants to merge 9 commits into base: main

Conversation

@boicehuang commented Apr 15, 2024

Elasticsearch (which is based on Lucene) can automatically infer field types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender / age / status, they often use numbers to represent the values, so ES infers these fields as long, and ES indexes long fields with a BKD tree.

As #541 points out, when the data volume grows, building the result set for low-cardinality fields drives CPU usage and load very high, even when the low-cardinality fields appear only in the filter clauses of a boolean query.

One reason is that access to LRUQueryCache is serialized through a single ReentrantLock. Under high QPS with expensive queries, attempts to acquire that lock when reading the cache frequently fail, resulting in low concurrency on the cache.

So I replaced the ReentrantLock with a ReentrantReadWriteLock, taking only the read lock when fetching the cached result for a query.
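
A minimal sketch of the locking scheme (hypothetical names; the real LRUQueryCache is more involved): lookups share the read lock, while anything that mutates the cache takes the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only; not the actual LRUQueryCache code.
class ReadMostlyCache<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> cache = new HashMap<>();

    V get(K key) {
        lock.readLock().lock(); // shared: many readers may hold this at once
        try {
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock(); // exclusive: blocks both readers and writers
        try {
            cache.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

As the review discussion below points out, sharing the read lock is only safe if the get path does not itself mutate any shared structure.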

I benchmarked this optimization by generating random LongPoint values and querying them with a single PointInSetQuery inside a boolean filter clause.

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|---|---|---|---|---|---|
| 30000000 | 10 | 1 | 2481 | 5102 | 105.6% |
| 30000000 | 1000000 | 1 | 6396 | 6596.48 | 3.1% |

I think this change can help filter queries over low-cardinality fields.

@boicehuang (Author):

This optimization also benefits high-cost queries, such as a terms query with 10000 terms, by answering from the cache more often instead of searching the inverted index.

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|---|---|---|---|---|---|
| 30000000 | 1000000 | 10000 | 160 | 473 | 191.9% |

```diff
@@ -265,7 +269,6 @@ boolean requiresEviction() {
   }

   CacheAndCount get(Query key, IndexReader.CacheHelper cacheHelper) {
-    assert lock.isHeldByCurrentThread();
     assert key instanceof BoostQuery == false;
     assert key instanceof ConstantScoreQuery == false;
     final IndexReader.CacheKey readerKey = cacheHelper.getKey();
```
Member:

As with anything multi-threaded, this is tricky. The LRU behavior is actually implemented via uniqueQueries, which is a LinkedHashMap constructed with new LinkedHashMap<>(16, 0.75f, true);

This means a get against it actually does an UPDATE to the LinkedHashMap's order. If multiple readers attempt this at the same time, I would expect the ordering to become indeterminate.

From the Java docs for LinkedHashMap:

> Note that this implementation is not synchronized. If multiple threads access a linked hash map concurrently, and at least one of the threads modifies the map structurally, it *must* be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the map.

Since ordering is dictated at access time, not at insertion time, we modify the map structurally on read.
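
A standalone illustration of that structural mutation (a hypothetical example, not Lucene code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // accessOrder = true: iteration order is least-recently-accessed first
        Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);

        lru.get("a"); // a plain read moves "a" to the most-recently-used slot

        // Prints [b, c, a]: the get() relinked the internal list, i.e. it
        // modified the map structurally even though no entry was added.
        System.out.println(lru.keySet());
    }
}
```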

Member:

So Collections.synchronizedMap should be used, or we should lock manually around this particular get.

And that means we are locking on read again, which might mean all the performance gains go away.

@boicehuang (Author):

Can we change the LinkedHashMap to a ConcurrentHashMap?

@boicehuang (Author):

I changed both cache and uniqueQueries to ConcurrentHashMap objects and got the following benchmark results.

| doc count | field cardinality | query points count | baseline QPS | candidate QPS | diff percentage |
|---|---|---|---|---|---|
| 30000000 | 10 | 1 | 2481 | 5102 | 105.6% |
| 30000000 | 10 | 1 | 2481 | 4843 | 95.2% (using LongAdder) |
| 30000000 | 10 | 1 | 2481 | 3851 | 55% (using LongAdder and ConcurrentHashMap) |

@boicehuang (Author):

With ConcurrentHashMap, the order of insertion operations becomes nondeterministic, which leads to test failures, but insertions are thread-safe and the change still shows a 55% improvement.

@boicehuang (Author):

I agree, but IdentityHashMap is not thread-safe.

Member:

> I agree, but IdentityHashMap is not thread-safe.

I may be wrong, but I thought that if writing to the IdentityHashMap is protected (which it is, via the write lock, which blocks the readers), then the map is never mutated during a read and writes are fully synchronized. Consequently, it's perfectly fine to read from multiple threads.

@boicehuang (Author):

In my optimization, uniqueQueries is a Collections.synchronizedMap(LinkedHashMap) to preserve LRU eviction. Writes to the cache object are protected by the write lock, but reads of the cache object can run concurrently because the cache object is a ConcurrentHashMap. This is the key point of my optimization. I am also working through all the concurrency paths.
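
A sketch of the layout being described (illustrative names and types, not the PR's actual fields):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the described data layout.
class CacheLayoutSketch<Q, K, V> {
    // LRU ordering relies on the access-ordered LinkedHashMap, whose get()
    // mutates the map, so it keeps its own synchronization wrapper.
    final Map<Q, Q> uniqueQueries =
        Collections.synchronizedMap(new LinkedHashMap<>(16, 0.75f, true));

    // Reads of cached entries can proceed concurrently because the backing
    // map is a ConcurrentHashMap; writes still take the write lock.
    final Map<K, V> cache = new ConcurrentHashMap<>();
}
```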

@boicehuang (Author):

I think we still need IdentityHashMap to clear the cache of expired queries.

Member:

The write lock will block the read lock (and the read lock will block the write lock), meaning structural changes to the IdentityHashMap are protected. It doesn't need to be wrapped in Collections.synchronizedMap. In fact, the Java docs for this lock contain an example of exactly this, but for a TreeMap: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html
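
Condensed from that javadoc example (the original uses a Data value type; simplified to String here): a plain, unsynchronized TreeMap is safe to read from many threads as long as every mutation holds the write lock and every read holds the read lock.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Condensed from the ReentrantReadWriteLock javadoc example.
class RWDictionary {
    private final Map<String, String> m = new TreeMap<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    public String get(String key) {
        r.lock();
        try {
            return m.get(key); // safe: writers are excluded while we read
        } finally {
            r.unlock();
        }
    }

    public String put(String key, String value) {
        w.lock();
        try {
            return m.put(key, value); // exclusive access for mutation
        } finally {
            w.unlock();
        }
    }
}
```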

```diff
@@ -628,7 +632,7 @@ private class LeafCache implements Accountable {

     LeafCache(Object key) {
       this.key = key;
-      cache = new IdentityHashMap<>();
+      cache = Collections.synchronizedMap(new IdentityHashMap<>());
```
Member:

I don't understand this. Aren't all accesses to LeafCache protected by the LRUQueryCache read/write locks?

Since LeafCache isn't a static class, it has access to the enclosing class's lock.

For testing safety, putIfAbsent, remove, onDocIdSetCache, and onDocIdSetEviction should all do an assert writeLock.isHeldByCurrentThread();
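
A sketch of what that guard could look like (hypothetical names, not the actual LeafCache): ReentrantReadWriteLock.WriteLock exposes isHeldByCurrentThread(), so mutating methods can assert that the caller already holds the write lock.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the suggested guard; not the actual LeafCache.
class LeafCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Object, Object> cache = new IdentityHashMap<>();

    void putIfAbsent(Object key, Object value) {
        // Fails fast in tests if a caller mutates without the write lock.
        assert lock.writeLock().isHeldByCurrentThread();
        cache.putIfAbsent(key, value);
    }
}
```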

@boicehuang (Author):

Sorry, I misunderstood this before. Since there is a read lock, concurrent access to the IdentityHashMap does not require an additional synchronized wrapper. I've updated the code per your suggestion.

@boicehuang (Author):

[Image: LRUCache optimization structure 4]

@benwtrent (Member):

@boicehuang what are the new benchmark numbers for the current iteration? Indeed we will be synchronizing more, so I wonder whether we will still see an improvement.
