Fix JSONPath cache inefficient issue #7409

Merged
merged 2 commits into from
Nov 3, 2021

Conversation

Ferrari6
Contributor

@Ferrari6 Ferrari6 commented Sep 8, 2021

Description

This commit fixes #7403.

Background

When we used the JsonPath transformation functions, we saw consumption lag and very high CPU usage. A jstack analysis showed that the consumption threads were waiting on the lock of the LRUCache in Jayway JsonPath. Further analysis of CPU usage and lock contention confirmed that this inefficient LRUCache is the consumption performance bottleneck.

[Image: stack trace]

**Flamegraphs can be found in the description of issue #7403.**

Fix

A new JSON path cache is implemented using ConcurrentHashMap, with a maximum cache size. Once the maximum is exceeded, new JSON paths are no longer cached.

  • In Pinot, the number of JSON paths is bounded by the size of the transformation config
  • Even if the number of paths exceeds the maximum cache size, not caching a JSON path may be better than frequent swapping in and out of an LRU cache
  • Even when a JSON path compilation is not cached, its CPU cost is very small

"transformConfigs": [
  {
    "columnName": "id",
    "transformFunction": "jsonPathString(report,'$.identifiers.id','')"
  },
  {
    "columnName": "name",
    "transformFunction": "jsonPathString(report,'$.identifiers.name','')"
  },
  ...
]
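The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `BoundedMapCache` and its generic signature are assumptions made so the example is self-contained, whereas the PR's `JsonPathMapCache` implements Jayway's `Cache` interface with `String` keys and `JsonPath` values.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Sketch of a size-capped cache backed by ConcurrentHashMap (hypothetical class).
 * Once maxSize entries are stored, new keys are simply not cached; nothing is evicted.
 */
final class BoundedMapCache<K, V> {
  private final ConcurrentMap<K, V> _map = new ConcurrentHashMap<>();
  private final int _maxSize;

  BoundedMapCache(int maxSize) {
    _maxSize = maxSize;
  }

  /** Non-blocking read; returns null on a miss. */
  V get(K key) {
    return _map.get(key);
  }

  /** Stores the value only while the cache is below its threshold. */
  void put(K key, V value) {
    // The size check is racy, so the map may slightly overshoot maxSize under
    // concurrent writes; that is acceptable for a protective threshold.
    if (_map.size() < _maxSize) {
      _map.putIfAbsent(key, value);
    }
  }

  int size() {
    return _map.size();
  }

  public static void main(String[] args) {
    BoundedMapCache<String, String> cache = new BoundedMapCache<>(2);
    cache.put("$.identifiers.id", "compiled-id");
    cache.put("$.identifiers.name", "compiled-name");
    cache.put("$.identifiers.other", "compiled-other"); // over the threshold, not cached
    System.out.println(cache.size());                     // prints 2
    System.out.println(cache.get("$.identifiers.other")); // prints null
  }
}
```

Because reads never take a lock and nothing is ever evicted, a hot path that is already cached costs one hash lookup; the trade-off is that paths arriving after the threshold is reached are recompiled on every use.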

Pinot Server flamegraphs when using the ConcurrentHashMap cache (28 vCPUs)

[Two flamegraph images]

JsonPath CPU usage is low and there is no lock contention.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter

codecov-commenter commented Sep 8, 2021

Codecov Report

Merging #7409 (fd7aed0) into master (421645d) will decrease coverage by 0.02%.
The diff coverage is 85.71%.

@@             Coverage Diff              @@
##             master    #7409      +/-   ##
============================================
- Coverage     71.54%   71.51%   -0.03%     
+ Complexity     4036     4033       -3     
============================================
  Files          1579     1580       +1     
  Lines         80390    80386       -4     
  Branches      11945    11944       -1     
============================================
- Hits          57512    57489      -23     
- Misses        18996    19012      +16     
- Partials       3882     3885       +3     
Flag Coverage Δ
integration1 29.21% <74.28%> (-0.03%) ⬇️
integration2 27.67% <48.57%> (-0.09%) ⬇️
unittests1 68.61% <80.00%> (-0.05%) ⬇️
unittests2 14.55% <0.00%> (-0.05%) ⬇️

Impacted Files Coverage Δ
...m/function/JsonExtractScalarTransformFunction.java 49.52% <60.00%> (-0.48%) ⬇️
...form/function/JsonExtractKeyTransformFunction.java 78.26% <92.30%> (+4.18%) ⬆️
...rg/apache/pinot/common/function/JsonPathCache.java 100.00% <100.00%> (ø)
...he/pinot/common/function/scalar/JsonFunctions.java 80.35% <100.00%> (-1.00%) ⬇️
...a/manager/realtime/RealtimeSegmentDataManager.java 50.00% <0.00%> (-25.00%) ⬇️
...nt/local/startree/v2/store/StarTreeDataSource.java 40.00% <0.00%> (-13.34%) ⬇️
...n/java/org/apache/pinot/common/utils/URIUtils.java 66.66% <0.00%> (-7.41%) ⬇️
.../common/request/context/predicate/EqPredicate.java 66.66% <0.00%> (-6.67%) ⬇️
...mmon/request/context/predicate/NotEqPredicate.java 66.66% <0.00%> (-6.67%) ⬇️
...elix/core/periodictask/ControllerPeriodicTask.java 76.00% <0.00%> (-6.00%) ⬇️
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 421645d...fd7aed0.

Comment on lines 54 to 64
public void testSimpleJsonPathMapCacheWorks() throws JsonProcessingException {
  String path = "$.contact.email";
  assertEquals(JsonFunctions.jsonPathString(_jsonString, path), "test@example.com");
  JsonPathMapCache cache = (JsonPathMapCache) CacheProvider.getCache();

  // verify json path has been cached
  LinkedList<Predicate> filterStack = new LinkedList<>(Collections.emptyList());
  String cacheKey = Utils.concat(path, filterStack.toString());
  JsonPath jsonPath = cache.get(cacheKey);
  assertNotNull(jsonPath);
}
Member

This depends on undocumented library behaviour, it assumes the format of the cache keys. I have actually submitted a PR to the library to change this behaviour, which would break this test: json-path/JsonPath#750

I suggest iterating over the map instead (requires exposing the keys on JsonPathMapCache).

Member

Or just use a Mockito spy and verify put was called, I guess.

Contributor Author

Good suggestions.
I always prefer to use Mockito for unit testing, but all the method calls related to JsonFunctions are static, so for now I check the map size instead.

* of it.
*/
public class JsonPathMapCache implements Cache {
private final ConcurrentMap<String, JsonPath> _pathCache = new ConcurrentHashMap<>(128, 0.75f, 1);
Member

I don't think we need to explicitly set the concurrency level or load factor; these are the default values anyway.

Contributor Author

I will change it. It's a habit from my previous team!

* the number of JSON paths is bounded by the size of the transformation config,
* add this threshold for protection from the extreme cases.
*/
public static final int MAX_JSON_PATH_CACHE_SIZE = 1024 * 16;
Member

This class should be removed and the constant inlined into JsonPathMapCache where it is relevant.

Member

@richardstartin richardstartin left a comment

Not an approver but this looks good to me, and solves a big problem 👍🏻

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM otherwise

* a lot of unnecessary lock waits during high concurrent data ingestion,
* and LRU mechanism is inappropriate for Pinot bounded size of the
* transformation config, so we should use this simple Map cache instead
* of it.
Contributor

JsonPath can also be used at query time (see JsonExtractScalarTransformFunction). Let's add a TODO here to add an eviction policy in the future.

Member

@richardstartin richardstartin Sep 9, 2021

I think in the future this can be handled better by just precompiling JsonPath objects and bypassing the library's cache entirely. For transformConfig, it would require changes to the way ScalarFunction is defined as well as changes to InbuiltFunctionEvaluator.planExecution (this would also permit validation and cleansing of config, which is currently impossible). For JsonExtractScalarTransformFunction the change is trivial.

I think in the meantime using a Guava Cache instead of a ConcurrentHashMap, as suggested on Slack, would alleviate this concern.

Contributor Author

I think Guava Cache is a safer cache solution, because the JsonPath cache will also be applied to queries.
In one of my table ingestion cases, the read rate is close to 100,000 QPS and JsonPath transformations are configured for dozens of fields. That puts some pressure on GC and CPU, because the LRU algorithm needs to record every get. Of course, this should not become a system bottleneck, but the implementation is a bit too heavy.
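To illustrate why "record every get" matters, here is a minimal LRU sketch built on an access-ordered `LinkedHashMap`. It is not Jayway's LRUCache or Guava's implementation; the class name `SimpleLruCache` is hypothetical. The point it shows is that in a classic LRU even a read mutates shared state (the recency order), so reads have to be synchronized or recorded in some other way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a classic LRU cache (hypothetical class): every get() reorders entries. */
final class SimpleLruCache<K, V> {
  private final Map<K, V> _map;

  SimpleLruCache(final int maxSize) {
    // accessOrder = true moves an entry to the end of the iteration order on each get().
    _map = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cap is exceeded.
        return size() > maxSize;
      }
    };
  }

  // Reads mutate the recency order, so they need the same lock as writes.
  synchronized V get(K key) {
    return _map.get(key);
  }

  synchronized void put(K key, V value) {
    _map.put(key, value);
  }

  // containsKey() does not count as an access and leaves the order unchanged.
  synchronized boolean containsKey(K key) {
    return _map.containsKey(key);
  }

  public static void main(String[] args) {
    SimpleLruCache<String, Integer> lru = new SimpleLruCache<>(2);
    lru.put("a", 1);
    lru.put("b", 2);
    lru.get("a");    // "a" becomes the most recently used entry
    lru.put("c", 3); // exceeds the cap, so the eldest entry "b" is evicted
    System.out.println(lru.containsKey("a")); // prints true
    System.out.println(lru.containsKey("b")); // prints false
  }
}
```

With many ingestion threads reading the same few hot paths, all of them serialize on that one lock, which matches the contention pattern reported in the PR description.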

Contributor Author

Guava Cache uses a recencyQueue to record cache accesses. The large number of CAS enqueue operations generated by hot keys consumes CPU, and this synchronous recording also slows down cache reads.

I think maybe Caffeine Cache is a good choice. Its Striped-RingBuffer design is more efficient for GC and CPU usage. And its API is very similar to Guava Cache.

Member

Whether using Guava's cache would result in high CPU consumption needs measurement. I agree that Caffeine is a better cache implementation, but Guava is already on the classpath. Is this particular use case important enough to add a 1MB dependency?

Contributor Author

I think we should use the default value of the concurrencyLevel of Guava cache. In this scenario, there will be very few cache writes and no need to update the cache. My concern is the way Guava records cache reads, which is not very efficient.

The performance of cache reads improves with the concurrency level because it reduces the likelihood of contending on the same CLQ instance. The benchmarks use a Zipf distribution to simulate a hotspot.

Contributor Author

You are right.

Member

Thanks for the insights @ben-manes. FWIW I have been using Caffeine in commercial projects for years, the only question here is whether this particular use case is worth adding a relatively large (in bytes) dependency.

It's not a problem. I co-authored Guava's so either way my code is used. 😄

@Ferrari6 is correct in what your preference and default should be, though. Prior to my involvement, the Guava team bet on reference caching (MapMaker) and spent their complexity budget by forking the hash table as an optimization. This was a mistake because soft references can cause GC death spirals of stop the world events and unpredictable evictions, but looked fine in a naive benchmark. This blew their complexity budget, so porting size eviction from CLHM favored simplicity over performance.

The longer-term problem you'll face is that no one maintains CacheBuilder. The last big change was adding Map.compute, but it was riddled with major bugs and done inefficiently. I fixed some of those problems, but there are show stoppers. If you keep to the historic functionality then Guava's can be made acceptable in most cases. Caffeine does have jar bloat by code generating per-configuration entry classes to minimize the memory footprint. In those cases where disk is at a premium some projects embed CLHM into their code base, e.g. msjdbc and groovy. You'll probably have many caches throughout Pinot, making the Caffeine dependency worthwhile even if you can get by on a case-by-case basis.

@snleee
Contributor

snleee commented Sep 15, 2021

@Ferrari6 Can you rebase onto the master branch and retrigger the GitHub Actions tests? We recently introduced flakiness into one of our integration tests and fixed it in #7432.

@Ferrari6
Contributor Author

Ferrari6 commented Sep 15, 2021

@Ferrari6 Can you rebase onto the master branch and retrigger the GitHub Actions tests? We recently introduced flakiness into one of our integration tests and fixed it in #7432.

Yes, sure. But Jayway's CacheProvider.setCache can only be called once, and once getCache has been called, setCache cannot be called again. So I need to make some code changes.

@Ferrari6
Contributor Author

Both JsonFunctions and JsonExtractScalarTransformFunction use JsonPath, so Jayway's CacheProvider needs to be set before either of them is initialized, and it can only be set once. At the same time, I am also running some performance tests on Guava Cache; I will submit some more code soon.
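The set-once constraint described above can be pictured with a small sketch. This is not Jayway's CacheProvider source; `SetOnceProvider`, its methods, and the exception type are hypothetical and only mimic the behaviour reported in this thread: the value can be set a single time, and never after it has been read.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Sketch of a set-once holder (hypothetical class): set() fails after a prior set() or get(). */
final class SetOnceProvider<T> {
  private final AtomicReference<T> _ref = new AtomicReference<>();
  private final T _defaultValue;

  SetOnceProvider(T defaultValue) {
    _defaultValue = defaultValue;
  }

  /** Installs a custom value; allowed only while nothing has been set or read. */
  void set(T value) {
    if (!_ref.compareAndSet(null, value)) {
      throw new IllegalStateException("Value already set or accessed");
    }
  }

  /** Returns the current value, locking in the default on first access. */
  T get() {
    _ref.compareAndSet(null, _defaultValue);
    return _ref.get();
  }

  public static void main(String[] args) {
    SetOnceProvider<String> provider = new SetOnceProvider<>("default-lru-cache");
    provider.set("custom-cache");       // succeeds: nothing set or read yet
    System.out.println(provider.get()); // prints custom-cache
    try {
      provider.set("another-cache");    // fails: a value is already installed
    } catch (IllegalStateException e) {
      System.out.println("second set rejected");
    }
  }
}
```

Under this kind of contract, whichever class touches the provider first decides which cache is used, which is why the custom cache has to be installed before any class that compiles JSON paths is initialized.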

@mayankshriv
Contributor

Some tests failed, rerunning to see if intermittent issue due to timeouts.

@Ferrari6
Contributor Author

Some tests failed, rerunning to see if intermittent issue due to timeouts.

Thanks @mayankshriv. I have fixed the failed tests.

@mayankshriv
Contributor

Some tests failed, rerunning to see if intermittent issue due to timeouts.

Thanks @mayankshriv. I have fixed the failed tests.

@Ferrari6 seems like there are still failing tests.

@Jackie-Jiang Jackie-Jiang force-pushed the jsonpath-cache-improve branch 3 times, most recently from 740572f to e78ac2c Compare November 2, 2021 19:32
Addressed comments to improve unit test

Addressed review comments
- Remove Constants class and configure inlined into JsonPathMapCache
- Remove useless ConcurrentHashMap parameters

Addressed review comments

Co-authored-by: Xiaotian (Jackie) Jiang <17555551+Jackie-Jiang@users.noreply.github.com>

Change cache maximum size to reduce memory usage

use Guava cache for json path

add json path cache integration test
Fix the issue of setting different default configurations
Do not access the JsonPath cache in the transform function
@Jackie-Jiang Jackie-Jiang merged commit 14c377d into apache:master Nov 3, 2021
richardstartin added a commit to richardstartin/pinot that referenced this pull request Nov 3, 2021
walterddr pushed a commit to walterddr/pinot that referenced this pull request Nov 3, 2021
kriti-sc pushed a commit to kriti-sc/incubator-pinot that referenced this pull request Dec 12, 2021
Also fix the issue of setting different default configurations
Development

Successfully merging this pull request may close these issues.

JSONPath cache is inefficient
7 participants