
Flatten atomic arrays, add tests for LRUMap size #3675

Merged
merged 10 commits into FasterXML:2.14 on Nov 20, 2022

Conversation

yawkat
Member

@yawkat yawkat commented Nov 18, 2022

See #3665

  • Flatten the readBuffers array to be one-dimensional; this should have no performance impact (and if it does, probably a positive one)
  • Move to Atomic*Array to reduce footprint (similar to WIP: Lower PrivateMaxEntriesMap readBuffer size #3670, but also for the long arrays)
  • Reduce NUMBER_OF_READ_BUFFERS and READ_BUFFER_THRESHOLD to match the values in WIP: Lower PrivateMaxEntriesMap readBuffer size #3670; this is required so that the test doesn't rely on the number of cores of the test runner
  • Add a test based on jol to verify that the ObjectMapper doesn't have too large a footprint
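The flattening in the first bullet can be sketched roughly as follows. This is an illustrative sketch only: the class name, method names, and sizes here are hypothetical and do not match the actual PrivateMaxEntriesMap code. The idea is to replace a two-dimensional array-of-arrays with a single atomic array and a computed index, which drops one array object (header plus padding plus reference) per row.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch; constants and names are illustrative, not the real ones.
final class FlatReadBuffers<E> {
    static final int NUMBER_OF_READ_BUFFERS = 4;
    static final int READ_BUFFER_SIZE = 16;
    static final int READ_BUFFER_INDEX_MASK = READ_BUFFER_SIZE - 1;

    // One flat atomic array instead of a [NUMBER_OF_READ_BUFFERS][READ_BUFFER_SIZE]
    // array-of-arrays: fewer objects, fewer headers, better locality.
    final AtomicReferenceArray<E> readBuffers =
            new AtomicReferenceArray<>(NUMBER_OF_READ_BUFFERS * READ_BUFFER_SIZE);

    // Map the old two-dimensional coordinates onto the flat array.
    static int index(int bufferIndex, int slot) {
        return (bufferIndex * READ_BUFFER_SIZE) + (slot & READ_BUFFER_INDEX_MASK);
    }

    void set(int bufferIndex, int slot, E value) {
        readBuffers.lazySet(index(bufferIndex, slot), value);
    }

    E get(int bufferIndex, int slot) {
        return readBuffers.get(index(bufferIndex, slot));
    }
}
```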

Using the JOL test, I compared the footprints of various versions of this code. My machine has 24 cores, which influences some of the results.

Variant             Footprint (bytes)
2.13 (target)       4376
2.14                259240
+ flat ARefArray    60760
+ ALongArray        57016
+ READ_BUFFERS = 4  11992
+ THRESHOLD = 4     6616

The measurements before READ_BUFFERS = 4 depend on the CPU count. As you can see, with all the changes in this PR, we are barely above the memory use of 2.13.

@yawkat
Member Author

yawkat commented Nov 18, 2022

Had to change the JOL measurement code so that the full test suite runs. This also changed the numbers a bit; I've adjusted the original PR comment.

@pjfanning
Member

@yawkat could you rebase this? There was a merge to 2.14 that removed some unnecessary methods from PrivateMaxEntriesMap.

@yawkat
Member Author

yawkat commented Nov 19, 2022

will do

@yawkat
Member Author

yawkat commented Nov 19, 2022

@pjfanning you mean 6eacc7b? it's already in.

@ben-manes

ben-manes commented Nov 19, 2022

Some very minor field removals you can also make:

  • concurrencyLevel was retained for recreating the map after serialization; it can be removed as it is no longer relevant for CHM
  • weightedSize might be removed in favor of Map.size(), since there is no weigher. It should already handle races (too large, but nothing in the evictionDeque).
  • capacity could be a final int, as it is no longer modified at runtime (tiny caches)
  • keySet / values / entrySet could be removed and new instances returned
  • AbstractMap has private unused fields; could inline the useful base methods: equals, hashCode, toString
  • AtomicLongArray could be AtomicIntegerArray since it just wraps around
  • LinkedDeque was to pull out an abstraction from a large, intertwined, complex class. Mostly it is boilerplate for the Deque interface, with only a few methods used for the doubly-linked list manipulation. Inlining wouldn't save much, so it's a toss-up.
  • WeightedValue might be replaced with a nullable value. If non-null then "alive", else "retired". This removes the wrapper for the weight field.
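The AtomicLongArray → AtomicIntegerArray point works because the read/write counters are only ever compared by difference, which stays correct under two's-complement wraparound. A minimal sketch of that property, with hypothetical names rather than the actual field layout:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: per-buffer counters that are only ever compared by
// subtraction, so int overflow is harmless.
final class BufferCounters {
    final AtomicIntegerArray readCount = new AtomicIntegerArray(4);
    final AtomicIntegerArray writeCount = new AtomicIntegerArray(4);

    int pending(int bufferIndex) {
        // Correct even after either counter wraps past Integer.MAX_VALUE,
        // as long as the true difference itself fits in an int.
        return writeCount.get(bufferIndex) - readCount.get(bufferIndex);
    }
}
```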


public class MapperFootprintTest {
    @Test
    @Ignore
Member

Probably remove this?

Member Author

see comment below

Member

Ah. Another thing we could do to avoid running this as part of the test suite (but still allow a manual run) would be to move it under failing. But we can figure out the best way later on.

@cowtowncoder
Member

Sounds great so far! Just LMK when you feel this is ready @yawkat and I'll merge it in.
And then I can start thinking about a 2.14.1 release, since I think this warrants a patch release, even if it's quite a bit of work and with fewer fixes than usual for a .1.

I wonder if we could/should start using jol for testing other data structures as well; probably only if/when we have specific cases. For 3.0 we could perhaps test JsonMapper for total size, with strict limits, to see how its size increases.
Actually we could even do this for 2.x for the same reason: add a limit as the size grows, but have some idea of the cumulative size (of a newly created "empty" mapper).
But that'd be for different PRs, just an idea.

@yawkat
Member Author

yawkat commented Nov 20, 2022

oh this is annoying – I thought just the .subtract fixed it, but apparently I just forgot to un-ignore the test so CI passed...

JOL goes through all objects transitively referenced by a given root (in this case ObjectMapper) and counts them up. For Jackson, much of this is reflection instances, e.g. Class<?> and Method. The problem when running this in the test suite is that all the other tests populate lazy fields in these reflection instances, so this naive count will show more memory used by a new ObjectMapper after all other tests have run than the same count shows before the tests are run.

The solution to this is to only count the new objects each new ObjectMapper references. That way, the core reflection instances are not counted. This is why I use the GraphLayout.subtract method in my test: I create two ObjectMappers, and only count the memory footprint of the objects that they don't share.

Unfortunately the test suite still fails when using subtract. My guess is that this is caused by GC: GC can move objects in memory, which changes their memory address, which breaks the deduplication logic of subtract. The javadoc of subtract says to "quiesce the heap" to avoid this – I thought I could get away without doing that, but apparently I can't. I added a simple gc loop before the test. This was enough to get the full test suite working locally, but CI seems to be more memory-constrained and fails easily.

To be clear, the test works well when run in isolation, and the footprint results I listed in the PR text are good. I'd hoped to get the test running in CI too, but it seems to be too flaky for that atm. I have disabled it with @Ignore again – maybe someone else can figure out a solution.

@yawkat yawkat marked this pull request as ready for review November 20, 2022 06:53
@ben-manes

I use GcFinalization.awaitFullGc() from guava's testlib for anything that depends on a GC actually having run (e.g. weak/soft references). I use Parallel GC for those tests, since region-based collectors like G1 are more incremental. That might help stabilize things here.
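For reference, a rough stdlib-only sketch of the property awaitFullGc() waits for (this is a hedged toy version, not guava's implementation): loop until a sentinel weak reference is cleared, which signals that a collection has actually processed it.

```java
import java.lang.ref.WeakReference;

final class GcHelper {
    // Returns true once a sentinel weak reference has been cleared, i.e. at
    // least one GC cycle has run; gives up after a bounded number of attempts.
    static boolean awaitGc() {
        WeakReference<Object> sentinel = new WeakReference<>(new Object());
        for (int i = 0; i < 100 && sentinel.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return sentinel.get() == null;
    }
}
```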

@ben-manes

Otherwise, for a rougher estimate you can try jamm, which won't be impacted by GC moves since its dedupe works at a higher level than address checks.

@yawkat
Member Author

yawkat commented Nov 20, 2022

@ben-manes awaitFullGc is better than the loop, and guava is already in the pom, thanks. However, it does not seem to be enough; the fact that even running the analysis twice does not help suggests that GC is triggered even by two calls to new ObjectMapper().

I hadn't heard of jamm, but a quick look at the API does not show a way to run the equivalent of the subtract call in JOL? It is needed in order to exclude objects that are global, such as reflection instances.

(btw re your other optimization suggestions, they seem reasonable, but I will let someone who is actually familiar with the LRUMap impl do them in a future PR)

@ben-manes

The default GC is probably G1, so it will only collect a single region. Also, if your test suite is parallelized then other tests might cause some mayhem if you are relying on stable physical addresses. That would require isolated testing; this is more finicky than normal cases.

I think that MemoryMeter.measureDeep(Object) could give you close enough to what you want.

@yawkat
Member Author

yawkat commented Nov 20, 2022

Yea, I don't know if the tests are parallel; that would be an added complication.

I haven't tried it, but the problem I see with measureDeep is one of consistency depending on test order, instead of consistency depending on GC. I saw this with the initial test that didn't use subtract: when the test runs early in the suite (e.g. when it runs in isolation), the memory size is fairly small, because classes like Method are populated lazily. After the other tests run, these Method objects have been populated, so the same count gives a bigger estimate. The subtract call fixes this problem by removing any objects that are shared between ObjectMapper instances, such as these lazily populated fields (or rather the Method instances that contain them).

If I understand correctly what jamm's measureDeep does, I think JOL can actually do the same thing, simply by not using subtract. JOL's GraphLayout.parseInstance does not have the GC problems; it uses an IdentityHashSet for deduplication. It's only the subtract call that is problematic.
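The identity-versus-address distinction can be illustrated in plain Java. This is toy code, not JOL's actual implementation: deduplicating by object identity via IdentityHashMap is stable even if the GC moves objects, because identity comparison (==) does not depend on a snapshot of memory addresses.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy illustration of address-independent deduplication (not JOL's code).
final class IdentityDedup {
    // Objects reachable from graph A but not from graph B, compared by
    // identity (==) rather than by equals() or by raw memory address.
    static Set<Object> privateTo(Set<Object> graphA, Set<Object> graphB) {
        Set<Object> result = Collections.newSetFromMap(new IdentityHashMap<>());
        result.addAll(graphA);
        result.removeAll(graphB); // drops only the identically-same objects
        return result;
    }

    static Set<Object> identitySet(Object... objects) {
        Set<Object> s = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(s, objects);
        return s;
    }
}
```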

@ben-manes

Thanks, that makes more sense to me now. It sounds like this should not be run as part of the normal test suite due to global pollution. What about creating a new test category and running this exclusively and outside of the normal suite?

@yawkat
Member Author

yawkat commented Nov 20, 2022

Yea, that sounds reasonable. I don't want to fiddle with the build too much in this PR, but we can do it after the fix is in.

Another solution could be changing jol to use object identity instead of addresses for the subtract call. I've reached out to shipilev about this.

@cowtowncoder
Member

Ok yeah, while CI would of course be really good to have, manually run verification is enough for this PR.
So @yawkat it sounds like you think this is ready for initial merging? If so, I'll go ahead and merge it.

Follow-up steps then could include:

  1. Add (some or all of) suggestions by @ben-manes and have a look how they behave
  2. Add manually run task (or cli command / script) for running jol to inspect various sizes to help figure out effect of changes locally

But even without these I think we'd be ready for a minimal set for 2.14.1, and could consider whether there are other critical fixes/improvements to add before proceeding.

Thank you @yawkat @ben-manes and @pjfanning for all your work here, I very much appreciate your expertise and help.

@yawkat
Member Author

yawkat commented Nov 20, 2022

@cowtowncoder correct, the PR is ready to merge, and agree on the follow up steps. I can take a look at the testing situation tomorrow, most likely using @ben-manes' idea of a different test group. Then it can run in CI too.

@cowtowncoder cowtowncoder merged commit 5ab1d3e into FasterXML:2.14 Nov 20, 2022
@cowtowncoder cowtowncoder added this to the 2.14.1 milestone Nov 20, 2022
@pjfanning
Member

It's safe to proceed with this IMHO.

The new map is slower than the v2.14.0 map due to more contention on the smaller readBuffer. Nothing drastic. And for ObjectMapper, there are plenty of other places where CPU time is spent, so all in all I doubt that the small slowdown in the PrivateMaxEntriesMap will have any real effect.

Benchmark                                  (cacheType)   Mode  Cnt         Score         Error  Units
ConcurrentBench.read_only                Jackson214Map  thrpt   25  15810558.870 ± 2112646.327  ops/s
ConcurrentBench.read_only                       NewMap  thrpt   25  12438884.722 ± 2064042.671  ops/s
ConcurrentBench.readwrite                Jackson214Map  thrpt   25  13503648.956 ±  647087.994  ops/s
ConcurrentBench.readwrite:readwrite_get  Jackson214Map  thrpt   25  11261181.499 ±  616862.111  ops/s
ConcurrentBench.readwrite:readwrite_put  Jackson214Map  thrpt   25   2242467.457 ±  108543.968  ops/s
ConcurrentBench.readwrite                       NewMap  thrpt   25  10146482.677 ±  748278.055  ops/s
ConcurrentBench.readwrite:readwrite_get         NewMap  thrpt   25   8246303.960 ±  680316.563  ops/s
ConcurrentBench.readwrite:readwrite_put         NewMap  thrpt   25   1900178.717 ±   70658.801  ops/s
ConcurrentBench.write_only               Jackson214Map  thrpt   25   8198165.057 ±  210140.537  ops/s
ConcurrentBench.write_only                      NewMap  thrpt   25   7123823.312 ±  225330.816  ops/s

https://github.com/pjfanning/jackson-lru-cache-bench

@pjfanning
Member

I'm not sure that the extra changes suggested for PrivateMaxEntriesMap will have much effect.

@cowtowncoder
Member

cowtowncoder commented Nov 21, 2022

Ok. Thank you for getting these numbers @pjfanning !

Somewhat disappointing to note no performance improvement for the test cases, although I realize the main goal was to avoid that one-off cost when the cache fills up. But I agree that either way, the steady-state effect would probably not be measurable for typical usage, since the cache lookup rate is not that high. We'll go with this.
