
Flatten atomic arrays, add tests for LRUMap size #3675

Merged
merged 10 commits into FasterXML:2.14 on Nov 20, 2022

Conversation

yawkat
Member

@yawkat yawkat commented Nov 18, 2022

See #3665

  • Flatten the readBuffers array to be one-dimensional; this should have no performance impact (and if it does, probably a positive one)
  • Move to Atomic*Array to reduce footprint (similar to WIP: Lower PrivateMaxEntriesMap readBuffer size #3670, but also for the long arrays)
  • Reduce NUMBER_OF_READ_BUFFERS and READ_BUFFER_THRESHOLD to match the values in WIP: Lower PrivateMaxEntriesMap readBuffer size #3670; this is required so that the test doesn't rely on the number of cores of the test runner
  • Add a test based on jol to verify that the ObjectMapper doesn't have too large a footprint
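The flattening in the first bullet can be sketched roughly as follows. This is an illustrative sketch only: the class name, method names, and sizes here are hypothetical and do not match the actual PrivateMaxEntriesMap code. The idea is to replace a two-dimensional array-of-arrays with a single atomic array and a computed index, which drops one array object (header plus padding plus reference) per row.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch; constants and names are illustrative, not the real ones.
final class FlatReadBuffers<E> {
    static final int NUMBER_OF_READ_BUFFERS = 4;
    static final int READ_BUFFER_SIZE = 16;
    static final int READ_BUFFER_INDEX_MASK = READ_BUFFER_SIZE - 1;

    // One flat atomic array instead of a [NUMBER_OF_READ_BUFFERS][READ_BUFFER_SIZE]
    // array-of-arrays: fewer objects, fewer headers, better locality.
    final AtomicReferenceArray<E> readBuffers =
            new AtomicReferenceArray<>(NUMBER_OF_READ_BUFFERS * READ_BUFFER_SIZE);

    // Map the old two-dimensional coordinates onto the flat array.
    static int index(int bufferIndex, int slot) {
        return (bufferIndex * READ_BUFFER_SIZE) + (slot & READ_BUFFER_INDEX_MASK);
    }

    void set(int bufferIndex, int slot, E value) {
        readBuffers.lazySet(index(bufferIndex, slot), value);
    }

    E get(int bufferIndex, int slot) {
        return readBuffers.get(index(bufferIndex, slot));
    }
}
```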

Using the JOL test, I compared the footprints of various versions of this code. My machine has 24 cores, which influences some of the results.

Variant             Footprint (bytes)
2.13 (target)       4376
2.14                259240
+ flat ARefArray    60760
+ ALongArray        57016
+ READ_BUFFERS = 4  11992
+ THRESHOLD = 4     6616

The measurements before READ_BUFFERS = 4 depend on the CPU count. As you can see, with all the changes in this PR, we are barely above the memory use of 2.13.

@yawkat
Member Author

yawkat commented Nov 18, 2022

Had to change the JOL measurement code so that the full test suite runs. This also changed the numbers a bit; I've adjusted the original PR comment.

@pjfanning
Member

@yawkat could you rebase this? There was a merge to 2.14 that removed some unnecessary methods from PrivateMaxEntriesMap.

@yawkat
Member Author

yawkat commented Nov 19, 2022

will do

@yawkat
Member Author

yawkat commented Nov 19, 2022

@pjfanning you mean 6eacc7b? it's already in.

@ben-manes

ben-manes commented Nov 19, 2022

Some very minor field removals you can also make:

  • concurrencyLevel was retained for recreating the map after serialization; it can be removed as it is no longer relevant for CHM
  • weightedSize might be removed in favor of Map.size(), since there is no weigher. It should already handle races (too large, but nothing in the evictionDeque).
  • capacity could be a final int, as it is no longer modified at runtime (tiny caches)
  • keySet / values / entrySet could be removed and new instances returned
  • AbstractMap has private unused fields; could inline the useful base methods: equals, hashCode, toString
  • AtomicLongArray could be AtomicIntegerArray since it just wraps around
  • LinkedDeque was to pull out an abstraction from a large, intertwined, complex class. Mostly it is boilerplate for the Deque interface, with only a few methods used for the doubly-linked list manipulation. Inlining wouldn't save much, so it's a toss-up.
  • WeightedValue might be replaced with a nullable value. If non-null then "alive", else "retired". This removes the wrapper for the weight field.
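The AtomicLongArray → AtomicIntegerArray point works because the read/write counters are only ever compared by difference, which stays correct under two's-complement wraparound. A minimal sketch of that property, with hypothetical names rather than the actual field layout:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: per-buffer counters that are only ever compared by
// subtraction, so int overflow is harmless.
final class BufferCounters {
    final AtomicIntegerArray readCount = new AtomicIntegerArray(4);
    final AtomicIntegerArray writeCount = new AtomicIntegerArray(4);

    int pending(int bufferIndex) {
        // Correct even after either counter wraps past Integer.MAX_VALUE,
        // as long as the true difference itself fits in an int.
        return writeCount.get(bufferIndex) - readCount.get(bufferIndex);
    }
}
```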


public class MapperFootprintTest {
    @Test
    @Ignore
Member

Probably remove this?

Member Author

see comment below

Member

Ah. Another thing we could do to avoid running this as part of the test suite (but still allow a manual run) would be to move it under failing. But we can figure out the best way later on.

@cowtowncoder
Member

Sounds great so far! Just LMK when you feel this is ready @yawkat and I'll merge it in.
And then I can start thinking about a 2.14.1 release, since I think this warrants a patch release, even if it's quite a bit of work and with fewer fixes than usual for a .1.

I wonder if we could/should start using jol for testing other data structures as well; probably only if/when we have specific cases. For 3.0 we could perhaps test JsonMapper for total size, with strict limits, to see how its size increases.
Actually we could even do this for 2.x for the same reason: add a limit as the size grows, but have some idea of the cumulative size (of a newly created "empty" mapper).
But that'd be for different PRs, just an idea.

@yawkat
Member Author

yawkat commented Nov 20, 2022

oh this is annoying – I thought just the .subtract fixed it, but apparently I just forgot to un-ignore the test so CI passed...

JOL goes through all objects transitively referenced by a given root (in this case ObjectMapper) and counts them up. For Jackson, much of this is reflection instances, e.g. Class<?> and Method. The problem when running this in the test suite is that all the other tests populate lazy fields in these reflection instances, so this naive count will show more memory used by a new ObjectMapper after all other tests have run than the same count shows before the tests are run.

The solution to this is to only count the new objects each new ObjectMapper references. That way, the core reflection instances are not counted. This is why I use the GraphLayout.subtract method in my test: I create two ObjectMappers, and only count the memory footprint of the objects that they don't share.

Unfortunately the test suite still fails when using subtract. My guess is that this is caused by GC: GC can move objects in memory, which changes their memory address, which breaks the deduplication logic of subtract. The javadoc of subtract says to "quiesce the heap" to avoid this – I thought I could get away without doing that, but apparently I can't. I added a simple gc loop before the test. This was enough to get the full test suite working locally, but CI seems to be more memory-constrained and fails easily.

To be clear, the test works well when run in isolation, and the footprint results I listed in the PR text are good. I'd hoped to get the test running in CI too, but it seems to be too flaky for that atm. I have disabled it with @Ignore again – maybe someone else can figure out a solution.

@yawkat yawkat marked this pull request as ready for review November 20, 2022 06:53
@ben-manes

I use GcFinalization.awaitFullGc() from guava's testlib for anything that depends on a GC actually having run (e.g. weak/soft references). I use Parallel GC for those tests, since region-based collectors like G1 are more incremental. That might help stabilize things here.
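For reference, a rough stdlib-only sketch of the property awaitFullGc() waits for (this is a hedged toy version, not guava's implementation): loop until a sentinel weak reference is cleared, which signals that a collection has actually processed it.

```java
import java.lang.ref.WeakReference;

final class GcHelper {
    // Returns true once a sentinel weak reference has been cleared, i.e. at
    // least one GC cycle has run; gives up after a bounded number of attempts.
    static boolean awaitGc() {
        WeakReference<Object> sentinel = new WeakReference<>(new Object());
        for (int i = 0; i < 100 && sentinel.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return sentinel.get() == null;
    }
}
```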

@ben-manes

Otherwise, for a rougher estimate you can try jamm, which won't be impacted by GC moves since its dedupe works at a higher level than address checks.

@yawkat
Member Author

yawkat commented Nov 20, 2022

@ben-manes awaitFullGc is better than the loop, and guava is already in the pom, thanks. However, it does not seem to be enough; the fact that even running the analysis twice does not help suggests that GC is triggered even by two calls to new ObjectMapper().

I hadn't heard of jamm, but a quick look at the API does not show a way to run the equivalent of the subtract call in JOL? It is needed in order to exclude objects that are global, such as reflection instances.

(btw re your other optimization suggestions, they seem reasonable, but I will let someone who is actually familiar with the LRUMap impl do them in a future PR)

@ben-manes

The default GC is probably G1, so it will only collect a single region. Also, if your test suite is parallelized then other tests might cause some mayhem if you are relying on stable physical addresses. That would require isolated testing; this is more finicky than normal cases.

I think that MemoryMeter.measureDeep(Object) could give you close enough to what you want.

@yawkat
Member Author

yawkat commented Nov 20, 2022

Yea, I don't know if the tests are parallel; that would be an added complication.

I haven't tried it, but the problem I see with measureDeep is one of consistency depending on test order, instead of consistency depending on GC. I saw this with the initial test that didn't use subtract: when the test runs early in the suite (e.g. when it runs in isolation), the memory size is fairly small, because classes like Method are populated lazily. After the other tests run, these Method objects have been populated, so the same count gives a bigger estimate. The subtract call fixes this problem by removing any objects that are shared between ObjectMapper instances, such as these lazily populated fields (or rather the Method instances that contain them).

If I understand correctly what jamm's measureDeep does, I think JOL can actually do the same thing, simply by not using subtract. JOL's GraphLayout.parseInstance does not have the GC problems; it uses an IdentityHashSet for deduplication. It's only the subtract call that is problematic.
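The identity-versus-address distinction can be illustrated in plain Java. This is toy code, not JOL's actual implementation: deduplicating by object identity via IdentityHashMap is stable even if the GC moves objects, because identity comparison (==) does not depend on a snapshot of memory addresses.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy illustration of address-independent deduplication (not JOL's code).
final class IdentityDedup {
    // Objects reachable from graph A but not from graph B, compared by
    // identity (==) rather than by equals() or by raw memory address.
    static Set<Object> privateTo(Set<Object> graphA, Set<Object> graphB) {
        Set<Object> result = Collections.newSetFromMap(new IdentityHashMap<>());
        result.addAll(graphA);
        result.removeAll(graphB); // drops only the identically-same objects
        return result;
    }

    static Set<Object> identitySet(Object... objects) {
        Set<Object> s = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(s, objects);
        return s;
    }
}
```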

@ben-manes

Thanks, that makes more sense to me now. It sounds like this should not be run as part of the normal test suite due to global pollution. What about creating a new test category and running this exclusively and outside of the normal suite?

@yawkat
Member Author

yawkat commented Nov 20, 2022

Yea, that sounds reasonable. I don't want to fiddle with the build too much in this PR, but we can do it after the fix is in.

Another solution could be changing jol to use object identity instead of addresses for the subtract call. I've reached out to shipilev about this.

@cowtowncoder
Member

Ok yeah, while CI would of course be really good to have, manually run verification is enough for this PR.
So @yawkat it sounds like you think this is ready for initial merging? If so, I'll go ahead and merge it.

Follow-up steps then could include:

  1. Add (some or all of) suggestions by @ben-manes and have a look how they behave
  2. Add manually run task (or cli command / script) for running jol to inspect various sizes to help figure out effect of changes locally

But even without these I think we'd be ready for a minimal set for 2.14.1, and could consider whether there are other critical fixes/improvements to add before proceeding.

Thank you @yawkat @ben-manes and @pjfanning for all your work here, I very much appreciate your expertise and help.

@yawkat
Member Author

yawkat commented Nov 20, 2022

@cowtowncoder correct, the PR is ready to merge, and agree on the follow up steps. I can take a look at the testing situation tomorrow, most likely using @ben-manes' idea of a different test group. Then it can run in CI too.

@cowtowncoder cowtowncoder merged commit 5ab1d3e into FasterXML:2.14 Nov 20, 2022
@cowtowncoder cowtowncoder added this to the 2.14.1 milestone Nov 20, 2022
@pjfanning
Member

It's safe to proceed with this IMHO.

The new map is slower than the v2.14.0 map due to more contention on the smaller readBuffer. Nothing drastic. And for ObjectMapper, there are plenty of other places where CPU time is spent, so all in all I doubt that the small slowdown in the PrivateMaxEntriesMap will have any real effect.

Benchmark                                  (cacheType)   Mode  Cnt         Score         Error  Units
ConcurrentBench.read_only                Jackson214Map  thrpt   25  15810558.870 ± 2112646.327  ops/s
ConcurrentBench.read_only                       NewMap  thrpt   25  12438884.722 ± 2064042.671  ops/s
ConcurrentBench.readwrite                Jackson214Map  thrpt   25  13503648.956 ±  647087.994  ops/s
ConcurrentBench.readwrite:readwrite_get  Jackson214Map  thrpt   25  11261181.499 ±  616862.111  ops/s
ConcurrentBench.readwrite:readwrite_put  Jackson214Map  thrpt   25   2242467.457 ±  108543.968  ops/s
ConcurrentBench.readwrite                       NewMap  thrpt   25  10146482.677 ±  748278.055  ops/s
ConcurrentBench.readwrite:readwrite_get         NewMap  thrpt   25   8246303.960 ±  680316.563  ops/s
ConcurrentBench.readwrite:readwrite_put         NewMap  thrpt   25   1900178.717 ±   70658.801  ops/s
ConcurrentBench.write_only               Jackson214Map  thrpt   25   8198165.057 ±  210140.537  ops/s
ConcurrentBench.write_only                      NewMap  thrpt   25   7123823.312 ±  225330.816  ops/s

https://github.com/pjfanning/jackson-lru-cache-bench

@pjfanning
Member

I'm not sure that the extra changes suggested for PrivateMaxEntriesMap will have much effect.

@cowtowncoder
Member

cowtowncoder commented Nov 21, 2022

Ok. Thank you for getting these numbers @pjfanning !

Somewhat disappointing to note no performance improvement for the test cases, although I realize the main goal was to avoid that one-off cost when the cache fills up. But I agree that either way, the steady-state effect would probably not be measurable for typical usage, since the cache lookup rate is not that high. We'll go with this.
