Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mechanism for forcing BufferRecycler released (to call on shutdown) #400

Closed
jborgers opened this issue Sep 14, 2017 · 39 comments
Closed
Milestone

Comments

@jborgers
Copy link

We see a class loader memory leak by using Jackson: on redeployment of our application in WebSphere we see an increase of heap usage of a couple of hundred MB's en after several redeployments heap usage becomes close to the heap size and garbage collection takes a lot of CPU. Note that most extra heap is taken by the BufferRecyclers retaining char[][]'s.

SoftReferences may help to prevent out of memory errors, it doesn't help for gc overhead (including long compaction pauses.)
In addition, the BufferRecycler classes of previous deployed versions of the app are still in the ThreadLocals of all threads of the threadpool and prevent the classloader with all its classes to be unloaded.
See here: https://stackoverflow.com/questions/17968803/threadlocal-memory-leak for more on classloader leaks.

We would like Jackson to release/remove the BufferRecyclers from the ThreadLocals on shutdown, by calling a shutdown method on e.g. JsonFactory.
See also: http://java.jiderhamn.se/2012/02/26/classloader-leaks-v-common-mistakes-and-known-offenders/

@jborgers
Copy link
Author

See #67 and #189

@jborgers
Copy link
Author

jborgers commented Sep 14, 2017

From #67 posted by @cowtowncoder:
I can see potential issues with hot reloading, although TBH I didn't think anyone would really want to use those in production systems these days (but rather just during development).
If there are convenient hooks for clearing state (and if ThreadLocal allows purging across Threads -- I know that under the hood there are probably means, but JsonFactory does not keep track of anything by itself) adding those would be an improvement.

Thank you for the links that might be helpful here.

One thought on BufferRecycler usage: while bit ugly, it would probably be possible to replace its use with old-skool combination of recycler "class", for static utility methods, but passing Object[] that contains buffers to be recycled. These would need to be cast, but doing this would remove any Jackson provided classes -- buffers are just byte[]s and char[]s after all.
If this is the only class remaining it could help remove ClassLoader reference itself I assume.

@jborgers
Copy link
Author

Thanks for your reply.
We actually do hot re-deployments in production.
I can imagine a solution with WeakReferences in stead of SoftReferences in ThreadLocal, to BufferRecycler, and in addition a (strong) reference from e.g. JsonFactory to all BufferRecycler objects, which can be dereferenced/released by invoking a shutDown method. On application stop/undeploy event we could then invoke this JsonFactory.shutDown method.

@jborgers
Copy link
Author

In your thought, if you don't put any objects of jackson classes on the ThreadLocal, this would solve the classloading leak. However, if you reference a BufferRecycler by an Object reference (or in array Object[]) the objects itself still reference BufferRecycler class and thus the classloader. Therefore, it doesn't solve the issue.

@cowtowncoder
Copy link
Member

@jborgers Use of ThreadLocal is a core feature and rather important for performance. It works really nicely, allows memory to be cleared via GC (with soft references), and I don't have much interest in implementing more complicated buffer reuse scheme.
I am not 100% sure I understand reference wrt weak vs soft references in this context. My recollection was that downside of soft references was that they really disappear very quickly and do not work as well for caches; but that they are indeed cleared when memory pressure so dictates.
So to me this choice does not seem like it would affect retention of ClassLoaders via class.

But as to BufferRecycler, what I meant was that it would be perfectly possible to only recycle an Object[] that contains actual buffers, and either make BufferRecycler only consist of static methods, or just create wrapper from Object[]. This way reference from ThreadLocal would only be to JDK classes (Object[] and char[] / byte[] buffers retained).
It would be slightly ugly but nothing too bad, and if still using recycler instance seems like it could actually be hidden as implementation detail.

@jborgers
Copy link
Author

@cowtowncoder I agree that use of ThreadLocal is a nice feature for performance. Only I have an issue with SoftReferences. They prevent out-of-memory however, cause high heap usage and high gc overhead.

As I understand it, having no BufferRecycler instances, so char[][] and byte[][] directly in Object[] in ThreadLocal will indeed solve the classloader leak: there is no reference from the system classloader to a class loaded by the classloader of the undeployed application, so that classloader and its classes can be gc-ed.

However, since each application creates its own ThreadLocal which ends up in a Map in Thread, the buffers created for the undeployed applications are still there. This is most of the added heap usage by the undeployed application.

In my proposed solution, WeakReferences are used instead of SoftReferences, so if no other (stronger) reference references the BufferRecycler, it will be gc-ed. We keep a reference from JsonFactory to all BufferRecycler's in order to be able to release them (dereference) in a shutdown method. I think this list of references should be SoftReferences, to keep the same behavior in case of memory pressure, when not using the shutdown.

@jborgers
Copy link
Author

@cowtowncoder I cloned the source, added and changed a few lines of code in JsonFactory as prototype implementation of my proposed solution. Would that be helpful? Should I create a fork?

@cowtowncoder
Copy link
Member

Sure, fork is usually a good way, even if just reproducing a problem or showing approach.

@jborgers
Copy link
Author

Right, we are going to test this forked version in a load test and do heap analysis to see if it shows the desired effect when doing re-deployments: no more classloader leak and releasing of the unneeded BufferRecyclers in ThreadLocals, resulting in lower, stable heap usage.

@cowtowncoder
Copy link
Member

cowtowncoder commented Sep 20, 2017

@jborgers That would be great. I think there may be one or two other places (jackson-dataformat-smile has additional recycler, and JsonStringEncoder with this project too. However, eliminating latter should be easy if you can verify that approach works (i.e. if you can see what are the references blocking GC of ClassLoaders). It should also be possible to locally hack JsonStringEncoder to elminiate recycling for testing; and if you don't use smile format it's not relevant yet.

I can go ahead and check other format modules as well; these are the only components with buffer recycling within Jackson components.

@jborgers
Copy link
Author

My colleague had good results with our improved version of JsonFactory/jackson-core with a small webservice running on HotSpot: buffers were gc-ed quickly after shutdown. Friday they had a version ready for the whole application running in WebSphere, I expect it will be fully load tested beginning of next week.

@cowtowncoder
Copy link
Member

Excellent, looking forward to seeing more!

@jborgers
Copy link
Author

jborgers commented Oct 9, 2017

@cowtowncoder We have good results from the load test on WebSphere/J9-JVM! The BufferRecyclers are released after the shutdown method is called on undeploy and the classloader is also released.
I put the change in JsonFactory in my fork: https://github.com/jborgers/jackson-core.
How can we go forward?

@cowtowncoder
Copy link
Member

cowtowncoder commented Oct 10, 2017

One possibility would be a PR; I could either merge it directly, or cut'n paste depending on how things go (I'd want to get this in 2.9 branch most likely, but master is diverging from it quite rapidly due to 3.0 being major API-incompatible change).
Anyway, I just need to see diffs.

Regardless of mode of merging changes one thing I would need is CLA:

https://github.com/FasterXML/jackson/blob/master/contributor-agreement.pdf

(or Corporate CLA variant that's found next to it).

This is usually printed, filled & signed, scanned and email to info at fasterxml dot com.
Only needs to be done once for all Jackson contributions, but we need it before first one. Keeps corporate lawyers less unhappy wrt distribution etc.

@cowtowncoder
Copy link
Member

@jborgers Um. I must have misunderstood here -- this is nothing at all what I had in mind.

What I would be interested is the idea I had for getting rid of the class, not new piece of machinery to keep track of ThreadLocal referenced things.

@jborgers
Copy link
Author

jborgers commented Dec 1, 2017

@cowtowncoder Sorry for the delay. Based on a code review, I made some improvements to the code. We did retesting on both Hotspot and IBM J9/WebSphere (SoftReferences behave differently for them.) We had some unavailability of team members. New version is now fully tested, also duration tested. I analyzed heap dumps and the conclusion is that everything works as expected. BufferRecyclers get released after shutdown call. Classes can be unloaded. We see a nice reduction in memory usage now and we are happy with the result. I will proceed with signing the doc.

Not sure what you mean by your remark of Oct 14. Maybe I can explain the workings if unclear.
I think we should add some doc how/why/when to invoke the shutdown.

Like discussed above, getting rid of the class (BufferRecycler) does not solve our problem. There needs to be a way to release the buffers (being BufferRecyclers or byte[], char[]) (we run with 400 threads) hence a reference is needed from a central place. In our improved version, we got rid of the overhead of the WeakReference, a strong reference to the SoftReference is enough. We use a Set instead of a List to have faster access and a ReferenceQueue to prevent a small memory leak. I added doc in the source to explain things.

@cowtowncoder
Copy link
Member

Ok. So, my original assumption was that there was some way to either trigger freeing of all SoftReference, from shutdown hook. Possibly using something not normally accessible (like Unsafe). If so, while I wouldn't be too happy about it, I might be ok.

But adding the whole overhead of linked list for each and every alloc and/or release is something I am not ok with: I will not add that into core. With potentially hundreds of threads that overhead will be sizable and you'd probably be better off just disabling recycling, and dropping use of ThreadLocal.
So, I will not merge this PR.

However. I did refactor handling of BufferRecycler so that actual ThreadLocal is in new class ThreadLocals, called from JsonFactory. This means that you can sub-class JsonFactory and change handling to your liking, for example using code from this PR if and when it works the way that makes sense for your case.

@jborgers
Copy link
Author

jborgers commented Dec 5, 2017

Hi @cowtowncoder, thanks for your reply.
Did you have a look at the current version? You mention 'the whole overhead of a linked list to every alloc and/or release'. There is actually no linked list in the solution. The current solution is a clear improvement over the earlier version you might have seen.

What is added is adding a reference to a Set. In my view the relative overhead both time and space wise, is neglectable compared to the buffers. Add/remove to a Set is O(1) and very quick and heap taken is very small, I verified in heap analysis with MAT. Also our performance tests show that there is no degradation in response times. Yet, there is a huge win in memory usage after undeploy.

For making use of an Unsafe feature I think that is not a good idea, an unstable solution since Java 9+ will be hiding these features in a java module. Furthermore, performance wise, I think that the high memory use of the buffers with many threads (we have 400) and the use of SoftReferences both makes garbage collection take more time and have longer pauses, e.g. with compaction. In my opinion SoftReferences make poor caches, see for instance https://stackoverflow.com/questions/5757969/softreference-gets-garbage-collected-too-early and are largely over-used.

I think our improvement will be very beneficial to any user with more than a handful of threads using jackson and who do un/redeployments either in production of development. Therefore, I ask you to reconsider your decision to not merge this solution.

To meet your worries about overhead, it might be an approach to enable the releasing functionality on request, register allocated buffers only then. Either by configuration and/or a method call e.g. enableShutdown(), enableReleasabilityOfBuffers() or registerBuffers() and include this in the official jackson.core. This way any user can choose to benefit and it will be possible for us to use this in our whole organization.
Would that be an option for you?

@jborgers
Copy link
Author

jborgers commented Dec 6, 2017

I like to add that a solution with subclassing will not work for us since we use several libraries which in turn use jackson-core. We cannot change the class (name) they use, yet, we can change the jar-file hence the version of the library and JsonFactory class as long as it is backward compatible.

I also want to stress that this is a serious problem for us, several of our high-volume production systems suffer from high heap usage and as a consequence increased cpu usage because of this issue. Your help is much appreciated.

@cowtowncoder
Copy link
Member

Ok, I'll have a look at latest patch.

@cowtowncoder
Copy link
Member

I guess that is an improvement, but there is now a potentially bit Set of all BufferRecyclers that is modified for each usage. And it is synchronized as well, likely becoming synchronization bottleneck.
Access may not be as bad as I thought since hash code is probably identity hash, and hopefully it's hash-based set.

Mention to potential use of Unsafe was just speculative -- I was guessing there might be a way to force cleanup through ThreadLocal reference. But I am not aware of such thing.

I think the best course forward for your usage may really be disabling

 JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING

which will avoid memory retention.

@jborgers
Copy link
Author

jborgers commented Dec 7, 2017

Hi @cowtowncoder, not sure what you mean by 'now a potentially bit Set ... that is modified for each usage'. And for what reason you think it is likely a synchronization bottleneck.

I think it would really be helpful to have a more interactive way of communication. Would it be possible for you to join a conference call with us? What is a good time for you? Next Wednesday would be best for us since all of us working on this issue are available and on location then.
Does skype, google hangouts, webex or something else work for you?
Thanks in advance.

@cowtowncoder
Copy link
Member

At this point I don't think more communication helps: I am not planning to include code as suggested in Jackson 2.9. I understand that you consider it important: but this is not an urgent problem for me.
I also have not observed problems, or received for past 8+ years, so including an additional tracking here is not something I would do very quickly either. Especially for 2.9 version for which stability is important.

As to synchronization: it will be needed for each construction of parser, across all threads. For a large multi-threaded systems it becomes an issues, something for which ThreadLocal is designed for.

Beside this, I still do not understand why JVM would not simply free buffers held on to via soft references as expected. There is no good reason why this should happen. I could see why class (BufferRecycler) might not get cleared via ThreadLocal reference; and perhaps with that, class loader that holds onto it. That would be fixable.

Given this, I am interested in helping have customizable aspects, something that you can configure.
But not default settings, nor do I care if it has to be something that automatically works when constructing vanilla JsonFactory. If that is a requirement, you may be best off forking the project.

@jborgers
Copy link
Author

jborgers commented Dec 8, 2017

Hi, yes indeed it is important to us. I just analyzed two production systems (many servers are copies of these in our server farm.)

One server has BufferRecyclers consuming almost half of the live data in the heap: 43%. the other 16%.
In another case we had to switch off buffering as you suggested because some of the service responses were very large, eventually enlarging the buffers in all threads and risking out of memory, especially after re-deployment. Our load tests however showed that this resulted in 34%-38% longer response times, which is of course undesirable.

I think not many developers do heap analysis and understand what they see. In addition, who understands class loading leaks and how SoftReferences and garbage collection work? I think many have these problems yet are unaware of it.

Considering synchronization: the set is only accessed when creating a BufferRecycler, which is only once per thread, right? And the critical section is very small, uses identity hash and is O(1) as documented in the code and I think time taken is almost neglectable compared to the allocation performed by BufferRecycler. Yet, maybe I miss something and we may get lock contention in some cases. We could prevent this contention by use of a Set which is optimized for multi-threaded access: backed by a ConcurrentHashMap. ConcurrentHashMap uses lock striping (and in my opinion a better approach in general than use of ThreadLocals). We can use something like (simplified):

Set<<>> allSoftBufRecyclers = Collections.newSetFromMap(new ConcurrentHashMap<<>, >());

And we are done. I am assuming we can use java 1.6 here.

How do you expect the JVM to free buffers held on to via soft references?
Garbage collectors clear WeakReferences after the target is no longer stronger referenced. SoftReferences are less weak and hold on to the target for some time after it has become non-strongly referenced (by a normal reference that is.) When the gc clears the SoftReference is different per JVM, yet quicker when there is memory pressure.

IBM JVM for instance clears (some of) the SoftReferences which uniquely retain their target once every N full gc's, where N can be 15 (for example). It may however take a while before a full gc takes place, on one production server for instance only once a day, so it takes 15 days in this example for the BufferRecyclers to be de-referenced. There can be multiple re-deployments in these 2 weeks.

But it can also happen quickly, with a full gc every minute, every 15 minutes the BufferRecyclers are cleared from the SoftReferences (since BufferRecyclers are only softly reachable) while you use them, regardless if there is memory pressure or not. And you don't want them to be cleared. A reason why SoftReferences make poor caches: there is no control over the time-to-live.

The classloader leak occurs because an object from a class of the system classloader (ThreadLocal) references an object of a class of the application classloader (BufferRecycler.)

Something configurable could very well work for us. Like I suggested in my previous post. How would you like to have this configured? Like other Feature-s?

@jborgers
Copy link
Author

Hi @cowtowncoder, I am eager to know your answer.

@jborgers
Copy link
Author

I created a new version in my repo where I changed new releasability of buffer recyclers to be optional, enable by a feature and static method. I factored out needed machinery into a separate class, only instantiated if enabled. I also used the multi-threaded optimized Set.
We will test this version, I added some debug println's to be removed later.

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 13, 2017

Ok. So, use of SoftReferences should work as expected, to be held until such time that GC pressure clears them (typically for Full/OldGen GC):

```Soft reference objects, which are cleared at the discretion of the garbage collector in response to memory demand. Soft references are most often used to implement memory-sensitive caches. ``

I am not surprised about longer time-to-live for SoftReferences per se (that is, in many ways, the idea).
But I am surprised that it should become problematic: although buffers are non-trivial in size (otherwise there's no point in trying to reuse in the first place), they wouldn't seem to add up to that much. Even if/when they end up in Old Generation (due to long lifespan).

From earlier descriptions it sounded like there was something special, however, about your use case of hot-reloading classes, something that I have never used (nor seen used in production anywhere, although remember that application servers did allow that). Much more commonly service node would be shutdown, restarted; this simplifies life-cycle of system significantly.
If this is the case it seems/seemed odd that as part of this JVM would not end up clearing soft references as well.

But I'll add bit more thoughts on separate comment, so hopefully we can find common ground, and I can understand your limitations wrt possible solution.

@cowtowncoder
Copy link
Member

cowtowncoder commented Dec 13, 2017

So, from my perspective, I would be most interested in finding a way to allow extension point that allows your approach to be used in your case, regardless of what the default behavior would be. I would like to keep existing behavior as unchanged as possible, partly since release cycle is such that 2.x will be in maintenance mode and I'd like to limit amount internal change in this are. This because possible new problems can be equally difficult to track down. As you say, debugging memory management, retention issues is very tricky, especially in heavily multi-threaded systems.

Extension points, then, could come in at least 2 flavors;

  1. Sub-classing: easier to provide from my perspective, but somewhat fragile, and possibly difficult to deploy (and not applicable to other format backends)
  2. Adding new configurable handlers classes (with a default implementation); most modular, and although ideally not done in a patch, I would be ok with such a change
  3. Addition of direct method(s) for changing behavior (ideally per-instance, static configuration is problematic when Jackson is often used as transitive dep of multiple frameworks, libraries)

It sounds like (1) would not work for you, either. But perhaps (2) or (3) would work.
And just to make sure: I do not object to the solution in general (that is, it is a valid approach), but just worry about its impact for existing use cases where retention is not problematic.

Another idea that may or may not make sense: allowing use of WeakReferences. Since they should be cleared much more rapidly, and since they share a base class, it would likely be much easier to add a configuration feature to allow change. But I am not sure if that would be something that would help you -- no point in adding option that is not useful.

@jborgers
Copy link
Author

Thanks for your reply and good to read you want to find common ground and a solution.

I understand you want to keep the existing behavior as unchanged as possible for the reasons you mention.

Happy to look along the lines of extension options (2) and (3).
2- Could you elaborate a bit more on configurable handler classes? What exactly you have in mind?
3- "static configuration is problematic when Jackson is often used as transitive dep of multiple frameworks, libraries" - could you elaborate on that? What exactly is the problem?
We explicitly want to have the registration/releasability of buffers on JsonFactory class-level (static) and for enabling it for all JsonFactory instances especially also for libraries which use jackson-core, which have it as transitive dependency.

WeakReferences will be cleared too rapidly, on the following gc event.

I created a new version along the lines of (2) and (3). It makes our added machinery optional. It separates out all added state and behavior into a new private inner class (ThreadLocalBufferManager) which will only be instantiated and used in case JsonFactory is configured to useReleasableThreadLocalBuffers. New behavior is only applied in case the feature is enabled (default: disabled.) It can be enabled with a new Feature or with a static method enableUseReleasableThreadLocalBuffers, which will instantiate the ThreadLocalBufferManager (bufferMgr), referenced by a static field. New registration will only happen in case the bufferMgr exists (feature is enabled) by the bufferMgr. The shutdown method will delegate to the bufferMgr only in case it exists (feature is enabled).

Could you have a look at this version? Eager to know your thoughts.

@jborgers
Copy link
Author

@cowtowncoder Please have a look at the new version and let me know your thoughts, so we can go forward.

@jborgers
Copy link
Author

jborgers commented Jan 2, 2018

@cowtowncoder Waiting for your reply. We are eager to go forward.

@cowtowncoder
Copy link
Member

Ok. On "static configuration", I just mean things like env variable and system properties: more accurate would be "global settings", which affect all instances everywhere. With jackson being transitive dep on many things, different usage often requires different configs. Almost as bad as system-prop/env would be static singletons, although at least those would be shaded.

On handlers: it would just mean making adding handler type (interface, abstract class), pluggable to factory/-ies, and getting called for allocation.

One thing I forgot to ask was simple limitations from your side, wrt. sub-classing or ability to configure factory instances. I can understand that you can not force use of, say, sub-class of JsonFactory.
But perhaps ability to set a handler instance (which could be from sub-class constructor, or by whatever constructs factory, or even gets factory handed) is acceptable?

From my side I do not want substantial changes to default behavior, esp. for 2.x.
3.0 is different story, but I don't think that is something you would be able to rely on, given that it will not be available for quite some time.

@kirked
Copy link

kirked commented Mar 30, 2018

ThreadLocals should be avoided in library code.

I got here by doing research about possibly choosing Jackson for a project that will deserialise terabytes of JSON, streaming in an Akka-based system, where you don't know, own, or control the thread that you will be running on. In fact, your execution is freely intermingled in Akka's thread pools to maintain high, non-blocking CPU throughput.

At least I can turn off the feature altogether.

As another note, instead of using synchronisation of access for a common set of references, it would probably be better to use an AtomicReference to hold the set, and access it through that so that you maintain non-blocking throughout.

@jborgers
Copy link
Author

jborgers commented Apr 5, 2018

@kirked The pull request implements a releasability of the thread-local buffers. You can release all buffers when needed, e.g. on shutdown of the application. Not sure this will help you. For 3.0 plan is to have a pluggable buffering approach and multiple implementations. Your use case is useful input for that.

@cowtowncoder
Copy link
Member

@kirked I am not going through argue the point on use of ThreadLocal beyond agreeing that its use should be carefully considered by all code, not just libraries and frameworks, and noting that Jackson's use is bit different from typical usage, aiming to optimize away both synchronization (no need since access guaranteed single-threaded) and complexity of buffer tracking, management (since it's essentially fixed set of buffers, per thread, as opposed to variable number of buffers from any and all threads).
Similar system has been successfully used by Jackson for past 9 years, as well as libraries before that like Woodstox, for past 15 years, and not reported as issues in the past. The only concern so far have been notes of seeing many buffers to reuse in memory dumps.

Anyway... as @jborgers said, intent is to allow alternative strategies, where users can choose different trade-offs. There are certainly benefits from different, centralized approach, where actual memory usage can be strictly limited for example.

@cowtowncoder cowtowncoder changed the title BufferRecyclers are not released on shutdown of the application BufferRecyclers are not released on shutdown of the application Apr 10, 2018
@cowtowncoder
Copy link
Member

Getting closer: merged #450 in 2.9, could use test verifying. Performance not different with modest number of threads (tested with 8), as expected. I assume differences, if any, would be for much higher concurrency.

@cowtowncoder cowtowncoder changed the title BufferRecyclers are not released on shutdown of the application Add mechanism for forcing BufferRecycler released (to call on shutdown) Apr 13, 2018
@cowtowncoder cowtowncoder added this to the 2.9.6 milestone Apr 13, 2018
@cowtowncoder
Copy link
Member

So: implementation checks for System Property

com.fasterxml.jackson.core.util.BufferRecyclers.trackReusableBuffers

which is defined as String constant

BufferRecyclers.SYSTEM_PROPERTY_TRACK_REUSABLE_BUFFERS

and specific value of true (String) will enable handling.

Access to clean up functionality is via class com.fasterxml.jackson.core.util.BufferRecyclers method

public static int releaseBuffers()

which may be called at any point, and returns number of references cleared, or -1 to indicate tracking is not enabled. Actual count only gives upper bound of possibly de-referenced buffers.

At this point functionality in its current form is just for 2.9.6 and later 2.x releases: it may or may not make it in 3.x as-is; if not, we will figure out something else to solve specific problem this version was designed to fix.

@cowtowncoder
Copy link
Member

@jborgers 2.9.6 finally released.

@jborgers
Copy link
Author

Nice! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants