Memory Leak in MicroProfile Open API's SchemaRegistry.current #24577

Closed
inad9300 opened this issue Mar 7, 2023 · 13 comments · Fixed by #25230
Assignees: jim-krueger
Labels: in:MicroProfile/OpenAPI · Needs member attention · release bug (This bug is present in a released version of Open Liberty) · release:23005

Comments

@inad9300

inad9300 commented Mar 7, 2023

I am opening this ticket as a follow-up to OpenLiberty/ci.maven#1587, as it was determined that the issue belonged to the Open Liberty runtime rather than the Maven plugin.

Describe the bug

A field such as this:

static final ByteBuffer buf = ByteBuffer.allocate(100_000_000); // Allocate 100 MB.

Placed on any @ApplicationScoped (or perhaps just any) class, it will be leaked on application restart, meaning that every restart adds another 100 MB of memory usage. This is particularly noticeable during development through the Maven plugin, as the application is restarted automatically with every source change.
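
For illustration only, a minimal sketch of such a class, assuming the Jakarta EE 9.1 (jakarta.*) namespace used elsewhere in this thread; the class name is hypothetical:

import java.nio.ByteBuffer;

import jakarta.enterprise.context.ApplicationScoped;

// Hypothetical example: if the application's class loader is not released on
// restart, this buffer stays reachable and each restart adds another 100 MB.
@ApplicationScoped
public class BigBufferHolder {
    static final ByteBuffer buf = ByteBuffer.allocate(100_000_000); // 100 MB
}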

Steps to Reproduce

From OpenLiberty/ci.maven#1587 (comment):

I have prepared [...] a minimal repository which I believe is able to reproduce the issue, even though at a [small scale]. You can find it at https://github.com/inad9300/open-liberty-demo-app.

The basic idea is to start a Docker container where three things are run in parallel: the Open Liberty application, something like top for monitoring, and a periodic touch of a source file to trigger application restarts and thus reproduce the memory leak (the exact commands I have used can be found in the README file).

The longer you leave the periodic touch command running, the more memory usage grows. When stopping the command and waiting a bit, memory usage doesn't shrink. When resuming the command, memory usage goes back to growing. It doesn't grow by much each time, so the assumption here is that, as the project becomes more complex (more source files, more dependencies, parameters like -DrecompileDependencies=true, etc.), the leak will become larger.

This will just prove that there are memory leaks of some origin. Additionally, to specifically show the problem with static fields, modify the source and add static final ByteBuffer buf = ByteBuffer.allocate(100_000_000) to RestApplication.java.

Expected behavior

Memory allocated in static fields should be freed somehow during application tear down.

Diagnostic information:

  • OpenLiberty Version: I am unable to determine it; in fact, I would like help doing so. I believe the Maven plugin keeps it up to date automatically, but I don't know how to control this or how to check which version of Open Liberty I'm currently using.
  • Affected feature(s): possibly all; I don't know specifically
  • Java Version: I'm using the Docker image eclipse-temurin:17.0.4.1_1-jdk-jammy
  • server.xml configuration: please find it in OpenLiberty/ci.maven#1587 (comment) ("Massive memory leak in liberty:dev command")
inad9300 added the "release bug" label Mar 7, 2023
@jim-krueger
Member

This is likely related to #24155 and #24444.

jim-krueger added this to Backlog in JAX-RS Team Mar 13, 2023
@inad9300
Author

inad9300 commented May 5, 2023

Did you have a chance to look into this problem? Sorry for insisting, but my experience developing with Open Liberty has been terrible for many months now. Several times a day, all 24 GB of memory in my computer get filled, rendering it unusable and often forcing me to hard-reset it.

jim-krueger self-assigned this May 5, 2023
@jhanders34
Member

jhanders34 commented May 8, 2023

I reproduced the problem with the application provided, and it appears to be in the Open API functionality:

Class Name                                                                                                     | Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------------------------------------
com.ibm.ws.classloading.internal.AppClassLoader @ 0x5d0e148a0                                                  |          136 |        17,488
'- appLoader com.ibm.ws.classloading.internal.ThreadContextClassLoader @ 0x5d1001160                           |           96 |        11,592
   '- classLoader io.smallrye.openapi.runtime.scanner.spi.AnnotationScannerContext @ 0x5d0fede20               |           56 |        12,160
      '- context io.smallrye.openapi.runtime.scanner.SchemaRegistry @ 0x5d0feddf8                              |           40 |        34,304
         '- value java.lang.ThreadLocal$ThreadLocalMap$Entry @ 0x5d0feddd8                                     |           32 |        34,336
            '- [19] java.lang.ThreadLocal$ThreadLocalMap$Entry[128] @ 0x5d0fd6898                              |          528 |        50,824
               '- table java.lang.ThreadLocal$ThreadLocalMap @ 0x5cfe89458                                     |           24 |        50,848
                  '- threadLocals java.lang.Thread @ 0x5cfe88e48  Default Executor-thread-13 JNI Global, Thread|          120 |        51,048
----------------------------------------------------------------------------------------------------------------------------------------------

It is the SchemaRegistry.current ThreadLocal that is the culprit. It would appear that remove() is never called in SchemaRegistry; I'm not sure whose responsibility that is.
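
For context, the general shape of this kind of leak, as a minimal hypothetical sketch (the class and method names are illustrative, not SmallRye's actual code): a static ThreadLocal that is set on a pooled thread and never removed keeps the stored value, and therefore its defining class loader, reachable after the application stops.

// Hypothetical sketch of the pattern, not SmallRye's implementation.
public class Registry {

    private static final ThreadLocal<Registry> CURRENT = new ThreadLocal<>();

    static Registry start() {
        Registry registry = new Registry();
        CURRENT.set(registry);   // pins this instance to the current thread
        return registry;
    }

    static void finish() {
        CURRENT.remove();        // without this, a pooled thread retains the instance
    }
}

Callers running on pooled threads would need to pair the two calls, for example: Registry registry = Registry.start(); try { /* scan the application */ } finally { Registry.finish(); }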

@jim-krueger
Member

@Azquelt Can you please take a look at this? Perhaps something can be done in the OpenAPI glue code?

@jim-krueger
Member

@inad9300 Daniel, you should be able to work around this by removing the microProfile-5.0 feature from your server.xml. If you need MicroProfile features, you could specify only the ones you need; as long as that list does not include the mpOpenAPI feature, you should be good to go.

@jim-krueger
Member

Looks good, Andrew. Thanks.

jim-krueger removed this from Backlog in JAX-RS Team May 9, 2023
mbroz2 changed the title from "Static fields leaked on application restarts" to "Memory Leak in MicroProfile Open API's SchemaRegistry.current" May 30, 2023
@inad9300
Author

I'm sorry to report that my application keeps crashing my computer multiple times a day while developing, due to its unbounded memory growth. In fact, I have the impression that it happens faster than it used to. I'm not sure what criteria you used to determine that the issue was fixed but, as far as I can tell, my demo app still reproduces the problem in 23.0.0.5...

@inad9300
Author

In fact, after removing every file within src/ except for server.xml and RestApplication, and leaving RestApplication completely empty, I still see slow-but-steady memory growth when running watch --interval 1 touch src/main/java/com/demo/rest/RestApplication.java. In less than 15 minutes, the memory of one of the Java processes doubled, growing from about 700 MB to about 1.4 GB. It seems to me that there are some fundamental memory problems in Open Liberty beyond just static fields.

@Azquelt
Member

Azquelt commented Jun 15, 2023

Hi @inad9300, thank you for the follow up.

Running the application locally, I see many instances of the RestApplication class are kept alive because their class loaders are the context class loader for many threads named pool-<number>-thread-1.

This led me to spot an issue in your application here:

        scheduledPing = newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {

You're creating a new executor on init but never shutting it down. Because you're using Executors.newSingleThreadScheduledExecutor, the executor isn't managed by Open Liberty and is not closed for you when the app shuts down.

You could probably just shut it down in onDestroy but the better way to do this is to inject a ManagedScheduledExecutorService and use that instead.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledFuture;

import jakarta.annotation.Resource;
import jakarta.enterprise.concurrent.ManagedScheduledExecutorService;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.context.Initialized;
import jakarta.enterprise.event.Observes;
import jakarta.ws.rs.sse.Sse;
import jakarta.ws.rs.sse.SseEventSink;

public class SseController {

    // Container-managed executor: Liberty shuts it down when the application stops.
    @Resource
    private ManagedScheduledExecutorService executor;

    private Sse sse;
    private final List<SseEventSink> sinks = new CopyOnWriteArrayList<>();
    private ScheduledFuture<?> scheduledPing;

    void onInit(@Observes @Initialized(ApplicationScoped.class) Object __) {
        scheduledPing = executor.scheduleAtFixedRate(() -> {
            ...
            ...
            ...
ManagedScheduledExecutorService is part of Jakarta EE concurrency, provided by the concurrent-2.0 feature which is included by the jakartaee-9.1 feature you're already using.

With this change I still see some memory growth early on (heap dump shows some soft references from BaseTraceService.earlierMessages) but this settles down after a while and a later heap dump shows lower usage and only one instance of the com.demo.rest.RestApplication class.

@inad9300
Author

Thanks for the tip. This is a repository that I am using to report different Open Liberty bugs, so I did not pay too much attention to cleaning up the resources of SseController. My latest tests were, nevertheless, with nothing inside src/ except for an empty RestApplication class and the server.xml file. I could still observe a significant memory surge over time, peaking at 3.5 GB. After that, playing further by reintroducing the 100 MB allocation and then a 1 GB allocation, memory usage went up, then down to about 1.6 GB, hinting that there are probably no leaks after all, contrary to what I had thought. Sorry for the confusion, I will try to check for possible memory leaks in my application before complaining any further 🙃

By the way, can you help me understand why calling newSingleThreadScheduledExecutor() within onInit() is problematic? Since the future is cancelled, shouldn't the executor instance become garbage-collectable, given that nobody references it anymore?

@Azquelt
Member

Azquelt commented Jun 16, 2023

I was a bit surprised by this too. It looks like the ThreadPoolExecutor used to have a finalizer which would shut down the thread pool when the executor instance was garbage collected, but it was removed in Java 11 because finalizers had been deprecated.

That issue doesn't mention any replacement mechanism to guard against executors that are never shut down. I can't say for sure that there isn't one, but going by what I saw in the memory analyzer, it looks like they aren't getting cleaned up.
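
For completeness, a minimal sketch of the "shut it down in onDestroy" alternative mentioned above, assuming a CDI observer of application-scope shutdown; the class name and observer methods are illustrative, not the reporter's actual code:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.context.BeforeDestroyed;
import jakarta.enterprise.context.Initialized;
import jakarta.enterprise.event.Observes;

@ApplicationScoped
public class PingScheduler {

    private ScheduledExecutorService executor;

    void onInit(@Observes @Initialized(ApplicationScoped.class) Object event) {
        // Unmanaged executor: nothing outside this class will ever shut it down.
        executor = Executors.newSingleThreadScheduledExecutor();
    }

    void onDestroy(@Observes @BeforeDestroyed(ApplicationScoped.class) Object event) {
        // Without an explicit shutdown, the pool's worker thread keeps running after
        // the application stops and pins the application's class loader.
        executor.shutdownNow();
    }
}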

@inad9300
Author

I keep trying to understand why my application leaks memory during development; please allow me to continue posting on this issue to see if I can get any more help from you.

I have recently run server javadump --include=heap defaultServer and opened the resulting .hprof file with Eclipse Memory Analyzer. When using the "Leak Suspects" function, it reports 1700+ instances of com.ibm.ws.artifact.overlay.internal.DirectoryBasedOverlayContainerImpl and 18 instances of com.ibm.ws.classloading.internal.AppClassLoader, which I thought was a bit odd:

[Screenshot: Eclipse Memory Analyzer "Leak Suspects" report showing the instance counts above]

Is this to be expected? Does this give you any sort of hint? Do you have any suggestions as to what to look for next and/or profiling tool recommendations?

@Azquelt
Member

Azquelt commented Jul 4, 2023

18 instances of AppClassLoader does seem like a lot, particularly if that number keeps growing as you develop your application.

In Memory Analyzer, you should be able to look at what holds references to each AppClassLoader. Every class holds a reference to its class loader, so any instance of any class from the app which is kept alive will prevent the whole class loader and all of its classes from being garbage collected.

Class loaders for running apps may be referenced from lots of places. When an app is stopped nothing should retain a reference to its class loader and it should be garbage collected. If you can, I would take a dump on the first load, then again after developing and redeploying the app for a while, then again after more development so you can look at the AppClassLoader ids and see which are not being cleaned up correctly. Then take an AppClassLoader that should have been cleaned up and show all the references to GC roots to see what has a reference to it.

In the case I looked at before, I could see that the reference path led to a thread that you had created. You might also find that it leads to an instance of a class being stored in a ThreadLocal. Liberty uses a thread pool, so if something is wrongly left in a ThreadLocal after the app has stopped, it might leak the class loader for that app until that thread's ThreadLocal slot is reused and overwritten, or the thread itself is removed from the pool. Liberty does use ThreadLocals in places to store information like which app is running or what the current request is, so it's possible that we are not cleaning something like that up correctly.
