New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inheritable thread-local closure stack in JoinPointImpl causing memory leak in spawned threads (e.g. thread pool) #288
Comments
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Fixes #288. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Fixes #288. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
I made an experimental fix, but I did not find a good way to test this automatically yet. Even for manual tests, there is no easy way. Therefore, I need feedback from you. Please try the latest <repositories>
<repository>
<id>ossrh-snapshots</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
<updatePolicy>always</updatePolicy>
</snapshots>
</repository>
</repositories> If you or your colleagues can provide a better way to test this, please let me know. I do not have enough free cycles to play with this as long as I would like to. |
Thanks @kriegaex for such a quick response and fix. I'll test it today and get back to you. Thanks. |
@kriegaex I quickly tested the fix but it does not fix our issue. The change that you made would only clear the Stack for the spawning thread but the spawned thread would still hold the copied stack right? |
You said, the problem occurs independent of number of threads. I did not have much time for looking into this. I would really appreciate a failing test which would pass with a correct fix in place. So, please provide one or multiple automated tests. Thanks in advance. |
@kriegaex I'll try to write one. Just to be clear, the problem for us is the spawned (spawned from the threadpool) thread holding up the memory in inhertiableThreadLocal. |
Thank you, @pagrawalgit. Whatever I do when running your program with AspectJ 1.9.21.1 on JDK 21 or 1.9.9.1 on JDK 8, in my memory dumps the threads all have |
One more question: Are you using compile-time or load-time weaving? I am trying to reproduce your problem with CTW. |
OK, the around advices were inlined automatically in your simple example with default compiler options. I had to use |
Thanks @kriegaex . I should have mentioned this earlier, we are using load time weaving. I'll post code with automated tests this afternoon EST. |
I am using JDK17. I ran tests using IntelliJ with junit 5. I am assuming <=8GB (xmx) of java heap memory allocated in one of the tests. Command used for LTW:
Note: These are a couple of quick and dirty tests I wrote. Both of these pass with 1.9.8 and fail with later versions. If it makes it easier I can push the code to a git repo. Class DemoAspect1 and DemoAspect2 are unchanged. I deleted the Main class. package com.demoapp;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
/**
* Demo class which has a method with 2 aspect annotations. This method also spawns a new Thread.
* This Thread will copy inheritableThreadLocal value and leak memory.
*/
public class Demo {
final ExecutorService taskManager;
final byte[] someBigData = new byte[1024 * 1024 *1024]; //1GB worth of data.
/**
* Constructor.
*
* @param executorService - ExecutorService to spawn a new Thread.
*/
public Demo(final ExecutorService executorService) {
this.taskManager = executorService;
}
/**
* Method surrounded to spown a new Thread.
*/
@DemoAspect1.DemoAnnotation1
@DemoAspect2.DemoAnnotation2
public void doSomething() throws ExecutionException, InterruptedException {
final Future future = taskManager.submit(() -> {
System.out.println("Hello");
});
future.get();
}
} import com.demoapp.Demo;
import org.junit.jupiter.api.Assertions;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class TestMemoryLeak {
/**
*
* @param executorServiceList - list of threadpools.
*
*/
private void testAspectJLeak(final List<ExecutorService> executorServiceList) throws Exception {
//Create a Threadpool list to keep the Threads alive until completion.
for (int i = 0; i < executorServiceList.size() ; i++) {
final Demo demo = new Demo(executorServiceList.get(i));
demo.doSomething();
}
System.gc(); // Best effort GC
//Thread.sleep(500000); // Sleep to take a Heap dump
System.out.println("Done");
}
/**
*
* Tests that the inheritableThreadLocals of the spawned threads are either null or contain all null elements.
*/
@org.junit.jupiter.api.Test
public void testNoMemoryLeak_InheritableThreadLocalCleared() throws Exception {
final int numThreadPools = 1;
// Create a new threadpool for every thread to be spawned, to make sure that the threads gets
// initialized for the first time. This allows the spawned thread to copy data into inheritableThreadLocals.
final List<ExecutorService> executorServiceList = getThreadPools(numThreadPools);
//run test
testAspectJLeak(executorServiceList);
// Verify if the spawned threads have null inheritableThreadLocals after completion.
final Set<Thread> threads = Thread.getAllStackTraces().keySet();
threads.stream()
// Could further disambiguate by adding custom thread naming.
.filter(thread -> thread.getName().contains("pool"))
.forEach(thread -> {
try {
final Field mapField = Thread.class.getDeclaredField("inheritableThreadLocals");
mapField.setAccessible(true);
final Object o = mapField.get(thread);
if (o != null) {
final Field tableField = o.getClass().getDeclaredField("table");
tableField.setAccessible(true);
final Object[] inheritableThreadLocalTable = (Object[]) tableField.get(o);
if (inheritableThreadLocalTable != null) {
for (Object entry : inheritableThreadLocalTable) {
// every entry should be null.
Assertions.assertNull(entry);
}
}
}
} catch (Exception e) {
throw new RuntimeException(e);
}
});
System.out.println("Test passed");
}
/**
*
* Spawns 10 threads in separate threadpools to make sure each thread when spawned gets initialized for
* the first time and allocates new inheritableThreadLocals memory.
*
* Since each thread will hold up memory (memory leak case), the program will run of memory.
*
* The fix would eliminate OOM.
*/
@org.junit.jupiter.api.Test
public void testNoMemoryLeak_SystemShouldNotRunOutOfMemory() throws Exception{
final int numThreadPools = 10;
final List<ExecutorService> executorServiceList = getThreadPools(numThreadPools);
testAspectJLeak(executorServiceList);
System.out.println("Test passed");
}
/**
* Creates 1 thread per Threadpool.
*
* @param numThreadPools - Number of threadpools to create.
*
*/
private List<ExecutorService> getThreadPools(int numThreadPools) {
final List<ExecutorService> executorServiceList = new ArrayList<>();
for (int i = 0 ; i < numThreadPools; i++) {
// Could further disambiguate by adding custom thread naming.
executorServiceList.add(Executors.newFixedThreadPool(1));
}
return executorServiceList;
}
} |
This looks quasi identical to the code I was using in my own impromptu test code. 😉 Like I said, where I was stuck until yesterday was that I could not reproduce the problem with CTW, but now I can with the additional compiler switch. Your test code comes in handy, and I guess that at least part of it might end up in the AspectJ test bed, after I have an idea how to fix the problem. Currently, I do not, but I am also busy with non-AspectJ matters. |
Thanks @kriegaex |
I still do not have any good idea how to fix this, because non-inlined annotation-style aspects do need to keep around-closure information due to the possibility of multiple proceed calls in the same advice or proceed happening in a spawned thread. For native syntax aspects, it works differently, because the AspectJ compiler generates advice methods that directly take closures as parameters, while annotation-style advices either have no parameters or a proceeding join point instance. I.e., in this case the around closures must be bound and unbound manually. As AspectJ has no way of knowing if a long-lived thread might eventually issue a proceed call and therefore need the closure information, we cannot just discard it. I could revert 3c80a36, but then asynchronous nested proceeds would fail to work again. Currently, it looks like a choice between the plague and cholera, as we say in German. The suggested solution for now is to simply use native syntax aspects and the AspectJ compiler to compile them. Something like this: public aspect DemoAspect2Native {
Object around() : execution(* *(..)) && @annotation(DemoAnnotation2) {
System.out.println("DemoAspect2Native around");
return proceed();
}
public @interface DemoAnnotation2 {}
} Then, your problem should go away. I tried with your out-of-memory test, and it passes. I am keeping this issue open. BTW, it is related to #141. If I have an idea how to solve that one, probably this one would also be solved. |
I'll read up on native syntax aspects vs non-inlined annotation-style aspects. |
I have just deployed an alternative experimental fix to the snapshot repository. It makes your tests pass and also seems to fix #141 in one shot. The strategy is to use a list instead of a stack and never to pop any around closure elements from it. But at least, it is not thread-local. Instead, what is thread-local is an index for the current element, i.e. what leaks now are only integer values and the stack of around closures, until the joinpoint instance goes out of scope, which I guess is fine. It is not perfect, but hopefully better than the status quo. Please re-test. Thank you. |
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Now, there no longer is a thread-local stack of AroundClosure instances, but rather a list of them, which can only grow but never shrink. Instead, we now have a thread-local (integer) list index, for every thread being initialised with pointing to the last element. I.e., every thread can unwind by decrementing the index while proceeding, independently of other threads. A positive side effect is that this approach also works for long-lived threads from thread pools, used by executor services. Hence, test Bugs199Tests.testAsyncProceedNestedAroundAdviceThreadPool_gh128, which was previously commented out, has been activated and passes, see #141. I am not sure if this brings @AspectJ style, non-inlined, nested around advice execution functionally on par with native ones, but at least for current scenarios it seems to work. Fixes #288, #141. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Now, there no longer is a thread-local stack of AroundClosure instances, but rather a list of them, which can only grow but never shrink. Instead, we now have a thread-local (integer) list index, for every thread being initialised with pointing to the last element. I.e., every thread can unwind by decrementing the index while proceeding, independently of other threads. A positive side effect is that this approach also works for long-lived threads from thread pools, used by executor services. Hence, test Bugs199Tests.testAsyncProceedNestedAroundAdviceThreadPool_gh128, which was previously commented out, has been activated and passes, see #141. I am not sure if this brings @AspectJ style, non-inlined, nested around advice execution functionally on par with native ones, but at least for current scenarios it seems to work. Fixes #288, #141. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
Leak was introduced in commit 3c80a36, fixing #128, but introducing #288 instead, which was the lesser of two evils, but still bad for some users unwilling to use native AspectJ syntax for their aspects, avoiding the problem. Relates to #288. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
Force-pushed regression tests without fix. I want to see them fail in this CI build first before pushing the fix on top of them. Thanks to @pagrawalgit for the code contribution. I adapted the test to instead of a JUnit test, because that better fits the AspectJ integration test structure, but both test cases are basically what you suggested. Update: The two regression tests expected to fail actually do fail. Fine. 👍 |
Thanks @kriegaex. I'll test this today. |
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Now, there no longer is a thread-local stack of AroundClosure instances, but rather a list of them, which can only grow but never shrink. Instead, we now have a thread-local (integer) list index, for every thread being initialised with pointing to the last element. I.e., every thread can unwind by decrementing the index while proceeding, independently of other threads. A positive side effect is that this approach also works for long-lived threads from thread pools, used by executor services. Hence, test Bugs199Tests.testAsyncProceedNestedAroundAdviceThreadPool_gh128, which was previously commented out, has been activated and passes, see #141. I am not sure if this brings @AspectJ style, non-inlined, nested around advice execution functionally on par with native ones, but at least for current scenarios it seems to work. Fixes #288, #141. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
This CI build contains the fix. All tests should pass for it. If they do, I will deploy another snapshot, and if it also works for @pagrawalgit in his application, I can merge it. Update: That build passed, including the reactivated test for #141. The fix seems to be a substantial improvement and fix two problems in one shot. |
If you want to test, I just deployed commit 4414b42 with the polished fix. |
@pagrawalgit, sorry for pushing this, but my time window for creating an AspectJ maintenance release is closing. Soon, I will be busy with non-AspectJ stuff again. I would like this to be in, but not without your team's re-test feedback. |
@kriegaex I tested it locally with the demo application I have. The tests pass. Testing with the actual service will take me a day or two. Feel free to push the commit. Thanks for getting this out so quickly. |
@pagrawalgit: ping! |
It works!! |
Leak was introduced in commit 3c80a36, fixing #128, but introducing #288 instead, which was the lesser of two evils, but still bad for some users unwilling to use native AspectJ syntax for their aspects, avoiding the problem. Relates to #288. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
according to https://rules.sonarsource.com/java/tag/leak/RSPEC-5164/. Now, there no longer is a thread-local stack of AroundClosure instances, but rather a list of them, which can only grow but never shrink. Instead, we now have a thread-local (integer) list index, for every thread being initialised with pointing to the last element. I.e., every thread can unwind by decrementing the index while proceeding, independently of other threads. A positive side effect is that this approach also works for long-lived threads from thread pools, used by executor services. Hence, test Bugs199Tests.testAsyncProceedNestedAroundAdviceThreadPool_gh128, which was previously commented out, has been activated and passes, see #141. I am not sure if this brings @AspectJ style, non-inlined, nested around advice execution functionally on par with native ones, but at least for current scenarios it seems to work. Fixes #288, #141. Signed-off-by: Alexander Kriegisch <Alexander@Kriegisch.name>
@pagrawalgit, AspectJ 1.9.21.2 has been released today. |
A few of our company production services saw an increase in HeapAfterGCUse metric after upgrading aspectJ from 1.8.14 to 1.9.9.1. We also tried upgrading to aspectj 1.9.21.1 but we are seeing the same behavior. The increase was significant for one of our services 7GB increase (32GB total heap size).
Upon investigation the team found the increase to be related to the AspectJ commit which causes AjcClosure objects to retain in heap memory in some scenarios.
For the impacted services, following conditions hold true.
The heap dump showed increase in (retained) AjcClosure objects referenced by Thread objects in inheritableThreadLocals. We came across this aspectJ commit which introduced InheritableThreadLocalAroundClosureStack.
We do not see this (heap memory increase) issue with 1.9.8 aspectJ version.
Code imitating our use case (also reproduces the issue):
As you can see below, Demo class spawns a new thread inline. In our use case, joinPoint.proceed() call is NOT async. We use a long living thread in a threadpool which is used to run the inline submitted thread by Demo class.
The Demo$AjcClosure classes (as part of the AroundClosure stack) are copied into inheritableThreadLocal of the spawned thread but never released (as seen in the attached heap snapshot)
Heap Dump (after Full GC)
The text was updated successfully, but these errors were encountered: