Possible memory-leak in multi-threading inference using Tensorflow and having org.bytedeco.javacpp.nopointergc=true (CPU) #690
Please see tensorflow/java#229 for a potential fix for this problem in the TF Java library. |
@karllessard is it possible to release TF Java 0.2.1 with the hotfix? It doesn't look like a breaking change, so a patch would be nice. DJL uses release versions of TF Java |
@skirdey 0.3.0-SNAPSHOT is pretty much up now, so please give that a try, but like I tried to explain on the pull request, I don't believe this will fix any issues for DJL. |
Good news @skirdey! Important build issues that were preventing us from releasing 0.3.0 are now fixed; I will provide more details a bit later, but you can also follow this thread to see the progress. |
The fix could be useful when I do not set nopointergc=true, as it was also leaking memory; it just took 70 hours to catch it.
|
That doesn't fix any leak. If there's a leak, it's somewhere else. |
DJL team can publish 0.11.0-SNAPSHOT that depends on 0.3.0-SNAPSHOT. I am on it. |
We just had our community meeting, and I'm targeting a release of 0.3.0 by March 15th from the current snapshot. The only detail is that the Windows MKL+GPU platform won't be available for that release (at least not by that date); all other platforms will. |
Hi, just to let you know that 0.3.0 is now available on Maven Central. Please let us know how it goes with your experiments on this issue, thank you |
We tried it out and the memory still leaks: very fast when nopointergc=true, and it takes 5 hours to halt when nopointergc=false |
Like @skirdey mentioned
We are going to integrate TfNDManager with PointerScope as @saudet suggested for the next step |
@stu1130 Can you point us to the code in DJL for that benchmark? |
|
I mean, what does it end up calling in TF Java? For example, @roywei says that
It sounds like there are dangling references to "constant operands" that are not closed with their sessions. Why is that? How do you make sure they get deallocated? |
Another thing you need to be careful about is that, currently, all output Tensor objects are not closed when a Session is closed, but you need to make sure that all output tensors are closed before the Session is closed. If you do not close all output Tensor before closing their Session, even if they are closed later on, that may still result in memory leaks. It's a bit weird, and I hope the people from TF Java realize that it's not a great way to do things, but I haven't been able to convince them, yet. For now, we need to manage all that manually. |
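The close-ordering rule described above can be sketched with plain AutoCloseable mocks. This is a stdlib-only illustration, not the TF Java API: `MockSession` and `MockTensor` are hypothetical stand-ins, and the point is only the ordering that try-with-resources gives you.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the rule above: every output tensor must be closed BEFORE its
// session. MockSession/MockTensor are hypothetical, not TF Java types.
public class CloseOrdering {
    static final List<String> closed = new ArrayList<>();

    static class MockTensor implements AutoCloseable {
        public void close() { closed.add("tensor"); }
    }

    static class MockSession implements AutoCloseable {
        MockTensor run() { return new MockTensor(); }
        public void close() { closed.add("session"); }
    }

    public static void main(String[] args) {
        // try-with-resources closes in reverse declaration order, so the
        // tensor (declared last) is closed before the session.
        try (MockSession session = new MockSession();
             MockTensor output = session.run()) {
            // ... use the output tensor here ...
        }
        System.out.println(closed); // [tensor, session]
    }
}
```

Declaring the output tensor inside the same try-with-resources as the session is one way to get the required ordering for free.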
When TfNDArray.close() is called, the operand is set to null.
My understanding is that when we create a Tensor from an EagerSession or Session, the pointer to the Tensor is attached to the EagerSession/Session. Now with the new fix, the Tensor is held with a WeakReference and therefore can be released by GC. How does that result in memory leaks? When we close the EagerSession/Session, shouldn't it free all attached Tensors? |
The behavior is not the same in eager and graph mode. In eager mode, it is true that the lifetime of a tensor is tied to that of its EagerSession. In graph mode though, all tensors returned by a Session must be closed explicitly. |
In eager mode, this should indeed do the job, as the GC will eventually free the constant; the leak is probably elsewhere then. But note that one drawback of relying on the GC to free up resources is that it does not know the size of the native resources being allocated. Therefore, while the JVM might think that there is plenty of memory available and that a GC cycle can wait, you might be running low, especially if you are holding on to very large tensors/constants for a long time. Again, the safest route is to close the eager session as soon as you can, but I'm not sure how that's possible in your current architecture; I'd have to take a deeper look at it to understand clearly how it works. |
Thanks @karllessard for those pointers. We will try to close tensors for both EagerSession/Session ASAP. |
The current implementation is incorrect, but they (the rest of the SIG JVM) are not concerned about this issue. Please make sure that SIG JVM knows that you would like this to be fixed in TF Java. I will keep trying to make this happen as part of SIG JVM, but I can't promise anything, so any help is welcome.
That's not true. I've explained this many times already, so I won't try to reexplain this here, but with the way you've "fixed" this, leaks will still happen, unless of course everything is done manually. |
Hi @stu1130, I was experimenting with DJL's benchmark this evening, and while I cannot draw conclusions yet on the TF side, I've noticed that the benchmark collects multiple metrics on each inference, which are piled up in a list for the whole duration of the test and reach the millions after a few hours. While instances of |
To clarify what I wrote above, the "issue" I'm referring to isn't the one in this thread about memory leaks in DJL, but another one concerning the incorrect use of WeakReference that @stu1130 inquired about. I think everyone is on the same page about fixing memory leaks in general, and right now we can avoid them by making sure that everything gets deallocated manually. Even if we started using WeakReference correctly somehow, it is my opinion that relying on GC is a bad idea anyway, and this is where my opinion diverges with the other members of SIG JVM. We all agree that calling things like Tensor.close() should work though, so if there's a bug there, we'll fix that. |
When running the benchmark above I get a bunch of warnings out of JavaCPP. @saudet any idea what's causing that?
I don't usually get this out of TF Java when we run it separately, and I know that DJL uses its own library loader, which partially delegates to ours. |
For my use-case, I can not really have a blocking thread listening for GC -
I run Scala / Akka on my end and it somehow affects entire ecosystem
performance.
…On Thu, Mar 25, 2021 at 5:20 PM, Karl Lessard wrote:
@skirdey, nopointergc must not be set to true. Also, you might need to comment out this line <https://github.com/awslabs/djl/blob/215226750783d15a06e492e4ea96c429f8d2f103/examples/src/main/java/ai/djl/examples/inference/benchmark/MultithreadedBenchmark.java#L136> in the benchmark as I did, since the metrics collected by DJL during the benchmark end up taking a lot of space; please read my previous comment <#690 (comment)> about that.
Meanwhile I'll look to run a different test, thanks
|
Do you have this problem with Java 9 cleaners too? They are increasingly used inside the JVM to deal with resource cleanup and it's basically the same idiom. |
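The Cleaner idiom mentioned above can be shown with a minimal stdlib-only sketch (`Owner` and `Resource` are hypothetical names): a cleanup action is registered with a `java.lang.ref.Cleaner`, runs after GC finds the owner unreachable, or can be triggered deterministically via an explicit close, much like JavaCPP's deallocators.

```java
import java.lang.ref.Cleaner;

// Minimal java.lang.ref.Cleaner example: the cleanup action either runs
// after GC discovers the owner is unreachable, or deterministically when
// close() is called. Owner/Resource are hypothetical illustration names.
public class CleanerDemo {
    private static final Cleaner CLEANER = Cleaner.create();
    static volatile boolean released = false;

    // The cleanup state must NOT reference the owning object,
    // or the owner never becomes unreachable.
    static class Resource implements Runnable {
        public void run() { released = true; }
    }

    static class Owner implements AutoCloseable {
        private final Cleaner.Cleanable cleanable;
        Owner() { this.cleanable = CLEANER.register(this, new Resource()); }
        // Explicit close runs the cleanup action at most once, deterministically.
        public void close() { cleanable.clean(); }
    }

    public static void main(String[] args) {
        try (Owner owner = new Owner()) {
            // ... use the resource ...
        }
        System.out.println("released=" + released); // released=true
    }
}
```

The explicit close path is the important one here: as with JavaCPP, the GC-driven path is only a fallback whose timing you cannot control.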
Still on Java 8 :/
|
I'll try out the new fix with nopointergc=false and report back.
|
Thanks @skirdey , please don't forget to disable DJL's metrics since they can grow drastically during the benchmark and cause an OOM if you are running on a low-memory JVM. |
@saudet I am testing my new refactoring that depends purely on the JavaCPP layer. What we do now is disable GC, track every native resource, and close them as soon as we no longer need them. But I found there is a small incremental leak in native memory. Is there a way to find it? I tried the command you showed in another issue, but it didn't print anything. Does it only work on a Pointer attached to a PointerScope, or does it work on every pointer we create?
I found we don't release the TF_Output pointer in session run. Could that be the reason? |
@Craigacp It just means that the
When "org.bytedeco.javacpp.nopointergc" is true, JavaCPP won't be trying to use GC, so we're not going to get any "Collecting ..." lines in the debug log. You'll need to enable GC temporarily to see if it can pick up something that way. It won't be able to track anything that isn't registered with it though, so if you're still not seeing anything with GC enabled, I would say it's something it doesn't know about that is leaking. However, native arrays such as the one that you're linking to for As for detecting memory leaks in native code in general, I find that Valgrind works the least badly with the JVM: |
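As a configuration sketch of the debugging workflow described above (the class name and jar are placeholders, and the exact flags are assumptions based on this discussion, not a verified recipe):

```shell
# JavaCPP's debug logger prints its "Collecting ..." / "Registering ..."
# lines, and nopointergc must be false for collection messages to appear.
java -Dorg.bytedeco.javacpp.logger.debug=true \
     -Dorg.bytedeco.javacpp.nopointergc=false \
     -cp app.jar com.example.Benchmark

# For leaks JavaCPP doesn't track, Valgrind can wrap the JVM
# (slow, and noisy due to JVM internals, but usable):
valgrind --leak-check=full java -cp app.jar com.example.Benchmark
```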
I did another identical test (GC enabled && setMetric commented out && jvmArgs = ["-Xms128m", "-Xmx128m"]) but unfortunately got an OOM.
Maybe I am using a more powerful CPU machine (c5.2xlarge), so small objects like Pointer accumulate faster and reach OOM |
This is strange indeed, and not easy to debug if I cannot reproduce it on my side. @Craigacp, do you still have access to those fast machines so we can run the same test on them? BTW, TF Java 0.3.1 was released this morning, which not only fixes a leak with the new string tensor but also potentially another leak (never observed so far) that could happen when closing multiple graph sessions. It might be better to work with this version from now on. |
@stu1130 128 MB is not a lot of memory for the JVM itself + a couple of things from TF and DJL. |
@saudet I tested my new refactoring (1 GB JVM heap && nopointergc=true) and it went OOM after 2 hours. I used jconsole and found that native non-heap memory is pretty stable, but the old gen kept growing until the OOM exception, with several old-gen GC attempts. I will share my experiment with nopointergc=false (GC enabled) as a baseline. |
@saudet
Here is the JVM dump 3-4 mins before OOM. I tried hitting the perform GC button several times. Experiment 2:
Here is the jconsole snapshot. @saudet Is there a way to allow GC to release those pointer objects while -Dorg.bytedeco.javacpp.nopointergc=true? I think I found the problem. When nopointergc is true, the deallocators keep appending like a LinkedList with |
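The growth pattern described here can be illustrated with a stdlib-only sketch (hypothetical names, not JavaCPP internals): each allocation appends a bookkeeping node to a global list, and if nothing ever unlinks nodes, the list grows until the heap fills even though the native memory itself is freed correctly.

```java
import java.util.LinkedList;

// Stdlib-only sketch of the accumulation bug: bookkeeping nodes are added
// on every allocation but never removed, so the registry grows without
// bound. DeallocatorLeakSketch and its methods are illustration names.
public class DeallocatorLeakSketch {
    static final LinkedList<Object> registry = new LinkedList<>();

    static void allocate() {
        registry.add(new Object()); // node is never unlinked: leak
    }

    static void fixedAllocate() {
        Object node = new Object();
        registry.add(node);
        registry.remove(node); // the fix: unlink on explicit deallocation
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) allocate();
        System.out.println("leaky registry size: " + registry.size());  // 1000
        registry.clear();
        for (int i = 0; i < 1000; i++) fixedAllocate();
        System.out.println("fixed registry size: " + registry.size());  // 0
    }
}
```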
Yes, I've noticed that as well. I fixed it in commit bytedeco/javacpp@7f27899, along with proper testing for that case. Sorry about that. In practice though, there is little to be gained by disabling GC entirely: as long as we make sure to deallocate everything explicitly, we're not going to lose any performance. Incidentally, this gives us a benchmark for the kind of performance we can expect with and without relying on GC. On my machine (Fedora 32), PointerTest#testDeallocator produces these kinds of numbers:
(Windows gives me similar but slightly worse results.) The time for the latter explicit case doesn't change regardless of whether "org.bytedeco.javacpp.nopointergc" is true or not. Clearly, the problem isn't only with GC itself, but also with allocating native memory in one thread, and deallocating it in another thread... |
@saudet awesome! I was testing my new code with an explicit deallocator call when I am done with each pointer. Looks like I don't need to create that PR! So can I now rely entirely on PointerScope without any explicit deallocate() call, or do you still recommend a deallocate() call at the end of the PointerScope? The reason we would like nopointergc=true is not only about GC: we don't want the blocking deallocator thread that calls System.gc(). But thanks for that experiment; we are now clearer about which direction is right. Let me know when the 1.5.6-SNAPSHOT is out and when you are going to release 1.5.6 with that fix! Thanks! |
PointerScope.close() ends up calling Pointer.deallocate() under the hood, so it's the same as calling it directly. In other words, it's explicit deallocation that doesn't rely on GC. About System.gc(), another way to prevent calls to that is by setting maxBytes to 0, but that's already being done in DJL by default: |
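The scope semantics described here can be mimicked with a stdlib-only sketch (`MiniScope` and `MiniPointer` are hypothetical stand-ins, not JavaCPP classes): closing the scope deterministically frees everything attached to it, with no reliance on GC or System.gc().

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stdlib sketch of the PointerScope pattern: pointers register with the
// innermost scope, and closing the scope deallocates all of them.
// MiniScope/MiniPointer are hypothetical names for illustration only.
public class ScopeDemo {
    static int liveAllocations = 0;

    static class MiniPointer {
        MiniPointer() { liveAllocations++; }
        void deallocate() { liveAllocations--; }
    }

    static class MiniScope implements AutoCloseable {
        private final Deque<MiniPointer> attached = new ArrayDeque<>();
        MiniPointer attach(MiniPointer p) { attached.push(p); return p; }
        // close() frees attached pointers in reverse attach order,
        // without ever involving the garbage collector.
        public void close() {
            while (!attached.isEmpty()) attached.pop().deallocate();
        }
    }

    public static void main(String[] args) {
        try (MiniScope scope = new MiniScope()) {
            scope.attach(new MiniPointer());
            scope.attach(new MiniPointer());
            // ... use the pointers; no per-pointer deallocate() calls needed ...
        }
        System.out.println("live=" + liveAllocations); // live=0
    }
}
```

This is why, as noted above, no extra deallocate() call is needed at the end of a scope: closing the scope is the explicit deallocation.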
That's all I need, thanks. I checked JavaCPP 1.6.0-SNAPSHOT but found the fix bytedeco/javacpp@7f27899 is not included yet. Let me know when it is available; I would like to try it out! |
It's in 1.5.6-SNAPSHOT: https://github.com/bytedeco/javacpp/blob/7f27899578dfa18e22738a3dd49701e1806b464a/pom.xml#L7 1.6.0-SNAPSHOT is something else that's probably not going to happen. |
@saudet So far, with the new fix, I no longer see DeallocatorReference & NativeReference accumulating! |
Accumulating DeallocatorReference & NativeReference is also what I saw at the root of this thread, so it is awesome that it is fixed! I don't think there were any other issues when running DJL / TF / nopointergc=true in prod, but I need to try running inference again at scale. |
There should be no discernible difference in speed between JavaCPP 1.5.5 with maxBytes = 0 and JavaCPP 1.5.6-SNAPSHOT with noPointerGC = true, but if you do notice a difference, please let me know and I'll release JavaCPP 1.5.5-1 or something with the fix. Thanks for testing! |
FYI, I've just updated JavaCPP to skip over all synchronized code when GC fallback is disabled, see bytedeco/javacpp@d788390. That may be able to help further with latency, but I'm not seeing any difference in my tests. |
@skirdey BTW, latency may be potentially even lower with TF Lite, so if your models are compatible with it, please give it a try. So I've created builds that expose the C++ API with JavaCPP: Those are currently low-level wrappers for the C++ API, but DJL could of course use this instead of the official API. @stu1130 If this looks like something you guys may want to use but would prefer to maintain them here, please let me know! |
BTW, I've released JavaCPP 1.5.6, so you can start using that. |
@saudet Thank you so much! We have updated to 1.5.6 in master branch, will release with your fix in next release. |
issue was resolved |
Description
Possible memory-leak in multi-threading inference using Tensorflow and having org.bytedeco.javacpp.nopointergc=true
CPU inference.
Expected Behavior
Garbage collection removing objects from Old Generation
Error Message
Java OOM
How to Reproduce?
Run multi-threaded inference for 30 minutes with -Dorg.bytedeco.javacpp.nopointergc=true so you don't have the JavaCPP Deallocator blocking thread.
./gradlew benchmark -Dorg.bytedeco.javacpp.nopointergc=true -Dai.djl.default_engine=TensorFlow -Dai.djl.repository.zoo.location="https://storage.googleapis.com/tfhub-modules/tensorflow/resnet_50/classification/1.tar.gz?artifact_id=tf_resnet" --args='-n tf_resnet -t 10 -c 1000000 -s 1,224,224,3'