
OutOfMemoryError: Physical memory usage is too high: physicalBytes (5300M) > maxPhysicalBytes (5300M) #468

Closed
pwittchen opened this issue Mar 16, 2021 · 23 comments


@pwittchen

Hi!

First of all, thanks for the great library!

I'm using your library via nd4j and I'm getting the following error:

```
Caused by: java.lang.OutOfMemoryError: Cannot allocate new LongPointer(2): totalBytes = 288, physicalBytes = 5300M
	at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:88)
	at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:53)
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.createShapeInfo(NativeOpExecutioner.java:2021)
	at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3249)
	at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:67)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:195)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:189)
	at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:91)
	at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:420)
	at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4032)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.create(BaseNDArray.java:2002)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.get(BaseNDArray.java:4284)
	at org.deeplearning4j.nn.layers.recurrent.LSTMHelpers.activateHelper(LSTMHelpers.java:252)
	at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(LSTM.java:177)
	at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(LSTM.java:147)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
	at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
	... 23 more
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (5300M) > maxPhysicalBytes (5300M)
	at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:682)
	at org.bytedeco.javacpp.Pointer.init(Pointer.java:127)
	at org.bytedeco.javacpp.LongPointer.allocateArray(Native Method)
	at org.bytedeco.javacpp.LongPointer.<init>(LongPointer.java:80)
	... 39 more
```

This is caused by the following call:

```java
// ... get vectorLength dynamically

double[] flat = new double[vectorLength];

// ... fill in the flat array here ...

Nd4j.create(flat, new int[]{1, vectorLength, 1}, 'c'); // the OutOfMemoryError is thrown on this line
```

It's odd: the error message `physicalBytes (5300M) > maxPhysicalBytes (5300M)` claims an inequality between two values that are displayed as equal, and `totalBytes = 288` is a much lower value than `physicalBytes = 5300M`.

Can you suggest any possible solution to this problem? Does it require some sort of configuration on my side, or is it a bug inside the JavaCPP library? Of course, I'm willing to help improve your library if needed once I know how to fix this issue, but I need some guidance.

Thanks for your time. I'm looking forward to your reply.

Kind Regards,
Piotr

@saudet
Member

saudet commented Mar 17, 2021

If your application uses memory-mapped files, it counts towards "physical memory" used by its process, at least on Linux.
You can disable that check by setting the "org.bytedeco.javacpp.maxPhysicalBytes" system property to 0:
http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Pointer.html#maxPhysicalBytes
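For example (a sketch; the camel-case property name follows this thread and the linked Javadoc, and it must be set before the `Pointer` class is first loaded to take effect):

```java
public class DisableMemoryCheck {
    public static void main(String[] args) {
        // Must run before any JavaCPP Pointer class is loaded; equivalently,
        // pass -Dorg.bytedeco.javacpp.maxPhysicalBytes=0 on the java command line.
        System.setProperty("org.bytedeco.javacpp.maxPhysicalBytes", "0");
        System.out.println(System.getProperty("org.bytedeco.javacpp.maxPhysicalBytes"));
    }
}
```

Note that setting it to 0 only disables JavaCPP's check; the process can then still run out of physical memory for real, so this trades an early OutOfMemoryError for whatever the OS does under memory pressure.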

It looks like the amount used by files is available as the third value in /proc/self/statm, though.
We should probably subtract that quantity from the "physical memory" returned by JavaCPP.
What do you think?
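That third statm field could be read like this (a rough Linux-only illustration, not JavaCPP's actual code; the 4096-byte page size is a typical assumption):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StatmShared {
    /** Returns the third field of a /proc/self/statm line: resident shared pages. */
    static long sharedPages(String statmLine) {
        String[] fields = statmLine.trim().split("\\s+");
        return Long.parseLong(fields[2]);
    }

    public static void main(String[] args) throws Exception {
        Path statm = Paths.get("/proc/self/statm"); // Linux-only pseudo-file
        if (Files.exists(statm)) {
            long pageSize = 4096; // typical x86 page size; sysconf(_SC_PAGESIZE) is authoritative
            String line = new String(Files.readAllBytes(statm));
            System.out.println("shared bytes ~= " + sharedPages(line) * pageSize);
        }
    }
}
```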

@pwittchen
Author

pwittchen commented Mar 18, 2021

OK, thanks for the reply. In my project, the team is going to rework things to avoid this error, but it's good to know about such a property. Nevertheless, while I think the property will eliminate this exception, I suppose it may cause another one.

I'm not sure I'm able to answer the question in your last paragraph. I'd probably need to read more of the JavaCPP project's source code first.

@liufuyang

@pwittchen Just out of curiosity, did you happen to use ZGC in your application? 🤔

@pwittchen
Author

To be honest, I don't know what type of GC was running on the machine where I got this error. It was a production server, not my local machine, and I don't have direct access to its detailed configuration.

@saudet
Member

saudet commented May 6, 2021

As discussed in issue #474, something similar happens when using ZGC. In that case, though, the amount of "shared memory" reported by the kernel on Linux seems to be close to the amount of "resident memory", such that if we subtract the former from the latter, it looks like we are essentially subtracting the amount of memory used by the Java heap... We could try to figure out all the cases we care about, but since the OS itself isn't able to provide good numbers on its own, I don't think we'll be able to do much better in general.

Also, on Mac and Windows, the OS offers no estimate of the shared memory, and we have to compute it by iterating over all pages of the process manually, adding further overhead to memory allocation and increasing lock contention. This feels like a pretty hard nut to crack. In any case, we do not need to worry about this when not relying on GC, so this further strengthens the case for managing off-heap memory with something like PointerScope instead:
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/
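To illustrate the idea, here is a rough pure-Java mimic of scope-based deallocation. `NativeBuffer` and `Scope` are hypothetical stand-ins for illustration only, not JavaCPP's actual API (the real one is `org.bytedeco.javacpp.PointerScope` used in a try-with-resources block, to which Pointers created inside the block attach automatically):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for an off-heap allocation.
class NativeBuffer {
    boolean deallocated = false;
    void deallocate() { deallocated = true; } // would free native memory for real
}

// Hypothetical stand-in for PointerScope: releases everything on close.
class Scope implements AutoCloseable {
    private final Deque<NativeBuffer> attached = new ArrayDeque<>();
    NativeBuffer attach(NativeBuffer b) { attached.push(b); return b; }
    @Override public void close() {
        // Deterministic release, most recently attached first, no GC involved.
        while (!attached.isEmpty()) attached.pop().deallocate();
    }
}

public class ScopeDemo {
    public static void main(String[] args) {
        NativeBuffer b;
        try (Scope scope = new Scope()) {
            b = scope.attach(new NativeBuffer());
            // ... use b while the scope is open ...
        } // b.deallocate() runs here, deterministically
        System.out.println(b.deallocated); // prints "true"
    }
}
```

The point is that off-heap memory is freed when the scope exits, so its lifetime no longer depends on when (or whether) the garbage collector runs.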

/cc @Craigacp @karllessard @rnett

saudet added a commit that referenced this issue May 7, 2021
… part of documentation comments (issue #475)

 * Set `Pointer.maxPhysicalBytes` to `4 * Runtime.maxMemory()` by default as workaround for memory-mapped files, ZGC, etc (issue #468)
@saudet
Member

saudet commented May 7, 2021

I've doubled the default value to 4 * Runtime.maxMemory() in commit 343045d. That should keep users from getting spurious OutOfMemoryErrors with the default settings, in most circumstances...

@liufuyang

Aha, thanks for doing something about this. I can't comment on the solutions here, as I know too little about this domain. So, basically, you don't do PR + review when applying patches in this repo? I'm also wondering: is Bytedeco a company or a non-profit organisation? :)

@saudet
Member

saudet commented May 8, 2021

Ah, well, as the owner of this repository, I just push changes, since I don't have anyone to review them anyway... I do review patches from everyone else, though. Bytedeco isn't a company or a non-profit; I'm not sure what it is at the moment, but if I'm given the opportunity to make something out of it, I'm thinking along the lines of something like the "Anaconda of Java":
http://bytedeco.org/news/2018/07/17/bytedeco-as-distribution/

@saudet
Member

saudet commented Jul 10, 2021

As per issue bytedeco/javacv#1477 and many other similar threads, the same kind of thing happens on Android: even after setting "largeHeap" to "true" as per https://developer.android.com/guide/topics/manifest/application-element, Runtime.maxMemory() often returns values below 1 GB, which is way too small given the amount of memory on today's devices.

@saudet
Member

saudet commented Aug 3, 2021

I've released JavaCPP 1.5.6 with the workaround described above. Thanks for reporting! Please let me know though if you encounter any cases where the new default values still cause spurious failures.

@balgillo

balgillo commented Oct 1, 2021

Following up on our discussion from #516:

> That sounds like a good number to use on Mac, but I'm not sure how we'd go about getting the equivalent of that on Linux and Windows. Do you happen to know the definition of "physical footprint"? Also, on Windows, getting a better estimate involves a lot of processing, so I'm not sure it's worth the extra cycles spent on that. As imperfect as it may be, the "resident/working set" is at least a pretty well-defined number across platforms. Trying to use random numbers on different platforms is bound to cause problems.

We've dug a bit deeper to understand the difference between physical footprint and RSS on Mac better. The big discrepancy we are seeing on ARM-based Mac is mainly accounted for by a memory block called the "AOT shared cache file". It's described in this article: Reverse-engineering Rosetta 2 part2: Analyzing other aspects of Rosetta 2 runtime and AOT shared cache files. As you can see from that article, this block can be quite large in the vmmap output:

```
mapped file              7ffe963e4000-7fff20100000 [  2.2G  59.4M     0K     0K] r-x/r-x SM=COW          Object_id=8f689f0f
```

We also observe the same size on our M1 Mac. We think this block accounts for most of the 3-4 GB RSS we saw when running the same application on an M1 Mac under Rosetta 2, compared to 1 GB RSS on an x64 Mac. The footprint was similar (~750 MB) on both platforms, so footprint is a better reflection of the memory allocated by the app rather than by the platform.

The other difference between RSS and footprint is resident clean pages, which only count in RSS. It's explained nicely in this talk: iOS Memory Deep Dive. We saw about a 20% overhead from clean pages (RSS was 20% higher than footprint on an x64 Mac, and this was accounted for by the clean pages, as shown by the output from the footprint tool).

JavaCPP is currently using the RSS number on Mac to determine whether to request a GC and whether it can allocate more native memory. I think it would be better to monitor JavaCPP's allocations only, rather than try to detect overall process memory usage, which is fraught with difficulties, especially OS-driven overheads which don't correspond to any mallocs by application code. If you want to persist with measuring overall process memory, I still think that on Mac, footprint is a better measure than RSS, because it cuts out the non-malloc overheads. Perhaps you could count the malloc sections from the vmmap summary (MALLOC_LARGE, MALLOC_MEDIUM, MALLOC_SMALL and MALLOC_TINY) but reading footprint seems much simpler than doing that.

@saudet
Member

saudet commented Oct 1, 2021

That value sounds fine for Mac, and we can make that available as an option to users. Please do send a pull request with changes that include a call to that function!

However, until we have equivalents for other platforms, it's not something we can use by default. The default for maxPhysicalBytes is going to be larger than 4 GB anyway, so it shouldn't cause problems for most users.

@saudet
Member

saudet commented Jan 21, 2022

Revisiting this since simply increasing the maxPhysicalBytes isn't going to cut it, see qupath/qupath#856 (comment).

So, it sounds like "phys_footprint" includes swapped memory as well:
https://programmer.ink/think/learn-more-about-oom-low-memory-crash-in-ios.html
There's a similar-sounding "PrivateUsage" number on Windows:
https://docs.microsoft.com/en-us/windows/win32/api/psapi/ns-psapi-process_memory_counters_ex
I'm guessing that on Linux the equivalent number is RssAnon + VmSwap...
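On Linux, those two numbers appear as fields of /proc/self/status (reported in kB, per proc(5)). A sketch of summing them, as an illustration only, not JavaCPP's actual code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusFields {
    // Matches lines like "RssAnon:     5000 kB" in a /proc/self/status dump.
    private static final Pattern KB_FIELD =
            Pattern.compile("^(\\w+):\\s+(\\d+) kB", Pattern.MULTILINE);

    /** Sums the RssAnon and VmSwap fields of a /proc/self/status dump, in bytes. */
    static long anonPlusSwapBytes(String status) {
        long kb = 0;
        Matcher m = KB_FIELD.matcher(status);
        while (m.find()) {
            String name = m.group(1);
            if (name.equals("RssAnon") || name.equals("VmSwap")) {
                kb += Long.parseLong(m.group(2));
            }
        }
        return kb * 1024;
    }

    public static void main(String[] args) {
        String sample = "VmRSS:\t  5300 kB\nRssAnon:\t  5000 kB\nRssFile:\t  200 kB\nVmSwap:\t  100 kB\n";
        System.out.println(anonPlusSwapBytes(sample)); // (5000 + 100) kB = 5222400 bytes
    }
}
```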

What about creating a new number, like processBytes() or something, with these values from each platform, and setting a new maxProcessBytes threshold, while setting the default for maxPhysicalBytes to 0, which would allow users to keep using that threshold if they have set it manually to something else? @balgillo @petebankhead

saudet added a commit that referenced this issue Jan 23, 2022
@saudet
Member

saudet commented Jan 23, 2022

Nope, "PrivateUsage" is virtual memory, which isn't useful information. So, I've figured that we should just try to enhance physicalBytes() by taking away shared memory from it, as per commit 2616b0b, and see what that gives. It's not clear that adding swap memory to that figure would help, but we would for sure incur additional overhead on both Linux and Windows, so let's leave that out of the picture for now. On Mac, "phys_footprint" adds swap memory and other stuff to "internal" (which is what we're after for "physicalBytes"), so "internal" should be strictly smaller than "phys_footprint", and thus should work just fine for Rosetta.

@petebankhead Could you give it a try with QuPath on your M1 with Rosetta to see what that gives?

Now, on Windows, QueryWorkingSet() is quite a lot slower than GetProcessMemoryInfo(), but there doesn't appear to be any more efficient way to get that information. Things on Windows aren't fast in general anyway, so it might not matter, but if someone figures out something faster at some point, we can always update physicalBytes() at that time.

@saudet
Member

saudet commented Feb 10, 2022

It turns out that the slowness of QueryWorkingSet() is a big issue with moderately sized working sets, so I've added back the old code and mapped it to Pointer.physicalBytesInaccurate(), which now gets called first instead of Pointer.physicalBytes(); the latter only gets called when we really need it. It should work well in most cases, especially since I haven't heard of any Windows user suffering from this issue in the first place.

These changes have been released with JavaCPP 1.5.7, so I'll close this issue, but please let me know if you're still having problems like that. Thank you everyone for your time on this!

@b005t3r

b005t3r commented Feb 25, 2022

I just tried version 1.5.7 with ZGC and it crashed with an out-of-memory exception after running for a while, so I guess the issue is still there (I'm running it on macOS, btw).

Is there a known workaround for this? Like setting memory limits somewhere below the actual system memory amount?

@saudet
Member

saudet commented Feb 25, 2022

When using ZGC you might need to set the maximum to a higher value, yes, so try to increase the maximum and see what happens.

@b005t3r

b005t3r commented Feb 25, 2022

OK, I tried setting higher limits; it doesn't help. Memory usage always slowly increases, and the app crashes with the exception mentioned above.

@saudet
Member

saudet commented Feb 25, 2022

It sounds like you just have a memory leak. That's unrelated to ZGC. Find what isn't getting deallocated and deallocate it!

@b005t3r

b005t3r commented Feb 25, 2022

Well, I do, and it's in the JavaCPP Pointer class :)

This does not happen, if I use G1, though.

@saudet
Member

saudet commented Feb 25, 2022 via email

@b005t3r

b005t3r commented Feb 25, 2022

I think you've been discussing issues with ZGC and JavaCPP here; I'm just reporting back that those are still there. I'm not expecting a fix, since it looks like it's not fixable (judging from your previous comments).

@saudet
Member

saudet commented Feb 25, 2022 via email


5 participants