
Mat in opencv leaks off-heap memory and causes OutOfMemoryError #423

Closed
mythly opened this issue May 18, 2017 · 29 comments
mythly commented May 18, 2017

I'm using the OpenCV module of the JavaCPP Presets, and I found that the deallocator of Mat is not working properly.

I tested this by creating Mat objects without keeping references to them:

import org.bytedeco.javacpp.Pointer;
import org.bytedeco.javacpp.opencv_core;

public class MemoryTest {

    public static void main(String[] args) {
        for (int i = 0; i < 10000000; ++i) {
            opencv_core.Mat mat = new opencv_core.Mat(200, 100, opencv_core.CV_8UC3);
            if (i % 10000 == 0) {
                long maxPhysicalBytes = Pointer.maxPhysicalBytes();
                long physicalBytes = Pointer.physicalBytes();
                Runtime runtime = Runtime.getRuntime();
                long usedBytes = runtime.totalMemory() - runtime.freeMemory();
                System.out.printf("%d %d %d %d\n", i, maxPhysicalBytes, physicalBytes, usedBytes);
            }
        }
    }

}

The output:

0 932184064 56340480 10609104
10000 932184064 100937728 11909384
......
100000 932184064 500686848 22312256
......
180000 932184064 856911872 32505856
2017-05-18T19:29:44.556+0800: 0.574: [GC (Allocation Failure) [PSYoungGen: 31744K->5088K(36864K)] 31744K->22919K(121856K), 0.0309626 secs] [Times: user=0.07 sys=0.02, real=0.03 secs] 
190000 932184064 896753664 26030728
......
250000 932184064 913297408 33923344
2017-05-18T19:29:44.760+0800: 0.778: [GC (System.gc()) [PSYoungGen: 15913K->5104K(68608K)] 33744K->31671K(153600K), 0.0131685 secs] [Times: user=0.04 sys=0.03, real=0.02 secs] 
2017-05-18T19:29:44.773+0800: 0.791: [Full GC (System.gc()) [PSYoungGen: 5104K->0K(68608K)] [ParOldGen: 26567K->22753K(84992K)] 31671K->22753K(153600K), [Metaspace: 5864K->5864K(1056768K)], 0.1087988 secs] [Times: user=0.72 sys=0.00, real=0.11 secs] 
2017-05-18T19:29:44.982+0800: 1.000: [GC (System.gc()) [PSYoungGen: 0K->0K(68608K)] 22753K->22753K(153600K), 0.0023905 secs] [Times: user=0.02 sys=0.00, real=0.00 secs] 
......
2017-05-18T19:29:45.842+0800: 1.860: [Full GC (System.gc()) [PSYoungGen: 0K->0K(68608K)] [ParOldGen: 773K->773K(84992K)] 773K->773K(153600K), [Metaspace: 5864K->5864K(1056768K)], 0.0033012 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
Exception in thread "main" java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes = 913M > maxPhysicalBytes = 889M
	at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:562)
	at org.bytedeco.javacpp.Pointer.init(Pointer.java:121)
	at org.bytedeco.javacpp.opencv_core$Mat.allocate(Native Method)
	at org.bytedeco.javacpp.opencv_core$Mat.<init>(opencv_core.java:15395)
	at MemoryTest.main(MemoryTest.java:8)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Heap
 PSYoungGen      total 68608K, used 1655K [0x00000000f5980000, 0x00000000fa180000, 0x0000000100000000)
  eden space 63488K, 2% used [0x00000000f5980000,0x00000000f5b1dcf0,0x00000000f9780000)
  from space 5120K, 0% used [0x00000000f9780000,0x00000000f9780000,0x00000000f9c80000)
  to   space 5120K, 0% used [0x00000000f9c80000,0x00000000f9c80000,0x00000000fa180000)
 ParOldGen       total 84992K, used 773K [0x00000000e0c00000, 0x00000000e5f00000, 0x00000000f5980000)
  object space 84992K, 0% used [0x00000000e0c00000,0x00000000e0cc15a8,0x00000000e5f00000)
 Metaspace       used 5881K, capacity 6302K, committed 6656K, reserved 1056768K
  class space    used 640K, capacity 654K, committed 768K, reserved 1048576K

One can see that the Java heap is barely used (about 30 MB) and that the Mat objects are all collected by the GC.
However, the value of Pointer.physicalBytes() keeps increasing until it hits the Pointer.maxPhysicalBytes() limit (about 930 MB), and an OutOfMemoryError is finally thrown.

Is this a bug in the deallocator?

package version (I'm using sbt): "org.bytedeco.javacpp-presets" % "opencv" % "3.2.0-1.3" classifier "" classifier "linux-x86_64".
extra java options: -Xmx500m -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps


saudet commented May 18, 2017

If you need to allocate more Mat objects than that amount, either increase the Runtime.maxMemory() value (for example with the java -Xmx command-line option), or set the "org.bytedeco.javacpp.maxphysicalbytes" system property:
http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Pointer.html#maxPhysicalBytes
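For example, both limits can be raised when launching the test; the values below are purely illustrative, and the system property must be set on the command line so it takes effect before any Pointer subclass is loaded:

```shell
# Hypothetical limits; tune -Xmx and maxphysicalbytes to your workload.
java -Xmx2g -Dorg.bytedeco.javacpp.maxphysicalbytes=4g MemoryTest
```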


mythly commented May 19, 2017

@saudet Thanks for your reply.

My point is: I didn't keep references to the Mat objects I allocated, so their memory, both on the Java heap and off-heap, should have been deallocated. But the test shows that the off-heap memory is not released correctly.

@mythly mythly changed the title creating some instances of Mat in opencv throws OutOfMemoryError Mat in opencv leaks off-heap memory and causes OutOfMemoryError May 19, 2017

saudet commented May 19, 2017

I see, does it help if you increase http://bytedeco.org/javacpp/apidocs/org/bytedeco/javacpp/Pointer.html#maxRetries (via the "org.bytedeco.javacpp.maxretries" system property) to something like 1000?


mythly commented May 19, 2017

I tried that, but it didn't help.

The GC log shows that the Java heap is essentially empty after the third full GC:

2017-05-19T10:56:47.596+0800: 1.129: [Full GC (System.gc()) [PSYoungGen: 0K->0K(68608K)] [ParOldGen: 4907K->773K(84992K)] 4907K->773K(153600K), [Metaspace: 5887K->5887K(1056768K)], 0.0051643 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

I double-checked this using jmap -J-d64 -histo <pid>, which shows no remaining org.bytedeco.javacpp.opencv_core.Mat or org.bytedeco.javacpp.Pointer$NativeDeallocator objects live at that moment.


saudet commented May 19, 2017

So manually calling Mat.deallocate() right after every allocation also ends up running out of memory?


mythly commented May 19, 2017

Manually calling Mat.deallocate() right after every allocation solves the out-of-memory problem.

It is also the workaround we are currently using in our application.
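A minimal sketch of that workaround, assuming the same javacpp-presets opencv dependency as the test at the top (the class name ManualDeallocateTest is just illustrative):

```java
import org.bytedeco.javacpp.opencv_core;

public class ManualDeallocateTest {

    public static void main(String[] args) {
        for (int i = 0; i < 1000000; ++i) {
            opencv_core.Mat mat = new opencv_core.Mat(200, 100, opencv_core.CV_8UC3);
            // ... use mat ...
            mat.deallocate(); // release the native memory now instead of waiting for GC
        }
    }

}
```

With the explicit deallocate() call, Pointer.physicalBytes() stays roughly flat instead of climbing toward the maxPhysicalBytes limit.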


saudet commented May 19, 2017

Strange, the deallocator thread basically just calls Mat.deallocate() on each object...


mythly commented May 19, 2017

I think I found the problem.

The deallocator in the JNI code uses delete to release the off-heap memory:

static void JavaCPP_org_bytedeco_javacpp_opencv_1core_00024Mat_deallocate(void *p) { delete (::cv::Mat*)p; }

and JavaCPP uses the resident set size to estimate how much off-heap memory is in use:

/** Returns the amount of physical memory currently used by the whole process, or 0 if unknown.
 *  Also known as "resident set size" (Linux, Mac OS X, etc) or "working set size" (Windows). */
@Name("JavaCPP_physicalBytes") public static synchronized native long physicalBytes();

However, as discussed in this link, delete does not return the memory to the operating system immediately, which means the resident set size does not decrease right after System.gc().
That's why JavaCPP reports that physical memory usage is too high, even after many Mat objects have been deleted.

On Linux, a solution could be to call malloc_trim(0) after System.gc() in:

while (count++ < maxRetries && ((maxBytes > 0 && DeallocatorReference.totalBytes + r.bytes > maxBytes) || (maxPhysicalBytes > 0 && (lastPhysicalBytes = physicalBytes()) > maxPhysicalBytes))) {
  // try to get some more memory back
  System.gc();
  Thread.sleep(100);
}

@saudet saudet added bug and removed question labels May 19, 2017

saudet commented May 19, 2017

That's interesting, thanks for investigating this!


saudet commented May 20, 2017

The behavior of those calls is pretty complex. Have you been able to confirm that a call to malloc_trim(0) helps?


mythly commented May 20, 2017

Yes, I have tested it on my PC: Ubuntu 14.04, x86_64, glibc 2.19, g++ 4.8.2, OpenCV 2.4.13.

The test code:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>

#include <opencv2/core/core.hpp>
using namespace cv;

const int MAXN = 10000;

void info(int iter)
{
    // Print the /proc/self/statm line: size resident shared text lib data dt (in pages)
    int fd = open("/proc/self/statm", O_RDONLY, 0);
    if (fd >= 0) {
        char line[256];
        ssize_t n = read(fd, line, sizeof(line) - 1);
        if (n > 0) {
            line[n] = '\0';
            printf("%d: %s\n", iter, line);
        }
        close(fd);
    }
}

int main()
{
	Mat* mats[MAXN];
	for (int i = 0; i < MAXN; ++i) {
		mats[i] = new Mat(200, 100, CV_8UC3);
		if (i % 1000 == 0) {
			info(i);	
		}
	}
	for (int i = 0; i < MAXN; ++i) {
		delete mats[i];
		if (i % 1000 == 0) {
			info(i);	
		}
	}
	info(MAXN + 1);
	malloc_trim(0);
	info(MAXN + 2);
		
	return 0;
}

Compile and run:

g++ -g -O2 --std=c++11 test.cpp -o test `pkg-config --cflags --libs opencv`
./test

The output:

0: 6910 495 435 2 0 116 0
1000: 21587 2156 1019 2 0 14793 0
2000: 36304 3212 1019 2 0 29510 0
3000: 50976 4268 1019 2 0 44182 0
4000: 65649 5324 1019 2 0 58855 0
5000: 80366 6380 1019 2 0 73572 0
6000: 95039 7436 1019 2 0 88245 0
7000: 109712 8492 1019 2 0 102918 0
8000: 124429 9548 1019 2 0 117635 0
9000: 139101 10670 1019 2 0 132307 0
0: 153774 11726 1019 2 0 146980 0
1000: 153774 11726 1019 2 0 146980 0
2000: 153774 11726 1019 2 0 146980 0
3000: 153774 11726 1019 2 0 146980 0
4000: 153774 11726 1019 2 0 146980 0
5000: 153774 11726 1019 2 0 146980 0
6000: 153774 11726 1019 2 0 146980 0
7000: 153774 11726 1019 2 0 146980 0
8000: 153774 11726 1019 2 0 146980 0
9000: 153774 11726 1019 2 0 146980 0
10001: 153774 11726 1019 2 0 146980 0
10002: 6882 1115 1019 2 0 88 0

You can see that the number of resident pages stays at 11726 throughout the deletion of all 10000 Mat objects.
After calling malloc_trim(0), the number of resident pages immediately drops to 1115.

However, this behavior depends on the glibc implementation and should be tested in more environments if possible.


saudet commented May 20, 2017 via email


saudet commented May 21, 2017

I tested it and it still releases most of the memory, interesting. What would be the equivalent on Mac OS X and Windows? Does Android implement it?


mythly commented May 22, 2017

You are worried about the hole problem, i.e. you want to know: "what happens if we forget to delete the Mat with the highest address?"

I tested it too, and in that case calling malloc_trim(0):

  1. does not change the number of virtual pages (the first integer in /proc/self/statm)
  2. does not change the location of the program break (which can be obtained with sbrk(0))
  3. does decrease the number of resident pages (the second integer in /proc/self/statm), similar to what you tested

As it claims here, malloc_trim can release only the free (virtual) memory at the top of the heap, which matches observations 1 & 2. However, to the best of my knowledge, there is no documentation covering observation 3. So I can only regard "releasing most of the (physical) memory, even memory not at the top of the heap" as a side effect of malloc_trim(0). Some discussion can be found here.

Whether to use this magic trick is your decision.

As for other platforms, should we try to reproduce this bug there first?


saudet commented May 28, 2017

Ok, I verified that the same thing happens on Mac (though what is the equivalent of malloc_trim(0) there?), but it does not happen on Windows. However, Windows has other issues: when too much memory gets allocated, deallocation becomes very slow, so we shouldn't be allocating more memory than we really need anyway...


mythly commented Jun 6, 2017

What's your Mac OS version?

I tested (with the code I posted at the very beginning) on OS X Yosemite 10.10.5 and found no OutOfMemoryError. When JavaCPP reaches the maxPhysicalBytes() bound and calls System.gc();, the value of physicalBytes() does decrease, from 500 MB to 400 MB.

This link briefly describes the memory management of Mac OS, and things there are quite different from Linux. I have no idea whether an equivalent of malloc_trim(0) exists on Mac.


saudet commented Jun 6, 2017

Yosemite here as well. I only tried your second "testing code" though. When things get called across threads, maybe it acts differently?


mythly commented Jun 6, 2017

Yes, the second "testing code" shows that the physical memory is not immediately freed after delete on Mac either. But then, the behavior on different platforms and in different use cases is complicated and beyond my knowledge. It would be nice if an operating system expert could offer some suggestions.


youzp commented Jun 21, 2017

@mythly In the case of highly concurrent, multi-threaded requests, should calls to malloc_trim(0) be protected by a std::mutex? And does malloc_trim(0) need to be called at the end of each request?


mythly commented Jun 22, 2017

@youzp malloc_trim is thread-safe, and calling malloc_trim(0) after System.gc() would be the better choice.

saudet added a commit to bytedeco/javacpp that referenced this issue Jul 9, 2017

saudet commented Jul 9, 2017

Finally got around to adding the call. Let me know if you still have issues with that! Thanks


saudet commented Jul 27, 2017

Fix released in JavaCPP 1.3.3. Thanks for reporting and testing this out!


mcloh commented Nov 4, 2017

Hi guys, I am still having this issue... https://stackoverflow.com/q/47112338/2046728

java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes = 1G > maxPhysicalBytes = 1G
org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:576)

I already downloaded the latest version of JavaCPP (1.3.3), but the behavior stayed the same :-/


saudet commented Nov 5, 2017

What's your platform?


mcloh commented Nov 6, 2017

Linux (Ubuntu 16.04 64bits)


saudet commented Nov 6, 2017

You're probably just corrupting your memory. For example, you're calling cvClearMemStorage(storage) before accessing that region of memory...


mcloh commented Nov 10, 2017

@saudet, so you are saying that I shouldn't call cvClearMemStorage(storage) there? My understanding is that it clears the storage before the loop of cvGetSeqElem(...) that places the rectangles in the image. Even if I move cvClearMemStorage(...) before or after everything, memory still accumulates... what would be the best practice here, then?


saudet commented Nov 10, 2017 via email


mcloh commented Nov 13, 2017

Hi @saudet ,
I've rewritten my code to something like the sample; it works basically the same way. Monitoring the physical memory, it increases over time even though the JVM heap size stays under control, and after about 650 image analyses it crashes with the same error.
Currently I am working around this by running the OpenCV part in a separate process: ObjectDetection is a separate executable jar that my application invokes on the command line, so it runs in a separate JVM with a much shorter lifecycle. The side effect is performance, of course, but memory is no longer a problem. My application used to analyze 8 to 10 images per second (crashing after a minute or so) with the previous design; with this workaround it performs 3 to 5 analyses per second, but stays up and running.
