Excessive memory usage after repeated FFI calls to clEnqueueReadBuffer #572

Closed
pgmatg opened this issue Apr 5, 2020 · 1 comment

pgmatg commented Apr 5, 2020

I need help understanding why my program keeps leaking memory after repeated calls to clEnqueueReadBuffer from the OpenCL API through FFI. It happens on the 2.1 branch of LuaJIT, on Windows builds made with various versions of Microsoft Developer Studio, but not on LuaJIT 2.0.
Any suggestions or explanations would help; I'm at a loss.
"cl-demo.lua noimg 10" is a very consistent reproduction that depends only on OpenCL.
cl-demo.lua
requires cl.lua

I run it with Dr. Memory:
Windows version: WinVer=105;Rel=1903;Build=18362;Edition=Core
Dr. Memory results for pid 26048: "luajit.exe"
Application cmdline: "d:\dev\luajit-2.1.0\luajit.exe e:\eda3\source\cl-demo.lua"
''
results.txt


pgmatg commented Apr 17, 2020

I have created a workaround for various problems my program had running on Windows 10 and 8.1 with the latest updates from Microsoft.
First, to prevent abrupt, unexplained LuaJIT terminations when running various versions of my Lua code, I rebuilt LuaJIT 2.1 under Microsoft Developer Studio 2008 with alternative optimization options (/Ox /Ot). I was able to use similar compile options under the 2015 & 2020 versions of MsDev, but had to add /guard:cf /D_CRTDBG_MAP_ALLOC, which made execution up to 32% slower and still left some very weird sporadic aberrations.
To combat the memory leak (over 100 MB per fractal image generation), I had to add collectgarbage() after every CL kernel program completion. I also release, free, and recreate the memory buffer for every queue of results from OpenCL code execution. This solves most of the memory problems but slows execution by 19% to 41%, depending on the size and complexity of the formulas being run.
Added code:
clEnqueueReadBuffer(commands, output[jb], cl.CL_TRUE, 0, ressize, results, 0, nil, nil)
...
clReleaseMemObject(output[jbo]);
output[jbo] = nil;
output[jbo] = ffi.gc(clCreateBuffer(context, cl.CL_MEM_WRITE_HOST_PTR, ressize), ffi.C.free);
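
One thing that may be worth checking in the snippet above: in the OpenCL C API a buffer handle returned by clCreateBuffer is released with clReleaseMemObject rather than free, and a finalizer attached with ffi.gc still runs later even if the handle was already released by hand. The following is a minimal sketch of a release-once pattern, not the code from cl.lua. It assumes malloc and free as stand-ins for the OpenCL calls so that it runs without an OpenCL installation; create_buffer, release, and release_now are hypothetical names.

```lua
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- Counts how many times the underlying resource was actually released.
local released = 0

-- Hypothetical stand-in for clReleaseMemObject (assumption: free is used
-- here only so the sketch runs without OpenCL).
local function release(ptr)
  released = released + 1
  ffi.C.free(ptr)
end

-- Hypothetical stand-in for clCreateBuffer: allocate, then attach the
-- matching release function as the finalizer.
local function create_buffer(size)
  local ptr = ffi.C.malloc(size)
  assert(ptr ~= nil, "allocation failed")
  return ffi.gc(ptr, release)
end

-- Release immediately. The finalizer is detached first so the collector
-- does not release the same resource a second time later on.
local function release_now(buf)
  ffi.gc(buf, nil)
  release(buf)
end

local buf = create_buffer(1024 * 1024)
release_now(buf)
buf = nil
collectgarbage()

-- The resource was released exactly once, by release_now.
assert(released == 1)
```

With the real API the same shape would presumably pass clReleaseMemObject as the finalizer. Whether a mismatched or repeated release accounts for the behaviour reported here is not established; the sketch only shows how to keep manual release and ffi.gc finalizers from overlapping.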

There are still some memory issues when I run my program with OpenGL 3D output using the IUP libraries, mostly because garbage collection does not keep up, but I can probably solve that by using 4 parallel threads that share the same context and program space but use separate kernels.

Still, any suggestions, explanations, or corrections would be greatly appreciated.
