ISSUE WITH GPU ACCELERATION AND LOW MEMORY VIDEO CARDS #88

Closed
apprehensivemom opened this issue May 26, 2023 · 22 comments

@apprehensivemom

MULTIPAR v1.3.2.7
WINDOWS 7 Ultimate, 32GB RAM, processor Intel Core I7 7700K
VIDEO CARD NVIDIA GEFORCE GTX760

Running with these options:
SSE3, CLMUL, SSE2 ENABLED
GPU acceleration ENABLED

processing a file of about 462 Mbytes or less RUNS OK (even with GPU enabled)
processing a file of about 471 Mbytes or more GIVES ME THIS ERROR:

Computing file hash
Creating recovery data
Error: 0.0% : Creating recovery slice
If I disable GPU acceleration, the recovery completes with no errors.

I think this issue is related to low memory in my (obsolete) video card.
Is there a way to detect how much memory MultiPar can use?
If file size > GPU memory, it could do something or at least give the user a warning.

Another solution would be:
when the program runs with GPU enabled (almost always, because it is faster),
in case of this specific GPU error, give a warning, disable GPU and rerun the data creation.

Thank you

@Yutaka-Sawada
Owner

Thank you for the bug report.

processing a file of about 462 Mbytes or less RUNS OK (even with GPU enabled)
processing a file of about 471 Mbytes or more GIVES ME THIS ERROR:
I think this issue it is related to low memory in my (obsolete) video card.

It seems that the GPU function doesn't work on your PC. To begin with, par2j uses the GPU function only when the data size is larger than 512 MB. So the GPU function may fail for smaller data sizes, too.

When it fails to start the GPU, it should resume the process without the GPU. There may be a bug somewhere in my code. It didn't work as intended on your PC. I might have made a mistake or forgotten to handle a rare case.
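
The intended behavior described above (try the GPU, resume on the CPU if it cannot start) can be sketched as follows. This is only an illustration of the pattern, not par2j's code: `GpuUnavailable`, `encode_on_gpu`, `encode_on_cpu` and `create_recovery_data` are hypothetical names, and the GPU stand-in always fails to simulate a broken driver.

```python
class GpuUnavailable(Exception):
    """Raised by the hypothetical GPU path when the device cannot be started."""


def encode_on_gpu(data: bytes) -> str:
    # Stand-in for a GPU encoder (assumption). It always fails here,
    # simulating a missing or broken OpenCL driver.
    raise GpuUnavailable("no usable OpenCL GPU device")


def encode_on_cpu(data: bytes) -> str:
    # Stand-in for the CPU encoder (assumption).
    return f"cpu:{len(data)}"


def create_recovery_data(data: bytes, gpu_enabled: bool) -> str:
    """Try the GPU path first and resume on the CPU if it cannot start."""
    if gpu_enabled:
        try:
            return encode_on_gpu(data)
        except GpuUnavailable as err:
            print(f"GPU start failed ({err}); continuing with CPU")
    return encode_on_cpu(data)


print(create_recovery_data(b"abcd", gpu_enabled=True))
```

A fallback like this only helps when the failure is reported as an error. A crash inside the process, such as a memory access violation, ends the program before any fallback can run.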

At this time, I don't know what is wrong. There are two GPU functions (for SSD or HDD) in par2j. There are two OpenCL functions (for discrete GPU or integrated GPU). Because I don't have a graphics board in my PC now, it's hard to test the behavior of a discrete GPU. Though I used a GeForce GT 240 to implement the OpenCL function in the past, it's slower than the integrated GPU on a current i5 CPU.

If you have time to test, I can make a debug version to show OpenCL details. MultiPar saves a log file (MultiPar.log) when an error occurs, or when the item is checked in the MultiPar option window. It is in the "save" folder in MultiPar's install folder. Sending the log file to me by e-mail may help me see how the error occurs.

@apprehensivemom
Author

I am ready to help you test debug versions. Write to apprehensivemom AT (remove AT) gmx.com
attached: Multipar.log
MultiPar.log
In this test, GPU acceleration is enabled. A file of 489.053.258 bytes gives me the error, and a file of 487.989.890 bytes completes successfully (and therefore it's not in MultiPar.log).

@Yutaka-Sawada
Owner

I made par2j with debug output. While I tested the behavior of some settings, I found that the GPU function didn't work with MMX. Though it doesn't affect your case (SSSE3), I fixed the bug. At least your post gave me a chance to find one problem. Thank you.

I forced the sample to always use the GPU for debugging. It will try to use the GPU even for small data. You may compare it with the previous smaller data, which succeeded.

I put the debug version (par2j_debug_2023-05-27.zip) in the "MultiPar_sample" folder on OneDrive. Please test with it. Though it will fail again, it should give some information about what went wrong. As I found a bug today, there may be another bug.

@apprehensivemom
Author

OK. Now EVERY file I try fails with that error.
Even a 29KB file fails!
Attached: Multipar.log
MultiPar.log

@Yutaka-Sawada
Owner

Thank you for the tests. From the log, it failed before starting the GPU, or it crashed before printing anything. The error might be sudden. If the C runtime library cannot return an error code, it might be a hardware error, such as an unsupported CPU instruction.

Do you see any error message in Event Viewer? It is under "Administrative Tools" in the Control Panel of Windows 7. It may save information about the application error, such as an error code. The application name is "par2j64.exe".

@apprehensivemom
Author

apprehensivemom commented May 27, 2023

Yes, a lot of Error events all very similar, all id=1000
This is the last one

err1.txt

It looks like error 0xc0000005 might be a hard disk problem, but I ran chkdsk and there are no errors.
Now even if I disable GPU from the MultiPar panel, it gives me the error
(with the previous release, disabling GPU solved the problem)

@Yutaka-Sawada
Owner

The debug version always tries to enable the GPU, even when it's not set in the options. Error 0xc0000005 (memory access violation) is a common problem. I might have forgotten to set an address in a pointer somewhere. Or it could not find OpenCL on your PC.
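
For readers who meet such a code in Event Viewer, the value 0xc0000005 can be split into the standard NTSTATUS bit fields (severity in the top two bits, then the customer bit, a 12-bit facility and a 16-bit code). The sketch below only decodes the number; `decode_ntstatus` is a hypothetical helper, not part of MultiPar or Windows.

```python
def decode_ntstatus(code: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    severity_names = ["success", "informational", "warning", "error"]
    return {
        "severity": severity_names[(code >> 30) & 0x3],  # bits 31-30
        "customer": bool((code >> 29) & 0x1),            # bit 29
        "facility": (code >> 16) & 0xFFF,                # bits 27-16
        "code": code & 0xFFFF,                           # bits 15-0
    }


info = decode_ntstatus(0xC0000005)
print(info)
```

For 0xc0000005 this gives severity "error", facility 0 and code 5, which is the access violation status mentioned above. It describes a bad memory access inside the process and says nothing by itself about the hard disk.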

I made a new sample that shows more steps, so I can see which step it reaches. I put the new one (par2j_debug_2023-05-27a.zip) in the "MultiPar_sample" folder on OneDrive. Please test with it. It will fail also, but it may print more step information. I want to narrow down the range of possible bad positions.

By the way, have you used other OpenCL applications before? Normally, the GeForce driver includes a driver for OpenCL. There may be GeForce GPU information about the supported OpenCL version.

@apprehensivemom
Author

apprehensivemom commented May 27, 2023

You were right!! My NVIDIA driver was the problem.
If I run the program with the old driver (347.25-desktop-win8-win7-winvista-64bit-international-whql.exe) there's ERROR
then I downloaded a new driver (474.30-desktop-win8-win7-64bit-international.exe) and IT WORKS
As a side effect of the new driver, my 3840x2160 monitor resolution (that was working before the driver change),
now is reduced to max 1600x1200 and I have to find out why

attached: Multipar.log BEFORE and AFTER the driver change.

MultiPar.log

UPDATE:
a little Nvidia driver downgrade to 474.12-desktop-win8-win7-64bit-international.exe and resolution is back to 3840x2160 AND your program WORKS :) :) :)

@Yutaka-Sawada
Owner

AND your program WORKS :) :) :)

No, the GPU function doesn't work yet. I'm sorry. The log of the successful operation stated "Available GPU device was not found.". It detected the Intel CPU's OpenCL driver only. So something seems to be wrong with the OpenCL driver for the GeForce GPU. Because it could not find the GPU as an OpenCL device, it switched to the CPU function automatically. It didn't actually use the GPU.

Do you have a GPU detection tool like "GPU-Z"? It can show supported OpenCL information, too. I'm interested in how another application recognizes OpenCL devices on your PC. If another one also fails to find the GPU as an OpenCL device, there is a problem in the OpenCL driver. If another one finds the GPU correctly, I might have made a mistake somewhere.
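
The situation in that log (only a CPU-type OpenCL device was listed, so the GPU path was skipped) can be modeled with a small sketch. It does not call OpenCL; the `Device` class, the `pick_gpu` function and the device names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    kind: str  # "GPU" or "CPU", as an OpenCL driver would report the device type


def pick_gpu(devices: List[Device]) -> Optional[Device]:
    """Return the first GPU-type device, or None if the drivers expose none."""
    for device in devices:
        if device.kind == "GPU":
            return device
    return None


# Hypothetical device list: only a CPU-type OpenCL device is visible.
only_cpu = [Device("Example CPU OpenCL device", "CPU")]
chosen = pick_gpu(only_cpu)
print("use GPU" if chosen else "Available GPU device was not found; use CPU")
```

With a working GeForce OpenCL driver, a GPU-type entry would be expected in the list and `pick_gpu` would return it.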

@apprehensivemom
Author

Gpu-Z
ggg

@Yutaka-Sawada
Owner

Thank you for the information. The NVIDIA GeForce GTX 760 driver does support OpenCL. There is a tab menu at the top of the Advanced panel. The selected item is General at first. Select OpenCL to see the details. If the listed OpenCL information is the same as in the MultiPar.log file, it will be OK. But my par2j failed on your PC at this time.

There may be a problem in my source code for detecting the OpenCL device. I found a possible fault in my code. It did not check for a NULL pointer on error, which might cause a "memory access violation". But it would not happen normally. Anyway, I fixed the mistake.

I put the fixed one (par2j_debug_2023-05-28.zip) in "MultiPar_sample" folder on OneDrive. Please try with it.

@apprehensivemom
Author

apprehensivemom commented May 28, 2023

OK. I reinstalled from release MultiPar1327_setup.exe in default location
C:\Program Files (x86)\MultiPar
and I replaced par2j64.exe with your debug version and enabled debug in the client.
Here is the savefile, located in C:\Users\user\AppData\Roaming\MultiPar\save\Multipar.log
MultiPar.log

and here are the two additional GPU-Z images

t1
and
t

Hope it helps you :)

@Yutaka-Sawada
Owner

Oh, I'm sorry, I forgot to write a note. In the new sample, I enabled the GPU option to compare speed on my PC. You need to check "Enable GPU acceleration" to test the GPU function. The source file size should be larger than 32 MB.

@apprehensivemom
Author

Yes, I forgot to re-enable the GPU checkbox when I reinstalled everything. Now it is enabled.
GPUenabled-MultiPar.log

@Yutaka-Sawada
Owner

The last tests didn't use the GPU. Because the GPU needs a few seconds to start, par2j uses the GPU for large data only. I set some restrictions as below. Please set some redundancy to get more recovery blocks.

Threshold to use GPU:
Block size must be larger than 64 KB.
Number of source blocks must be more than 256.
Number of parity blocks must be more than 32.
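
The three restrictions listed above can be written as one check. This is a minimal sketch, assuming 64 KB means 64 * 1024 bytes; `gpu_thresholds_met` is a hypothetical name, and par2j may apply further conditions (such as the overall data size) that are not modeled here.

```python
def gpu_thresholds_met(block_size: int, source_blocks: int, parity_blocks: int) -> bool:
    """Return True only if all three listed thresholds are exceeded."""
    return (
        block_size > 64 * 1024   # block size larger than 64 KB (assumed 1 KB = 1024 bytes)
        and source_blocks > 256  # more than 256 source blocks
        and parity_blocks > 32   # more than 32 parity (recovery) blocks
    )


# Enough source blocks, but too few recovery blocks (low redundancy): GPU not used.
print(gpu_thresholds_met(block_size=1024 * 1024, source_blocks=2000, parity_blocks=10))
# Same data with more recovery blocks: all thresholds are met.
print(gpu_thresholds_met(block_size=1024 * 1024, source_blocks=2000, parity_blocks=200))
```

This is why a run with no redundancy set does not exercise the GPU even when "Enable GPU acceleration" is checked.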

@apprehensivemom
Author

apprehensivemom commented May 28, 2023

Yes, when MultiPar was re-installed I forgot to set the redundancy parameter.
Now the files are big enough: 9.12 GB and 4.56 GB
BIG-MultiPar.log

@Yutaka-Sawada
Owner

Thank you for testing many times. It seems to work well at last. Its GPU detection showed the same information as GPU-Z. The GPU function finished creation without error.

Though I fixed two bugs in my OpenCL code, I'm not sure that it solved this problem. Re-installing the graphics board driver might have helped, too. To confirm the calculation result, you may compare the results with GPU enabled and disabled. In most cases, an NVIDIA GPU is faster than Intel's integrated GPU or an AMD GPU.
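
One way to do the comparison suggested above is to hash the recovery files from both runs. This sketch assumes that two runs with identical settings are expected to produce byte-identical files, which the thread does not state explicitly; the file names are placeholders and `sha256_of` and `same_content` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large recovery files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(first: Path, second: Path) -> bool:
    """Return True if both files have identical contents."""
    return sha256_of(first) == sha256_of(second)


# Placeholder paths: one recovery file created with GPU enabled, one with it disabled.
gpu_file = Path("gpu_run/example.vol000+100.par2")
cpu_file = Path("cpu_run/example.vol000+100.par2")
if gpu_file.exists() and cpu_file.exists():
    print("identical" if same_content(gpu_file, cpu_file) else "different")
```

Verifying the GPU-created recovery files against the source files in MultiPar is another check that does not rely on that assumption.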

When I tested with the integrated GPU in my CPU (Core i5 10400), the GPU function was mostly slower than the CPU function. I use the GPU function only for testing the behavior. So I un-check "Enable GPU acceleration" on my PC.

As it works now, I removed debug output from par2j. I put the sample (par2j_sample_2023-05-29.zip) in the "MultiPar_sample" folder on OneDrive. If someone wants to see debug output, I will keep the old debug version (par2j_debug_2023-05-28.zip) there for a while. The latest par2j (MultiPar package and source code) is available on GitHub, too.

par2j selects GPU or CPU automatically by data size, even when a user checks "Enable GPU acceleration". It's difficult to see actual GPU usage. When the GPU function is used, there is one line in the log like the following:
OpenCL : NVIDIA GeForce GTX 760, OpenCL 3.0 CUDA, 256*6
OpenCL : Intel(R) UHD Graphics 630, OpenCL 3.0 NEO , 256*23
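
A log can be scanned for such a line to tell whether the GPU function was used. The sketch below assumes the line always has the three comma-separated fields seen in the two examples above. The thread does not explain the last field (for example 256*6), so it is kept as raw text; `find_opencl_line` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches lines shaped like: "OpenCL : <device>, <version>, <extra>"
OPENCL_LINE = re.compile(
    r"^OpenCL\s*:\s*(?P<device>[^,]+),\s*(?P<version>[^,]+),\s*(?P<extra>.+)$"
)


def find_opencl_line(log_text: str) -> Optional[dict]:
    """Return the fields of the first OpenCL device line, or None if absent."""
    for line in log_text.splitlines():
        match = OPENCL_LINE.match(line.strip())
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    return None


sample = "Creating recovery data\nOpenCL : NVIDIA GeForce GTX 760, OpenCL 3.0 CUDA, 256*6\n"
result = find_opencl_line(sample)
print(result["device"] if result else "GPU function was not used")
```

If no such line is found, the run most likely used the CPU function, as in the earlier logs in this thread.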

@apprehensivemom
Author

OK. Today I will completely remove and reinstall the graphics driver using DDU, also because I previously had an issue with my 4K monitor resolution, and I will try again to install the latest NVIDIA driver. Then, using the par2j debug version, I will do more tests and write here what happened.

By the way, it would be nice to know whether the program is using the GPU or not. One idea would be the color of the progress bar: red for GPU, blue for the normal processor (though changing the Windows style could change it), or just a text box with "GPU" that becomes visible or invisible depending on whether the GPU is active. OK, I'm going to work, see you later.

@apprehensivemom
Author

OK, here I am again. I uninstalled every driver with DDU and tried again to install the most recent driver, 474.30, but no luck:
there was a missing signature in the driver, and unfortunately the driver is no longer updated by NVIDIA.
474-30
This is a known issue, look_here, so I downloaded the previous driver, 474.11, but unfortunately my AnyDesk goes white-screen, and I also had some blue screens 0x0000124 when trying different drivers. In the end I had to put back the 472.12 driver that was working before. Now it is OK. Maybe in the next days I will find a better (more recent) driver that works for me.
Anyways, I did the tests again and here is the result, it looks ok doesn't it?
MultiPar.log

@Yutaka-Sawada
Owner

Anyways, I did the tests again and here is the result, it looks ok doesn't it?

Yes, it uses GPU.

I see that NVIDIA GeForce supports OpenCL 3.0 now. I added clEnqueueMigrateMemObjects (from OpenCL 1.2) to my code. It may start the copy before calculation: it tries to copy data from the PC's RAM to the GPU's VRAM while the CPU calculates something else. If the GPU is sufficiently faster than the CPU, it may improve GPU start-up speed. I'm not sure whether the feature is worth using. I made a sample with debug output that includes the clEnqueueMigrateMemObjects implementation. I put the sample (par2j_debug_2023-05-30.zip) in the "MultiPar_sample" folder on OneDrive. Anyone interested in the speed difference may compare them (par2j_debug_2023-05-28.zip and par2j_debug_2023-05-30.zip). But I feel that it will be hard to see a noticeable difference. Normally, memory copy isn't a big wait in encoding.
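
The expected gain from starting the copy early can be pictured with a toy timing model. It is an arithmetic illustration only, not a measurement of par2j or of clEnqueueMigrateMemObjects; `startup_time` and the example durations are made up.

```python
def startup_time(copy_s: float, cpu_other_s: float, overlapped: bool) -> float:
    """Toy model of the time until the GPU can start calculating.

    Without overlap, the RAM-to-VRAM copy waits for the CPU's other work.
    With overlap, both proceed at once and the slower one decides.
    """
    if overlapped:
        return max(copy_s, cpu_other_s)
    return copy_s + cpu_other_s


# Hypothetical durations in seconds.
copy_s, cpu_other_s = 0.25, 0.5
print(startup_time(copy_s, cpu_other_s, overlapped=False))  # 0.75
print(startup_time(copy_s, cpu_other_s, overlapped=True))   # 0.5
```

In this model the saving can never exceed the copy time itself, which is consistent with the remark above that a small copy cost leaves little to gain.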

@apprehensivemom
Author

OK, I did the "homework". I tried par2j_debug_2023-05-30.
Here are the tests on my old GTX760
31-5-2023-GTX760-W7-MultiPar.log

I also tried par2j_debug_2023-05-30 on my other computer, an Intel 6700K
running W10 with an NVIDIA RTX 4070
31-5-2023-RTX4070-W10-MultiPar.log

No problems detected.
If you don't need more tests (I'm very happy to help you) I think you can close the issue.
The only little thing I noticed is that the "normal user" doesn't
have any clue (a visible change in the display interface) if the program decides to
use GPU or not (well, of course there are the logs and the speed difference
between GPU and processor)

@Yutaka-Sawada
Owner

No problems detected.

Oh, I see. Thank you for your help. I fixed some bugs, though I'm not sure that they solved the problem. There might have been a problem in the graphics board driver. Anyway, it works well on your PC now. That's good enough. I will close this issue later.

Because I don't use a fast GPU (a high-end graphics board for 3D games), the GPU feature isn't optimized yet. It may not be fast with low-end GPUs. It's optional and disabled by default. It's not so important to show GPU usage at this time. If a user wants to see whether the GPU was actually used, they can read the log file.
