Avoid instrumenting some events #79
Comments
Hi, instrumenting CUDA events should not be that expensive, so I am surprised. What is the total end-to-end time of the application running natively versus running with your tool? I would expect the two to be very similar, since no CUDA functions are instrumented; if there is a huge overhead, something is not right. Maybe some inefficient debug code is turned on in the NVBit core. Thanks for reporting this.
Without the instrumentation, the total time is ~5 s, counting the rand time. With the instrumentation applied, the execution time is ~90 s.
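For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how the two timings could be collected. It assumes the tool is injected through `LD_PRELOAD`, which is how NVBit tools are normally loaded; the exact command used in this report is not shown, so the script name and the `./dummy.so` path in the usage comment are assumptions. `time_command` and `slowdown` are hypothetical helper names.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd once and return its wall-clock time in seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})  # e.g. {"LD_PRELOAD": "./dummy.so"}
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=False)
    return time.perf_counter() - start


def slowdown(native_s, instrumented_s):
    """Ratio of instrumented run time to native run time."""
    return instrumented_s / native_s


# Hypothetical usage (file names are assumptions, adjust to your setup):
# native = time_command([sys.executable, "simple_conv.py"])
# tooled = time_command([sys.executable, "simple_conv.py"],
#                       {"LD_PRELOAD": "./dummy.so"})
# print(f"native {native:.1f} s, with tool {tooled:.1f} s, "
#       f"slowdown {slowdown(native, tooled):.1f}x")
```

A single run includes process start-up and CUDA context creation, so repeating the measurement a few times gives a steadier number.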
I think that the overhead comes from the number of event calls. Is it normal to have 112614 calls to a single type of event?
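As a rough sanity check on that guess, the numbers reported in this thread can be turned into a per-call cost. This is only a back-of-the-envelope sketch: the timings are approximate (~5 s and ~90 s), and it assumes that all of the extra time is spent in the 112614 calls to that one event, which gives an upper bound rather than a measurement.

```python
# Figures reported in this thread (approximate).
native_s = 5.0          # run time without the tool
instrumented_s = 90.0   # run time with the tool
calls = 112_614         # calls counted for the single most frequent event

# Assumption: every extra second is attributed to those calls.
extra_s = instrumented_s - native_s
per_call_ms = extra_s / calls * 1e3
slowdown = instrumented_s / native_s

print(f"extra time: {extra_s:.0f} s")
print(f"upper bound per call: {per_call_ms:.2f} ms")
print(f"slowdown: {slowdown:.0f}x")
```

Under that assumption each callback would cost roughly 0.75 ms, which is a lot for a callback that only increments a counter, so it fits the suspicion that something beyond the call count itself is slow.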
cuModuleGetFunction is used by the driver when loading CUDA functions.
Thanks @ovilla. In the meantime, do you know of any quick workaround that I can apply to at least reduce the overhead?
Hi,
I am trying to instrument applications that use PyTorch. However, I'm facing some problems with the overhead that NVBit adds.
I have created a simple example (simple_conv.py) below:
Then, to measure the overhead added by NVBit's callback invocations, I created the following dummy tool (dummy.so). The makefile is based on the mov_replace tool from the NVBit repository. The tool is meant to count the number of calls for each event that NVBit instruments.
I built dummy.so with CUDA 11.3, GCC 7.5.0, and NVBit 1.5.0, and I ran it on a Titan V GPU.
I ran the tool with the following command:
The result that I got is:
I see lots of calls for event 23 (cuModuleGetFunction), which increases the NVBit overhead by a lot.
Is there a way to tell NVBit not to instrument some events, to avoid this unnecessary overhead?
Thanks