ImportError: cannot import name 'nvcc' #52
Are you trying to compile the kernel? Are you using the Docker image?
I'm trying to compile the kernel, having installed TVM from source based on these instructions:
I would strongly suggest you follow the instructions in the
Thank you for your suggestion. This particular error was caused by Python importing longformer's own tvm directory instead of the installed TVM library. I renamed longformer's tvm directory and changed the tvm.module import accordingly, but now I get a segfault! I think I'll take your suggestion and use Docker.
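A quick way to confirm which copy of a package Python will pick up (for example, whether `tvm` resolves to longformer's bundled directory or to the installed TVM) is to ask the import system for the module's location before importing it. Below is a minimal sketch using only the standard library; the helper names `module_location` and `is_shadowed_by` are hypothetical, not part of longformer or TVM.

```python
import importlib.util
import os


def module_location(name):
    """Return where a top-level module/package would be imported from, or None.

    If the module is already imported, find_spec reports the copy that was loaded.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin not in (None, "namespace"):
        return spec.origin
    # Namespace packages (a directory without __init__.py) have no single origin file.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


def is_shadowed_by(name, directory):
    """True if `name` would be imported from somewhere inside `directory`."""
    location = module_location(name)
    if location is None:
        return False
    root = os.path.realpath(directory)
    try:
        return os.path.commonpath([root, os.path.realpath(location)]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        return False
```

For example, running `print(module_location("tvm"))` from inside the longformer checkout shows whether the local `tvm` folder or the installed package wins. Which one wins depends on the order of `sys.path`, where the script's directory and `PYTHONPATH` entries come before site-packages.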
Ah, right, this issue is already mentioned here. So yes, please try the instructions in
Did you add
No, I did not. Are you saying that could be the source of the segfault, or suggesting I use ipdb for debugging?
TVM throws segfaults for weird reasons,
I was finally able to compile from scratch without using the Docker image. I had to change some TVM imports in diagonaled_mm_tvm.py to tvm.te, rename longformer's tvm directory to something else, reinstall longformer, and then be very careful with my PATH and PYTHONPATH environment variables. I still need to use the newly compiled module in pretraining and make sure it works.
Glad it is working.
Yes, this is the new API in TVM 0.7.x.
An easier way to test would be to run this unit test to make sure the output of
Thank you for this pointer. Both the test above and the code snippet in the README lead to this error at dlpack.py L40:
Just posting it here while I work on it, in case others face the same issue.
Maybe something changed in the 0.7.dev1 API compared to 0.6.0. If you can reproduce the error in a small example, try asking at https://discuss.tvm.ai/.
So this happens when I use TVM's load_module to load the compiled kernel: something in TVM doesn't know how to handle longformer's custom ndarray type. When I switch to the load function in longformer's TVM runtime instead, I get:
I can't tell where _LoadFromFile is supposed to come from, because it's not in the imports. :-?
I don't think our code has a custom ndarray type.
This is the ndarray class I meant:
Got it. TVM doesn't have a small runtime, so I copied a few of the TVM files into longformer to save the user the need to
So I completely removed the dependency on the small TVM runtime and always import the full TVM package. With this I can successfully compile and load the kernel. The sliding chunks test fails, though: the non-zero elements in the TVM result match the sliding chunks result, but some blocks of the TVM result tensor are all zeros where sliding chunks gives normal non-zero values.
Oddly, I don't get the all-zero blocks consistently. I run the same line
Ha, interesting. Does that happen with the code out of the box or only with the kernel you compiled?
I never ran the code out of the box, since I started with a different TVM version.
Is there an easy way for me to reproduce it? Or can you try it on a very small example and show me what the zero pattern looks like?
N needs to be greater than 16 to reproduce this, so I went with N=20, which is divisible by my w*2, setting w=2, M=4, and B=D=H=1.
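To show the zero pattern compactly, the TVM output can be compared block by block against the sliding-chunks reference, flagging blocks that are entirely zero in the TVM result but not in the reference. This is a minimal pure-Python sketch, assuming both outputs have already been flattened to 2-D nested lists of equal shape (for instance with `.tolist()` after collapsing B and H); the function names and the `block_rows` grouping are hypothetical, not longformer's API.

```python
def zero_blocks(result, reference, block_rows):
    """Indices of row blocks that are all zero in `result` but not in `reference`.

    result, reference: equal-shaped 2-D nested lists (rows x columns).
    block_rows: number of consecutive rows grouped into one block.
    """
    if len(result) != len(reference):
        raise ValueError("result and reference must have the same number of rows")
    bad = []
    for start in range(0, len(result), block_rows):
        res_block = result[start:start + block_rows]
        ref_block = reference[start:start + block_rows]
        res_zero = all(v == 0 for row in res_block for v in row)
        ref_zero = all(v == 0 for row in ref_block for v in row)
        if res_zero and not ref_zero:
            bad.append(start // block_rows)
    return bad


def zero_mask(result):
    """Text mask of a 2-D result: '.' for zero entries, '#' for non-zero ones."""
    return "\n".join("".join("." if v == 0 else "#" for v in row) for row in result)
```

Printing `zero_mask(...)` for a small case such as N=20 gives a picture of the pattern that is easy to paste into an issue, and `zero_blocks(...)` lists exactly which blocks went missing.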
I copied it from the tutorial and didn't think carefully about how it works. Are you trying to make it faster? The conversation here should be helpful.
No, I'm trying to debug the block of zeros.
If you suspect the scheduler, you can replace the whole scheduler with a naive one for debugging, something like:
Btw, is this fp16 or fp32? I'm asking because there used to be a bug in the fp16 codegen.
I'm passing
Right now we don't know where the bug is. It could be in our code, your code, or TVM itself.
I tried the code out of the box and the tests pass, so we know the bug is in the splitting and binding.
Does it still happen randomly, or is it consistently breaking? The next step would be to find a minimal example that reproduces the bug and post it on the TVM forum.
Btw, this is why I'm using a specific version of TVM: it changes a lot, and things break after they were working.
I have now isolated the issue to this split and the corresponding binds. The split on axis 1 and the reduce_axis split are fine; tests pass with them. Thank you for your help so far.
Will close this issue for now. Please reopen if you have other questions.
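Because the all-zero blocks appear only on some runs, it helps to measure how often the output disagrees with the reference before trying to shrink the example. A small sketch, assuming `run` is any zero-argument callable that recomputes the kernel output as a 2-D nested list (hypothetical here) and that a tolerance-based comparison is acceptable; `nested_allclose` and `count_failures` are illustrative names, not library functions.

```python
import math


def nested_allclose(a, b, rel_tol=1e-5, abs_tol=1e-6):
    """Element-wise tolerance check for equal-shaped 2-D nested lists."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


def count_failures(run, reference, trials=20, matches=nested_allclose):
    """Call `run()` `trials` times and count outputs that do not match `reference`."""
    failures = 0
    for _ in range(trials):
        if not matches(run(), reference):
            failures += 1
    return failures
```

A count of 0 or `trials` means the behavior is deterministic for that input; anything in between confirms the failure is intermittent, which on a GPU can point to a race or to output elements that some threads never write.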
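When a bug has been narrowed down to one split and its binds, it can help to check the index arithmetic by hand. In TVM, splitting an axis of extent N with `factor=f` produces an outer loop of `ceil(N / f)` iterations and an inner loop of `f`, with `i = outer * f + inner` and a bounds check when `f` does not divide N. The pure-Python sketch below enumerates which indices that loop nest visits, optionally capping the outer loop to model, say, launching fewer thread blocks than the split expects; `covered_indices`, `missing_indices`, and `outer_limit` are hypothetical debugging aids, not TVM APIs.

```python
def covered_indices(extent, factor, outer_limit=None):
    """Indices of an axis of length `extent` visited by a split with inner `factor`.

    Mirrors the loop nest of a split by `factor`:
        for outer in range(ceil(extent / factor)):
            for inner in range(factor):
                i = outer * factor + inner
                if i < extent: visit(i)
    `outer_limit` optionally caps the number of outer iterations.
    """
    n_outer = -(-extent // factor)  # ceiling division
    if outer_limit is not None:
        n_outer = min(n_outer, outer_limit)
    visited = set()
    for outer in range(n_outer):
        for inner in range(factor):
            i = outer * factor + inner
            if i < extent:  # bounds check for a non-divisible tail
                visited.add(i)
    return visited


def missing_indices(extent, factor, outer_limit=None):
    """Sorted indices that the split never visits, so their outputs are never written."""
    return sorted(set(range(extent)) - covered_indices(extent, factor, outer_limit))
```

For example, `missing_indices(20, 4)` is empty, while `missing_indices(20, 4, outer_limit=4)` returns `[16, 17, 18, 19]`.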
from tvm.contrib import nvcc
ImportError: cannot import name 'nvcc'
I get this when trying to compile the kernel from scratch. Did I miss something in the CMake config? I can import many TVM modules, but not nvcc.
My CUDA version is: Cuda compilation tools, release 10.0, V10.0.130