I am evaluating the performance of FlashAttention on a text-to-speech task under the Fairseq framework. However, when I replaced the vanilla causal self-attention with FlashAttention for inference, I hit an illegal-memory-access error. The error log is as follows:
[W CUDAGuardImpl.h:112] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent) | 0/15 [00:00<?, ?it/s]
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fe48987da22 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10aa3 (0x7fe489adeaa3 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7fe489ae0147 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fe4898675a4 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x2f9 (0x7fe49e7635a9 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: c10d::Reducer::~Reducer() + 0x276 (0x7fe49e759fd6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fe49e789c92 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: std::_Sp_counted_ptr<c10d::Logger*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x1d (0x7fe49e78e9ad in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xdaf48f (0x7fe49e78c48f in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x4ff598 (0x7fe49dedc598 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x50089e (0x7fe49dedd89e in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #14: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #15: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad077]
frame #16: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5889fe]
frame #17: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #18: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #19: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #20: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #21: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bcd88]
frame #22: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f00]
frame #23: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #24: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #25: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0b1c]
frame #26: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4aeacc]
frame #27: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x55c593]
frame #28: _PyEval_EvalFrameDefault + 0x2976 (0x4b5c16 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #29: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #30: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #31: _PyEval_EvalFrameDefault + 0xa9e (0x4b3d3e in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #32: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #33: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #34: _PyEval_EvalFrameDefault + 0x971 (0x4b3c11 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #35: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #36: _PyFunction_FastCallKeywords + 0x29c (0x4c638c in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #37: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #38: _PyEval_EvalFrameDefault + 0x15d2 (0x4b4872 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #39: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #40: PyEval_EvalCodeEx + 0x39 (0x4b1e39 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #41: PyEval_EvalCode + 0x1b (0x5537fb in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #42: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x56cfa3]
frame #43: PyRun_StringFlags + 0x7b (0x569e3b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #44: PyRun_SimpleStringFlags + 0x3b (0x569d2b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #45: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5490d7]
frame #46: _Py_UnixMain + 0x3c (0x548fec in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #47: __libc_start_main + 0xf5 (0x7fe4ab7ca555 in /lib64/libc.so.6)
frame #48: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x548e9e]
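For context, illegal memory accesses in FlashAttention are often caused by inputs that violate its documented constraints (per the flash-attn v1 README: fp16/bf16 only, head dimension a multiple of 8 and at most 128). A minimal pre-flight check along those lines (the helper below is illustrative, not part of fairseq or flash-attn):

```python
def check_flash_attn_inputs(shape, dtype):
    """Validate a (batch, seqlen, nheads, headdim) attention input against
    the constraints listed in the flash-attn v1 README: fp16/bf16 dtypes
    only, head dim a multiple of 8 and at most 128. Hypothetical helper
    for debugging; returns a list of violations (empty if none)."""
    batch, seqlen, nheads, headdim = shape
    errors = []
    if dtype not in ("float16", "bfloat16"):
        errors.append(f"unsupported dtype {dtype}; use fp16 or bf16")
    if headdim % 8 != 0 or headdim > 128:
        errors.append(f"head dim {headdim} must be a multiple of 8 and <= 128")
    return errors

# fp32 inputs are a common mistake when swapping in FlashAttention:
print(check_flash_attn_inputs((1, 512, 8, 64), "float32"))
```

Running such a check on the q/k/v tensors right before the FlashAttention call can rule out the most common shape/dtype mismatches.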
Thanks for the report.
Could you run it again with CUDA_LAUNCH_BLOCKING=1 as suggested to narrow down which function caused the error?
And could you provide the input shapes and dtype (fp16 or bf16) that were used to call into FlashAttention?
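Enabling blocking launches is just a matter of setting the environment variable before re-running the same command that produced the log (the exact fairseq invocation is whatever was used above):

```shell
# Synchronous kernel launches make CUDA errors surface at the call
# site instead of at a later, unrelated API call.
export CUDA_LAUNCH_BLOCKING=1
# ...then re-run the original fairseq inference command here.
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```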