I am evaluating the performance of FlashAttention on a text-to-speech task under the Fairseq framework. However, when I replaced the vanilla causal self-attention with FlashAttention for inference, I hit an illegal-memory-access error. The error log is as follows:
[W CUDAGuardImpl.h:112] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent) | 0/15 [00:00<?, ?it/s]
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fe48987da22 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10aa3 (0x7fe489adeaa3 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7fe489ae0147 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fe4898675a4 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x2f9 (0x7fe49e7635a9 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: c10d::Reducer::~Reducer() + 0x276 (0x7fe49e759fd6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fe49e789c92 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: std::_Sp_counted_ptr<c10d::Logger*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x1d (0x7fe49e78e9ad in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xdaf48f (0x7fe49e78c48f in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x4ff598 (0x7fe49dedc598 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x50089e (0x7fe49dedd89e in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #14: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #15: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad077]
frame #16: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5889fe]
frame #17: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #18: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #19: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #20: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #21: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bcd88]
frame #22: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f00]
frame #23: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #24: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #25: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0b1c]
frame #26: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4aeacc]
frame #27: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x55c593]
frame #28: _PyEval_EvalFrameDefault + 0x2976 (0x4b5c16 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #29: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #30: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #31: _PyEval_EvalFrameDefault + 0xa9e (0x4b3d3e in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #32: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #33: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #34: _PyEval_EvalFrameDefault + 0x971 (0x4b3c11 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #35: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #36: _PyFunction_FastCallKeywords + 0x29c (0x4c638c in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #37: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #38: _PyEval_EvalFrameDefault + 0x15d2 (0x4b4872 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #39: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #40: PyEval_EvalCodeEx + 0x39 (0x4b1e39 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #41: PyEval_EvalCode + 0x1b (0x5537fb in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #42: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x56cfa3]
frame #43: PyRun_StringFlags + 0x7b (0x569e3b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #44: PyRun_SimpleStringFlags + 0x3b (0x569d2b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #45: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5490d7]
frame #46: _Py_UnixMain + 0x3c (0x548fec in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #47: __libc_start_main + 0xf5 (0x7fe4ab7ca555 in /lib64/libc.so.6)
frame #48: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x548e9e]
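For context, illegal memory accesses in FlashAttention are often caused by inputs that violate its documented constraints (per the flash-attn v1 README: fp16/bf16 only, head dimension a multiple of 8 and at most 128). A minimal pre-flight check along those lines (the helper below is illustrative, not part of fairseq or flash-attn):

```python
def check_flash_attn_inputs(shape, dtype):
    """Validate a (batch, seqlen, nheads, headdim) attention input against
    the constraints listed in the flash-attn v1 README: fp16/bf16 dtypes
    only, head dim a multiple of 8 and at most 128. Hypothetical helper
    for debugging; returns a list of violations (empty if none)."""
    batch, seqlen, nheads, headdim = shape
    errors = []
    if dtype not in ("float16", "bfloat16"):
        errors.append(f"unsupported dtype {dtype}; use fp16 or bf16")
    if headdim % 8 != 0 or headdim > 128:
        errors.append(f"head dim {headdim} must be a multiple of 8 and <= 128")
    return errors

# fp32 inputs are a common mistake when swapping in FlashAttention:
print(check_flash_attn_inputs((1, 512, 8, 64), "float32"))
```

Running such a check on the q/k/v tensors right before the FlashAttention call can rule out the most common shape/dtype mismatches.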
Thanks for the report.
Could you run it again with CUDA_LAUNCH_BLOCKING=1 as suggested to narrow down which function caused the error?
And could you provide the input shapes and dtype (fp16 or bf16) that were used to call into FlashAttention?
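Enabling blocking launches is just a matter of setting the environment variable before re-running the same command that produced the log (the exact fairseq invocation is whatever was used above):

```shell
# Synchronous kernel launches make CUDA errors surface at the call
# site instead of at a later, unrelated API call.
export CUDA_LAUNCH_BLOCKING=1
# ...then re-run the original fairseq inference command here.
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
```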