An illegal memory access was encountered #121

Closed · z562 opened this issue Feb 8, 2023 · 2 comments

Comments

z562 commented Feb 8, 2023

Thanks for your great work!

I am evaluating the performance of FlashAttention on a text-to-speech task under the Fairseq framework. However, an error occurred when I replaced the vanilla causal self-attention with FlashAttention for inference. The error log is as follows:

[W CUDAGuardImpl.h:112] Warning: CUDA warning: an illegal memory access was encountered (function destroyEvent)
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:1055 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fe48987da22 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x10aa3 (0x7fe489adeaa3 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a7 (0x7fe489ae0147 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fe4898675a4 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: std::vector<c10d::Reducer::Bucket, std::allocator<c10d::Reducer::Bucket> >::~vector() + 0x2f9 (0x7fe49e7635a9 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: c10d::Reducer::~Reducer() + 0x276 (0x7fe49e759fd6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: std::_Sp_counted_ptr<c10d::Reducer*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x12 (0x7fe49e789c92 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #8: std::_Sp_counted_ptr<c10d::Logger*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() + 0x1d (0x7fe49e78e9ad in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 0x46 (0x7fe49ded39d6 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0xdaf48f (0x7fe49e78c48f in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x4ff598 (0x7fe49dedc598 in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x50089e (0x7fe49dedd89e in /home/xxx/anaconda3/envs/fairseq/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #13: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #14: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #15: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad077]
frame #16: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5889fe]
frame #17: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #18: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #19: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4ad28d]
frame #20: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d7524]
frame #21: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bcd88]
frame #22: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f00]
frame #23: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #24: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0f16]
frame #25: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4d0b1c]
frame #26: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4aeacc]
frame #27: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x55c593]
frame #28: _PyEval_EvalFrameDefault + 0x2976 (0x4b5c16 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #29: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #30: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #31: _PyEval_EvalFrameDefault + 0xa9e (0x4b3d3e in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #32: _PyFunction_FastCallKeywords + 0x106 (0x4c61f6 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #33: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #34: _PyEval_EvalFrameDefault + 0x971 (0x4b3c11 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #35: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #36: _PyFunction_FastCallKeywords + 0x29c (0x4c638c in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #37: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x4bae2f]
frame #38: _PyEval_EvalFrameDefault + 0x15d2 (0x4b4872 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #39: _PyEval_EvalCodeWithName + 0x201 (0x4b2041 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #40: PyEval_EvalCodeEx + 0x39 (0x4b1e39 in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #41: PyEval_EvalCode + 0x1b (0x5537fb in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #42: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x56cfa3]
frame #43: PyRun_StringFlags + 0x7b (0x569e3b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #44: PyRun_SimpleStringFlags + 0x3b (0x569d2b in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #45: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x5490d7]
frame #46: _Py_UnixMain + 0x3c (0x548fec in /home/xxx/anaconda3/envs/fairseq/bin/python)
frame #47: __libc_start_main + 0xf5 (0x7fe4ab7ca555 in /lib64/libc.so.6)
frame #48: /home/xxx/anaconda3/envs/fairseq/bin/python() [0x548e9e]

Environment

CUDA:       11.1
GCC:        8.5.0
Python:     3.9.7
PyTorch:    1.9.0
flash-attn: 0.2.8
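
For reference, a minimal sketch of the kind of call being made, assuming the flash-attn 0.2.x packed-QKV interface; the shapes, names, and values below are illustrative, not the actual Fairseq code:

import torch
from flash_attn.flash_attn_interface import flash_attn_unpadded_qkvpacked_func

# Assumed shapes; FlashAttention requires fp16/bf16 tensors on a CUDA device.
batch, seqlen, nheads, headdim = 2, 128, 8, 64
qkv = torch.randn(batch * seqlen, 3, nheads, headdim,
                  device="cuda", dtype=torch.float16)

# Cumulative sequence lengths (int32, on the GPU): [0, seqlen, 2*seqlen, ...].
cu_seqlens = torch.arange(0, (batch + 1) * seqlen, seqlen,
                          device="cuda", dtype=torch.int32)

# Causal self-attention over the packed batch;
# output has shape (total_tokens, nheads, headdim).
out = flash_attn_unpadded_qkvpacked_func(qkv, cu_seqlens, seqlen,
                                         dropout_p=0.0, causal=True)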

tridao (Member) commented Feb 8, 2023

Thanks for the report.
Could you run it again with CUDA_LAUNCH_BLOCKING=1 as suggested to narrow down which function caused the error?
And could you provide the input shapes and dtype (fp16 or bf16) that were used to call into FlashAttention?
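
One way to gather both (a sketch; the environment variable must be set before PyTorch initializes CUDA, and log_flash_attn_inputs is a hypothetical helper, not a flash-attn API):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch

def log_flash_attn_inputs(*tensors):
    # Hypothetical debug helper: print shape/dtype/device of each attention
    # input right before the FlashAttention call, so the failing
    # configuration is captured in the log.
    for i, t in enumerate(tensors):
        print(f"input {i}: shape={tuple(t.shape)} dtype={t.dtype} "
              f"device={t.device} contiguous={t.is_contiguous()}")

Calling it on q/k/v (or the packed qkv tensor) just before the attention call would capture the shapes and dtype asked about above.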

z562 (Author) commented Feb 10, 2023

Thanks for the response, @tridao.
I updated my environment as follows, and the problem no longer occurs:

CUDA:       11.7
GCC:        10.2.0
PyTorch:    1.13.1
flash-attn: 0.2.8 
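
For anyone who lands here, a quick sketch to confirm which versions are actually in play (importlib.metadata is in the standard library on Python 3.8+):

import torch
from importlib.metadata import version

print("PyTorch:   ", torch.__version__)
print("CUDA build:", torch.version.cuda)        # CUDA version torch was built with
print("flash-attn:", version("flash-attn"))
print("GPU:       ", torch.cuda.get_device_name(0))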

z562 closed this as completed Feb 10, 2023