System Info
Hello Team,
I am using the following to load the AWQ-quantized version of the Llama 3 model on a GCP machine with 4 x A100 GPUs. I cannot increase --max-batch-prefill-tokens, because doing so triggers "CUDA error: an illegal memory access was encountered". I also observe through nvidia-smi that the model does not consume all of the GPU memory, yet the illegal memory access error still occurs. The GPUs are not even half utilized.
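To report the "not even half utilized" observation as numbers per GPU, a small helper like the one below could be used. It is a minimal sketch, not part of the original report: it assumes nvidia-smi is on the PATH and accepts the --query-gpu=index,memory.used,memory.total and --format=csv,noheader,nounits options (listed by `nvidia-smi --help-query-gpu`), and the function names parse_memory and query_memory are hypothetical.

```python
import subprocess
from typing import List, Tuple

# Assumed nvidia-smi invocation: one CSV row per GPU, memory values in MiB, no header.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text: str) -> List[Tuple[int, float]]:
    """Hypothetical helper: turn CSV rows into (gpu_index, fraction_of_memory_used)."""
    result = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (field.strip() for field in line.split(","))
        result.append((int(index), float(used) / float(total)))
    return result


def query_memory() -> List[Tuple[int, float]]:
    """Hypothetical helper: run nvidia-smi locally and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)
```

On the affected machine, calling query_memory() right after the error appears would return one (index, fraction) pair per A100. Parsing is kept separate from the subprocess call so it can be checked without a GPU.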
Reproduction
Steps are provided in the problem description.
Expected behavior
The model should load without exceptions.