I was building vLLM offline with local clones of CUTLASS and flash-attention. flash-attention's setup.py runs "git submodule update" to populate the CUTLASS include files it needs, which is a problem for an offline install. It would be better if it paid attention to VLLM_CUTLASS_SRC_DIR or something like that.
A simple workaround is to copy the CUTLASS include tree into the flash-attention csrc/cutlass subdirectory before building vLLM.
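For anyone who wants to script the workaround, here is a minimal sketch in Python. It assumes the CUTLASS clone keeps its headers under include/ and that flash-attention expects them at csrc/cutlass/include; the function name and both path arguments are hypothetical and only for illustration, so adjust them to your checkout.

```python
import shutil
from pathlib import Path


def copy_cutlass_includes(cutlass_src: str, flash_attn_src: str) -> Path:
    """Copy <cutlass_src>/include into <flash_attn_src>/csrc/cutlass/include.

    Both directory layouts are assumptions about the local clones,
    not something the flash-attention build is confirmed to require.
    """
    src = Path(cutlass_src) / "include"
    if not src.is_dir():
        raise FileNotFoundError(f"CUTLASS include directory not found: {src}")

    dst = Path(flash_attn_src) / "csrc" / "cutlass" / "include"
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run over an
    # existing destination instead of failing.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Call it with the paths to the two clones before starting the vLLM build, for example copy_cutlass_includes("/path/to/cutlass", "/path/to/flash-attention"). This only stages the headers; whether setup.py still tries to run the submodule update afterwards is not something the sketch addresses.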
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.