build(vllm-tensorizer): Compile `vllm-flash-attn` from source #70

Eta0 · 2024-06-15T01:12:32Z

Compile `vllm-flash-attn` from Source

vLLM replaced its use of the regular `flash-attn` library with its own `vllm-flash-attn` fork in vllm-project/vllm#4686. The fork is currently fairly easy to compile, so this change builds it from source for compatibility with the `ml-containers/torch` base images.

This is required before our `vllm-tensorizer` images can be updated to include the newest versions of vLLM.

sangstar

LGTM. I have a couple of comments/questions, but I don't expect them to justify any changes.

sangstar · 2024-06-16T13:41:44Z

vllm-tensorizer/Dockerfile

+RUN git clone --filter=blob:none --depth 1 --no-single-branch --no-checkout \
+      https://github.com/vllm-project/flash-attention.git && \
+    cd flash-attention && \
+    git checkout "v${VLLM_FLASH_ATTN_VERSION}" && \


I imagine it's safe to assume that branch names will never conflict with tag names here and cause ambiguity, so it's unnecessary to explicitly run `git checkout "tags/v${VLLM_FLASH_ATTN_VERSION}"`?
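For context on the question above, here is a minimal sketch of the case where the explicit form would matter: a repository in which a branch and a tag share the same name. The throwaway repository, the `v1.0.0` name, and the commit messages are made up for illustration and are not taken from vllm-project/flash-attention. Spelling out the full `refs/tags/` prefix selects the tag regardless of any same-named branch.

```shell
#!/bin/sh
# Illustrative only: build a scratch repository where a branch and a tag
# share the name "v1.0.0" but point at different commits.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Hypothetical identity so the commits work without global git config.
commit() {
    git -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}

commit "first"
git tag v1.0.0            # the tag points at the first commit
commit "second"
git branch v1.0.0         # a same-named branch points at the second commit

tag_commit="$(git rev-parse 'refs/tags/v1.0.0^{commit}')"
branch_commit="$(git rev-parse refs/heads/v1.0.0)"

# The fully qualified ref always selects the tag.
git checkout -q --detach refs/tags/v1.0.0
head_commit="$(git rev-parse HEAD)"

echo "tag:    $tag_commit"
echo "branch: $branch_commit"
echo "HEAD:   $head_commit"
```

In the Dockerfile hunk above, the equivalent would be `git checkout "refs/tags/v${VLLM_FLASH_ATTN_VERSION}"`. Whether that extra precision is worth it when no such name clash is expected is the judgment call raised in this comment.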

sangstar · 2024-06-16T13:42:11Z

vllm-tensorizer/Dockerfile

+      -v --no-cache-dir --no-build-isolation --no-deps \
+      -c /tmp/constraints.txt \
+      ./ && \
+    pip3 uninstall -y vllm-flash-attn


Why is the `pip3 uninstall` necessary here?

Eta0 added the bug and enhancement labels Jun 15, 2024

Eta0 requested a review from sangstar June 15, 2024 01:12

Eta0 assigned Eta0 and sangstar Jun 15, 2024

sangstar approved these changes Jun 16, 2024

sangstar merged commit 467a303 into ss/vllm-tensorizer Jun 16, 2024

sangstar deleted the es/vllm-tensorizer branch June 16, 2024 13:47
