update dockerfile for cuda130 #1087

Merged
llmc-reviewer merged 1 commit into main from dock on May 22, 2026
@helloyongyang
Contributor

No description provided.

@llmc-reviewer merged commit 181846b into main on May 22, 2026
2 checks passed
@llmc-reviewer deleted the dock branch on May 22, 2026 07:03
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new Dockerfile for a CUDA 13.0 environment that integrates several deep learning and kernel optimization libraries. The review flags four issues that may affect the build: redundant and potentially conflicting curl output flags in the Miniforge download, pinned PyTorch versions and a cu130 wheel index that could not be verified, a MAX_JOBS assignment scoped so that the build command never sees it, and a CUDA architecture list for the SpargeAttn-Fix library that omits Blackwell (12.0), the apparent target hardware.

libsoup2.4-dev libnice-dev libopus-dev libvpx-dev libx264-dev libsrtp2-dev libglib2.0-dev libdrm-dev libjpeg-dev libpng-dev \
&& apt-get clean && rm -rf /var/lib/apt/lists/* && git lfs install

RUN curl -fsSL -v -o /app/miniconda.sh -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
high

Combining the -O (--remote-name) flag with -o /app/miniconda.sh for a single URL is at best redundant. curl pairs each output option with a URL in the order given, so with only one URL the second output option has nothing to apply to, and curl may warn about the surplus option. If -O were to take effect instead of -o, the file would be saved under its remote name (Miniforge3-Linux-x86_64.sh) in the current directory, and the subsequent chmod and bash commands on lines 16-17, which expect the file at /app/miniconda.sh, would fail. Dropping -O removes the ambiguity:

RUN curl -fsSL -v -o /app/miniconda.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
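Whichever flag wins, the later steps depend on the installer being at one known path, so a build can fail fast when it is not. The following is a minimal sketch of such a guard, not part of the PR: `require_file` is a hypothetical helper, and a temporary file stands in for /app/miniconda.sh so the snippet runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): succeed only if the given path
# exists and is non-empty, so a misplaced download is caught early.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Demonstration with a temporary file standing in for /app/miniconda.sh.
tmp=$(mktemp)
echo '#!/bin/sh' > "$tmp"
if require_file "$tmp"; then present=yes; else present=no; fi

# After removal the same check must fail.
rm -f "$tmp"
if require_file "$tmp" 2>/dev/null; then absent_ok=yes; else absent_ok=no; fi

echo "present=$present absent_ok=$absent_ok"
```

In a Dockerfile, the same check could be chained as `test -s /app/miniconda.sh` right after the curl call in the same RUN instruction.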


RUN pip install --no-cache-dir packaging ninja cmake scikit-build-core uv meson ruff pre-commit fastapi uvicorn requests -U

RUN pip install torch==2.11.0 torchvision==0.26.0 torchaudio==2.11.0 --index-url https://download.pytorch.org/whl/cu130
high

The pinned versions torch==2.11.0, torchvision==0.26.0, and torchaudio==2.11.0 could not be confirmed against the official PyTorch repositories at review time, and the same applies to the cu130 wheel index on download.pytorch.org. Please verify that all three pins are published on that index. If any of them is missing, and these do not refer to a private or internal repository, pip will fail to find the packages and the build will stop at this step.


COPY lightx2v_kernel /app/lightx2v_kernel

RUN git clone https://github.com/NVIDIA/cutlass.git --depth 1 && cd /app/lightx2v_kernel && MAX_JOBS=32 && CMAKE_BUILD_PARALLEL_LEVEL=4 \
medium

The assignment MAX_JOBS=32 is followed by &&, which makes it a separate shell command. That sets a variable in the current shell but does not export it, so the subsequent uv build command never sees it in its environment. To make the build respect this limit, prefix the variable assignments directly to the command, without the && separator:

RUN git clone https://github.com/NVIDIA/cutlass.git --depth 1 && cd /app/lightx2v_kernel && MAX_JOBS=32 CMAKE_BUILD_PARALLEL_LEVEL=4 \
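The scoping difference can be reproduced outside Docker. This is a small sketch in POSIX sh, assuming MAX_JOBS is not already exported and `set -a` is not active; a child `sh -c` stands in for the uv build process.

```shell
#!/bin/sh
# Case 1: assignment as its own command, joined with &&.
# MAX_JOBS becomes an unexported shell variable, so the child cannot see it.
separate=$(unset MAX_JOBS; MAX_JOBS=32 && sh -c 'echo "${MAX_JOBS:-unset}"')

# Case 2: assignment prefixed to the command.
# MAX_JOBS is placed in the environment of that one child process.
prefixed=$(unset MAX_JOBS; MAX_JOBS=32 sh -c 'echo "${MAX_JOBS:-unset}"')

echo "separate: $separate"
echo "prefixed: $prefixed"
```

Writing `export MAX_JOBS=32 &&` would also work, at the cost of keeping the variable set for every later command in the same RUN instruction.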

RUN cd q8_kernels && git submodule init && git submodule update && python setup.py install && rm -rf build

RUN git clone https://github.com/ModelTC/SpargeAttn-Fix.git --depth 1
RUN cd SpargeAttn-Fix && TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0" pip install --no-cache-dir --no-build-isolation -v -e .
medium

TORCH_CUDA_ARCH_LIST is missing the Blackwell architecture (12.0 or 10.0), which appears to be the primary target of this CUDA 13.0 Dockerfile, judging by the architectures listed on line 51. Without it, the SpargeAttn-Fix kernels will not be compiled for the intended hardware.

RUN cd SpargeAttn-Fix && TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0;12.0" pip install --no-cache-dir --no-build-isolation -v -e .
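If the architecture list is assembled in a script rather than hard-coded, the extra entry can be appended only when it is absent. The sketch below assumes a non-empty, semicolon-separated list like the one in this Dockerfile; `add_arch` is a hypothetical helper, not something the PR or PyTorch provides.

```shell
#!/bin/sh
# Hypothetical helper: append an architecture to a semicolon-separated
# list unless it is already present. Assumes the list is non-empty.
add_arch() {
    list=$1
    arch=$2
    case ";$list;" in
        *";$arch;"*) printf '%s\n' "$list" ;;        # already listed
        *)           printf '%s\n' "$list;$arch" ;;  # append at the end
    esac
}

ARCH_LIST=$(add_arch "8.0;8.6;8.9;9.0" "12.0")
# A second call with the same architecture leaves the list unchanged.
ARCH_AGAIN=$(add_arch "$ARCH_LIST" "12.0")

echo "TORCH_CUDA_ARCH_LIST=$ARCH_LIST"
```

Wrapping both the list and the candidate in semicolons before matching keeps an entry such as 2.0 from being mistaken for 12.0.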

2 participants