[plugin][CI/CD] Add CI/CD for OOT and enable OOT docker release by zejunchen-zejun · Pull Request #320 · ROCm/ATOM

zejunchen-zejun · 2026-03-12T15:18:31Z

Design RFC: #255

Copilot

Pull request overview

This PR adds CI/CD coverage for ATOM’s vLLM OOT (out-of-tree) plugin workflow and extends the Docker release pipeline to optionally build/push an OOT vLLM image, while consolidating OOT image build logic into the main multi-stage docker/Dockerfile.

Changes:

- Add new OOT CI workflows (per-PR/scheduled “OOT Test” + manual “Full Validation”) and a shared OOT test script for launching vLLM + running GSM8K accuracy checks.
- Convert docker/Dockerfile into a multi-stage build with a dedicated oot_image stage and update the nightly docker release workflow to optionally publish OOT images.
- Add plugin-mode unit tests for framework selection, env-flag behavior, and vLLM→ATOM config translation.
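The threshold gating mentioned for the shared OOT test script can be sketched roughly as follows. This is a minimal illustration, not the contents of `.github/scripts/atom_oot_test.sh`: the function names and the sample accuracy `0.93` are hypothetical, and only the `0.90` threshold mirrors the `accuracy_test_threshold` value that appears in the workflow matrix under review.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns status 0 when accuracy >= threshold.
accuracy_meets_threshold() {
  local accuracy="$1" threshold="$2"
  # The shell has no native decimal comparison, so delegate to awk.
  # Adding 0 forces both values to be compared as numbers.
  awk -v a="$accuracy" -v t="$threshold" 'BEGIN { exit !((a + 0) >= (t + 0)) }'
}

# Hypothetical gate: prints a verdict and returns non-zero on failure,
# which would fail the CI step that calls it.
gate_on_accuracy() {
  local accuracy="$1" threshold="$2"
  if accuracy_meets_threshold "$accuracy" "$threshold"; then
    echo "PASS: accuracy ${accuracy} >= threshold ${threshold}"
  else
    echo "FAIL: accuracy ${accuracy} < threshold ${threshold}"
    return 1
  fi
}

# Illustrative values only: 0.93 is made up, 0.90 matches the matrix entry.
gate_on_accuracy 0.93 0.90
```

Returning a non-zero status instead of calling `exit` keeps the helper reusable when the script is sourced; the caller decides whether a failed gate should abort the job.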

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `tests/plugin/test_plugin_mode_status.py` | Adds unit tests for plugin framework backbone selection and mode helpers. |
| `tests/plugin/test_plugin_env_flags.py` | Adds unit test verifying `ATOM_DISABLE_VLLM_PLUGIN` disables platform/registration behavior. |
| `tests/plugin/test_plugin_config_translation.py` | Adds unit tests for translating vLLM config to ATOM config in plugin mode. |
| `.github/scripts/atom_oot_test.sh` | New helper script to launch vLLM and run GSM8K accuracy + threshold gating. |
| `.github/workflows/atom-vllm-oot-test.yaml` | New per-PR/push/scheduled OOT workflow building OOT image and running plugin UT + GSM8K accuracy. |
| `.github/workflows/atom-vllm-oot-full-test.yaml` | New workflow-dispatch full validation across multiple models/runners. |
| `.github/workflows/atom-test.yaml` | Fixes `ATOM_BASE_NIGTHLY_IMAGE` typo to `ATOM_BASE_NIGHTLY_IMAGE`. |
| `.github/workflows/docker-release.yaml` | Adds optional inputs/steps to build and push OOT vLLM Docker images; builds `atom_image` stage explicitly. |
| `docker/Dockerfile` | Introduces `oot_image` stage (vLLM build/install + deps) and renames original flow to `atom_image` stage. |
| `docker/plugin/Dockerfile_OOT_vLLM` | Removes old dedicated OOT vLLM Dockerfile in favor of the consolidated multi-stage Dockerfile. |
| `docker/plugin/build_OOT_vLLM.sh` | Removes old local OOT image build script (replaced by consolidated build paths). |
| `oot_ut_changes.patch` | New committed patch file capturing diffs (appears redundant with PR content). |

.github/workflows/atom-vllm-oot-test.yaml

.github/scripts/atom_oot_test.sh

docker/Dockerfile

oot_ut_changes.patch

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

.github/workflows/atom-vllm-oot-test.yaml

docker/Dockerfile

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

docker/Dockerfile

.github/workflows/atom-vllm-oot-test.yaml

.github/workflows/atom-vllm-oot-full-test.yaml

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

docker/Dockerfile

.github/workflows/atom-vllm-oot-test.yaml

.github/workflows/atom-test.yaml

zejunchen-zejun · 2026-03-13T07:51:13Z

Hi, @valarLip @gyohuangxin @wuhuikx

Could you help review this PR? The OOT CI is now in place, and Kimi-K2 passes it.

@gyohuangxin I am currently testing the nightly OOT CI release.

Thank you

gyohuangxin · 2026-03-13T08:06:12Z

.github/workflows/atom-vllm-oot-full-test.yaml

+jobs:
+  build-oot-image:
+    name: Build OOT validation image
+    runs-on: linux-atom-mi355-1


Please use the `build-only-atom` runner to build images.

gyohuangxin · 2026-03-13T08:08:36Z

.github/workflows/atom-vllm-oot-test.yaml

+          - model_name: "Kimi-K2-Thinking-MXFP4"
+            model_path: "amd/Kimi-K2-Thinking-MXFP4"
+            accuracy_test_threshold: 0.90
+            runner: atom-mi355-8gpu.predownload


Does the Kimi-K2-Thinking model need 8 GPUs to run?

I will switch to `runner: linux-atom-mi355-4`, which provides 4 GPUs.

gyohuangxin · 2026-03-13T08:11:15Z

.github/workflows/atom-vllm-oot-test.yaml

+      - name: Build ATOM base image
+        run: |
+          cat <<EOF > Dockerfile.mod
+          FROM ${{ env.ATOM_BASE_NIGHTLY_IMAGE }}
+          RUN pip install -U lm-eval[api]
+          RUN pip show lm-eval || true
+          RUN pip install hf_transfer
+          RUN pip show hf_transfer || true
+          RUN echo "=== Aiter version BEFORE uninstall ===" && pip show amd-aiter || true
+          RUN pip uninstall -y amd-aiter
+          RUN pip install --upgrade "pybind11>=3.0.1"
+          RUN pip show pybind11
+          RUN rm -rf /app/aiter-test
+          RUN git clone --depth 1 https://github.com/ROCm/aiter.git /app/aiter-test && \\
+              cd /app/aiter-test && \\
+              git checkout HEAD && \\
+              git submodule sync && git submodule update --init --recursive && \\
+              MAX_JOBS=64 PREBUILD_KERNELS=0 GPU_ARCHS=gfx950 python3 setup.py develop
+          RUN echo "=== Aiter version AFTER installation ===" && pip show amd-aiter || true
+          RUN echo "=== ATOM version BEFORE uninstall ===" && pip show atom || true
+          RUN pip uninstall -y atom
+          RUN rm -rf /app/ATOM
+          RUN git clone ${{ env.GITHUB_REPO_URL }} /app/ATOM && \\
+              cd /app/ATOM && \\
+              git checkout ${{ env.GITHUB_COMMIT_SHA }} && \\
+              pip install -e .
+          RUN echo "=== ATOM version AFTER installation ===" && pip show atom || true
+          EOF
+


Do we still need to build an ATOM image before an OOT image?

Yes. In most cases we need to build the ATOM image, so I moved the OOT build after the ATOM build.
For image release, however, I introduced a `SKIP_ATOM_NATIVE_BUILD` option that skips the ATOM image build. When releasing the nightly image, we first build the ATOM image and push it to the hub; the OOT image can then reuse that ATOM image and do an incremental build for vLLM.
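The two build paths described in this reply can be sketched as a dry run that only prints the `docker build` commands it would issue. This is an assumption-laden illustration rather than the workflow's actual steps: the `plan_builds` function and the `example/...` image tags are hypothetical, while the `atom_image` and `oot_image` stage names, the `BASE_IMAGE` build argument, and the `SKIP_ATOM_NATIVE_BUILD` option come from the PR.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the docker build commands instead of running them.
# The image tags below are hypothetical placeholders, not the real registry names.
plan_builds() {
  local skip_atom="${1:-false}"
  local atom_image="${2:-example/atom:nightly}"
  local oot_image="${3:-example/atom-oot:nightly}"

  if [ "$skip_atom" != "true" ]; then
    # Regular CI path: build the native ATOM stage first.
    echo "docker build --target atom_image -t ${atom_image} -f docker/Dockerfile ."
  fi
  # OOT stage: layer vLLM on top of the ATOM image, which was either built
  # just above or (release path) already pushed to the hub.
  echo "docker build --target oot_image --build-arg BASE_IMAGE=${atom_image} -t ${oot_image} -f docker/Dockerfile ."
}

# The option is read from the environment and defaults to building both stages.
plan_builds "${SKIP_ATOM_NATIVE_BUILD:-false}"
```

With the option set to `true`, only the `oot_image` command is printed, which corresponds to the incremental vLLM build on top of an already published ATOM image.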

zejunchen-zejun · 2026-03-13T08:59:47Z

docker/Dockerfile

-FROM $BASE_IMAGE
+# --------------------------------------------------------------------
+# OOT image stage: extends an ATOM base image with vLLM + OOT deps.
+# Build with: docker build --target oot_image --build-arg BASE_IMAGE=...


Use a build argument to control whether the build proceeds for the ATOM OOT image.
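The guard that the Dockerfile places at the start of each OOT `RUN` step (`if [ "${BUILD_OOT_IMAGE}" != "true" ]; then exit 0; fi && ...`) can be tried outside Docker. The sketch below is only a stand-in: `run_guarded_step` is a hypothetical wrapper that mimics one `RUN` line by passing the flag to `sh -c`, and the `echo` replaces the real build commands.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: each call mimics a single Dockerfile RUN line,
# which Docker executes through `sh -c` by default on Linux images.
run_guarded_step() {
  BUILD_OOT_IMAGE="$1" sh -c '
    if [ "${BUILD_OOT_IMAGE}" != "true" ]; then exit 0; fi && \
    echo "building OOT layer"
  '
}

# Flag off: the shell exits early with status 0, so the step is a successful no-op.
run_guarded_step false
echo "flag=false -> exit status $?"

# Flag on: the if-block is skipped, its status is 0, and the && chain continues.
run_guarded_step true
echo "flag=true -> exit status $?"
```

Because the early `exit 0` reports success, a build with the flag off still completes every `RUN` instruction without doing the OOT work. In a real Dockerfile the variable is only visible to `RUN` if the stage declares it with `ARG`.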

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

.github/scripts/atom_oot_test.sh

docker/Dockerfile

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

.github/workflows/atom-vllm-oot-test.yaml

+
+  atom-vllm-oot:
+    needs: [pre-checks]
+    if: ${{ needs.pre-checks.result == 'success' && (!github.event.pull_request || github.event.pull_request.draft == false) }}


.github/workflows/atom-vllm-oot-test.yaml

+      - name: Clean up containers and workspace
+        run: |
+          echo "=== Cleaning up containers on $(hostname) ==="
+          containers=$(docker ps -q)
+          if [ -n "$containers" ]; then
+            docker kill $containers || true
+          fi
+          docker rm -f "$CONTAINER_NAME" 2>/dev/null || true
+          docker run --rm -v "${GITHUB_WORKSPACE:-$PWD}":/workspace -w /workspace --privileged rocm/pytorch:latest bash -lc "ls -la /workspace/ && rm -rf /workspace/*" || true
+


docker/Dockerfile

+RUN if [ "${BUILD_OOT_IMAGE}" != "true" ]; then exit 0; fi && \
+    echo "========== [OOT 1/7] Prepare build tools ==========" && \
+    apt-get update && \
+    apt --fix-broken install -y && \
+    apt-get install -y --no-install-recommends ca-certificates ninja-build vim && \
+    mkdir -p /usr/local/bin && \
+    ln -sf "$(command -v ninja)" /usr/local/bin/ninja && \
+    /usr/local/bin/ninja --version && \
+    rm -rf /var/lib/apt/lists/*
+
+RUN if [ "${BUILD_OOT_IMAGE}" != "true" ]; then exit 0; fi && \
+    echo "========== [OOT 2/7] Verify base packages (atom/aiter/mori) ==========" && \
+    "${VENV_PYTHON}" -m pip show atom || true && \
+    "${VENV_PYTHON}" -m pip show amd-aiter || true && \
+    "${VENV_PYTHON}" -m pip show mori || true
+
+RUN if [ "${BUILD_OOT_IMAGE}" != "true" ]; then exit 0; fi && \
+    echo "========== [OOT 3/7] Clone vLLM ==========" && \
+    rm -rf /app/vllm && \
+    git clone "${VLLM_REPO}" /app/vllm && \
+    cd /app/vllm && \


.github/workflows/atom-vllm-oot-full-test.yaml

+jobs:
+  build-oot-image:
+    name: Build OOT validation image
+    runs-on: build-only-atom


Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

docker/Dockerfile

.github/workflows/atom-vllm-oot-test.yaml

.github/workflows/atom-vllm-oot-test.yaml

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

.github/workflows/atom-vllm-oot-test.yaml

docker/Dockerfile

Copilot AI review requested due to automatic review settings March 12, 2026 15:18

zejunchen-zejun mentioned this pull request Mar 12, 2026

[Plugin][CI/CD] establish CI/CD and add workflow for ATOM OOT #301

Closed

Copilot started reviewing on behalf of zejunchen-zejun March 12, 2026 15:19

Copilot AI reviewed Mar 12, 2026

Copilot AI review requested due to automatic review settings March 13, 2026 01:22

Copilot started reviewing on behalf of zejunchen-zejun March 13, 2026 01:24

Copilot AI reviewed Mar 13, 2026

.github/workflows/atom-vllm-oot-test.yaml (resolved)

.github/workflows/atom-vllm-oot-test.yaml (resolved)

.github/workflows/atom-vllm-oot-test.yaml (outdated, resolved)

docker/Dockerfile (outdated, resolved)

Copilot AI review requested due to automatic review settings March 13, 2026 03:57

Copilot started reviewing on behalf of zejunchen-zejun March 13, 2026 03:58

Copilot AI reviewed Mar 13, 2026

Copilot AI review requested due to automatic review settings March 13, 2026 06:12

Copilot started reviewing on behalf of zejunchen-zejun March 13, 2026 06:13

Copilot AI reviewed Mar 13, 2026

docker/Dockerfile (resolved)

docker/Dockerfile (outdated, resolved)

.github/workflows/atom-vllm-oot-test.yaml (outdated, resolved)

.github/workflows/atom-test.yaml (resolved)

gyohuangxin reviewed Mar 13, 2026

zejunchen-zejun commented Mar 13, 2026

Copilot AI review requested due to automatic review settings March 13, 2026 10:57

zejunchen-zejun added 9 commits March 13, 2026 18:58

[plugin][CI/CD/Docker release] Add CI/CD for OOT and

00d3ec8

enable OOT docker release Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

remove patch file

a8e5479

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

e3ecb60

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

1454e8b

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

ebb84ec

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

d40808d

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

b992bb1

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

8af8d9b

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

37e19c7

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot started reviewing on behalf of zejunchen-zejun March 13, 2026 10:58

zejunchen-zejun force-pushed the zejun/establish_oot_ci_cd_docker_release branch from 6fc865f to 37e19c7 March 13, 2026 10:58

Copilot AI reviewed Mar 13, 2026

.github/scripts/atom_oot_test.sh (outdated, resolved)

docker/Dockerfile (resolved)

docker/Dockerfile (resolved)

add

d9e7572

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

31f7177

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot AI review requested due to automatic review settings March 14, 2026 07:24

Copilot started reviewing on behalf of zejunchen-zejun March 14, 2026 07:26

Copilot AI reviewed Mar 14, 2026

zejunchen-zejun added 2 commits March 14, 2026 16:17

add

7e73015

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

e63017e

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot AI review requested due to automatic review settings March 14, 2026 09:19

Copilot started reviewing on behalf of zejunchen-zejun March 14, 2026 09:21

Copilot AI reviewed Mar 14, 2026

zejunchen-zejun added 2 commits March 14, 2026 18:17

add

6f038f8

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

add

032bf4a

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot AI review requested due to automatic review settings March 14, 2026 10:43

Copilot started reviewing on behalf of zejunchen-zejun March 14, 2026 10:44

Copilot AI reviewed Mar 14, 2026
