[feat] update build-env-ldb-toolchain docker images after prebuilt release #389
Conversation
After each successful thirdparty prebuilt release, automatically rebuild and push the `apache/doris:build-env-ldb-toolchain-latest` and `apache/doris:build-env-ldb-toolchain-no-avx2-latest` Docker images. The upstream `docker/compilation/Dockerfile` is fetched at CI time and patched in-place, so any toolchain/dependency updates from apache/doris are automatically inherited without maintaining a local Dockerfile copy.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pull request overview
This PR extends the existing thirdparty prebuilt release workflow to also rebuild and push the build-env-ldb-toolchain Docker images after a successful prebuilt release, by fetching and patching the upstream Doris compilation Dockerfile at CI time.
Changes:
- Add `workflow_dispatch` support to allow manual runs.
- Introduce an `update-docker` job that fetches and patches the upstream Doris Dockerfile, then builds/pushes two Docker images (normal + no-avx2) to Docker Hub.
- Wire `update-docker` into the workflow's `success`/`failure` jobs via updated `needs`.
```diff
 success:
   name: Success
-  needs: [prerelease, build]
+  needs: [prerelease, build, update-docker]
   if: needs.prerelease.outputs.should_release == 'true'
```
Adding update-docker to the success job’s needs means the release note will no longer be updated to SUCCESS if the Docker build/push fails (or is skipped), even when the prebuilt artifacts were successfully built and uploaded. If that coupling isn’t intended, consider decoupling the release-status update from the Docker publish (e.g., keep success depending only on prerelease/build, and have a separate status line/section for the image publish).
```yaml
if: |
  always() && (
    (needs.prerelease.outputs.should_release == 'true' && needs.build.result == 'success') ||
    (github.event_name == 'workflow_dispatch' && needs.build.result != 'failure')
```
The update-docker job runs on workflow_dispatch when needs.build.result != 'failure', which also includes cancelled/skipped. That can trigger a Docker push even when the build was cancelled or never produced a fresh prebuilt artifact, leading to confusing failures (missing release asset) or publishing an unintended image. Consider tightening the condition to only allow success (or explicitly success || skipped if intentionally reusing an existing release asset), and/or add an explicit check that the expected release asset exists before building/pushing.
Suggested change:

```diff
-    (github.event_name == 'workflow_dispatch' && needs.build.result != 'failure')
+    (github.event_name == 'workflow_dispatch' && needs.build.result == 'success')
```
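The asset-existence check mentioned above could be sketched in Python. The repository slug and helper names here are illustrative (hypothetical), though the tag `automation` and asset filename match the prebuilt download URL used in the patched Dockerfile:

```python
import urllib.error
import urllib.request


def asset_url(repo, tag, asset):
    # Build the GitHub release download URL for a given asset.
    return f"https://github.com/{repo}/releases/download/{tag}/{asset}"


def release_asset_exists(repo, tag, asset, timeout=10):
    # HEAD-request the asset URL; GitHub redirects to the real download
    # for existing assets and returns 404 otherwise. Any HTTP or network
    # error is treated as "asset missing".
    req = urllib.request.Request(asset_url(repo, tag, asset), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False


# No network access needed just to inspect the URL being probed:
print(asset_url("owner/doris-thirdparty", "automation",
                "doris-thirdparty-prebuilt-linux-x86_64.tar.xz"))
```

Gating the `update-docker` job on such a check (or an equivalent `curl -fsI` step) would fail fast with a clear message instead of a confusing mid-build `wget` error.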
```yaml
run: |
  # Pull the upstream Dockerfile from apache/doris to stay in sync with
  # any toolchain / dependency changes they make.
  curl -fsSL https://raw.githubusercontent.com/apache/doris/master/docker/compilation/Dockerfile \
    -o Dockerfile.upstream
```
curl pulls the upstream Dockerfile from the moving master branch. That makes the produced images non-reproducible and introduces a supply-chain risk (the build content can change without any change in this repo). Consider pinning to a specific commit SHA (e.g., derived from the Doris version you already compute in prerelease) and/or verifying the fetched content before building.
Suggested change (note that `sha256sum -c` expects two spaces between the digest and the filename):

```diff
-run: |
-  # Pull the upstream Dockerfile from apache/doris to stay in sync with
-  # any toolchain / dependency changes they make.
-  curl -fsSL https://raw.githubusercontent.com/apache/doris/master/docker/compilation/Dockerfile \
-    -o Dockerfile.upstream
+env:
+  DORIS_DOCKERFILE_COMMIT: 0123456789abcdef0123456789abcdef01234567
+  DORIS_DOCKERFILE_SHA256: REPLACE_WITH_VERIFIED_SHA256
+run: |
+  # Pull the upstream Dockerfile from a pinned apache/doris commit so the
+  # build input is reproducible and cannot drift with changes on master.
+  curl -fsSL "https://raw.githubusercontent.com/apache/doris/${DORIS_DOCKERFILE_COMMIT}/docker/compilation/Dockerfile" \
+    -o Dockerfile.upstream
+  printf '%s  %s\n' "${DORIS_DOCKERFILE_SHA256}" Dockerfile.upstream | sha256sum -c -
```
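The checksum step above could equally be done from the Python patching script. A minimal sketch of that verification (the expected digest would be the same placeholder value as in the suggestion; function names are illustrative):

```python
import hashlib


def sha256_of(path):
    # Stream the file in chunks so large downloads need not fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_hex):
    # Mirror `sha256sum -c`: fail loudly when the fetched file drifts
    # from the pinned, reviewed content.
    actual = sha256_of(path)
    if actual != expected_hex:
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
```

Either approach gives the same guarantee: the build input cannot change without a corresponding change in this repo.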
```python
# Patch 1: fix epel metalink.
# The upstream RUN line installs epel-release then immediately runs yum
# install/clean/makecache, but epel.repo's metalink may be unreliable.
# Insert a sed fix between the epel-release install and the next yum install.
old_epel = 'yum install epel-release -y && yum install https://packages.endpointdev.com'
new_epel = ('yum install epel-release -y \\\n'
            ' && sed -i \\\n'
            ' -e \'s/^metalink=/#metalink=/\' \\\n'
            ' -e \'s|^#baseurl=http://download.fedoraproject.org/pub/epel/7|baseurl=https://mirrors.aliyun.com/epel/7|\' \\\n'
            ' /etc/yum.repos.d/epel*.repo \\\n'
            ' && yum install https://packages.endpointdev.com')
assert old_epel in content, "Patch 1 failed: target string not found in upstream Dockerfile"
patched = content.replace(old_epel, new_epel, 1)
assert patched != content, "Patch 1 was a no-op: upstream Dockerfile may have changed"
```
```python
# Patch 2: replace the "clone & build thirdparty" block with downloading
# our prebuilt artifact. The block starts with the "# clone lastest source
# code" comment and ends with "rm -rf ${DEFAULT_DIR}/doris".
prebuilt_block = (
    '# Download prebuilt thirdparty from GitHub Release (built by doris-thirdparty automation)\n'
    'ARG GITHUB_REPOSITORY\n'
    'RUN mkdir -p /var/local/thirdparty \\\n'
    ' && wget -q "https://github.com/${GITHUB_REPOSITORY}/releases/download/automation/doris-thirdparty-prebuilt-linux-x86_64.tar.xz" \\\n'
    ' -O /tmp/prebuilt.tar.xz \\\n'
    ' && tar -xf /tmp/prebuilt.tar.xz -C /var/local/thirdparty \\\n'
    ' && rm /tmp/prebuilt.tar.xz\n'
)
patched_2 = re.sub(
    r'# clone lastest source code.*?rm -rf \$\{DEFAULT_DIR\}/doris\n',
    prebuilt_block,
    patched, flags=re.DOTALL
)
assert patched_2 != patched, "Patch 2 was a no-op: 'clone lastest source code' block not found in upstream Dockerfile"
patched = patched_2
```
The Python patching logic is very brittle: Patch 2 depends on the exact comment text # clone lastest source code and Patch 1 depends on an exact RUN substring. If upstream reformats or fixes typos, this workflow will start failing. Consider making the regex/string matches more resilient (e.g., tolerate latest/lastest, anchor around more stable commands, or use a small patch file applied with patch), so upstream drift causes fewer spurious failures.
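One way to make Patch 2 survive the most likely upstream change (fixing the "lastest" typo) is a pattern that accepts both spellings, combined with `re.subn` so a zero-match result is detected directly instead of via a before/after comparison. A minimal sketch; the helper name is illustrative, and `replacement` stands in for the `prebuilt_block` string from the workflow script:

```python
import re

# Accept both the current "lastest" typo and a corrected "latest".
CLONE_BLOCK = re.compile(
    r'# clone la(?:s)?test source code.*?rm -rf \$\{DEFAULT_DIR\}/doris\n',
    re.DOTALL,
)


def replace_clone_block(content, replacement):
    # subn reports how many substitutions were made, so a missing block
    # fails with a clear message rather than silently passing through.
    patched, n = CLONE_BLOCK.subn(replacement, content, count=1)
    if n != 1:
        raise RuntimeError("Patch 2 failed: clone-source block not found")
    return patched
```

Anchoring on the stable `rm -rf ${DEFAULT_DIR}/doris` end marker is kept as-is; only the fragile comment text is loosened.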
```python
print("=== normal: check ===")
for line in open('Dockerfile.patched'):
    if any(k in line for k in ['COPY doris', 'build-thirdparty', 'ARG GITHUB', 'prebuilt', 'USE_AVX2']):
        print(line, end='')
print("=== noavx2: check ===")
for line in open('Dockerfile.patched-noavx2'):
    if any(k in line for k in ['COPY doris', 'build-thirdparty', 'ARG GITHUB', 'prebuilt', 'USE_AVX2']):
        print(line, end='')
```
The script only prints diagnostic lines if COPY doris / build-thirdparty are still present, but it doesn’t fail the job. Since this workflow never checks out a build context, leaving any COPY instructions in the patched Dockerfile will make the Docker build fail later with a less actionable error. Consider adding assertions that these patterns are absent in Dockerfile.patched/Dockerfile.patched-noavx2 after patching.
Suggested change:

```diff
-print("=== normal: check ===")
-for line in open('Dockerfile.patched'):
-    if any(k in line for k in ['COPY doris', 'build-thirdparty', 'ARG GITHUB', 'prebuilt', 'USE_AVX2']):
-        print(line, end='')
-print("=== noavx2: check ===")
-for line in open('Dockerfile.patched-noavx2'):
-    if any(k in line for k in ['COPY doris', 'build-thirdparty', 'ARG GITHUB', 'prebuilt', 'USE_AVX2']):
-        print(line, end='')
+diagnostic_patterns = ['COPY doris', 'build-thirdparty', 'ARG GITHUB', 'prebuilt', 'USE_AVX2']
+forbidden_patterns = ['COPY doris', 'build-thirdparty']
+
+def check_patched_dockerfile(path, label):
+    print(f"=== {label}: check ===")
+    with open(path) as f:
+        lines = f.readlines()
+    for line in lines:
+        if any(k in line for k in diagnostic_patterns):
+            print(line, end='')
+    for pattern in forbidden_patterns:
+        assert not any(pattern in line for line in lines), (
+            f"Patch validation failed for {path}: found forbidden pattern '{pattern}'; "
+            "patched Dockerfile still depends on unavailable build context"
+        )
+
+check_patched_dockerfile('Dockerfile.patched', 'normal')
+check_patched_dockerfile('Dockerfile.patched-noavx2', 'noavx2')
```
Summary

- Rebuild and push the `apache/doris:build-env-ldb-toolchain-latest` and `apache/doris:build-env-ldb-toolchain-no-avx2-latest` Docker images
- The upstream `docker/compilation/Dockerfile` is fetched at CI time and patched in-place, so any toolchain/dependency updates from apache/doris are automatically inherited without maintaining a local Dockerfile copy
- `USE_AVX2=0` is set in the builder ENV block for the no-avx2 variant

Test plan

- Verified the `sed`/`python3` patch output locally: `COPY doris` and `build-thirdparty` removed, `ARG GITHUB_REPOSITORY` and the prebuilt download block correctly injected
- `docker buildx build` (no push) passed on tc-server-selectdb for both the normal and no-avx2 Dockerfiles

Notes

Requires two secrets to be configured in the repository:

- `DOCKERHUB_USERNAME`
- `DOCKERHUB_TOKEN` (with push access to `apache/doris` on Docker Hub)

🤖 Generated with Claude Code