Merged
33 changes: 30 additions & 3 deletions .github/workflows/beam_PreCommit_Flink_Container.yml
@@ -77,6 +77,10 @@ env:
HARNESS_IMAGES_TO_PULL: gcr.io/apache-beam-testing/beam-sdk/beam_go_sdk:latest
JOB_SERVER_IMAGE: gcr.io/apache-beam-testing/beam_portability/beam_flink1.17_job_server:latest
ARTIFACTS_DIR: gs://beam-flink-cluster/beam-precommit-flink-container-${{ github.run_id }}
+ DOCKER_REGISTRY: gcr.io
+ DOCKER_REPOSITORY_ROOT: ${{ github.event_name == 'pull_request_target' && 'gcr.io/apache-beam-testing/beam-sdk-pr' || 'gcr.io/apache-beam-testing/beam-sdk' }}
@Abacn (Contributor), Mar 11, 2025

Let's use us.gcr.io/apache-beam-testing/github-actions, which is private and is already used for testing elsewhere. gcr.io/apache-beam-testing is a public repository that publishes snapshot containers.

@akashorabek (Collaborator, Author), Mar 18, 2025

@Abacn When I change the repository from the public gcr.io to the private us.gcr.io, the following error occurs:
RuntimeError: Pipeline flink-tests-python-0317062609_6454c5c2-eb5f-4fed-9c51-3d7e2df3e1bf failed in state FAILED: java.io.IOException: Received exit code 125 for command 'docker run -d --mount type=bind,src=/var/lib/hadoop-yarn/.config/gcloud,dst=/root/.config/gcloud --network=host --env=DOCKER_MAC_CONTAINER=null us.gcr.io/apache-beam-testing/github-actions/beam_python3.9_sdk:2.64.0-SNAPSHOT --id=1-1 --provision_endpoint=localhost:40945'. stderr: Unable to find image 'us.gcr.io/apache-beam-testing/github-actions/beam_python3.9_sdk:2.64.0-SNAPSHOT' locally
docker: Error response from daemon: Head "https://us.gcr.io/v2/apache-beam-testing/github-actions/beam_python3.9_sdk/manifests/2.64.0-SNAPSHOT": denied: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.downloadArtifacts" on resource "projects/apache-beam-testing/locations/us/repositories/us.gcr.io" (or it may not exist). See 'docker run --help'.

I tried several approaches, including running gcloud auth configure-docker us.gcr.io --quiet. The same error first appeared when pushing the snapshot to the private repository, and adding that command to the workflow YAML fixed the push. Now, however, the error happens in the test itself, when it tries to pull the image. I also ran that command on all Dataproc workers from the flink_cluster.sh file, including for the yarn user. The src=/var/lib/hadoop-yarn/.config/gcloud mount in the error message suggests that yarn runs the test. That didn't help either. Strangely, when I connect to these Dataproc workers from my own machine, they can docker pull from us.gcr.io, but when the workflow runs on GitHub Actions, the test fails.

Can we keep using gcr.io for now and open an issue to investigate this further, since it might take longer than expected? In the meantime, we would push to beam-sdk-pr instead of beam-sdk and delete the container after the tests finish.
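The stopgap proposed above gives each pull_request_target run its own uniquely tagged image under beam-sdk-pr and deletes it afterwards. The bash sketch below reproduces that naming locally. It is hypothetical and not part of the workflow. The functions repository_root and image_ref and the EVENT_NAME variable are illustrative stand-ins for the workflow's ${{ }} expressions. The timestamp format assumes GNU date (for %N), and the cleanup command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Hypothetical local sketch, not part of the workflow: mirrors how the
# workflow chooses DOCKER_REPOSITORY_ROOT and builds the per-run image
# reference. EVENT_NAME is an illustrative stand-in for github.event_name.
set -euo pipefail

# Same choice as the DOCKER_REPOSITORY_ROOT expression: PR runs push to beam-sdk-pr.
repository_root() {
  if [[ "$1" == "pull_request_target" ]]; then
    echo "gcr.io/apache-beam-testing/beam-sdk-pr"
  else
    echo "gcr.io/apache-beam-testing/beam-sdk"
  fi
}

# Builds <root>/beam_python<version>_sdk:<tag>, the value passed via --environment_config.
image_ref() {
  local root="$1" python_version="$2" tag="$3"
  echo "${root}/beam_python${python_version}_sdk:${tag}"
}

EVENT_NAME="${EVENT_NAME:-pull_request_target}"
PYTHON_VERSION="3.9"
PYTHON_SDK_IMAGE_TAG="latest"
if [[ "$EVENT_NAME" == "pull_request_target" ]]; then
  # Unique tag in the same format as the "Set PYTHON_SDK_IMAGE_TAG" step (GNU date for %N).
  PYTHON_SDK_IMAGE_TAG="$(date +'%Y%m%d-%H%M%S%N')"
fi

IMAGE="$(image_ref "$(repository_root "$EVENT_NAME")" "$PYTHON_VERSION" "$PYTHON_SDK_IMAGE_TAG")"
echo "Test image: ${IMAGE}"

# Print, rather than run, the command the cleanup step uses for PR runs.
echo "gcloud container images delete ${IMAGE} --force-delete-tags --quiet"
```

Non-PR runs fall back to beam-sdk with the latest tag, so nothing is created and nothing needs to be deleted.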

(Contributor)

Understood. The Artifact Registry permission issue is not directly related to a test misconfiguration; thanks for the investigation. In that case, please add beam-sdk-pr to

PUBLIC_REPOSITORIES=(beam-sdk beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
so that it is included in the automatic image cleanup.
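The bash sketch below illustrates why the cleaner needs the new entry. It is hypothetical and assumes that a cleanup loop walks PUBLIC_REPOSITORIES under gcr.io/apache-beam-testing and PRIVATE_REPOSITORIES under us.gcr.io/apache-beam-testing. The real stale_dataflow_prebuilt_image_cleaner.sh may iterate differently, and the contains and repository_paths helpers are illustrative only. Once beam-sdk-pr is listed, gcr.io/apache-beam-testing/beam-sdk-pr becomes one of the paths such a loop visits.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real cleaner: assumes public repositories are
# served from gcr.io/apache-beam-testing and private ones from
# us.gcr.io/apache-beam-testing, and that a cleanup loop visits each of them.
set -euo pipefail

PUBLIC_REPOSITORIES=(beam-sdk beam-sdk-pr beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
PRIVATE_REPOSITORIES=(java-postcommit-it python-postcommit-it jenkins github-actions)

# Succeeds (returns 0) if the first argument equals any of the remaining ones.
contains() {
  local needle="$1" item
  shift
  for item in "$@"; do
    if [[ "$item" == "$needle" ]]; then
      return 0
    fi
  done
  return 1
}

# Prints the full path of every repository a cleanup loop would visit.
repository_paths() {
  local repo
  for repo in "${PUBLIC_REPOSITORIES[@]}"; do
    echo "gcr.io/apache-beam-testing/${repo}"
  done
  for repo in "${PRIVATE_REPOSITORIES[@]}"; do
    echo "us.gcr.io/apache-beam-testing/${repo}"
  done
}

if contains beam-sdk-pr "${PUBLIC_REPOSITORIES[@]}"; then
  echo "beam-sdk-pr images will be considered for cleanup"
fi
repository_paths
```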

@akashorabek (Collaborator, Author), Mar 19, 2025

@Abacn Done. I also created a ticket to track the Unauthenticated issue.

+ PYTHON_VERSION: 3.9
+ PYTHON_SDK_IMAGE_TAG: latest

jobs:
beam_PreCommit_Flink_Container:
@@ -87,7 +91,7 @@ jobs:
github.event_name == 'pull_request_target' ||
github.event.comment.body == 'Run Flink Container PreCommit'
runs-on: [self-hosted, ubuntu-20.04, main]
- timeout-minutes: 45
+ timeout-minutes: 90
name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
strategy:
matrix:
@@ -105,6 +109,24 @@ jobs:
uses: ./.github/actions/setup-environment-action
with:
python-version: default
+ - name: GCloud Docker credential helper
+ if: ${{ github.event_name == 'pull_request_target' }}
+ run: |
+ gcloud auth configure-docker ${{ env.DOCKER_REGISTRY }}
+ - name: Set PYTHON_SDK_IMAGE_TAG unique variable based on timestamp
+ if: ${{ github.event_name == 'pull_request_target' }}
+ run: echo "PYTHON_SDK_IMAGE_TAG=$(date +'%Y%m%d-%H%M%S%N')" >> $GITHUB_ENV
+ - name: Build and push to registry
+ if: ${{ github.event_name == 'pull_request_target' }}
+ uses: ./.github/actions/gradle-command-self-hosted-action
+ with:
+ gradle-command: :sdks:python:container:py39:docker
+ arguments: |
+ -PpythonVersion=${{ env.PYTHON_VERSION }} \
+ -Pdocker-repository-root=${{ env.DOCKER_REPOSITORY_ROOT }} \
+ -Pdocker-tag=${{ env.PYTHON_SDK_IMAGE_TAG }} \
+ -PuseBuildx \
+ -Ppush-containers
- name: Prepare test arguments
uses: ./.github/actions/test-arguments-action
with:
@@ -141,11 +163,11 @@ jobs:
arguments: |
-PloadTest.mainClass=apache_beam.testing.load_tests.combine_test \
-Prunner=FlinkRunner \
- '-PloadTest.args=${{ env.beam_PreCommit_Flink_Container_test_arguments_2 }} --job_name=flink-tests-python-${{env.NOW_UTC}}'
+ '-PloadTest.args=${{ env.beam_PreCommit_Flink_Container_test_arguments_2 }} --environment_config=${{ env.DOCKER_REPOSITORY_ROOT }}/beam_python${{ env.PYTHON_VERSION }}_sdk:${{ env.PYTHON_SDK_IMAGE_TAG }} --job_name=flink-tests-python-${{env.NOW_UTC}}'

# Run a Java Combine load test to verify the Flink container
- name: Run Flink Container Test with Java Combine
- timeout-minutes: 10
+ timeout-minutes: 20
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :sdks:java:testing:load-tests:run
@@ -158,3 +180,8 @@ jobs:
if: always()
run: |
${{ github.workspace }}/.test-infra/dataproc/flink_cluster.sh delete

+ - name: Cleanup Python SDK Container
+ if: ${{ always() && github.event_name == 'pull_request_target' }}
+ run: |
+ gcloud container images delete ${{ env.DOCKER_REPOSITORY_ROOT }}/beam_python${{ env.PYTHON_VERSION }}_sdk:${{ env.PYTHON_SDK_IMAGE_TAG }} --force-delete-tags --quiet
@@ -18,6 +18,5 @@
--parallelism=2
--job_endpoint=localhost:8099
--environment_type=DOCKER
- --environment_config=gcr.io/apache-beam-testing/beam-sdk/beam_python3.9_sdk:latest
--top_count=10
--runner=PortableRunner
2 changes: 1 addition & 1 deletion .test-infra/tools/stale_dataflow_prebuilt_image_cleaner.sh
@@ -22,7 +22,7 @@ set -euo pipefail
# Clean up private registry (us.gcr.io)
# Images more than 5 day old and not the latest (either has latest label or newest)

- PUBLIC_REPOSITORIES=(beam-sdk beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
+ PUBLIC_REPOSITORIES=(beam-sdk beam-sdk-pr beam_portability beamgrafana beammetricssyncjenkins beammetricssyncgithub)
PRIVATE_REPOSITORIES=(java-postcommit-it python-postcommit-it jenkins github-actions)
# set as the same as 6-week release period
if [[ $OSTYPE == "linux-gnu"* ]]; then
@@ -573,7 +573,7 @@ class BeamModulePlugin implements Plugin<Project> {
}

project.ext.useBuildx = {
- return project.containerArchitectures() != [project.nativeArchitecture()]
+ return (project.containerArchitectures() != [project.nativeArchitecture()]) || project.rootProject.hasProperty("useBuildx")
}

/** ***********************************************************************************************/