Problem Summary
The publish-kustomize-bundle.yaml workflow produces inconsistent OCI artifact structures when publishing Kustomize bundles. Sometimes the artifact keeps the correct directory structure with the base/ and gateway/ subdirectories; other times it is flattened, with only the contents of base placed in the artifact root.
Evidence
Expected Structure (Working Case)
When the workflow executes correctly, the OCI artifact maintains the proper directory structure:
❯ flux pull artifact oci://ghcr.io/datum-cloud/milo-os-com-kustomize:v0.0.0-main-20251205-040933 --output /tmp/oci-check
✔ artifact content extracted to /tmp/oci-check
❯ ls -lR oci-check
oci-check:
total 8
drwxr-x--- 2 oscar oscar 4096 Dec 14 23:37 base
drwxr-x--- 2 oscar oscar 4096 Dec 14 23:37 gateway
oci-check/base:
total 16
-rw-r--r-- 1 oscar oscar 1811 Dec 31 1969 deployment.yaml
-rw-r--r-- 1 oscar oscar 533 Dec 31 1969 http-route.yaml
-rw-r--r-- 1 oscar oscar 255 Dec 31 1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar 495 Dec 31 1969 service.yaml
oci-check/gateway:
total 24
-rw-r--r-- 1 oscar oscar 282 Dec 31 1969 endpoint.yaml
-rw-r--r-- 1 oscar oscar 1216 Dec 31 1969 gateway.yaml
-rw-r--r-- 1 oscar oscar 968 Dec 31 1969 httproute-redirect.yaml
-rw-r--r-- 1 oscar oscar 620 Dec 31 1969 httproute.yaml
-rw-r--r-- 1 oscar oscar 135 Dec 31 1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar 61 Dec 31 1969 namespace.yaml
Incorrect Structure (Failing Case)
Sometimes the same workflow produces a flattened structure with only the base directory contents:
❯ flux pull artifact oci://ghcr.io/datum-cloud/milo-os-com-kustomize:v0.0.0-main-20251205-045012 --output /tmp/oci-check
✔ artifact content extracted to /tmp/oci-check
❯ ls -lR oci-check
oci-check:
total 16
-rw-r--r-- 1 oscar oscar 1889 Dec 31 1969 deployment.yaml
-rw-r--r-- 1 oscar oscar 533 Dec 31 1969 http-route.yaml
-rw-r--r-- 1 oscar oscar 68 Dec 31 1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar 495 Dec 31 1969 service.yaml
Reproduction Details
Affected Workflow Runs
- Successful run: https://github.com/datum-cloud/milo-os.com/actions/runs/19952361361/job/57214882319
  - Produced correct structure with base/ and gateway/ directories
- Failed run: https://github.com/datum-cloud/milo-os.com/actions/runs/19953022324/job/57216733969
  - Produced flattened structure with only base files
- Re-run of failed workflow: https://github.com/datum-cloud/milo-os.com/actions/runs/19953022324/job/58045196175
  - Same workflow re-run produced correct structure
Workflow Configuration
All three executions used identical workflow configuration:
publish-kustomize-bundles:
  permissions:
    id-token: write
    contents: read
    packages: write
  uses: datum-cloud/actions/.github/workflows/publish-kustomize-bundle.yaml@v1.7.4
  with:
    bundle-name: ghcr.io/datum-cloud/milo-os-com-kustomize
    bundle-path: config
    image-overlays: config/base
    image-name: ghcr.io/datum-cloud/milo-os-com
  secrets: inherit
Investigation Findings
Attempted Fixes
- Flux version pinning: Initially suspected that Flux version variability might be causing the issue, so the Flux version was hardcoded. However, the problem persisted even with a fixed Flux version, ruling this out as the root cause.
Suspected Causes
- Kustomize version variability: The workflow downloads the latest Kustomize version on each run, which could introduce inconsistencies
- Working directory state: The use of cd commands in the workflow might interact unpredictably with GitHub Actions' filesystem
- GitHub Actions runner state: Potential workspace caching or state persistence between runs
Critical Code Section
The issue appears related to this section of the workflow:
- name: Set Image Tags in Kustomize Overlays
  if: ${{ inputs.image-overlays != '' && inputs.image-name != '' }}
  run: |
    # ...
    cd "${overlay_path}"
    kustomize edit set image "${{ inputs.image-name }}=${{ inputs.image-name }}:${TAG}"
    cat kustomization.yaml
    cd - > /dev/null
When kustomize edit set image runs inside the config/base directory, which contains references to ../gateway, it may sometimes resolve and flatten these references. This is a hypothesis and has not been confirmed.
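One way to test this hypothesis would be to record the file layout of the bundle path before and after the image-tag step and compare the two. The following is a minimal sketch, not part of the existing workflow; snapshot_tree is a hypothetical helper name, and config is taken from the bundle-path input shown above.

```shell
# Hypothetical helper: print a sorted list of every file under a directory,
# relative to that directory. The subshell keeps the caller's working
# directory unchanged.
snapshot_tree() {
  ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Sketch of use around the image-tag step:
#   before="$(snapshot_tree config)"
#   ... run the kustomize edit set image loop ...
#   after="$(snapshot_tree config)"
#   if [ "$before" != "$after" ]; then
#     echo "bundle layout changed during image tagging" >&2
#     exit 1
#   fi
```

If the layout never changes across this step in a failing run, the flattening would have to happen elsewhere, for example in the path passed to the push step.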
Impact
- Unpredictable OCI artifact structure makes downstream deployments unreliable
- Manual intervention required to verify and potentially re-run failed workflows
- No clear pattern to predict when the issue will occur
Proposed Solutions
Option 1: Pin Kustomize Version
Replace dynamic version download with a fixed version to ensure consistency across runs.
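Alongside pinning, a guard step could fail the job early if the installed binary does not match the pinned version. This is only a sketch: check_tool_version is a hypothetical helper, v5.4.3 is an example value rather than the version the workflow uses, and the substring match is an assumption made because the output format of kustomize version differs between releases.

```shell
# Hypothetical helper: succeed only if the reported version string contains
# the expected version. Takes the expected version and the tool's output.
check_tool_version() {
  local expected="$1" actual="$2"
  case "$actual" in
    *"$expected"*) return 0 ;;
    *)
      echo "expected version ${expected}, got: ${actual}" >&2
      return 1
      ;;
  esac
}

# Sketch of use (assumes kustomize is already installed on the runner):
#   check_tool_version "v5.4.3" "$(kustomize version)"
```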
Option 2: Avoid Directory Changes
Use subshells or absolute paths instead of cd commands to prevent working directory issues.
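A minimal sketch of the subshell approach, assuming the overlay loop otherwise stays as it is. run_in_dir is a hypothetical helper name; IMAGE_NAME and TAG stand in for the values the existing step already has.

```shell
# Hypothetical helper: run a command inside a directory without changing the
# caller's working directory. The parentheses start a subshell, so the cd is
# discarded when the subshell exits, whether or not the command succeeds.
run_in_dir() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}

# Sketch of use in the overlay step (assumes kustomize is on PATH):
#   run_in_dir "${overlay_path}" kustomize edit set image "${IMAGE_NAME}=${IMAGE_NAME}:${TAG}"
```

This removes the need for cd - and means a failure inside the overlay directory cannot leave later commands running from the wrong location.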
Option 3: Add Structure Validation
Implement a validation step before flux push artifact to verify directory structure integrity.
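A validation step could look roughly like the sketch below. validate_bundle_layout is a hypothetical helper, and the list of expected subdirectories (base and gateway here, taken from the working artifact above) would need to be supplied by the caller, for example through a new workflow input.

```shell
# Hypothetical helper: fail unless every expected subdirectory exists under
# the bundle root and contains a kustomization.yaml. Reports all missing
# entries before returning, rather than stopping at the first one.
validate_bundle_layout() {
  local root="$1"
  shift
  local dir status=0
  for dir in "$@"; do
    if [ ! -f "${root}/${dir}/kustomization.yaml" ]; then
      echo "missing ${root}/${dir}/kustomization.yaml" >&2
      status=1
    fi
  done
  return "$status"
}

# Sketch of use immediately before flux push artifact:
#   validate_bundle_layout config base gateway
```

This would turn a silently flattened artifact into a failed job, which is easier to notice and re-run than a bad published tag.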
Option 4: Alternative Image Update Method
Replace kustomize edit set image with direct file manipulation (e.g., using sed) to avoid potential Kustomize path resolution issues.
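A rough sketch of the sed variant follows. It assumes each overlay's kustomization.yaml already contains an images entry with a newTag line; unlike kustomize edit set image, it will not create one, and it rewrites every newTag line in the file. set_image_tag is a hypothetical helper name, and the tag is assumed not to contain the characters | or &.

```shell
# Hypothetical helper: replace the value of every "newTag:" line in a
# kustomization file, keeping the original indentation. Writes through a
# temporary file so the same command works with both GNU and BSD sed.
set_image_tag() {
  local file="$1" tag="$2" tmp
  tmp="$(mktemp)"
  sed -E "s|^([[:space:]]*newTag:[[:space:]]*).*|\1\"${tag}\"|" "$file" > "$tmp"
  # Copy the content back instead of moving, to keep the file's permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Sketch of use (path and TAG as in the existing step):
#   set_image_tag "${overlay_path}/kustomization.yaml" "${TAG}"
```

Because the file is edited by absolute or repository-relative path, this variant also avoids the cd calls discussed in Option 2.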
Priority
High - This issue causes intermittent production deployment failures and requires manual intervention to resolve.