Intermittent OCI artifact structure corruption in publish-kustomize-bundle workflow #35

@OscarLlamas6

Problem Summary

The publish-kustomize-bundle.yaml workflow is producing inconsistent OCI artifact structures when publishing Kustomize bundles. Sometimes the artifact maintains the correct directory structure with subdirectories, and other times it flattens the structure, placing all files in the root directory.

Evidence

Expected Structure (Working Case)

When the workflow executes correctly, the OCI artifact maintains the proper directory structure:

❯ flux pull artifact oci://ghcr.io/datum-cloud/milo-os-com-kustomize:v0.0.0-main-20251205-040933 --output /tmp/oci-check
✔ artifact content extracted to /tmp/oci-check

❯ ls -lR oci-check
oci-check:
total 8
drwxr-x--- 2 oscar oscar 4096 Dec 14 23:37 base
drwxr-x--- 2 oscar oscar 4096 Dec 14 23:37 gateway

oci-check/base:
total 16
-rw-r--r-- 1 oscar oscar 1811 Dec 31  1969 deployment.yaml
-rw-r--r-- 1 oscar oscar  533 Dec 31  1969 http-route.yaml
-rw-r--r-- 1 oscar oscar  255 Dec 31  1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar  495 Dec 31  1969 service.yaml

oci-check/gateway:
total 24
-rw-r--r-- 1 oscar oscar  282 Dec 31  1969 endpoint.yaml
-rw-r--r-- 1 oscar oscar 1216 Dec 31  1969 gateway.yaml
-rw-r--r-- 1 oscar oscar  968 Dec 31  1969 httproute-redirect.yaml
-rw-r--r-- 1 oscar oscar  620 Dec 31  1969 httproute.yaml
-rw-r--r-- 1 oscar oscar  135 Dec 31  1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar   61 Dec 31  1969 namespace.yaml

Incorrect Structure (Failing Case)

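In other runs, the same workflow produces a flattened artifact: the files from base/ land in the artifact root and gateway/ is missing entirely. Note that kustomization.yaml is also smaller here (68 bytes versus 255), so the file contents differ, not only the layout: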

❯ flux pull artifact oci://ghcr.io/datum-cloud/milo-os-com-kustomize:v0.0.0-main-20251205-045012 --output /tmp/oci-check
✔ artifact content extracted to /tmp/oci-check

❯ ls -lR oci-check
oci-check:
total 16
-rw-r--r-- 1 oscar oscar 1889 Dec 31  1969 deployment.yaml
-rw-r--r-- 1 oscar oscar  533 Dec 31  1969 http-route.yaml
-rw-r--r-- 1 oscar oscar   68 Dec 31  1969 kustomization.yaml
-rw-r--r-- 1 oscar oscar  495 Dec 31  1969 service.yaml

Reproduction Details

Affected Workflow Runs

  1. Successful run: https://github.com/datum-cloud/milo-os.com/actions/runs/19952361361/job/57214882319
     • Produced correct structure with base/ and gateway/ directories
  2. Failed run: https://github.com/datum-cloud/milo-os.com/actions/runs/19953022324/job/57216733969
     • Produced flattened structure with only base files
  3. Re-run of failed workflow: https://github.com/datum-cloud/milo-os.com/actions/runs/19953022324/job/58045196175
     • Same workflow re-run produced correct structure

Workflow Configuration

All three executions used identical workflow configuration:

publish-kustomize-bundles:
  permissions:
    id-token: write
    contents: read
    packages: write
  uses: datum-cloud/actions/.github/workflows/publish-kustomize-bundle.yaml@v1.7.4
  with:
    bundle-name: ghcr.io/datum-cloud/milo-os-com-kustomize
    bundle-path: config
    image-overlays: config/base
    image-name: ghcr.io/datum-cloud/milo-os-com
  secrets: inherit

Investigation Findings

Attempted Fixes

  • Flux version pinning: Initially suspected that Flux version variability might be causing the issue, so the Flux version was hardcoded. However, the problem persisted even with a fixed Flux version, ruling this out as the root cause.

Suspected Causes

  1. Kustomize version variability: The workflow downloads the latest Kustomize version on each run, which could introduce inconsistencies
  2. Working directory state: The use of cd commands in the workflow might interact unpredictably with GitHub Actions' filesystem
  3. GitHub Actions runner state: Potential workspace caching or state persistence between runs

Critical Code Section

The issue appears related to this section of the workflow:

- name: Set Image Tags in Kustomize Overlays
  if: ${{ inputs.image-overlays != '' && inputs.image-name != '' }}
  run: |
    # ...
    cd "${overlay_path}"
    kustomize edit set image "${{ inputs.image-name }}=${{ inputs.image-name }}:${TAG}"
    cat kustomization.yaml
    cd - > /dev/null

When kustomize edit set image runs inside the config/base directory, whose kustomization.yaml contains references to ../gateway, it may sometimes resolve and flatten those references. This is a hypothesis and has not been confirmed.
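
One way to test this hypothesis is to snapshot the directory tree before and after the suspect step and fail if any path appears or disappears. The sketch below is illustrative only: `snapshot_tree` and `check_tree_unchanged` are hypothetical helper names, not part of the existing workflow.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a sorted list of every path under a directory.
snapshot_tree() {
  local dir="$1"
  (cd "$dir" && find . -mindepth 1 | LC_ALL=C sort)
}

# Hypothetical helper: run a command, then compare the set of paths under
# "$dir" with the set recorded beforehand. Returns 1 if paths were added
# or removed. File contents are not compared, only the layout.
check_tree_unchanged() {
  local dir="$1"; shift
  local before after
  before="$(snapshot_tree "$dir")"
  "$@" || return $?
  after="$(snapshot_tree "$dir")"
  if [ "$before" != "$after" ]; then
    echo "tree under ${dir} changed" >&2
    echo "--- before" >&2; echo "$before" >&2
    echo "--- after" >&2;  echo "$after" >&2
    return 1
  fi
}
```

In the workflow this could wrap the existing step, for example `check_tree_unchanged config sh -c 'cd config/base && kustomize edit set image ...'`, with the elided arguments left as they are today.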

Impact

  • Unpredictable OCI artifact structure makes downstream deployments unreliable
  • Manual intervention required to verify and potentially re-run failed workflows
  • No clear pattern to predict when the issue will occur

Proposed Solutions

Option 1: Pin Kustomize Version

Replace dynamic version download with a fixed version to ensure consistency across runs.
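
A minimal sketch of what pinning could look like, assuming the binary is fetched from the kubernetes-sigs/kustomize GitHub releases. The version number below is a placeholder chosen for illustration, and `kustomize_url` and `install_kustomize` are hypothetical helper names; the workflow's current download step may look different.

```shell
#!/usr/bin/env bash

# Assumed version, for illustration only; pin whichever release has been tested.
KUSTOMIZE_VERSION="${KUSTOMIZE_VERSION:-5.4.3}"

# Build the release-asset URL for a given version, OS, and architecture.
# The release tag is "kustomize/v<version>", so the slash is URL-encoded as %2F.
kustomize_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%%2Fv%s/kustomize_v%s_%s_%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# Download the pinned release and unpack the kustomize binary into a directory.
install_kustomize() {
  local dest="${1:-/usr/local/bin}"
  curl -fsSL "$(kustomize_url "$KUSTOMIZE_VERSION")" | tar -xz -C "$dest" kustomize
}
```

Logging `kustomize version` after installation would also make it possible to compare the version used by the successful and failed runs.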

Option 2: Avoid Directory Changes

Use subshells or absolute paths instead of cd commands to prevent working directory issues.
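
As a rough sketch of the subshell approach: the parentheses below start a subshell, so the `cd` is discarded when it exits, even if the command fails. `run_in_dir` is a hypothetical helper name, and `IMAGE_NAME` in the usage comment is an assumed environment variable standing in for `${{ inputs.image-name }}`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command inside a directory without changing
# the caller's working directory. The exit status of the command is returned.
run_in_dir() {
  local dir="$1"; shift
  ( cd "$dir" && "$@" )
}

# Possible usage in the overlay step (IMAGE_NAME is an assumed variable):
#   run_in_dir "${overlay_path}" \
#     kustomize edit set image "${IMAGE_NAME}=${IMAGE_NAME}:${TAG}"
```

This replaces the `cd ... ; cd - > /dev/null` pair, so a failure between the two can no longer leave later commands running in the wrong directory.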

Option 3: Add Structure Validation

Implement a validation step before flux push artifact to verify directory structure integrity.
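
A validation step along these lines might look like the sketch below. It only checks that each expected directory exists and contains a kustomization.yaml; `validate_bundle` is a hypothetical helper name, and the list of expected directories (base and gateway here, taken from the working case above) would have to be supplied by the caller.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail if any expected overlay directory is missing
# or lacks a kustomization.yaml.
# Arguments: bundle root, then one or more directory names relative to it.
validate_bundle() {
  local root="$1"; shift
  local status=0 name
  for name in "$@"; do
    if [ ! -f "${root}/${name}/kustomization.yaml" ]; then
      echo "missing ${root}/${name}/kustomization.yaml" >&2
      status=1
    fi
  done
  return "$status"
}

# Possible usage before the push step:
#   validate_bundle config base gateway || exit 1
```

This would turn a silently flattened artifact into a failed run, although it would not explain why the structure changed.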

Option 4: Alternative Image Update Method

Replace kustomize edit set image with direct file manipulation (e.g., using sed) to avoid potential Kustomize path resolution issues.
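
For illustration, a sed-based update could be as small as the sketch below. It is much narrower than `kustomize edit set image`: it assumes the kustomization.yaml already has an images entry with a newTag field, it rewrites every newTag line in the file, and it assumes the tag contains no `|`, `&`, or backslash characters. `set_new_tag` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rewrite the value of every "newTag:" line in a file.
# Arguments: path to kustomization.yaml, new tag.
set_new_tag() {
  local file="$1" tag="$2"
  # Fail loudly instead of silently doing nothing when the field is absent.
  if ! grep -q '^[[:space:]]*newTag:' "$file"; then
    echo "no newTag field in ${file}" >&2
    return 1
  fi
  # -i.bak is accepted by both GNU and BSD sed; the backup is removed afterwards.
  # The tag is written in double quotes so YAML always reads it as a string.
  sed -i.bak -E "s|^([[:space:]]*newTag:).*|\1 \"${tag}\"|" "$file" \
    && rm -f "${file}.bak"
}

# Possible usage:
#   set_new_tag "${overlay_path}/kustomization.yaml" "${TAG}"
```

Because this touches only one line of one file, it cannot alter the directory layout, which makes it a useful control when testing whether kustomize edit is involved in the flattening.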

Priority

High - This issue causes intermittent production deployment failures and requires manual intervention to resolve.

Labels: bug