Content sync always reports all files as changed on fresh CI runner #17

@sonupreetam

Problem

The weekly sync-content-check.yml workflow creates automated PRs listing all 32 synced files as "changed" every run, even when upstream doc content is byte-for-byte identical to the previous week.

Evidence: PR #15 and PR #16 list the exact same 32 files across the same 5 repositories — only the pinned SHAs differ. The has_changes gate introduced in PR #10 is not filtering these out.

Root Cause

writeFileSafe in cmd/sync-content/sync.go detects unchanged files by comparing against existing content on disk:

func writeFileSafe(path string, data []byte) (bool, error) {
	existing, err := os.ReadFile(path)
	if err == nil && bytes.Equal(existing, data) {
		return false, nil
	}
	// ...
	return true, os.WriteFile(path, data, 0o644)
}

However, content/docs/projects/*/ is gitignored, so these files are never committed. On a fresh GitHub Actions runner the directory is empty, os.ReadFile always fails with a not-exist error, and the bytes.Equal check is never reached. writeFileSafe therefore always returns written = true, and every file is recorded as "changed."
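
The behavior can be reproduced outside CI with a minimal sketch like the one below. It copies writeFileSafe as quoted above and assumes the elided "// ..." section only creates parent directories; the freshRunnerDemo helper and the example path are hypothetical and exist only for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// writeFileSafe mirrors the function quoted in the issue. The elided part is
// ASSUMED to only create parent directories; the real code may do more.
func writeFileSafe(path string, data []byte) (bool, error) {
	existing, err := os.ReadFile(path)
	if err == nil && bytes.Equal(existing, data) {
		return false, nil
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return false, err
	}
	return true, os.WriteFile(path, data, 0o644)
}

// freshRunnerDemo (hypothetical helper) writes identical bytes twice into a
// brand-new temp directory, which stands in for a fresh CI runner checkout.
func freshRunnerDemo() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	// Illustrative path only; "example" is not a real project name.
	path := filepath.Join(dir, "content", "docs", "projects", "example", "index.md")
	data := []byte("# unchanged upstream doc\n")

	first, err := writeFileSafe(path, data)
	if err != nil {
		return false, false, err
	}
	second, err := writeFileSafe(path, data)
	if err != nil {
		return false, false, err
	}
	return first, second, nil
}

func main() {
	first, second, err := freshRunnerDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first write reported changed:", first)
	fmt.Println("second write reported changed:", second)
}
```

Only the second call has a baseline on disk to compare against. A fresh runner only ever makes the first call for each file, which is why all 32 files are reported every week.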

Design Constraint

The architecture is intentionally ephemeral: no cached content, no committed doc files, everything fetched fresh on the runner. Any fix must stay within this model.

Possible Approaches (needs research)

  1. Workflow reorder (no code changes): Run --write with the old lock first to establish a baseline on disk, then --update-lock, then --write again with the new lock so writeFileSafe can compare. Trade-off: 3 invocations instead of 2 (more API calls, but negligible at weekly cadence).

  2. Dual-lock comparison in code: Accept both old and new lock files in a single --write pass; fetch content at both SHAs and compare in memory. Trade-off: more complex code, but single invocation.

  3. Content hash in lockfile: Store a content hash per file in .content-lock.json alongside the branch SHA; compare hashes before fetching. Trade-off: lockfile grows, but avoids re-fetching unchanged content entirely.
