fix(sync): preserve explicit remote path for mesh #88

Merged
acmore merged 11 commits into main from fix/sync-path-mesh-subdir
Apr 21, 2026

acmore commented Apr 20, 2026

Summary

  • preserve the configured sync remote path when mesh reconfigures hub and receiver Syncthing folders
  • add unit coverage for mesh folder path selection and update the mesh PyTorchJob e2e to exercise /workspace/a
  • assert the mesh e2e does not create an unexpected local a/ directory when syncing to a remote subdirectory

Test Plan

  • go test ./internal/cli
  • .venv/bin/pre-commit run --all-files --hook-stage manual okdev-gofmt
  • bash -n scripts/e2e_kind_pytorchjob.sh

acmore and others added 11 commits April 20, 2026 17:03
The initial sync could report "complete" prematurely when the local
syncthing folder hadn't finished its first scan. An empty index
trivially satisfies "remote has everything local has" (0/0 = 100%),
causing okdev up to proceed before files were actually transferred.

Add waitForLocalFolderScan() that polls /rest/db/status until the
folder state transitions out of "scanning". Called in both
runTwoPhaseInitialSync and waitForInitialSync to ensure the index
is populated before trusting completion values.
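The polling loop described above can be sketched as follows. This is a minimal illustration and not okdev's actual `waitForLocalFolderScan`: the `fetchStatus` seam, the `scanFinished` helper, and the polling interval are all hypothetical names introduced here so the loop can run without a live Syncthing instance. A real fetcher would issue `GET /rest/db/status?folder=<id>` against the local Syncthing API and return the response body.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"
)

// fetchStatus returns the raw JSON body of GET /rest/db/status?folder=<id>.
// Hypothetical seam: it lets the loop run against canned responses.
type fetchStatus func(ctx context.Context) ([]byte, error)

// scanFinished reports whether the folder has left the "scanning" state.
func scanFinished(body []byte) (bool, error) {
	var st struct {
		State string `json:"state"`
	}
	if err := json.Unmarshal(body, &st); err != nil {
		return false, err
	}
	return st.State != "scanning", nil
}

// waitForScan polls fetch until the scan has finished or ctx is cancelled.
func waitForScan(ctx context.Context, fetch fetchStatus, interval time.Duration) error {
	for {
		body, err := fetch(ctx)
		if err != nil {
			return err
		}
		done, err := scanFinished(body)
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
}

func main() {
	// Canned responses stand in for a real Syncthing instance.
	replies := []string{`{"state":"scanning"}`, `{"state":"scanning"}`, `{"state":"idle"}`}
	calls := 0
	fetch := func(ctx context.Context) ([]byte, error) {
		body := replies[calls]
		calls++
		return []byte(body), nil
	}
	err := waitForScan(context.Background(), fetch, 10*time.Millisecond)
	fmt.Println(err, calls)
}
```

Passing a context keeps the wait bounded: a caller can attach a timeout so a folder that never leaves "scanning" does not block `okdev up` indefinitely.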

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous scan-wait fix was insufficient: after the local scan completes,
the remote might still report needBytes=0 because it has not yet received
the local index and therefore does not know about any files.

Now both runTwoPhaseInitialSync and waitForInitialSync verify that when
the local has files (globalFiles > 0), the remote also reports
globalFiles > 0 before accepting needBytes=0 as "sync complete". This
prevents the race where the index exchange has not happened yet.
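The completion rule can be written as a small predicate. The sketch below is illustrative only: `dbStatus` and `initialSyncComplete` are hypothetical names, and the struct carries just the two fields the commit message mentions.

```go
package main

import "fmt"

// dbStatus holds the two status fields named in the commit message.
// The type itself is an illustrative assumption, not okdev's own.
type dbStatus struct {
	GlobalFiles int   `json:"globalFiles"`
	NeedBytes   int64 `json:"needBytes"`
}

// initialSyncComplete accepts needBytes == 0 as "complete" only when the
// remote has demonstrably received the local index.
func initialSyncComplete(local, remote dbStatus) bool {
	if remote.NeedBytes != 0 {
		return false // remote still has data to fetch
	}
	if local.GlobalFiles > 0 && remote.GlobalFiles == 0 {
		return false // remote has not seen the local index yet
	}
	return true
}

func main() {
	local := dbStatus{GlobalFiles: 3}
	// Remote reports needBytes=0 only because it knows of no files.
	fmt.Println(initialSyncComplete(local, dbStatus{}))
	// Remote knows about the files and needs nothing more.
	fmt.Println(initialSyncComplete(local, dbStatus{GlobalFiles: 3}))
}
```

The first call prints `false` and the second prints `true`. An empty local folder still counts as complete, since there is nothing to transfer.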

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When mesh-hello.txt is missing on master or worker, dump the state of
both containers' /workspace directories, the sidecar's syncthing
config folders, sidecar container logs, local sync log, and
okdev status --details. This gives us concrete evidence to diagnose
why files aren't landing where expected, rather than a single
one-line error message.

Made-with: Cursor

… mount

Generated workload manifests (e.g. `okdev init --workload pytorchjob`) keep
Go template actions like `{{ .WorkloadName }}` unrendered until apply time.
`workspaceMountPathFromManifest` parsed the raw file as YAML, which failed
on those placeholders and silently fell back to `/workspace`.

For PyTorchJob sessions where the sync remote is a subdirectory such as
`/workspace/a`, this caused the `pytorch` container to mount the shared
`workspace` emptyDir at `/workspace/a` while the injected `okdev-sidecar`
kept the default `/workspace`. Syncthing therefore wrote files to
`/workspace/a` (sidecar view) which surfaced on the target as
`/workspace/a/a`, so the E2E mesh check never found `mesh-hello.txt`.
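The doubled path follows from two containers mounting the same volume at different points. The worked example below reproduces the arithmetic with hypothetical helpers (`volumeRelative`, `seenFrom`). It assumes clean POSIX paths where the written path lies under the writer's mount.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// volumeRelative returns p relative to the mount point of a shared volume.
// Assumes p is located under mount.
func volumeRelative(mount, p string) string {
	rel := strings.TrimPrefix(path.Clean(p), path.Clean(mount))
	return strings.TrimPrefix(rel, "/")
}

// seenFrom maps a path written in one container to where the same file
// appears in another container that mounts the same volume elsewhere.
func seenFrom(writerMount, readerMount, written string) string {
	return path.Join(readerMount, volumeRelative(writerMount, written))
}

func main() {
	// Sidecar mounts the volume at /workspace; the target mounts it at /workspace/a.
	fmt.Println(seenFrom("/workspace", "/workspace/a", "/workspace/a/mesh-hello.txt"))
	// With matching mounts the path is unchanged.
	fmt.Println(seenFrom("/workspace/a", "/workspace/a", "/workspace/a/mesh-hello.txt"))
}
```

The first line printed is `/workspace/a/a/mesh-hello.txt`, the location the e2e check never looked at. The second is `/workspace/a/mesh-hello.txt`.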

Strip Go template placeholders with a safe stub before YAML parsing so
the workspace mount path is derived correctly regardless of the
template variables.
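One way to do the stripping step is a regular-expression pass before the YAML parser sees the file. This is a sketch under assumptions, not the code from this PR: the pattern, the `okdev-stub` value, and the `stripTemplateActions` name are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// templateAction matches a Go template action such as {{ .WorkloadName }}.
// (?s) lets an action span multiple lines; .*? stops at the first "}}".
var templateAction = regexp.MustCompile(`(?s)\{\{.*?\}\}`)

// stub is a hypothetical placeholder that is a valid plain YAML scalar.
const stub = "okdev-stub"

// stripTemplateActions replaces every template action with the stub so the
// result can be handed to a YAML parser.
func stripTemplateActions(manifest string) string {
	return templateAction.ReplaceAllLiteralString(manifest, stub)
}

func main() {
	raw := "metadata:\n  name: {{ .WorkloadName }}\nmountPath: /workspace/a\n"
	fmt.Print(stripTemplateActions(raw))
}
```

After this pass the document parses as ordinary YAML and the mount path can be read as usual. Only the templated values are lost, which is acceptable when the caller needs just the static `mountPath`.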

Made-with: Cursor
@acmore acmore merged commit 17f2c55 into main Apr 21, 2026
2 checks passed
@acmore acmore deleted the fix/sync-path-mesh-subdir branch April 21, 2026 01:15