feat(native): add rebuild-on-change for NativeModule #1599

Open: jeff-hykin wants to merge 23 commits into dev from jeff/feat/native_rebuild

Conversation

@jeff-hykin (Member) commented Mar 17, 2026

Problem

Editing source code doesn't cause native modules to rebuild. This is very easy to forget and unfriendly to AI edits.

Solution

Some generic utils:

  • dimos/utils/change_detect.py: Content-hash-based file change detection using xxhash.
  • NativeModuleConfig.rebuild_on_change: Optional list[str|Path|Glob]
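A minimal sketch of why content hashing is used instead of modification times. This is not the code in dimos/utils/change_detect.py: `content_digest` is a hypothetical helper, and `hashlib.blake2b` stands in for xxhash so the snippet runs with the standard library alone.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def content_digest(paths: list[Path]) -> str:
    """Return one hex digest covering the bytes of every file in `paths`."""
    hasher = hashlib.blake2b(digest_size=16)  # stand-in for xxhash (assumption)
    for path in sorted(paths):
        # Mix in the path so that renaming a file changes the digest too.
        hasher.update(str(path).encode())
        hasher.update(path.read_bytes())
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "main.cpp"
    src.write_text("int main() { return 0; }\n")
    before = content_digest([src])

    src.touch()  # mtime changes, content does not
    unchanged = content_digest([src])

    src.write_text("int main() { return 1; }\n")
    after = content_digest([src])
```

Here `before == unchanged` holds because only the timestamp moved, while `after` differs because the bytes did. An mtime-based check would have reported a change in both cases.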

Breaking Changes

None

How to Test

# Change detection tests
pytest dimos/utils/test_change_detect.py -v

# Native module rebuild tests
pytest dimos/core/test_native_rebuild.py -v

# Native module crash/thread leak test
pytest dimos/core/test_native_module.py::test_process_crash_triggers_stop -v

# LCM isolation tests
pytest dimos/protocol/pubsub/test_pattern_sub.py -v
pytest dimos/protocol/pubsub/impl/test_lcmpubsub.py -v

# Full fast suite
pytest -m 'not (tool or slow or mujoco)' dimos/

Contributor License Agreement

  • I have read and approved the CLA.

jeff-hykin and others added 4 commits March 15, 2026 14:53
Add a generic file change detection utility (dimos/utils/change_detect.py)
that tracks content hashes via xxhash and integrate it into NativeModule so
it can automatically rebuild when watched source files change.

- change_detect.did_change() hashes file content, stores per-cache-name
  hash files in the venv, and returns True when files differ
- NativeModuleConfig gains rebuild_on_change: list[str] | None
- NativeModule._maybe_build() deletes stale executables when sources change
- 11 tests for change_detect, 3 integration tests for native rebuild
…avoid unlinking Nix store executables

- Add `cwd` parameter to `did_change()` and `_resolve_paths()` so relative
  glob patterns in `rebuild_on_change` are resolved against the module's
  working directory instead of the process cwd.
- Replace `exe.unlink()` with a `needs_rebuild` flag so executables that
  live in read-only locations (e.g. Nix store) are not deleted; instead
  the build command is re-run which handles the output path itself.
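The `cwd` behaviour described in this commit could look roughly like the sketch below. `resolve_paths` here is an illustrative stand-in, not the real `_resolve_paths`; its signature and the choice to raise `ValueError` for a relative pattern without `cwd` are assumptions based on the commit message and the review summary.

```python
from __future__ import annotations

import glob
import tempfile
from pathlib import Path


def resolve_paths(patterns: list[str], cwd: Path | None = None) -> list[Path]:
    """Expand glob patterns; relative patterns are anchored at `cwd`."""
    resolved: list[Path] = []
    for pattern in patterns:
        if not Path(pattern).is_absolute():
            if cwd is None:
                # Refuse to silently fall back to the process cwd.
                raise ValueError(f"relative pattern {pattern!r} needs a cwd")
            pattern = str(cwd / pattern)
        resolved.extend(Path(p) for p in glob.glob(pattern, recursive=True))
    return sorted(set(resolved))


with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp)
    (module_dir / "src").mkdir()
    for name in ("a.cpp", "b.cpp", "notes.txt"):
        (module_dir / "src" / name).write_text("// placeholder\n")
    # The pattern is relative, so it is resolved against module_dir.
    found_names = [p.name for p in resolve_paths(["src/*.cpp"], cwd=module_dir)]
```

Anchoring at the module's working directory means the same `rebuild_on_change` list gives the same result regardless of where the Python process was launched.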

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jeff-hykin jeff-hykin marked this pull request as draft March 17, 2026 22:56
@greptile-apps bot (Contributor) commented Mar 17, 2026

Greptile Summary

This PR adds a content-hash-based file change detection utility (dimos/utils/change_detect.py) and integrates it into NativeModule so native executables automatically rebuild when watched source files change. It also fixes a thread-leak ordering bug in NativeModule.stop(), improves LCM test isolation with dedicated multicast addresses, and resolves the missing xxhash dependency.

Key changes and findings:

  • xxhash dependency added to pyproject.toml — resolves the previously-flagged missing dependency.
  • Thread-leak fix: super().stop() (which joins the asyncio loop thread) is now correctly called before _process = None, preventing a race condition in CI.
  • LCM test isolation: test_lcmpubsub.py and test_pattern_sub.py now use dedicated multicast groups (239.255.76.98:7698 and 239.255.76.99:7699) to prevent cross-test contamination.
  • mesh_utils.py regression: did_change is called without cwd using str(urdf_path). If urdf_path is relative (the function signature accepts Path | str with no absolute constraint), _resolve_paths raises ValueError. Previously the mtime-based approach handled relative paths transparently via the process CWD.
  • Two issues flagged in prior review rounds remain open: the pre-build did_change call writes the new hash before confirming build success (stale-build-on-failure scenario), and the post-build seeding call does not forward cwd for relative path sets.

Confidence Score: 3/5

  • Core feature is functional on the happy path but has known reliability gaps: a build failure leaves the cache in a state that permanently suppresses future rebuild attempts, and a new regression in mesh_utils.py breaks relative URDF paths.
  • Several improvements from prior rounds are addressed (xxhash dependency, thread-leak, LCM isolation), but the two central reliability bugs in the rebuild-on-change feature (cache written before build success, missing cwd in post-build seed call) remain unaddressed from earlier review rounds. The new relative-path regression in mesh_utils.py adds one more concrete fix required before merge.
  • dimos/core/native_module.py (pre-build hash write / missing cwd in post-build seed) and dimos/manipulation/planning/utils/mesh_utils.py (relative urdf_path raises ValueError)

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/utils/change_detect.py | New content-hash change detection utility using xxhash. Core logic is sound; thread + file locking implemented correctly. The known design trade-off (cache always written on call, not just on success) is the root cause of the previously-flagged build-failure regression, but is otherwise well-documented. |
| dimos/core/native_module.py | Integrates rebuild-on-change into _maybe_build. Thread-leak fix (super().stop() before _process=None) is correct. Two previously-flagged issues remain open: the pre-build did_change call writes the hash before confirming build success, and the post-build seeding call (line 314) does not pass cwd, both already tracked in earlier review rounds. |
| dimos/manipulation/planning/utils/mesh_utils.py | Switches URDF cache invalidation from mtime-in-cache-key to did_change. Introduces a behavioral regression: relative urdf_path values now raise ValueError because did_change is called without cwd, whereas the previous mtime approach worked with relative paths. |
| dimos/core/test_native_rebuild.py | New test file covering happy-path rebuild-on-change scenarios. Uses autouse tmp_path cache redirect for proper isolation. All tests use try/finally to call mod.stop(). |
| dimos/utils/test_change_detect.py | Good unit coverage for change_detect.py. One test (test_nonexistent_path_warns) captures caplog but never asserts its contents, previously flagged in an earlier round. |
| dimos/protocol/pubsub/impl/test_lcmpubsub.py | LCM test isolation fix using a dedicated multicast address: a clean and correct improvement. |
| dimos/protocol/pubsub/test_pattern_sub.py | Same LCM isolation fix as test_lcmpubsub.py, using a separate isolated multicast group. |
| pyproject.toml | Adds xxhash>=3.0.0 to project dependencies, resolving the missing dependency flagged in a prior review round. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[NativeModule._maybe_build] --> B{rebuild_on_change\nset AND exe exists?}
    B -- No --> C{exe exists?}
    B -- Yes --> D["did_change(cache_name, paths, cwd)\n⚠ writes hash to cache immediately"]
    D -- False\nno change --> C
    D -- True\nfiles changed --> E[needs_rebuild = True\nlog 'Source files changed']
    C -- Yes and not needs_rebuild --> F[return early\nno build]
    C -- No OR needs_rebuild --> G{build_command set?}
    G -- No --> H[raise FileNotFoundError]
    G -- Yes --> I[subprocess.Popen build_command]
    I --> J{returncode == 0?}
    J -- No --> K[raise RuntimeError\n⚠ cache already updated\nfuture rebuilds blocked]
    J -- Yes --> L{exe exists\nafter build?}
    L -- No --> M[raise FileNotFoundError]
    L -- Yes --> N["did_change(cache_name, paths, cwd)\nseed cache post-build\n⚠ cwd arg present here"]
    N --> O[build complete]

Reviews (2): Last reviewed commit: "fix(test): isolate LCM multicast in flak..."

@jeff-hykin jeff-hykin force-pushed the jeff/feat/native_rebuild branch from bfab461 to e01688c Compare March 19, 2026 21:09
@jeff-hykin jeff-hykin force-pushed the jeff/feat/native_rebuild branch 2 times, most recently from fa3defd to cb5041b Compare March 20, 2026 00:40
fcntl.flock is per-file-description, not per-thread. Two threads in the
same process can both hold LOCK_EX simultaneously. Add a per-cache-name
threading.Lock to protect intra-process concurrent access.

Revert: git revert HEAD
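One way to layer a per-cache-name `threading.Lock` on top of `fcntl.flock`, as this commit describes, is sketched below. The names `cache_lock` and `_thread_lock_for` and the lock-file layout are hypothetical, and the snippet assumes a Unix platform where `fcntl` is available.

```python
from __future__ import annotations

import fcntl
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path

_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()  # protects the _locks registry itself


def _thread_lock_for(cache_name: str) -> threading.Lock:
    """Return the single in-process lock associated with `cache_name`."""
    with _locks_guard:
        return _locks.setdefault(cache_name, threading.Lock())


@contextmanager
def cache_lock(cache_name: str, lock_dir: Path):
    """Hold the thread lock (intra-process) and the flock (inter-process)."""
    with _thread_lock_for(cache_name):
        with open(lock_dir / f"{cache_name}.lock", "w") as fh:
            fcntl.flock(fh, fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(fh, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    with cache_lock("my_module", Path(tmp)):
        pass  # read, compare, and write the hash file here
```

The thread lock is taken first, so threads in one process queue up before any of them touches the file lock, and the file lock then serialises against other processes.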
Two NativeModule subclasses in the same file would share a cache key,
corrupting each other's rebuild state. Add qualname to disambiguate.

Revert: git revert HEAD
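The disambiguation might be as simple as the following sketch. `cache_name_for` and the two example classes are hypothetical, and the real key format in native_module.py may differ.

```python
def cache_name_for(cls: type) -> str:
    """Build a cache key that is unique per class, not just per file."""
    # __module__ alone is shared by every class in the same file;
    # appending __qualname__ separates them (including nested classes).
    return f"{cls.__module__}.{cls.__qualname__}"


class LidarModule:  # hypothetical NativeModule subclass
    pass


class CameraModule:  # hypothetical sibling defined in the same file
    pass


lidar_key = cache_name_for(LidarModule)
camera_key = cache_name_for(CameraModule)
```

With a module-only key both classes would read and write the same hash file. With the qualname appended, each class tracks its own rebuild state.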
Move super().stop() before self._process = None so the asyncio loop
thread is joined before tests see the exit signal. Wrap crash test
in try/finally with mod.stop().

Use dedicated multicast addresses to prevent cross-test contamination
in test_pattern_sub.py and test_lcmpubsub.py.
@jeff-hykin jeff-hykin changed the title feat(native): add rebuild-on-change for NativeModule [Jeff's Claw Bot] feat(native): add rebuild-on-change for NativeModule Mar 23, 2026
@jeff-hykin jeff-hykin marked this pull request as ready for review March 23, 2026 22:38
@jeff-hykin jeff-hykin changed the title [Jeff's Claw Bot] feat(native): add rebuild-on-change for NativeModule feat(native): add rebuild-on-change for NativeModule Mar 24, 2026
jeff-hykin and others added 5 commits March 30, 2026 13:52
…imos into jeff/feat/native_rebuild

# Conflicts:
#	dimos/core/native_module.py
#	dimos/utils/change_detect.py
- P0: did_change() no longer writes cache before build completes. Added
  update=False param to check without updating, and update_cache() to
  explicitly write after successful build.
- P0: native_module uses update=False for pre-build check, update_cache()
  only after confirmed-good build. Failed builds won't poison the cache.
- P1: xxhash already in pyproject.toml deps (resolved by dev merge).
- P2: did_change update param gives callers control over when to persist.
- P2: test_nonexistent_path_warns now asserts result is False (not just bool).
- Added tests for update=False and update_cache workflows.
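The check-then-commit flow from the P0 items can be sketched as follows. This is a self-contained illustration rather than the dimos code: `ChangeCache` and `maybe_build` are hypothetical, the method signatures only mirror the names `did_change(update=False)` and `update_cache()` from the commit message, and `hashlib` stands in for xxhash.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Callable


class ChangeCache:
    """Stores one digest in `cache_file` (hypothetical stand-in)."""

    def __init__(self, cache_file: Path) -> None:
        self.cache_file = cache_file

    def _digest(self, paths: list[Path]) -> str:
        hasher = hashlib.blake2b(digest_size=16)  # stand-in for xxhash
        for path in sorted(paths):
            hasher.update(path.read_bytes())
        return hasher.hexdigest()

    def did_change(self, paths: list[Path], update: bool = True) -> bool:
        current = self._digest(paths)
        previous = self.cache_file.read_text() if self.cache_file.exists() else None
        if update:
            self.cache_file.write_text(current)
        return current != previous

    def update_cache(self, paths: list[Path]) -> None:
        self.cache_file.write_text(self._digest(paths))


def maybe_build(cache: ChangeCache, sources: list[Path], build: Callable[[], None]) -> bool:
    """Run `build` only when sources changed; return True if it ran."""
    if not cache.did_change(sources, update=False):  # check without persisting
        return False
    build()  # raises on failure, so the cache write below is skipped
    cache.update_cache(sources)  # persist only after a confirmed-good build
    return True
```

Because the hash is written only after `build()` returns, a failed build leaves the old hash in place and the next call to `maybe_build` tries again.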
@jeff-hykin jeff-hykin mentioned this pull request Mar 31, 2026
6 tasks
@jeff-hykin jeff-hykin enabled auto-merge (squash) March 31, 2026 00:51
# Uses update_cache (not did_change) so we only write the hash after a
# confirmed-good build — a failed build won't poison the cache.
if self.config.rebuild_on_change:
    from dimos.utils.change_detect import update_cache
Contributor

Move to the top. You already have this file imported.

Member Author

Done!
