Resync Rust port to graphify-py v0.8.30 (724f1e3)#17

Merged
rblaine95 merged 11 commits into master from resync/graphify-py/724f1e3
Jun 3, 2026

Conversation

@rblaine95
Member

@rblaine95 rblaine95 commented Jun 3, 2026

Advances the graphify-py submodule to v0.8.30 (724f1e3) and ports the applicable
v0.8.27..v0.8.30 changes to the Rust workspace. Workspace version bumped to 0.8.30.

Ports

  • cluster — total-order community-ID tiebreak ((-len, sorted members)) so an identical
    grouping always gets identical integer IDs run-to-run (#1090 follow-up).
  • detect — F2 office/PDF resource caps: 50 MiB on-disk screen + .docx/.xlsx zip-bomb
    guard (512 MiB decompressed, 200:1 ratio, chunked streaming ceiling).
  • extract — full Dart-extractor rewrite (inheritance/mixins/interfaces/generics,
    Bloc/Riverpod/Navigator patterns, annotations, typedefs, records/destructuring, part of
    redirection, generic type-lookups); F5 — cpp is passed an absolute path so an
    attacker-named Fortran file can't be parsed as an option.
  • llm — F1: validate custom-provider base_url (reject non-http(s), warn on plaintext
    egress) and gate a project-local providers.json behind GRAPHIFY_ALLOW_LOCAL_PROVIDERS;
    F3: hard-block an OLLAMA_BASE_URL that is, or resolves to, a link-local/cloud-metadata
    address while still only warning for a general LAN host.
  • hooks / CLI — Read/Glob PreToolUse nudge for Claude Code (#1114); Kilo Code platform
    (native skill + /graphify command + .kilo plugin + JSONC-safe kilo.json registration);
    Antigravity project-scoped installs now write the full rules/workflow layer; claude uninstall
    removes the orphaned user-scope skill tree (#1121).
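
A minimal sketch of the cluster tiebreak described above, assuming communities arrive as lists of node-name strings. The function name `assign_community_ids` and the `String` member type are illustrative and not taken from the crate:

```rust
/// Assign integer community IDs deterministically.
/// Hypothetical helper for illustration; not the crate's actual API.
fn assign_community_ids(mut communities: Vec<Vec<String>>) -> Vec<(usize, Vec<String>)> {
    // Canonicalise each community so the tiebreak key does not depend on
    // the order in which members were discovered.
    for members in &mut communities {
        members.sort();
    }
    // Total order: larger communities first, then lexicographic order of the
    // sorted member lists. This is the (-len, sorted members) key.
    communities.sort_by(|a, b| b.len().cmp(&a.len()).then_with(|| a.cmp(b)));
    // The position in the sorted list becomes the community ID.
    communities.into_iter().enumerate().collect()
}
```

Because disjoint communities never share a sorted member list, two equal-sized communities always compare unequal, so the same grouping gets the same IDs regardless of input order.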

Divergences from graphify-py (intentional)

  • The skillgen progressive-disclosure split and per-platform references/ sidecars stay
    collapsed to one canonical skill.md (the Rust binary runs the whole pipeline as a single
    graphify extract, with no host-driven subagent fan-out).
  • _backend_pkg_hint has no Rust equivalent — all backends compile in, so there is no
    optional-package import that can fail.
  • The Dart variable-type blacklist applies to the cleaned type, so a generic primitive like
    Map<String, int> is correctly skipped (graphify-py checks the raw string and leaks a
    spurious map reference — a bug fixed here, not replicated).
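
The blacklist divergence can be sketched roughly as follows. The blacklist contents, the cleaning rules, and both function names are assumptions made for this example; the real extractor's list and normalisation are not shown in this PR:

```rust
/// Hypothetical blacklist of built-in Dart types (assumed for this sketch).
const PRIMITIVE_TYPES: &[&str] = &[
    "int", "double", "bool", "String", "List", "Map", "Set", "dynamic", "void",
];

/// Reduce a declared type to its base name: drop generic arguments and a
/// trailing nullable marker.
fn clean_type(raw: &str) -> &str {
    let base = raw.split('<').next().unwrap_or(raw);
    base.trim().trim_end_matches('?').trim()
}

/// Return the type name to reference, or `None` when the *cleaned* type is
/// blacklisted. Checking the raw string instead would let
/// `Map<String, int>` slip past a list that only contains `Map`.
fn variable_type_reference(raw: &str) -> Option<&str> {
    let cleaned = clean_type(raw);
    if cleaned.is_empty() || PRIMITIVE_TYPES.contains(&cleaned) {
        None
    } else {
        Some(cleaned)
    }
}
```

Under these assumptions, `variable_type_reference("Map<String, int>")` yields `None`, while a user-defined type such as `UserRepository?` still produces a reference.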

Verification

  • cargo nextest run --workspace: 1715 pass (2 net-prefixed skipped).
  • cargo clippy --all-targets --all-features --workspace: clean.
  • hk check: clean.
  • CodeRabbit: 7 rounds (--type committed), every finding fixed or documented as a dispute
    in-code. The final zero-findings re-verify was blocked by an org usage-credit rate-limit; the
    last actionable finding is fixed in 29d23ce.

Summary by CodeRabbit

  • New Features

    • Kilo Code platform integration now available for installation/uninstallation.
    • Added Read/Glob hook for Claude to suggest graph-based queries when reading source files.
    • Enhanced Dart language extractor with improved pattern recognition and navigation detection.
  • Security

    • Office/PDF document parsing now enforces resource caps to prevent zip-bomb attacks; files exceeding limits are silently skipped.
    • Ollama URL validation now blocks unsafe targets (metadata/link-local addresses) to prevent SSRF attacks.
  • Improvements

    • Deterministic clustering for reproducible community ordering across runs.
    • Added environment configuration options for custom Ollama providers and local provider security.
  • Documentation

    • Updated README and USAGE.md with new security features and configuration guidance.
  • Version

    • Bumped to v0.8.30

rblaine95 added 9 commits June 3, 2026 10:31
Port the applicable v0.8.27..v0.8.30 changes and bump the workspace
version to 0.8.30 in lockstep with the submodule pointer.

Ports:

- cluster: total-order community-ID tiebreak so an identical grouping
  always gets identical integer IDs run-to-run (#1090 follow-up)
- detect: F2 office/PDF resource caps - 50 MiB on-disk screen plus a
  zip-bomb guard (512 MiB decompressed, 200:1 ratio, chunked streaming
  ceiling) gating PDF/DOCX/XLSX text extraction
- extract: rewrite the Dart extractor (inheritance, mixins, interfaces,
  generics, annotations, Riverpod/Bloc codegen, extensions, typedefs,
  records/destructuring, navigation, `part of` redirection, generic
  type-lookups); F5 - pass `cpp` an absolute path so an attacker-named
  Fortran file cannot be parsed as an option
- llm: F1 - validate custom-provider `base_url` (reject non-http(s),
  warn on plaintext egress) and gate a project-local `providers.json`
  behind `GRAPHIFY_ALLOW_LOCAL_PROVIDERS`; F3 - hard-block an
  `OLLAMA_BASE_URL` that is, or resolves to, a link-local/cloud-metadata
  address while still only warning for a general LAN host
- hooks: Read/Glob `PreToolUse` nudge for Claude Code (#1114); Kilo Code
  platform - native skill, `/graphify` command, `.kilo` plugin and
  JSONC-safe `kilo.json` registration; Antigravity project-scoped
  installs now write the full rules/workflow layer; `claude uninstall`
  removes the orphaned user-scope skill tree (#1121)
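
A rough sketch of the two-stage zip-bomb guard described in the detect item, using the 512 MiB and 200:1 figures from this message. The names `MemberSizes`, `declared_sizes_within_caps` and `stream_within_ceiling` are hypothetical, and applying the ratio per member (rather than to the archive as a whole) is an assumption:

```rust
use std::io::{self, ErrorKind, Read};

const MAX_DECOMPRESSED_BYTES: u64 = 512 * 1024 * 1024; // 512 MiB
const MAX_COMPRESSION_RATIO: u64 = 200; // 200:1

/// Sizes one archive member declares in the zip directory (hypothetical type).
struct MemberSizes {
    compressed: u64,
    uncompressed: u64,
}

/// Pass 1: cheap pre-filter on declared sizes, before any decompression.
fn declared_sizes_within_caps(members: &[MemberSizes]) -> bool {
    let mut total: u64 = 0;
    for m in members {
        total = total.saturating_add(m.uncompressed);
        if total > MAX_DECOMPRESSED_BYTES {
            return false;
        }
        // Treat a declared compressed size of 0 as 1 to avoid a zero bound.
        let bound = m.compressed.max(1).saturating_mul(MAX_COMPRESSION_RATIO);
        if m.uncompressed > bound {
            return false;
        }
    }
    true
}

/// Pass 2: declared sizes can lie, so stream the data in fixed-size chunks
/// and stop as soon as the running total exceeds the ceiling.
fn stream_within_ceiling<R: Read>(mut reader: R, ceiling: u64) -> io::Result<bool> {
    let mut buf = [0u8; 64 * 1024];
    let mut seen: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            return Ok(true); // end of stream reached under the ceiling
        }
        seen = seen.saturating_add(n as u64);
        if seen > ceiling {
            return Ok(false);
        }
    }
}
```

The chunk size only bounds peak memory; the pass/fail result depends solely on the total byte count.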

Divergences from graphify-py (documented in code + notes): the
skillgen progressive-disclosure split and per-platform `references/`
sidecars stay collapsed to one canonical skill; the `_backend_pkg_hint`
install message has no Rust equivalent (all backends compile in).

Glory to the Omnissiah
Fixes:

- kilo: `strip_json_comments` built the output by pushing each input
  byte as a `char`, corrupting multibyte UTF-8 in a `.kilo/kilo.jsonc`;
  accumulate raw bytes and decode once at the end (major)
- llm: parse the `127.0.0.0/8` loopback case as an `Ipv4Addr` instead of
  a `starts_with("127.")` prefix, so a hostname like `127.evil.com` is
  correctly treated as non-loopback (in both `provider_base_url_ok` and
  `validate_ollama_base_url`); strip brackets from an IPv6 host so a
  bracketed link-local literal like `[fe80::1]` is caught
- llm: the `detect_backend` ollama gate now validates with `warn=false`
  so a non-loopback LAN host is not warned about twice
- extract: `resolve_cpp_path` falls back to a `./`-prefixed path when the
  cwd is unavailable, so the result never looks like a `cpp` option
- hooks: `#[must_use]` on `claude_user_skill_dst` and `read_settings_hook`;
  drop redundant `serde_json`/`url` dev-deps (already normal deps)
- tests: IPv6 (`[fe80::1]`/`[::1]`/fe80 resolver) ollama cases, a
  `127.evil.com` provider case, a `#[cfg(unix)]` guard on the leading-`/`
  cpp assertion, and a `HomeGuard` RAII in the kilo tests
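
The `strip_json_comments` fix (accumulate raw bytes, decode once) can be illustrated with the sketch below. It is a simplified stand-in written for this note, not the code in `kilo.rs`; in particular the string-escape handling and the treatment of unterminated block comments are assumptions:

```rust
/// Remove `//` and `/* ... */` comments from JSONC while leaving string
/// contents (including multibyte UTF-8) untouched. Illustrative only.
fn strip_json_comments(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    let mut in_string = false;
    while i < bytes.len() {
        let b = bytes[i];
        if in_string {
            out.push(b);
            if b == b'\\' && i + 1 < bytes.len() {
                // Copy the escaped byte verbatim so `\"` does not end the string.
                out.push(bytes[i + 1]);
                i += 2;
                continue;
            }
            if b == b'"' {
                in_string = false;
            }
            i += 1;
        } else if b == b'"' {
            in_string = true;
            out.push(b);
            i += 1;
        } else if b == b'/' && bytes.get(i + 1) == Some(&b'/') {
            // Line comment: skip up to, but not past, the newline.
            while i < bytes.len() && bytes[i] != b'\n' {
                i += 1;
            }
        } else if b == b'/' && bytes.get(i + 1) == Some(&b'*') {
            // Block comment: skip past the closing `*/` (or to end of input).
            i += 2;
            while i < bytes.len() && !(bytes[i] == b'*' && bytes.get(i + 1) == Some(&b'/')) {
                i += 1;
            }
            i = (i + 2).min(bytes.len());
        } else {
            out.push(b);
            i += 1;
        }
    }
    // Decode once at the end. Every cut happens at an ASCII delimiter, so the
    // kept bytes remain valid UTF-8; the lossy decode is only a safety net.
    String::from_utf8_lossy(&out).into_owned()
}
```

Pushing each byte as a `char` would turn a two-byte sequence such as `é` into two separate Latin-1 code points, which is the corruption the fix avoids.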

Disputes (documented in code comments):

- `cmd_kilo` intentionally ignores the shared `--project` flag: graphify-py's
  `kilo` command has no project-scope variant
- the `# ...` markers in the Dart test fixture are an intentional
  byte-faithful copy of graphify-py's test_dart.py (they exercise that
  non-`//` lines survive comment-stripping)

By the will of the Machine God
Fixes:

- llm: strip IPv6 brackets from the provider `base_url` host (`[::1]`)
  before the loopback check, matching the same fix already applied to
  `validate_ollama_base_url`
- detect: drop the redundant `zip` dev-dependency (already a normal dep,
  so it is available to the `office_limits` integration test)
- extract: `#[must_use]` on `split_types`; correct the `resolve_cpp_path`
  doc to state the `./`-prefixed fallback when the cwd is unreadable
- hooks: clarify the `KILO_PLUGIN_JS` doc (structurally mirrors the
  OpenCode plugin but the echo text differs; byte-identical to the
  Python `_KILO_PLUGIN_JS`, not to OpenCode)
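
For reference, the `resolve_cpp_path` behaviour described in these commits (absolute path when possible, `./`-prefixed fallback otherwise) might look roughly like this. The body is a sketch written for this note, and the `resolve_with_cwd` helper is a hypothetical seam added so the fallback can be exercised without changing the process cwd:

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve `path` against an optional working directory (hypothetical helper).
fn resolve_with_cwd(cwd: Option<&Path>, path: &Path) -> PathBuf {
    if path.is_absolute() {
        return path.to_path_buf();
    }
    match cwd {
        // Normal case: anchor the relative path at the working directory.
        Some(dir) => dir.join(path),
        // cwd unavailable: prefix with `.` so the argument cannot start
        // with `-` and be mistaken for a `cpp` option.
        None => Path::new(".").join(path),
    }
}

/// Sketch of the public helper: never returns a path that looks like an option.
fn resolve_cpp_path(path: &Path) -> PathBuf {
    resolve_with_cwd(env::current_dir().ok().as_deref(), path)
}
```

With this shape, a file named `-I.f90` is handed to `cpp` either as an absolute path or as `./-I.f90`, never as a bare `-I.f90`.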

Dispute (documented in code): the Read/Glob hook command is kept as one
whole literal rather than decomposed into fragments, so it stays
byte-identical to graphify-py's `_READ_SETTINGS_HOOK["command"]`; its
behaviour is validated by executing it via `sh -c` in tests/read_hook.rs.

Ave Deus Mechanicus
Fixes:

- llm: detect loopback via `IpAddr::is_loopback` (covers `127.0.0.0/8`
  and `::1`, including IPv6-bracket and IPv6-loopback forms) in both
  `provider_base_url_ok` and `validate_ollama_base_url`
- hooks: fold the Codex hook removal into the shared agents-uninstall
  helper so `.codex/hooks.json` is cleaned in every branch (a missing or
  markerless AGENTS.md no longer orphans it), alongside the OpenCode /
  Kilo plugins
- detect: stream office-zip members in 64 KiB chunks (lower peak memory;
  the chunk size never affects the cap result)
- tests: a `call_llm` call-site test proving a metadata `OLLAMA_BASE_URL`
  is refused end-to-end on the no-key ollama path
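
A small sketch of the host classification behind the loopback and link-local checks, assuming the host string has already been taken from a parsed URL. `HostClass` and `classify_host` are names invented for this example, and the DNS-resolution step mentioned elsewhere in this PR is deliberately left out:

```rust
use std::net::IpAddr;

/// Coarse host categories for this sketch (hypothetical type).
#[derive(Debug, PartialEq)]
enum HostClass {
    Loopback,
    LinkLocal,
    Other,
}

fn classify_host(host: &str) -> HostClass {
    // URL hosts carry IPv6 literals in brackets; strip them before parsing.
    let bare = host
        .strip_prefix('[')
        .and_then(|h| h.strip_suffix(']'))
        .unwrap_or(host);
    match bare.parse::<IpAddr>() {
        // Covers 127.0.0.0/8 and ::1.
        Ok(ip) if ip.is_loopback() => HostClass::Loopback,
        // 169.254.0.0/16, which includes the 169.254.169.254 metadata address.
        Ok(IpAddr::V4(v4)) if v4.is_link_local() => HostClass::LinkLocal,
        // fe80::/10, checked by masking the first 16-bit segment.
        Ok(IpAddr::V6(v6)) if (v6.segments()[0] & 0xffc0) == 0xfe80 => HostClass::LinkLocal,
        // Hostnames (for example `127.evil.com`) do not parse as an IP and
        // are therefore never treated as loopback here.
        _ => HostClass::Other,
    }
}
```

Parsing into `IpAddr` rather than matching a `"127."` string prefix is what keeps `127.evil.com` out of the loopback category.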

Disputes (documented in code / here):

- `tiktoken-rs = "0.12"` is published (confirmed via `cargo search`, it
  resolves in `Cargo.lock`, and the full suite builds and passes), so the
  "may not be published" concern does not apply
- the claude skill-removal status message is emitted before the
  best-effort, infallible `remove_skill`, matching `gemini_uninstall`
- ollama URL validation runs only on the no-API-key path (graphify-py
  parity); the realistic ollama path has no key, and that path is now
  covered end-to-end

By the Omnissiah's grace
- kilo: `plugin_uri` now falls back to a lexical absolute path when the
  parent directory can't be canonicalized, so a stale `kilo.json` plugin
  entry can still be deregistered (matches Python's always-compute
  `resolve().as_uri()`); `#[must_use]` on `plugin_uri` and `display_rel`
- extract: document that the Dart `part of` redirect falls back to
  standalone extraction when the referenced parent file is missing
  (mirrors Python's `resolve()` + `exists()` fallback)

Ave Omnissiah
- extract: apply the Dart `variable_type` blacklist to the *cleaned*
  type, so a generic primitive like `Map<String, int>` is correctly
  skipped instead of emitting a spurious `map` reference (fixes a
  blacklist-bypass bug graphify-py shares; intentional divergence noted)
- llm: drop a redundant `host.to_string()` in `validate_ollama_base_url`
  (use `&str`, allocate only for the error path); add IPv6 loopback /
  link-local provider test cases; document why the `local != global`
  comparison is a plain path compare (Python parity, avoids extra I/O)
- tests: generalise the cli_install HOME-isolation comment

Disputes (documented in code / here):

- read_hook.rs keeps the file-top `#![allow(clippy::expect_used)]`:
  AGENTS.md explicitly sanctions this for test files, and the helpers
  orchestrate process spawning + fixture setup where `expect` with a
  message is the crate's established test pattern

The Machine God wills it
Document that the third `provider_base_url_ok` test argument is
`warn = true`, so the test intentionally exercises the warning paths.

Glory to the Omnissiah
Use exact-equality (not substring) for the destructure-key assertion in
the Dart roadmap test, so an unrelated label can never false-match.

By the will of the Machine God
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f4a380cf-03af-4160-b888-ea248bbe56c1

📥 Commits

Reviewing files that changed from the base of the PR and between 29d23ce and 0af1036.

📒 Files selected for processing (12)
  • crates/graphify-detect/Cargo.toml
  • crates/graphify-detect/src/office.rs
  • crates/graphify-detect/tests/office_limits.rs
  • crates/graphify-hooks/src/platform/claude.rs
  • crates/graphify-hooks/src/platform/common/fs.rs
  • crates/graphify-hooks/src/platform/common/install_skill.rs
  • crates/graphify-hooks/src/platform/common/mod.rs
  • crates/graphify-hooks/src/platform/kilo.rs
  • crates/graphify-llm/src/call.rs
  • crates/graphify-llm/src/extract.rs
  • crates/graphify-llm/tests/ollama_http.rs
  • tests/cli_install.rs
🚧 Files skipped from review as they are similar to previous changes (7)
  • crates/graphify-llm/src/extract.rs
  • tests/cli_install.rs
  • crates/graphify-llm/tests/ollama_http.rs
  • crates/graphify-hooks/src/platform/common/mod.rs
  • crates/graphify-detect/tests/office_limits.rs
  • crates/graphify-detect/src/office.rs
  • crates/graphify-hooks/src/platform/kilo.rs

Walkthrough

This PR introduces Kilo Code editor platform support with global skill/command installation and project-local plugin registration, implements SSRF and zip-bomb security gates for Ollama and untrusted documents, refactors Dart extraction to support part-of redirection and richer pattern detection, hardens Fortran C-preprocessor invocations, adds Claude Read|Glob hook nudging and user-scope skill management, improves cluster community determinism, and updates dependencies and documentation accordingly.

Changes

Kilo Code Platform Integration

Layer / File(s) Summary
Kilo module with JSONC config and plugin management
crates/graphify-hooks/src/platform/kilo.rs
Introduces 390-line Kilo platform module with JSONC comment stripping, JSON config loading/merging, plugin URI computation with file:// canonicalization fallback, and install/uninstall plugin and skill functions; wires global skill/command paths under $HOME/.config/kilo and project-local plugin registration.
CLI dispatch and Kilo command handling
src/cli/args.rs, src/cli/dispatch.rs, src/cli/install.rs
Adds Kilo variant to Command enum, routes through dispatch to cmd_kilo handler, special-cases Kilo in cmd_install for skill installation, and reformats imports.
Kilo skill command and hook constants
crates/graphify-hooks/skills/command-kilo.md, crates/graphify-hooks/src/platform/common/markdown.rs, crates/graphify-hooks/src/platform/common/skills.rs, crates/graphify-hooks/src/platform/mod.rs
Defines command-kilo.md instruction file, embeds COMMAND_KILO_MD markdown constant for /graphify slash command, introduces KILO_PLUGIN_JS for tool.execute.before plugin, and re-exports through module hierarchy.
Kilo platform wiring in agents module
crates/graphify-hooks/src/platform/agents.rs
Extends agents_install to recognize and install Kilo plugin, reworks agents_uninstall to ensure platform-specific artifacts are removed via new push_platform_extra_uninstall helper even when AGENTS.md is missing.
Kilo test suite with plugin lifecycle and HOME isolation
crates/graphify-hooks/tests/kilo.rs
Comprehensive tests verifying agents_install/uninstall with plugin registration, .kilo/kilo.jsonc byte-for-byte preservation, idempotency, global skill/command installation with RAII HomeGuard for environment isolation.
CLI integration tests for Kilo
tests/cli_install.rs
Isolates Kilo CLI operations with temporary HOME; tests kilo install writes skill/command output and plugin/AGENTS.md artifacts; tests uninstall removes all artifacts.

Security Hardening and Extractor Improvements

Layer / File(s) Summary
Office/PDF untrusted file screening with size and decompression caps
crates/graphify-detect/src/office.rs, crates/graphify-detect/tests/office_limits.rs, crates/graphify-detect/Cargo.toml
Adds OFFICE_MAX_RAW_BYTES, OFFICE_MAX_DECOMPRESSED_BYTES, OFFICE_MAX_COMPRESSION_RATIO constants; implements zip_within_caps_with two-pass screening (pre-filter declared sizes and compression ratio, stream with decompression ceiling); updates public parsing entry points (extract_pdf_text_with, docx_to_markdown, xlsx_to_markdown, xlsx_extract_structure) to short-circuit and return empty/default when caps exceeded; adds lopdf dev dependency and 276-line test suite validating all cap scenarios (ratio bombs, decompression ceiling, multi-member ZIPs, PDF over-cap).
Ollama SSRF/metadata URL validation with hard-block
crates/graphify-llm/src/ollama.rs, crates/graphify-llm/src/error.rs, crates/graphify-llm/src/backends.rs, crates/graphify-llm/src/call.rs, crates/graphify-llm/src/extract.rs, crates/graphify-llm/tests/ollama_http.rs
Implements link-local/cloud-metadata hostname detection via IP parsing and optional DNS resolution; adds OllamaUrlBlocked error variant; replaces validate_ollama_base_url with Result-based signature accepting warn flag; applies fail-closed validation in auto-detection, call_llm, and extract paths; test suite expands coverage to verify blocking behavior and warn suppression.
Custom provider registry opt-in gating and base_url validation
crates/graphify-llm/src/providers.rs, crates/graphify-llm/src/lib.rs, crates/graphify-llm/tests/provider_registry.rs
Introduces GRAPHIFY_ALLOW_LOCAL_PROVIDERS environment variable to gate project-local provider loading; implements provider_base_url_ok validation checking scheme/loopback/plaintext HTTP warnings; updates load_custom_providers_from to ignore local by default and enforce first-occurrence-wins name collision semantics; skips providers failing validation; test suite validates opt-in gating, scheme rejection, loopback handling.
Cluster community deterministic tie-breaking
crates/graphify-cluster/src/cluster.rs, crates/graphify-cluster/tests/parity.rs
Sorts community member lists, then sorts communities by descending size with lexicographic tiebreak on sorted member lists to ensure deterministic IDs when sizes equal; parity test verifies stable results across runs.
Dart extractor comprehensive refactor
crates/graphify-extract/src/extractors/dart.rs, crates/graphify-extract/tests/dart_parity.rs
Rewrites extract_dart from minimal regex extractor to stateful multi-pass processor: adds comment/string stripping, part-of library redirection with parent file resolution and conditional file-node suppression, 20+ regex patterns (classes/mixins/enums/typedefs/extensions/annotations/variables/methods/imports), Bloc/Riverpod/Navigator pattern detection, destructuring variable extraction with blacklist, generic type lookup, lossless UTF-8 handling; comprehensive test suite validates generic/Riverpod/Bloc/Flutter/regression scenarios.
Fortran C-preprocessor path hardening
crates/graphify-extract/src/extractors/fortran.rs, crates/graphify-extract/src/extractors/mod.rs, crates/graphify-extract/src/lib.rs, crates/graphify-extract/tests/cpp_preprocess.rs
Introduces resolve_cpp_path helper to compute safe absolute paths for cpp invocations with fallback ./ prefixing; updates cpp_preprocess to pass resolved path; re-exports through module hierarchy; test suite validates absolute-path behavior and non-option-looking formatting with attacker-shaped filenames.
Documentation and dependency updates
Cargo.toml, crates/graphify-llm/Cargo.toml, README.md, USAGE.md, graphify-py
Bumps workspace version to 0.8.30 and tiktoken-rs to 0.12; updates README with office/PDF screening features; extends USAGE.md with Kilo install instructions, OLLAMA_BASE_URL and GRAPHIFY_ALLOW_LOCAL_PROVIDERS environment variables, custom provider validation rules, and tightened cluster determinism claim; updates graphify-py subproject.

Claude Hook Enhancements and User-Scope Skill Management

Layer / File(s) Summary
Read|Glob PreToolUse hook and matcher constant
crates/graphify-hooks/src/platform/common/hooks_json.rs, crates/graphify-hooks/src/platform/common/markdown.rs
Claude user-scope skill management and dual-hook registration
crates/graphify-hooks/src/platform/claude.rs
Adds claude_user_skill_dst helper for user-scope SKILL.md path; extends claude_uninstall to remove user-scope skill tree before section cleanup; updates install_claude_hook to register both Bash and Read
Shared Claude config directory resolution
crates/graphify-hooks/src/platform/common/fs.rs, crates/graphify-hooks/src/platform/common/install_skill.rs, crates/graphify-hooks/src/platform/common/mod.rs
Introduces claude_config_dir helper that reads CLAUDE_CONFIG_DIR and treats empty as unset; updates install_skill.rs to use shared helper for both skill and CLAUDE.md paths.
Claude test coverage with HOME isolation
crates/graphify-hooks/tests/parity.rs, crates/graphify-hooks/tests/read_hook.rs
Introduces claude_uninstall_to test helper with isolated HOME/CLAUDE_CONFIG_DIR; updates existing uninstall tests with #[serial(home_env)]; adds no-op and user-skill removal tests; adds read_hook.rs integration test module validating hook matcher, nudge triggers, silence rules, "fails open" behavior, and no-block enforcement.
Antigravity documentation and test updates
crates/graphify-hooks/src/platform/antigravity.rs, crates/graphify-hooks/tests/parity.rs
Updates antigravity.rs comments to reflect that both install modes write always-on rules/workflow under project_dir/.agents/; updates parity test to verify full layer (project skill + workspace rules/workflow + YAML frontmatter).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bunkerlab-net/graphify#16: Both PRs modify the custom LLM provider registry in crates/graphify-llm/src/providers.rs, touching provider loading precedence and base_url validation behavior.
  • bunkerlab-net/graphify#12: Both PRs refactor crates/graphify-extract/src/extractors/dart.rs, including changes to Dart child node ID construction from file stem for deterministic IDs.

Poem

🐰 A Kilo hops in with plugin cheer,
Zip-bombs caught before they near,
Dart parts dance with parent grace,
Claude reads nudge you in the right place,
Security threads through every seam—
A safer, richer extraction dream! 🌿


coderabbitai[bot]

This comment was marked as resolved.

rblaine95 added 2 commits June 3, 2026 14:30
Resolve the findings from the latest CodeRabbit review of the v0.8.30
resync.

- `xlsx_extract_structure` now runs the same `zip_within_caps` screen as
  `xlsx_to_markdown` so every public XLSX parsing path honours the F2
  decompression/ratio caps. graphify-py leaves this path unguarded but
  flags (F-035) that it needs a bomb audit before use; Rust applies the
  guard now.
- F3 ollama validation in `call_llm` and `extract_files_direct` now
  hard-blocks link-local / cloud-metadata `OLLAMA_BASE_URL` for every
  ollama call, not only the no-key path. Reaching a real local ollama
  needs `GRAPHIFY_TEST_ALLOW_PRIVATE_IPS`, which also disarms the
  downstream SSRF guard, so this check is the only metadata defence on
  the ollama path. The LAN warning stays on the no-key path. Fixes a
  gap shared with graphify-py.
- `uninstall_claude_hook` now inspects nested `hooks[].command` strings
  via a shared `hook_targets_graphify` helper instead of stringifying
  the whole entry, mirroring the install path's precise matching.
- An empty `CLAUDE_CONFIG_DIR` is treated as unset via a shared
  `claude_config_dir` helper, so it never collapses to a stray relative
  path that install writes but uninstall cannot find. Applied to both
  install and uninstall sites.
- Add `#[must_use]` to the pure `PathBuf`-returning helpers in `kilo.rs`.
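
The empty-`CLAUDE_CONFIG_DIR` rule can be sketched as a pure function plus a thin environment wrapper. The `$HOME/.claude` default and the helper name `config_dir_from` are assumptions for this example; only the "empty means unset" behaviour comes from the commit message:

```rust
use std::env;
use std::path::PathBuf;

/// Pick the Claude config directory from an optional override.
/// An empty override is treated exactly like a missing one, so it can never
/// collapse to a stray relative path.
fn config_dir_from(env_value: Option<&str>, home: &str) -> PathBuf {
    match env_value {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // Assumed default location for this sketch.
        _ => PathBuf::from(home).join(".claude"),
    }
}

/// Environment-reading wrapper, shared by install and uninstall so both
/// sites resolve the same directory.
fn claude_config_dir() -> Option<PathBuf> {
    let home = env::var("HOME").ok()?;
    let override_dir = env::var("CLAUDE_CONFIG_DIR").ok();
    Some(config_dir_from(override_dir.as_deref(), &home))
}
```

Routing both install and uninstall through one helper is what guarantees that whatever install writes, uninstall can find again.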

Test improvements:

- `pdf_over_cap_returns_empty` builds a genuinely valid PDF and proves
  the size cap (not a parse error) yields the empty string by extracting
  the same file under a large vs. tiny cap, via a new
  `extract_pdf_text_with` seam. Adds `lopdf` as a dev-dependency.
- `converters_return_empty_for_bomb` uses structurally valid DOCX/XLSX
  fixtures with the bomb payload in a real internal part.
- New `structure_extraction_returns_empty_for_bomb` covers the new
  `xlsx_extract_structure` guard.
- New non-empty-key assertion in
  `call_llm_blocks_metadata_ollama_url_at_call_site`.

The two file-top `expect_used`/`unwrap_used` nitpicks are declined: the
blanket allow in test files is the sanctioned project convention; the
rationale is recorded in code comments at each site rather than here.

Glory to the Omnissiah
Read `OLLAMA_BASE_URL` once into `ollama_base_url` before the backend
branching in `extract_files_direct_mode`, and reuse that single value
for both the F3 hard-block validation and the `ollama` dispatch arm.
Reading the environment twice could validate a different value than the
one actually sent. This mirrors the pattern already used in `call_llm`.
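
The read-once pattern can be illustrated as follows. Everything here is a simplified stand-in: `validate_base_url` is a placeholder check rather than the real F3 validation, and `DEFAULT_OLLAMA_URL`, `resolve_ollama_target` and `ollama_target_from_env` are hypothetical names:

```rust
use std::env;

/// Assumed fallback for this sketch (Ollama's conventional local address).
const DEFAULT_OLLAMA_URL: &str = "http://localhost:11434";

/// Placeholder validation: rejects only the well-known metadata address.
fn validate_base_url(url: &str) -> Result<(), String> {
    if url.contains("169.254.169.254") {
        Err(format!("blocked metadata address in {url}"))
    } else {
        Ok(())
    }
}

/// Validate one value and hand back that same value for dispatch, so the URL
/// that was checked is always the URL that is used.
fn resolve_ollama_target(env_value: Option<String>) -> Result<String, String> {
    let base_url = env_value.unwrap_or_else(|| DEFAULT_OLLAMA_URL.to_string());
    validate_base_url(&base_url)?;
    Ok(base_url)
}

/// Read the environment exactly once, before any backend branching.
fn ollama_target_from_env() -> Result<String, String> {
    resolve_ollama_target(env::var("OLLAMA_BASE_URL").ok())
}
```

Reading the variable a second time at the dispatch site would reopen a small check-then-use window, which is the gap this commit closes.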

By the will of the Machine God
@rblaine95 rblaine95 force-pushed the resync/graphify-py/724f1e3 branch from e8c400c to 0af1036 June 3, 2026 12:53
@rblaine95 rblaine95 merged commit 840a927 into master Jun 3, 2026
12 checks passed
@rblaine95 rblaine95 deleted the resync/graphify-py/724f1e3 branch June 3, 2026 13:04