Conversation
- added enrich_update_actions_with_semantic() to analyze update actions - detect leaf changes within updated nodes using semantic.find_leaf_change() - extracting rename pairs for identifier changes
- added explicit rename action generation from update semantic leaf changes - added metadata on rename actions - added reusable range metadata helper, can be used for other actions to expose it to ui
gives around 100 renames in 2 ms so pretty fast
- normalizing each actions with build_action function - which carries type, kind, src_node, dst_node, src_range, dst_range, and other temporary compatibility fields: nodes, target so the engine dont explode right now
- now gives internal summaries of actions - this includes, summary.counts, summary.moves, summary.renames, summary.inserts, summary.deletes, summary.updates - from_line and to_line, is include in move summary - now core.generate_actions(...) returns actions, timings, summary
instead of hard coding the language node types will use query files to get the node type and use them (learned the hard way)
using the query files now to detect the actions, but now everything which is not mapped together is shown inserted/deleted will need to fix that... somehow
i was having the highlight based on both if it was mapped and if it had any descendant which was mapped to making everything insert/delete
added a stable ordering for nodes at each height level in top down matching. now nodes are sorted by their source position(start row/col, end row/col) and then by type, this prevents non-deterministic matching regardless of tree traversal order
… recovery passes similar to top-down ordering position deterministically, providing consistent order based on their source position. Bottom up pass sorts nodes by heights then position where recovery only sorts by position only
replaced the broad identifier rename captures with declaration focused captures across different languages, this tightens semantic rename check for field nodes to only accept name/key positions, this lets us reduce noisy rename detections from normal identifier usages and improves rename action quality
added a new core/payload.lua to build spans/signs/virt_text from normalized actions wired actions.lua to use payload.lua this keeps ui focused on consuming the payload instead of inferring semantic diffs itself
removed semantic/highlight diff logic from ui/renderer.lua and now renderer uses the core payload to show the highlights added ui/signs.lua for sign style/glyph/priority handling cleaned ui/helpers.lua by removing old semantic/internal diff helpers no longer needed in ui
renamed internal diff highlight group from Diff*Text/Sign to Diffmantic*
- added src_fillers/dst_fillers to payload output structure - generating filler entries for deletions/insertion/movement - handle trailing blank lines and top level nodes - applied fillers as virtual text as the native filler lines seem to be a native diff feature
…ysis.lua Normalized action objects around src/dst/metadata so actions.lua becomes core payload source. Replaced the update semantic storage from action.semantic to action.analysis
… move/rename operations - Add range_contains() to check if one range is within another - Add collect_suppressed_rename_pairs() to track renames within declarations and suppress redundant annotations - Add visual text annotations for move operations showing source/destination line numbers with arrows - Add visual text annotations for renames showing old/new names - Remove unused extra_src_fillers and extra_dst_fillers handling - Clean up trailing whitespace
This comment was marked as spam.
This comment was marked as spam.
There was a problem hiding this comment.
Pull request overview
This PR advances the “core vs UI” separation by moving semantic/rename detection and diff analysis into core modules, introducing Treesitter role queries across multiple languages, and simplifying the UI renderer to consume enriched action metadata (including rename + filler rendering).
Changes:
- Add Treesitter role indexing + language queries to classify semantic constructs (functions/classes/variables/assignments/imports/returns/rename identifiers).
- Extend action generation to emit
renameactions and enrichupdateactions with analyzed hunks/timings, and make node iteration deterministic. - Refactor UI rendering into smaller modules (signs + filler) and update benchmark/init to pass buffer context into core.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/benchmark.lua | Adds rename cases and passes src_buf/dst_buf into action generation; reports rename timings/actions. |
| queries/typescript/diffmantic.scm | Adds TS query captures for semantic roles + rename identifiers. |
| queries/python/diffmantic.scm | Adds Python query captures for semantic roles + rename identifiers. |
| queries/lua/diffmantic.scm | Adds Lua query captures for semantic roles + rename identifiers. |
| queries/javascript/diffmantic.scm | Adds JS query captures for semantic roles + rename identifiers. |
| queries/go/diffmantic.scm | Adds Go query captures for semantic roles + rename identifiers. |
| queries/cpp/diffmantic.scm | Adds C++ query captures for semantic roles + rename identifiers. |
| queries/c/diffmantic.scm | Adds C query captures for semantic roles + rename identifiers. |
| queries/fallback.scm | Adds a fallback query used when language-specific queries are unavailable. |
| lua/diffmantic/ui/signs.lua | New sign rendering helper with style + priority handling. |
| lua/diffmantic/ui/filler.lua | New filler-line computation/application to align moved/inserted/deleted blocks. |
| lua/diffmantic/ui/renderer.lua | Refactors renderer to consume enriched action ranges/hunks and delegate to signs/filler. |
| lua/diffmantic/ui/helpers.lua | Removes prior diff logic, leaving only inline virt-text helper. |
| lua/diffmantic/init.lua | Updates highlight setup (with ColorScheme autocmd) and passes buffers into action generation. |
| lua/diffmantic/core/top_down.lua | Adds stable ordering for node processing at each height. |
| lua/diffmantic/core/recovery.lua | Makes recovery application deterministic by sorting src_info traversal. |
| lua/diffmantic/core/bottom_up.lua | Integrates role index for name-based matching and deterministic traversal. |
| lua/diffmantic/core/semantic.lua | Introduces core semantic helpers (leaf diffs, rename-identifier heuristics, fragments). |
| lua/diffmantic/core/roles.lua | New module to build/query a capture index from diffmantic Treesitter queries. |
| lua/diffmantic/core/analysis.lua | New module to compute update hunks and suppress rename-noise within updates. |
| lua/diffmantic/core/actions.lua | Major refactor: richer action payloads, rename emission, timings, and update enrichment. |
| lua/diffmantic/core/payload.lua | New (currently unreferenced) module containing render-enrichment logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lua/diffmantic/core/roles.lua
Outdated
| for _, captured in pairs(by_capture) do | ||
| if node:equal(captured) or node:child_with_descendant(captured) then | ||
| return captured | ||
| end | ||
| end | ||
|
|
||
| return nil |
There was a problem hiding this comment.
find_descendant_with_capture iterates pairs(by_capture), which is non-deterministic and can return a nested capture (e.g., a nested function name) instead of the closest/outer one. This can make name extraction flaky and break matching/rename detection. Consider selecting the best candidate deterministically (e.g., smallest range / earliest in document order) or iterating captures in-order for the given subtree rather than using pairs over a map.
| for _, captured in pairs(by_capture) do | |
| if node:equal(captured) or node:child_with_descendant(captured) then | |
| return captured | |
| end | |
| end | |
| return nil | |
| local best = nil | |
| local best_len_rows, best_len_cols = nil, nil | |
| local best_start_row, best_start_col = nil, nil | |
| for _, captured in pairs(by_capture) do | |
| if node:equal(captured) or node:child_with_descendant(captured) then | |
| local start_row, start_col, end_row, end_col = captured:range() | |
| local len_rows = end_row - start_row | |
| local len_cols = end_col - start_col | |
| if not best then | |
| best = captured | |
| best_len_rows, best_len_cols = len_rows, len_cols | |
| best_start_row, best_start_col = start_row, start_col | |
| else | |
| -- Prefer the smallest range; if equal, prefer earlier in document order. | |
| if len_rows < best_len_rows | |
| or (len_rows == best_len_rows and (len_cols < best_len_cols | |
| or (len_cols == best_len_cols and (start_row < best_start_row | |
| or (start_row == best_start_row and start_col < best_start_col)))))) then | |
| best = captured | |
| best_len_rows, best_len_cols = len_rows, len_cols | |
| best_start_row, best_start_col = start_row, start_col | |
| end | |
| end | |
| end | |
| end | |
| return best |
| if math.abs(#src_leaves - #dst_leaves) > 2 then | ||
| return {} | ||
| end | ||
|
|
There was a problem hiding this comment.
In find_leaf_changes, the math.abs(#src_leaves - #dst_leaves) > 2 guard is unreachable because the previous check returns immediately when the lengths differ. If the intent is to allow small leaf-count differences, the first condition should be relaxed; otherwise the second check should be removed to avoid dead code and confusion.
| if math.abs(#src_leaves - #dst_leaves) > 2 then | |
| return {} | |
| end |
| local function has_kind(node, index, kind) | ||
| return index and roles.has_kind(node, index, kind) or false | ||
| end | ||
|
|
||
| local function is_significant(info, index) | ||
| local node = info.node | ||
| if has_kind(node, index, "function") | ||
| or has_kind(node, index, "class") | ||
| or has_kind(node, index, "variable") | ||
| or has_kind(node, index, "assignment") | ||
| or has_kind(node, index, "import") | ||
| or has_kind(node, index, "return") | ||
| or has_kind(node, index, "preproc") | ||
| then | ||
| return true | ||
| end | ||
| return significant_types[info.type] or false | ||
| end | ||
|
|
||
| local function is_transparent_update_ancestor(info, index) | ||
| if has_kind(info.node, index, "class") then | ||
| return true | ||
| end | ||
| return transparent_update_ancestors[info.type] or false | ||
| end | ||
|
|
||
| local function is_movable(info, index) | ||
| if has_kind(info.node, index, "function") or has_kind(info.node, index, "class") then | ||
| return true | ||
| end | ||
| return movable_types[info.type] or false | ||
| end |
There was a problem hiding this comment.
roles.has_kind(..., "function"/"class"/"variable"/...) returns true for any node captured as outer/name/body (per CAPTURE_BY_KIND). Using it in is_significant / is_movable can accidentally mark name identifiers or bodies as significant/movable, which may cause noisy insert/delete/update actions for sub-nodes when mappings are incomplete. Consider making the kind checks refer only to the *.outer captures (or adding a separate has_outer_kind helper) so only the declaration/container node is treated as significant.
lua/diffmantic/ui/filler.lua
Outdated
| if src_count > 0 then | ||
| local src_block_end = sec > 0 and (ser + 1) or ser | ||
| local best_dst_row = nil | ||
| local best_src_row = math.huge | ||
| for _, other in ipairs(actions) do | ||
| if other ~= action and other.src_node and other.dst_node then | ||
| local osr = select(1, other.src_node:range()) | ||
| local odr = select(1, other.dst_node:range()) | ||
| if osr >= src_block_end and osr < best_src_row then | ||
| best_src_row = osr | ||
| best_dst_row = odr | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
compute() does an O(n) scan over actions for every move action to find best_dst_row (nested loop), which becomes O(n²) in large diffs and can dominate runtime (benchmarks go up to thousands of nodes). Consider precomputing the “next move after this source block” mapping once (e.g., sort moves by src_row and keep a parallel list of dst_row) so each move can find its anchor in O(log n) or O(1).
lua/diffmantic/core/payload.lua
Outdated
| local semantic = require("diffmantic.core.semantic") | ||
|
|
||
| local M = {} | ||
| local RENDER_UPDATE_TOKENS = true | ||
|
|
||
| local function node_range(node) | ||
| if not node then | ||
| return nil | ||
| end | ||
| local ok, sr, sc, er, ec = pcall(function() |
There was a problem hiding this comment.
lua/diffmantic/core/payload.lua is added but doesn’t appear to be referenced anywhere in the repository (no require("diffmantic.core.payload") usages). If it’s not part of the current architecture split, consider removing it from this PR to avoid carrying a large unused module; otherwise, wire it into the code path or add a brief comment explaining its intended entry point.
lua/diffmantic/ui/helpers.lua
Outdated
| @@ -236,278 +13,4 @@ function M.set_inline_virt_text(buf, ns, row, col, text, hl) | |||
| pcall(vim.api.nvim_buf_set_extmark, buf, ns, row, col, opts) | |||
| end | |||
There was a problem hiding this comment.
lua/diffmantic/ui/helpers.lua now only contains set_inline_virt_text, but there are no remaining references to diffmantic.ui.helpers or set_inline_virt_text in the codebase. Consider removing this module (or reusing it from renderer.lua) to avoid dead code.
- action range construction now uses normalized src/dst fields - summary tables read metadata/src/dst directly inside of legacy code
e3d1ad3 to
f5ada37
Compare
Simplify parameter identifier extraction and suppress noisy field declaration renames that lack semantic context. - Change walk() to first_identifier() - extract only first identifier per parameter instead of recursively collecting all - Remove type_identifier from identifier matching - Remove Pass 1b strict fallback seed collection logic - Disable rename detection for field_declaration declarators - Update C/C++ queries to exclude field_identifier from init_declarator
prevents noisy rename detection for identifiers within class bodies that are not the class name itself suppressing update actions for class bodies in favour of move/insert/delete - added is_class_like_context check to exclude non-name identifiers in class bodies from rename detection - added should_emit_update to suppress update for class/struct/types bodies
…count tracking - added range_text() helper to extract text from buffer ranges - added action_pair_key() to generate unique keys for action pairs - removed base_count field from filler entries (unused) - removed move_anchor_after_blank_run() in favor of more precise count_filler_lines_at() calculation
9442d7b to
e0a9a04
Compare
…acement scaffolding
…covered insert/delete fillers
…s (skip action-level duplicates)
…move false core diff noise
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 38 out of 39 changed files in this pull request and generated 11 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| name = username, | ||
| email = email, | ||
| active = true, | ||
| created_at = nil, |
There was a problem hiding this comment.
Missing role field assignment in the Lua create_user function. The after.lua version (line 28-34) creates a user table without setting the role field, even though the role parameter is passed in and available. This differs from other language implementations where the role is properly assigned to the user object.
| created_at = nil, | |
| created_at = nil, | |
| role = role, |
| struct User create_user(const char *username, const char *email, const char *role) { | ||
| if (!validate_email_address(email)) { | ||
| fprintf(stderr, "Invalid email format\n"); | ||
| } | ||
|
|
||
| struct User user; | ||
| memset(&user, 0, sizeof(user)); | ||
| strncpy(user.name, username, sizeof(user.name) - 1); | ||
| strncpy(user.email, email, sizeof(user.email) - 1); | ||
| user.active = 1; | ||
| user.created_at = NULL; | ||
| (void)role; | ||
| return user; |
There was a problem hiding this comment.
The role parameter is accepted but not used in the create_user function. Similar to the C++ version, the function accepts a role parameter (line 29) but explicitly marks it as unused with (void)role on line 40. The User struct (line 7-12) also doesn't have a role field. This differs from before.c where the role was properly assigned using strncpy.
| end | ||
| if inner.end_row == outer.end_row and inner.end_col > outer.end_col then | ||
| return false | ||
| end |
There was a problem hiding this comment.
Inconsistent indentation: line 50 uses a tab character while surrounding lines use tabs. This specific line appears to have unusual indentation that may cause visual inconsistency.
| local x = 10 | ||
| return username.name .. " <" .. username.email .. ">" .. x | ||
| end | ||
|
|
There was a problem hiding this comment.
The format_user_display function has unexpected logic. A local variable x is defined as 10 and then concatenated to the user display string. This produces output like "John Doe <john@example.com>10" which differs from all other language implementations that just return "name <email>". This appears to be test data rather than production code, but the logic is inconsistent with the function's apparent purpose.
| local x = 10 | |
| return username.name .. " <" .. username.email .. ">" .. x | |
| end | |
| return username.name .. " <" .. username.email .. ">" | |
| end |
| function getUserPermissions(role: Role): string[] { | ||
| if (role === "member") { | ||
| return ["read"]; | ||
| } | ||
| if (role === "editor") { | ||
| return ["read", "write"]; | ||
| } | ||
| if (role === "admin") { | ||
| return ["read", "write", "delete", "manage"]; | ||
| } | ||
| return ["read", "write", "delete", "manage", "configure"]; | ||
| } |
There was a problem hiding this comment.
Missing handling of "editor" role in getUserPermissions. The function checks for "member" role (line 46) but the Role type includes "editor" and other valid roles. When role is "editor", the function will fall through to line 55 and return the superadmin permissions array, which is incorrect. An explicit check for "editor" role should be added to return ["read", "write"].
| User create_user(const std::string &username, const std::string &email, const std::string &role = ROLE) { | ||
| if (!validate_email_address(email)) { | ||
| throw std::runtime_error("Invalid email format"); | ||
| } | ||
|
|
||
| User user; | ||
| user.name = username; | ||
| user.email = email; | ||
| user.active = true; | ||
| user.created_at = ""; | ||
| (void)role; | ||
| return user; |
There was a problem hiding this comment.
The role parameter is accepted but not used in the create_user function. The function accepts a role parameter with a default value (line 30), but the User struct created on line 35 doesn't include a role field assignment. The role parameter is cast to void on line 40, which explicitly marks it as unused. This differs from the pattern in before.cpp where role was properly used.
| struct User { | ||
| std::string name; | ||
| std::string email; | ||
| bool active; | ||
| std::string created_at; | ||
| }; |
There was a problem hiding this comment.
The User struct is missing the role field. In before.cpp (line 18-23), the User struct has a role field, but in after.cpp (line 8-13) it's been removed. However, the create_user function still accepts a role parameter. This is a breaking change to the data structure that should be addressed.
| while i <= len do | ||
| local ch = text:sub(i, i) | ||
| if ch:match("%s") then | ||
| i = i + 1 | ||
| elseif ch:match("[%w_]") then | ||
| local j = i + 1 | ||
| while j <= len and text:sub(j, j):match("[%w_]") do | ||
| j = j + 1 | ||
| end | ||
| table.insert(tokens, { text = text:sub(i, j - 1), start_col = i, end_col = j - 1 }) | ||
| i = j | ||
| else | ||
| -- Keep punctuation/granular symbols as single-char tokens so we can | ||
| -- align shared delimiters (e.g. closing quote) and avoid off-by-one spans. | ||
| table.insert(tokens, { text = ch, start_col = i, end_col = i }) | ||
| i = i + 1 | ||
| end | ||
| end | ||
| return tokens | ||
| end |
There was a problem hiding this comment.
Inconsistent indentation: lines 50-68 use inconsistent indentation (tabs vs spaces). The function body starting at line 50 should maintain consistent indentation with the rest of the file.
| while i <= len do | |
| local ch = text:sub(i, i) | |
| if ch:match("%s") then | |
| i = i + 1 | |
| elseif ch:match("[%w_]") then | |
| local j = i + 1 | |
| while j <= len and text:sub(j, j):match("[%w_]") do | |
| j = j + 1 | |
| end | |
| table.insert(tokens, { text = text:sub(i, j - 1), start_col = i, end_col = j - 1 }) | |
| i = j | |
| else | |
| -- Keep punctuation/granular symbols as single-char tokens so we can | |
| -- align shared delimiters (e.g. closing quote) and avoid off-by-one spans. | |
| table.insert(tokens, { text = ch, start_col = i, end_col = i }) | |
| i = i + 1 | |
| end | |
| end | |
| return tokens | |
| end | |
| while i <= len do | |
| local ch = text:sub(i, i) | |
| if ch:match("%s") then | |
| i = i + 1 | |
| elseif ch:match("[%w_]") then | |
| local j = i + 1 | |
| while j <= len and text:sub(j, j):match("[%w_]") do | |
| j = j + 1 | |
| end | |
| table.insert(tokens, { text = text:sub(i, j - 1), start_col = i, end_col = j - 1 }) | |
| i = j | |
| else | |
| -- Keep punctuation/granular symbols as single-char tokens so we can | |
| -- align shared delimiters (e.g. closing quote) and avoid off-by-one spans. | |
| table.insert(tokens, { text = ch, start_col = i, end_col = i }) | |
| i = i + 1 | |
| end | |
| end | |
| return tokens | |
| end |
| vim.cmd("tabnew") | ||
| vim.cmd("edit " .. file1) | ||
| buf1 = vim.api.nvim_get_current_buf() | ||
| local win1 = vim.api.nvim_get_current_win() | ||
| win1 = vim.api.nvim_get_current_win() | ||
|
|
||
| vim.cmd("vsplit " .. file2) | ||
| buf2 = vim.api.nvim_get_current_buf() |
There was a problem hiding this comment.
vim.cmd is being built by concatenating untrusted path arguments (file1/file2) directly into an Ex command string, which allows a crafted filename or argument containing | or newlines to inject additional Vim commands (command injection). An attacker controlling the file path (e.g. via a repository containing a file named foo|:!rm -rf ~ or via a crafted :Diffmantic invocation) could cause arbitrary Ex commands to run in the user’s editor. Use the structured Lua API for commands (e.g. passing the path via an args/mods table or equivalent) or robustly escape/quote the path so that user-controlled strings cannot break out of the edit/vsplit command.
| local expanded_path = vim.fn.expand(file1) | ||
|
|
||
| vim.cmd("vsplit " .. expanded_path) | ||
| buf2 = vim.api.nvim_get_current_buf() | ||
| local win2 = vim.api.nvim_get_current_win() | ||
|
|
||
| vim.wo[win1].scrollbind = true | ||
| vim.wo[win1].cursorbind = true | ||
| vim.wo[win2].scrollbind = true | ||
| vim.wo[win2].cursorbind = true | ||
| win2 = vim.api.nvim_get_current_win() |
There was a problem hiding this comment.
vim.cmd is called with expanded_path interpolated directly into the command string, so a filename that expands to a value containing | or newlines can inject extra Ex commands (command injection) when users run :Diffmantic. Because expanded_path is derived from vim.fn.expand(file1), a maliciously named file combined with a trusted mapping or autocmd that calls this function could trigger arbitrary command execution in Neovim. Switch to the structured command API (passing the path in a dedicated args field) or escape/quote expanded_path before concatenation so that special command separators cannot be interpreted.
No description provided.