feat: add pascal optional extra for tree-sitter-pascal #1616
Open
vinicius-l-machado wants to merge 1 commit into
extract_pascal() already imports tree-sitter-pascal for AST-quality extraction and falls back to a regex extractor when it is absent (Graphify-Labs#781), but the grammar was not declared anywhere in the package metadata, so it was never installed and the AST path never ran out of the box.

Declare a `pascal` extra (and add it to `all`) so users can opt into the AST extractor with `uv tool install "graphifyy[pascal]"`. tree-sitter-pascal publishes prebuilt wheels for every platform (win/macOS/Linux), so unlike the `dm` extra it needs no C toolchain.

On a mid-size Delphi codebase the AST path yields notably more accurate relationship edges than the regex fallback (calls and inherits both up ~25%). README extras table and uv.lock updated accordingly.
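For readers unfamiliar with the optional-dependency pattern described above, here is a minimal sketch of how a backend choice can hinge on whether the grammar is installed. It is not the project's actual `extract_pascal()` code: the helper name `pick_pascal_backend` is hypothetical, and the import name `tree_sitter_pascal` is an assumption about how the tree-sitter-pascal wheel exposes its module.

```python
import importlib.util


def pick_pascal_backend(module_name: str = "tree_sitter_pascal") -> str:
    """Return "ast" when the grammar module is importable, else "regex".

    Hypothetical helper for illustration only. The default module name is
    an assumed import name for the tree-sitter-pascal distribution.
    """
    # find_spec returns None for a missing top-level module instead of raising,
    # so the check is cheap and does not import the grammar itself.
    if importlib.util.find_spec(module_name) is not None:
        return "ast"
    return "regex"


if __name__ == "__main__":
    # Without the extra installed this is expected to report the regex fallback.
    print(pick_pascal_backend())
```

With the `pascal` extra installed, the same check would be expected to select the AST path, which is the behavior this PR makes reachable out of the box.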
nokternol added a commit to nokternol/graphify that referenced this pull request on Jul 4, 2026
…iling (Graphify-Labs#1616)

`graphify explain "<phrase>"` treats its whole argument as one string that must match, prefix, or substring-match a single node's label as a whole. A genuine natural-language phrase (e.g. "critic score aggregation") therefore returns "No node matching found" even when every individual word exists on a real, relevant node, because no node label ever literally contains the entire multi-word phrase. This silently dead-ends on exactly the query shape `explain` is otherwise suggested for, with no fallback and no signal that anything went wrong (worse than noise: a hard, silent zero).

When the tiered lookup finds nothing and the phrase has more than one token, `explain` now falls back to the same per-token bag-of-words scoring `query` already uses (`_score_nodes`) and lists the top candidates by term overlap, in the same numbered-candidate format the existing ambiguity guard (Graphify-Labs#1613) uses, instead of a bare dead end. A genuine single-word miss is unaffected: the fallback is gated on token count, since a one-word probe would score identically to the substring tier already tried and has nothing new to find.

Regression tests: multi-word phrase with real term overlap surfaces candidates and excludes unrelated nodes; multi-word phrase with zero overlap still gets the honest original message; single-word miss is byte-identical to prior behavior.

Full suite (2766 tests, 1 pre-existing unrelated failure) and ruff pass. Verified live against a real repo's graph.json: both previously-zero `explain` queries now surface their real target (`ratingsAggregation.ts`, `backdrops.handler.ts`) instead of nothing.
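The fallback described in this commit can be sketched roughly as follows. This is an illustrative stand-in, not the real `_score_nodes`: the function name `overlap_candidates`, the plain substring test per token, and the sample labels are all assumptions made for the example.

```python
def overlap_candidates(
    phrase: str, labels: list[str], top_n: int = 10
) -> list[tuple[str, int]]:
    """Rank labels by how many query tokens they contain (hypothetical helper)."""
    tokens = phrase.lower().split()
    # Gate on token count: a one-word probe adds nothing over the substring tier.
    if len(tokens) < 2:
        return []
    scored = []
    for label in labels:
        lowered = label.lower()
        # Bag-of-words overlap: one point per query token found in the label.
        score = sum(1 for tok in tokens if tok in lowered)
        if score > 0:
            scored.append((label, score))
    # Highest overlap first; ties broken alphabetically so output is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Made-up labels, purely for illustration.
    sample = ["critic score aggregator", "image backdrop picker", "score board"]
    for label, score in overlap_candidates("critic score aggregation", sample):
        print(score, label)
```

Labels with zero overlap are dropped entirely, which mirrors the regression test above that unrelated nodes are excluded from the candidate list.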
nokternol added a commit to nokternol/graphify that referenced this pull request on Jul 4, 2026
…ify-Labs#1618)

Graphify-Labs#1616's term-overlap fallback (this same session) fixed `explain` hard-failing to zero on multi-word natural-language phrases, but has its own failure mode: when a query's only shared vocabulary with the corpus is one generic word, every node containing that word ties at the weakest possible bonus tier, and the fallback presents an arbitrary top-10 slice of that tie as though it were a considered answer.

Live repro: "server startup error handling" matched 1,765 of this repo's 3,491 nodes (51%), since "server" is also this repo's top-level backend directory name. The real target was buried past rank 800, tied with 1,627 other nodes at the exact same floor score. That's not a useful answer, it's close to a coin flip dressed up as one.

Fix: after scoring, if the candidate count exceeds both an absolute floor (50) and 15% of the graph's total node count, treat it as a noise flood and fall back to the same honest zero-match message a genuine miss gets, instead of printing a misleadingly specific candidate list. The floor keeps this from firing on small graphs/fixtures, where even "most of the graph matched" can be a small, legitimate list. Genuine large-but-real candidate lists (e.g. 31 candidates on this repo's ~3,491-node graph, an earlier fix's verified-good case) stay well under the threshold and are unaffected.

Regression tests: a 60-of-61-node noise flood on one generic token now gets the honest no-match message; a 20-of-21-node case (below this graph size's threshold) still shows its candidate list normally, confirming the guard is for degenerate floods specifically, not just "more than 10 results."

Full suite (2769 tests, all passing this run; the one known pre-existing test-order flake did not trigger) and ruff pass. Verified live: the exact 1,765-candidate flood from earlier now returns the honest no-match message; smaller legitimate fallbacks (critic score aggregation, backdrop image selection) are unaffected.
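The threshold rule in the fix reduces to a two-part comparison. The sketch below encodes it under the stated numbers (floor of 50, 15% of total nodes); the function name `is_noise_flood` and its parameter names are hypothetical and not taken from the commit.

```python
def is_noise_flood(
    candidate_count: int,
    total_nodes: int,
    floor: int = 50,
    ratio: float = 0.15,
) -> bool:
    """True when a candidate list is too broad to present as an answer.

    Hypothetical helper: both conditions must hold, so small graphs never
    trip the guard even when most of their nodes match.
    """
    return candidate_count > floor and candidate_count > ratio * total_nodes


if __name__ == "__main__":
    # Figures quoted in the commit message above.
    print(is_noise_flood(1765, 3491))  # the live repro flood
    print(is_noise_flood(31, 3491))    # the verified-good candidate list
    print(is_noise_flood(60, 61))      # regression fixture: flood
    print(is_noise_flood(20, 21))      # regression fixture: below the floor
```

Requiring both conditions is what separates the 60-of-61 case from the 20-of-21 case: both exceed 15% of their graph, but only the former clears the absolute floor.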