feat(extract): add Pascal/Delphi and Lazarus IDE support #781

Merged
safishamsi merged 5 commits into Graphify-Labs:v7 from simeonbodurov:feat/pascal-delphi-support
May 9, 2026

@simeonbodurov

Contributor

Summary

Adds full knowledge-graph extraction for Pascal/Delphi codebases and
Lazarus IDE project files. Developed with Claude Sonnet 4.6.


1 — Pascal/Delphi extractor (extract_pascal)

New extractor for .pas, .pp, .dpr, .dpk, .inc files via
tree-sitter-pascal.

Extracted nodes: file, unit/program/library, class/interface/helper,
procedure/function implementations.

Extracted edges:

  • file → contains → module
  • module → imports → dependency (uses clause, resolved to path-based IDs)
  • class → inherits → base class / interface
  • class/module → contains/method → procedure or function
  • procedure → calls → procedure (in-file call resolution, including bare
    Reset;-style calls without parentheses)
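
The last bullet covers bare calls such as `Reset;`. The commit message for this change describes them as statement nodes whose sole named child is an identifier. A minimal sketch of that check, using a stand-in `Node` class rather than real tree-sitter objects; the `Node` class, the helper name `called_name`, and the exact node type strings `"statement"` and `"identifier"` are assumptions for illustration (only `exprCall` is named in this PR):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Stand-in for a tree-sitter node; only the fields used below."""
    type: str
    text: str = ""
    named_children: List["Node"] = field(default_factory=list)


def called_name(node: Node) -> Optional[str]:
    """Return the callee name if `node` looks like a call, else None."""
    # Call with arguments: assume the first named child is the callee.
    if node.type == "exprCall" and node.named_children:
        return node.named_children[0].text
    # Bare call such as `Reset;`: a statement whose only named child
    # is an identifier.
    if node.type == "statement" and len(node.named_children) == 1:
        child = node.named_children[0]
        if child.type == "identifier":
            return child.text
    return None
```

A statement whose single child is anything other than an identifier (an assignment, for instance) is not treated as a call in this sketch.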

Key design — import resolution: Uses clause targets are resolved to
path-based node IDs by scanning all Pascal files under the project root
(_pascal_project_root + _pascal_resolve_unit helpers with caching).
Without this, bare unit names like SysUtils resolve to IDs that never
match any file node, making the entire import graph invisible.
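
To illustrate the idea, here is a rough sketch of resolving a `uses` name to a path-based ID with a per-root cache. It is not the PR's code: `make_id`, `resolve_unit`, `_unit_cache`, and the ID format are hypothetical stand-ins for `_make_id`, `_pascal_resolve_unit`, and the real cache.

```python
import re
from pathlib import Path

PASCAL_SUFFIXES = {".pas", ".pp", ".dpr", ".dpk", ".inc"}

# Hypothetical module-level cache: project root -> {lowercased stem: file path}.
_unit_cache: dict[Path, dict[str, Path]] = {}


def make_id(*parts: str) -> str:
    """Assumed ID scheme: lowercase, non-alphanumerics collapsed to '_'."""
    return "_".join(re.sub(r"[^a-z0-9]+", "_", p.lower()).strip("_") for p in parts)


def resolve_unit(root: Path, unit_name: str) -> str:
    """Map a uses-clause name to the ID of the file that defines it."""
    index = _unit_cache.get(root)
    if index is None:
        # Scan the project once; Pascal unit names are case-insensitive.
        index = {
            p.stem.lower(): p
            for p in sorted(root.rglob("*"))
            if p.suffix.lower() in PASCAL_SUFFIXES
        }
        _unit_cache[root] = index
    path = index.get(unit_name.lower())
    if path is None:
        # No file in the project (e.g. an RTL unit): fall back to the bare name.
        return make_id(unit_name)
    return make_id(path.relative_to(root).as_posix())
```

Because the cache lives at module level, entries survive until `_unit_cache.clear()` is called, which is the staleness concern a long-running process would need to handle.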

Optional dependency: pip install tree-sitter-pascal
If not installed, extract_pascal returns {"nodes":[], "edges":[], "error":...}
and the rest of the pipeline is unaffected.
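
A minimal sketch of this graceful-degradation pattern. The function name `extract_pascal_sketch`, the error text, and the import name `tree_sitter_pascal` are assumptions; the real extractor does the actual parsing where the comment indicates.

```python
def extract_pascal_sketch(path: str) -> dict:
    """Return an empty result with an error key if the grammar is missing."""
    try:
        # Assumed import name for the tree-sitter-pascal package.
        import tree_sitter_pascal  # noqa: F401
    except ImportError as exc:
        return {
            "nodes": [],
            "edges": [],
            "error": f"tree-sitter-pascal not available: {exc}",
        }
    # Parsing of `path` would happen here; this sketch returns an empty graph.
    return {"nodes": [], "edges": []}
```

Callers can check for the `"error"` key and carry on with the other extractors.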


2 — Inherits-edge fix

The original declType/typeref handler built inherits edge targets
with _make_id(_read(child)) — just the bare class name. But class nodes
use _make_id(stem, type_name), so targets never matched, making the
entire class hierarchy invisible in the graph.

Fix adds _pascal_class_stem_cache and _pascal_resolve_class(): strips
the conventional T/I prefix, locates the defining file by stem lookup
(same cache mechanism as _pascal_resolve_unit), and returns the correct
_make_id(file_stem, class_name) ID. RTL/unresolvable bases (e.g.
TObject) fall back to _make_id(bare_name) with an explicit stub node,
following the same pattern as the Python extractor.

Also removes the break that stopped after the first typeref, so all
parents are captured when a declaration lists several, e.g.
class(TBase, IInterface).
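
The resolution step could look roughly like the sketch below. The names `make_id` and `resolve_class`, the ID format, and the `stems` mapping (lowercased file stem to original stem) are hypothetical stand-ins for `_make_id`, `_pascal_resolve_class`, and `_pascal_class_stem_cache`.

```python
import re


def make_id(*parts: str) -> str:
    """Assumed ID scheme: lowercase, non-alphanumerics collapsed to '_'."""
    return "_".join(re.sub(r"[^a-z0-9]+", "_", p.lower()).strip("_") for p in parts)


def resolve_class(type_name: str, stems: dict[str, str]) -> tuple[str, bool]:
    """Return (node_id, is_stub) for a base type named in class(...)."""
    bare = type_name
    # Strip the conventional T/I prefix: TAnimal -> Animal, IShape -> Shape.
    if len(type_name) > 1 and type_name[0] in "TI" and type_name[1].isupper():
        bare = type_name[1:]
    stem = stems.get(bare.lower())
    if stem is None:
        # RTL or otherwise unresolvable base: bare-name ID, caller adds a stub node.
        return make_id(type_name), True
    return make_id(stem, type_name), False


# Collect every parent instead of stopping at the first one.
parents = [resolve_class(n, {"base": "Base"}) for n in ("TBase", "IInterface")]
```

Here `TBase` resolves to the file-qualified ID, while `IInterface` has no matching file and is returned as a stub.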


3 — Lazarus form extractor (extract_lazarus_form, .lfm)

.lfm files are text-based UI component trees. The extractor parses
object Name: TClassName ... end blocks into a containment graph and
captures OnXxx = HandlerName event bindings as references edges
(context: "event") linking each component to its handler procedure.
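
A simplified, line-based sketch of this kind of parser. It is an illustration under assumptions, not the PR's implementation: `parse_form` and the dict shapes are hypothetical, and collection items (`item ... end` blocks) are not handled.

```python
import re

OBJECT_RE = re.compile(r"^\s*object\s+(\w+)\s*:\s*(\w+)", re.IGNORECASE)
EVENT_RE = re.compile(r"^\s*(On[A-Z]\w*)\s*=\s*([A-Za-z_]\w*)\s*$")
END_RE = re.compile(r"^\s*end\s*$", re.IGNORECASE)


def parse_form(text: str) -> tuple[list[dict], list[dict]]:
    """Build component nodes plus contains/references edges from form text."""
    nodes, edges, stack = [], [], []
    for line in text.splitlines():
        m = OBJECT_RE.match(line)
        if m:
            name, cls = m.groups()
            nodes.append({"id": name, "class": cls})
            if stack:  # nested component: parent contains child
                edges.append({"source": stack[-1], "target": name,
                              "relation": "contains"})
            stack.append(name)
            continue
        if END_RE.match(line) and stack:
            stack.pop()
            continue
        m = EVENT_RE.match(line)
        if m and stack:  # OnXxx = HandlerName binding on the current component
            edges.append({"source": stack[-1], "target": m.group(2),
                          "relation": "references", "context": "event"})
    return nodes, edges


SAMPLE_LFM = """\
object Form1: TForm1
  Caption = 'Demo'
  OnCreate = FormCreate
  object Button1: TButton
    OnClick = Button1Click
  end
end
"""
nodes, edges = parse_form(SAMPLE_LFM)
```

For the sample, `Form1` contains `Button1`, and each component gets one `references` edge to its handler.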


4 — Lazarus package extractor (extract_lazarus_package, .lpk)

.lpk files are XML package definitions. The extractor reads the package
name, required package dependencies (→ imports edges), and listed unit
files (→ contains edges). Unit names are resolved to path-based node IDs
via _pascal_resolve_unit so they link to the same nodes produced by
extract_pascal on .pas files.
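
A small sketch of reading such a package file with the standard library. The element layout in `SAMPLE_LPK` (`Package/Name`, `RequiredPkgs/ItemN/PackageName`, `Files/ItemN/UnitName`) is an assumption about the .lpk format, and `parse_package` is a hypothetical name; unit names are left unresolved here, whereas the PR resolves them via `_pascal_resolve_unit`.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a minimal .lpk file, for illustration only.
SAMPLE_LPK = """<CONFIG>
  <Package Version="4">
    <Name Value="demo_pkg"/>
    <Files Count="1">
      <Item1>
        <Filename Value="src/mainunit.pas"/>
        <UnitName Value="MainUnit"/>
      </Item1>
    </Files>
    <RequiredPkgs Count="1">
      <Item1>
        <PackageName Value="LCL"/>
      </Item1>
    </RequiredPkgs>
  </Package>
</CONFIG>
"""


def parse_package(xml_text: str) -> tuple[str, list[tuple[str, str, str]]]:
    """Return the package name and (source, relation, target) edges."""
    pkg = ET.fromstring(xml_text).find("Package")
    name = pkg.find("Name").get("Value")
    edges = []
    # Required packages become imports edges.
    for dep in pkg.findall("RequiredPkgs/*/PackageName"):
        edges.append((name, "imports", dep.get("Value")))
    # Listed units become contains edges.
    for unit in pkg.findall("Files/*/UnitName"):
        edges.append((name, "contains", unit.get("Value")))
    return name, edges


pkg_name, pkg_edges = parse_package(SAMPLE_LPK)
```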


Files changed

File                       Change
graphify/extract.py        extract_pascal, extract_lazarus_form, extract_lazarus_package, resolution helpers
graphify/detect.py         Add .pas .pp .dpr .dpk .inc .lfm .lpk to CODE_EXTENSIONS
tests/test_pascal.py       28 new tests covering all three extractors
tests/fixtures/sample.pas  Representative Delphi fixture
tests/fixtures/sample.lfm  Lazarus form fixture
tests/fixtures/sample.lpk  Lazarus package fixture

Prerequisites

pip install tree-sitter-pascal

Wheel for Windows (cp38 abi3): available at
https://github.com/Isopod/tree-sitter-pascal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@safishamsi safishamsi left a comment

Collaborator

Great PR overall — follows extractor conventions, solid test coverage (28 tests), and the inherits-edge fix is a genuine improvement. Just one thing needed before merge.

Required: add .lpr extension

The PR title promises Lazarus IDE support but .lpr (the Lazarus program file — the entry point of every Lazarus project, identical Pascal syntax to .dpr) is missing from both CODE_EXTENSIONS in detect.py and _DISPATCH in extract.py. This is a one-line fix in each file, mapping .lpr to extract_pascal.

.lpi (Lazarus project info, XML format) can be skipped or added in a follow-up — that's a separate parser.

Minor (non-blocking)

  • rstrip("()") strips any combination of ( and ) chars rather than the literal suffix "()"; removesuffix("()") would be more precise (though benign in practice since labels are always "Name()")
  • Module-level _pascal_unit_cache / _pascal_class_stem_cache persist across extract() calls in the same process — could leak stale data if files move between runs. Low priority but worth a note.
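
The difference between the two string methods in the first bullet can be seen in a short comparison. The helper names are illustrative only, and `str.removesuffix` needs Python 3.9 or later.

```python
def strip_with_rstrip(label: str) -> str:
    # Removes every trailing "(" or ")" character, in any combination.
    return label.rstrip("()")


def strip_with_removesuffix(label: str) -> str:
    # Removes the exact suffix "()" once, if present.
    return label.removesuffix("()")


print(strip_with_rstrip("Reset()"))         # Reset
print(strip_with_removesuffix("Reset()"))   # Reset
print(strip_with_rstrip("Wrap(x))"))        # Wrap(x
print(strip_with_removesuffix("Wrap(x))"))  # Wrap(x))
```

Both agree on labels of the form `Name()`, which is why the original code was benign in practice.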

Once .lpr is added this is a clean merge.

Simeon Bodurov and others added 5 commits May 9, 2026 15:11
Adds full AST extraction for Pascal and Delphi source files using
tree-sitter-pascal (https://github.com/Isopod/tree-sitter-pascal).

Supported file extensions: .pas, .pp, .dpr, .dpk, .inc

Extracted nodes:
- File node (the .pas file itself)
- unit / program / library declarations
- class, interface, and helper type declarations
- procedure and function implementations

Extracted edges:
- file --contains--> module
- module --imports--> dependency (via uses clause, resolved to path-based IDs)
- class --inherits--> base class / interface
- class/module --contains/method--> procedure or function
- procedure --calls--> procedure (in-file call resolution)

Key design: uses clause targets are resolved to path-based node IDs by
scanning all Pascal files under the project root (_pascal_project_root +
_pascal_resolve_unit helpers). This avoids dangling import edges that
result from resolving bare unit names like "SysUtils" to IDs that never
match any file node.

Bare procedure calls (e.g. `Reset;` without parentheses) are detected
by inspecting statement nodes whose sole named child is an identifier,
in addition to the standard exprCall nodes used for calls with arguments.

Requires: pip install tree-sitter-pascal
(https://github.com/Isopod/tree-sitter-pascal)
If not installed, extract_pascal returns {"nodes":[], "edges":[], "error": ...}
so the rest of the pipeline is unaffected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds two new extractors for Lazarus IDE-specific file formats:

extract_lazarus_form() — .lfm (Lazarus Form files)
  .lfm files are text-based UI component trees. The extractor parses
  `object Name: TClassName ... end` blocks to build a containment graph
  of form components, and captures `OnXxx = HandlerName` event bindings
  as `references` edges (context: "event") linking each component to
  its handler procedure.

extract_lazarus_package() — .lpk (Lazarus Package files)
  .lpk files are XML package definitions. The extractor reads the
  package name, required package dependencies (→ imports edges), and
  listed unit files (→ contains edges). Unit names are resolved to
  path-based node IDs via _pascal_resolve_unit so they connect to the
  same nodes produced by extract_pascal on .pas files.

Both extensions added to CODE_EXTENSIONS in detect.py and to _DISPATCH.
13 new tests in test_pascal.py cover both extractors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…classes

The declType/typeref handler built inherits edge targets with
_make_id(_read(child)) — just the bare class name. But class nodes
use _make_id(stem, type_name), so targets never matched, making the
entire class hierarchy invisible in the graph.

Add _pascal_class_stem_cache and _pascal_resolve_class(): strips the
conventional T/I prefix, locates the defining file by stem lookup
(same cache mechanism as _pascal_resolve_unit), and returns the
correct _make_id(file_stem, class_name) ID. RTL/unresolvable bases
(e.g. TObject) fall back to _make_id(bare_name) with an explicit
stub node, following the same pattern as the Python extractor.

Also remove the `break` that stopped after the first typeref, so
all parents are captured (e.g. class(TBase, IInterface)).

Extend test_pascal_no_dangling_edges to also assert that within-file
edge targets (contains, method, inherits, calls) resolve to real nodes.

Adds extract_delphi_form() for Delphi Form files (.dfm), which use the
same `object Name: TClassName ... end` text syntax as Lazarus .lfm files.

Binary .dfm files (FF 0A magic header) are skipped gracefully with an
informative error message so the pipeline is unaffected.  Text .dfm files
are parsed identically to .lfm: component containment (`contains` edges)
and event handler references (`references`, context "event").

Adds .dfm to _DISPATCH and CODE_EXTENSIONS.
10 new tests in test_pascal.py, including a regression test for the
binary-format detection.
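
A minimal sketch of the binary check described above, assuming the header bytes FF 0A stated in this commit message. The names `is_binary_dfm` and `extract_form_sketch` and the error text are hypothetical.

```python
BINARY_DFM_MAGIC = b"\xff\x0a"  # header bytes as stated in the commit message


def is_binary_dfm(data: bytes) -> bool:
    """True if the file content starts with the binary .dfm header."""
    return data.startswith(BINARY_DFM_MAGIC)


def extract_form_sketch(data: bytes) -> dict:
    """Skip binary forms with an error entry; otherwise decode as text."""
    if is_binary_dfm(data):
        return {"nodes": [], "edges": [],
                "error": "binary .dfm detected; only text .dfm is parsed"}
    text = data.decode("utf-8", errors="replace")
    # Text forms would be parsed like .lfm; this sketch only counts lines.
    return {"nodes": [], "edges": [], "line_count": len(text.splitlines())}
```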

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Address review feedback from @safishamsi:
- Add .lpr (Lazarus program file, identical syntax to .dpr) to _DISPATCH
  in extract.py and CODE_EXTENSIONS in detect.py so Lazarus project entry
  points are indexed. Completes the promised Lazarus IDE support.
- Replace rstrip("()") with removesuffix("()") in the call-resolution
  dict comprehension for precise suffix removal (rstrip strips individual
  characters, not the literal string "()").
- Add .lpr assertions to test_pascal_dispatch_registered and
  test_pascal_detect_extensions_registered.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@simeonbodurov simeonbodurov force-pushed the feat/pascal-delphi-support branch from 7b7aa10 to a3f1e53 May 9, 2026 12:12
@simeonbodurov simeonbodurov requested a review from safishamsi May 9, 2026 13:21
@safishamsi safishamsi merged commit 32bf8b4 into Graphify-Labs:v7 May 9, 2026
@safishamsi

Collaborator

Merged - thank you for your first OSS contribution! Pascal/Delphi/Lazarus support is now in v7. Great to hear graphify is useful for navigating your Delphi project.
