What problem does this solve?
codebase-memory-mcp currently has no PL/SQL support. Oracle source files
(.pks/.pkb/.fnc/.trg/...) aren't mapped to any language, so they're skipped
during discovery. If someone renames them to .sql, they fall through to the
generic DerekStride SQL grammar, which targets ANSI/PostgreSQL/MySQL and does
not model PL/SQL's procedural constructs — packages, package bodies, BEGIN/END
blocks, cursors, package-scoped procedures/functions, triggers.
The practical result: for enterprise Oracle codebases — which are often very
large and exactly where a structural knowledge graph pays off — the tool emits
no package/procedure/function nodes and no call edges. Engineers exploring a big
PL/SQL repo get nothing useful out of the graph.
Proposed solution
Add PL/SQL as a first-class structural language (CBM_LANG_PLSQL), following the
existing data-driven pattern — vendored tree-sitter grammar + a lang_specs row +
extension mappings + two small name-resolver special-cases in extract_defs.c —
exactly how C#/PHP/etc. are wired.
Scope (this PR): structural extraction only.
- packages / object types / triggers -> Class nodes
- procedures / functions (in packages and standalone) -> Function nodes
- ref_call -> CALLS edges
- if/case/loops/exception_handler -> branch (complexity)
- assignment_statement -> WRITES, raise_statement -> THROWS
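For illustration, the edge-producing part of the mapping above can be written as a small lookup table. This is only a sketch under assumptions: `role_t`, `ROLE_MAP`, and `role_for_node_type` are hypothetical names, not the project's actual lang_specs layout, and only the node types named in the list above are included.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical graph roles; the real project defines its own edge kinds. */
typedef enum {
    ROLE_NONE = 0,
    ROLE_CALLS,
    ROLE_BRANCH,
    ROLE_WRITES,
    ROLE_THROWS
} role_t;

typedef struct {
    const char *node_type; /* grammar node type, e.g. as returned by ts_node_type() */
    role_t role;           /* what the extractor would emit for it */
} role_map_t;

/* Only node types named in this issue are listed here. */
static const role_map_t ROLE_MAP[] = {
    {"ref_call",             ROLE_CALLS},
    {"exception_handler",    ROLE_BRANCH},
    {"assignment_statement", ROLE_WRITES},
    {"raise_statement",      ROLE_THROWS},
};

/* Return the role for a grammar node type, or ROLE_NONE if it is not mapped. */
role_t role_for_node_type(const char *node_type) {
    for (size_t i = 0; i < sizeof ROLE_MAP / sizeof ROLE_MAP[0]; i++) {
        if (strcmp(node_type, ROLE_MAP[i].node_type) == 0) {
            return ROLE_MAP[i].role;
        }
    }
    return ROLE_NONE;
}
```

In the real extractor the string would come from the tree-sitter node being visited; unmapped node types simply produce no edge.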
Not in scope: Hybrid LSP semantic type resolution. That can be a follow-up; most
supported languages today are structural-only as well.
Grammar: AndreasMaierDe/tree-sitter-plsql (MIT, ABI 14, no external scanner).
Extensions: .pks .pkb .pck .pls .plb .plsql .fnc .trg .bdy .tps .tpb
(.sql stays generic SQL; .prc is left as-is since it already maps to FORM.)
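A minimal sketch of the extension mapping described above, to make the intended behaviour concrete. The names (`lang_t`, `EXT_MAP`, `lang_for_path`) are hypothetical stand-ins for the project's real CBM_LANG_* enum and discovery code, and the case-insensitive match is an assumption on my part, not something the existing code is known to do.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical language tags standing in for the project's CBM_LANG_* values. */
typedef enum { LANG_UNKNOWN = 0, LANG_SQL, LANG_FORM, LANG_PLSQL } lang_t;

typedef struct {
    const char *ext;
    lang_t lang;
} ext_map_t;

static const ext_map_t EXT_MAP[] = {
    /* New PL/SQL extensions proposed in this issue. */
    {".pks", LANG_PLSQL}, {".pkb", LANG_PLSQL}, {".pck", LANG_PLSQL},
    {".pls", LANG_PLSQL}, {".plb", LANG_PLSQL}, {".plsql", LANG_PLSQL},
    {".fnc", LANG_PLSQL}, {".trg", LANG_PLSQL}, {".bdy", LANG_PLSQL},
    {".tps", LANG_PLSQL}, {".tpb", LANG_PLSQL},
    /* Unchanged mappings: .sql stays generic SQL, .prc stays FORM. */
    {".sql", LANG_SQL},
    {".prc", LANG_FORM},
};

/* Case-insensitive equality for two NUL-terminated ASCII strings (assumption). */
static int ext_equal(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map a file path to a language tag based on its final extension. */
lang_t lang_for_path(const char *path) {
    const char *dot = strrchr(path, '.');
    /* No dot, or the last dot belongs to a directory name: no extension. */
    if (dot == NULL || strchr(dot, '/') != NULL) {
        return LANG_UNKNOWN;
    }
    for (size_t i = 0; i < sizeof EXT_MAP / sizeof EXT_MAP[0]; i++) {
        if (ext_equal(dot, EXT_MAP[i].ext)) {
            return EXT_MAP[i].lang;
        }
    }
    return LANG_UNKNOWN;
}
```

With this table, `emp_pkg.pkb` would resolve to PL/SQL while `schema.sql` and `report.prc` keep their current languages.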
Candidate public OSS test beds:
- utPLSQL/utPLSQL (large, well-maintained PL/SQL codebase)
- oracle/db-sample-schemas (Oracle's official sample schemas)
- mortenbra/alexandria-plsql-utils
- OraOpenSource/oos-utils
I already have a working prototype: builds clean, full extraction suite green
(199/199, incl. 2 new PL/SQL tests), and verified end-to-end by indexing a sample
package (emp_pkg/util_pkg -> Class, hire/salary -> Function). Happy to open the PR
once you confirm the approach and grammar choice.
Alternatives considered
- iliasaz/tree-sitter-orasql: also covers Oracle SQL + PL/SQL, but its generated
  parser.c is ~30MB vs ~9MB for AndreasMaierDe, which bloats the binary more.
  Open to switching if you'd prefer its broader coverage.
- Reusing the existing DerekStride SQL grammar (remap PL/SQL extensions to
  CBM_LANG_SQL): rejected, because that grammar doesn't parse PL/SQL procedural
  syntax, so packages/procedures/functions wouldn't be extracted.
- Infra-pass pattern (like Dockerfile/K8s, no grammar): N/A; PL/SQL needs real
  AST parsing, not YAML-reuse heuristics.
Known limitation to flag up front: the chosen grammar is still maturing upstream —
some standalone DDL (e.g. CREATE TYPE ... AS OBJECT) currently produces ERROR
nodes. Package specs/bodies, procedures, functions and triggers parse cleanly.
Confirmations