feat: Hive parser hardening (STRUCT/DIV/OFFSET) + Matrix CTE collapse #1
Merged
The Tables sub-mode of MatrixView previously surfaced WITH-clause CTE aliases as first-class rows/columns alongside physical tables, diluting the cross-script blueprint signal (typical scripts have 30%+ CTE noise).

Add a Layers toggle (default ON) that:

- Filters CTE rows/columns out of the rendered matrix.
- Reconstructs physical->physical dependencies that were previously reachable only through CTE chains, using a forward BFS over `write` cells with the visited-CTE path captured in `viaCtes`.
- Renders rebuilt edges with a dashed half-opacity arrow and a tooltip showing the CTE hop chain, distinct from direct edges.
- Hides itself in Scripts sub-mode (no overlap with CTE keys).

Worker payload gains `cteItemKeys: string[]`; collapse + metric recomputation happen on the main thread for instant toggle response.

Refactor: extract `MatrixMetrics` and `computeMatrixMetrics` into matrixUtils as the single source of truth, eliminating a 49-line duplicate that previously lived in both the worker and MatrixView.

Docs: clarify in CLAUDE.md and ui-change-protocol.md that the Cursor browser MCP (Playwright-based) reliably operates Radix DropdownMenu buttons, while the external agent-browser CLI is still unreliable.

Tests: 5 new cases cover single-hop, multi-hop, direct-edge-priority, empty CTE set, and chain-isolation scenarios. 201/201 passing.

Co-authored-by: Cursor <cursoragent@cursor.com>
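The forward BFS described in this commit can be sketched roughly as below. This is an illustration in Rust, not the PR's code: the real collapse runs in the React package, and the `RebuiltEdge` type, the `collapse_ctes` function, and the adjacency-map shape of `writes` (each node mapped to the nodes it feeds) are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet, HashSet, VecDeque};

/// A physical -> physical dependency recovered through one or more CTEs.
/// Hypothetical type; `via_ctes` mirrors the idea of `viaCtes` in the commit.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RebuiltEdge {
    pub from: String,
    pub to: String,
    /// CTE aliases traversed, in hop order.
    pub via_ctes: Vec<String>,
}

/// `writes` maps each node to the nodes it feeds directly (assumed shape).
/// `ctes` is the set of node keys that are CTE aliases.
pub fn collapse_ctes(
    writes: &BTreeMap<String, BTreeSet<String>>,
    ctes: &BTreeSet<String>,
) -> Vec<RebuiltEdge> {
    let mut rebuilt = Vec::new();
    for (src, direct) in writes {
        // Only physical tables start a search; CTE rows are filtered out.
        if ctes.contains(src) {
            continue;
        }
        // Each source gets its own visited set, so chains stay isolated.
        let mut seen: HashSet<&str> = HashSet::new();
        let mut found: BTreeSet<&str> = BTreeSet::new();
        let mut queue: VecDeque<(&str, Vec<String>)> = VecDeque::new();
        for next in direct {
            if ctes.contains(next) && seen.insert(next.as_str()) {
                queue.push_back((next.as_str(), vec![next.clone()]));
            }
        }
        while let Some((cte, path)) = queue.pop_front() {
            let Some(targets) = writes.get(cte) else { continue };
            for t in targets {
                if ctes.contains(t) {
                    // Another CTE hop: extend the recorded path.
                    if seen.insert(t.as_str()) {
                        let mut longer = path.clone();
                        longer.push(t.clone());
                        queue.push_back((t.as_str(), longer));
                    }
                } else if !direct.contains(t) && found.insert(t.as_str()) {
                    // Physical target reached via CTEs; a direct edge takes priority.
                    rebuilt.push(RebuiltEdge {
                        from: src.clone(),
                        to: t.clone(),
                        via_ctes: path.clone(),
                    });
                }
            }
        }
    }
    rebuilt
}
```

Because the BFS marks each CTE as visited the first time it is reached, the recorded hop chain is the first (shortest) path found from the source; other paths through the same CTEs are not reported in this sketch.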
…support

Real-world Hive / Spark SQL uses `struct(field1 AS name1, ...)` (e.g. `collect_list(struct(...))`) but sqlparser-rs 0.61's `HiveDialect` inherits `supports_struct_literal()` = false from the base `Dialect` trait, so `STRUCT(a AS w)` fails to parse with "Expected: ), found: AS". Upstream BigQuery / Databricks / Generic dialects all override this to `true`; HiveDialect simply forgot to.

Fix it by introducing a thin wrapper dialect (`crates/flowscope-core/src/dialect_ext/`) that composes the upstream `HiveDialect` and re-enables this feature, and wire it through `Dialect::to_sqlparser_dialect()`.

Affects auditId=2479, 2482, 2571 in the conan SQL corpus (3 of 6 real parse failures resolved by this single change).

Co-authored-by: Cursor <cursoragent@cursor.com>
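The wrapper approach amounts to delegating every trait method to the upstream dialect and overriding a single flag. The sketch below shows that pattern with a deliberately simplified stand-in trait: `MiniDialect`, `UpstreamHive`, and `WrappedHive` are hypothetical names, and the trait is not sqlparser-rs's actual `Dialect` (which has many more methods).

```rust
/// Simplified stand-in for a parser dialect trait.
/// NOT sqlparser-rs's real `Dialect`; it only models the shape of the fix.
pub trait MiniDialect {
    fn name(&self) -> &'static str;
    fn is_identifier_start(&self, ch: char) -> bool;
    /// Feature flag with a conservative default, as in the commit message.
    fn supports_struct_literal(&self) -> bool {
        false
    }
}

/// Plays the role of the upstream Hive dialect: it never overrides the flag.
pub struct UpstreamHive;

impl MiniDialect for UpstreamHive {
    fn name(&self) -> &'static str {
        "hive"
    }
    fn is_identifier_start(&self, ch: char) -> bool {
        // Toy rule for the example, not Hive's real lexer rule.
        ch.is_ascii_alphabetic() || ch == '_'
    }
}

/// Thin wrapper: composes the upstream dialect and flips one flag.
pub struct WrappedHive {
    pub inner: UpstreamHive,
}

impl MiniDialect for WrappedHive {
    // Everything else is forwarded unchanged to the wrapped dialect.
    fn name(&self) -> &'static str {
        self.inner.name()
    }
    fn is_identifier_start(&self, ch: char) -> bool {
        self.inner.is_identifier_start(ch)
    }
    // The single behavioural difference.
    fn supports_struct_literal(&self) -> bool {
        true
    }
}
```

Composition keeps the change local: if the upstream dialect later enables the flag itself, the override becomes redundant and the wrapper can shrink without touching call sites.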
Hive defines `a DIV b` as BIGINT integer division (see <https://cwiki.apache.org/confluence/display/hive/languagemanual+udf>), but sqlparser-rs 0.61 only wires up `DIV` parsing inside MySqlDialect via `Dialect::parse_infix`. HiveDialect doesn't override `parse_infix`, so `SELECT x DIV 1000 FROM t` fails with a generic parse error.

Mirror the MySQL implementation in FlowscopeHiveDialect: parse `DIV` as an infix operator and lower it to the same `BinaryOperator::MyIntegerDivide` node MySQL uses, keeping downstream analyzer code dialect-agnostic.

Affects auditId=2568 in the conan SQL corpus.

Co-authored-by: Cursor <cursoragent@cursor.com>
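To illustrate what "parse `DIV` as an infix operator and lower it to an existing node" means, here is a toy parser over pre-split tokens. It is a sketch under stated assumptions: `Expr`, `BinOp`, `hive_infix_op`, and `parse_expr` are hypothetical, there is no operator precedence, and none of this reflects sqlparser-rs's real `parse_infix` signature or AST.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Ident(String),
    Number(i64),
    Binary { left: Box<Expr>, op: BinOp, right: Box<Expr> },
}

/// Operators the (toy) downstream analyzer already understands.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BinOp {
    Plus,
    IntegerDivide,
}

/// Hypothetical dialect hook: map a keyword to an existing operator node,
/// so no dialect-specific AST variant is needed downstream.
pub fn hive_infix_op(word: &str) -> Option<BinOp> {
    if word.eq_ignore_ascii_case("DIV") {
        Some(BinOp::IntegerDivide)
    } else {
        None
    }
}

fn parse_operand(tok: &str) -> Expr {
    match tok.parse::<i64>() {
        Ok(n) => Expr::Number(n),
        Err(_) => Expr::Ident(tok.to_string()),
    }
}

/// Parses a flat, left-associative `a OP b OP c` token list.
pub fn parse_expr(tokens: &[&str]) -> Result<Expr, String> {
    let mut iter = tokens.iter();
    let first = iter.next().ok_or("empty expression")?;
    let mut left = parse_operand(first);
    while let Some(tok) = iter.next() {
        let op = match *tok {
            "+" => BinOp::Plus,
            // Unknown words are offered to the dialect hook before failing.
            word => hive_infix_op(word)
                .ok_or_else(|| format!("unexpected token: {word}"))?,
        };
        let rhs = iter.next().ok_or("missing right operand")?;
        left = Expr::Binary {
            left: Box::new(left),
            op,
            right: Box::new(parse_operand(rhs)),
        };
    }
    Ok(left)
}
```

With the hook in place, `parse_expr(&["x", "DIV", "1000"])` yields a `Binary` node carrying `BinOp::IntegerDivide`, while a word the hook does not recognise still produces an error.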
Hive's SELECT grammar has no OFFSET clause (only LIMIT, see <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select>), so writing `CROSS JOIN (SELECT ...) offset` to name a derived table `offset` is perfectly legal Hive syntax. sqlparser-rs, however, keeps `OFFSET` in its global `RESERVED_FOR_TABLE_ALIAS` list and HiveDialect inherits that default, so the construct fails with "Expected: ), found: <next token>" at the join site.

Override `Dialect::is_table_alias` in FlowscopeHiveDialect to remove only `OFFSET` from the reserved set; other reserved keywords (SELECT, FROM, WHERE, ...) are still rejected, so disambiguation of the surrounding grammar is preserved. A negative test (`SELECT * FROM t CROSS JOIN (...) select`) guards against accidental loosening.

Affects auditId=2482 in the conan SQL corpus.

Co-authored-by: Cursor <cursoragent@cursor.com>
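The alias rule can be pictured as "default reserved list, minus one keyword". The sketch below is a minimal model of that idea: the `RESERVED_FOR_TABLE_ALIAS` constant here is a short illustrative subset rather than sqlparser-rs's actual list, and both function names are hypothetical.

```rust
/// Illustrative subset only; the real upstream list is longer.
const RESERVED_FOR_TABLE_ALIAS: &[&str] =
    &["SELECT", "FROM", "WHERE", "GROUP", "ORDER", "LIMIT", "OFFSET"];

/// Default rule: a word may serve as a table alias unless it is reserved.
pub fn default_is_table_alias(word: &str) -> bool {
    !RESERVED_FOR_TABLE_ALIAS
        .iter()
        .any(|kw| kw.eq_ignore_ascii_case(word))
}

/// Hive-flavoured rule: accept `OFFSET` explicitly, defer everything else
/// to the default so the other reserved keywords stay rejected.
pub fn hive_is_table_alias(word: &str) -> bool {
    word.eq_ignore_ascii_case("OFFSET") || default_is_table_alias(word)
}
```

Delegating to the default rule for every other word is what keeps the negative case (`... ) select`) failing, in the same spirit as the guard test mentioned above.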
Summary
Bundles two independent workstreams that were developed on the same branch:
1. Hive SQL parser hardening (3 commits)
Fixes 5 of 6 real parse failures in the conan SQL corpus (auditId 2479, 2482, 2497*, 2568, 2571). Each root cause gets its own focused commit; one shared `FlowscopeHiveDialect` wrapper is introduced under `crates/flowscope-core/src/dialect_ext/`.

| Root cause | Fix |
|---|---|
| `HiveDialect` inherits `supports_struct_literal() = false`, so `STRUCT(a AS w, ...)` (Hive/Spark named-field syntax) fails with "Expected: ), found: AS" | `FlowscopeHiveDialect` wraps upstream `HiveDialect` and enables struct literals; wired through `Dialect::to_sqlparser_dialect()` |
| `a DIV b` (Hive integer division) is only wired into MySqlDialect's `parse_infix` upstream | Mirror the MySQL implementation in `FlowscopeHiveDialect`, lower to existing `BinaryOperator::MyIntegerDivide` so analyzer code stays dialect-agnostic |
| sqlparser-rs keeps `OFFSET` in global `RESERVED_FOR_TABLE_ALIAS`, but Hive grammar has no OFFSET clause, so `CROSS JOIN (...) offset` is legal alias usage | Override `Dialect::is_table_alias` in `FlowscopeHiveDialect` to remove only `OFFSET` from the reserved set; negative test guards against accidental loosening |

2. Matrix CTE collapse (1 commit, 2ec9324)
Tables sub-mode previously rendered WITH-clause CTE aliases as first-class rows/columns alongside physical tables, diluting the cross-script blueprint signal (~30%+ noise in typical scripts).
- `Layers` toggle (default ON) hides CTE rows/columns.
- Forward BFS over `write` cells reconstructs physical→physical dependencies that were previously reachable only through CTE chains; rebuilt edges render with dashed half-opacity arrows + tooltip showing the CTE hop chain (visually distinct from direct edges).
- Worker payload gains `cteItemKeys: string[]`; collapse + metric recomputation happen on main thread for instant toggle response.
- `MatrixMetrics` + `computeMatrixMetrics` extracted to `matrixUtils` as single source of truth, eliminating a 49-line duplicate that lived in both worker and MatrixView.
- `CLAUDE.md` and `ui-change-protocol.md` updated to clarify Cursor browser MCP (Playwright) reliably operates Radix DropdownMenu, while external `agent-browser` CLI is still unreliable.

Test plan
- `cargo test --workspace` passes (Hive parser changes)
- `yarn workspace @pondpilot/flowscope-react test --run`: 201/201 passing, includes 5 new `collapseCteFromMatrix` cases (single-hop, multi-hop, direct-edge-priority, empty CTE set, chain-isolation)
- `yarn workspace @pondpilot/flowscope-react lint && typecheck` clean
- `risk_level=medium`, only `MatrixView → UseLineageStore` and `MatrixView → GetWorker` processes affected, both expected

Made with Cursor