Flink FULL_DATA_TYPE parser#3305
Conversation
- Create simplified FlinkType model for UI purposes - Replace complex hierarchy with simple, focused interfaces - Add FlinkTypeKind enum (SCALAR, ROW, MAP, ARRAY, MULTISET) - Use CompoundFlinkType for ROW and MAP types with members array - Implement recursive descent parser with peek()/consume() operations - Parse all Flink type variants: scalars, parameterized, arrays, multisets - Support ROW types with backtick-quoted field names and optional comments - Support MAP types with exactly 2 members (key, value) - Minimal regex use - character-by-character parsing for clarity - Proper handling of nullability markers (NOT NULL, NULL)
Store the full element type structure in a `members` array instead of flattening the information. This allows full type hierarchy preservation for nested containers (ARRAY<ROW<>>, MULTISET<ARRAY<>>, etc). Changes: - Move `members?: FlinkType[]` from CompoundFlinkType to base FlinkType - Remove deprecated `areMembersNullable` field - ARRAY/MULTISET now return CompoundFlinkType with single-element members - Update type model docstrings to clarify ARRAY/MULTISET participation - Enhance tests to verify leaf nodes are scalar (NOT compound) and that nested container types can be cleanly discriminated Test coverage: - ARRAY<ROW<>>, ARRAY<MAP<>>, ARRAY<MULTISET<>> - MULTISET<ARRAY<>>, MULTISET<ROW<>> - ARRAY<ARRAY<>> (doubly nested containers) - All 62 parser tests passing Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
… unterminated comments Addresses Copilot PR review comments: - Add validation that closing backtick exists when parsing backtick-quoted ROW field names, throws clear error instead of silently consuming wrong character - Add validation in parseComment() to throw error if EOF reached without closing quote, preventing truncated comments These changes improve error reporting for malformed input and prevent confusing downstream errors from propagating. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Addresses two medium priority issues: 1. Fix O(n²) performance in consumeWhile() by using slice() instead of string concatenation in loop. Previously, concatenating characters one at a time caused quadratic time complexity for large inputs. 2. Add negative offset bounds checking in peekAt(). The bounds check now properly handles negative offsets by checking `idx < 0` in addition to the upper bound check. Also adds comprehensive test suite (17 tests) covering: - Negative offset edge cases (before input start) - consumeWhile() correctness and performance (including 10000-char stress test) - Position advancement after consuming - Multiple sequential consumeWhile() calls - Edge cases (empty input, trimming behavior) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Addresses low priority feedback from code review: 1. Fix docstring typo in flinkTypes.ts: 'TIME STAMP WITH TIME ZONE' → 'TIMESTAMP WITH TIME ZONE' for consistency with Flink SQL standards 2. Clean up tryConsume() lint rule disabling by replacing unused-var disabling for-of loop with simpler for-loop over keyword.length 3. Remove reference to non-existent REAL_WORLD_FULL_DATA_TYPE_EXAMPLES.md file in test describe block name (tests remain valid) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…e test coverage - Updated JSDoc comment to accurately describe what the function validates (checks for non-empty members array, validates SCALAR+members is invalid) - Added 7 new test cases covering all compound type kinds (ARRAY, ROW, MAP, MULTISET) - Added tests for scalar types without members and edge cases (empty members array) - Added test for nested compound type structures (ARRAY<ROW<>>) - All 84 flinkTypeParser tests passing, 3055 total unit tests passing This addresses Copilot comment ID 2861056430 by ensuring the documentation accurately reflects the implementation's actual behavior. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Fixes TypeScript compilation error TS2304: Cannot find name 'FlinkType' in src/parsers/flinkTypeParser.test.ts due to using FlinkType in type annotations for the new test cases. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…test -t Add support for running multiple test suites in a single pass by: - Detecting pipe-separated patterns (e.g., "pattern1|pattern2") and converting to regex OR syntax - Detecting regex syntax (brackets, parentheses, wildcards, etc.) - Using Mocha's grep() method for regex patterns (via GREP env var) - Preserving FGREP behavior for simple substring matching (backward compatible) Allows flexible test filtering: npx gulp test -t "parser|builder" # Multiple test suites npx gulp test -t "flinkType.*test$" # Regex patterns npx gulp test -t "async|integration" # Various patterns Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This pull request implements a recursive descent parser for Flink SQL data types, creating the foundation for displaying arbitrarily nested type structures in VS Code tree views. The parser handles all Flink type variants including scalars, parameterized types, arrays, multisets, ROW types with field names and comments, and MAP types.
Changes:
- Adds FlinkType model interfaces and enum for representing Flink SQL types in a simplified, UI-friendly structure
- Implements ParserState utility class providing character-level parsing primitives (peek/consume operations) with configurable delimiters
- Creates FlinkTypeParser using recursive descent parsing to handle all Flink type variants with proper nullability and comment support
- Enhances test infrastructure to support both exact string matching (FGREP) and regex patterns (GREP) for running specific tests
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/models/flinkTypes.ts | Defines FlinkType and CompoundFlinkType interfaces with type guard validation for representing Flink SQL types |
| src/parsers/parserState.ts | Implements reusable ParserState class with character-level parsing operations, delimiter matching, and identifier/keyword parsing |
| src/parsers/parserState.test.ts | Comprehensive unit tests for ParserState covering edge cases, error handling, and custom pattern support |
| src/parsers/flinkTypeParser.ts | Implements FlinkTypeParser using recursive descent to parse all Flink type variants with nullability and comment handling |
| src/parsers/flinkTypeParser.test.ts | Extensive test suite covering scalar types, nested structures, real-world examples, error cases, and degenerate inputs |
| src/testing.ts | Adds GREP support for regex-based test filtering alongside existing FGREP for exact string matching |
| Gulpfile.js | Implements regex pattern detection and conversion for test filtering with pipe-separated patterns |
- Fix GREP type safety in testing.ts: convert env var to RegExp with error handling - Update comment wording for FGREP to clarify fixed-string/substring behavior - Remove duplicate old comment in test file (line 489) - Update stale Playlist test comment that claimed parser limitations no longer exist - All test comments now accurately reflect current parser capabilities All 129 parser tests passing. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
- Remove unnecessary escape characters in regex pattern (Gulpfile.js:740) - Remove unused catch parameter (src/testing.ts:41) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Dave Shoup (shouples)
left a comment
There was a problem hiding this comment.
Good stuff! I only had a couple of minor questions, but shouldn't block merging - looking forward to seeing this in action in the Flink Database view 👏
testFilter is never reassigned after initialization, so it should be declared as const rather than let. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
54dba7b
Changes: - Created BaseFlinkType interface with common fields (kind, dataType, isFieldNullable, fieldName, comment) - ScalarFlinkType extends BaseFlinkType with kind=SCALAR, no members field - CompoundFlinkType extends BaseFlinkType with kind=ROW|ARRAY|MULTISET|MAP and required members - FlinkType is now a union type: ScalarFlinkType | CompoundFlinkType - Updated isCompoundFlinkType() type guard to simple kind check (no validation throws) Benefits: ✅ TypeScript prevents accessing .members on scalar types (compile-time safety) ✅ Non-optional members field guarantees non-empty for compounds (via construction) ✅ Type structure self-documents invariants ✅ DRY: common fields defined once in BaseFlinkType ✅ Zero runtime overhead ✅ Extractable for use in other TypeScript codebases Updated tests: - Removed tests that expected isCompoundFlinkType() to throw on invalid states - Added tests verifying type guard correctly discriminates all type kinds All 92 parser tests passing, TypeScript strict mode satisfied. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
|




Summary of Changes
Implement a traditional LL(1) recursive descent parser for Flink
FULL_DATA_TYPEstrings:The parser is split into two layers, one general probably reusable recursive descent (peek(), consume(), ...) generic layer, and then a higher layer
FlinkTypeParserwhich uses the lower layer to parse its particular grammar. The entirety of the parser is hidden away by a simple exported functionparseFlinkType(full_data_type_string)which will return a single or a tree ofFlinkType.Implements a type-safe discriminated union:
FlinkTypewith two sub-types:ScalarFlinkType(no members) andCompoundFlinkType(required members), discriminated by eitherisCompoundFlinkType()type guard or just by checking if.kind === FlinkTypeKind.SCALARas the output nodes from the parser. A toplevel node with or without interior members will be returned.This is first of a two part effort. This creates the recursive descent parser core and a Flink
FULL_DATA_TYPE-centric parser using it that can then parse Flink FULL_DATA_TYPE strings into object/object trees.A followup branch will use those object/trees to make arbitrarily deep nested treeview items for the Flink table/view columns.
No UI / features changes present in this branch.
But while here, touched up Gulpfile.js to allow for regexes like "gulp test -t flinkTypeParser|ParserState' to be passed to
gulp testandtestRun.Be not dismayed, there's only ~500 LOC of new models + parser code. And then so many tests and corner case tests. But since the strings we'll be parsing should all be well formed since coming from the Flink INFORMATION_SCHEMA, the defensiveness of the parsing and tests is ... Very Thorough Coding.
Pull request checklist
Please check if your PR fulfills the following (if applicable):
Tests
Release notes