Port upstream cel2sql options + bug fixes (v3.7.1 backport)#9
Port upstream cel2sql options + bug fixes (v3.7.1 backport)#9richardwooding merged 2 commits intomainfrom
Conversation
New ConvertOptions (mirrors upstream commit 92314f9): - withJsonVariables(String...) — flat JSONB columns emit ->> operators - withColumnAliases(Map<String,String>) — CEL identifier → SQL column rename (alias values validated against the dialect's identifier rules) - withParamStartIndex(int) — shift the placeholder counter so a generated CEL fragment can be embedded in a larger pre-parameterized query Security: - maxByteArrayLength = 10 000 cap on inline byte literals (CWE-400). Parameterized mode bypasses the check — bytes go straight to JDBC. CEL string function: - format() with %s/%d/%f support, max 1000-char format string. Postgres + BigQuery use FORMAT(); SQLite + DuckDB use printf(); MySQL throws explicitly (no portable printf-style equivalent). Bug fix: - BigQuery writeArrayLength / writeJSONArrayLength now wrap in COALESCE(..., 0) for NULL-array correctness, matching the other 4 dialects (mirrors upstream 1689adc). Docs: - README: new options doc, supported-functions table, resource-limits table - CLAUDE.md: "Differences from upstream cel2sql (Go)" subsection covering the items deliberately not ported (numeric-cast heuristic, sentinel errors, JDBC schema loaders) and format()'s dialect-specific support. 391 unit tests pass (370 existing + 21 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR backports upstream cel2sql v3.7.1 features and fixes into cel2sql4j, expanding conversion configurability (new ConvertOptions), adding support for CEL format(), tightening resource limits for inline bytes, and aligning BigQuery array-length null semantics with other dialects.
Changes:
- Add new
ConvertOptionsknobs: JSON-variable handling, column aliasing, and parameter index offsetting. - Implement CEL string
format()translation across dialects (or explicit failure for MySQL) plus related unit tests. - Fix BigQuery
ARRAY_LENGTH/JSON_QUERY_ARRAYlength behavior for NULL arrays viaCOALESCE(..., 0).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/test/java/com/spandigital/cel2sql/Cel2SqlOptionsTest.java | Adds coverage for new options, byte cap behavior, and format() translation/validation. |
| src/test/java/com/spandigital/cel2sql/Cel2SqlArrayTest.java | Updates expected BigQuery SQL to include COALESCE for null-safe length. |
| src/main/java/com/spandigital/cel2sql/dialect/sqlite/SqliteDialect.java | Implements writeFormat() via printf(). |
| src/main/java/com/spandigital/cel2sql/dialect/postgres/PostgresDialect.java | Implements writeFormat() via FORMAT() with %d/%f coercion to %s. |
| src/main/java/com/spandigital/cel2sql/dialect/mysql/MySqlDialect.java | Explicitly rejects format() as unsupported in MySQL. |
| src/main/java/com/spandigital/cel2sql/dialect/duckdb/DuckDbDialect.java | Implements writeFormat() via printf(). |
| src/main/java/com/spandigital/cel2sql/dialect/bigquery/BigQueryDialect.java | Adds COALESCE wrapping for array lengths and implements writeFormat() via FORMAT(). |
| src/main/java/com/spandigital/cel2sql/dialect/Dialect.java | Extends the Dialect interface with writeFormat(). |
| src/main/java/com/spandigital/cel2sql/Converter.java | Adds column aliasing, JSON-variable root detection, byte literal cap, param start index, and format() lowering. |
| src/main/java/com/spandigital/cel2sql/ConvertOptions.java | Adds new option storage/builders for JSON variables, column aliases, and parameter start index. |
| README.md | Documents new options, format() support, and resource limits. |
| CLAUDE.md | Documents deliberate divergences from upstream and the new options at a high level. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (call.args().isEmpty()) { | ||
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | ||
| "format() requires an argument list"); | ||
| } | ||
| argsExpr = call.args().get(0); | ||
| } else if (call.args().size() >= 2) { | ||
| formatExpr = call.args().get(0); | ||
| argsExpr = call.args().get(1); | ||
| } else { | ||
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | ||
| "format() requires a format string and arguments"); |
There was a problem hiding this comment.
callFormat accepts calls with extra arguments (e.g. non-member form with call.args().size() > 2, or member form with more than one arg) and silently ignores the extras by only reading args().get(0/1). This should be rejected with an invalid-arguments error to avoid surprising behavior and make placeholder-count validation meaningful.
| if (call.args().isEmpty()) { | |
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | |
| "format() requires an argument list"); | |
| } | |
| argsExpr = call.args().get(0); | |
| } else if (call.args().size() >= 2) { | |
| formatExpr = call.args().get(0); | |
| argsExpr = call.args().get(1); | |
| } else { | |
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | |
| "format() requires a format string and arguments"); | |
| if (call.args().size() != 1) { | |
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | |
| "format() requires exactly one argument list"); | |
| } | |
| argsExpr = call.args().get(0); | |
| } else if (call.args().size() == 2) { | |
| formatExpr = call.args().get(0); | |
| argsExpr = call.args().get(1); | |
| } else { | |
| throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS, | |
| "format() requires a format string and exactly one arguments list"); |
There was a problem hiding this comment.
Good catch — fixed in 76ba485. Member form now requires exactly one args list; free form requires exactly two args. Anything else is rejected with INVALID_ARGUMENTS.
| static final String FORMAT = "format"; | ||
|
|
||
| // Format strings are bounded to keep generated SQL small. Mirrors upstream cel2sql. | ||
| static final int MAX_FORMAT_STRING_LENGTH = 1000; | ||
| private static final Pattern FORMAT_SPECIFIER = Pattern.compile("%([sdf])"); |
There was a problem hiding this comment.
FORMAT_SPECIFIER is defined but never used. Either remove it (and any now-unneeded regex imports) or use it to implement specifier validation/counting to avoid dead code.
There was a problem hiding this comment.
Removed in 76ba485 — specifier validation is done inline in countAndValidateSpecifiers(), so the regex field was dead code.
| * | ||
| * @param w the output buffer | ||
| * @param formatSpec dialect-native format string (already validated and quoted as a literal) |
There was a problem hiding this comment.
The writeFormat Javadoc says formatSpec is "already ... quoted as a literal", but all current dialect implementations call writeStringLiteral(w, formatSpec) (so they expect the raw, unquoted string). Please fix the Javadoc (or change the contract/implementations) so callers and implementors don’t get misled.
| * | |
| * @param w the output buffer | |
| * @param formatSpec dialect-native format string (already validated and quoted as a literal) | |
| * Implementations should render {@code formatSpec} as a string literal using the dialect's | |
| * normal literal-escaping rules. | |
| * | |
| * @param w the output buffer | |
| * @param formatSpec dialect-native format string value, unquoted and already validated |
There was a problem hiding this comment.
Fixed the Javadoc in 76ba485 to say formatSpec is unquoted and that implementations should render it via writeStringLiteral (matching what the existing dialect impls already do).
| .withDialect(new MySqlDialect()) // SQL dialect (default: PostgreSQL) | ||
| .withSchemas(schemas) // Schema map for JSON field detection | ||
| .withJsonVariables("context", "tags") // Flat JSONB columns (use ->> instead of .) | ||
| .withColumnAliases(Map.of("name", "usr_name")) // CEL identifier → SQL column rename | ||
| .withParamStartIndex(5) // Embed in larger query: starts at $5 | ||
| .withMaxDepth(100) // Max AST recursion depth (default: 100) | ||
| .withMaxOutputLength(50000) // Max SQL output length (default: 50,000) | ||
| .withLogger(myLogger)); // SLF4J logger for debugging |
There was a problem hiding this comment.
This example configures new MySqlDialect() but the inline comments describe PostgreSQL-specific behavior (->> JSONB operator and $5-style placeholders). Either switch the example to new PostgresDialect() or adjust the comments/output to match MySQL (? placeholders and different JSON syntax).
There was a problem hiding this comment.
Switched the example to new PostgresDialect() in 76ba485 so the inline comments (->> JSONB operator, $5 placeholders) match the configured dialect.
- callFormat: reject calls with extra args. Member form requires exactly one args list; free form requires exactly two args. Previously the extras were silently ignored. - Drop unused FORMAT_SPECIFIER pattern field — specifier validation is done inline in countAndValidateSpecifiers(). - Fix Dialect.writeFormat Javadoc: formatSpec is unquoted (impls call writeStringLiteral); previous wording said "already quoted as a literal". - README: switch the configuration-options example to PostgresDialect so the inline comments (->> JSONB operator, $5 placeholders) match the configured dialect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#11) Three new project-local skills under .claude/skills/, each authored against the existing skill-authoring conventions and passing the lint script: - add-cel-function: walks adding a new CEL function to all 6 dialects (constant + dispatch + Dialect interface method + 6 impls + tests). Includes references/sql-writer-pattern.md and references/conversion-exception.md. - add-sql-dialect: walks adding a new SQL dialect (e.g. CockroachDB, Snowflake). Includes a scaffold_dialect.sh script that copies an existing dialect package and renames classes via sed, plus references/ dialect-method-checklist.md and references/test-files.md. - port-from-upstream: walks porting from the Go upstream cel2sql repo. Includes a list_upstream_changes.sh script that auto-detects the last port commit on cel2sql4j and lists candidate upstream commits, plus references/go-to-java-idioms.md and references/two-pr-shape.md. Also includes the previously-uncommitted skill-authoring skill that was staged on main (the foundation these new skills are authored against). While running the build-verification step on this PR, discovered that SparkDialect (PR #10) was merged on top of PR #9 without rebasing and silently lost the writeFormat method that PR #9 added to the Dialect interface — main currently does not compile. Fixed in this PR by adding SparkDialect.writeFormat using Spark's format_string() function (its printf-equivalent), and added a Spark case to Cel2SqlOptionsTest's formatTests. All 392 unit tests pass; ./gradlew build clean. Co-authored-by: Richard Wooding <richardwooding@Richards-Virtual-Machine.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Brings cel2sql4j up to feature parity with upstream
cel2sqlv3.7.1 for everything except the Spark dialect (which lands in a follow-up PR).New ConvertOptions (upstream 92314f9)
withJsonVariables(String...)— flat JSONB columns emit->>operatorswithColumnAliases(Map)— CEL identifier → SQL column rename, with alias values validated against the dialect's identifier ruleswithParamStartIndex(int)— shift placeholder counter so a CEL fragment can be embedded in a larger pre-parameterized query (clamped to ≥1)Security
MAX_BYTE_ARRAY_LENGTH = 10 000cap on inline byte literals (CWE-400 — hex expansion). Parameterized mode bypasses the check since bytes go straight to JDBC.CEL
format()string functionFORMAT()printf()%s,%d,%fonly; max 1000-char format string; arg count must match placeholder countBug fix
writeArrayLength/writeJSONArrayLengthwrap inCOALESCE(..., 0)for NULL-array correctness, matching the other 4 dialects (upstream 1689adc).Docs
Verified
Cel2SqlOptionsTest)./gradlew buildcleanOut of scope
getDayOfWeekmodulo fix — already correct in cel2sql4j (verifiedCel2SqlTimestampTest:130)EXTRACT(... AT TIME ZONE ...)— already correct in all dialectsARRAY_LENGTHCOALESCE wrap — already done in PG/MySQL/SQLite/DuckDB; only BigQuery was missing (fixed here)Test plan
./gradlew test— all 391 unit tests pass./gradlew buildclean🤖 Generated with Claude Code