Port upstream cel2sql options + bug fixes (v3.7.1 backport)#9

Merged
richardwooding merged 2 commits into main from feat/port-upstream-options-and-fixes
Apr 27, 2026

Conversation

@richardwooding
Contributor

Summary

Brings cel2sql4j up to feature parity with upstream cel2sql v3.7.1 for everything except the Spark dialect (which lands in a follow-up PR).

New ConvertOptions (upstream 92314f9)

  • withJsonVariables(String...) — flat JSONB columns emit ->> operators
  • withColumnAliases(Map) — CEL identifier → SQL column rename, with alias values validated against the dialect's identifier rules
  • withParamStartIndex(int) — shift placeholder counter so a CEL fragment can be embedded in a larger pre-parameterized query (clamped to ≥1)
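
To make the withParamStartIndex behaviour concrete, here is a minimal sketch of a placeholder counter that starts at a caller-supplied index and clamps it to at least 1. The class `PlaceholderCounter` and its `bind` method are hypothetical names used only for illustration; they are not taken from cel2sql4j, and PostgreSQL-style `$n` placeholders are assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the cel2sql4j implementation.
final class PlaceholderCounter {
    private int next;
    private final List<Object> params = new ArrayList<>();

    PlaceholderCounter(int startIndex) {
        // Clamp to >= 1, as described for withParamStartIndex.
        this.next = Math.max(1, startIndex);
    }

    // Records the value and returns a PostgreSQL-style placeholder such as "$5".
    String bind(Object value) {
        params.add(value);
        return "$" + (next++);
    }

    List<Object> params() {
        return params;
    }

    public static void main(String[] args) {
        // The enclosing query is assumed to already use $1..$4.
        PlaceholderCounter c = new PlaceholderCounter(5);
        String fragment = "usr_name = " + c.bind("alice") + " AND age > " + c.bind(30);
        System.out.println(fragment); // usr_name = $5 AND age > $6
    }
}
```

Starting the counter at 5 lets the generated fragment be appended to a query whose first four placeholders are already taken, without renumbering either side.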

Security

  • MAX_BYTE_ARRAY_LENGTH = 10 000 cap on inline byte literals (CWE-400 — hex expansion). Parameterized mode bypasses the check since bytes go straight to JDBC.
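
The cap matters because each inline byte expands to two hex characters in the generated SQL. The following sketch shows one way such a check could look; `ByteLiteralCap` and `inlineHexLiteral` are hypothetical names, the PostgreSQL-style `'\x...'` literal is an assumption, and the real converter may raise a different exception type.

```java
// Hypothetical illustration only; not the cel2sql4j implementation.
final class ByteLiteralCap {
    static final int MAX_BYTE_ARRAY_LENGTH = 10_000;

    // Renders bytes as an inline hex literal (PostgreSQL-style '\x..' assumed).
    static String inlineHexLiteral(byte[] bytes) {
        if (bytes.length > MAX_BYTE_ARRAY_LENGTH) {
            throw new IllegalArgumentException(
                    "byte literal exceeds " + MAX_BYTE_ARRAY_LENGTH + " bytes");
        }
        // Two hex characters per byte, plus the quoting overhead.
        StringBuilder sb = new StringBuilder(bytes.length * 2 + 4);
        sb.append("'\\x");
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        System.out.println(inlineHexLiteral(new byte[] {0x01, (byte) 0xAB})); // '\x01ab'
    }
}
```

In parameterized mode no hex expansion happens, which is why the PR description says the check is bypassed there.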

CEL format() string function

  • Postgres + BigQuery → FORMAT()
  • SQLite + DuckDB → printf()
  • MySQL → throws explicitly (no portable printf-style equivalent)
  • %s, %d, %f only; max 1000-char format string; arg count must match placeholder count
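
The validation rules in the last bullet can be sketched as a single pass over the format string. This is an illustration under stated assumptions: `FormatSpecCheck` and `countAndValidate` are hypothetical names, `IllegalArgumentException` stands in for the project's own exception type, and treating `%%` as a literal percent sign is an assumption that the PR does not confirm.

```java
// Hypothetical illustration only; not the cel2sql4j implementation.
final class FormatSpecCheck {
    static final int MAX_FORMAT_STRING_LENGTH = 1000;

    // Returns the number of %s/%d/%f placeholders; rejects anything else.
    static int countAndValidate(String format, int argCount) {
        if (format.length() > MAX_FORMAT_STRING_LENGTH) {
            throw new IllegalArgumentException("format string too long");
        }
        int count = 0;
        for (int i = 0; i < format.length(); i++) {
            if (format.charAt(i) != '%') {
                continue;
            }
            if (i + 1 >= format.length()) {
                throw new IllegalArgumentException("dangling '%' at end of format string");
            }
            char spec = format.charAt(++i);
            switch (spec) {
                case 's':
                case 'd':
                case 'f':
                    count++;
                    break;
                case '%':
                    break; // assumption: "%%" is a literal percent sign
                default:
                    throw new IllegalArgumentException("unsupported specifier %" + spec);
            }
        }
        if (count != argCount) {
            throw new IllegalArgumentException(
                    "expected " + count + " arguments but got " + argCount);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAndValidate("%s is %d years old", 2)); // 2
    }
}
```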

Bug fix

  • BigQuery writeArrayLength / writeJSONArrayLength wrap in COALESCE(..., 0) for NULL-array correctness, matching the other 4 dialects (upstream 1689adc).
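
In BigQuery, ARRAY_LENGTH returns NULL for a NULL array, so a comparison such as `size(tags) == 0` would not match rows where the column is NULL. A rough sketch of the wrapping follows; the helper names are hypothetical and the exact SQL that the dialect emits may differ.

```java
// Hypothetical illustration only; the real dialect writes into its own output buffer.
final class ArrayLengthSql {
    // Wraps ARRAY_LENGTH so a NULL array yields 0 instead of NULL.
    static String nullSafeArrayLength(String arrayExpr) {
        return "COALESCE(ARRAY_LENGTH(" + arrayExpr + "), 0)";
    }

    // Same idea for a JSON array extracted with JSON_QUERY_ARRAY.
    static String nullSafeJsonArrayLength(String jsonExpr) {
        return "COALESCE(ARRAY_LENGTH(JSON_QUERY_ARRAY(" + jsonExpr + ")), 0)";
    }

    public static void main(String[] args) {
        System.out.println(nullSafeArrayLength("tags"));
        System.out.println(nullSafeJsonArrayLength("payload.items"));
    }
}
```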

Docs

  • README: new options doc, supported-CEL-functions table, resource-limits table
  • CLAUDE.md: "Differences from upstream cel2sql (Go)" subsection documenting items deliberately not ported (numeric-cast heuristic, sentinel-error split, JDBC schema loaders)

Verified

  • All 391 unit tests pass (370 existing + 21 new in Cel2SqlOptionsTest)
  • ./gradlew build clean

Out of scope

  • Apache Spark dialect (~700-1000 LOC) — follow-up PR
  • getDayOfWeek modulo fix — already correct in cel2sql4j (verified Cel2SqlTimestampTest:130)
  • EXTRACT(... AT TIME ZONE ...) — already correct in all dialects
  • ARRAY_LENGTH COALESCE wrap — already done in PG/MySQL/SQLite/DuckDB; only BigQuery was missing (fixed here)

Test plan

  • ./gradlew test — all 391 unit tests pass
  • ./gradlew build clean
  • CI green on this branch
  • Integration tests pass on CI

🤖 Generated with Claude Code

New ConvertOptions (mirrors upstream commit 92314f9):
- withJsonVariables(String...) — flat JSONB columns emit ->> operators
- withColumnAliases(Map<String,String>) — CEL identifier → SQL column rename
  (alias values validated against the dialect's identifier rules)
- withParamStartIndex(int) — shift the placeholder counter so a generated
  CEL fragment can be embedded in a larger pre-parameterized query

Security:
- maxByteArrayLength = 10 000 cap on inline byte literals (CWE-400).
  Parameterized mode bypasses the check — bytes go straight to JDBC.

CEL string function:
- format() with %s/%d/%f support, max 1000-char format string.
  Postgres + BigQuery use FORMAT(); SQLite + DuckDB use printf();
  MySQL throws explicitly (no portable printf-style equivalent).

Bug fix:
- BigQuery writeArrayLength / writeJSONArrayLength now wrap in
  COALESCE(..., 0) for NULL-array correctness, matching the other 4
  dialects (mirrors upstream 1689adc).

Docs:
- README: new options doc, supported-functions table, resource-limits table
- CLAUDE.md: "Differences from upstream cel2sql (Go)" subsection covering
  the items deliberately not ported (numeric-cast heuristic, sentinel
  errors, JDBC schema loaders) and format()'s dialect-specific support.

391 unit tests pass (370 existing + 21 new).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR backports upstream cel2sql v3.7.1 features and fixes into cel2sql4j, expanding conversion configurability (new ConvertOptions), adding support for CEL format(), tightening resource limits for inline bytes, and aligning BigQuery array-length null semantics with other dialects.

Changes:

  • Add new ConvertOptions knobs: JSON-variable handling, column aliasing, and parameter index offsetting.
  • Implement CEL string format() translation across dialects (or explicit failure for MySQL) plus related unit tests.
  • Fix BigQuery ARRAY_LENGTH / JSON_QUERY_ARRAY length behavior for NULL arrays via COALESCE(..., 0).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/test/java/com/spandigital/cel2sql/Cel2SqlOptionsTest.java | Adds coverage for new options, byte cap behavior, and format() translation/validation. |
| src/test/java/com/spandigital/cel2sql/Cel2SqlArrayTest.java | Updates expected BigQuery SQL to include COALESCE for null-safe length. |
| src/main/java/com/spandigital/cel2sql/dialect/sqlite/SqliteDialect.java | Implements writeFormat() via printf(). |
| src/main/java/com/spandigital/cel2sql/dialect/postgres/PostgresDialect.java | Implements writeFormat() via FORMAT() with %d/%f coercion to %s. |
| src/main/java/com/spandigital/cel2sql/dialect/mysql/MySqlDialect.java | Explicitly rejects format() as unsupported in MySQL. |
| src/main/java/com/spandigital/cel2sql/dialect/duckdb/DuckDbDialect.java | Implements writeFormat() via printf(). |
| src/main/java/com/spandigital/cel2sql/dialect/bigquery/BigQueryDialect.java | Adds COALESCE wrapping for array lengths and implements writeFormat() via FORMAT(). |
| src/main/java/com/spandigital/cel2sql/dialect/Dialect.java | Extends the Dialect interface with writeFormat(). |
| src/main/java/com/spandigital/cel2sql/Converter.java | Adds column aliasing, JSON-variable root detection, byte literal cap, param start index, and format() lowering. |
| src/main/java/com/spandigital/cel2sql/ConvertOptions.java | Adds new option storage/builders for JSON variables, column aliases, and parameter start index. |
| README.md | Documents new options, format() support, and resource limits. |
| CLAUDE.md | Documents deliberate divergences from upstream and the new options at a high level. |

Comment on lines +1697 to +1707
    if (call.args().isEmpty()) {
        throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
                "format() requires an argument list");
    }
    argsExpr = call.args().get(0);
} else if (call.args().size() >= 2) {
    formatExpr = call.args().get(0);
    argsExpr = call.args().get(1);
} else {
    throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
            "format() requires a format string and arguments");

Copilot AI Apr 27, 2026

callFormat accepts calls with extra arguments (e.g. non-member form with call.args().size() > 2, or member form with more than one arg) and silently ignores the extras by only reading args().get(0/1). This should be rejected with an invalid-arguments error to avoid surprising behavior and make placeholder-count validation meaningful.

Suggested change
-     if (call.args().isEmpty()) {
-         throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
-                 "format() requires an argument list");
-     }
-     argsExpr = call.args().get(0);
- } else if (call.args().size() >= 2) {
-     formatExpr = call.args().get(0);
-     argsExpr = call.args().get(1);
- } else {
-     throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
-             "format() requires a format string and arguments");
+     if (call.args().size() != 1) {
+         throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
+                 "format() requires exactly one argument list");
+     }
+     argsExpr = call.args().get(0);
+ } else if (call.args().size() == 2) {
+     formatExpr = call.args().get(0);
+     argsExpr = call.args().get(1);
+ } else {
+     throw new ConversionException(ErrorMessages.INVALID_ARGUMENTS,
+             "format() requires a format string and exactly one arguments list");

Contributor Author

Good catch — fixed in 76ba485. Member form now requires exactly one args list; free form requires exactly two args. Anything else is rejected with INVALID_ARGUMENTS.

Comment on lines +126 to +130
static final String FORMAT = "format";

// Format strings are bounded to keep generated SQL small. Mirrors upstream cel2sql.
static final int MAX_FORMAT_STRING_LENGTH = 1000;
private static final Pattern FORMAT_SPECIFIER = Pattern.compile("%([sdf])");

Copilot AI Apr 27, 2026

FORMAT_SPECIFIER is defined but never used. Either remove it (and any now-unneeded regex imports) or use it to implement specifier validation/counting to avoid dead code.

Contributor Author

Removed in 76ba485 — specifier validation is done inline in countAndValidateSpecifiers(), so the regex field was dead code.

Comment on lines +137 to +139
*
* @param w the output buffer
* @param formatSpec dialect-native format string (already validated and quoted as a literal)

Copilot AI Apr 27, 2026

The writeFormat Javadoc says formatSpec is "already ... quoted as a literal", but all current dialect implementations call writeStringLiteral(w, formatSpec) (so they expect the raw, unquoted string). Please fix the Javadoc (or change the contract/implementations) so callers and implementors don’t get misled.

Suggested change
-  *
-  * @param w the output buffer
-  * @param formatSpec dialect-native format string (already validated and quoted as a literal)
+  * Implementations should render {@code formatSpec} as a string literal using the dialect's
+  * normal literal-escaping rules.
+  *
+  * @param w the output buffer
+  * @param formatSpec dialect-native format string value, unquoted and already validated

Contributor Author

Fixed the Javadoc in 76ba485 to say formatSpec is unquoted and that implementations should render it via writeStringLiteral (matching what the existing dialect impls already do).
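
The contract described here (an unquoted formatSpec that the dialect itself renders as a literal) can be sketched as follows. This is not the project's code: `FormatWriterSketch` is a hypothetical class, a `StringBuilder` stands in for the real output buffer, quote-doubling is assumed as the escaping rule, and the %d/%f to %s coercion follows the PostgresDialect description in the file summary. The simple `replace` calls also assume the format string contains no `%%` sequences.

```java
import java.util.List;

// Hypothetical illustration only; not the cel2sql4j implementation.
final class FormatWriterSketch {
    // Quotes a raw string as a SQL literal by doubling single quotes (assumed rule).
    static void writeStringLiteral(StringBuilder w, String raw) {
        w.append('\'').append(raw.replace("'", "''")).append('\'');
    }

    // formatSpec arrives unquoted; this method is responsible for quoting it.
    static void writeFormat(StringBuilder w, String formatSpec, List<String> argSql) {
        // PostgreSQL's FORMAT() has no %d or %f, so both are coerced to %s.
        // Simplification: assumes formatSpec contains no "%%".
        String pgSpec = formatSpec.replace("%d", "%s").replace("%f", "%s");
        w.append("FORMAT(");
        writeStringLiteral(w, pgSpec);
        for (String arg : argSql) {
            w.append(", ").append(arg);
        }
        w.append(')');
    }

    public static void main(String[] args) {
        StringBuilder w = new StringBuilder();
        writeFormat(w, "it's %d years and %f kg", List.of("age", "weight"));
        System.out.println(w); // FORMAT('it''s %s years and %s kg', age, weight)
    }
}
```

If the caller had pre-quoted the string, the quotes would be escaped a second time, which is the confusion the corrected Javadoc is meant to prevent.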

Comment thread README.md Outdated
Comment on lines +165 to +172
.withDialect(new MySqlDialect()) // SQL dialect (default: PostgreSQL)
.withSchemas(schemas) // Schema map for JSON field detection
.withJsonVariables("context", "tags") // Flat JSONB columns (use ->> instead of .)
.withColumnAliases(Map.of("name", "usr_name")) // CEL identifier → SQL column rename
.withParamStartIndex(5) // Embed in larger query: starts at $5
.withMaxDepth(100) // Max AST recursion depth (default: 100)
.withMaxOutputLength(50000) // Max SQL output length (default: 50,000)
.withLogger(myLogger)); // SLF4J logger for debugging

Copilot AI Apr 27, 2026

This example configures new MySqlDialect() but the inline comments describe PostgreSQL-specific behavior (->> JSONB operator and $5-style placeholders). Either switch the example to new PostgresDialect() or adjust the comments/output to match MySQL (? placeholders and different JSON syntax).

Contributor Author

Switched the example to new PostgresDialect() in 76ba485 so the inline comments (->> JSONB operator, $5 placeholders) match the configured dialect.

- callFormat: reject calls with extra args. Member form requires exactly
  one args list; free form requires exactly two args. Previously the
  extras were silently ignored.
- Drop unused FORMAT_SPECIFIER pattern field — specifier validation is
  done inline in countAndValidateSpecifiers().
- Fix Dialect.writeFormat Javadoc: formatSpec is unquoted (impls call
  writeStringLiteral); previous wording said "already quoted as a literal".
- README: switch the configuration-options example to PostgresDialect so
  the inline comments (->> JSONB operator, $5 placeholders) match the
  configured dialect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@richardwooding richardwooding merged commit a5782a4 into main Apr 27, 2026
4 checks passed
richardwooding added a commit that referenced this pull request Apr 27, 2026
…#11)

Three new project-local skills under .claude/skills/, each authored against
the existing skill-authoring conventions and passing the lint script:

- add-cel-function: walks adding a new CEL function to all 6 dialects
  (constant + dispatch + Dialect interface method + 6 impls + tests).
  Includes references/sql-writer-pattern.md and references/conversion-exception.md.
- add-sql-dialect: walks adding a new SQL dialect (e.g. CockroachDB,
  Snowflake). Includes a scaffold_dialect.sh script that copies an
  existing dialect package and renames classes via sed, plus references/
  dialect-method-checklist.md and references/test-files.md.
- port-from-upstream: walks porting from the Go upstream cel2sql repo.
  Includes a list_upstream_changes.sh script that auto-detects the last
  port commit on cel2sql4j and lists candidate upstream commits, plus
  references/go-to-java-idioms.md and references/two-pr-shape.md.

Also includes the previously-uncommitted skill-authoring skill that was
staged on main (the foundation these new skills are authored against).

While running the build-verification step on this PR, discovered that
SparkDialect (PR #10) was merged on top of PR #9 without rebasing and
silently lost the writeFormat method that PR #9 added to the Dialect
interface — main currently does not compile. Fixed in this PR by
adding SparkDialect.writeFormat using Spark's format_string() function
(its printf-equivalent), and added a Spark case to Cel2SqlOptionsTest's
formatTests.
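
Putting the Spark fix together with the dialect mapping from PR #9, the per-dialect choice of formatting function can be summarised in a small sketch. The enum and method names below are hypothetical; the real code dispatches through each dialect's own writeFormat implementation rather than a central switch, and the exception type thrown for MySQL is an assumption.

```java
// Hypothetical illustration only; the real project uses one writeFormat per Dialect class.
final class FormatFunctionNames {
    enum DialectKind { POSTGRES, BIGQUERY, SQLITE, DUCKDB, SPARK, MYSQL }

    // Returns the SQL function used to lower CEL format() for the given dialect.
    static String formatFunction(DialectKind dialect) {
        switch (dialect) {
            case POSTGRES:
            case BIGQUERY:
                return "FORMAT";
            case SQLITE:
            case DUCKDB:
                return "printf";
            case SPARK:
                return "format_string";
            case MYSQL:
                // No portable printf-style equivalent, so the call is rejected.
                throw new UnsupportedOperationException("format() is not supported for MySQL");
            default:
                throw new IllegalStateException("unknown dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatFunction(DialectKind.SPARK)); // format_string
    }
}
```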

All 392 unit tests pass; ./gradlew build clean.

Co-authored-by: Richard Wooding <richardwooding@Richards-Virtual-Machine.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>