**File:** `.claude/skills/add-cel-feature/SKILL.md`
---
name: add-cel-feature
description: Adds new CEL operator, function, macro, or comprehension support to cel2sql across all SQL dialects, touching cel2sql.go visit*, the dialect.Dialect interface, every dialect/<name>/dialect.go, and testcases/<category>_tests.go. Use when wiring a new CEL function (e.g., string.toLowerCase, timestamp.toISOString), a new comprehension form, a new operator overload, or extending the converter to recognise a CEL extension package.
---

# Add CEL Feature

Adding a new piece of CEL surface area (function, operator, macro, comprehension) is the second-most-common contribution shape after adding a dialect (5+ feature PRs in recent history: string-extension functions, comprehensions, multi-dimensional arrays, regex, get/setDayOfWeek). The work has the same shape every time; this skill captures it.

## Quick start

1. **Classify the feature** — function call, operator, macro (like `has()`), or comprehension form (like `all`/`exists`/`filter`/`map`).
2. **Identify the converter file** that owns the surface area — see [references/converter-file-map.md](references/converter-file-map.md).
3. **Decide whether a new `dialect.Dialect` method is needed** — see "New method or inline?" below.
4. **If a new method is needed**, walk [references/dialect-method-checklist.md](references/dialect-method-checklist.md) — the new method must land in `dialect/dialect.go` plus all six dialect packages.
5. **Add a test case** to the appropriate `testcases/<category>_tests.go` with a `WantSQL` entry for every dialect (use the gen-script from `add-sql-dialect` if outputs diverge significantly).
6. **Run** `python .claude/skills/add-sql-dialect/scripts/check_testcase_coverage.py <each-dialect>` to confirm coverage is still complete.

## New method or inline?

The `dialect.Dialect` interface is already large (~47 methods). Add a new method only when dialects genuinely diverge; otherwise inline the SQL in the visitor.

| Situation | Verdict |
|---|---|
| The SQL is identical across all six dialects | Inline in the visitor (e.g., `con.str.WriteString("MOD(")` for `%`). |
| Any dialect needs a different syntax | New `dialect.Dialect` method. |
| Some dialects can't express the feature at all | New method + each unsupported dialect returns `dialect.ErrUnsupportedFeature`. |
| The feature is JSON-related, regex-related, or array-related | Look for an existing `Write…` method first — JSON/array/regex method coverage is broad. |

If you add a method, the compile-time `var _ dialect.Dialect = (*Dialect)(nil)` assertion at the top of every dialect file will catch missing implementations. The build won't pass until all six dialects are updated.
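The assertion pattern can be sketched in miniature (hypothetical, trimmed names — the real interface lives in `dialect/dialect.go` and has ~47 methods):

```go
package main

import "fmt"

// Hypothetical, trimmed stand-in for the real dialect.Dialect interface;
// one method is enough to show the pattern.
type Dialect interface {
	QuoteIdentifier(name string) string
}

type MySQL struct{}

func (MySQL) QuoteIdentifier(name string) string { return "`" + name + "`" }

// Compile-time assertion: the build fails on this line if MySQL is
// missing any method of the interface.
var _ Dialect = MySQL{}

func main() {
	fmt.Println(MySQL{}.QuoteIdentifier("user"))
}
```

Because the assertion is a package-level declaration, `go build ./...` reports every missing method without any test needing to run.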

## Where features live

For example: adding `string.toLowerCase()`. CEL parses this as a method call on a string value. The visitor entry is `visitCall` in `cel2sql.go`. CEL's standard string functions are dispatched by overload ID — see how `startsWith`/`endsWith`/`contains` are wired and follow the same pattern. The output SQL (`LOWER(expr)`) is identical across all dialects, so this is a candidate for inline conversion (no new `Dialect` method).
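An inline branch of that kind can be sketched as follows (hypothetical and simplified: `con.str` stands in for the converter's output builder, and the method name is illustrative, not the real visitor's):

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified converter: the visitor emits LOWER(...)
// directly because the SQL is identical in every dialect, so no
// Dialect method is needed.
type converter struct {
	str strings.Builder
}

func (con *converter) visitToLowerCall(writeTarget func() error) error {
	con.str.WriteString("LOWER(")
	if err := writeTarget(); err != nil {
		return err
	}
	con.str.WriteString(")")
	return nil
}

func main() {
	con := &converter{}
	_ = con.visitToLowerCall(func() error {
		con.str.WriteString("name")
		return nil
	})
	fmt.Println(con.str.String()) // prints LOWER(name)
}
```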

For another example: adding `timestamp.toUnixSeconds()`. The output diverges (`EXTRACT(EPOCH FROM e)::bigint` on PostgreSQL, `UNIX_TIMESTAMP(e)` on MySQL, etc.), so add a new `Dialect.WriteEpochExtract(w, writeExpr)` method. `cel2sql.go` stays generic; the per-dialect SQL goes in `dialect/<name>/dialect.go`.

For the file-by-file map, see [references/converter-file-map.md](references/converter-file-map.md).

## CEL environment registration

For a new function, the CEL environment must declare it before the converter can compile expressions that use it. Check if it's already registered in:

- `testutil/env.go` `NewDefaultEnv` / `NewTimestampEnv` — for the test environments.
- The user's CEL environment (out of scope for this skill).

If you register a function only in the test environment, mention it in the user-facing documentation so consumers know to register it in their own CEL environment.

## Testcase coverage

Add a representative test to `testcases/<category>_tests.go` (or the appropriate category file). The test must have a `WantSQL[dialect.<Name>]` entry for every dialect, or a `SkipDialect[dialect.<Name>]` reason for any that legitimately can't express the feature.

If the SQL output diverges significantly across dialects, use the `gen_expected_sql.go.tmpl` template from the `add-sql-dialect` skill to generate the actual outputs in bulk:

**Run** `cp .claude/skills/add-sql-dialect/templates/gen_expected_sql.go.tmpl testutil/gen_<dialect>_test.go` for each dialect, transcribe outputs, delete the generators.

After the test case is in place, **run** the coverage check for every dialect:

```sh
for d in postgresql mysql sqlite duckdb bigquery spark; do
python .claude/skills/add-sql-dialect/scripts/check_testcase_coverage.py "$d"
done
```

A new test case missing a single dialect's `WantSQL` entry silently skips for that dialect — an easy regression to ship without this check.

## Resources

- [references/converter-file-map.md](references/converter-file-map.md) — which converter file owns which CEL surface area.
- [references/dialect-method-checklist.md](references/dialect-method-checklist.md) — checklist when adding a method to the `Dialect` interface.
- **Run** `python .claude/skills/add-sql-dialect/scripts/check_testcase_coverage.py <dialect>` — verifies test coverage; reuse the script across this skill and `add-sql-dialect`.
**File:** `.claude/skills/add-cel-feature/references/converter-file-map.md`
# Converter File Map

Where each CEL surface area lives in the converter. Use this to find the visitor that needs updating when adding a feature.

## Contents

- Top-level dispatch
- Operators and special-cased operators
- Comprehensions
- JSON / JSONB
- Timestamps and durations
- Index analysis
- Where it doesn't go

## Top-level dispatch

| File | Owns |
|---|---|
| `cel2sql.go` | The `converter` struct, `Convert` / `ConvertParameterized` entry points, the `visit` dispatch (`visitCall`, `visitSelect`, `visitIdent`, `visitList`, `visitConst`, `visitStruct`, `visitComprehension`), parameter-placeholder writing, identifier/struct-field validation, recursion-depth and output-length guards. |

`visitCall` is the dispatch hub for nearly every CEL function and operator. New function support starts here, almost always by adding a new branch to the overload-ID switch (search for `overloads.Add`, `overloads.Contains`, etc., to see existing wiring).

## Operators and special-cased operators

| File | Owns |
|---|---|
| `operators.go` | `standardSQLBinaryOperators` map (CEL operator → SQL token), reverse lookup, helper predicates like `isStringLiteral`, `isNullLiteral`, `isBoolLiteral`, `isListType`, `isNumericComparison`, `isFieldAccessExpression`. |
| `cel2sql.go` `visitCallBinary` | Binary operator emission with special cases (`||` for string/list concat, `IS`/`IS NOT` for null/bool comparisons, JSON-text numeric coercion via `WriteCastToNumeric`, IN-list dispatch to `WriteArrayMembership` or `WriteJSONArrayMembership`). |

Adding a new operator overload (e.g., timestamp + duration) usually requires updates here.

## Comprehensions

| File | Owns |
|---|---|
| `comprehensions.go` | The `ComprehensionInfo` struct, `identifyComprehension`, comprehension-shape predicates. |
| `cel2sql.go` `visitComprehension`, `visitAllComprehension`, `visitExistsComprehension`, `visitExistsOneComprehension`, `visitFilterComprehension`, `visitMapComprehension`, `writeComprehensionSource` | Per-shape SQL emission. |

The comprehension source is dispatched to either `dialect.WriteUnnest` (native arrays) or `dialect.WriteJSONArrayElements` (JSON arrays).

## JSON / JSONB

| File | Owns |
|---|---|
| `json.go` | JSON-path detection (`shouldUseJSONPath`, `hasJSONFieldInChain`), JSON-variable handling (`isJSONVariable` for `WithJSONVariables`), `has()` macro emission, JSON path construction (`buildJSONPath`, `buildJSONPathInternal`, `buildJSONPathForArray`), JSON field name escaping. |

The detection chain is: schema-driven (via `isFieldJSON`) → JSON-variable (via `WithJSONVariables`) → fallback to chain-walk (`hasJSONFieldInChain`). Adding new JSON behaviour usually plugs into `shouldUseJSONPath` plus a new `dialect.Dialect` method.
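The three-stage fallback can be sketched as follows (hypothetical and simplified: the map arguments stand in for the real schema lookup, `WithJSONVariables` set, and select chain):

```go
package main

import "fmt"

// Hypothetical, simplified version of the detection chain described
// above: schema lookup first, then declared JSON variables, then a
// walk over the field-access chain.
func shouldUseJSONPath(field string, isFieldJSON, jsonVars map[string]bool, chain []string) bool {
	if isFieldJSON[field] { // 1. schema-driven
		return true
	}
	if len(chain) > 0 && jsonVars[chain[0]] { // 2. WithJSONVariables
		return true
	}
	for _, f := range chain { // 3. fallback chain walk
		if isFieldJSON[f] {
			return true
		}
	}
	return false
}

func main() {
	schema := map[string]bool{"metadata": true}
	vars := map[string]bool{"payload": true}
	fmt.Println(shouldUseJSONPath("metadata", schema, vars, nil))                     // true
	fmt.Println(shouldUseJSONPath("name", schema, vars, []string{"payload", "name"})) // true
	fmt.Println(shouldUseJSONPath("age", schema, vars, []string{"user", "age"}))      // false
}
```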

## Timestamps and durations

| File | Owns |
|---|---|
| `timestamps.go` | Custom CEL types for `DATE` / `TIME` / `DATETIME` / `INTERVAL`, duration parsing, EXTRACT helpers, timestamp arithmetic dispatch. |

Adding a new timestamp accessor (e.g., `getQuarter()`) goes here. Day-of-week conversion is handled per-dialect via `Dialect.WriteExtract` (CEL convention: Sunday=0).

## Index analysis

| File | Owns |
|---|---|
| `analysis.go` | `AnalyzeQuery` entry point; pattern detection (`PatternComparison`, `PatternJSONAccess`, `PatternRegexMatch`, `PatternArrayMembership`, `PatternArrayComprehension`, `PatternJSONArrayComprehension`); pattern → DDL delegation to `dialect.IndexAdvisor`. |

Adding a new pattern type means: extend `analysis.go` to detect it, add a `PatternType` constant in `dialect/index_advisor.go`, then implement the new pattern in each dialect's `index_advisor.go`.

## Where it doesn't go

- Per-dialect SQL syntax → `dialect/<name>/dialect.go`. The converter is dialect-agnostic.
- Reserved keywords / identifier rules → `dialect/<name>/validation.go`.
- Regex pattern translation (RE2 → engine-native) → `dialect/<name>/regex.go`.
- Type provider / database-introspection → `<name>/provider.go`.
**File:** `.claude/skills/add-cel-feature/references/dialect-method-checklist.md`
# Dialect Method Checklist

Steps when adding a new method to the `dialect.Dialect` interface. Skip the new-method route entirely if the SQL output is identical across all six dialects (just inline in the converter).

## Contents

- Step-by-step
- Compile-time safety net
- Naming conventions
- Error returns
- Worked example: WriteUnixSeconds

## Step-by-step

1. **Declare the method on the interface** in `dialect/dialect.go`. Group it with related methods (Literals / Operators / Type Casting / Arrays / JSON / Timestamps / String Functions / Comprehensions / Struct / Validation / Regex / Capabilities). Add a one-line doc comment naming the canonical PostgreSQL form ("For PostgreSQL: …") to anchor reviewers.
2. **Implement in all six dialect packages.** The canonical six are:
- `dialect/postgres/dialect.go`
- `dialect/mysql/dialect.go`
- `dialect/sqlite/dialect.go`
- `dialect/duckdb/dialect.go`
- `dialect/bigquery/dialect.go`
- `dialect/spark/dialect.go`
3. **Verify the compile-time assertion fires.** Each dialect file contains `var _ dialect.Dialect = (*Dialect)(nil)` — running `go build ./...` will fail if any dialect is missing the new method.
4. **Wire the method into the converter** in `cel2sql.go` (or whichever visitor file owns the surface area — see the converter-file-map reference). The converter calls `con.dialect.<NewMethod>(...)`.
5. **Add a representative test case** to `testcases/<category>_tests.go`. Set `WantSQL[dialect.<Name>]` for every dialect or document a `SkipDialect[dialect.<Name>]` reason.
6. **Run the testcase coverage check** for every dialect:
```sh
for d in postgresql mysql sqlite duckdb bigquery spark; do
python .claude/skills/add-sql-dialect/scripts/check_testcase_coverage.py "$d"
done
```

## Compile-time safety net

The `var _ dialect.Dialect = (*Dialect)(nil)` line at the top of every dialect file is critical. It's the only mechanism that guarantees all dialects implement every method. If you add a new dialect, add this line first.

## Naming conventions

Methods on the `Dialect` interface that emit SQL fragments are prefixed `Write…`. They take a `*strings.Builder` (the converter's output buffer) plus callbacks `func() error` for any sub-expressions the dialect needs to interleave. They return `error` so dialects that don't support a feature can return `dialect.ErrUnsupportedFeature`.

Capability check methods are `Supports…` and return `bool`.

## Error returns

`dialect.ErrUnsupportedFeature` is the sentinel for "this dialect cannot express this CEL construct." Wrap it with `fmt.Errorf("%w: <reason>", dialect.ErrUnsupportedFeature, ...)` so callers can `errors.Is()` it.

Examples already in the codebase:
- `dialect/sqlite/dialect.go` `WriteRegexMatch` returns `ErrUnsupportedFeature` (SQLite has no native regex).
- `dialect/spark/dialect.go` `WriteArrayLength` returns `ErrUnsupportedFeature` for `dimension > 1` (Spark doesn't support multi-dim arrays).

## Worked example: WriteEpochExtract

Suppose CEL's `int(timestamp)` should emit a per-dialect Unix-seconds expression.

1. **Interface** (`dialect/dialect.go`):
```go
// WriteEpochExtract writes extraction of epoch from a timestamp.
// For PostgreSQL: EXTRACT(EPOCH FROM expr)::bigint.
WriteEpochExtract(w *strings.Builder, writeExpr func() error) error
```
2. **PostgreSQL** (`dialect/postgres/dialect.go`):
```go
func (d *Dialect) WriteEpochExtract(w *strings.Builder, writeExpr func() error) error {
	w.WriteString("EXTRACT(EPOCH FROM ")
	if err := writeExpr(); err != nil {
		return err
	}
	w.WriteString(")::bigint")
	return nil
}
```
3. **MySQL**: `UNIX_TIMESTAMP(<expr>)`.
4. **SQLite**: `CAST(strftime('%s', <expr>) AS INTEGER)`.
5. **DuckDB**: `EXTRACT(EPOCH FROM <expr>)::BIGINT`.
6. **BigQuery**: `UNIX_SECONDS(<expr>)`.
7. **Spark**: `UNIX_TIMESTAMP(<expr>)`.

(All six already implement this method — it's a real example from the codebase.)

8. **Test case** in `testcases/cast_tests.go`:
```go
{
Name: "cast_int_epoch",
CELExpr: `int(created_at)`,
Category: CategoryCast,
WantSQL: map[dialect.Name]string{
dialect.PostgreSQL: "EXTRACT(EPOCH FROM created_at)::bigint",
dialect.MySQL: "UNIX_TIMESTAMP(created_at)",
dialect.SQLite: "CAST(strftime('%s', created_at) AS INTEGER)",
dialect.DuckDB: "EXTRACT(EPOCH FROM created_at)::BIGINT",
dialect.BigQuery: "UNIX_SECONDS(created_at)",
dialect.Spark: "UNIX_TIMESTAMP(created_at)",
},
},
```