read_blob() in filter predicate fails with INTERNAL_ERROR instead of clean analyzer rejection

**What happened:**
Using `read_blob()` inside a WHERE predicate fails with:

```
[INTERNAL_ERROR] Cannot generate code for expression: read_blob(...)
```

Example query:
```sql
SELECT id FROM t WHERE length(read_blob(image_bytes)) = 11;
```

`read_blob()` works correctly in the SELECT list — only filter predicates trigger the codegen failure.
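
For contrast, a minimal sketch of the projection form that is reported to work, assuming the same table `t` and BLOB column `image_bytes` as in the failing query (the alias `image` is only illustrative):

```sql
-- read_blob() in the SELECT list: reported above to work correctly.
SELECT id, read_blob(image_bytes) AS image
FROM t;

-- read_blob() inside the WHERE predicate: this is the form that fails with INTERNAL_ERROR.
SELECT id
FROM t
WHERE length(read_blob(image_bytes)) = 11;
```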

**What you expected:**
Two things:
1. The codegen restriction should surface as an analyzer-level rejection with a clear "read_blob() is not supported in filter predicates" message, not an INTERNAL_ERROR with a Spark codegen stack trace.
2. The docs (AI quick start) should call out the recommended workaround. For length-based filtering, filter on the BLOB struct's `.length` subfield from the meta columns (e.g. `WHERE image_bytes.length = 11`) rather than wrapping `read_blob()` in `length(...)`. Typical usage is vector search or filtering on structured columns; pulling raw bytes through codegen in a predicate is not a supported path.
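
The workaround in item 2 can be sketched as a full query. This assumes the same table `t` and column `image_bytes` as the example above, and that the `.length` subfield is exposed as described; it has not been verified here beyond what this report states:

```sql
-- Filter on the BLOB struct's length subfield instead of reading the raw bytes.
-- No read_blob() call appears in the predicate, so the codegen path that fails is avoided.
SELECT id
FROM t
WHERE image_bytes.length = 11;
```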

**Steps to reproduce:**
1. Use the Hudi 1.2.0-rc2 Spark bundle.
2. Create a table with a BLOB column `image_bytes` and insert rows.
3. Run: `SELECT id FROM t WHERE length(read_blob(image_bytes)) = 11`.
4. Observe the `INTERNAL_ERROR` shown above.

**Environment:**
- Hudi version: 1.2.0-rc2
- Query engine: Spark 3.5
- Found during: 1.2.0-rc2



read_blob() in filter predicate fails with INTERNAL_ERROR instead of clean analyzer rejection #18820
