Check if allowing nullable for all datatypes in blob is too permissive.

### Task Description

**What needs to be done:**
In https://github.com/apache/hudi/pull/18540/changes#diff-3e2a24e519a0cf4b097131aa5ad08a41a9d16bebc85aa39cf168bc344e9a2e0dR285-R297, we made the fields in the blob type nullable to get tests to pass on all Spark engines (3.3, 3.4, 3.5 and 4.0). This may be too permissive.

This workaround was to address: #18547.

Address @rahil-c's comment and ensure that we are not being so permissive that we allow unintended user behaviours.


**Background of the fix**

RFC-100 declares the BLOB Avro record with three strictly non-null fields: `type`, `reference.external_path`, and `reference.managed`.

The contract is actually conditional ("required when its parent is present", data matters only when type='INLINE', reference.* only when type='OUT_OF_LINE'), but Avro can't say that, so it just says "non-null" everywhere.
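To make the conditional contract concrete, here is a minimal sketch in plain Python. It is not Hudi code: `blob_contract_violations` is a hypothetical helper, the dict layout is an assumption, and it assumes that "matters" means "must be non-null". Treating any other `type` value as an error is also an assumption.

```python
from typing import Any, List, Mapping, Optional


def blob_contract_violations(blob: Optional[Mapping[str, Any]]) -> List[str]:
    """Return conditional-contract violations for one blob value (hypothetical helper)."""
    if blob is None:
        # "Required when its parent is present": a null blob has no required children.
        return []
    errors: List[str] = []
    kind = blob.get("type")
    if kind is None:
        errors.append("type is required when the blob is present")
    elif kind == "INLINE":
        # Assumption: INLINE blobs must carry non-null data.
        if blob.get("data") is None:
            errors.append("data is required when type='INLINE'")
    elif kind == "OUT_OF_LINE":
        ref = blob.get("reference")
        if ref is None:
            errors.append("reference is required when type='OUT_OF_LINE'")
        else:
            for name in ("external_path", "managed"):
                if ref.get(name) is None:
                    errors.append(f"reference.{name} is required when type='OUT_OF_LINE'")
    else:
        # Assumption: only the two type values named in this issue are valid.
        errors.append(f"unknown type {kind!r}")
    return errors


# The INSERT example from this issue: inline data with a null reference is valid.
inline_blob = {"type": "INLINE", "data": b"\x01\x02\x03", "reference": None}
print(blob_contract_violations(inline_blob))
```

A rule of this shape depends on the value of a sibling field, which is exactly what a static per-field `nullable` flag cannot express.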

When that strict declaration is reflected back into Spark via `toSqlType`, the Spark catalog type for the payload ends up with non-null `type`, `reference.external_path`, and `reference.managed`.

Then on every write path we hit a chain of analyzer/resolver checks against user-supplied source structs:
1. `validateBlobStructure`: the old version compared dataType and nullable field-by-field. 

     The user's `INSERT INTO ... values (1, named_struct(... 'data', cast(X'010203' as binary), 'reference', cast(null as struct<...>)))` produces a source whose per-field nullability differs from the catalog's strict declaration, causing it to be rejected.

2. `TableOutputResolver` (Spark 3.4+): even if we skip the validator, `resolveOutputColumns` walks nested struct assignments and rejects nullable-source -> non-null-target narrowing. 

    User-supplied `named_struct` fields are nullable by default, so any assignment into the strict BLOB struct fails at analyzer time, before Hudi sees the write.

3. `castIfNeeded` -> `Cast` (used by `UpdateHoodieTableCommand` and `MergeIntoHoodieTableCommand`): `Cast` (a) strips custom `Metadata` (the `hudi_type` tag we use to recognize BLOB), and (b) on some Spark versions performs its own nullability-narrowing check via `Cast.canCast` on nested structs.

So the strict catalog-side BLOB type collides with every Spark write-path rewrite, on every DML verb, on multiple Spark versions, for a contract that was already a partial lie (because the conditional non-null can't be expressed).
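The nullability-narrowing rejection described in the checks above can be modelled in a few lines of plain Python. This is only a sketch of the idea, not Spark's `TableOutputResolver` or `Cast.canCast`: `Field`, `Struct`, and `narrowing_errors` are hypothetical names, and the schema contains only the fields named in this issue, with `data` and `reference` assumed nullable.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Field:
    name: str
    dtype: Union[str, "Struct"]
    nullable: bool = True  # user-supplied named_struct fields are nullable by default


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


def narrowing_errors(source: Struct, target: Struct, path: str = "") -> List[str]:
    """Walk nested structs and report every nullable-source -> non-null-target field."""
    errors: List[str] = []
    targets = {f.name: f for f in target.fields}
    for src in source.fields:
        tgt = targets.get(src.name)
        if tgt is None:
            continue  # missing or extra fields are out of scope for this sketch
        full = f"{path}{src.name}"
        if src.nullable and not tgt.nullable:
            errors.append(f"{full}: nullable source -> non-null target")
        if isinstance(src.dtype, Struct) and isinstance(tgt.dtype, Struct):
            errors.extend(narrowing_errors(src.dtype, tgt.dtype, full + "."))
    return errors


# Strict catalog type: the three fields RFC-100 declares non-null.
CATALOG_STRICT = Struct((
    Field("type", "string", nullable=False),
    Field("data", "binary"),
    Field("reference", Struct((
        Field("external_path", "string", nullable=False),
        Field("managed", "boolean", nullable=False),
    ))),
))

# What a user-supplied named_struct looks like: every field nullable.
USER_SOURCE = Struct((
    Field("type", "string"),
    Field("data", "binary"),
    Field("reference", Struct((
        Field("external_path", "string"),
        Field("managed", "boolean"),
    ))),
))

for problem in narrowing_errors(USER_SOURCE, CATALOG_STRICT):
    print(problem)
```

In this model the user source is rejected on exactly the three strict fields, while comparing it against an all-nullable target yields no errors.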


As of now, the fix in #18540 carries no risk to stored data, because it still strictly adheres to the on-disk RFC-100 contract.

The physical Avro schema is not derived from the Spark type. The write path goes through `HoodieSchema.Blob.createBlob()` (called from `toHoodieTypeNested`), which builds the canonical RFC-100 record fresh from the RFC-100 definitions.

So the data on disk is still `type STRING NOT NULL`, `reference.external_path STRING NOT NULL`, `reference.managed BOOLEAN NOT NULL`.


**TLDR generated by Claude:**
The strict declaration was already conditional, and Spark's type system can't model "conditional non-null." Trying to keep the strict declaration on the Spark side made every write path fight Spark's nullability machinery for a guarantee that wasn't really enforceable there anyway. 

Pushing the non-null enforcement to the BLOB-aware physical writer (`createBlob()`) and presenting Spark with a uniformly-permissive type lets every generic write path (`INSERT`/`UPDATE`/`MERGE` on 3.3 / 3.4 / 3.5 / 4.0) pass through unchanged. 

The `validateBlobStructure` change to ignore nullability (`matchesStructure`) and the per-field `nullable = true` projection are two parts of the same idea: structural shape is the contract on the Spark side; nullability is enforced at the physical-write boundary.
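The two parts of that idea can be illustrated with a small plain-Python model. This is a sketch under assumptions, not the Hudi implementation: schemas are represented as lists of `(name, dtype, nullable)` triples, `relax` and `matches_structure` are hypothetical stand-ins for the `nullable = true` projection and the `matchesStructure` comparison, and only the fields named in this issue are included.

```python
from typing import List, Tuple, Union

# A schema is a list of (name, dtype, nullable) triples;
# dtype is either a type name or a nested schema (a list).
Schema = List[Tuple[str, Union[str, list], bool]]


def relax(schema: Schema) -> Schema:
    """Project every field, at every depth, to nullable=True."""
    return [(name, relax(dtype) if isinstance(dtype, list) else dtype, True)
            for name, dtype, _ in schema]


def matches_structure(a: Schema, b: Schema) -> bool:
    """Compare field names, order, and data types; ignore nullability."""
    if len(a) != len(b):
        return False
    for (a_name, a_type, _), (b_name, b_type, _) in zip(a, b):
        if a_name != b_name:
            return False
        if isinstance(a_type, list) and isinstance(b_type, list):
            if not matches_structure(a_type, b_type):
                return False
        elif a_type != b_type:
            return False
    return True


# Strict declaration (assumption: data and reference themselves are nullable).
STRICT: Schema = [
    ("type", "string", False),
    ("data", "binary", True),
    ("reference", [
        ("external_path", "string", False),
        ("managed", "boolean", False),
    ], True),
]

PERMISSIVE = relax(STRICT)

# Same shape, different nullability: the structural check accepts it.
print(matches_structure(PERMISSIVE, STRICT))
# A genuinely different shape is still rejected.
print(matches_structure([("type", "int", True)], STRICT))
```

The open question in this task is whether ignoring nullability at this layer lets through any input that the physical writer would not subsequently reject.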

**Why this task is needed:**


### Task Type

Code improvement/refactoring

### Related Issues

**Parent feature issue:** (if applicable )
**Related issues:**
NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.

