[BUG] HoodieFileGroupReaderBasedFileFormat drops mandatory partition columns from dataAvroSchema #18568

### Bug Description

**What happened:**
`HoodieFileGroupReaderBasedFileFormat.buildReaderWithPartitionValues` constructs two Avro schemas side-by-side: `requestedSchema` (what to return to Spark) and `dataSchema` (what to read from parquet). It explicitly augments `requestedSchema` with any partition fields that are in `mandatoryFields`, but pipes the input `dataStructType` through unchanged. Spark's `dataStructType` excludes partition columns by convention, and `HoodieSchemaUtils.pruneDataSchema` only iterates over the fields of its second argument, so any mandatory partition field is silently dropped from the resulting `dataSchema`. The FileGroupReader then does not read the partition column from the parquet base file, and for CUSTOM mergers (e.g. `PostgresDebeziumAvroPayload`) the output converter writes `null` for that column via `HoodieInternalRowUtils.genUnsafeStructWriter`'s `setNullAt` fallback.

**What you expected:**
For untouched records in a file slice with both a base file and a log file, reading should return the correct partition-column values (matching what's physically stored in the base parquet).

**Steps to reproduce:**
All of the following must hold:
1. Table uses `CustomKeyGenerator` or `TimestampBasedKeyGenerator` (for `CustomKeyGenerator`, the partition fields must not be declared as timestamp types)
2. `hoodie.datasource.write.drop.partition.columns=false` (otherwise `mandatoryFields=[]` and the bug is latent)
3. MOR table; the file slice being read has both a base file AND a log file (forces the FileGroupReader path, not `readBaseFile`)
4. Merger is not projection-compatible (e.g. `PostgresDebeziumAvroPayload`)
5. Read the table via the Spark DataSource API

Under these conditions, the partition column reads back as `null` for every untouched row in the base+log file slice.
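
A quick way to check for this symptom is to count rows whose partition column is `null` after reading the table. The helper below is a minimal sketch, not part of Hudi: the object and method names are hypothetical, and it assumes the partition column (`country` in the configs listed under Environment) is never legitimately `null` in the written data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper (not a Hudi API): counts rows whose partition
// column reads back as null.
object PartitionNullCheck {
  def countNullPartitionValues(df: DataFrame, partitionColumn: String): Long =
    df.filter(col(partitionColumn).isNull).count()
}
```

For example, `PartitionNullCheck.countNullPartitionValues(spark.read.format("hudi").load(basePath), "country")`, where `basePath` is a placeholder for the table location. A non-zero count on a table that was only ever written with non-null `country` values would be consistent with this bug.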

### Environment

**Hudi version:** master (reproduced on 1.1.x internal fork)
**Query engine:** Spark 3.5 (DataSource v2 / FileGroupReader path)
**Relevant configs:**
- `hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator`
- `hoodie.datasource.write.partitionpath.field=country:simple`
- `hoodie.datasource.write.drop.partition.columns=false`
- `hoodie.datasource.write.payload.class=org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload`
- `hoodie.table.type=MERGE_ON_READ`
- `hoodie.index.type=GLOBAL_SIMPLE` + `hoodie.global.simple.index.update.partition.path=true`

### Logs and Stack Trace

No exception is thrown; this is silent data corruption. The symptom is that the partition column returns `null` for untouched records whose file slice has a log file.

### Root Cause

`HoodieFileGroupReaderBasedFileFormat.scala` around line 254 (master):

```scala
val requestedStructType = StructType(requiredSchema.fields ++ partitionSchema.fields.filter(f => mandatoryFields.contains(f.name)))
val requestedSchema = HoodieSchemaUtils.pruneDataSchema(schema, HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema(requestedStructType, sanitizedTableName), exclusionFields)
val dataSchema     = HoodieSchemaUtils.pruneDataSchema(schema, HoodieSchemaConversionUtils.convertStructTypeToHoodieSchema(dataStructType,      sanitizedTableName), exclusionFields)
```

Notice `requestedStructType` is augmented with mandatory partition fields, but `dataStructType` is not. `pruneDataSchema` iterates over `dataStructType.fields`, so the mandatory partition field never makes it into `dataSchema`.
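
The asymmetry can be illustrated with a small, dependency-free model. This is a sketch under stated assumptions, not Hudi code: `Field`, `Schema`, and `pruneModel` are hypothetical stand-ins, `pruneModel` only imitates the "iterate over the second argument" behaviour described above, and `exclusionFields` is ignored.

```scala
object PruneModel {
  // Hypothetical, simplified schema types used only for this illustration.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Seq[Field])

  // Stand-in for the pruning step: walk the fields of `required` and keep
  // those that also exist in `table`. Fields absent from `required` can
  // never appear in the result.
  def pruneModel(table: Schema, required: Schema): Schema = {
    val byName = table.fields.map(f => f.name -> f).toMap
    Schema(required.fields.flatMap(f => byName.get(f.name)))
  }

  val tableSchema: Schema = Schema(Seq(
    Field("id", "long"), Field("name", "string"), Field("country", "string")))
  val partitionFields: Seq[Field] = Seq(Field("country", "string"))
  val mandatoryFields: Set[String] = Set("country")

  // Spark's required/data schemas exclude partition columns by convention.
  val requiredFields: Seq[Field] = Seq(Field("id", "long"), Field("name", "string"))
  val dataFields: Seq[Field] = Seq(Field("id", "long"), Field("name", "string"))

  // Requested side: augmented with mandatory partition fields before pruning.
  val requested: Schema = pruneModel(
    tableSchema,
    Schema(requiredFields ++ partitionFields.filter(f => mandatoryFields(f.name))))

  // Data side: passed through unchanged, so `country` is lost.
  val data: Schema = pruneModel(tableSchema, Schema(dataFields))

  def main(args: Array[String]): Unit = {
    println(requested.fields.map(_.name)) // includes country
    println(data.fields.map(_.name))      // does not include country
  }
}
```

In this model the requested schema ends up with `country` while the data schema does not, which mirrors the mismatch between what the reader is asked to return and what it actually reads.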

Regression introduced by #13711 ("Improve Logical Type Handling on Col Stats", Sep 2025), which added the `pruneDataSchema` wrapping but only on `requestedSchema`.

### Fix

Mirror `requestedStructType`'s construction: augment `dataStructType` with mandatory partition fields before pruning.
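
One possible shape for that augmentation is sketched below. It is not the actual patch: the object and method names are hypothetical, `mandatoryFields` is assumed to be a `Seq[String]`, and the duplicate-name guard is an extra precaution that the quoted `requestedStructType` code does not have.

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical helper illustrating the proposed fix; not the merged change.
object MandatoryPartitionFields {
  // Appends mandatory partition fields to the data schema, skipping any
  // field that the data schema already contains.
  def augment(dataStructType: StructType,
              partitionSchema: StructType,
              mandatoryFields: Seq[String]): StructType = {
    val present = dataStructType.fieldNames.toSet
    val missing = partitionSchema.fields.filter { f =>
      mandatoryFields.contains(f.name) && !present.contains(f.name)
    }
    StructType(dataStructType.fields ++ missing)
  }
}
```

The result would then be converted and passed to `pruneDataSchema` in place of the raw `dataStructType`, so that the data schema and the requested schema agree on the mandatory partition fields.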

PR incoming.
