Open
Labels
area:schema, type:bug, type:community-support
Description
Bug Description
### Verifying Schema-on-Read Behavior in Apache Hudi
#### 1. Create a Hudi table (initially without schema-on-read)
```sql
CREATE TABLE IF NOT EXISTS trips_quickstart1 (
ts BIGINT,
uuid STRING,
rider STRING,
driver STRING,
fare DOUBLE,
city STRING
) USING HUDI
PARTITIONED BY (city)
LOCATION 's3a://<table-path>/trips_quickstart1'
TBLPROPERTIES (
primaryKey = 'uuid',
preCombineField = 'ts'
);
```

Initial table metadata:

```sql
DESCRIBE EXTENDED trips_quickstart1;
```

```
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
| _hoodie_commit_time|   string|   null|
|_hoodie_commit_seqno|   string|   null|
|  _hoodie_record_key|   string|   null|
|_hoodie_partition...|   string|   null|
|   _hoodie_file_name|   string|   null|
|                  ts|   bigint|   null|
|                uuid|   string|   null|
|               rider|   string|   null|
|              driver|   string|   null|
|                fare|   double|   null|
|                city|   string|   null|
+--------------------+---------+-------+
Table type: EXTERNAL
Provider: hudi
Location: s3a://<table-path>/trips_quickstart1
```

#### 2. Enable schema-on-read

```sql
SET hoodie.schema.on.read.enable=true;
```

```
+-------------------------------+-----+
|                            key|value|
+-------------------------------+-----+
|   hoodie.schema.on.read.enable| true|
+-------------------------------+-----+
```

#### 3. Re-check table metadata after enabling schema-on-read

```sql
DESCRIBE EXTENDED trips_quickstart1;
```

```
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
| _hoodie_commit_time|   string|   null|
|_hoodie_commit_seqno|   string|   null|
|  _hoodie_record_key|   string|   null|
|_hoodie_partition...|   string|   null|
|   _hoodie_file_name|   string|   null|
|                  ts|   bigint|   null|
|                uuid|   string|   null|
|               rider|   string|   null|
|              driver|   string|   null|
|                fare|   double|   null|
|                city|   string|   null|
+--------------------+---------+-------+
Table type: MANAGED   <-- Now reported as MANAGED when schema-on-read is enabled
Provider: hudi
```

**Key observation:** With `hoodie.schema.on.read.enable=true`, Spark reports the Hudi table as `MANAGED` instead of `EXTERNAL`, even though the table was created with an explicit `LOCATION` and its columns are unchanged. This is expected behavior in recent Hudi versions when schema evolution/promotion features are active.
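
A possible follow-up check, sketched here as a suggestion and not something that was run for this report: turn the flag back off in the same Spark SQL session and inspect the table again. If the reported type returns to EXTERNAL, the change follows the session config instead of anything stored for the table. The extra `SHOW TBLPROPERTIES` call is an assumption about what would be useful to attach; its output is not shown because it has not been captured.

```sql
-- Hypothetical follow-up, not part of the original repro steps.
-- Disable schema-on-read again in the same session.
SET hoodie.schema.on.read.enable=false;

-- Re-inspect the table and compare the "Table type" row with the two outputs above.
DESCRIBE EXTENDED trips_quickstart1;

-- List the table properties recorded for the table (e.g. primaryKey, preCombineField),
-- to confirm they are the same regardless of the session flag.
SHOW TBLPROPERTIES trips_quickstart1;
```

Running the same three statements with the flag set to `true` would give a side-by-side comparison to attach to the issue.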
### Environment
**Hudi version:** 0.15.0
**Query engine:** Spark
**Relevant configs:** `hoodie.schema.on.read.enable`
### Logs and Stack Trace
_No response_