External Hudi table is changed to Managed after enabling schema on read #14292

@linliu-code

Bug Description

### Verifying Schema-on-Read Behavior in Apache Hudi

#### 1. Create a Hudi table (initially without schema-on-read)

```sql
CREATE TABLE IF NOT EXISTS trips_quickstart1 (
  ts BIGINT,
  uuid STRING,
  rider STRING,
  driver STRING,
  fare DOUBLE,
  city STRING
) USING HUDI
PARTITIONED BY (city)
LOCATION 's3a://<table-path>/trips_quickstart1'
TBLPROPERTIES (
  primaryKey = 'uuid',
  preCombineField = 'ts'
);
```

Initial table metadata:

```sql
DESCRIBE EXTENDED trips_quickstart1;
```

```text
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
| _hoodie_commit_time|   string|   null|
|_hoodie_commit_seqno|   string|   null|
|  _hoodie_record_key|   string|   null|
|_hoodie_partition...|   string|   null|
|   _hoodie_file_name|   string|   null|
|                  ts|   bigint|   null|
|                uuid|   string|   null|
|               rider|   string|   null|
|              driver|   string|   null|
|                fare|   double|   null|
|                city|   string|   null|
+--------------------+---------+-------+

Table type: EXTERNAL
Provider:   hudi
Location:   s3a://<table-path>/trips_quickstart1
```

#### 2. Enable schema-on-read

```sql
SET hoodie.schema.on.read.enable=true;
```

```text
+-------------------------------+-----+
|                            key|value|
+-------------------------------+-----+
|   hoodie.schema.on.read.enable| true|
+-------------------------------+-----+
```

#### 3. Re-check table metadata after enabling schema-on-read

```sql
DESCRIBE EXTENDED trips_quickstart1;
```

```text
+--------------------+---------+-------+
|            col_name|data_type|comment|
+--------------------+---------+-------+
| _hoodie_commit_time|   string|   null|
|_hoodie_commit_seqno|   string|   null|
|  _hoodie_record_key|   string|   null|
|_hoodie_partition...|   string|   null|
|   _hoodie_file_name|   string|   null|
|                  ts|   bigint|   null|
|                uuid|   string|   null|
|               rider|   string|   null|
|              driver|   string|   null|
|                fare|   double|   null|
|                city|   string|   null|
+--------------------+---------+-------+

Table type: MANAGED   <-- Now reported as MANAGED when schema-on-read is enabled
Provider:   hudi
```
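Key observation: With `hoodie.schema.on.read.enable=true`, Spark reports the Hudi table as MANAGED instead of EXTERNAL, even though the table was created with an explicit `LOCATION` and nothing else about it changed between the two `DESCRIBE EXTENDED` calls. Toggling a session-level config should not alter the reported table type, which is why this is filed as a bug.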
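To make the before/after comparison repeatable, the check can be scripted. The following is a minimal sketch, not part of the original report. It assumes the rows come from `DESCRIBE EXTENDED` as `(col_name, data_type, comment)` tuples and that Spark labels the table-type row `Type` in its detailed-information section. The helper names `extract_table_type`, `table_type_changed`, and `describe` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (col_name, data_type, comment), the three columns DESCRIBE EXTENDED returns.
Row = Tuple[Optional[str], Optional[str], Optional[str]]


def extract_table_type(rows: Iterable[Row]) -> Optional[str]:
    """Return the value of the 'Type' row (e.g. EXTERNAL or MANAGED), or None.

    Assumption: the table-type row is labeled 'Type' in the detailed section.
    """
    for col_name, data_type, _comment in rows:
        if col_name is not None and col_name.strip() == "Type":
            return (data_type or "").strip()
    return None


def table_type_changed(before: Iterable[Row], after: Iterable[Row]) -> bool:
    """True when the reported table type differs between two snapshots."""
    return extract_table_type(before) != extract_table_type(after)


def describe(spark, table: str):
    """Hypothetical helper: snapshot DESCRIBE EXTENDED from a live SparkSession."""
    return [tuple(r) for r in spark.sql(f"DESCRIBE EXTENDED {table}").collect()]


# Usage against a live session (requires Spark with the Hudi bundle):
#   before = describe(spark, "trips_quickstart1")
#   spark.sql("SET hoodie.schema.on.read.enable=true")
#   after = describe(spark, "trips_quickstart1")
#   print(extract_table_type(before), extract_table_type(after))
#   print(table_type_changed(before, after))
```

The parsing functions are kept separate from the Spark call so they can be exercised on captured output without a cluster.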



### Environment

**Hudi version:** 0.15.0
**Query engine:** Spark
**Relevant configs:** `hoodie.schema.on.read.enable`


### Logs and Stack Trace

_No response_
