
[#9504] feat(flink): support generic table for Gravitino flink connector#9689

Merged
jerryshao merged 6 commits into apache:main from FANNG1:generic_table
Feb 4, 2026

Conversation


@FANNG1 FANNG1 commented Jan 12, 2026

What changes were proposed in this pull request?

Support generic tables in the Gravitino Flink connector:

  1. Create table path (flink-connector/flink/.../catalog/BaseCatalog.java +
    flink-connector/flink/.../hive/HiveSchemaAndTablePropertiesConverter.java)

    • Detect Hive vs generic by mirroring Flink:
      • Hive table if connector=hive
      • Otherwise generic
    • For generic tables:
      • Serialize ResolvedCatalogTable with CatalogPropertiesUtil.serializeCatalogTable(...).
      • Mask with flink. prefix (same as HiveTableUtil.maskFlinkProperties(...)).
      • Preserve connector / connector.type options in the masked properties.
      • Add is_generic=true only when Flink would add it (no connector keys).
      • Store these properties into Gravitino table properties.
      • Schema handling: Gravitino validates schema presence. If we must keep HMS schema empty
        for generic tables, introduce a generic-table exception in the Gravitino Hive catalog write
        path or allow empty columns when is_generic=true or flink.* properties indicate generic.
    • For Hive tables:
      • Keep the current flow in HiveSchemaAndTablePropertiesConverter:
        normalize serde/storage properties and set connector=hive in Flink options.
  2. Load table path (flink-connector/flink/.../catalog/BaseCatalog.java +
    flink-connector/flink/.../hive/HiveSchemaAndTablePropertiesConverter.java)

    • Detect generic tables using the same rule as HiveCatalog.isHiveTable(Table):
      • If is_generic exists: isHiveTable = !Boolean.parseBoolean(is_generic)
      • Else: isHiveTable = !has(flink.connector) && !has(flink.connector.type)
    • For generic tables:
      • Strip flink. prefix to build a Flink properties map.
      • Use CatalogPropertiesUtil.deserializeCatalogTable(...) to reconstruct schema and partition
        keys.
      • For managed tables where connector=ManagedTableFactory.DEFAULT_IDENTIFIER, remove the
        connector option (same as HiveCatalog.instantiateCatalogTable(...)).
    • For Hive tables:
      • Use the existing schema-from-columns and table-properties logic.
  3. Alter table path (Table changes in BaseCatalog and property conversion)

    • Load table first to determine if it is generic.
    • For generic tables, re-serialize the updated table schema and partition keys into flink.*
      properties and update Gravitino properties; avoid writing HMS columns.
    • For Hive tables, keep the current schema-alter behavior.
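The property masking and unmasking in the create and load paths above can be sketched over plain property maps. This is a simplified illustration: in the real connector the table is first serialized with Flink's `CatalogPropertiesUtil.serializeCatalogTable(...)` and masked via `HiveTableUtil.maskFlinkProperties(...)`; the class and method names below are illustrative, not the actual connector API.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the flink.* property masking described in steps 1 and 2.
// We start from an already-serialized Flink property map; names are illustrative.
class GenericTablePropsSketch {
  static final String FLINK_PREFIX = "flink.";

  // Create path: mask every serialized Flink property with the "flink." prefix
  // (connector / connector.type included, so they are preserved), and add
  // is_generic=true only when no connector keys are present.
  static Map<String, String> mask(Map<String, String> flinkProps) {
    Map<String, String> masked = new HashMap<>();
    for (Map.Entry<String, String> e : flinkProps.entrySet()) {
      masked.put(FLINK_PREFIX + e.getKey(), e.getValue());
    }
    if (!flinkProps.containsKey("connector") && !flinkProps.containsKey("connector.type")) {
      masked.put("is_generic", "true");
    }
    return masked;
  }

  // Load path: strip the "flink." prefix to rebuild the Flink property map,
  // which would then be fed to CatalogPropertiesUtil.deserializeCatalogTable(...)
  // to reconstruct the schema and partition keys.
  static Map<String, String> unmask(Map<String, String> gravitinoProps) {
    Map<String, String> flinkProps = new HashMap<>();
    for (Map.Entry<String, String> e : gravitinoProps.entrySet()) {
      if (e.getKey().startsWith(FLINK_PREFIX)) {
        flinkProps.put(e.getKey().substring(FLINK_PREFIX.length()), e.getValue());
      }
    }
    return flinkProps;
  }
}
```

The alter path in step 3 reuses the same masking: the updated schema is re-serialized and stored back through the equivalent of `mask(...)`.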

Please refer to the design doc for more details: https://docs.google.com/document/d/1Nr09p1kkQ1pTmoLDs1tI2gpEIur8fhSFanmvEZkoG3c/edit?tab=t.0#heading=h.ad5fz2xu9563

Why are the changes needed?

Fix: #9504

Does this PR introduce any user-facing change?

Yes. If the user creates a table without specifying a connector, a generic managed table is created instead of a Hive table.

How was this patch tested?

Added unit tests and integration tests.
Tested locally that a generic JDBC table created via Gravitino can be read by the native Flink connector, and vice versa.

@FANNG1 FANNG1 marked this pull request as draft January 12, 2026 11:08
@FANNG1 FANNG1 changed the title [SIP] feat(flink): support generic table for Gravitino flink connector [#9504] feat(flink): support generic table for Gravitino flink connector Jan 15, 2026

FANNG1 commented Jan 15, 2026

Blocked by #9590

@FANNG1 FANNG1 marked this pull request as ready for review January 27, 2026 05:19
@FANNG1 FANNG1 requested a review from Copilot January 27, 2026 05:20

Copilot AI left a comment


Pull request overview

This pull request adds support for generic (non-Hive) tables in the Gravitino Flink connector to maintain compatibility with Flink's native Hive catalog behavior. The implementation enables Gravitino to properly handle tables created by native Flink clients that store schema information in table properties rather than as Hive schema.

Changes:

  • Introduces FlinkGenericTableUtil to detect, serialize, and deserialize generic tables
  • Overrides create/get/alter table methods in GravitinoHiveCatalog to handle generic tables separately from raw Hive tables
  • Updates HiveSchemaAndTablePropertiesConverter to set is_generic=false for Hive tables and validate connector types
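The detection logic mentioned above mirrors Flink's `HiveCatalog.isHiveTable(Table)`. A minimal sketch over the table's stored parameters (the class name here is illustrative, not the actual `FlinkGenericTableUtil` API):

```java
import java.util.Map;

// Sketch of the generic-vs-Hive detection rule, mirroring the behavior of
// Flink's HiveCatalog.isHiveTable(Table). Operates on the table's stored
// parameters; the class name is illustrative.
class GenericTableDetector {
  static boolean isHiveTable(Map<String, String> params) {
    String isGeneric = params.get("is_generic");
    if (isGeneric != null) {
      // Explicit flag wins: the table is Hive iff is_generic is not true.
      return !Boolean.parseBoolean(isGeneric);
    }
    // No flag: treat as Hive only when no masked Flink connector keys exist.
    return !params.containsKey("flink.connector") && !params.containsKey("flink.connector.type");
  }
}
```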

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file:

  • FlinkGenericTableUtil.java: New utility class for detecting generic vs Hive tables and handling property serialization/deserialization with the flink.* prefix
  • GravitinoHiveCatalog.java: Overrides table operations to route generic tables to property-based storage and Hive tables to schema-based storage
  • HiveSchemaAndTablePropertiesConverter.java: Adds connector validation and sets the is_generic=false flag for Hive tables
  • BaseCatalog.java: Minor refactoring to compute indices earlier (no functional change)
  • FlinkHiveCatalogIT.java: Comprehensive integration tests verifying bidirectional compatibility between Gravitino and native Flink for both generic and Hive tables
  • TestFlinkGenericTableUtil.java: Unit tests for generic table detection and property conversion logic
  • TestHiveCatalogOperations.java: Test for creating generic tables with empty columns
  • TestDatabaseName.java: Adds a test database enum entry
  • flink-catalog-hive.md: Documents generic table support and the requirement to specify connector=hive for raw Hive tables


FANNG1 commented Jan 27, 2026

@jerryshao PTAL


Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Hive schema.

:::note
You must set `connector=hive` explicitly when creating a raw Hive table. Otherwise, the table is created as a managed generic table. The managed table is not recommended to use and is deprecated in Flink.

Copilot AI Jan 30, 2026


The documentation states "The managed table is not recommended to use and is deprecated in Flink," but this statement lacks clarity and context. It would be more helpful to:

  1. Specify which version of Flink deprecated managed tables
  2. Clarify what users should use instead
  3. Add a reference/link to the relevant Flink documentation about this deprecation

This helps users understand the implications and make informed decisions.

Suggested change
You must set `connector=hive` explicitly when creating a raw Hive table. Otherwise, the table is created as a managed generic table. The managed table is not recommended to use and is deprecated in Flink.
You must set `connector=hive` explicitly when creating a raw Hive table. Otherwise, the table is created as a managed generic table. Starting from Apache Flink 1.15, managed generic tables in the Hive catalog are deprecated and should be avoided. Instead, use external generic tables (by specifying an explicit connector) or native Hive tables. For more details, see the Flink documentation on Hive generic tables: https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/hive/hive_catalog/#generic-tables

Contributor Author


Updated the document; the managed-table-related API is deprecated in Flink 1.18.

}
if (!(newTable instanceof ResolvedCatalogTable)) {
throw new CatalogException("Generic table must be a resolved catalog table");
}

Copilot AI Jan 30, 2026


The applyGenericTableAlter method ignores the tableChanges parameter in the second alterTable overload (line 180). While this appears intentional since generic tables re-serialize the entire schema into properties (line 219), it would be valuable to add a comment explaining this design decision.

For example:

// For generic tables, we re-serialize the entire table schema and partition keys
// into flink.* properties, so the individual tableChanges are not needed.
// The newTable parameter contains the final state after applying all changes.

This helps future maintainers understand why the parameter is unused and prevents potential confusion or incorrect modifications.

Suggested change
}
}
// For generic tables, we re-serialize the entire table schema and partition keys
// into flink.* properties, so the individual tableChanges are not needed.
// The newTable parameter contains the final state after applying all changes.

Contributor Author


updated

}
});

catalog().asTableCatalog().alterTable(identifier, changes.toArray(new TableChange[0]));

Copilot AI Jan 30, 2026


The applyGenericTableAlter method calls catalog().asTableCatalog().alterTable() at line 244 without wrapping it in a try-catch block. This is inconsistent with other methods in this class:

  • createTable (lines 117-137) wraps the call in try-catch and converts exceptions to appropriate Flink exceptions
  • getTable (lines 143-156) wraps the call in try-catch and converts exceptions
  • loadGravitinoTable (lines 199-210) wraps the call in try-catch

The alterTable() call could throw various exceptions (like NoSuchTableException, IllegalArgumentException, etc.) that should be converted to CatalogException or TableNotExistException for consistency with Flink's catalog API. Consider adding proper exception handling similar to other methods in this class.

Suggested change
catalog().asTableCatalog().alterTable(identifier, changes.toArray(new TableChange[0]));
try {
catalog().asTableCatalog().alterTable(identifier, changes.toArray(new TableChange[0]));
} catch (NoSuchTableException e) {
throw new CatalogException(e);
} catch (RuntimeException e) {
throw new CatalogException(e);
}

Contributor Author


updated

Comment on lines +77 to +80
if (catalogTable.getUnresolvedSchema().getColumns().isEmpty()) {
catalogTable =
CatalogPropertiesUtil.deserializeCatalogTable(flinkProperties, "generic.table.schema");
}

Copilot AI Jan 30, 2026


The fallback logic at lines 77-79 attempts to deserialize with a different schema prefix ("generic.table.schema") if the initial deserialization results in empty columns. However, there's no test coverage for this fallback path.

Consider adding a test case that:

  1. Creates properties that trigger this fallback condition
  2. Verifies the fallback deserialization works correctly
  3. Documents when this fallback is needed (e.g., for backward compatibility with specific Flink versions)

This ensures the fallback logic is tested and its purpose is clear to future maintainers.

Contributor Author


This is the logic from the Flink connector; it seems unnecessary to add a test for this.

jerryshao previously approved these changes Jan 30, 2026

FANNG1 commented Feb 2, 2026

Updated the PR to fix the comments from AI. @jerryshao PTAL, thanks.

@jerryshao
Contributor

please fix the conflict.

@jerryshao jerryshao merged commit 7f2cbfc into apache:main Feb 4, 2026
26 checks passed
@FANNG1 FANNG1 deleted the generic_table branch February 4, 2026 06:38
bharos pushed a commit to bharos/gravitino that referenced this pull request Feb 4, 2026
…connector (apache#9689)



Successfully merging this pull request may close these issues.

[BUG] Gravitino Flink connector couldn't read hive tables created by flink native client
