
[FLINK-29585][hive] Migrate TableSchema to Schema for Hive connector #21522

Merged
1 commit merged on Mar 20, 2023

Conversation

Aitozi (Contributor) commented Dec 18, 2022

What is the purpose of the change

This PR migrates TableSchema to Schema and ResolvedSchema. Most uses of TableSchema have been moved out of the Hive connector module; only some remain in the HiveCatalog-related code. I filed a discussion about the catalog's APIs regarding this.

Verifying this change

This change is a rework that should be covered by the existing tests.

flinkbot (Collaborator) commented Dec 18, 2022

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

Aitozi (author) commented Dec 18, 2022

cc @luoyuxia @wuchong please take a look when you are free

luoyuxia (Contributor) commented:

@Aitozi Thanks for the contribution. Migrating to the new schema is a good improvement and valuable for future development. I'll definitely have a look when I'm free.

luoyuxia (Contributor) left a comment:

@Aitozi Thanks for the contribution. I left some comments. PTAL.
Also, I found that some test classes still have the import org.apache.flink.table.api.TableSchema, such as:
HiveDialectITCase, TableEnvHiveConnectorITCase, HiveInputFormatPartitionReaderITCase, HiveCatalogGenericMetadataTest, HiveCatalogHiveMetadataTest, HiveCatalogITCase, and HiveCatalogTest.
Can they all be removed?

  String[] formatNames = new String[formatFieldCount];
  LogicalType[] formatTypes = new LogicalType[formatFieldCount];
  for (int i = 0; i < formatFieldCount; i++) {
-     formatNames[i] = tableSchema.getFieldName(i).get();
-     formatTypes[i] = tableSchema.getFieldDataType(i).get().getLogicalType();
+     formatNames[i] = resolvedSchema.getColumn(i).get().getName();
Reviewer (Contributor):

nit:
resolvedSchema.getColumnNames().get(i);

Aitozi (author):

fixed

- formatNames[i] = tableSchema.getFieldName(i).get();
- formatTypes[i] = tableSchema.getFieldDataType(i).get().getLogicalType();
+ formatNames[i] = resolvedSchema.getColumn(i).get().getName();
+ formatTypes[i] = resolvedSchema.getColumn(i).get().getDataType().getLogicalType();
Reviewer (Contributor):

Ditto.

Aitozi (author):

fixed
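The nit above swaps a per-index Optional lookup for a plain list lookup. A minimal plain-Java sketch of the difference (the Column record below is a hypothetical stand-in; Flink's real ResolvedSchema is not used here):

```java
import java.util.List;
import java.util.Optional;

public class ColumnAccessDemo {
    // Hypothetical stand-in for a resolved column.
    record Column(String name) {}

    // Mirrors the shape of ResolvedSchema#getColumn(int): Optional per index.
    static Optional<Column> getColumn(List<Column> cols, int i) {
        return i >= 0 && i < cols.size() ? Optional.of(cols.get(i)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Column> cols = List.of(new Column("id"), new Column("name"));
        List<String> names = cols.stream().map(Column::name).toList();

        // Per-index Optional access: forces a .get() even though the index is known valid.
        String viaOptional = getColumn(cols, 0).get().name();
        // List access, as the nit suggests: no Optional unwrapping inside the loop.
        String viaList = names.get(0);
        System.out.println(viaOptional.equals(viaList)); // prints true
    }
}
```

Both paths return the same value; the list form just avoids the redundant Optional unwrap in a loop whose indices are known to be in range.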

- formatConf, typeDescription.toString(), formatTypes));
+ formatConf,
+ typeDescription.toString(),
+ formatType.getFields().stream()
Reviewer (Contributor):

use formatTypes?

// Partition keys
List<String> partitionKeys = new ArrayList<>();
TableSchema tableSchema;
Reviewer (Contributor):

Can we also remove TableSchema in here?

@@ -97,7 +102,7 @@ public class HiveTableUtil {

  private HiveTableUtil() {}

- public static TableSchema createTableSchema(
+ public static ResolvedSchema createResolvedTableSchema(
Reviewer (Contributor):

Why ResolvedSchema? As far as I'm concerned, it should be Schema.

@@ -2106,7 +2111,7 @@ public static CatalogBaseTable getCatalogBaseTable(
  public static class TableSpec {
  public ObjectIdentifier tableIdentifier;
  public String tableName;
- public CatalogBaseTable table;
+ public ResolvedCatalogBaseTable<?> table;
Reviewer (Contributor):

Do we really need ResolvedCatalogBaseTable? What I mean is: can we avoid calling the method catalogManager.resolveCatalogBaseTable? It's an internal method, and we always want to avoid calling it.

Aitozi (author):

I think we need the table to be resolved, since we want to call validatePartColumnType on it, which requires the resolved type information.

- CatalogTable catalogTable =
-     getCatalogTable(tableIdentifier.asSummaryString(), qb);
+ ResolvedCatalogTable catalogTable =
+     catalogManager.resolveCatalogTable(
Reviewer (Contributor):

Ditto. Can we avoid calling catalogManager.resolveCatalogTable?

  }
- return builder.build();
+ return org.apache.flink.table.catalog.UniqueConstraint.primaryKey(
+     primaryKey.getName(), primaryKey.getColumns());
  }

/** Create Hive columns from Flink TableSchema. */
Reviewer (Contributor):

Can this method be removed?

oldTable.getComment(),
oldTable.getPartitionKeys(),
props),
newSchema));
}
Reviewer (Contributor):

Can the deprecated TableSchema be removed in method convertAlterTableChangeCol?

Aitozi (author):

Removed

luoyuxia (Contributor) commented:

@Aitozi FYI: about one year ago I also reviewed a PR that migrated TableSchema to Schema, fapaul@f8af0e9, but the author did not seem to intend to finish it. You can have a look at it just for reference.

Aitozi (author) commented Jan 30, 2023

@luoyuxia thanks for your review. I will take a look and address your comments.

Aitozi (author) commented Mar 2, 2023

I'm revisiting this PR now.

@Aitozi Aitozi marked this pull request as draft March 2, 2023 13:06
@Aitozi Aitozi force-pushed the hive-schema branch 3 times, most recently from b9ed21a to fc21901, on March 9, 2023 09:47
@Aitozi Aitozi marked this pull request as ready for review March 9, 2023 09:48
Aitozi (author) commented Mar 9, 2023

Hi @luoyuxia, I have addressed your comments. Please take a look again when you are free, thanks.

Aitozi (author) commented Mar 9, 2023

@flinkbot run azure

luoyuxia (Contributor) left a comment:

@Aitozi Thanks for updating. I left some comments again. PTAL. We're getting there.

*/
public HiveSourceBuilder setProjectedFields(int[] projectedFields) {
Reviewer (Contributor):

Please don't change this method since it's a public interface. Also, I don't think we need to change it.

Aitozi (author):

Reverted

if (isHiveTable) {
pkConstraint = table.getSchema().getPrimaryKey().orElse(null);
// TODO replace the deprecated UniqueConstraint
Reviewer (Contributor):

It would be better to create a JIRA issue to track this TODO task.


@@ -763,7 +783,7 @@ CatalogBaseTable instantiateCatalogTable(Table hiveTable) {
  tableSchemaProps.putProperties(properties);
  // try to get table schema with both new and old (1.10) key, in order to support tables
  // created in old version
- tableSchema =
+ TableSchema tableSchema =
Reviewer (Contributor):

Can we also remove TableSchema here, so that TableSchema is removed from our Hive connector entirely?

Aitozi (author):

I think it doesn't matter. We still use DescriptorProperties to serialize/deserialize the schema when storing it to and restoring it from the external metastore, so TableSchema is actually still used in the Hive connector. It can be entirely removed once we can use the new way to serialize/deserialize the schema, but I think we can improve that as a follow-up. WDYT?

Reviewer (Member):

CatalogPropertiesUtil is the alternative to DescriptorProperties. It should not be a major effort to migrate.

Aitozi (author):

Updated. The import org.apache.flink.table.api.TableSchema has now been entirely removed from the Hive connector, and DescriptorProperties has been migrated to CatalogPropertiesUtil.

@@ -69,67 +69,75 @@ private OperationConverterUtils() {}
public static Operation convertAddReplaceColumns(
Reviewer (Contributor):

It seems this method can be removed, as well as the else if (sqlAlterTable instanceof SqlAddReplaceColumns) branch in SqlToOperationConverter.

Aitozi (author):

Why?

Aitozi (author):

Oh, I get it. SqlAddReplaceColumns is Hive dialect and is not used now. I will remove it.

@@ -157,12 +165,12 @@ public static Operation convertChangeColumn(
  // disallow changing partition columns
  throw new ValidationException("CHANGE COLUMN cannot be applied to partition columns");
  }
- TableSchema oldSchema = catalogTable.getSchema();
+ ResolvedSchema oldSchema = catalogTable.getResolvedSchema();
Reviewer (Contributor):

Ditto: it seems we can also remove the method convertChangeColumn.

Aitozi (author):

Removed

* ResolvedExpression} back to its Unresolved state. This will enable direct comparison of the
* schema.
*/
public static Schema fromResolvedSchema(ResolvedSchema resolvedSchema) {
Reviewer (Contributor):

Can't Schema.newBuilder().fromResolvedSchema(resolvedSchema).build() meet our needs?

Aitozi (author):

The ResolvedExpression in resolvedSchema differs from the Expression in its string format. For example, SqlCallExpression's string format is wrapped in '[]'.

Aitozi (author):

Refactored by introducing a TestSchemaResolver to ease this comparison.

luoyuxia (Contributor) commented Mar 15, 2023:

Do you mean that for the method CatalogTestUtil#checkEquals(CatalogTable t1, CatalogTable t2), t1/t2 may be either ResolvedCatalogTable or DefaultCatalogTable?
If it only happens in Hive, can we use CatalogManagerMocks.createEmptyCatalogManager().resolveCatalogTable()? Then we wouldn't need to add a new class TestSchemaResolver.


Aitozi (author) commented Mar 13, 2023

Hi @luoyuxia, most of your comments have been addressed. PTAL again.

@@ -258,8 +256,12 @@ private TableFunction<RowData> getLookupFunction(int[] keys) {
  jobConf,
  hiveVersion,
  tablePath,
- getTableSchema().getFieldDataTypes(),
- getTableSchema().getFieldNames(),
+ DataType.getFieldDataTypes(
Reviewer (Contributor):

Would it be better to use

catalogTable.getResolvedSchema().getColumnDataTypes().toArray(new DataType[0]),
catalogTable.getResolvedSchema().getColumnNames().toArray(new String[0])

Aitozi (author):

I think we should exclude the computed columns here, so I use DataType.getFieldDataTypes(catalogTable.getResolvedSchema().toPhysicalRowDataType()).toArray(new DataType[0]).

Reviewer (Contributor):

Makes sense.
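The point settled above is that read paths should see only physical columns, with computed columns excluded. This can be modeled in plain Java; the Column record below is a hypothetical stand-in for Flink's column metadata, and in Flink itself ResolvedSchema#toPhysicalRowDataType performs the analogous filtering:

```java
import java.util.List;
import java.util.stream.Collectors;

public class PhysicalColumnsDemo {
    // Hypothetical stand-in for a column: a computed column carries an expression
    // instead of stored data.
    record Column(String name, String type, boolean isComputed) {}

    // Mirrors the idea behind ResolvedSchema#toPhysicalRowDataType:
    // computed columns hold no stored data, so a reader's row type skips them.
    static List<Column> physicalColumns(List<Column> all) {
        return all.stream().filter(c -> !c.isComputed()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Column> schema = List.of(
                new Column("id", "BIGINT", false),
                new Column("price", "DOUBLE", false),
                new Column("price_with_tax", "DOUBLE", true)); // computed, e.g. price * 1.1
        System.out.println(physicalColumns(schema).size()); // prints 2
    }
}
```

With this filtering, the computed column never reaches the lookup function's field arrays, which is why toPhysicalRowDataType was preferred over getColumnDataTypes above.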

@@ -703,7 +704,7 @@ private CompactReader.Factory<RowData> createCompactReaderFactory(
  jobConf,
  catalogTable,
  hiveVersion,
- (RowType) tableSchema.toRowDataType().getLogicalType(),
+ (RowType) resolvedSchema.toSinkRowDataType().getLogicalType(),
Reviewer (Contributor):

It should be resolvedSchema.toSourceRowDataType(), right?

Aitozi (author):

Why? I think the sink should deal with the Column::isPersisted data type, so the compact reader factory should use toSinkRowDataType.

Reviewer (Contributor):

Since it's for a reader, shouldn't it use resolvedSchema#toPhysicalRowDataType?

Aitozi (author):

Makes sense, updated.

@@ -472,7 +510,8 @@ private static int getCount(Map<String, String> map, String key, String suffix)
  final String escapedSeparator = Pattern.quote(SEPARATOR);
  final Pattern pattern =
      Pattern.compile(
-         escapedKey
+         "^"
Reviewer (Contributor):

Why change this?

Aitozi (author):

Before this change, both the key generic.schema.1.name and schema.1.name would pass this pattern, so the column count would mislead the key extractor. The code actually expects the ^ anchor here: with it, generic.schema.1.name no longer matches, so getCount returns 0 and we can then use the fallback key to read the value.
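The anchoring behavior described above can be reproduced with a small self-contained example. The key names and separator here are stand-ins for the real schema property keys; only the effect of the ^ anchor is the point:

```java
import java.util.regex.Pattern;

public class KeyCountAnchorDemo {
    public static void main(String[] args) {
        // Hypothetical reconstruction of the pattern in getCount: "schema" stands in
        // for the real escaped key, "\\." for the escaped separator, ".name" for the suffix.
        String escapedKey = Pattern.quote("schema");
        String body = escapedKey + "\\.(\\d+)\\.name";

        Pattern unanchored = Pattern.compile(body);
        Pattern anchored = Pattern.compile("^" + body);

        String plainKey = "schema.1.name";
        String fallbackKey = "generic.schema.1.name";

        // Without "^", find() also hits the fallback key (it contains "schema.1.name"
        // as a substring), which inflates the column count.
        System.out.println(unanchored.matcher(fallbackKey).find());
        // With "^", only keys that start with the prefix match, so the fallback key
        // yields no match and the caller can fall back to the generic key.
        System.out.println(anchored.matcher(fallbackKey).find());
        System.out.println(anchored.matcher(plainKey).find());
    }
}
```

This prints true, false, true: the unanchored pattern wrongly counts the generic key, while the anchored one counts only the plain key.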


luoyuxia (Contributor) commented Mar 15, 2023

@Aitozi Thanks for updating. I left minor comments. PTAL. This should be ready to merge in the next iteration.
BTW, the tests fail.

Aitozi (author) commented Mar 15, 2023

@luoyuxia Thanks for your detailed review. I have addressed your comments.

Do you mean that for the method CatalogTestUtil#checkEquals(CatalogTable t1, CatalogTable t2), t1/t2 may be either ResolvedCatalogTable or DefaultCatalogTable?
If it only happens in Hive, can we use CatalogManagerMocks.createEmptyCatalogManager().resolveCatalogTable()? Then we wouldn't need to add a new class TestSchemaResolver.

For this question: yes, currently only the Hive catalog needs the schema resolved. But CatalogTest cannot access CatalogManagerMocks, and the Hive catalog tests should also run CatalogTest's tests. So in the CatalogTest module we'd better have a test tool that can resolve a Schema to a ResolvedSchema for comparison.

Besides, CatalogManagerMocks cannot actually resolve expressions, so computed columns and watermark specs would not be covered. TestSchemaResolver solves this and can serve as a test harness whenever resolution is needed.

luoyuxia (Contributor) left a comment:

@Aitozi Thanks for your patience. I left minor comments. PTAL.

luoyuxia (Contributor) left a comment:

@Aitozi Thanks for updating. I left minor comments again. PTAL.
Please remember not to call the method CatalogManager#resolveCatalogTable.

luoyuxia (Contributor) commented:

@Aitozi I still notice we are calling catalogManager#resolveCatalogTable in HiveParserSemanticAnalyzer/HiveParserBaseSemanticAnalyzer. Please remove those calls.

Aitozi (author) commented Mar 17, 2023

@Aitozi I still notice we are calling catalogManager#resolveCatalogTable in HiveParserSemanticAnalyzer/HiveParserBaseSemanticAnalyzer. Please remove those calls.

All removed, PTAL again

luoyuxia (Contributor) left a comment:

@Aitozi Thanks for the contribution. LGTM assuming tests pass.
Could you please rebase on master? I will merge it later.

Aitozi (author) commented Mar 17, 2023

@luoyuxia Done. Thank you very much for your patient review.
