[hive] After Flink changed the table name when using hive metastore, the modified table could not be found. #1833

Merged
JingsongLi merged 4 commits into apache:master from zhuangchong:fix-hive-table-rename
Aug 18, 2023
Conversation

Contributor

@zhuangchong zhuangchong commented Aug 17, 2023

Purpose

Linked issue: close #1831

Tests

HiveCatalogITCaseBase#testRenameTable

API and Format

Documentation

Path fromPath = getDataTableLocation(fromTable);
Path toPath = getDataTableLocation(toTable);
try {
    fileIO.rename(fromPath, toPath);
Contributor

Maybe check that fromPath exists first? Hive may already have done this renaming.

Contributor Author

When Hive renames a table, it only modifies the metadata and does not change the location URL. Is it necessary to check fromPath and toPath here?

Contributor

#471
See original PR.

Contributor Author

The user is on Hive 3.1.3. After the rename operation, the location of the new table has not changed; it still points to the original path.

Contributor

You can see the discussion in #471.
Some Hive versions may rename the directory to the new name, so we can add this check first.
Or can you confirm that no Hive version renames the directory?
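
A minimal sketch of the check being requested here, assuming nothing about the real implementation: it uses `java.nio.file` as a stand-in for the project's `fileIO`, and the class and method names (`GuardedRename`, `renameIfPresent`) are hypothetical. The point is only that the move is skipped when the old directory is already gone, so the same code tolerates both a Hive that moves the directory and one that does not.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the project's API: java.nio.file stands in for fileIO.
final class GuardedRename {

    /**
     * Moves fromPath to toPath only if fromPath still exists.
     * Returns true if this call moved the directory, and false if there was
     * nothing to move (for example because the metastore already moved it).
     */
    static boolean renameIfPresent(Path fromPath, Path toPath) throws IOException {
        if (!Files.exists(fromPath)) {
            return false; // nothing left at the old location, so skip the move
        }
        Files.move(fromPath, toPath);
        return true;
    }
}
```

Calling `renameIfPresent` a second time with the same arguments returns `false` instead of failing, which is the "minimum protection" behaviour under discussion.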

Contributor

We are fixing this bug, so I know the problem exists.

Contributor

@JingsongLi JingsongLi Aug 17, 2023

This bug arose because the code did not perform the minimal protective check that I suggested in the PR.

@zhuangchong zhuangchong requested a review from JingsongLi August 17, 2023 10:18
Contributor

zhangjun0x01 commented Aug 17, 2023

I tested again on Hive 3.1.3 and it works correctly. That is very strange.

Flink SQL> CREATE CATALOG paimon_hive_catalog WITH (
>     'type' = 'paimon',
>     'metastore' = 'hive',
>     'uri' = 'thrift://localhost:9083',
>     'warehouse' = 'hdfs://localhost/user/hive/warehouse'
> );
> 
[INFO] Execute statement succeed.

Flink SQL> use catalog paimon_hive_catalog;
[INFO] Execute statement succeed.


Flink SQL> create database db6;
[INFO] Execute statement succeed.

Flink SQL> use db6;
[INFO] Execute statement succeed.

Flink SQL> create table t1(id int);
[INFO] Execute statement succeed.

Flink SQL> show tables;
+------------+
| table name |
+------------+
|         t1 |
+------------+
1 row in set

Flink SQL> alter table t1 rename to t2;
[INFO] Execute statement succeed.

Flink SQL> show tables;
+------------+
| table name |
+------------+
|         t2 |
+------------+
1 row in set

Flink SQL> show create table t2;
+------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                             result |
+------------------------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE `paimon_hive_catalog`.`db6`.`t2` (
  `id` INT
) WITH (
  'path' = 'hdfs://localhost/user/hive/warehouse/db6.db/t2'
)
 |
+------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

zhangjun@zhangjundeMacBook-Air hive3 % hdfs dfs -ls -R hdfs://localhost/user/hive/warehouse/db6.db/
2023-08-17 21:54:25,083 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x   - zhangjun supergroup          0 2023-08-17 21:48 hdfs://localhost/user/hive/warehouse/db6.db/t2
drwxr-xr-x   - zhangjun supergroup          0 2023-08-17 21:48 hdfs://localhost/user/hive/warehouse/db6.db/t2/schema
-rw-r--r--   1 zhangjun supergroup        213 2023-08-17 21:48 hdfs://localhost/user/hive/warehouse/db6.db/t2/schema/schema-0
zhangjun@zhangjundeMacBook-Air hive3 % 

@zhangjun0x01
Contributor

We need to confirm if the user is using object storage. I remember this situation occurred once when I was using s3.

In addition, I think we may need more user environment information to troubleshoot the issue

@zhangjun0x01
Contributor

I tested on S3 with the Hive catalog; the table could not be found after renaming it.


Flink SQL> show create table t1;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                         result |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE TABLE `paimon_hive_catalog_dev`.`paimon_zhangjun`.`t1` (
  `id` INT
) COMMENT 'aa'
WITH (
  'path' = 's3://xxxxx/paimon/paimon_zhangjun.db/t1'
)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

Flink SQL> show tables;
+------------+
| table name |
+------------+
|         t1 |
+------------+
1 row in set

Flink SQL> alter table t1 rename to t2;
[INFO] Execute statement succeed.

Flink SQL> show tables;
Empty set

Flink SQL> 


Contributor Author

zhuangchong commented Aug 18, 2023

Thanks @zhangjun0x01 @JingsongLi

I suggest that at the code layer we first execute the Hive client's alter_table method to rename the table, then check fromPath; if it exists, call fileIO rename and update the Hive location. What do you think? @JingsongLi @zhangjun0x01
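
The proposed order can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `FakeMetastore` is a hypothetical in-memory stand-in for the Hive client, `java.nio.file` stands in for `fileIO`, and table directories are assumed to sit directly under a warehouse path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every name here is hypothetical.
final class RenameOrderSketch {

    /** Hypothetical stand-in for the metastore: maps a table name to its location. */
    static final class FakeMetastore {
        final Map<String, Path> locations = new HashMap<>();

        /** Metadata-only rename: the name changes, the stored location does not. */
        void renameTable(String from, String to) {
            locations.put(to, locations.remove(from));
        }

        void setLocation(String table, Path location) {
            locations.put(table, location);
        }
    }

    static void renameTable(FakeMetastore metastore, Path warehouse, String from, String to)
            throws IOException {
        Path fromPath = warehouse.resolve(from);
        Path toPath = warehouse.resolve(to);

        // Step 1: rename the table in the metastore first.
        metastore.renameTable(from, to);

        // Step 2: move the data only if the old directory is still there.
        if (Files.exists(fromPath)) {
            Files.move(fromPath, toPath);
            // Step 3: point the renamed table at its new directory.
            metastore.setLocation(to, toPath);
        }
    }

    public static void main(String[] args) throws IOException {
        Path warehouse = Files.createTempDirectory("warehouse");
        Files.createDirectories(warehouse.resolve("t1").resolve("schema"));

        FakeMetastore metastore = new FakeMetastore();
        metastore.setLocation("t1", warehouse.resolve("t1"));

        renameTable(metastore, warehouse, "t1", "t2");
        System.out.println("t2 location: " + metastore.locations.get("t2"));
    }
}
```

With a metastore that only touches metadata, the stored location would keep pointing at the old directory until step 3 runs; skipping step 2 and step 3 when the old directory is missing leaves a metastore that already moved the data untouched.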

@zhangjun0x01
Contributor

> Thanks @zhangjun0x01 @JingsongLi
>
> I suggest that at the code layer we first execute the Hive client's alter_table method to rename the table, then check fromPath; if it exists, call fileIO rename and update the Hive location. What do you think? @JingsongLi @zhangjun0x01

I think that is OK; add a check to ensure that the location is correctly renamed.

Contributor

@JingsongLi JingsongLi left a comment

+1 Thanks @zhuangchong

@JingsongLi JingsongLi merged commit 69ad722 into apache:master Aug 18, 2023
@zhuangchong zhuangchong deleted the fix-hive-table-rename branch August 18, 2023 10:22
schnappi17 pushed a commit to schnappi17/flink-table-store that referenced this pull request Sep 23, 2023
Development

Successfully merging this pull request may close these issues.

[Bug] [hive] After modifying the table name when flink uses hive metastore, the modified table cannot be found

3 participants