[#4089] fix(hive catalog): the problem of slow acquisition of hive table list#4469
[#4089] fix(hive catalog): the problem of slow acquisition of hive table list#4469jerryshao merged 8 commits intoapache:mainfrom
Conversation
| if (!listAllTables) { | ||
| String filter = | ||
| String.format( | ||
| "%stable_type = \"ICEBERG\"", hive_metastoreConstants.HIVE_FILTER_FIELD_PARAMS); |
There was a problem hiding this comment.
Filtering out only the Iceberg table seems to be insufficient, and we currently also do not support view type.
So I think you should use show managed type table when listAllTables=false
There was a problem hiding this comment.
If we do not filter the view, what impact will it have on Gravitino?
There was a problem hiding this comment.
If we do not filter the view, what impact will it have on Gravitino?
I have tested it.The view table ,don‘t have any impact ,so i think we can do like this.
There was a problem hiding this comment.
I add some comments ,please review again
|
Hi, @mygrsun could you plz resolve the comment and CI issue? Thanks |
|
Now,the ci issue is a troublesome problem. https://github.com/apache/gravitino/actions/runs/10452698362/job/28942732918?pr=4469 This hive bug is only happen in Derby. In our environment , the metastore storage is mysql,so we don't encounter this problem. |
|
@mygrsun Is there any progress? May I take on this? |
thanks,you can do it |
a8e877d to
ebc4b17
Compare
|
I have verified locally that through this PR, the time consumption of listing 1000 tables can be reduced from 2043ms to 14ms. It's ready for review now. @jerryshao |
| }) | ||
| .map(tb -> NameIdentifier.of(namespace, tb.getTableName())) | ||
| .toArray(NameIdentifier[]::new)); | ||
| if (!listAllTables) { |
There was a problem hiding this comment.
Based on the source codes of Paimon and tested locally, I updated the PR and use table param table_type=paimon to filter out Paimon tables.
| clientPool.run( | ||
| c -> | ||
| c.listTableNamesByFilter( | ||
| schemaIdent.name(), icebergAndPaimonFilter, (short) -1)); |
There was a problem hiding this comment.
What's the meaning of (short) -1 here? Can you define a constant and add a comment on it?
…ble list (#4469) ### What changes were proposed in this pull request? the problem of slow acquisition of hive table list. Using listTableNamesByFilter replace the getTableObjectsByName method. ### Why are the changes needed? I found that list-table will takes 300s when a schema has 5000 tables . Fix: #4089 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual testing --------- Co-authored-by: ericqin <ericqin@tencent.com> Co-authored-by: mchades <liminghuang@datastrato.com>
…ive table list (apache#4469) ### What changes were proposed in this pull request? the problem of slow acquisition of hive table list. Using listTableNamesByFilter replace the getTableObjectsByName method. ### Why are the changes needed? I found that list-table will takes 300s when a schema has 5000 tables . Fix: apache#4089 ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? Manual testing --------- Co-authored-by: ericqin <ericqin@tencent.com> Co-authored-by: mchades <liminghuang@datastrato.com>


What changes were proposed in this pull request?
the problem of slow acquisition of hive table list.
Using listTableNamesByFilter replace the getTableObjectsByName method.
Why are the changes needed?
I found that list-table will takes 300s when a schema has 5000 tables .
Fix: #4089
Does this PR introduce any user-facing change?
no
How was this patch tested?
Manual testing