Skip to content

[#6570] improvement(core): Optimize fetching entity parent id logic#6574

Merged
jerryshao merged 12 commits intoapache:mainfrom
yuqi1129:issue_6570
Mar 28, 2025
Merged

[#6570] improvement(core): Optimize fetching entity parent id logic#6574
jerryshao merged 12 commits intoapache:mainfrom
yuqi1129:issue_6570

Conversation

@yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Use JOIN to avoid fetching id several times when getting the parents of a fileset/table/topic/model

Why are the changes needed?

It's very time-consuming to retrieve id from database several times.

Fix: #6570

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Existing UTs

@roryqi
Copy link
Contributor

roryqi commented Feb 28, 2025

@lw-yang

Copy link
Contributor

@tengqm tengqm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.
Left a suggestion regarding the several parallel if statements.

@yuqi1129 yuqi1129 closed this Mar 3, 2025
@yuqi1129 yuqi1129 reopened this Mar 3, 2025
@yuqi1129 yuqi1129 self-assigned this Mar 3, 2025
@tengqm
Copy link
Contributor

tengqm commented Mar 4, 2025

This is nice.

@yuqi1129 yuqi1129 closed this Mar 5, 2025
@yuqi1129 yuqi1129 reopened this Mar 5, 2025
@yuqi1129 yuqi1129 requested review from jerryshao, roryqi and tengqm and removed request for tengqm March 8, 2025 02:49
@yuqi1129
Copy link
Contributor Author

@jerqi @jerryshao
Would you be able to help take a look if you have time? Thank you.

@Param("metalakeName") String metalakeName, @Param("catalogName") String catalogName) {
return "SELECT me.metalake_id as metalakeId, ca.catalog_id as catalogId FROM "
+ TABLE_NAME
+ " ca INNER JOIN metalake_meta me ON ca.metalake_id = me.metalake_id"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have confirmed that this join has no impact on internal usage at Xiaomi.

return "SELECT metalake_meta.metalake_id as metalakeId, catalog_meta.catalog_id as catalogId, "
+ " schema_id as schemaId"
+ " FROM metalake_meta"
+ " JOIN catalog_meta ON metalake_meta.metalake_id = catalog_meta.metalake_id"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have confirmed that this join has no impact on internal usage at Xiaomi.

@yuqi1129
Copy link
Contributor Author

@jerqi @jerryshao
Could you please take some time to review it?


public class CatalogIds {
private Long metalakeId;
private Long catalogId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make these class variables final here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

@jerryshao
Copy link
Contributor

Generally LGTM, just some small comments.

@jerryshao jerryshao merged commit 10b2fb0 into apache:main Mar 28, 2025
28 checks passed
LindaSummer pushed a commit to LindaSummer/gravitino that referenced this pull request Mar 30, 2025
…ogic (apache#6574)

### What changes were proposed in this pull request?

Use `JOIN` to avoid fetching id several times when getting the parents
of a fileset/table/topic/model

### Why are the changes needed?

It's very time-consuming to retrieve id from database several times. 

Fix: apache#6570 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Existing UTs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Subtask] Improvment: Optimize method getParentEntityIdByNamespace to reduce the number of accesses to the database

5 participants