Skip to content

Conversation

dengzhhu653
Copy link
Member

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

@Aggarwal-Raghav
Copy link
Contributor

Aggarwal-Raghav commented Jun 24, 2025

@dengzhhu653, I will be marking HIVE-29039 as duplicate of HIVE-29030.
As I have a repro case on my setup, let me test this PR :-)

@Aggarwal-Raghav
Copy link
Contributor

@dengzhhu653 , i ran with this patch. Few points:

  1. Direct SQL is working fine, No "falling back to ORM" log is observed 👍🏻.
  2. There is performance degradation for the same scenario that we ran in HIVE-28972 (alter table add column cascade), instead of taking 70-80 sec, it is taking approx. 550 sec. I suspect the issue is moving from alterPartition => alterPartitions in HIVE-28956. Because without HIVE-28956, it is taking 80 sec approx.

Can you also validate please, in order to rule out setup issues?

Copy link

@dengzhhu653
Copy link
Member Author

@dengzhhu653 , i ran with this patch. Few points:

  1. Direct SQL is working fine, No "falling back to ORM" log is observed 👍🏻.
  2. There is performance degradation for the same scenario that we ran in HIVE-28972 (alter table add column cascade), instead of taking 70-80 sec, it is taking approx. 550 sec. I suspect the issue is moving from alterPartition => alterPartitions in HIVE-28956. Because without HIVE-28956, it is taking 80 sec approx.

Can you also validate please, in order to rule out setup issues?

Thank you. Let me try

@deniskuzZ
Copy link
Member

@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or alterPartitions API?
cc vikramahuja1001

@dengzhhu653
Copy link
Member Author

@Aggarwal-Raghav same to me, it takes me 402.786s. I guess it due to insert the new columns per partition, instead of sharing the same CD_ID the table holds.

@deniskuzZ @Aggarwal-Raghav should we file a new jira to address this issue?

@dengzhhu653
Copy link
Member Author

@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or alterPartitions API? cc vikramahuja1001

The alterPartitions API doesn't share the CD_ID the table holds, for each partition, it creates a new CD_ID with 800+ columns in COLUMNS_V2

@Aggarwal-Raghav
Copy link
Contributor

@Aggarwal-Raghav same to me, it takes me 402.786s. I guess it due to insert the new columns per partition, instead of sharing the same CD_ID the table holds.

@deniskuzZ @Aggarwal-Raghav should we file a new jira to address this issue?

@dengzhhu653 , as code freeze is planned for tomorrow i.e. June 25th, Can we make it in time? I am inclined towards reverting HIVE-28956 and chase it later in master branch? What do you think @deniskuzZ ?

@deniskuzZ
Copy link
Member

@Aggarwal-Raghav same to me, it takes me 402.786s. I guess it due to insert the new columns per partition, instead of sharing the same CD_ID the table holds.
@deniskuzZ @Aggarwal-Raghav should we file a new jira to address this issue?

@dengzhhu653 , as code freeze is planned for tomorrow i.e. June 25th, Can we make it in time? I am inclined towards reverting HIVE-28956 and chase it later in master branch? What do you think @deniskuzZ ?

+1

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 24, 2025

@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or alterPartitions API? cc vikramahuja1001

The alterPartitions API doesn't share the CD_ID the table holds, for each partition, it creates a new CD_ID with 800+ columns in COLUMNS_V2

@dengzhhu653, but performance-wise that shouldn't be any different from using loop over partitions and calling alterPartition API, isn't it? if we manage to add CD_ID reuse, that should improve the performance.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@dengzhhu653
Copy link
Member Author

dengzhhu653 commented Jun 24, 2025

@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or alterPartitions API? cc vikramahuja1001

The alterPartitions API doesn't share the CD_ID the table holds, for each partition, it creates a new CD_ID with 800+ columns in COLUMNS_V2

@dengzhhu653, but performance-wise that shouldn't be any different from using loop over partitions and calling alterPartition API, isn't it? if we manage to add CD_ID reuse, that should improve the performance.

In theory yes, the performance should be similar(or better than) as before we upgrade the DataNucleus.

@deniskuzZ
Copy link
Member

deniskuzZ commented Jun 24, 2025

@Aggarwal-Raghav, @dengzhhu653, do we have an agreement on HIVE-28956 revert? If yes, I can do that.

Also, do we have a benchmark in metastore-tools for alter table add column cascade? Or how do you find the regression post DataNucleus upgrade, private setup ?

@dengzhhu653
Copy link
Member Author

Personally I'm -1 on reverting the already merged PR without strong reason, it's upset for the community. Filed HIVE-29042, I can prepare a fix next day if allow.

@Aggarwal-Raghav
Copy link
Contributor

@Aggarwal-Raghav, @dengzhhu653, do we have an agreement on HIVE-28956 revert? If yes, I can do that.

Also, do we have a benchmark in metastore-tools for alter table add column cascade? Or how do you find the regression post DataNucleus upgrade, private setup ?

======

how do you find the regression post DataNucleus upgrade, private setup ?

@deniskuzZ, I found it because of some prod use case and checked the behaviour in master branch and found the performace issue.

@vikramahuja1001
Copy link
Contributor

vikramahuja1001 commented Jun 24, 2025

@Aggarwal-Raghav , thanks for testing out the patch HIVE-28956 with along with HIVE-28909 and HIVE-28972. When i was working on the fix and tested the same, the 2 patches were not merged.
From the above tests, post HIVE-28972 it seems that the JDO performance is much better than the current DirectSql Implementation.

@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or alterPartitions API? cc vikramahuja1001

@deniskuzZ , in HIVE-28956, i had moved from alterPartition API to alterPartitions API with batching. As is the case currently, alterPartition does not support DirectSql, the degradation is caused by the API change itself and is not related to batching.

@Aggarwal-Raghav, @dengzhhu653, do we have an agreement on HIVE-28956 revert? If yes, I can do that.

Also, do we have a benchmark in metastore-tools for alter table add column cascade? Or how do you find the regression post DataNucleus upgrade, private setup ?

Though reverting the patch HIVE-28956 would definitely improve the performance of alter table add column cascade operation, but it would still not fix the issue of DirectSql being slower than JDO in the alterPartitions API which currently is being used in other flows as well, one of the instance being: alter table rename operation.
This use case of rename table should also potentially suffer from the same issue, based on the analysis in the above comments. HIVE-17969 is doing the similar change of moving from alterPartition API to alterPartitions API in the alter table rename flow. We should also perhaps look and compare the performance of the rename operation as well in JDO vs DirectSql and then take a call.

@deniskuzZ deniskuzZ merged commit be89c46 into apache:master Jun 24, 2025
4 checks passed
@deniskuzZ
Copy link
Member

fixed the directSql perf in #5895

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants