-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29030: Alter partition change column cannot use the direct sql #5890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dengzhhu653, I will be marking HIVE-29039 as duplicate of HIVE-29030. |
...ore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdatePart.java
Show resolved
Hide resolved
...ore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlUpdatePart.java
Show resolved
Hide resolved
@dengzhhu653 , i ran with this patch. Few points:
Can you also validate please, in order to rule out setup issues? |
|
Thank you. Let me try |
@Aggarwal-Raghav, do we know what change in HIVE-28956 introduced a perf degradation: batching or |
@Aggarwal-Raghav same to me, it takes me 402.786s. I guess it due to insert the new columns per partition, instead of sharing the same CD_ID the table holds. @deniskuzZ @Aggarwal-Raghav should we file a new jira to address this issue? |
The alterPartitions API doesn't share the CD_ID the table holds, for each partition, it creates a new CD_ID with 800+ columns in COLUMNS_V2 |
@dengzhhu653 , as code freeze is planned for tomorrow i.e. June 25th, Can we make it in time? I am inclined towards reverting HIVE-28956 and chase it later in master branch? What do you think @deniskuzZ ? |
+1 |
@dengzhhu653, but performance-wise that shouldn't be any different from using loop over partitions and calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
In theory yes, the performance should be similar(or better than) as before we upgrade the DataNucleus. |
@Aggarwal-Raghav, @dengzhhu653, do we have an agreement on HIVE-28956 revert? If yes, I can do that. Also, do we have a benchmark in metastore-tools for |
Personally I'm -1 on reverting the already merged PR without strong reason, it's upset for the community. Filed HIVE-29042, I can prepare a fix next day if allow. |
======
@deniskuzZ, I found it because of some prod use case and checked the behaviour in master branch and found the performace issue. |
@Aggarwal-Raghav , thanks for testing out the patch HIVE-28956 with along with HIVE-28909 and HIVE-28972. When i was working on the fix and tested the same, the 2 patches were not merged.
@deniskuzZ , in HIVE-28956, i had moved from alterPartition API to alterPartitions API with batching. As is the case currently, alterPartition does not support DirectSql, the degradation is caused by the API change itself and is not related to batching.
Though reverting the patch HIVE-28956 would definitely improve the performance of alter table add column cascade operation, but it would still not fix the issue of DirectSql being slower than JDO in the alterPartitions API which currently is being used in other flows as well, one of the instance being: alter table rename operation. |
fixed the directSql perf in #5895 |
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?