HIVE-25800: Improvement in loadDynamicPartitions() to not load all partitions from HMS for managed table #2868

Closed
wants to merge 3 commits into from

sourabh912
Contributor

What changes were proposed in this pull request?

HIVE-20661 improved the loadDynamicPartitions() API in Hive.java so that partitions are no longer added
to HMS one by one. To do that, it fetches all existing partitions of the table from HMS and compares them
with the dynamic partition list to decide which partitions are old and which are new, then adds the new
ones to HMS in batches. The call that fetches all partitions introduced a performance regression for
tables with a large number of partitions (on the order of 100K).

This was fixed for external tables in HIVE-25178. For ACID tables, however, HIVE-25187 is still open.
Until HIVE-25187 has an appropriate fix, we can skip fetching all partitions. Instead, in the thread pool
that loads each partition individually, call getPartition() to check whether the partition already
exists in HMS. This adds one getPartition() call for every partition loaded dynamically, but it no longer
fetches all existing partitions of the table.
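
The trade-off described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `FakeMetastore`, `partitionExists()`, and the way new partitions are collected and added at the end are hypothetical stand-ins chosen for illustration, and the real HMS client API is deliberately not used. The sketch only shows the idea of replacing one "fetch all partitions" call with one point lookup per dynamic partition, issued from the worker threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicPartitionLoadSketch {

    /** Hypothetical in-memory stand-in for the metastore; not the real HMS client. */
    static class FakeMetastore {
        private final Set<String> partitions = ConcurrentHashMap.newKeySet();
        final AtomicInteger pointLookups = new AtomicInteger();

        FakeMetastore(List<String> existing) {
            partitions.addAll(existing);
        }

        // Point lookup for one partition, playing the role that getPartition() plays in the description.
        boolean partitionExists(String name) {
            pointLookups.incrementAndGet();
            return partitions.contains(name);
        }

        void addPartitions(List<String> newPartitions) {
            partitions.addAll(newPartitions);
        }
    }

    /** Checks each dynamic partition individually in a thread pool and returns the ones that were new. */
    static List<String> loadDynamicPartitions(FakeMetastore hms, List<String> dynamicPartitions,
                                              int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> checks = new ArrayList<>();
            for (String partition : dynamicPartitions) {
                Callable<Boolean> check = () -> hms.partitionExists(partition);
                checks.add(pool.submit(check));
            }
            // Collect results in submission order so the output is deterministic.
            List<String> newPartitions = new ArrayList<>();
            for (int i = 0; i < checks.size(); i++) {
                if (!checks.get(i).get()) {
                    newPartitions.add(dynamicPartitions.get(i));
                }
            }
            hms.addPartitions(newPartitions);
            return newPartitions;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeMetastore hms = new FakeMetastore(Arrays.asList("ds=2021-12-01", "ds=2021-12-02"));
        List<String> added = loadDynamicPartitions(
            hms, Arrays.asList("ds=2021-12-02", "ds=2021-12-03"), 4);
        System.out.println(added);                   // prints [ds=2021-12-03]
        System.out.println(hms.pointLookups.get());  // prints 2
    }
}
```

In this sketch the number of lookups grows with the number of partitions being loaded, not with the number of partitions already in the table, which is the property the description relies on for tables with around 100K existing partitions.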

Why are the changes needed?

Does this PR introduce any user-facing change?

No

How was this patch tested?

This is an improvement to existing logic, so it relies on existing tests.

…rtitions from HMS for managed table

Change-Id: I1308d51d56d77aae2c8378e153d002b1d13f7cc1
@sourabh912
Contributor Author

@lcspinter @rbalamohan @nrg4878 : Please review and provide your feedback.

Sourabh Goyal added 2 commits December 14, 2021 10:39
…rtitions from HMS for managed table

Change-Id: I1308d51d56d77aae2c8378e153d002b1d13f7cc1
Change-Id: I9b76082fd8846ad390da2bc3218b151f11d9e394
@sourabh912
Contributor Author

@kgyrtkirk : Please review and provide your feedback.

@sourabh912
Contributor Author

The following test is failing and it seems unrelated to the patch. I have triggered the run again:

org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Caught exception while initializing the SqlSerDe)
 at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1301)
 at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1306)
 at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:140)
 at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
 at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
 at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
 at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
 at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:365)
 at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:338)
 at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:249)
 at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:110)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:164)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:230)
 at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:256)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:353)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:726)
 at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:696)
 at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:114)
 at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
 at org.apache.hadoop.hive.cli.split8.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)

@nrg4878
Contributor

nrg4878 commented Dec 20, 2021

Fix has been merged. Please close the PR

@sourabh912 sourabh912 closed this Jan 4, 2022
@sourabh912
Contributor Author

Thank you @lcspinter for the approval and @nrg4878 for merging it. Commit merged to master: 63c6d8b

4 participants