
@wecharyu
Contributor

@wecharyu commented Mar 19, 2023

What changes were proposed in this pull request?

We want to reuse the dropPartitionsInternal() method in the drop_partition_common API so that it benefits from its better performance. To achieve this, we refactor the RawStore#dropPartition API so that it takes a partition name instead of a list of partition values.
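A minimal sketch of the delegation idea, assuming a toy in-memory store. The class and method names below (`PartitionDropSketch`, `dropPartitions`, `dropPartition`) are hypothetical and are not Hive's RawStore or HMSHandler API. The single-partition drop wraps its name in a one-element list and calls the shared batch path, which plays the role of dropPartitionsInternal().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy store; not Hive's RawStore/ObjectStore API.
public class PartitionDropSketch {
    private final List<String> partitions = new ArrayList<>();
    private int batchCalls = 0; // how many times the shared batch path was invoked

    public PartitionDropSketch(List<String> initial) {
        partitions.addAll(initial);
    }

    // Shared batch path, playing the role of dropPartitionsInternal().
    // All names are validated first, so a failed call leaves the store unchanged.
    public void dropPartitions(List<String> partNames) {
        batchCalls++;
        for (String name : partNames) {
            if (!partitions.contains(name)) {
                throw new IllegalStateException("No such partition: " + name);
            }
        }
        partitions.removeAll(partNames);
    }

    // Single-partition drop: wrap the name in a one-element list and reuse the batch path.
    public void dropPartition(String partName) {
        dropPartitions(Collections.singletonList(partName));
    }

    public List<String> remaining() {
        return Collections.unmodifiableList(partitions);
    }

    public int batchCalls() {
        return batchCalls;
    }

    public static void main(String[] args) {
        PartitionDropSketch store = new PartitionDropSketch(List.of("ds=1", "ds=2"));
        store.dropPartition("ds=1");
        System.out.println(store.remaining() + ", batch calls: " + store.batchCalls());
    }
}
```

With this shape, any optimization made in the batch path (for example, direct SQL) is picked up by the single-partition call without a second implementation to maintain.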

Why are the changes needed?

  1. Support direct SQL for dropping a single partition.
  2. Save one query to the DB: the MPartition query in ObjectStore#dropPartition.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  1. Pass all existing tests.
  2. Add a benchmark test for dropPartition, run with:
java -jar hmsbench-jar-with-dependencies.jar -H localhost --savedata /tmp/benchdata --sanitize -N 10 -N 100 -o bench_results.csv -C -d testbench --params=100 -E 'list.*' -E 'createTable' -E 'dropTable.*' -E 'get.*' -E 'add.*' -E 'renameTable.*' -E 'dropDatabase.*' -E 'openTxn.*' -E 'PartitionManagementTask'

Before using direct SQL:

Operation                      Mean     Med      Min      Max      Err%
dropPartition                  29.60    28.32    27.38    57.81    13.86
dropPartition.10               263.1    272.3    160.7    292.5    12.09
dropPartition.100              2461     2735     1610     4381     22.36
dropPartitions.10              22.97    22.46    21.56    36.05    8.306
dropPartitions.100             26.97    26.35    24.71    35.25    7.569

After using direct SQL:

Operation                      Mean     Med      Min      Max      Err%
dropPartition                  24.76    24.56    20.86    36.23    11.47
dropPartition.10               190.1    165.7    153.5    275.1    23.76
dropPartition.100              2181     2591     1482     4831     28.61
dropPartitions.10              17.71    17.43    15.84    25.37    7.857
dropPartitions.100             34.42    33.10    31.61    58.97    11.98
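The relative change in the mean can be computed from the two tables with a short helper. The class name is illustrative, and the inputs are the Mean column values above; a negative result means the mean went down after the change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchDelta {
    // Relative change of the mean in percent; negative means the mean went down.
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // {before, after} Mean values from the "Before" and "After" benchmark tables.
        Map<String, double[]> means = new LinkedHashMap<>();
        means.put("dropPartition", new double[] {29.60, 24.76});
        means.put("dropPartition.10", new double[] {263.1, 190.1});
        means.put("dropPartition.100", new double[] {2461, 2181});
        means.put("dropPartitions.10", new double[] {22.97, 17.71});
        means.put("dropPartitions.100", new double[] {26.97, 34.42});

        for (Map.Entry<String, double[]> e : means.entrySet()) {
            double[] m = e.getValue();
            System.out.printf("%-20s %+.1f%%%n", e.getKey(), percentChange(m[0], m[1]));
        }
    }
}
```

This gives about -16.4% for dropPartition, -27.7% for dropPartition.10, -11.4% for dropPartition.100, -22.9% for dropPartitions.10 and +27.6% for dropPartitions.100, so the last case has a higher mean after the change.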

@wecharyu
Contributor Author

@deniskuzZ @kasakrisz @saihemanth-cloudera: Could you please review this PR?

@wecharyu
Contributor Author

I have put the benchmark results in the PR description; they show some performance improvement. FYI: @deniskuzZ @saihemanth-cloudera @VenuReddy2103

- if (!ms.dropPartition(catName, db_name, tbl_name, part_vals)) {
+ String partName = Warehouse.makePartName(tbl.getPartitionKeys(), part_vals);
+ if (!ms.dropPartition(catName, db_name, tbl_name, partName)) {
  throw new MetaException("Unable to drop partition");
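The new code builds the partition name with Warehouse.makePartName(tbl.getPartitionKeys(), part_vals) before calling ms.dropPartition. As a rough illustration of the resulting key=value/key=value format, here is a hypothetical, simplified helper. It takes plain key names rather than the table's partition key objects and skips the escaping of special characters that Hive applies, so it is not a drop-in replacement.

```java
import java.util.List;

public class PartNameSketch {
    // Hypothetical, simplified stand-in for Warehouse.makePartName(...):
    // joins "key=value" pairs with '/'. No escaping is done here.
    static String makePartName(List<String> keys, List<String> values) {
        if (keys.size() != values.size()) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(keys.get(i)).append('=').append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints ds=2023-03-19/hr=12
        System.out.println(makePartName(List.of("ds", "hr"), List.of("2023-03-19", "12")));
    }
}
```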

Contributor

Should we include the exception stack trace or the reason for the failure?

Contributor Author

ms.dropPartition() already throws an exception on failure, so we do not need to throw this MetaException here. I will remove this code.
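A minimal sketch of this pattern, assuming hypothetical names (`StoreException` stands in for MetaException; `dropPartition` and `dropPartitionCommon` are illustrative, not Hive's methods). The store signals failure by throwing an exception that carries the reason, so the caller lets it propagate instead of checking a boolean and throwing a second, less informative exception.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DropPropagationSketch {
    // Hypothetical checked exception standing in for MetaException.
    static class StoreException extends Exception {
        StoreException(String message) {
            super(message);
        }
    }

    private final Set<String> partitions = new HashSet<>(List.of("ds=1"));

    // Store layer: reports failure by throwing with the reason,
    // rather than returning false.
    void dropPartition(String partName) throws StoreException {
        if (!partitions.remove(partName)) {
            throw new StoreException("Unable to drop partition " + partName + ": not found");
        }
    }

    // Caller: no boolean check and no second exception; the store's
    // exception (message and stack trace) propagates unchanged.
    void dropPartitionCommon(String partName) throws StoreException {
        dropPartition(partName);
    }

    public static void main(String[] args) {
        DropPropagationSketch handler = new DropPropagationSketch();
        try {
            handler.dropPartitionCommon("ds=1"); // succeeds
            handler.dropPartitionCommon("ds=2"); // fails: not present
        } catch (StoreException e) {
            System.out.println(e.getMessage());
        }
    }
}
```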

Assert.assertEquals(2, numPartitions);

try (AutoCloseable c = deadline()) {
objectStore.dropPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, value1);

Contributor

Ideally, I would like to keep the old tests and add new tests that take "partName" as an argument.

return null;
}),
null);
} catch (TException e) {

Contributor

Why do we need to remove this? I think that, otherwise, the percentage of error records would not be correct.

Contributor Author

I followed benchmarkDropPartitions() and some other benchmarks. Catching the exception here should have nothing to do with the benchmark statistics, because the statistics are computed in the measure() method:

public DescriptiveStatistics measure(@Nullable Runnable pre,
@NotNull Runnable test,
@Nullable Runnable post) {
// Warmup phase

Also, Err% is the coefficient of variation of the successful operations, not the percentage of error records.
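A simplified sketch of both points, assuming a harness shaped like the measure(pre, test, post) signature above. The class name, helper names, and warmup/iteration parameters are hypothetical and this is not the hmsbench code. Only the time spent in `test` is recorded, so an exception caught inside the test body does not change how samples are aggregated. Err% is then computed as the coefficient of variation (standard deviation divided by the mean, times 100). The sketch uses the bias-corrected (n - 1) standard deviation, which is what Apache Commons Math's DescriptiveStatistics#getStandardDeviation returns; that hmsbench computes Err% with exactly this formula is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class MeasureSketch {
    // Hypothetical harness shaped like measure(pre, test, post):
    // warmup runs are discarded, and only the time spent in `test` is timed.
    static List<Double> measure(Runnable pre, Runnable test, Runnable post,
                                int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            if (pre != null) pre.run();
            test.run();
            if (post != null) post.run();
        }
        List<Double> samplesMs = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            if (pre != null) pre.run();
            long start = System.nanoTime();
            test.run();
            samplesMs.add((System.nanoTime() - start) / 1e6);
            if (post != null) post.run();
        }
        return samplesMs;
    }

    // Coefficient of variation in percent: sample standard deviation / mean * 100.
    static double errPercent(List<Double> samples) {
        int n = samples.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= n;
        double sumSq = 0;
        for (double s : samples) sumSq += (s - mean) * (s - mean);
        double sd = Math.sqrt(sumSq / (n - 1)); // bias-corrected (n - 1)
        return sd / mean * 100.0;
    }

    public static void main(String[] args) {
        List<Double> samples = measure(null, () -> {
            try {
                throw new IllegalStateException("simulated failure");
            } catch (IllegalStateException e) {
                // Handled inside the test body; measure() never sees it.
            }
        }, null, 2, 5);
        System.out.println(samples.size() + " samples recorded");
        System.out.println(String.format("%.2f", errPercent(List.of(1.0, 3.0))));
    }
}
```

For example, errPercent(List.of(1.0, 3.0)) returns about 70.71.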

Contributor

@saihemanth-cloudera left a comment

Can you consider the given suggestions?

@wecharyu
Contributor Author

Addressed the comments. cc: @saihemanth-cloudera

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 7 (rating A)

No coverage information
No duplication information

Contributor

@saihemanth-cloudera left a comment

Can you look at the given suggestions?

Contributor

@saihemanth-cloudera left a comment

LGTM +1

Member

@deniskuzZ left a comment

LGTM +1

@deniskuzZ merged commit 7f46b40 into apache:master on Apr 24, 2023
henrib pushed a commit to henrib/hive that referenced this pull request Apr 25, 2023
…eviewed by Sai Hemanth Gantasala, Denys Kuzmenko)

Closes apache#4123
yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
…eviewed by Sai Hemanth Gantasala, Denys Kuzmenko)

Closes apache#4123
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
…eviewed by Sai Hemanth Gantasala, Denys Kuzmenko)

Closes apache#4123

5 participants