[SPARK-21213][SQL][FOLLOWUP] Improve partition statistics in AnalyzePartitionCommand #23584

Closed
wants to merge 2 commits into from

@wangshuo128 wangshuo128 commented Jan 18, 2019

What changes were proposed in this pull request?

This PR proposes to improve partition statistics in AnalyzePartitionCommand:

  1. Restore Spark partition statistics in HiveExternalCatalog.listPartitions and HiveExternalCatalog.listPartitionsByFilter.
  2. Compare with old partition stats instead of old table stats in AnalyzePartitionCommand.

As a result, the partitions listed in AnalyzePartitionCommand carry their Spark statistics, and the command does not update the Hive metastore when the statistics have not changed.

val partitions = sessionState.catalog.listPartitions(tableMeta.identifier, partitionValueSpec)
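To illustrate the first change, here is a minimal sketch of what restoring Spark statistics from a partition's parameters could look like. It is not the code in this patch: `PartitionStats` and `RestoreStats` are hypothetical names, and the two property keys are an assumption about how Spark stores its statistics in the metastore, so check them against HiveExternalCatalog before relying on them.

```scala
// Hypothetical, simplified model of per-partition statistics (not a Spark class).
final case class PartitionStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object RestoreStats {
  // Assumed property keys; verify against HiveExternalCatalog.
  val TotalSizeKey = "spark.sql.statistics.totalSize"
  val NumRowsKey = "spark.sql.statistics.numRows"

  // Rebuilds statistics from the partition parameters.
  // Returns None when no Spark-recorded size is present.
  def restore(parameters: Map[String, String]): Option[PartitionStats] =
    parameters.get(TotalSizeKey).map { size =>
      PartitionStats(BigInt(size), parameters.get(NumRowsKey).map(rows => BigInt(rows)))
    }
}

object RestoreStatsDemo {
  def main(args: Array[String]): Unit = {
    val withStats = Map(
      RestoreStats.TotalSizeKey -> "250",
      RestoreStats.NumRowsKey -> "25")
    println(RestoreStats.restore(withStats))
    println(RestoreStats.restore(Map.empty[String, String]))
  }
}
```

Without a step like this, a listed partition would come back with no Spark stats, so any comparison against "old" partition stats would always see a change.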

How was this patch tested?

Modified existing tests.

… of old table stats in AnalyzePartitionCommand
@wangshuo128
Contributor Author

cc @mbasmanova @cloud-fan @gatorsmile Would you take a look if you have time? :)

@@ -110,7 +110,7 @@ case class AnalyzePartitionCommand(
       val newTotalSize = CommandUtils.calculateLocationSize(
         sessionState, tableMeta.identifier, p.storage.locationUri)
       val newRowCount = rowCounts.get(p.spec)
-      val newStats = CommandUtils.compareAndGetNewStats(tableMeta.stats, newTotalSize, newRowCount)
+      val newStats = CommandUtils.compareAndGetNewStats(p.stats, newTotalSize, newRowCount)
Is there a difference?

@wangshuo128 wangshuo128 Jan 20, 2019

Yes, they are different. The stats in tableMeta are table-level stats, whereas the stats in a partition are partition-level stats.
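
A small self-contained sketch may make the difference concrete. It is a simplified model, not Spark's implementation: `Stats` stands in for the catalog statistics type, and this `compareAndGetNewStats` only mimics the idea of returning new stats when something changed. The sizes and row counts are made-up illustration values.

```scala
// Simplified stand-in for catalog statistics (hypothetical, for illustration only).
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object StatsComparison {
  // Returns Some(updated stats) only when the size or row count differs from `old`.
  // None means nothing changed, so no metastore update is needed.
  def compareAndGetNewStats(
      old: Option[Stats],
      newTotalSize: BigInt,
      newRowCount: Option[BigInt]): Option[Stats] = {
    val oldSize = old.map(_.sizeInBytes).getOrElse(BigInt(-1))
    val oldRows = old.flatMap(_.rowCount).getOrElse(BigInt(-1))
    val sizeChanged = newTotalSize >= 0 && newTotalSize != oldSize
    val rowsChanged = newRowCount.exists(n => n >= 0 && n != oldRows)
    if (!sizeChanged && !rowsChanged) {
      None
    } else {
      val size = if (sizeChanged) newTotalSize else oldSize
      val rows = if (rowsChanged) newRowCount else None
      Some(Stats(size, rows))
    }
  }
}

object StatsComparisonDemo {
  def main(args: Array[String]): Unit = {
    val tableStats = Some(Stats(BigInt(1000), Some(BigInt(100))))   // whole table
    val partitionStats = Some(Stats(BigInt(250), Some(BigInt(25)))) // one partition

    // Re-analyzing an unchanged partition yields the same size and row count.
    val newSize = BigInt(250)
    val newRows = Some(BigInt(25))

    // Against table-level stats the values always look "changed".
    println(StatsComparison.compareAndGetNewStats(tableStats, newSize, newRows))
    // Against partition-level stats nothing changed.
    println(StatsComparison.compareAndGetNewStats(partitionStats, newSize, newRows))
  }
}
```

In this model, comparing against the table-level stats reports an update for an unchanged partition, while comparing against the partition's own stats returns None and the update can be skipped.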

@wangshuo128 wangshuo128 changed the title [SPARK-21213][SQL][FOLLOWUP] Compare with old partition stats instead of old table stats in AnalyzePartitionCommand [SPARK-21213][SQL][FOLLOWUP] Improve partition statistics in AnalyzePartitionCommand Jan 20, 2019
@wangshuo128
Contributor Author

Also cc @wzhfy

@AmplabJenkins

Can one of the admins verify this patch?

@github-actions

github-actions bot commented Jan 2, 2020

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Jan 2, 2020
@github-actions github-actions bot closed this Jan 3, 2020