Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34084][SQL] Fix auto updating of table stats in ALTER TABLE .. ADD PARTITION #31149

Closed
wants to merge 2 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 12, 2021

What changes were proposed in this pull request?

Fix an issue in ALTER TABLE .. ADD PARTITION which happens when:

  • A table doesn't have stats
  • spark.sql.statistics.size.autoUpdate.enabled is true

In that case, ALTER TABLE .. ADD PARTITION does not update table stats automatically.

Why are the changes needed?

The changes fix the issue demonstrated by the example:

spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);

the add partition command should update table stats but it does not. There are no stats in the output of:

spark-sql> describe table extended tbl;

Does this PR introduce any user-facing change?

Yes. After the changes, ALTER TABLE .. ADD PARTITION updates stats even when a table doesn't have stats before the command:

spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
# Partition Information
# col_name	data_type	comment
part	int	NULL

# Detailed Table Information
...
Statistics	2 bytes

How was this patch tested?

By running new UT and existing test suites:

$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"

@github-actions github-actions bot added the SQL label Jan 12, 2021
@SparkQA
Copy link

SparkQA commented Jan 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38552/

@SparkQA
Copy link

SparkQA commented Jan 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38552/

@cloud-fan
Copy link
Contributor

thanks, merging to master! Please open backport PRs for 3.1/3.0

@cloud-fan cloud-fan closed this in 6c04795 Jan 12, 2021
@SparkQA
Copy link

SparkQA commented Jan 12, 2021

Test build #133965 has finished for PR 31149 at commit df1301d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk added a commit to MaxGekk/spark that referenced this pull request Jan 12, 2021
…. ADD PARTITION`

Fix an issue in `ALTER TABLE .. ADD PARTITION` which happens when:
- A table doesn't have stats
- `spark.sql.statistics.size.autoUpdate.enabled` is `true`

In that case, `ALTER TABLE .. ADD PARTITION` does not update table stats automatically.

The changes fix the issue demonstrated by the example:
```sql
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
```
the `add partition` command should update table stats but it does not. There is no stats in the output of:
```
spark-sql> describe table extended tbl;
```

Yes. After the changes, `ALTER TABLE .. ADD PARTITION` updates stats even when a table does have them before the command:
```sql
spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
part	int	NULL

...
Statistics	2 bytes
```

By running new UT and existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes apache#31149 from MaxGekk/fix-stats-in-add-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6c04795)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Jan 12, 2021
…. ADD PARTITION`

Fix an issue in `ALTER TABLE .. ADD PARTITION` which happens when:
- A table doesn't have stats
- `spark.sql.statistics.size.autoUpdate.enabled` is `true`

In that case, `ALTER TABLE .. ADD PARTITION` does not update table stats automatically.

The changes fix the issue demonstrated by the example:
```sql
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
```
the `add partition` command should update table stats but it does not. There is no stats in the output of:
```
spark-sql> describe table extended tbl;
```

Yes. After the changes, `ALTER TABLE .. ADD PARTITION` updates stats even when a table does have them before the command:
```sql
spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
part	int	NULL

...
Statistics	2 bytes
```

By running new UT and existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes apache#31149 from MaxGekk/fix-stats-in-add-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6c04795)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 960f158)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants