[SPARK-34084][SQL] Fix auto updating of table stats in `ALTER TABLE .. ADD PARTITION`

Fix an issue in `ALTER TABLE .. ADD PARTITION` which happens when:
- A table doesn't have stats
- `spark.sql.statistics.size.autoUpdate.enabled` is `true`

In that case, `ALTER TABLE .. ADD PARTITION` does not update table stats automatically.

The changes fix the issue demonstrated by the example:
```sql
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
```
The `add partition` command should update the table stats, but it does not. There are no stats in the output of:
```
spark-sql> describe table extended tbl;
```

This is a user-facing change. After the changes, `ALTER TABLE .. ADD PARTITION` updates stats even when the table does not have them before the command:
```sql
spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
part	int	NULL

...
Statistics	2 bytes
```
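The behavior change can be modeled without Spark as a small, self-contained Scala sketch. The names below (`AddPartitionStatsSketch`, `statsBefore`, `statsAfter`) are hypothetical, and sizes are passed in rather than measured on disk. The sketch also assumes that `CommandUtils.updateTableStats` recomputes the total table size when auto update is enabled and clears existing stats otherwise, which matches what the removed `else` branch did.

```scala
// Simplified, hypothetical model of the stats decision in ALTER TABLE .. ADD PARTITION.
// The real command works on CatalogTable/CatalogStatistics and measures sizes on disk.
object AddPartitionStatsSketch {
  // Table size in bytes, or None when the table has no stats.
  type Stats = Option[BigInt]

  // Before the fix: a table without stats is never touched.
  def statsBefore(stats: Stats, autoUpdate: Boolean, addedSize: BigInt): Stats =
    stats match {
      case Some(size) if autoUpdate => Some(size + addedSize) // incremental update
      case Some(_)                  => None                   // auto update off: stats are dropped
      case None                     => None                   // the bug: stats stay empty
    }

  // After the fix: a table without stats gets a full size recalculation.
  // Assumption: the fallback recomputes the total size only when auto update is on.
  def statsAfter(
      stats: Stats,
      autoUpdate: Boolean,
      addedSize: BigInt,
      totalSize: BigInt): Stats =
    (stats, autoUpdate) match {
      case (Some(size), true) => Some(size + addedSize) // incremental update, as before
      case (None, true)       => Some(totalSize)        // new: size of all partitions
      case (_, false)         => None                   // auto update off: no stats
    }

  def main(args: Array[String]): Unit = {
    // Mirrors the example: an empty partition is added to a 2-byte table without stats.
    println(statsBefore(None, autoUpdate = true, addedSize = 0))
    println(statsAfter(None, autoUpdate = true, addedSize = 0, totalSize = 2))
  }
}
```

In this model, the only case where the two functions disagree is a table without stats while auto update is enabled, which is the scenario described in this commit.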

The changes were tested by running the new unit test and the existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes apache#31149 from MaxGekk/fix-stats-in-add-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6c04795)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
MaxGekk committed Jan 12, 2021
1 parent 8dbf500 commit 960f158
Showing 2 changed files with 26 additions and 10 deletions.
@@ -486,17 +486,17 @@ case class AlterTableAddPartitionCommand(
     }
 
     sparkSession.catalog.refreshTable(table.identifier.quotedString)
-    if (table.stats.nonEmpty) {
-      if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
-        val addedSize = CommandUtils.calculateMultipleLocationSizes(sparkSession, table.identifier,
-          parts.map(_.storage.locationUri)).sum
-        if (addedSize > 0) {
-          val newStats = CatalogStatistics(sizeInBytes = table.stats.get.sizeInBytes + addedSize)
-          catalog.alterTableStats(table.identifier, Some(newStats))
-        }
-      } else {
-        catalog.alterTableStats(table.identifier, None)
+    if (table.stats.nonEmpty && sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
+      // Updating table stats only if new partition is not empty
+      val addedSize = CommandUtils.calculateMultipleLocationSizes(sparkSession, table.identifier,
+        parts.map(_.storage.locationUri)).sum
+      if (addedSize > 0) {
+        val newStats = CatalogStatistics(sizeInBytes = table.stats.get.sizeInBytes + addedSize)
+        catalog.alterTableStats(table.identifier, Some(newStats))
       }
+    } else {
+      // Re-calculating of table size including all partitions
+      CommandUtils.updateTableStats(sparkSession, table)
     }
     Seq.empty[Row]
   }
@@ -1554,4 +1554,20 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
       }
     }
   }
+
+  test("SPARK-34084: auto update table stats") {
+    Seq("parquet", "hive").foreach { format =>
+      withTable("t") {
+        withSQLConf(SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "false") {
+          sql(s"CREATE TABLE t (col0 int, part int) USING $format PARTITIONED BY (part)")
+          sql("INSERT INTO t PARTITION (part=0) SELECT 0")
+          assert(getCatalogTable("t").stats.isEmpty)
+        }
+        withSQLConf(SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
+          sql("ALTER TABLE t ADD PARTITION (part=1)")
+          assert(getTableStats("t").sizeInBytes > 0)
+        }
+      }
+    }
+  }
 }
