[SPARK-34084][SQL] Fix auto updating of table stats in `ALTER TABLE .. ADD PARTITION`

### What changes were proposed in this pull request?
Fix an issue in `ALTER TABLE .. ADD PARTITION` which happens when:
- A table doesn't have stats
- `spark.sql.statistics.size.autoUpdate.enabled` is `true`

In that case, `ALTER TABLE .. ADD PARTITION` does not update table stats automatically.

### Why are the changes needed?
The changes fix the issue demonstrated by the example:
```sql
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
```
The `add partition` command should update the table stats, but it does not. There are no stats in the output of:
```
spark-sql> describe table extended tbl;
```
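
The branching behind this behavior can be modelled without a Spark session. The sketch below is a simplification of the condition this commit changes, written under stated assumptions: `Action`, `before`, and `after` are hypothetical names rather than Spark APIs, and `DelegateToUpdateTableStats` only marks the point where the patched code calls `CommandUtils.updateTableStats`, without modelling what that call does internally.

```scala
// Simplified, Spark-free model of the stats decision in ALTER TABLE .. ADD PARTITION.
// Every name in this object is illustrative and does not exist in Spark.
object AddPartitionStatsModel {
  sealed trait Action
  case object NoUpdate extends Action                             // stats left untouched
  case object DropStats extends Action                            // stats reset to None
  final case class AddToExisting(newSize: BigInt) extends Action  // incremental update
  case object DelegateToUpdateTableStats extends Action           // hand over to CommandUtils.updateTableStats

  // Decision before the fix: nothing happens at all when the table has no stats.
  def before(existing: Option[BigInt], autoUpdate: Boolean, addedSize: BigInt): Action =
    existing match {
      case Some(size) if autoUpdate =>
        if (addedSize > 0) AddToExisting(size + addedSize) else NoUpdate
      case Some(_) => DropStats
      case None    => NoUpdate
    }

  // Decision after the fix: every case other than "has stats and auto-update is on"
  // is delegated to the shared update routine.
  def after(existing: Option[BigInt], autoUpdate: Boolean, addedSize: BigInt): Action =
    existing match {
      case Some(size) if autoUpdate =>
        if (addedSize > 0) AddToExisting(size + addedSize) else NoUpdate
      case _ => DelegateToUpdateTableStats
    }
}
```

For the scenario in the example (no existing stats, auto-update enabled), `before` yields `NoUpdate`, while `after` yields `DelegateToUpdateTableStats`.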

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, `ALTER TABLE .. ADD PARTITION` updates stats even when a table does not have them before the command:
```sql
spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0	int	NULL
part	int	NULL
# Partition Information
# col_name	data_type	comment
part	int	NULL

# Detailed Table Information
...
Statistics	2 bytes
```

### How was this patch tested?
By running the new unit test and the existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```
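
The new test relies on the `getTableSize` helper raising an `IllegalArgumentException` when the `Statistics` row is absent. As a rough illustration on plain strings (no Spark session), the sketch below mirrors that lookup; `parseTableSize` and `statsRow` are hypothetical names, and only the regex and the two error messages are taken from the diff.

```scala
// Sketch of the stats lookup done by the test helper, using an Option[String]
// in place of the DataFrame returned by DESCRIBE TABLE EXTENDED.
// `parseTableSize` and `statsRow` are hypothetical names used for illustration only.
object TableSizeSketch {
  // Same pattern as in the diff: a greedy prefix, one captured digit, then " bytes".
  private val tableSizeInStats = ".*(\\d) bytes.*".r

  def parseTableSize(tableName: String, statsRow: Option[String]): Int =
    statsRow match {
      case None =>
        // No Statistics row at all: the table has no stats.
        throw new IllegalArgumentException(s"The table $tableName does not have stats")
      case Some(tableSizeInStats(s)) =>
        s.toInt
      case Some(_) =>
        throw new IllegalArgumentException("Not found table size in stats")
    }
}
```

With `Some("2 bytes")` this returns `2`, and with `None` it throws the "does not have stats" error that the test intercepts. Because the capture group is a single `\d` after a greedy `.*`, a multi-digit value such as `1234 bytes` yields only its last digit (`4`) in this sketch.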

Closes apache#31149 from MaxGekk/fix-stats-in-add-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MaxGekk authored and cloud-fan committed Jan 12, 2021
1 parent a4b7075 commit 6c04795
Showing 3 changed files with 32 additions and 13 deletions.
@@ -486,17 +486,17 @@ case class AlterTableAddPartitionCommand(
     }

     sparkSession.catalog.refreshTable(table.identifier.quotedString)
-    if (table.stats.nonEmpty) {
-      if (sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
-        val addedSize = CommandUtils.calculateMultipleLocationSizes(sparkSession, table.identifier,
-          parts.map(_.storage.locationUri)).sum
-        if (addedSize > 0) {
-          val newStats = CatalogStatistics(sizeInBytes = table.stats.get.sizeInBytes + addedSize)
-          catalog.alterTableStats(table.identifier, Some(newStats))
-        }
-      } else {
-        catalog.alterTableStats(table.identifier, None)
-      }
-    }
+    if (table.stats.nonEmpty && sparkSession.sessionState.conf.autoSizeUpdateEnabled) {
+      // Updating table stats only if new partition is not empty
+      val addedSize = CommandUtils.calculateMultipleLocationSizes(sparkSession, table.identifier,
+        parts.map(_.storage.locationUri)).sum
+      if (addedSize > 0) {
+        val newStats = CatalogStatistics(sizeInBytes = table.stats.get.sizeInBytes + addedSize)
+        catalog.alterTableStats(table.identifier, Some(newStats))
+      }
+    } else {
+      // Re-calculating of table size including all partitions
+      CommandUtils.updateTableStats(sparkSession, table)
+    }
     Seq.empty[Row]
   }
@@ -98,10 +98,11 @@ trait DDLCommandTestUtils extends SQLTestUtils {
       sql(s"DESCRIBE TABLE EXTENDED $tableName")
         .select("data_type")
         .where("col_name = 'Statistics'")
-        .first()
-        .getString(0)
+    if (stats.isEmpty) {
+      throw new IllegalArgumentException(s"The table $tableName does not have stats")
+    }
     val tableSizeInStats = ".*(\\d) bytes.*".r
-    val size = stats match {
+    val size = stats.first().getString(0) match {
       case tableSizeInStats(s) => s.toInt
       case _ => throw new IllegalArgumentException("Not found table size in stats")
     }
@@ -23,6 +23,7 @@ import org.apache.commons.io.FileUtils

 import org.apache.spark.sql.{AnalysisException, Row}
 import org.apache.spark.sql.execution.command
+import org.apache.spark.sql.internal.SQLConf

 /**
  * This base suite contains unified tests for the `ALTER TABLE .. ADD PARTITION` command that
@@ -72,6 +73,23 @@ trait AlterTableAddPartitionSuiteBase extends command.AlterTableAddPartitionSuit
       checkAnswer(sql("SELECT * FROM t"), Seq(Row(0, 0), Row(0, 1)))
     }
   }
+
+  test("SPARK-34084: auto update table stats") {
+    withNamespaceAndTable("ns", "tbl") { t =>
+      withSQLConf(SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "false") {
+        sql(s"CREATE TABLE $t (col0 int, part int) $defaultUsing PARTITIONED BY (part)")
+        sql(s"INSERT INTO $t PARTITION (part=0) SELECT 0")
+        val errMsg = intercept[IllegalArgumentException] {
+          getTableSize(t)
+        }.getMessage
+        assert(errMsg.contains(s"The table $t does not have stats"))
+      }
+      withSQLConf(SQLConf.AUTO_SIZE_UPDATE_ENABLED.key -> "true") {
+        sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
+        assert(getTableSize(t) > 0)
+      }
+    }
+  }
 }

 /**