[SPARK-20881][SQL] Clearly document the mechanism to choose between two sources of statistics

## What changes were proposed in this pull request?

We now have two sources of statistics: Spark's stats and Hive's stats. Spark's stats are generated by running the "analyze" command in Spark. Once they are available, we respect them over Hive's.

This PR documents, in the related code, the mechanism for choosing between these two sources of stats.

## How was this patch tested?

Not related.

Author: Zhenhua Wang <wzh_zju@163.com>

Closes #18105 from wzhfy/cboSwitchStats.
wzhfy authored and gatorsmile committed May 28, 2017
1 parent 24d3428 commit 9d0db5a
Showing 2 changed files with 5 additions and 1 deletion.
@@ -681,9 +681,11 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
}
}

- // construct Spark's statistics from information in Hive metastore
+ // Restore Spark's statistics from information in Metastore.
val statsProps = table.properties.filterKeys(_.startsWith(STATISTICS_PREFIX))

+ // Currently we have two sources of statistics: one from Hive and the other from Spark.
+ // In our design, if Spark's statistics is available, we respect it over Hive's statistics.
if (statsProps.nonEmpty) {
val colStats = new mutable.HashMap[String, ColumnStat]
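
The precedence rule documented in this hunk can be illustrated with a small standalone sketch. It is not the code in `HiveExternalCatalog`: `TableStats` is a hypothetical, simplified stand-in for the catalog's statistics type, and the prefix value and the `totalSize` / `numRows` key suffixes are assumptions, since the diff does not show how `STATISTICS_PREFIX` is defined or which keys sit under it.

```scala
object SparkStatsOverride {
  // Assumed value of STATISTICS_PREFIX; the diff does not show its definition.
  val StatisticsPrefix = "spark.sql.statistics."

  // Hypothetical, simplified stand-in for the catalog's statistics type.
  final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

  /** Returns Spark's statistics if the table properties carry them, else Hive's. */
  def chooseStats(
      properties: Map[String, String],
      hiveStats: Option[TableStats]): Option[TableStats] = {
    // Keep only the properties written under the Spark statistics prefix.
    val statsProps = properties.filter { case (key, _) => key.startsWith(StatisticsPrefix) }

    // Key suffixes below are assumptions made for this illustration only.
    val sparkStats = statsProps.get(StatisticsPrefix + "totalSize").map { size =>
      TableStats(
        sizeInBytes = BigInt(size),
        rowCount = statsProps.get(StatisticsPrefix + "numRows").map(BigInt(_)))
    }

    // Spark's statistics win whenever they are present; Hive's are the fallback.
    sparkStats.orElse(hiveStats)
  }
}
```

With this shape, a table that was never analyzed in Spark has no matching properties, so the function falls through to whatever Hive reported.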

@@ -434,6 +434,8 @@ private[hive] class HiveClientImpl(
}
val comment = properties.get("comment")

+ // Here we are reading statistics from Hive.
+ // Note that this statistics could be overridden by Spark's statistics if that's available.
val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)).filter(_ >= 0)
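
For readers without the surrounding class at hand, the Hive-side read above can be sketched on a plain `Map[String, String]`. This is an illustration under assumptions, not the `HiveClientImpl` code: the literal keys stand in for `StatsSetupConst.TOTAL_SIZE`, `RAW_DATA_SIZE` and `ROW_COUNT` and are assumed values, and `HiveSideStats` is a hypothetical container.

```scala
object HiveStatsParsing {
  // Assumed literal keys standing in for the StatsSetupConst constants.
  val TotalSizeKey = "totalSize"
  val RawDataSizeKey = "rawDataSize"
  val RowCountKey = "numRows"

  // Hypothetical container for the three values read in the hunk above.
  final case class HiveSideStats(
      totalSize: Option[BigInt],
      rawDataSize: Option[BigInt],
      rowCount: Option[BigInt])

  def read(properties: Map[String, String]): HiveSideStats =
    HiveSideStats(
      totalSize = properties.get(TotalSizeKey).map(BigInt(_)),
      rawDataSize = properties.get(RawDataSizeKey).map(BigInt(_)),
      // As in the diff, a negative row count is treated as "not available".
      rowCount = properties.get(RowCountKey).map(BigInt(_)).filter(_ >= 0))
}
```

Each field is an `Option`, so a missing property and a negative row count both surface as `None` rather than as a misleading number.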