diff --git a/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/80-analyze-table.md b/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/80-analyze-table.md deleted file mode 100644 index 6e9f763760..0000000000 --- a/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/80-analyze-table.md +++ /dev/null @@ -1,79 +0,0 @@ ---- -title: ANALYZE TABLE -sidebar_position: 7 ---- - -Computes various statistics for a table. This command does *not* display the results after execution. To show the results, use the function [FUSE_STATISTIC](../../../20-sql-functions/16-system-functions/fuse_statistic.md). - -Databend saves the statistic data of each table as a JSON file named with a UUID (32-character lowercase hexadecimal string) and stores these files in your object storage at the path `/[root]///`. - -## Syntax - -```sql -ANALYZE TABLE [ . ] -``` - -- The command does not identify distinct values by comparing them but by counting the number of storage segments and blocks. This might lead to a significant difference between the estimated results and the actual value, for example, multiple blocks holding the same value. In this case, Databend recommends compacting the storage segments and blocks to merge them as much as possible before you run the estimation. -- The column statistics at the snapshot level may be amplified after execute update/delete/replace statements. You can correct the column statistics by performing analyze statement. - -## Examples - -This example estimates the number of distinct values for each column in a table and shows the results with the function [FUSE_STATISTIC](/sql/sql-functions/system-functions/fuse_statistic): - -```sql -CREATE TABLE sample ( - user_id INT, - name VARCHAR(50), - age INT -); - -INSERT INTO sample (user_id, name, age) VALUES -(1, 'Alice', 30), -(2, 'Bob', 25), -(3, 'Charlie', 35), -(4, 'Diana', 28), -(5, 'Eve', 28); - -SET enable_analyze_histogram = 1; - --- FUSE_STATISTIC will not return any results until you run an estimation with ANALYZE TABLE. -SELECT * FROM FUSE_STATISTIC('default', 'sample'); - -ANALYZE TABLE sample; - -SELECT * FROM FUSE_STATISTIC('default', 'sample'); - -┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ column_name │ distinct_count │ histogram │ -├─────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ -│ age │ 4 │ [bucket id: 0, min: "25", max: "25", ndv: 1.0, count: 1.0], [bucket id: 1, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 2, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 3, min: "30", max: "30", ndv: 1.0, count: 1.0], [bucket id: 4, min: "35", max: "35", ndv: 1.0, count: 1.0] │ -│ user_id │ 5 │ [bucket id: 0, min: "1", max: "1", ndv: 1.0, count: 1.0], [bucket id: 1, min: "2", max: "2", ndv: 1.0, count: 1.0], [bucket id: 2, min: "3", max: "3", ndv: 1.0, count: 1.0], [bucket id: 3, min: "4", max: "4", ndv: 1.0, count: 1.0], [bucket id: 4, min: "5", max: "5", ndv: 1.0, count: 1.0] │ -│ name │ 5 │ [bucket id: 0, min: "Alice", max: "Alice", ndv: 1.0, count: 1.0], [bucket id: 1, min: "Bob", max: "Bob", ndv: 1.0, count: 1.0], [bucket id: 2, min: "Charlie", max: "Charlie", ndv: 1.0, count: 1.0], [bucket id: 3, min: "Diana", max: "Diana", ndv: 1.0, count: 1.0], [bucket id: 4, min: "Eve", max: "Eve", ndv: 1.0, count: 1.0] │ -└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - -INSERT INTO sample (user_id, name, age) VALUES -(6, 'Frank', 40); - --- FUSE_STATISTIC returns results of your last estimation. To get the most recent estimated values, run ANALYZE TABLE again. -SELECT * FROM FUSE_STATISTIC('default', 'sample'); - -┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ column_name │ distinct_count │ histogram │ -├─────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ -│ age │ 4 │ [bucket id: 0, min: "25", max: "25", ndv: 1.0, count: 1.0], [bucket id: 1, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 2, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 3, min: "30", max: "30", ndv: 1.0, count: 1.0], [bucket id: 4, min: "35", max: "35", ndv: 1.0, count: 1.0] │ -│ user_id │ 5 │ [bucket id: 0, min: "1", max: "1", ndv: 1.0, count: 1.0], [bucket id: 1, min: "2", max: "2", ndv: 1.0, count: 1.0], [bucket id: 2, min: "3", max: "3", ndv: 1.0, count: 1.0], [bucket id: 3, min: "4", max: "4", ndv: 1.0, count: 1.0], [bucket id: 4, min: "5", max: "5", ndv: 1.0, count: 1.0] │ -│ name │ 5 │ [bucket id: 0, min: "Alice", max: "Alice", ndv: 1.0, count: 1.0], [bucket id: 1, min: "Bob", max: "Bob", ndv: 1.0, count: 1.0], [bucket id: 2, min: "Charlie", max: "Charlie", ndv: 1.0, count: 1.0], [bucket id: 3, min: "Diana", max: "Diana", ndv: 1.0, count: 1.0], [bucket id: 4, min: "Eve", max: "Eve", ndv: 1.0, count: 1.0] │ -└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - -ANALYZE TABLE sample; - -SELECT * FROM FUSE_STATISTIC('default', 'sample'); - -┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ column_name │ distinct_count │ histogram │ -├─────────────┼────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ -│ name │ 6 │ [bucket id: 0, min: "Alice", max: "Alice", ndv: 1.0, count: 1.0], [bucket id: 1, min: "Bob", max: "Bob", ndv: 1.0, count: 1.0], [bucket id: 2, min: "Charlie", max: "Charlie", ndv: 1.0, count: 1.0], [bucket id: 3, min: "Diana", max: "Diana", ndv: 1.0, count: 1.0], [bucket id: 4, min: "Eve", max: "Eve", ndv: 1.0, count: 1.0], [bucket id: 5, min: "Frank", max: "Frank", ndv: 1.0, count: 1.0] │ -│ age │ 5 │ [bucket id: 0, min: "25", max: "25", ndv: 1.0, count: 1.0], [bucket id: 1, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 2, min: "28", max: "28", ndv: 1.0, count: 1.0], [bucket id: 3, min: "30", max: "30", ndv: 1.0, count: 1.0], [bucket id: 4, min: "35", max: "35", ndv: 1.0, count: 1.0], [bucket id: 5, min: "40", max: "40", ndv: 1.0, count: 1.0] │ -│ user_id │ 6 │ [bucket id: 0, min: "1", max: "1", ndv: 1.0, count: 1.0], [bucket id: 1, min: "2", max: "2", ndv: 1.0, count: 1.0], [bucket id: 2, min: "3", max: "3", ndv: 1.0, count: 1.0], [bucket id: 3, min: "4", max: "4", ndv: 1.0, count: 1.0], [bucket id: 4, min: "5", max: "5", ndv: 1.0, count: 1.0], [bucket id: 5, min: "6", max: "6", ndv: 1.0, count: 1.0] │ -└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ -``` \ No newline at end of file diff --git a/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/index.md b/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/index.md index b68f6ede28..1114fa9a59 100644 --- a/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/index.md +++ b/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/index.md @@ -49,7 +49,6 @@ This page provides a comprehensive overview of table operations in Databend, org | Command | Description | |---------|-------------| -| [ANALYZE TABLE](80-analyze-table.md) | Calculates table statistics to improve query performance | | [OPTIMIZE TABLE](60-optimize-table.md) | Compacts or purges historical data to save storage space and enhance query performance | | [SET CLUSTER KEY](../06-clusterkey/dml-set-cluster-key.md) | Configures a cluster key to enhance query performance for large tables | diff --git a/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/show-statistics.md b/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/show-statistics.md new file mode 100644 index 0000000000..50e0fa7c9a --- /dev/null +++ b/docs/en/sql-reference/10-sql-commands/00-ddl/01-table/show-statistics.md @@ -0,0 +1,88 @@ +--- +title: SHOW STATISTICS +sidebar_position: 15 +--- +import FunctionDescription from '@site/src/components/FunctionDescription'; + + + +Displays statistical information about tables and their columns. Statistics help the query optimizer make better decisions about query execution plans by providing information about data distribution, row counts, and distinct values. + +Databend automatically generates statistics during data insertion. You can use this command to inspect the statistics and compare them with actual data to identify any discrepancies that might affect query performance. + +## Syntax + +```sql +SHOW STATISTICS [ FROM DATABASE | FROM TABLE . ] +``` + +| Parameter | Description | +|-----------|-----------------------------------------------------------------------------------------------------------------------------| +| FROM DATABASE | Shows statistics for all tables in the specified database. | +| FROM TABLE | Shows statistics for the specified table only. | + +If no parameter is specified, the command returns statistics for all tables in the current database. + +## Output Columns + +The command returns the following columns for each column in each table: + +| Column | Description | +|--------|-----------------------------------------------------------------------------------------------------------------------------| +| database | The database name. | +| table | The table name. | +| column_name | The column name. | +| stats_row_count | The accumulated number of rows considered in statistics. Since stats are updated on inserts but not decremented on deletes, this number can be **greater than** actual_row_count. | +| actual_row_count | The actual number of rows in the table under the current snapshot. | +| distinct_count | Estimated number of distinct values (NDV), computed from HyperLogLog. | +| null_count | Number of NULL values in the column. | +| avg_size | Average size in bytes of each value in the column. | + +## Examples + +### Show Statistics for Current Database + +```sql +CREATE DATABASE test_db; +USE test_db; + +CREATE TABLE t1 (id INT, name VARCHAR(50)); +INSERT INTO t1 VALUES (1, 'Alice'), (2, 'Bob'); + +SHOW STATISTICS; +``` + +Output: +``` +database table column_name stats_row_count actual_row_count distinct_count null_count avg_size +test_db t1 id 2 2 2 0 4 +test_db t1 name 2 2 2 0 16 +``` + +### Show Statistics for a Specific Table + +```sql +CREATE TABLE t2 (age INT, city VARCHAR(50)); +INSERT INTO t2 VALUES (25, 'New York'), (30, 'London'); + +SHOW STATISTICS FROM TABLE test_db.t2; +``` + +Output: +``` +database table column_name stats_row_count actual_row_count distinct_count null_count avg_size +test_db t2 age 2 2 2 0 4 +test_db t2 city 2 2 2 0 19 +``` + +### Show Statistics for All Tables in a Database + +```sql +SHOW STATISTICS FROM DATABASE test_db; +``` + +This will show statistics for all tables (`t1` and `t2`) in the `test_db` database. + +## Related Commands + +- [SHOW TABLE STATUS](show-table-status.md): Shows status information about tables diff --git a/docs/en/sql-reference/20-sql-functions/16-system-functions/fuse_statistic.md b/docs/en/sql-reference/20-sql-functions/16-system-functions/fuse_statistic.md index e30d4b4f38..31dbb43a72 100644 --- a/docs/en/sql-reference/20-sql-functions/16-system-functions/fuse_statistic.md +++ b/docs/en/sql-reference/20-sql-functions/16-system-functions/fuse_statistic.md @@ -5,28 +5,6 @@ import FunctionDescription from '@site/src/components/FunctionDescription'; -Returns statistical information about each column in a specified table: - -- `distinct_count`: Returns the estimated number of distinct values. -- `histogram`: Generates histograms for the column, breaking down the distribution of data into buckets. Each bucket includes information such as: - - `bucket id`: The identifier for the bucket. - - `min`: The minimum value within the bucket. - - `max`: The maximum value within the bucket. - - `ndv` (number of distinct values): The count of unique values within the bucket. - - `count`: The total number of values within the bucket. - -## Syntax - -```sql -FUSE_STATISTIC('', '') -``` - -The `enable_analyze_histogram` setting must be set to `1` for the function to generate histograms. By default, this setting is `0`, meaning histograms are not generated unless explicitly enabled. - -```sql -SET enable_analyze_histogram = 1; -``` - -## Examples - -You're most likely to use this function together with [ANALYZE TABLE](/sql/sql-commands/ddl/table/analyze-table) to check the statistical information of a table. See the [Examples](/sql/sql-commands/ddl/table/analyze-table#examples) section on the [ANALYZE TABLE](/sql/sql-commands/ddl/table/analyze-table) page. \ No newline at end of file +:::note +This function is deprecated. Use [SHOW STATISTICS](/sql/sql-commands/ddl/table/show-statistics) instead to view table statistics. +::: \ No newline at end of file