digoal
2024-05-03
PostgreSQL , PolarDB , DuckDB , pivot , 行列转换 , 统计 , internals
pivot 用法参考如下, 本文主要演示DuckDB内部如何转换pivot结果? 帮助更好的理解pivot.
《DuckDB 语法糖: Dynamic PIVOT and UNPIVOT 动态行列转换》
参考: https://duckdb.org/docs/sql/statements/pivot
Cities表数据
Country | Name | Year | Population |
---|---|---|---|
NL | Amsterdam | 2000 | 1005 |
NL | Amsterdam | 2010 | 1065 |
NL | Amsterdam | 2020 | 1158 |
US | Seattle | 2000 | 564 |
US | Seattle | 2010 | 608 |
US | Seattle | 2020 | 738 |
US | New York City | 2000 | 8015 |
US | New York City | 2010 | 8175 |
US | New York City | 2020 | 8772 |
以下pivot 语句的内部转换过程:
PIVOT Cities
ON Year
USING sum(Population);
1、首先需要创建一个枚举, 存储ON字段的distinct值
CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
SELECT DISTINCT
Year::VARCHAR
FROM Cities
ORDER BY
Year
);
2、将ON字段映射到以上临时ENUM枚举类型中
PIVOT Cities
ON Year IN __pivot_enum_0_0
USING sum(Population);
3、使用list表达ON和USING的内容
SELECT Country, Name, list(Year), list(population_sum)
FROM (
-- agg $USING ...$
SELECT Country, Name, Year, sum(population) AS population_sum
FROM Cities
GROUP BY ALL
)
GROUP BY ALL;
Country | Name | list(“YEAR”) | list(population_sum) |
---|---|---|---|
NL | Amsterdam | [2000, 2010, 2020] | [1005, 1065, 1158] |
US | Seattle | [2000, 2010, 2020] | [564, 608, 738] |
US | New York City | [2000, 2010, 2020] | [8015, 8175, 8772] |
4、经过PhysicalPivot将ON Year展开为多个字段, 将population_sum映射到展开字段中
Country | Name | 2000 | 2010 | 2020 |
---|---|---|---|---|
NL | Amsterdam | 1005 | 1065 | 1158 |
US | Seattle | 564 | 608 | 738 |
US | New York City | 8015 | 8175 | 8772 |