digoal
2024-05-03
PostgreSQL , PolarDB , DuckDB , unpivot , 行列转换 , 统计
使用DuckDB unpivot语法进行行列转换实在是太好用了.
手册:
更多相关参考:
- 《DuckDB pivot 行列转换 太好用了》
- 《DuckDB pivot 行列转换 internals》
- 《DuckDB 语法糖: Dynamic PIVOT and UNPIVOT 动态行列转换》
- 《DuckDB select 语法糖: exclude, replace columns》
- 《DuckDB 语法糖: Dynamic column selection 支持通配符,exclude,replace,lambda 等动态列选择|值替换|列选择》
语法:
UNPIVOT ⟨dataset⟩
ON ⟨column(s)⟩ -- 支持 动态列 exclude 语法
INTO
NAME ⟨name-column-name⟩
VALUE ⟨value-column-name(s)⟩
ORDER BY ⟨column(s)-with-order-direction(s)⟩
LIMIT ⟨number-of-rows⟩;
例子
示例数据:
CREATE OR REPLACE TABLE monthly_sales
(empid INTEGER, dept TEXT, Jan INTEGER, Feb INTEGER, Mar INTEGER, Apr INTEGER, May INTEGER, Jun INTEGER);
INSERT INTO monthly_sales VALUES
(1, 'electronics', 1, 2, 3, 4, 5, 6),
(2, 'clothes', 10, 20, 30, 40, 50, 60),
(3, 'cars', 100, 200, 300, 400, 500, 600);
数据
empid | dept | Jan | Feb | Mar | Apr | May | Jun |
---|---|---|---|---|---|---|---|
1 | electronics | 1 | 2 | 3 | 4 | 5 | 6 |
2 | clothes | 10 | 20 | 30 | 40 | 50 | 60 |
3 | cars | 100 | 200 | 300 | 400 | 500 | 600 |
sql1:
UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun -- 原表字段名
INTO
NAME month -- 原表“字段名”作为 value 整合进month字段
VALUE sales; -- 原“字段”下对应的原value 写入sales字段
返回
empid | dept | month | sales |
---|---|---|---|
1 | electronics | Jan | 1 |
1 | electronics | Feb | 2 |
1 | electronics | Mar | 3 |
1 | electronics | Apr | 4 |
1 | electronics | May | 5 |
1 | electronics | Jun | 6 |
2 | clothes | Jan | 10 |
2 | clothes | Feb | 20 |
2 | clothes | Mar | 30 |
2 | clothes | Apr | 40 |
2 | clothes | May | 50 |
2 | clothes | Jun | 60 |
3 | cars | Jan | 100 |
3 | cars | Feb | 200 |
3 | cars | Mar | 300 |
3 | cars | Apr | 400 |
3 | cars | May | 500 |
3 | cars | Jun | 600 |
sql2:
UNPIVOT monthly_sales
ON (jan, feb, mar) AS q1, (apr, may, jun) AS q2 -- 原表字段名分成2组, 对应2个"组名"
INTO
NAME quarter -- 原表“字段组名”作为 value 整合进quarter字段
VALUE month_1_sales, month_2_sales, month_3_sales; -- 原“字段组”下对应的3个原value 分别写入3个字段 (month_1_sales, month_2_sales, month_3_sales)
-- ON 分组时, 每个组内的字段个数必须一致
-- ON 分组时, VALUE里面的字段个数 必须等于 ON一个分组内的字段个数
返回
empid | dept | quarter | month_1_sales | month_2_sales | month_3_sales |
---|---|---|---|---|---|
1 | electronics | q1 | 1 | 2 | 3 |
1 | electronics | q2 | 4 | 5 | 6 |
2 | clothes | q1 | 10 | 20 | 30 |
2 | clothes | q2 | 40 | 50 | 60 |
3 | cars | q1 | 100 | 200 | 300 |
3 | cars | q2 | 400 | 500 | 600 |
sql3:
-- unpivot用于alias或cte中
WITH unpivot_alias AS (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept)) -- ON 支持 COLUMNS exclude 语法, 不需要手工写入所有列名
INTO
NAME month
VALUE sales
)
SELECT * FROM unpivot_alias;
-- unpivot用于alias或cte中
SELECT *
FROM (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept)) -- ON 支持 COLUMNS exclude 语法, 不需要手工写入所有列名
INTO
NAME month
VALUE sales
) unpivot_alias;
sql4:
-- unpivot 不仅可以处理表, 也可以处理subquery。
UNPIVOT
(SELECT 42 as col1, 'woot' as col2)
ON
(col1 * 2)::VARCHAR,
col2;
-- 当没有指定INTO name value时, 默认使用列名"name", "value".
返回
name | value |
---|---|
col1 | 84 |
col2 | woot |
DuckDB也支持sql标准unpivot语法:
FROM [dataset]
UNPIVOT [INCLUDE NULLS] (
[value-column-name(s)]
FOR [name-column-name] IN [column(s)]
);
用法举例:
FROM monthly_sales UNPIVOT (
sales
FOR month IN (jan, feb, mar, apr, may, jun)
);
FROM monthly_sales UNPIVOT (
sales
FOR month IN (columns(* EXCLUDE (empid, dept)))
);
FROM monthly_sales
UNPIVOT (
(month_1_sales, month_2_sales, month_3_sales)
FOR quarter IN (
(jan, feb, mar) AS q1,
(apr, may, jun) AS q2
)
);