Skip to content

Latest commit

 

History

History
209 lines (165 loc) · 6.22 KB

20240503_03.md

File metadata and controls

209 lines (165 loc) · 6.22 KB

DuckDB unpivot 行列转换 太好用了

作者

digoal

日期

2024-05-03

标签

PostgreSQL , PolarDB , DuckDB , unpivot , 行列转换 , 统计


背景

使用DuckDB unpivot语法进行行列转换实在是太好用了.

手册:

更多相关参考:

语法:

UNPIVOT ⟨dataset⟩  
ON ⟨column(s)⟩   -- 支持 动态列 exclude 语法  
INTO  
    NAME ⟨name-column-name⟩  
    VALUE ⟨value-column-name(s)⟩  
ORDER BY ⟨column(s)-with-order-direction(s)⟩    
LIMIT ⟨number-of-rows⟩;  

例子

示例数据:

CREATE OR REPLACE TABLE monthly_sales  
    (empid INTEGER, dept TEXT, Jan INTEGER, Feb INTEGER, Mar INTEGER, Apr INTEGER, May INTEGER, Jun INTEGER);  
INSERT INTO monthly_sales VALUES  
    (1, 'electronics', 1, 2, 3, 4, 5, 6),  
    (2, 'clothes', 10, 20, 30, 40, 50, 60),  
    (3, 'cars', 100, 200, 300, 400, 500, 600);  

数据

empid dept Jan Feb Mar Apr May Jun
1 electronics 1 2 3 4 5 6
2 clothes 10 20 30 40 50 60
3 cars 100 200 300 400 500 600

sql1:

UNPIVOT monthly_sales    
ON jan, feb, mar, apr, may, jun  -- 原表字段名    
INTO  
    NAME month  -- 原表“字段名”作为 value 整合进month字段   
    VALUE sales;  -- 原“字段”下对应的原value 写入sales字段   

返回

empid dept month sales
1 electronics Jan 1
1 electronics Feb 2
1 electronics Mar 3
1 electronics Apr 4
1 electronics May 5
1 electronics Jun 6
2 clothes Jan 10
2 clothes Feb 20
2 clothes Mar 30
2 clothes Apr 40
2 clothes May 50
2 clothes Jun 60
3 cars Jan 100
3 cars Feb 200
3 cars Mar 300
3 cars Apr 400
3 cars May 500
3 cars Jun 600

sql2:

UNPIVOT monthly_sales  
    ON (jan, feb, mar) AS q1, (apr, may, jun) AS q2   -- 原表字段名分成2组, 对应2个"组名"      
    INTO  
        NAME quarter  -- 原表“字段组名”作为 value 整合进quarter字段    
        VALUE month_1_sales, month_2_sales, month_3_sales;    -- 原“字段组”下对应的3个原value 分别写入3个字段 (month_1_sales, month_2_sales, month_3_sales)     
-- ON 分组时, 每个组内的字段个数必须一致    
-- ON 分组时, VALUE里面的字段个数 必须等于 ON一个分组内的字段个数    

返回

empid dept quarter month_1_sales month_2_sales month_3_sales
1 electronics q1 1 2 3
1 electronics q2 4 5 6
2 clothes q1 10 20 30
2 clothes q2 40 50 60
3 cars q1 100 200 300
3 cars q2 400 500 600

sql3:

-- unpivot用于alias或cte中  
WITH unpivot_alias AS (  
    UNPIVOT monthly_sales  
    ON COLUMNS(* EXCLUDE (empid, dept))   -- ON 支持 COLUMNS exclude 语法, 不需要手工写入所有列名    
    INTO  
        NAME month  
        VALUE sales  
)  
SELECT * FROM unpivot_alias;  
-- unpivot用于alias或cte中  
SELECT *  
FROM (  
    UNPIVOT monthly_sales  
    ON COLUMNS(* EXCLUDE (empid, dept))  -- ON 支持 COLUMNS exclude 语法, 不需要手工写入所有列名    
    INTO  
        NAME month  
        VALUE sales  
) unpivot_alias;  

sql4:

-- unpivot 不仅可以处理表, 也可以处理subquery。  
UNPIVOT  
    (SELECT 42 as col1, 'woot' as col2)  
    ON  
        (col1 * 2)::VARCHAR,    
        col2;    
-- 当没有指定INTO name value时, 默认使用列名"name", "value".  

返回

name value
col1 84
col2 woot

DuckDB也支持sql标准unpivot语法:

FROM [dataset]  
UNPIVOT [INCLUDE NULLS] (  
    [value-column-name(s)]  
    FOR [name-column-name] IN [column(s)]  
);  

用法举例:

FROM monthly_sales UNPIVOT (  
    sales  
    FOR month IN (jan, feb, mar, apr, may, jun)  
);  
  
FROM monthly_sales UNPIVOT (  
    sales  
    FOR month IN (columns(* EXCLUDE (empid, dept)))  
);  
  
FROM monthly_sales  
UNPIVOT (  
    (month_1_sales, month_2_sales, month_3_sales)  
    FOR quarter IN (  
        (jan, feb, mar) AS q1,  
        (apr, may, jun) AS q2  
    )  
);  

digoal's wechat