Skip to content

Latest commit

 

History

History
112 lines (77 loc) · 3.45 KB

20240503_01.md

File metadata and controls

112 lines (77 loc) · 3.45 KB

DuckDB pivot 行列转换 internals

作者

digoal

日期

2024-05-03

标签

PostgreSQL , PolarDB , DuckDB , pivot , 行列转换 , 统计 , internals


背景

pivot 用法参考如下, 本文主要演示DuckDB内部如何转换pivot结果? 帮助更好的理解pivot.

《DuckDB pivot 行列转换 太好用了》

《DuckDB 语法糖: Dynamic PIVOT and UNPIVOT 动态行列转换》

参考: https://duckdb.org/docs/sql/statements/pivot

Cities表数据

Country Name Year Population
NL Amsterdam 2000 1005
NL Amsterdam 2010 1065
NL Amsterdam 2020 1158
US Seattle 2000 564
US Seattle 2010 608
US Seattle 2020 738
US New York City 2000 8015
US New York City 2010 8175
US New York City 2020 8772

以下pivot 语句的内部转换过程:

PIVOT Cities  
ON Year  
USING sum(Population);  

1、首先需要创建一个枚举, 存储ON字段的distinct值

CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (  
    SELECT DISTINCT  
        Year::VARCHAR  
    FROM Cities  
    ORDER BY  
        Year  
    );  

2、将ON字段映射到以上临时ENUM枚举类型中

PIVOT Cities  
ON Year IN __pivot_enum_0_0  
USING sum(Population);  

3、使用list表达ON和USING的内容

SELECT Country, Name, list(Year), list(population_sum)  
FROM (  
    -- agg $USING ...$  
    SELECT Country, Name, Year, sum(population) AS population_sum  
    FROM Cities  
    GROUP BY ALL   
)  
GROUP BY ALL;  
Country Name list(“YEAR”) list(population_sum)
NL Amsterdam [2000, 2010, 2020] [1005, 1065, 1158]
US Seattle [2000, 2010, 2020] [564, 608, 738]
US New York City [2000, 2010, 2020] [8015, 8175, 8772]

4、经过PhysicalPivot将ON Year展开为多个字段, 将population_sum映射到展开字段中

Country Name 2000 2010 2020
NL Amsterdam 1005 1065 1158
US Seattle 564 608 738
US New York City 8015 8175 8772

digoal's wechat