Skip to content

Latest commit

 

History

History
127 lines (38 loc) · 5.03 KB

20200412_03.md

File metadata and controls

127 lines (38 loc) · 5.03 KB

一款兼容PostgreSQL协议的时序数据库 - QuestDB

作者

digoal

日期

2020-04-12

标签

PostgreSQL , QuestDB , 时序数据库 , vops


背景

What is QuestDB

QuestDB is an open-source NewSQL relational database designed to process time-series data, faster. Our approach comes from low-latency trading; QuestDB’s stack is engineered from scratch, zero-GC Java and dependency-free.

QuestDB ingests data via HTTP, PostgreSQL wire protocol, Influx line protocol or directly from Java. Reading data is done using SQL via HTTP, PostgreSQL wire protocol or via Java API. The whole database and console fits in a 3.5Mb package.

https://github.com/questdb/questdb

https://www.questdb.io/blog/2020/04/02/using-simd-to-aggregate-billions-of-rows-per-second

Summary

We have implemented SIMD-based vector execution of queries, such as select sum(value) from table. This is ~100x faster than non-vector based execution. This is just the beginning as we will introduce vectors to more operations going forward. Try our first implementation in this release - stay tuned for more features in the upcoming releases!

questdb 支持simd向量指令, 以及瓦片结构, 所以快.

PostgreSQL 同样有类似的插件: 与之类似的PG插件有vops.

questdb 宣传的对比数据中, PG是原生的, 并没有使用vops, 所以看起来questdb比pg快很多, 实际上pg使用vops后, 性能会大幅度提升.

《PostgreSQL VOPS 向量计算 + DBLINK异步并行 - 单实例 10亿 聚合计算跑进2秒》

《PostgreSQL 向量化执行插件(瓦片式实现-vops) 10x提速OLAP》

https://github.com/postgrespro/vops

Vertical storage
Traditional query executor (like Postgres executor) deals with single row of data at each moment of time. If it has to evaluate expression (x+y) then it fetches value of "x", then value of "y", performs operation "+" and returns the result value to the upper node. In contrast vectorized executor is able to process in one operation multiple values. In this case "x" and "y" represent not just a single scalar value, but vector of values and result is also a vector of values. In vector execution model interpretation and function call overhead is divided by size of vector. The price of performing function call is the same, but as far as function proceeds N values instead of just one, this overhead become less critical.

What is the optimal size for the vector? From the explanation above it is clear that the larger vector is, the less per-row overhead we have. So we can form vector from all values of the correspondent table attribute. It is so called vertical data model or columnar store. Unlike classical "horizontal" data model where the unit of storing data is row (tuple), here we have vertical columns. Columnar store has the following main advantages:

Reduce size of fetched data: only columns used in query need to be fetched
Better compression: storing all values of the same attribute together makes it possible to much better and faster compress them, for example using delta encoding.
Minimize interpretation overhead: each operation is perform not for single value, but for set of values
Use CPU vector instruction (SIMD) to process data

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

digoal's wechat