digoal
2023-03-29
PostgreSQL , PolarDB , databend , rust , olap , data lake , archiving
Databend: a data lake product written in Rust, compatible with MySQL and ClickHouse
About
- A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com
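Architecture diagram:
At its core it is likewise Parquet/Arrow-style columnar storage, combined with vectorized computation, metadata management, and object storage. See: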
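Which scenarios suit products built on this architecture? (Time series, IoT, feeds?, analytics, data archiving)
- Few modifications and mostly appends (especially append-only workloads)
- Heavy analytical computation
- A need for compressed storage to reduce storage cost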
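Source code
Official website
Benchmark comparison of a large number of big-data products:
This benchmark shows that lightweight analytical engines such as DuckDB now far outperform traditional OLAP products, thanks to vectorization and columnar storage.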
Databend is an open-source Elastic and Workload-Aware modern cloud data warehouse focusing on Low-Cost and Low-Complexity for your massive-scale analytics needs.
Databend uses the latest techniques in vectorized query processing to allow you to do blazing-fast data analytics on object storage (S3, Azure Blob, Google Cloud Storage, Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS, Cloudflare R2, Wasabi or MinIO).
- Feature-Rich
  Support for atomic operations including SELECT/INSERT/DELETE/UPDATE/REPLACE/COPY/ALTER and advanced features like Time Travel and Multi Catalog (Apache Hive/Apache Iceberg).
- Instant Elasticity
  Databend completely separates storage from compute, which allows you to easily scale up or down based on your application's needs.
- Blazing Performance
  Databend leverages data-level parallelism (Vectorized Query Execution) and instruction-level parallelism (SIMD) technology, offering blazing-fast data analytics.
- Git-like MVCC Storage
  Databend stores data with snapshots, enabling users to effortlessly query, clone, or restore data from any point in its history.
- Support for Semi-Structured Data
  Databend supports ingestion of semi-structured data in formats such as CSV, JSON, and Parquet, located in the cloud or on your local file system. Databend also supports semi-structured data types (ARRAY, TUPLE, MAP, JSON), which makes semi-structured data easy to import and operate on.
- MySQL/ClickHouse Compatible
  Databend is ANSI SQL compliant and MySQL/ClickHouse wire protocol compatible, making it easy to connect with existing tools (MySQL Client, ClickHouse HTTP Handler, Vector, DBeaver, Jupyter, JDBC, etc.).
- Easy to Use
  Databend has no indexes to build, no manual tuning, and no manual partitioning or sharding to figure out; it's all done for you as data is loaded into the table.
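The "Git-like MVCC Storage" item can be pictured with a toy model. The sketch below is purely conceptual and is not Databend's implementation: the class `SnapshotTable` and its methods are hypothetical names, and it only illustrates the idea that each write publishes a new immutable snapshot, so older snapshots remain queryable and restorable.

```python
class SnapshotTable:
    """Toy append-only table: every write creates a new immutable snapshot."""

    def __init__(self):
        self.blocks = {}       # block id -> immutable tuple of rows
        self.snapshots = [()]  # snapshot id -> tuple of block ids (0 = empty table)

    def append(self, rows):
        """Write rows as a new block and return the new snapshot id."""
        block_id = len(self.blocks)
        self.blocks[block_id] = tuple(rows)
        # Earlier snapshots are never modified, so they stay queryable.
        self.snapshots.append(self.snapshots[-1] + (block_id,))
        return len(self.snapshots) - 1

    def scan(self, snapshot_id=-1):
        """Return all rows visible at the given snapshot (default: latest)."""
        return [row for b in self.snapshots[snapshot_id] for row in self.blocks[b]]

    def restore(self, snapshot_id):
        """Make an old snapshot current again by re-publishing its block list."""
        self.snapshots.append(self.snapshots[snapshot_id])
        return len(self.snapshots) - 1


table = SnapshotTable()
s1 = table.append([(1, "test")])
s2 = table.append([(2, "more")])
print(table.scan(s1))  # rows as of the first write
print(table.scan())    # rows at the latest snapshot
```

Because blocks are never rewritten in this model, "restoring" is only a metadata operation: a new snapshot entry pointing at an old block list.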
Pull the Docker image
docker pull datafuselabs/databend
docker run -d -it --cap-add=SYS_PTRACE --privileged=true --name databend datafuselabs/databend
docker exec -it -u root --privileged -w /root databend /bin/bash
Check the configuration
root@3b7d98544288:~# ps -ewf|grep bend
root 11 7 0 09:38 pts/0 00:00:00 databend-meta --log-file-dir /var/log/databend --log-stderr-level WARN --raft-dir /var/lib/databend/meta --single
root 31 7 0 09:38 pts/0 00:00:00 databend-query -c /etc/databend/query.toml
root@3b7d98544288:~# more /etc/databend/query.toml
[query]
max_active_sessions = 256
wait_timeout_mills = 5000
flight_api_address = "0.0.0.0:9090"
admin_api_address = "0.0.0.0:8080"
metric_api_address = "0.0.0.0:7070"
mysql_handler_host = "0.0.0.0"
mysql_handler_port = 3307
clickhouse_http_handler_host = "0.0.0.0"
clickhouse_http_handler_port = 8124
http_handler_host = "0.0.0.0"
http_handler_port = 8000
tenant_id = "default"
cluster_id = "default"
[log]
[log.stderr]
level = "WARN"
format = "text"
[log.file]
level = "INFO"
dir = "/var/log/databend"
[meta]
endpoints = ["0.0.0.0:9191"]
username = "root"
password = "root"
client_timeout_in_second = 60
[storage]
type = "fs"
[storage.fs]
data_path = "/var/lib/databend/query"
Install the MySQL client
apt update
apt install -y mysql-client
Connect to Databend and run a simple test. Analytics-focused products like this are impressively fast.
root@3b7d98544288:~# mysql -h 127.0.0.1 -P 3307 -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.26-v1.0.38-nightly-ef01c31da3a7cf38fa715b36e428baf135a43bdc(rust-1.70.0-nightly-2023-03-28T23:34:59.012648491Z) 0
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+----------------------+
| databases_in_default |
+----------------------+
| default |
| information_schema |
| system |
+----------------------+
3 rows in set (0.01 sec)
Read 3 rows, 116.00 B in 0.006 sec., 485.85 rows/sec., 18.35 KiB/sec.
mysql> create table tbl (id int, info varchar(512), ts timestamp);
Query OK, 0 rows affected (0.03 sec)
mysql> insert into tbl values (1, 'test', now());
Query OK, 1 row affected (0.03 sec)
mysql> select * from tbl;
+------+------+----------------------------+
| id | info | ts |
+------+------+----------------------------+
| 1 | test | 2023-03-29 09:44:25.137973 |
+------+------+----------------------------+
1 row in set (0.02 sec)
Read 1 rows, 32.00 B in 0.005 sec., 190.37 rows/sec., 5.95 KiB/sec.
mysql> insert into tbl select * from tbl;
Query OK, 1 row affected (0.03 sec)
........
mysql> insert into tbl select * from tbl;
Query OK, 4194304 rows affected (1.67 sec)
mysql>
mysql> select count(*) from tbl;
+----------+
| count(*) |
+----------+
| 8388608 |
+----------+
1 row in set (0.01 sec)
Read 1 rows, 1.00 B in 0.002 sec., 467.25 rows/sec., 467.25 B/sec.
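The row counts in this session follow from repeated doubling: every `insert into tbl select * from tbl` appends a full copy of the table. A small sketch of that arithmetic (the function name is illustrative only) shows how many self-inserts take the table from 1 row to the 8388608 rows reported above:

```python
def rows_after_doublings(start_rows, target_rows):
    """Count how many self-inserts are needed to reach target_rows."""
    rows, runs = start_rows, 0
    while rows < target_rows:
        rows += rows  # each self-insert appends a full copy of the table
        runs += 1
    return runs, rows


runs, rows = rows_after_doublings(1, 8_388_608)
print(runs, rows)  # 23 8388608
print(rows // 2)   # 4194304, the rows added by the final insert
```

This matches the session: the last insert reports 4194304 affected rows, and 2^23 = 8388608 is the final count.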
mysql> select count(distinct id) from tbl;
+---------+
| count() |
+---------+
| 1 |
+---------+
1 row in set (0.03 sec)
Read 8388608 rows, 32.00 MiB in 0.020 sec., 416.58 million rows/sec., 1.55 GiB/sec.
mysql> select count(distinct ts) from tbl;
+---------+
| count() |
+---------+
| 1 |
+---------+
1 row in set (0.03 sec)
Read 8388608 rows, 64.00 MiB in 0.022 sec., 389.65 million rows/sec., 2.90 GiB/sec.
mysql> select count(distinct info,ts) from tbl;
+---------+
| count() |
+---------+
| 1 |
+---------+
1 row in set (0.17 sec)
Read 8388608 rows, 160.03 MiB in 0.163 sec., 51.48 million rows/sec., 982.18 MiB/sec.
mysql> select count(distinct info) from tbl;
+---------+
| count() |
+---------+
| 1 |
+---------+
1 row in set (0.06 sec)
Read 8388608 rows, 96.03 MiB in 0.041 sec., 203.36 million rows/sec., 2.27 GiB/sec.
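The "Read ..." lines printed after each query can be cross-checked with simple arithmetic. The sketch below recomputes bytes per row and scan rate from the displayed figures. The displayed times are rounded to milliseconds, so the recomputed rates only approximate the rates the server printed. The bytes-per-row values are consistent with a 4-byte `id` and an 8-byte `ts`, though that reading is an inference from the numbers rather than something the output states.

```python
MIB = 1024 * 1024
ROWS = 8_388_608

# (query, MiB read, seconds) copied from the "Read ..." lines above.
stats = [
    ("count(distinct id)", 32.00, 0.020),
    ("count(distinct ts)", 64.00, 0.022),
    ("count(distinct info)", 96.03, 0.041),
    ("count(distinct info,ts)", 160.03, 0.163),
]


def summarize(mib_read, seconds, rows=ROWS):
    """Return (bytes per row, million rows per second) for one scan."""
    bytes_per_row = mib_read * MIB / rows
    mrows_per_sec = rows / seconds / 1e6
    return bytes_per_row, mrows_per_sec


for query, mib_read, seconds in stats:
    bpr, rate = summarize(mib_read, seconds)
    print(f"{query:<26} {bpr:6.2f} B/row  {rate:7.1f} M rows/s")
```

Note that the two-column query reads 160.03 MiB, which equals the sum of the single-column scans of `info` (96.03 MiB) and `ts` (64.00 MiB): only the referenced columns are read.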
root@3b7d98544288:/var/lib/databend/query/1/21/_b# pwd
/var/lib/databend/query/1/21/_b
root@3b7d98544288:/var/lib/databend/query/1/21/_b# ll
total 33960
drwxr-xr-x 2 root root 4096 Mar 29 09:44 ./
drwxr-xr-x 6 root root 4096 Mar 29 09:44 ../
-rw-r--r-- 1 root root 67975 Mar 29 09:44 03bdfe985396474cb6904c0945181f29_v2.parquet
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 08be5903e5774c95a5b61ad4f476c9a7_v2.parquet
-rw-r--r-- 1 root root 723 Mar 29 09:44 0c8133a131274ddfa442de88fb296960_v2.parquet
-rw-r--r-- 1 root root 137346 Mar 29 09:44 0ed5e3e893304ff0915b0c6270519944_v2.parquet
-rw-r--r-- 1 root root 244716 Mar 29 09:44 1542ae041a1e4386ba89cddf12aebb2b_v2.parquet
-rw-r--r-- 1 root root 1134763 Mar 29 09:44 1ff9e7fa1e144601b7937d59ff224189_v2.parquet
-rw-r--r-- 1 root root 4624 Mar 29 09:44 225eafad06254f4b967f2d72fc5134bf_v2.parquet
-rw-r--r-- 1 root root 139969 Mar 29 09:44 22911aaa2a0f4eb49ca63c60c9d9dc70_v2.parquet
-rw-r--r-- 1 root root 594 Mar 29 09:44 230c05bce2fe4b39be70ee42e5cdcc05_v2.parquet
-rw-r--r-- 1 root root 153782 Mar 29 09:44 28be67fda2524aecbfd53389b79827b9_v2.parquet
-rw-r--r-- 1 root root 530 Mar 29 09:44 2d4729b1db04475b96cb2a87adb7ba9f_v2.parquet
-rw-r--r-- 1 root root 3376939 Mar 29 09:44 2f4d83ca31984fb18023c3c5e06c90a2_v2.parquet
-rw-r--r-- 1 root root 1085083 Mar 29 09:44 3392ae6ad45b4166bcfa1448acdbafcf_v2.parquet
-rw-r--r-- 1 root root 131415 Mar 29 09:44 354f7d4c178d4f5ba6628c54ba57db8b_v2.parquet
-rw-r--r-- 1 root root 585062 Mar 29 09:44 3aca1ea18ee64a0fa7234d5dcb7bbb6d_v2.parquet
-rw-r--r-- 1 root root 51045 Mar 29 09:44 3ad3e36d9e6742babc151cc0bc35fff2_v2.parquet
-rw-r--r-- 1 root root 592224 Mar 29 09:44 3dba43b942b948a08ee8b26eb7bd5b2c_v2.parquet
-rw-r--r-- 1 root root 283639 Mar 29 09:44 4506e092c57b41ecbf5e70d4e69f478d_v2.parquet
-rw-r--r-- 1 root root 505372 Mar 29 09:44 45a5a10c28d9493483b8e790b1884bd9_v2.parquet
-rw-r--r-- 1 root root 2219429 Mar 29 09:44 492ae1b3331a42c8815fe4879ace0ccd_v2.parquet
-rw-r--r-- 1 root root 176009 Mar 29 09:44 4e221b7d2d6149a69d56b27b7ae014ba_v2.parquet
-rw-r--r-- 1 root root 480 Mar 29 09:44 595226aae4214648a655c8c4f2f3bd9c_v2.parquet
-rw-r--r-- 1 root root 984 Mar 29 09:44 6372e55d5a5d40869ed6aacde0c1b7ee_v2.parquet
-rw-r--r-- 1 root root 443 Mar 29 09:44 6c650844a2f04c62b0b20964feaffc23_v2.parquet
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 6e9668fbd09c408bbcc6ffbee3758ab8_v2.parquet
-rw-r--r-- 1 root root 85040 Mar 29 09:44 721336ec901f49b494deec22125ec8a8_v2.parquet
-rw-r--r-- 1 root root 535415 Mar 29 09:44 7d921e5a6554475ea829cf88d0dc67cd_v2.parquet
-rw-r--r-- 1 root root 826220 Mar 29 09:44 85e0ada468d04e49a57a8156b78309a2_v2.parquet
-rw-r--r-- 1 root root 199063 Mar 29 09:44 90eb4f1ce431410db2806e799a0d722a_v2.parquet
-rw-r--r-- 1 root root 1020924 Mar 29 09:44 94b5ed15887c4424a6fd165b2b42ea1a_v2.parquet
-rw-r--r-- 1 root root 34151 Mar 29 09:44 95697682712d4cd4afd02b554c8cae41_v2.parquet
-rw-r--r-- 1 root root 2543 Mar 29 09:44 988e107feba64f988c3b149cfc11dcf3_v2.parquet
-rw-r--r-- 1 root root 540472 Mar 29 09:44 a4ec3f5f63b54238a4d2be2846290d2e_v2.parquet
-rw-r--r-- 1 root root 1504 Mar 29 09:44 b50aec3dbb4f4590a0741070156f94e3_v2.parquet
-rw-r--r-- 1 root root 17239 Mar 29 09:44 b77e68170ee94bafa8306d1ec836a324_v2.parquet
-rw-r--r-- 1 root root 1996080 Mar 29 09:44 ba21829944884b808bfe8bb44120b810_v2.parquet
-rw-r--r-- 1 root root 301749 Mar 29 09:44 bcdbccc5bd45446f958cd72098ee24e2_v2.parquet
-rw-r--r-- 1 root root 13913 Mar 29 09:44 bdeae0db0c3b430d8803533cb32fa17a_v2.parquet
-rw-r--r-- 1 root root 8783 Mar 29 09:44 c028cf23ae954319841474156008c7eb_v2.parquet
-rw-r--r-- 1 root root 1090284 Mar 29 09:44 ca06a9e6500c41948e7031d16d4dc5e5_v2.parquet
-rw-r--r-- 1 root root 2207744 Mar 29 09:44 cffcaef705ce401d8e73862bbb30e0d5_v2.parquet
-rw-r--r-- 1 root root 2237402 Mar 29 09:44 d3a444f4e60341fbbd1e6ac05a778985_v2.parquet
-rw-r--r-- 1 root root 443 Mar 29 09:44 dbb00d85e1504563b4f2e14773033f3e_v2.parquet
-rw-r--r-- 1 root root 464 Mar 29 09:44 ed3f7a56aeaa4ee6b0d0fbd4d3657329_v2.parquet
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 f19ad3f2a479414c8218c3e37ea3fa7f_v2.parquet
-rw-r--r-- 1 root root 253704 Mar 29 09:44 fa7ed6080d1047549c38baed9e329ef9_v2.parquet
-rw-r--r-- 1 root root 498 Mar 29 09:44 fecf7f712e3545299aacf02890883947_v2.parquet