Skip to content

Latest commit

 

History

History
346 lines (237 loc) · 15.5 KB

20230329_01.md

File metadata and controls

346 lines (237 loc) · 15.5 KB

一款兼容mysql,clickhouse 使用rust写的数据湖产品databend(号称开源版snowflake) - 适合"时序、IoT、feed?、分析、数据归档"等场景

作者

digoal

日期

2023-03-29

标签

PostgreSQL , PolarDB , databend , rust , olap , 数据湖 , 归档


背景

一款兼容mysql,clickhouse 使用rust写的数据湖产品databend

About

  • A modern cloud data warehouse focusing on reducing cost and complexity for your massive-scale analytics needs. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com

架构如图:

pic

核心也是parquet, arrow这种列存储+向量计算+元数据管理+对象存储, 参见:

基于这种架构的产品适合什么场景? (时序、IoT、feed?、分析、数据归档)

  • 改动少, 追加多(特别是appendonly)
  • 分析计算场景多
  • 需要压缩存储, 节省存储成本

源码

官网

大量大数据产品 benchmark 对比:

从这份benchmark可以看出, duckdb这种轻量化的分析引擎性能已经远超传统olap产品. 得益于向量化和列存储.

What is Databend?

Databend is an open-source Elastic and Workload-Aware modern cloud data warehouse focusing on Low-Cost and Low-Complexity for your massive-scale analytics needs.

Databend uses the latest techniques in vectorized query processing to allow you to do blazing-fast data analytics on object storage:
(S3, Azure Blob, Google Cloud Storage, Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS, Cloudflare R2, Wasabi or MinIO).

  • Feature-Rich

    Support for atomic operations including SELECT/INSERT/DELETE/UPDATE/REPLACE/COPY/ALTER and advanced features like Time Travel, Multi Catalog(Apache Hive/Apache Iceberg).

  • Instant Elasticity

    Databend completely separates storage from compute, which allows you easily scale up or scale down based on your application's needs.

  • Blazing Performance

    Databend leverages data-level parallelism(Vectorized Query Execution) and instruction-level parallelism(SIMD) technology, offering blazing performance data analytics.

  • Git-like MVCC Storage

    Databend stores data with snapshots, enabling users to effortlessly query, clone, or restore data from any history timepoint.

  • Support for Semi-Structured Data

    Databend supports ingestion of semi-structured data in various formats like CSV, JSON, and Parquet, which are located in the cloud or your local file system; Databend also supports semi-structured data types: ARRAY, TUPLE, MAP, JSON, which is easy to import and operate on semi-structured.

  • MySQL/ClickHouse Compatible

    Databend is ANSI SQL compliant and MySQL/ClickHouse wire protocol compatible, making it easy to connect with existing tools(MySQL Client, ClickHouse HTTP Handler, Vector, DBeaver, Jupyter, JDBC, etc.).

  • Easy to Use

    Databend has no indexes to build, no manual tuning required, no manual figuring out partitions or shard data, it’s all done for you as data is loaded into the table.

Architecture

databend-arch

试用 databend

下载docker

docker pull datafuselabs/databend    
    
    
docker run -d -it --cap-add=SYS_PTRACE --privileged=true --name databend datafuselabs/databend    
    
docker exec -it -u root --privileged -w /root databend /bin/bash    

查看配置

root@3b7d98544288:~# ps -ewf|grep bend    
root        11     7  0 09:38 pts/0    00:00:00 databend-meta --log-file-dir /var/log/databend --log-stderr-level WARN --raft-dir /var/lib/databend/meta --single    
root        31     7  0 09:38 pts/0    00:00:00 databend-query -c /etc/databend/query.toml    
    
    
    
root@3b7d98544288:~# more /etc/databend/query.toml    
[query]    
max_active_sessions = 256    
wait_timeout_mills = 5000    
    
flight_api_address = "0.0.0.0:9090"    
admin_api_address = "0.0.0.0:8080"    
metric_api_address = "0.0.0.0:7070"    
    
mysql_handler_host = "0.0.0.0"    
mysql_handler_port = 3307    
    
clickhouse_http_handler_host = "0.0.0.0"    
clickhouse_http_handler_port = 8124    
    
http_handler_host = "0.0.0.0"    
http_handler_port = 8000    
    
tenant_id = "default"    
cluster_id = "default"    
    
[log]    
    
[log.stderr]    
level = "WARN"    
format = "text"    
    
[log.file]    
level = "INFO"    
dir = "/var/log/databend"    
    
[meta]    
endpoints = ["0.0.0.0:9191"]    
username = "root"    
password = "root"    
client_timeout_in_second = 60    
[storage]    
type = "fs"    
[storage.fs]    
data_path = "/var/lib/databend/query"    

安装mysql客户端

apt update    
apt install -y mysql-client    

连接databend进行简单测试, 这类主打分析的产品性能真的很爆.

root@3b7d98544288:~# mysql -h 127.0.0.1 -P 3307 -u root         
Welcome to the MySQL monitor.  Commands end with ; or \g.    
Your MySQL connection id is 13    
Server version: 8.0.26-v1.0.38-nightly-ef01c31da3a7cf38fa715b36e428baf135a43bdc(rust-1.70.0-nightly-2023-03-28T23:34:59.012648491Z) 0    
    
Copyright (c) 2000, 2023, Oracle and/or its affiliates.    
    
Oracle is a registered trademark of Oracle Corporation and/or its    
affiliates. Other names may be trademarks of their respective    
owners.    
    
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.    
    
mysql> show databases;    
+----------------------+    
| databases_in_default |    
+----------------------+    
| default              |    
| information_schema   |    
| system               |    
+----------------------+    
3 rows in set (0.01 sec)    
Read 3 rows, 116.00 B in 0.006 sec., 485.85 rows/sec., 18.35 KiB/sec.    
    
    
    
mysql> create table tbl (id int, info varchar(512), ts timestamp);    
Query OK, 0 rows affected (0.03 sec)    
    
mysql> insert into tbl values (1, 'test', now());    
Query OK, 1 row affected (0.03 sec)    
    
mysql> select * from tbl;    
+------+------+----------------------------+    
| id   | info | ts                         |    
+------+------+----------------------------+    
|    1 | test | 2023-03-29 09:44:25.137973 |    
+------+------+----------------------------+    
1 row in set (0.02 sec)    
Read 1 rows, 32.00 B in 0.005 sec., 190.37 rows/sec., 5.95 KiB/sec.    
    
mysql> insert into tbl select * from tbl;    
Query OK, 1 row affected (0.03 sec)    
    
........    
    
mysql> insert into tbl select * from tbl;    
Query OK, 4194304 rows affected (1.67 sec)    
    
mysql>     
mysql> select count(*) from tbl;    
+----------+    
| count(*) |    
+----------+    
|  8388608 |    
+----------+    
1 row in set (0.01 sec)    
Read 1 rows, 1.00 B in 0.002 sec., 467.25 rows/sec., 467.25 B/sec.    
    
mysql> select count(distinct id) from tbl;    
+---------+    
| count() |    
+---------+    
|       1 |    
+---------+    
1 row in set (0.03 sec)    
Read 8388608 rows, 32.00 MiB in 0.020 sec., 416.58 million rows/sec., 1.55 GiB/sec.    
    
mysql> select count(distinct ts) from tbl;    
+---------+    
| count() |    
+---------+    
|       1 |    
+---------+    
1 row in set (0.03 sec)    
Read 8388608 rows, 64.00 MiB in 0.022 sec., 389.65 million rows/sec., 2.90 GiB/sec.    
    
mysql> select count(distinct info,ts) from tbl;    
+---------+    
| count() |    
+---------+    
|       1 |    
+---------+    
1 row in set (0.17 sec)    
Read 8388608 rows, 160.03 MiB in 0.163 sec., 51.48 million rows/sec., 982.18 MiB/sec.    
    
mysql> select count(distinct info) from tbl;    
+---------+    
| count() |    
+---------+    
|       1 |    
+---------+    
1 row in set (0.06 sec)    
Read 8388608 rows, 96.03 MiB in 0.041 sec., 203.36 million rows/sec., 2.27 GiB/sec.    
    
    
    
    
root@3b7d98544288:/var/lib/databend/query/1/21/_b# pwd    
/var/lib/databend/query/1/21/_b    
    
    
root@3b7d98544288:/var/lib/databend/query/1/21/_b# ll    
total 33960    
drwxr-xr-x 2 root root    4096 Mar 29 09:44 ./    
drwxr-xr-x 6 root root    4096 Mar 29 09:44 ../    
-rw-r--r-- 1 root root   67975 Mar 29 09:44 03bdfe985396474cb6904c0945181f29_v2.parquet    
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 08be5903e5774c95a5b61ad4f476c9a7_v2.parquet    
-rw-r--r-- 1 root root     723 Mar 29 09:44 0c8133a131274ddfa442de88fb296960_v2.parquet    
-rw-r--r-- 1 root root  137346 Mar 29 09:44 0ed5e3e893304ff0915b0c6270519944_v2.parquet    
-rw-r--r-- 1 root root  244716 Mar 29 09:44 1542ae041a1e4386ba89cddf12aebb2b_v2.parquet    
-rw-r--r-- 1 root root 1134763 Mar 29 09:44 1ff9e7fa1e144601b7937d59ff224189_v2.parquet    
-rw-r--r-- 1 root root    4624 Mar 29 09:44 225eafad06254f4b967f2d72fc5134bf_v2.parquet    
-rw-r--r-- 1 root root  139969 Mar 29 09:44 22911aaa2a0f4eb49ca63c60c9d9dc70_v2.parquet    
-rw-r--r-- 1 root root     594 Mar 29 09:44 230c05bce2fe4b39be70ee42e5cdcc05_v2.parquet    
-rw-r--r-- 1 root root  153782 Mar 29 09:44 28be67fda2524aecbfd53389b79827b9_v2.parquet    
-rw-r--r-- 1 root root     530 Mar 29 09:44 2d4729b1db04475b96cb2a87adb7ba9f_v2.parquet    
-rw-r--r-- 1 root root 3376939 Mar 29 09:44 2f4d83ca31984fb18023c3c5e06c90a2_v2.parquet    
-rw-r--r-- 1 root root 1085083 Mar 29 09:44 3392ae6ad45b4166bcfa1448acdbafcf_v2.parquet    
-rw-r--r-- 1 root root  131415 Mar 29 09:44 354f7d4c178d4f5ba6628c54ba57db8b_v2.parquet    
-rw-r--r-- 1 root root  585062 Mar 29 09:44 3aca1ea18ee64a0fa7234d5dcb7bbb6d_v2.parquet    
-rw-r--r-- 1 root root   51045 Mar 29 09:44 3ad3e36d9e6742babc151cc0bc35fff2_v2.parquet    
-rw-r--r-- 1 root root  592224 Mar 29 09:44 3dba43b942b948a08ee8b26eb7bd5b2c_v2.parquet    
-rw-r--r-- 1 root root  283639 Mar 29 09:44 4506e092c57b41ecbf5e70d4e69f478d_v2.parquet    
-rw-r--r-- 1 root root  505372 Mar 29 09:44 45a5a10c28d9493483b8e790b1884bd9_v2.parquet    
-rw-r--r-- 1 root root 2219429 Mar 29 09:44 492ae1b3331a42c8815fe4879ace0ccd_v2.parquet    
-rw-r--r-- 1 root root  176009 Mar 29 09:44 4e221b7d2d6149a69d56b27b7ae014ba_v2.parquet    
-rw-r--r-- 1 root root     480 Mar 29 09:44 595226aae4214648a655c8c4f2f3bd9c_v2.parquet    
-rw-r--r-- 1 root root     984 Mar 29 09:44 6372e55d5a5d40869ed6aacde0c1b7ee_v2.parquet    
-rw-r--r-- 1 root root     443 Mar 29 09:44 6c650844a2f04c62b0b20964feaffc23_v2.parquet    
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 6e9668fbd09c408bbcc6ffbee3758ab8_v2.parquet    
-rw-r--r-- 1 root root   85040 Mar 29 09:44 721336ec901f49b494deec22125ec8a8_v2.parquet    
-rw-r--r-- 1 root root  535415 Mar 29 09:44 7d921e5a6554475ea829cf88d0dc67cd_v2.parquet    
-rw-r--r-- 1 root root  826220 Mar 29 09:44 85e0ada468d04e49a57a8156b78309a2_v2.parquet    
-rw-r--r-- 1 root root  199063 Mar 29 09:44 90eb4f1ce431410db2806e799a0d722a_v2.parquet    
-rw-r--r-- 1 root root 1020924 Mar 29 09:44 94b5ed15887c4424a6fd165b2b42ea1a_v2.parquet    
-rw-r--r-- 1 root root   34151 Mar 29 09:44 95697682712d4cd4afd02b554c8cae41_v2.parquet    
-rw-r--r-- 1 root root    2543 Mar 29 09:44 988e107feba64f988c3b149cfc11dcf3_v2.parquet    
-rw-r--r-- 1 root root  540472 Mar 29 09:44 a4ec3f5f63b54238a4d2be2846290d2e_v2.parquet    
-rw-r--r-- 1 root root    1504 Mar 29 09:44 b50aec3dbb4f4590a0741070156f94e3_v2.parquet    
-rw-r--r-- 1 root root   17239 Mar 29 09:44 b77e68170ee94bafa8306d1ec836a324_v2.parquet    
-rw-r--r-- 1 root root 1996080 Mar 29 09:44 ba21829944884b808bfe8bb44120b810_v2.parquet    
-rw-r--r-- 1 root root  301749 Mar 29 09:44 bcdbccc5bd45446f958cd72098ee24e2_v2.parquet    
-rw-r--r-- 1 root root   13913 Mar 29 09:44 bdeae0db0c3b430d8803533cb32fa17a_v2.parquet    
-rw-r--r-- 1 root root    8783 Mar 29 09:44 c028cf23ae954319841474156008c7eb_v2.parquet    
-rw-r--r-- 1 root root 1090284 Mar 29 09:44 ca06a9e6500c41948e7031d16d4dc5e5_v2.parquet    
-rw-r--r-- 1 root root 2207744 Mar 29 09:44 cffcaef705ce401d8e73862bbb30e0d5_v2.parquet    
-rw-r--r-- 1 root root 2237402 Mar 29 09:44 d3a444f4e60341fbbd1e6ac05a778985_v2.parquet    
-rw-r--r-- 1 root root     443 Mar 29 09:44 dbb00d85e1504563b4f2e14773033f3e_v2.parquet    
-rw-r--r-- 1 root root     464 Mar 29 09:44 ed3f7a56aeaa4ee6b0d0fbd4d3657329_v2.parquet    
-rw-r--r-- 1 root root 4129331 Mar 29 09:44 f19ad3f2a479414c8218c3e37ea3fa7f_v2.parquet    
-rw-r--r-- 1 root root  253704 Mar 29 09:44 fa7ed6080d1047549c38baed9e329ef9_v2.parquet    
-rw-r--r-- 1 root root     498 Mar 29 09:44 fecf7f712e3545299aacf02890883947_v2.parquet    

digoal's wechat