add sybase iq and carbon data query performance comparison doc chinese doc to carbondata
apache · Dec 20, 2019 · d555a5f · d555a5f
1 parent fdcfcbf
commit d555a5f
Showing 1 changed file with 105 additions and 0 deletions.
diff --git a/docs/zh_cn/SybaseIQ和CarbonData查询性能对比.md b/docs/zh_cn/SybaseIQ和CarbonData查询性能对比.md
@@ -0,0 +1,105 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more 
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership. 
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with 
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing, software 
+    distributed under the License is distributed on an "AS IS" BASIS, 
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and 
+    limitations under the License.
+-->
+
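+## Query Performance Comparison: Replacing Sybase IQ with CarbonData
+
+This document shows how CarbonData's query performance compares with Sybase IQ's when CarbonData replaces Sybase IQ, along with CarbonData's own strengths and characteristics. The data comes solely from SQL queries modeled on the query patterns of one particular domain, so it represents the performance comparison only under those specific query patterns.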
+
+
+
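+## 1. Cluster configuration comparison
+
+| Cluster        | Description                                                                           |
+| -------------- | ------------------------------------------------------------------------------------- |
+| IQ cluster     | 1 load node, 1 coordinator node, 1 query node; SSD disks, disk array                  |
+| Hadoop cluster | 2 NameNodes, 6 DataNodes; SATA disks; the query queue is allocated 1/6 of the resources |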
+
+## 2. Query SQL models
+
+The query SQL for IQ and Carbon differs, so the SQL must be adapted before running the performance tests.
+
+```IQ query SQL model:```
+
+SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * 8 / 72000 / 1024 AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * 8 / 72000 / 1024 AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * 8 / 72000 / 1024 AS COLUMN_H , MT."202080101" AS "202080101", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."202080101" AS "202080101" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "202050101" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "202080101" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_NAME" END AS NAME_202080101 FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."202050101" WHERE TABLE_A.NAME_202080101 IS NOT NULL AND "TIME" < 1576087200 AND "TIME" >= 1576015200 GROUP BY TABLE_A."202080101" ) MT GROUP BY MT."202080101" ORDER BY COLUMN_C DESC
+
+Each SUM in the query is referred to below as one counter.
+
+```Spark query SQL model:```
+
+SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * 8 / 72000 / 1024 AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * 8 / 72000 / 1024 AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * 8 / 72000 / 1024 AS COLUMN_H , MT.`202080101` AS `202080101` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`202080101` AS `202080101` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `202050101` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `202080101` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_202080101 FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`202050101` WHERE TABLE_A.NAME_202080101 IS NOT NULL AND `TIME` >= 1576015200 AND `TIME` < 1576087200 ) MT GROUP BY MT.`202080101` ORDER BY COLUMN_C DESC LIMIT 5000
+
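+## 3. Main Carbon configuration parameters
+
+```Main configuration```
+
+| Main Carbon configuration            | Value  | Description                                                  |
+| ------------------------------------ | ------ | ------------------------------------------------------------ |
+| carbon.inmemory.record.size          | 480000 | Total number of rows per table to load into memory for a query. |
+| carbon.number.of.cores               | 4      | Number of threads that scan in parallel during Carbon queries. |
+| carbon.number.of.cores.while.loading | 15     | Number of threads that run in parallel during Carbon data loading. |
+| carbon.sort.file.buffer.size         | 20     | Total buffer size used for each temporary intermediate file during merge-sort (read/write) operations, in MB. |
+| carbon.sort.size                     | 500000 | Number of records sorted at a time during data loading.      |
+| Main Spark configuration             |        |                                                              |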
+| spark.sql.shuffle.partitions         | 70     |                                                              |
+| spark.executor.instances             | 6      |                                                              |
+| spark.executor.cores                 | 13     |                                                              |
+| spark.locality.wait                  | 0      |                                                              |
+| spark.executor.memory                | 5G     |                                                              |
+| spark.driver.cores                   | 3      |                                                              |
+| spark.driver.memory                  | 50G    |                                                              |
+| spark.sql.codegen.wholeStage         | True   |                                                              |
+| spark.sql.codegen.hugeMethodLimit    | 8000   |                                                              |
+
+## 4. Query performance results at different data scales
+
+| Data volume + counter number | Data volume | Data size     | Counter number | IQ (SSD) + disk array, average time (s) | Open-source CarbonData, average time (s) |
+| ---------------------------- | ----------- | ------------- | -------------- | ------------------------------ | ------------------------------------ |
+| 100K_9Counter                | 100K        | 100,000       | 9Counter       | 0.91                           | 3.53                                 |
+| 100K_18Counter               | 100K        | 100,000       | 18Counter      | 1.30                           | 3.81                                 |
+| 100K_36Counter               | 100K        | 100,000       | 36Counter      | 1.87                           | 4.29                                 |
+| 100K_72Counter               | 100K        | 100,000       | 72Counter      | 3.82                           | 5.09                                 |
+| 500K_9Counter                | 500K        | 500,000       | 9Counter       | 1.47                           | 4.04                                 |
+| 500K_18Counter               | 500K        | 500,000       | 18Counter      | 1.98                           | 4.61                                 |
+| 500K_36Counter               | 500K        | 500,000       | 36Counter      | 2.99                           | 5.63                                 |
+| 500K_72Counter               | 500K        | 500,000       | 72Counter      | 5.67                           | 7.53                                 |
+| 1M_9Counter                  | 1M          | 1,000,000     | 9Counter       | 4.72                           | 4.24                                 |
+| 1M_18Counter                 | 1M          | 1,000,000     | 18Counter      | 5.13                           | 4.84                                 |
+| 1M_36Counter                 | 1M          | 1,000,000     | 36Counter      | 6.55                           | 5.83                                 |
+| 1M_72Counter                 | 1M          | 1,000,000     | 72Counter      | 10.83                          | 7.90                                 |
+| 5M_9Counter                  | 5M          | 5,000,000     | 9Counter       | 5.82                           | 4.59                                 |
+| 5M_18Counter                 | 5M          | 5,000,000     | 18Counter      | 7.70                           | 5.26                                 |
+| 5M_36Counter                 | 5M          | 5,000,000     | 36Counter      | 11.32                          | 6.73                                 |
+| 5M_72Counter                 | 5M          | 5,000,000     | 72Counter      | 21.78                          | 9.27                                 |
+| 10M_9Counter                 | 10M         | 10,000,000    | 9Counter       | 7.98                           | 5.32                                 |
+| 10M_18Counter                | 10M         | 10,000,000    | 18Counter      | 11.39                          | 6.03                                 |
+| 10M_36Counter                | 10M         | 10,000,000    | 36Counter      | 17.40                          | 7.43                                 |
+| 10M_72Counter                | 10M         | 10,000,000    | 72Counter      | 34.50                          | 10.48                                |
+| 50M_9Counter                 | 50M         | 50,000,000    | 9Counter       | 16.89                          | 8.95                                 |
+| 50M_18Counter                | 50M         | 50,000,000    | 18Counter      | 25.50                          | 10.42                                |
+| 50M_36Counter                | 50M         | 50,000,000    | 36Counter      | 268.10                         | 12.78                                |
+| 50M_72Counter                | 50M         | 50,000,000    | 72Counter      | 554.16                         | 18.79                                |
+| 100M_9Counter                | 100M        | 100,000,000   | 9Counter       | 25.13                          | 13.19                                |
+| 100M_18Counter               | 100M        | 100,000,000   | 18Counter      | 35.57                          | 14.87                                |
+| 100M_36Counter               | 100M        | 100,000,000   | 36Counter      | 299.43                         | 18.96                                |
+| 100M_72Counter               | 100M        | 100,000,000   | 72Counter      | 678.72                         | 28.12                                |
+| 1B_9Counter                  | 1B          | 1,000,000,000 | 9Counter       | 167.50                         | 47.95                                |
+| 1B_18Counter                 | 1B          | 1,000,000,000 | 18Counter      | 261.20                         | 55.79                                |
+| 1B_36Counter                 | 1B          | 1,000,000,000 | 36Counter      | 654.99                         | 73.14                                |
+| 1B_72Counter                 | 1B          | 1,000,000,000 | 72Counter      | 1575.81                        | 116.63                               |
+
+
+
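Reading the section 4 table as ratios makes the trend easier to follow. The sketch below is not part of the commit. It copies only the 72-counter rows from the table and divides the IQ time by the CarbonData time for each data volume. The names `TIMES_72_COUNTER`, `speedups`, and `first_scale_where_carbon_wins` are illustrative assumptions, not anything defined by CarbonData or Sybase IQ.

```python
# Average query times in seconds, copied from the 72-counter rows of the
# results table: data volume -> (Sybase IQ, open-source CarbonData).
TIMES_72_COUNTER = {
    "100K": (3.82, 5.09),
    "500K": (5.67, 7.53),
    "1M": (10.83, 7.90),
    "5M": (21.78, 9.27),
    "10M": (34.50, 10.48),
    "50M": (554.16, 18.79),
    "100M": (678.72, 28.12),
    "1B": (1575.81, 116.63),
}


def speedups(times):
    """Return IQ time divided by CarbonData time for each data volume."""
    return {volume: iq / carbon for volume, (iq, carbon) in times.items()}


def first_scale_where_carbon_wins(times):
    """Return the first data volume (in table order) where CarbonData is faster."""
    for volume, (iq, carbon) in times.items():
        if carbon < iq:
            return volume
    return None


if __name__ == "__main__":
    for volume, ratio in speedups(TIMES_72_COUNTER).items():
        print(f"{volume:>5}: IQ/CarbonData = {ratio:.2f}x")
    print("CarbonData first faster at:",
          first_scale_where_carbon_wins(TIMES_72_COUNTER))
```

On these rows the ratio stays below 1 at 100K and 500K, where IQ is faster, and rises above 1 from 1M upward. The same calculation works for the 9-, 18-, and 36-counter rows if their values are substituted into the dictionary.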