New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc_zh_cn] add a commercial inventory DB and carbon data query performance comparison doc chinese doc to carbondata #3521
Conversation
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1227/ |
Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1237/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1246/ |
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1254/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1275/ |
Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1264/ |
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1263/ |
Build Success with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1273/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1284/ |
| 集群 | 描述 | | ||
| ---------------- | --------------------------------------------------------- | | ||
| 某商业列存DB集群 | 3节点,SSD硬盘 | | ||
| Hadoop集群 | 2个namenode,6个datanode,STAT硬盘,查询队列分配1/6的资源 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these two cluster using the same resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes carbonData use 1/6 resources equal one machine, Business DB query use one machine.
limitations under the License. | ||
--> | ||
|
||
## CarbonData 替换某商业列存DB查询性能对比 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## CarbonData 替换某商业列存DB查询性能对比 | |
## CarbonData与商业列存DB查询性能对比 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
|
||
|
||
## 1.集群状态对比 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## 1.集群状态对比 | |
## 1. 测试环境 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
| 集群 | 描述 | | ||
| ---------------- | --------------------------------------------------------- | | ||
| 某商业列存DB集群 | 3节点,SSD硬盘 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please mention CPU and memory resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
533f8ed
to
3138d72
Compare
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1344/ |
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1353/ |
docs/zh_cn/CarbonData与商业列存DB性能对比.md
Outdated
| carbon.sort.file.buffer.size | 20 | 在合并排序(读/写)操作时存储每个临时过程文件的所使用的总缓存大小。单位为MB | | ||
| carbon.sort.size | 500000 | 在数据加载操作时,每次被排序的记录数。 | | ||
| Spark主要配置 | | | | ||
| spark.sql.shuffle.partitions | 70 | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些配置的描述也请加一些简单描述吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/zh_cn/CarbonData与商业列存DB性能对比.md
Outdated
|
||
```Spark的查询SQL模型:``` | ||
|
||
SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND `TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , (COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * delta AS COLUMN_F , COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , MT.`TEMP` AS `TEMP` FROM ( SELECT `COLUMN_1_A` AS COLUMN_A, `COLUMN_1_E` AS COLUMN_E, `COLUMN_1_B` AS COLUMN_B, `COLUMN_1_D` AS COLUMN_D, TABLE_A.`TEMP` AS `TEMP` FROM TABLE_B LEFT JOIN ( SELECT `COLUMN_CSI` AS `TEMP2` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `TEMP` , CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY `COLUMN_CSI`, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END ) TABLE_A ON `COLUMN_CSI` = TABLE_A.`TEMP2` WHERE TABLE_A.NAME_TEMP IS NOT NULL AND `TIME` >= A AND `TIME` < B ) MT GROUP BY MT.`TEMP` ORDER BY COLUMN_C DESC LIMIT 5000 | |
```SQL | |
SELECT | |
COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0) AS COLUMN_C , | |
COALESCE(SUM(COLUMN_A), 0) AS COLUMN_A_A , | |
COALESCE(SUM(COLUMN_B), 0) AS COLUMN_B_B , | |
COALESCE(SUM(COLUMN_D), 0) + COALESCE(SUM(COLUMN_E), 0) AS COLUMN_F , | |
COALESCE(SUM(COLUMN_D), 0) AS COLUMN_D_D , | |
COALESCE(SUM(COLUMN_E), 0) AS COLUMN_E_E , | |
(COALESCE(SUM(COLUMN_A), 0) + COALESCE(SUM(COLUMN_B), 0)) * delta AS COLUMN_F , | |
COALESCE(SUM(COLUMN_A), 0) * delta AS COLUMN_G , | |
COALESCE(SUM(COLUMN_B), 0) * delta AS COLUMN_H , | |
MT.`TEMP` AS `TEMP` | |
FROM ( | |
SELECT | |
`COLUMN_1_A` AS COLUMN_A, | |
`COLUMN_1_E` AS COLUMN_E, | |
`COLUMN_1_B` AS COLUMN_B, | |
`COLUMN_1_D` AS COLUMN_D, | |
TABLE_A.`TEMP` AS `TEMP` | |
FROM TABLE_B LEFT JOIN ( | |
SELECT | |
`COLUMN_CSI` AS `TEMP2` , | |
CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END AS `TEMP` , | |
CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END AS NAME_TEMP | |
FROM DIMENSION_TABLE | |
GROUP BY | |
`COLUMN_CSI`, | |
CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_CSI` END, | |
CASE WHEN `TYPE_ID` = 2 THEN `COLUMN_NAME` END | |
) TABLE_A | |
ON `COLUMN_CSI` = TABLE_A.`TEMP2` | |
WHERE | |
TABLE_A.NAME_TEMP IS NOT NULL AND | |
`TIME` >= A AND `TIME` < B | |
) MT | |
GROUP BY MT.`TEMP` | |
ORDER BY COLUMN_C DESC | |
LIMIT 5000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/zh_cn/CarbonData与商业列存DB性能对比.md
Outdated
|
||
```某商业列存DB的查询SQL模型:``` | ||
|
||
SELECT TOP 5000 SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_C , SUM(COALESCE(COLUMN_A, 0)) AS COLUMN_A_A , SUM(COALESCE(COLUMN_B, 0)) AS COLUMN_B_B , SUM(COALESCE(COLUMN_D, 0)) + SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_F , SUM(COALESCE(COLUMN_D, 0)) AS COLUMN_D_D , SUM(COALESCE(COLUMN_E, 0)) AS COLUMN_E_E , (SUM(COALESCE(COLUMN_A, 0)) + SUM(COALESCE(COLUMN_B, 0))) * delta AS COLUMN_F , SUM(COALESCE(COLUMN_A, 0)) * delta AS COLUMN_G , SUM(COALESCE(COLUMN_B, 0)) * delta AS COLUMN_H , MT."TEMP" AS "TEMP", COUNT(1) OVER () AS countNum FROM ( SELECT COALESCE(SUM("COLUMN_1_A"), 0) AS COLUMN_A , COALESCE(SUM("COLUMN_1_B"), 0) AS COLUMN_B , COALESCE(SUM("COLUMN_1_E"), 0) AS COLUMN_E , COALESCE(SUM("COLUMN_1_D"), 0) AS COLUMN_D , TABLE_A."TEMP" AS "TEMP" FROM TABLE_B LEFT JOIN ( SELECT "COLUMN_CSI" AS "TEMP2" , CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END AS "TEMP" , CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END AS NAME_TEMP FROM DIMENSION_TABLE GROUP BY "COLUMN_CSI", CASE WHEN "TYPE_ID" = 2 THEN "COLUMN_CSI" END, CASE WHEN "TYPE_ID" = 2 THEN "CLOUMN_NAME" END ) TABLE_A ON "COLUMN_CSI" = TABLE_A."TEMP2" WHERE TABLE_A.NAME_TEMP IS NOT NULL AND "TIME" < A AND "TIME" >= B GROUP BY TABLE_A."TEMP" ) MT GROUP BY MT."TEMP" ORDER BY COLUMN_C DESC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not required.
User can know about the SQL statement by line 47
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/zh_cn/CarbonData与商业列存DB性能对比.md
Outdated
|
||
## 2.查询SQL模型介绍 | ||
|
||
某商业列存DB与CarbonData查询SQL本身存在差异,在执行性能测试之前需要对SQL进行修改。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
docs/zh_cn/CarbonData与商业列存DB性能对比.md
Outdated
| 某商业列存DB集群 | 3节点,SSD硬盘 | | ||
| Hadoop集群 | 2个namenode,6个datanode,STAT硬盘,查询队列分配1/6的资源 | | ||
|
||
## 2.查询SQL模型介绍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## 2.查询SQL模型介绍 | |
## 2. SQL测试语句 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1363/ |
…rison doc chinese doc to carbondata
Build Success with Spark 2.1.0, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.1/1364/ |
Build Failed with Spark 2.2.1, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.2/1374/ |
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1385/ |
LGTM |
…ta and columnar DB This closes apache#3521
…ta and columnar DB This closes apache#3521
…e doc to carbondata
as talked with likun add chinese doc file path.
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.