[feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101) #37183

Merged: morningman merged 2 commits into apache:branch-2.0 from the pick_es_json_2.0 branch on Jul 5, 2024

@qidaye qidaye commented Jul 2, 2024

Backport of #37101 to branch-2.0.

1. ES `nested`/`object` fields can be mapped to the `json` type in Doris and analyzed with JSON functions.
2. Add test cases for `json_extract`.
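In Elasticsearch an `object` field is a JSON object and a `nested` field is an array of objects, so once either is mapped to a Doris `json` column, the column holds a JSON document that path-based functions can query. The toy Python model below only illustrates that idea; it is not Doris code. The sample document, the `to_json_column` and `extract` helpers, and the compact encoding are all hypothetical, and `extract` supports just `$`, `.key` and `[index]` steps. Doris's real `json_extract` has its own path grammar and return formatting.

```python
import json
import re

# Hypothetical ES _source: "user" is an `object` field,
# "comments" is a `nested` field (an array of objects).
es_source = {
    "id": 1,
    "user": {"name": "alice", "address": {"city": "Berlin"}},
    "comments": [
        {"author": "bob", "likes": 3},
        {"author": "carol", "likes": 5},
    ],
}


def to_json_column(value):
    """Serialize an object/nested field value to a JSON string, as a
    JSON-typed column might hold it (assumption: compact encoding)."""
    return json.dumps(value, separators=(",", ":"))


# One path step: either ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract(json_text, path):
    """Resolve a simple JSONPath such as '$.address.city' or '$[1].likes'.
    Returns None when the path does not exist. Toy model only."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(json_text)
    pos = 1
    while pos < len(path):
        m = _STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = m.group(1), m.group(2)
        if key is not None:
            # Object step: the current node must be a dict containing the key.
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            # Array step: the current node must be a list long enough.
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = m.end()
    return node


user_col = to_json_column(es_source["user"])
comments_col = to_json_column(es_source["comments"])
print(extract(user_col, "$.address.city"))  # Berlin
print(extract(comments_col, "$[1].likes"))  # 5
print(extract(user_col, "$.missing"))       # None
```

The point of the mapping is visible in the last three lines: the object field is addressed by key path, while the nested field needs an array index first, because each nested entry is a separate object in the array.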
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what to do next? See How to process your PR.

Since 2024-03-18, the documentation has moved to the doris-website repository.
See the Doris documentation.

@qidaye
qidaye commented Jul 2, 2024

run buildall

github-actions bot commented Jul 2, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 49830 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	17699	4389	4382	4382
q2	2056	156	154	154
q3	10265	1913	1964	1913
q4	10325	1243	1339	1243
q5	8793	3872	3953	3872
q6	244	164	132	132
q7	2044	1594	1580	1580
q8	9340	2774	2723	2723
q9	10739	10322	10292	10292
q10	8629	3516	3533	3516
q11	420	250	242	242
q12	478	311	316	311
q13	18331	3960	4035	3960
q14	365	331	329	329
q15	515	465	457	457
q16	673	566	587	566
q17	1145	966	944	944
q18	7357	6761	6965	6761
q19	1789	1686	1621	1621
q20	511	318	290	290
q21	4517	4118	4098	4098
q22	526	444	446	444
Total cold run time: 116761 ms
Total hot run time: 49830 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	4327	4420	4338	4338
q2	323	229	221	221
q3	4207	4181	4148	4148
q4	2778	2762	2777	2762
q5	7151	7006	7047	7006
q6	245	125	123	123
q7	3213	2827	2859	2827
q8	4324	4456	4471	4456
q9	16943	16862	16786	16786
q10	4219	4290	4231	4231
q11	782	682	702	682
q12	1046	836	853	836
q13	6821	3750	3753	3750
q14	445	415	421	415
q15	519	455	463	455
q16	734	671	673	671
q17	3894	3817	3895	3817
q18	8832	8766	8940	8766
q19	1729	1690	1723	1690
q20	2411	2138	2172	2138
q21	8563	8527	8478	8478
q22	1036	1018	1020	1018
Total cold run time: 84542 ms
Total hot run time: 79614 ms
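The reports do not label their columns, but the Round 1 numbers of the first TPC-H run are consistent with this reading: the first column is the cold run, the next two are hot runs, the last column is the faster of the two hot runs, the cold total is the sum of the first column, and the hot total is the sum of the per-query best hot run. The short check below recomputes both totals from the reported rows; the column interpretation is an inference from the data, not something the bot states.

```python
# Round 1 TPC-H rows as reported: (query, cold_ms, hot1_ms, hot2_ms, best_ms).
ROUND1 = [
    ("q1", 17699, 4389, 4382, 4382),
    ("q2", 2056, 156, 154, 154),
    ("q3", 10265, 1913, 1964, 1913),
    ("q4", 10325, 1243, 1339, 1243),
    ("q5", 8793, 3872, 3953, 3872),
    ("q6", 244, 164, 132, 132),
    ("q7", 2044, 1594, 1580, 1580),
    ("q8", 9340, 2774, 2723, 2723),
    ("q9", 10739, 10322, 10292, 10292),
    ("q10", 8629, 3516, 3533, 3516),
    ("q11", 420, 250, 242, 242),
    ("q12", 478, 311, 316, 311),
    ("q13", 18331, 3960, 4035, 3960),
    ("q14", 365, 331, 329, 329),
    ("q15", 515, 465, 457, 457),
    ("q16", 673, 566, 587, 566),
    ("q17", 1145, 966, 944, 944),
    ("q18", 7357, 6761, 6965, 6761),
    ("q19", 1789, 1686, 1621, 1621),
    ("q20", 511, 318, 290, 290),
    ("q21", 4517, 4118, 4098, 4098),
    ("q22", 526, 444, 446, 444),
]

# For every query, the last column equals the faster of the two hot runs.
assert all(best == min(h1, h2) for _, _, h1, h2, best in ROUND1)

cold_total = sum(cold for _, cold, _, _, _ in ROUND1)
hot_total = sum(min(h1, h2) for _, _, h1, h2, _ in ROUND1)
print(f"Total cold run time: {cold_total} ms")  # 116761 ms
print(f"Total hot run time: {hot_total} ms")    # 49830 ms
```

Both printed totals match the figures in the report, and the same rule reproduces the Round 2 hot total of 79614 ms.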

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 37.89% (8115/21417)
Line Coverage: 29.56% (66470/224876)
Region Coverage: 29.03% (34258/118018)
Branch Coverage: 24.90% (17597/70664)
Coverage Report: http://coverage.selectdb-in.cc/coverage/06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51_06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51/report/index.html

@doris-robot

TPC-DS: Total hot run time: 205364 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51, data reload: false

query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
query1	928	427	394	394
query2	6550	2795	2966	2795
query3	6927	206	199	199
query4	20286	18046	18079	18046
query5	19722	6598	6597	6597
query6	304	225	232	225
query7	4174	297	312	297
query8	417	469	419	419
query9	3113	2671	2644	2644
query10	413	327	315	315
query11	11584	10874	10795	10795
query12	128	81	76	76
query13	6288	709	699	699
query14	19425	14291	14049	14049
query15	376	259	249	249
query16	7277	302	287	287
query17	2765	1940	881	881
query18	2302	423	419	419
query19	212	146	152	146
query20	84	78	83	78
query21	191	103	93	93
query22	5472	5522	5454	5454
query23	33375	32154	31967	31967
query24	6933	6547	6549	6547
query25	545	430	426	426
query26	524	168	163	163
query27	1856	307	293	293
query28	6171	2352	2322	2322
query29	2943	2835	2859	2835
query30	249	171	173	171
query31	929	744	743	743
query32	70	61	62	61
query33	403	262	257	257
query34	855	484	480	480
query35	1148	956	950	950
query36	1349	1069	1212	1069
query37	94	60	64	60
query38	3117	2940	2916	2916
query39	1387	1335	1328	1328
query40	213	96	94	94
query41	45	44	45	44
query42	82	86	77	77
query43	708	648	763	648
query44	1125	741	734	734
query45	253	242	238	238
query46	1238	985	981	981
query47	1871	1710	1919	1710
query48	1025	720	706	706
query49	626	380	374	374
query50	863	621	612	612
query51	4732	4675	4675	4675
query52	90	74	87	74
query53	447	334	324	324
query54	2663	2472	2507	2472
query55	86	86	80	80
query56	258	233	216	216
query57	1187	1101	1134	1101
query58	216	202	196	196
query59	4166	4189	3889	3889
query60	211	202	229	202
query61	97	96	97	96
query62	831	455	540	455
query63	491	346	346	346
query64	2466	1541	1347	1347
query65	3675	3582	3546	3546
query66	791	383	384	383
query67	15402	16985	15362	15362
query68	9157	638	661	638
query69	602	366	354	354
query70	1659	1273	1428	1273
query71	437	325	314	314
query72	6497	3519	3497	3497
query73	737	320	314	314
query74	6327	5896	5799	5799
query75	5386	3701	3715	3701
query76	5480	1174	1104	1104
query77	943	259	274	259
query78	12620	11587	12516	11587
query79	9110	630	648	630
query80	1168	408	420	408
query81	494	239	234	234
query82	1524	97	99	97
query83	182	140	138	138
query84	259	72	72	72
query85	885	335	317	317
query86	320	288	296	288
query87	3209	3044	3004	3004
query88	4940	2334	2300	2300
query89	412	304	309	304
query90	2047	218	217	217
query91	192	141	155	141
query92	61	57	54	54
query93	4745	609	621	609
query94	734	214	215	214
query95	1117	1078	1055	1055
query96	636	327	329	327
query97	6431	6381	6407	6381
query98	186	173	171	171
query99	2729	836	918	836
Total cold run time: 317970 ms
Total hot run time: 205364 ms

@doris-robot

ClickBench: Total hot run time: 30.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51, data reload: false

query	cold (s)	hot1 (s)	hot2 (s)
query1	0.03	0.02	0.02
query2	0.08	0.02	0.02
query3	0.25	0.05	0.04
query4	1.78	0.06	0.06
query5	0.55	0.53	0.53
query6	1.23	0.61	0.61
query7	0.01	0.01	0.01
query8	0.04	0.02	0.02
query9	0.53	0.47	0.49
query10	0.55	0.55	0.53
query11	0.12	0.08	0.09
query12	0.12	0.10	0.09
query13	0.62	0.60	0.60
query14	0.77	0.80	0.78
query15	0.79	0.76	0.78
query16	0.36	0.36	0.39
query17	1.03	1.01	1.01
query18	0.20	0.24	0.25
query19	1.92	1.87	1.88
query20	0.02	0.01	0.01
query21	15.46	0.57	0.55
query22	1.85	1.96	1.37
query23	17.32	0.86	0.94
query24	5.55	1.08	1.85
query25	0.40	0.08	0.05
query26	0.66	0.16	0.15
query27	0.04	0.04	0.04
query28	7.13	0.76	0.73
query29	12.77	2.37	2.32
query30	0.62	0.52	0.53
query31	2.80	0.40	0.39
query32	3.38	0.50	0.51
query33	3.09	3.08	3.12
query34	15.29	4.82	4.80
query35	4.85	4.86	4.86
query36	1.06	1.01	1.01
query37	0.06	0.04	0.04
query38	0.04	0.02	0.02
query39	0.02	0.02	0.01
query40	0.15	0.13	0.14
query41	0.07	0.01	0.01
query42	0.02	0.01	0.01
query43	0.03	0.01	0.02
Total cold run time: 103.66 s
Total hot run time: 30.65 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 06cf7867ba1bc803dbb2cf37c3cf8e9567e87c51 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       21.6 seconds inserted 10000000 Rows, about 462K ops/s
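The bot does not say how it rounds the throughput figures. The reported values are consistent with dividing bytes by seconds, converting to mebibytes (2^20 bytes), and truncating to a whole number; rows per second are likewise truncated to thousands. The sketch below checks that reading against the first load-test report; `approx_mb_per_s` is a hypothetical helper, and the 2^20 divisor is an assumption inferred from the numbers.

```python
import math

MIB = 1024 * 1024  # assumption: "MB" in the report means 2**20 bytes

# (format, seconds, bytes) from the first load-test report.
STREAM_LOADS = [
    ("json", 20, 2358488459),
    ("orc", 59, 1101869774),
    ("parquet", 32, 861443392),
]


def approx_mb_per_s(total_bytes, seconds):
    """Throughput in MiB/s, truncated to a whole number."""
    return math.floor(total_bytes / seconds / MIB)


for fmt, seconds, total_bytes in STREAM_LOADS:
    print(f"Stream load {fmt}: about {approx_mb_per_s(total_bytes, seconds)} MB/s")

# Insert into select: 10,000,000 rows in 21.6 s, reported in thousands of rows/s.
k_ops = math.floor(10_000_000 / 21.6 / 1000)
print(f"Insert into select: about {k_ops}K ops/s")
```

The same rule also reproduces the second report's 18 MB/s (ORC, 58 s) and 26 MB/s (Parquet, 31 s); with decimal megabytes instead, the ORC figure would come out above 18.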

@qidaye
qidaye commented Jul 3, 2024

run external

@qidaye
qidaye commented Jul 3, 2024

run p0

@qidaye
qidaye commented Jul 3, 2024

run external

@qidaye
qidaye commented Jul 3, 2024

run external

@qidaye
qidaye commented Jul 3, 2024

run buildall

@github-actions github-actions bot added the area/planner label (Issues or PRs related to the query planner) on Jul 3, 2024
github-actions bot commented Jul 3, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 49848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 22289f6ffc8faed27ed164379fbdaa6b4f624ba2, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	17673	4649	4413	4413
q2	2097	158	150	150
q3	10612	1901	1937	1901
q4	10376	1293	1327	1293
q5	8413	3908	3935	3908
q6	237	152	127	127
q7	2032	1599	1616	1599
q8	9510	2741	2714	2714
q9	14024	10234	10225	10225
q10	8621	3468	3558	3468
q11	415	255	247	247
q12	466	296	312	296
q13	18358	3949	4016	3949
q14	364	325	327	325
q15	508	473	459	459
q16	673	574	576	574
q17	1144	934	921	921
q18	7217	7037	6817	6817
q19	1800	1679	1610	1610
q20	569	305	299	299
q21	4488	4147	4109	4109
q22	541	456	444	444
Total cold run time: 120138 ms
Total hot run time: 49848 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
q1	4330	4308	4323	4308
q2	323	222	220	220
q3	4199	4137	4135	4135
q4	2749	2724	2741	2724
q5	7169	7132	7103	7103
q6	236	122	119	119
q7	3244	2835	2815	2815
q8	4313	4427	4462	4427
q9	16948	16789	16844	16789
q10	4240	4261	4328	4261
q11	756	679	657	657
q12	1030	844	860	844
q13	6821	3750	3710	3710
q14	444	425	417	417
q15	514	456	456	456
q16	742	673	665	665
q17	3779	3827	3770	3770
q18	8821	8715	8860	8715
q19	1753	1641	1666	1641
q20	2458	2147	2108	2108
q21	8462	8479	8429	8429
q22	1084	941	981	941
Total cold run time: 84415 ms
Total hot run time: 79254 ms

@doris-robot

TPC-DS: Total hot run time: 203700 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 22289f6ffc8faed27ed164379fbdaa6b4f624ba2, data reload: false

query	cold (ms)	hot1 (ms)	hot2 (ms)	best hot (ms)
query1	924	421	382	382
query2	6554	2982	2687	2687
query3	6924	207	203	203
query4	19951	17958	18045	17958
query5	19723	6488	6524	6488
query6	309	229	238	229
query7	4162	299	313	299
query8	464	401	395	395
query9	3116	2660	2600	2600
query10	414	298	290	290
query11	11291	10720	10674	10674
query12	122	83	75	75
query13	5631	690	672	672
query14	18153	13515	13647	13515
query15	358	258	237	237
query16	6454	283	272	272
query17	1733	1460	881	881
query18	2291	409	407	407
query19	211	150	150	150
query20	82	80	80	80
query21	194	94	98	94
query22	5236	4974	5013	4974
query23	32648	31975	32092	31975
query24	6912	6554	6552	6552
query25	540	434	448	434
query26	543	165	166	165
query27	1865	291	301	291
query28	6107	2355	2315	2315
query29	2863	2702	2606	2606
query30	240	179	169	169
query31	912	726	750	726
query32	72	65	63	63
query33	392	267	259	259
query34	843	466	480	466
query35	1093	904	877	877
query36	1314	1187	1563	1187
query37	93	61	62	61
query38	3086	2886	2870	2870
query39	1390	1321	1317	1317
query40	203	93	91	91
query41	46	45	44	44
query42	85	85	83	83
query43	838	743	689	689
query44	1133	711	712	711
query45	251	234	238	234
query46	1220	967	973	967
query47	1841	1770	1708	1708
query48	997	700	712	700
query49	635	374	382	374
query50	860	607	605	605
query51	4680	4649	4704	4649
query52	87	79	99	79
query53	443	322	314	314
query54	2631	2450	2440	2440
query55	89	91	78	78
query56	247	218	216	216
query57	1177	1123	1114	1114
query58	225	205	204	204
query59	4136	4187	4148	4148
query60	229	201	212	201
query61	98	97	99	97
query62	782	442	490	442
query63	480	346	345	345
query64	2497	1567	1454	1454
query65	3655	3533	3543	3533
query66	819	378	388	378
query67	18231	16072	15032	15032
query68	8768	658	655	655
query69	586	368	362	362
query70	1777	1490	1306	1306
query71	406	312	312	312
query72	6517	3504	3518	3504
query73	731	326	322	322
query74	6301	5828	5870	5828
query75	5155	3648	3672	3648
query76	5263	1136	1162	1136
query77	901	259	258	258
query78	12595	11776	12807	11776
query79	13425	650	666	650
query80	1521	417	416	416
query81	489	237	239	237
query82	758	103	101	101
query83	177	134	137	134
query84	254	71	70	70
query85	1041	329	324	324
query86	342	296	308	296
query87	3255	3070	3045	3045
query88	5695	2287	2292	2287
query89	416	296	294	294
query90	2603	219	219	219
query91	186	139	138	138
query92	56	52	57	52
query93	5428	549	574	549
query94	1298	213	211	211
query95	1132	1073	1066	1066
query96	645	332	328	328
query97	6435	6351	6378	6351
query98	202	186	177	177
query99	2931	975	823	823
Total cold run time: 321114 ms
Total hot run time: 203700 ms

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 37.89% (8114/21417)
Line Coverage: 29.56% (66476/224906)
Region Coverage: 29.03% (34263/118042)
Branch Coverage: 24.90% (17600/70682)
Coverage Report: http://coverage.selectdb-in.cc/coverage/22289f6ffc8faed27ed164379fbdaa6b4f624ba2_22289f6ffc8faed27ed164379fbdaa6b4f624ba2/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 22289f6ffc8faed27ed164379fbdaa6b4f624ba2, data reload: false

query	cold (s)	hot1 (s)	hot2 (s)
query1	0.02	0.02	0.02
query2	0.08	0.02	0.02
query3	0.24	0.05	0.04
query4	1.83	0.07	0.06
query5	0.53	0.52	0.52
query6	1.24	0.62	0.62
query7	0.02	0.01	0.01
query8	0.03	0.03	0.02
query9	0.51	0.50	0.47
query10	0.52	0.53	0.54
query11	0.12	0.08	0.10
query12	0.12	0.09	0.09
query13	0.63	0.61	0.62
query14	0.78	0.79	0.80
query15	0.77	0.76	0.75
query16	0.36	0.37	0.37
query17	1.01	1.01	1.00
query18	0.23	0.26	0.25
query19	1.90	1.85	1.82
query20	0.01	0.02	0.01
query21	15.47	0.55	0.55
query22	2.07	1.91	1.38
query23	16.76	1.00	0.94
query24	4.65	1.89	0.92
query25	0.36	0.10	0.05
query26	0.58	0.15	0.15
query27	0.03	0.04	0.04
query28	7.79	0.80	0.78
query29	12.74	2.37	2.26
query30	0.55	0.49	0.51
query31	2.81	0.39	0.37
query32	3.41	0.50	0.49
query33	3.09	3.09	3.08
query34	15.24	4.81	4.77
query35	4.87	4.82	4.86
query36	1.05	1.00	1.01
query37	0.06	0.05	0.04
query38	0.04	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.14	0.14
query41	0.06	0.02	0.02
query42	0.02	0.02	0.02
query43	0.02	0.02	0.02
Total cold run time: 102.8 s
Total hot run time: 30.44 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 22289f6ffc8faed27ed164379fbdaa6b4f624ba2 with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.4 seconds inserted 10000000 Rows, about 467K ops/s

@morningman morningman merged commit cb12b26 into apache:branch-2.0 Jul 5, 2024
22 of 24 checks passed
@qidaye qidaye deleted the pick_es_json_2.0 branch July 8, 2024 07:42
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
Labels: area/planner (Issues or PRs related to the query planner), kind/test
3 participants