[feature](ES Catalog) map nested/object type in ES to JSON type in Doris #37101

Merged: 2 commits into apache:master from es_json_type, Jul 2, 2024

Contributor

@qidaye qidaye commented Jul 1, 2024

Proposed changes

  1. `nested`/`object` fields in ES can map to the JSON type in Doris and can be analyzed with JSON functions.
  2. Add some test cases for `json_extract`.
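
As a rough illustration of the two points above, here is a minimal Python sketch. It is not the Doris implementation: `ES_TO_DORIS`, `map_es_type`, and `extract_path` are hypothetical names introduced only for this example, the mapping table contains just the two entries this PR describes, and `extract_path` is a simplified stand-in for a JSON path lookup (it handles only `$`, `.key`, and `[index]`), so it should not be read as the exact semantics of Doris's `json_extract`.

```python
import json
import re

# Hypothetical mapping table: only the two entries described in this PR.
ES_TO_DORIS = {"nested": "JSON", "object": "JSON"}


def map_es_type(es_type: str) -> str:
    """Return the Doris type for an ES field type covered by this sketch."""
    try:
        return ES_TO_DORIS[es_type]
    except KeyError:
        raise ValueError(f"type not covered by this sketch: {es_type}")


# One path step: either `.key` or `[index]`.
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def extract_path(doc: str, path: str):
    """Look up a simple path in a JSON document; return None if it is absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    node = json.loads(doc)
    pos = 0
    while pos < len(rest):
        match = _TOKEN.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos + 1}")
        key, index = match.group(1), match.group(2)
        if key is not None:
            # Object step: the current node must be a dict containing the key.
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            # Array step: the current node must be a list that is long enough.
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = match.end()
    return node
```

For example, with a document such as `{"user": {"name": "a", "tags": ["x", "y"]}}` stored in a column mapped from an ES `object` field, `extract_path(doc, "$.user.tags[1]")` returns `"y"`, and a path that does not exist returns `None`.
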
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

@qidaye
Contributor Author

qidaye commented Jul 1, 2024

run buildall

Contributor

github-actions bot commented Jul 1, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40503 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7bfe7217a2d4c56cce1c1c7edd21058767711c15, data reload: false

------ Round 1 ----------------------------------
q1	17621	4304	4295	4295
q2	2010	188	184	184
q3	10457	1215	1109	1109
q4	10191	828	753	753
q5	7529	2652	2623	2623
q6	220	136	137	136
q7	954	594	602	594
q8	9218	2075	2052	2052
q9	8995	6488	6452	6452
q10	9056	3757	3738	3738
q11	433	230	237	230
q12	513	243	242	242
q13	17774	2976	2973	2973
q14	272	226	225	225
q15	531	495	488	488
q16	514	379	374	374
q17	951	637	697	637
q18	7989	7377	7388	7377
q19	5431	1446	1513	1446
q20	660	327	333	327
q21	5128	3976	3909	3909
q22	404	341	339	339
Total cold run time: 116851 ms
Total hot run time: 40503 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4415	4225	4232	4225
q2	381	264	263	263
q3	2995	2885	2863	2863
q4	2028	1732	1708	1708
q5	5564	5513	5443	5443
q6	229	132	129	129
q7	2190	1834	1845	1834
q8	3296	3387	3443	3387
q9	8675	8604	8838	8604
q10	4103	3817	3679	3679
q11	589	484	486	484
q12	867	636	621	621
q13	17204	3137	3148	3137
q14	308	288	275	275
q15	532	486	482	482
q16	495	433	426	426
q17	1806	1524	1498	1498
q18	8201	7935	7781	7781
q19	1743	1616	1456	1456
q20	2138	1877	1839	1839
q21	5161	4966	4761	4761
q22	612	531	549	531
Total cold run time: 73532 ms
Total hot run time: 55426 ms

@doris-robot

TPC-DS: Total hot run time: 174507 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7bfe7217a2d4c56cce1c1c7edd21058767711c15, data reload: false

query1	918	393	388	388
query2	6463	2473	2307	2307
query3	6642	216	229	216
query4	18805	17373	17548	17373
query5	3624	484	480	480
query6	274	194	170	170
query7	4593	302	287	287
query8	325	290	285	285
query9	8491	2389	2365	2365
query10	570	301	279	279
query11	10695	10103	10117	10103
query12	115	84	80	80
query13	1640	354	356	354
query14	10188	7281	7215	7215
query15	227	181	189	181
query16	7696	278	267	267
query17	1891	536	510	510
query18	1808	279	269	269
query19	198	149	152	149
query20	90	86	86	86
query21	208	131	127	127
query22	4500	4209	4008	4008
query23	33615	33776	33487	33487
query24	10855	2902	2892	2892
query25	588	395	384	384
query26	711	156	152	152
query27	2296	326	333	326
query28	6001	2152	2127	2127
query29	883	635	638	635
query30	259	154	155	154
query31	956	752	755	752
query32	98	59	55	55
query33	719	296	293	293
query34	932	502	484	484
query35	752	621	644	621
query36	1137	1002	982	982
query37	136	77	82	77
query38	2970	2835	2811	2811
query39	900	844	852	844
query40	217	132	127	127
query41	60	56	53	53
query42	127	106	112	106
query43	602	552	571	552
query44	1155	745	720	720
query45	186	169	164	164
query46	1068	698	721	698
query47	1841	1809	1786	1786
query48	361	301	294	294
query49	851	418	406	406
query50	757	388	381	381
query51	6905	6735	6809	6735
query52	101	95	91	91
query53	361	296	293	293
query54	900	438	437	437
query55	75	75	73	73
query56	283	281	275	275
query57	1145	1070	1058	1058
query58	255	245	236	236
query59	3424	3257	3167	3167
query60	305	281	292	281
query61	97	93	89	89
query62	583	436	444	436
query63	313	299	289	289
query64	8456	2292	1845	1845
query65	3215	3118	3090	3090
query66	764	331	333	331
query67	15341	14934	15062	14934
query68	4531	537	547	537
query69	620	440	337	337
query70	1098	1142	1127	1127
query71	414	294	291	291
query72	7243	6023	5642	5642
query73	741	333	325	325
query74	5902	5549	5493	5493
query75	3363	2694	2681	2681
query76	2224	917	927	917
query77	488	312	303	303
query78	10443	9952	9660	9660
query79	2670	519	507	507
query80	2347	502	490	490
query81	601	220	229	220
query82	920	113	109	109
query83	323	175	197	175
query84	283	90	94	90
query85	1785	422	268	268
query86	485	302	327	302
query87	3291	3107	3109	3107
query88	3802	2374	2366	2366
query89	478	374	379	374
query90	1803	187	181	181
query91	132	97	98	97
query92	61	51	49	49
query93	3375	515	502	502
query94	1148	191	183	183
query95	402	316	319	316
query96	590	266	275	266
query97	3173	3025	3069	3025
query98	221	198	196	196
query99	1161	849	858	849
Total cold run time: 269635 ms
Total hot run time: 174507 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7bfe7217a2d4c56cce1c1c7edd21058767711c15, data reload: false

query1	0.04	0.03	0.02
query2	0.08	0.05	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.09
query5	0.50	0.47	0.50
query6	1.12	0.71	0.72
query7	0.02	0.01	0.02
query8	0.06	0.04	0.05
query9	0.54	0.49	0.49
query10	0.54	0.54	0.55
query11	0.14	0.11	0.12
query12	0.15	0.12	0.13
query13	0.59	0.59	0.59
query14	0.77	0.78	0.77
query15	0.84	0.82	0.81
query16	0.36	0.36	0.37
query17	1.02	1.03	0.97
query18	0.22	0.25	0.23
query19	1.83	1.73	1.71
query20	0.02	0.01	0.01
query21	15.45	0.78	0.65
query22	3.81	6.68	2.50
query23	18.33	1.50	1.32
query24	2.12	0.24	0.23
query25	0.16	0.10	0.09
query26	0.26	0.17	0.18
query27	0.08	0.08	0.08
query28	13.23	1.02	1.00
query29	12.60	3.25	3.26
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.25	0.47	0.47
query33	2.86	2.94	2.92
query34	17.11	4.39	4.43
query35	4.46	4.40	4.44
query36	0.65	0.48	0.46
query37	0.19	0.17	0.15
query38	0.16	0.15	0.15
query39	0.04	0.04	0.04
query40	0.19	0.14	0.14
query41	0.09	0.04	0.04
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 108.98 s
Total hot run time: 31.11 s

@qidaye
Contributor Author

qidaye commented Jul 2, 2024

run buildall

Contributor

github-actions bot commented Jul 2, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39789 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9f92890efe32bdd35a7d40327186e31dab13547a, data reload: false

------ Round 1 ----------------------------------
q1	17841	4586	4271	4271
q2	2026	198	196	196
q3	10543	1239	1134	1134
q4	10354	735	825	735
q5	7511	2641	2649	2641
q6	222	136	146	136
q7	973	602	612	602
q8	9227	2104	2087	2087
q9	8882	6499	6495	6495
q10	9012	3734	3717	3717
q11	459	239	240	239
q12	443	252	236	236
q13	17774	2977	3000	2977
q14	263	233	213	213
q15	523	500	504	500
q16	538	378	388	378
q17	982	730	783	730
q18	8226	7625	7331	7331
q19	7877	1451	1355	1355
q20	687	334	351	334
q21	4890	3141	3970	3141
q22	393	346	341	341
Total cold run time: 119646 ms
Total hot run time: 39789 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4374	4249	4257	4249
q2	379	274	269	269
q3	3163	2919	2941	2919
q4	1988	1717	1749	1717
q5	5479	5467	5447	5447
q6	228	135	135	135
q7	2303	1835	1809	1809
q8	3288	3455	3455	3455
q9	8654	8811	8708	8708
q10	4131	3770	3849	3770
q11	597	501	496	496
q12	856	642	623	623
q13	16301	3116	3195	3116
q14	307	289	268	268
q15	538	479	498	479
q16	476	441	442	441
q17	1830	1531	1539	1531
q18	8195	8041	7745	7745
q19	1850	1642	1684	1642
q20	2160	1900	1854	1854
q21	5257	4861	4739	4739
q22	683	547	595	547
Total cold run time: 73037 ms
Total hot run time: 55959 ms

@doris-robot

TPC-DS: Total hot run time: 174246 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9f92890efe32bdd35a7d40327186e31dab13547a, data reload: false

query1	913	375	378	375
query2	6456	2468	2463	2463
query3	6639	206	218	206
query4	22138	17529	17381	17381
query5	3674	507	477	477
query6	251	165	165	165
query7	4595	304	293	293
query8	317	293	299	293
query9	8500	2374	2361	2361
query10	581	306	289	289
query11	10764	10128	10062	10062
query12	116	89	84	84
query13	1642	382	371	371
query14	10167	7168	7895	7168
query15	226	188	190	188
query16	7940	278	275	275
query17	1874	571	547	547
query18	2047	288	280	280
query19	203	156	162	156
query20	97	82	83	82
query21	216	129	126	126
query22	4613	4158	4128	4128
query23	34655	33638	33284	33284
query24	10968	2888	2947	2888
query25	608	433	399	399
query26	704	160	162	160
query27	2180	324	335	324
query28	5929	2150	2139	2139
query29	903	671	658	658
query30	247	159	165	159
query31	992	774	770	770
query32	101	56	55	55
query33	766	315	305	305
query34	979	504	482	482
query35	759	647	669	647
query36	1142	1000	982	982
query37	142	81	78	78
query38	2956	2874	2856	2856
query39	922	838	902	838
query40	213	128	127	127
query41	51	51	50	50
query42	128	101	113	101
query43	587	556	552	552
query44	1251	745	725	725
query45	188	159	161	159
query46	1103	710	706	706
query47	1856	1769	1800	1769
query48	378	294	301	294
query49	837	419	420	419
query50	771	407	397	397
query51	6908	6673	6758	6673
query52	113	94	91	91
query53	367	293	297	293
query54	889	441	439	439
query55	74	72	75	72
query56	283	267	264	264
query57	1162	1067	1041	1041
query58	256	237	245	237
query59	3408	3230	3252	3230
query60	293	295	278	278
query61	114	90	86	86
query62	600	440	432	432
query63	322	295	295	295
query64	8765	2263	1711	1711
query65	3187	3098	3105	3098
query66	754	321	330	321
query67	15630	15006	14909	14909
query68	6222	546	538	538
query69	699	417	316	316
query70	1233	1039	1098	1039
query71	510	325	270	270
query72	8565	5470	5494	5470
query73	804	331	331	331
query74	5857	5516	5595	5516
query75	5063	2632	2665	2632
query76	4527	992	900	900
query77	783	310	303	303
query78	10604	9809	9721	9721
query79	4840	513	523	513
query80	1040	469	461	461
query81	538	218	216	216
query82	1104	109	107	107
query83	360	180	178	178
query84	274	93	81	81
query85	1329	275	274	274
query86	448	327	305	305
query87	3320	3093	3118	3093
query88	4752	2402	2381	2381
query89	512	393	384	384
query90	1970	188	189	188
query91	131	100	100	100
query92	60	48	48	48
query93	5446	509	503	503
query94	1207	191	193	191
query95	448	319	312	312
query96	617	271	272	271
query97	3195	3021	3017	3017
query98	219	194	198	194
query99	1114	830	839	830
Total cold run time: 287201 ms
Total hot run time: 174246 ms

@doris-robot

ClickBench: Total hot run time: 30.54 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9f92890efe32bdd35a7d40327186e31dab13547a, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.22	0.05	0.05
query4	1.67	0.09	0.08
query5	0.50	0.49	0.50
query6	1.12	0.72	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.55	0.50	0.48
query10	0.54	0.54	0.53
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.59	0.58	0.58
query14	0.76	0.78	0.79
query15	0.84	0.80	0.83
query16	0.35	0.38	0.37
query17	0.97	1.00	0.98
query18	0.24	0.25	0.24
query19	1.86	1.71	1.77
query20	0.01	0.01	0.01
query21	15.43	0.78	0.66
query22	4.19	6.95	1.70
query23	18.34	1.42	1.32
query24	2.10	0.24	0.22
query25	0.16	0.09	0.09
query26	0.27	0.18	0.18
query27	0.09	0.08	0.08
query28	13.29	1.00	1.00
query29	12.65	3.38	3.37
query30	0.25	0.07	0.06
query31	2.86	0.40	0.38
query32	3.25	0.47	0.47
query33	2.88	2.88	2.97
query34	17.03	4.43	4.44
query35	4.48	4.53	4.49
query36	0.65	0.48	0.47
query37	0.19	0.15	0.16
query38	0.16	0.15	0.14
query39	0.04	0.03	0.04
query40	0.18	0.15	0.14
query41	0.09	0.06	0.05
query42	0.06	0.04	0.04
query43	0.05	0.04	0.03
Total cold run time: 109.39 s
Total hot run time: 30.54 s

@xiaokang xiaokang changed the title from "[feature](ES Catalog)Support JSON type for ES catalog" to "[feature](ES Catalog) map nested/object type in ES to JSON type in Doris" Jul 2, 2024
Contributor

@xiaokang xiaokang left a comment

LGTM

Contributor

github-actions bot commented Jul 2, 2024

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Jul 2, 2024
Contributor

github-actions bot commented Jul 2, 2024

PR approved by anyone and no changes requested.

Member

@eldenmoon eldenmoon left a comment

LGTM

Contributor

@morningman morningman left a comment

LGTM

@morningman morningman merged commit 6e192d2 into apache:master Jul 2, 2024
28 of 32 checks passed
@qidaye qidaye deleted the es_json_type branch July 2, 2024 09:50
qidaye added a commit to qidaye/incubator-doris that referenced this pull request Jul 2, 2024
…ris (apache#37101)

1. `nested`/`object` can map to `json` type in Doris, and can be
analyzed with json functions.
2. Add some cases for `json_extract`.
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
…ris (#37101)

1. `nested`/`object` can map to `json` type in Doris, and can be
analyzed with json functions.
2. Add some cases for `json_extract`.
@xiaokang xiaokang mentioned this pull request Jul 18, 2024
@yiguolei yiguolei mentioned this pull request Jul 19, 2024
1 task
morningman pushed a commit to apache/doris-website that referenced this pull request Aug 5, 2024
1. Add json type mapping apache/doris#37101
2. Add session instruction `enable_es_parallel_scroll` and `batch_size`
apache/doris#37180
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024