
[Fix](partial update) Fix partial update load false when schema includes auto increment column #31725

Merged
zhannngchen merged 4 commits into apache:master from
Yukang-Lian:Fix_Partial_Update_Auto_Inc_Column_Table_Insert_Value
Mar 5, 2024

Conversation

@Yukang-Lian
Collaborator

Problem:
When a partial column update does not specify the auto-increment column and the imported data contains new keys, the load fails with an error saying that the auto-increment column cannot be found.

Reason:
The partial column update logic does not handle the auto-increment column for rows with new keys. Because the system can generate auto-increment values, this column may be omitted from the imported data. The partial update path, however, treats it as a regular column, which can be filled automatically only if it is nullable or has a default value, and it overlooks that auto-increment columns can be auto-filled as well. This oversight causes the error.
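To make the failure concrete, here is a minimal Python sketch of the pre-fix rule described above. It is not Doris's code: `Column`, `can_fill_missing_before_fix`, and `unfillable_columns` are hypothetical names, the schema is reduced to the three properties the PR mentions (nullable, default value, auto-increment), and only rows with new keys are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Column:
    """Hypothetical, simplified column metadata (not Doris's real schema class)."""
    name: str
    nullable: bool = False
    default: Optional[object] = None
    auto_increment: bool = False


def can_fill_missing_before_fix(col: Column) -> bool:
    # Rule described in the PR's "Reason": an omitted column can be filled for a
    # new key only if it is nullable or has a default value. Auto-increment is ignored.
    return col.nullable or col.default is not None


def unfillable_columns(schema: List[Column], provided: Set[str]) -> List[str]:
    """Names of omitted columns that this rule cannot fill for a new-key row."""
    return [c.name for c in schema
            if c.name not in provided and not can_fill_missing_before_fix(c)]


schema = [
    Column("id"),                        # key column, always supplied
    Column("uid", auto_increment=True),  # NOT NULL, no default, auto-increment
    Column("name", nullable=True),
]

# A partial update that supplies only "id" and "name" for a brand-new key:
print(unfillable_columns(schema, {"id", "name"}))  # ['uid'] -> the load fails
```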

Solution:
Add an auto-increment check to the partial column update logic, and generate auto-increment values when filling in the omitted columns while completing a partial update.
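A minimal Python sketch of the fix, under the same simplifications: the fill rule also accepts auto-increment columns, and an omitted auto-increment column receives a freshly generated value when a new-key row is completed. All names (`Column`, `can_fill_missing`, `complete_new_key_row`) are hypothetical, `itertools.count` stands in for Doris's real auto-increment allocator, and rows whose key already exists are out of scope here.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Column:
    """Hypothetical, simplified column metadata (not Doris's real schema class)."""
    name: str
    nullable: bool = False
    default: Optional[object] = None
    auto_increment: bool = False


def can_fill_missing(col: Column) -> bool:
    # Post-fix rule: an omitted column is fillable for a new key if it is
    # nullable, has a default, OR is auto-increment (the system generates it).
    return col.nullable or col.default is not None or col.auto_increment


def complete_new_key_row(schema: List[Column], row: Dict[str, object],
                         id_gen: Iterator[int]) -> Dict[str, object]:
    """Fill every column a partial update omitted, for a row with a new key."""
    bad = [c.name for c in schema if c.name not in row and not can_fill_missing(c)]
    if bad:
        raise ValueError(f"cannot fill omitted columns: {bad}")
    full: Dict[str, object] = {}
    for col in schema:
        if col.name in row:
            full[col.name] = row[col.name]      # value supplied by the load
        elif col.auto_increment:
            full[col.name] = next(id_gen)       # generate a new auto-increment value
        elif col.default is not None:
            full[col.name] = col.default        # fall back to the default value
        else:
            full[col.name] = None               # remaining case: nullable column
    return full


schema = [
    Column("id"),                        # key column, always supplied
    Column("uid", auto_increment=True),
    Column("name", nullable=True),
    Column("score", default=0),
]
ids = itertools.count(1)  # hypothetical stand-in for the real allocator

print(complete_new_key_row(schema, {"id": 42, "name": "a"}, ids))
# {'id': 42, 'uid': 1, 'name': 'a', 'score': 0}
```

Validating every omitted column before generating anything means a row that cannot be completed fails without consuming an auto-increment value; that ordering is a choice made for this sketch.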

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@Yukang-Lian
Collaborator Author

run buildall

@github-actions
Contributor

github-actions bot commented Mar 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian
Collaborator Author

run buildall

@github-actions
Contributor

github-actions bot commented Mar 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

Yukang-Lian force-pushed the Fix_Partial_Update_Auto_Inc_Column_Table_Insert_Value branch from 8bbe654 to 52fadfe on March 4, 2024 15:49
@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38125 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 52fadfe7bb3bf936175b399a7d02c8a5207c664a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17651	4107	4027	4027
q2	2025	150	150	150
q3	10551	952	957	952
q4	4665	941	952	941
q5	7592	2927	3030	2927
q6	176	124	125	124
q7	1344	850	821	821
q8	9502	2114	2121	2114
q9	7261	6443	6461	6443
q10	8248	2582	2570	2570
q11	435	211	212	211
q12	788	316	308	308
q13	17966	2947	2927	2927
q14	286	254	254	254
q15	493	432	456	432
q16	478	394	394	394
q17	951	843	800	800
q18	6703	5957	5961	5957
q19	1585	1531	1516	1516
q20	556	281	294	281
q21	7558	3690	3695	3690
q22	801	336	286	286
Total cold run time: 107615 ms
Total hot run time: 38125 ms
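The summary lines can be re-derived from the table. Judging from the numbers (an inference, not something the bot documents), the total cold run time is the sum of the first column, and the total hot run time is the sum of the faster of the two hot runs for each query. A short Python check for Round 1:

```python
# TPC-H Round 1 rows copied from the table above: (cold, hot_1, hot_2) in ms.
ROUND_1 = {
    "q1": (17651, 4107, 4027),
    "q2": (2025, 150, 150),
    "q3": (10551, 952, 957),
    "q4": (4665, 941, 952),
    "q5": (7592, 2927, 3030),
    "q6": (176, 124, 125),
    "q7": (1344, 850, 821),
    "q8": (9502, 2114, 2121),
    "q9": (7261, 6443, 6461),
    "q10": (8248, 2582, 2570),
    "q11": (435, 211, 212),
    "q12": (788, 316, 308),
    "q13": (17966, 2947, 2927),
    "q14": (286, 254, 254),
    "q15": (493, 432, 456),
    "q16": (478, 394, 394),
    "q17": (951, 843, 800),
    "q18": (6703, 5957, 5961),
    "q19": (1585, 1531, 1516),
    "q20": (556, 281, 294),
    "q21": (7558, 3690, 3695),
    "q22": (801, 336, 286),
}


def totals(rows):
    """Return (total cold run, total hot run) in ms.

    Assumed convention: the hot total sums the faster of the two hot runs per query.
    """
    cold = sum(c for c, _, _ in rows.values())
    hot = sum(min(h1, h2) for _, h1, h2 in rows.values())
    return cold, hot


print(totals(ROUND_1))  # (107615, 38125), matching the reported totals
```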

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	3997	3996	3989	3989
q2	328	222	237	222
q3	2943	2948	2888	2888
q4	1859	1847	1796	1796
q5	5260	5272	5232	5232
q6	211	118	118	118
q7	2351	1827	1875	1827
q8	3223	3274	3282	3274
q9	8527	8524	8561	8524
q10	6159	3709	3738	3709
q11	538	435	449	435
q12	680	509	540	509
q13	11337	2757	2784	2757
q14	270	254	255	254
q15	474	452	440	440
q16	461	421	397	397
q17	1708	1671	1653	1653
q18	7894	7502	7309	7309
q19	5728	1596	1617	1596
q20	1968	1750	1708	1708
q21	4965	4788	4774	4774
q22	537	472	459	459
Total cold run time: 71418 ms
Total hot run time: 53870 ms

@doris-robot

TPC-DS: Total hot run time: 177828 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 52fadfe7bb3bf936175b399a7d02c8a5207c664a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	921	357	336	336
query2	7398	2050	2173	2050
query3	6710	217	210	210
query4	27441	21064	21156	21064
query5	4280	469	432	432
query6	276	182	172	172
query7	4619	301	294	294
query8	235	176	166	166
query9	8525	2205	2193	2193
query10	426	226	236	226
query11	15066	14387	14556	14387
query12	141	94	89	89
query13	1674	441	437	437
query14	9191	6932	6815	6815
query15	230	192	187	187
query16	7260	269	269	269
query17	934	564	549	549
query18	1913	278	272	272
query19	215	165	164	164
query20	94	93	96	93
query21	198	131	124	124
query22	4646	4520	4417	4417
query23	31444	30756	30753	30753
query24	12245	3100	3050	3050
query25	712	403	396	396
query26	1922	164	168	164
query27	2962	360	365	360
query28	6699	1829	1796	1796
query29	1311	625	610	610
query30	320	149	149	149
query31	958	730	747	730
query32	101	67	62	62
query33	770	262	264	262
query34	988	457	466	457
query35	931	796	818	796
query36	934	824	850	824
query37	287	73	79	73
query38	3198	3088	3124	3088
query39	1405	1376	1383	1376
query40	295	122	120	120
query41	65	56	54	54
query42	113	101	107	101
query43	459	413	407	407
query44	1083	707	708	707
query45	207	193	196	193
query46	1054	803	774	774
query47	1624	1531	1571	1531
query48	436	354	349	349
query49	1215	339	346	339
query50	803	383	380	380
query51	6827	6699	6621	6621
query52	106	99	101	99
query53	349	284	290	284
query54	346	245	251	245
query55	90	85	82	82
query56	251	231	219	219
query57	1123	1008	1005	1005
query58	263	224	224	224
query59	2658	2493	2392	2392
query60	265	259	265	259
query61	119	117	112	112
query62	660	425	410	410
query63	312	284	289	284
query64	6416	3371	3237	3237
query65	3126	3060	3023	3023
query66	1465	354	336	336
query67	15212	14403	14249	14249
query68	14642	573	564	564
query69	699	381	387	381
query70	1515	1090	1130	1090
query71	645	274	280	274
query72	10035	2626	2508	2508
query73	3185	333	342	333
query74	7210	6763	6918	6763
query75	6785	2668	2686	2668
query76	8064	1111	1171	1111
query77	1150	260	259	259
query78	10141	9522	9618	9522
query79	8782	516	523	516
query80	1356	438	445	438
query81	500	210	216	210
query82	249	94	98	94
query83	401	152	140	140
query84	287	84	77	77
query85	1227	342	328	328
query86	375	317	303	303
query87	3472	3184	3192	3184
query88	3605	2311	2319	2311
query89	529	371	357	357
query90	2568	179	177	177
query91	165	130	128	128
query92	66	50	54	50
query93	3435	525	522	522
query94	1680	197	188	188
query95	454	351	359	351
query96	602	264	264	264
query97	3987	3845	3915	3845
query98	233	227	201	201
query99	1099	773	743	743
Total cold run time: 315665 ms
Total hot run time: 177828 ms

@doris-robot

ClickBench: Total hot run time: 31.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 52fadfe7bb3bf936175b399a7d02c8a5207c664a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.04
query2	0.07	0.03	0.03
query3	0.25	0.06	0.07
query4	1.64	0.09	0.10
query5	0.48	0.48	0.49
query6	1.28	0.66	0.66
query7	0.03	0.01	0.01
query8	0.05	0.03	0.03
query9	0.58	0.52	0.52
query10	0.58	0.58	0.57
query11	0.14	0.10	0.10
query12	0.12	0.10	0.10
query13	0.57	0.58	0.57
query14	0.73	0.74	0.76
query15	0.83	0.80	0.80
query16	0.38	0.37	0.37
query17	0.98	0.94	0.96
query18	0.27	0.25	0.25
query19	1.82	1.73	1.72
query20	0.02	0.01	0.01
query21	15.43	0.63	0.65
query22	2.58	4.15	3.02
query23	17.01	1.12	0.98
query24	2.30	0.60	0.22
query25	0.13	0.08	0.03
query26	0.22	0.13	0.13
query27	0.05	0.04	0.04
query28	12.15	0.88	0.84
query29	12.64	3.39	3.31
query30	0.60	0.58	0.58
query31	2.80	0.34	0.34
query32	3.36	0.44	0.43
query33	2.96	2.94	2.87
query34	15.52	4.33	4.30
query35	4.33	4.30	4.34
query36	1.10	1.03	1.02
query37	0.07	0.06	0.05
query38	0.04	0.02	0.02
query39	0.03	0.02	0.02
query40	0.16	0.13	0.13
query41	0.09	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.02	0.02
Total cold run time: 104.5 s
Total hot run time: 31.4 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 52fadfe7bb3bf936175b399a7d02c8a5207c664a with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          60 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       18.1 seconds inserted 10000000 Rows, about 552K ops/s
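The throughput figures appear to be size divided by elapsed time, expressed in MiB (2^20 bytes) and rounded down; decimal megabytes would give about 131 MB/s for the JSON load instead of 124. This convention is inferred from the numbers, not taken from the load tool's documentation. A quick Python check:

```python
import math

MIB = 1024 * 1024  # assumption: the report's "MB" means MiB


def mib_per_s(num_bytes: int, seconds: float) -> int:
    """Throughput in MiB/s, rounded down (assumed reporting convention)."""
    return math.floor(num_bytes / seconds / MIB)


def kilo_rows_per_s(rows: int, seconds: float) -> int:
    """Row throughput in thousands of rows per second, rounded down."""
    return math.floor(rows / seconds / 1000)


print(mib_per_s(2358488459, 18))          # 124  (stream load json)
print(mib_per_s(1101869774, 60))          # 17   (stream load orc)
print(mib_per_s(861443392, 32))           # 25   (stream load parquet)
print(kilo_rows_per_s(10_000_000, 18.1))  # 552  (insert into select)
```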

Contributor

zhannngchen left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Mar 5, 2024

PR approved by at least one committer and no changes requested.

github-actions bot added the approved and reviewed labels on Mar 5, 2024
@github-actions
Contributor

github-actions bot commented Mar 5, 2024

PR approved by anyone and no changes requested.

Contributor

sollhui left a comment

LGTM

zhannngchen merged commit 28678ca into apache:master on Mar 5, 2024
yiguolei pushed a commit that referenced this pull request Mar 6, 2024
…des auto increment column (#31725)

wm1581066 added the usercase label on Mar 12, 2024
yiguolei pushed a commit that referenced this pull request Mar 15, 2024
…des auto increment column (#31725)

Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Apr 3, 2024
…des auto increment column (apache#31725)

Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Apr 3, 2024
…des auto increment column (apache#31725)

xiaokang pushed a commit that referenced this pull request Apr 3, 2024
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024

Labels

approved (Indicates a PR has been approved by one committer.), meta-change, reviewed, usercase (Important user case type label)

6 participants