[opt](partial update) Allow to only specify key columns in partial update #40736

Merged (3 commits) on Sep 14, 2024

@bobhan1 bobhan1 commented Sep 12, 2024

branch-2.1-pick: #40863
branch-2.0-pick: #40864

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@bobhan1
Contributor Author

bobhan1 commented Sep 12, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 42969 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5e784fd6a454203ccc464ab920a24b37a47af187, data reload: false

------ Round 1 ----------------------------------
q1	17608	7389	7248	7248
q2	2036	187	182	182
q3	10626	1287	1386	1287
q4	10514	1019	1068	1019
q5	7751	3203	3154	3154
q6	245	155	157	155
q7	1041	641	622	622
q8	9458	2059	2021	2021
q9	6830	6285	6326	6285
q10	7027	2515	2495	2495
q11	434	259	266	259
q12	420	237	243	237
q13	17781	3045	3078	3045
q14	292	265	250	250
q15	599	534	524	524
q16	537	429	423	423
q17	1008	972	967	967
q18	7429	6638	6943	6638
q19	1378	1259	1239	1239
q20	624	342	337	337
q21	3960	3620	3588	3588
q22	1084	1030	994	994
Total cold run time: 108682 ms
Total hot run time: 42969 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7574	7250	7255	7250
q2	348	237	246	237
q3	3102	3164	3047	3047
q4	2148	2109	2049	2049
q5	5724	5607	5749	5607
q6	242	154	156	154
q7	2167	1817	1809	1809
q8	3378	3443	3435	3435
q9	8919	8885	8826	8826
q10	3435	3583	3622	3583
q11	592	490	491	490
q12	824	625	583	583
q13	9611	3241	3220	3220
q14	308	302	268	268
q15	595	546	547	546
q16	537	477	466	466
q17	1805	1761	1778	1761
q18	8595	8054	8059	8054
q19	1784	1762	1759	1759
q20	2130	1875	1881	1875
q21	6054	5491	5706	5491
q22	1136	1076	1015	1015
Total cold run time: 71008 ms
Total hot run time: 61525 ms
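
The bot does not label the columns. A minimal sketch, assuming each row reads as query, cold run, two hot runs, and the faster hot run (an interpretation inferred from the numbers, not stated in the comment), recomputes the Round 1 totals from the rows above. The function name `summarize` is made up for this illustration.

```python
# Round 1 rows copied from the comment above.
# Assumed column meaning: query, cold run, hot run 1, hot run 2, best hot run (ms).
ROUND1 = """
q1 17608 7389 7248 7248
q2 2036 187 182 182
q3 10626 1287 1386 1287
q4 10514 1019 1068 1019
q5 7751 3203 3154 3154
q6 245 155 157 155
q7 1041 641 622 622
q8 9458 2059 2021 2021
q9 6830 6285 6326 6285
q10 7027 2515 2495 2495
q11 434 259 266 259
q12 420 237 243 237
q13 17781 3045 3078 3045
q14 292 265 250 250
q15 599 534 524 524
q16 537 429 423 423
q17 1008 972 967 967
q18 7429 6638 6943 6638
q19 1378 1259 1239 1239
q20 624 342 337 337
q21 3960 3620 3588 3588
q22 1084 1030 994 994
"""


def summarize(text):
    """Return (cold_total, hot_total) in ms for a block of benchmark rows."""
    cold_total = 0
    hot_total = 0
    for line in text.strip().splitlines():
        name, cold, hot1, hot2, best = line.split()
        # Check the assumed meaning of the last column: the faster hot run.
        assert int(best) == min(int(hot1), int(hot2)), name
        cold_total += int(cold)
        hot_total += int(best)
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND1)
print(cold_total, hot_total)  # 108682 42969
```

Under that reading, the sums match the reported "Total cold run time" and "Total hot run time" for Round 1.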

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.88% (9457/25642)
Line Coverage: 28.26% (77812/275334)
Region Coverage: 27.65% (40158/145220)
Branch Coverage: 24.27% (20410/84092)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5e784fd6a454203ccc464ab920a24b37a47af187_5e784fd6a454203ccc464ab920a24b37a47af187/report/index.html

@doris-robot

TPC-DS: Total hot run time: 199885 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5e784fd6a454203ccc464ab920a24b37a47af187, data reload: false

query1	1268	863	842	842
query2	6336	1839	1799	1799
query3	10663	3959	3986	3959
query4	55086	25542	24123	24123
query5	5101	527	559	527
query6	332	167	168	167
query7	5628	306	295	295
query8	271	225	208	208
query9	5997	2618	2620	2618
query10	403	306	281	281
query11	16094	15707	15751	15707
query12	148	102	96	96
query13	1398	388	371	371
query14	10628	6618	7222	6618
query15	210	181	181	181
query16	7043	495	470	470
query17	1165	606	620	606
query18	1910	300	302	300
query19	204	175	152	152
query20	124	120	115	115
query21	211	111	108	108
query22	4594	4770	4740	4740
query23	34838	33827	33832	33827
query24	6140	3069	3036	3036
query25	511	396	391	391
query26	604	149	151	149
query27	1594	274	283	274
query28	2856	2124	2099	2099
query29	677	410	402	402
query30	227	153	154	153
query31	978	778	789	778
query32	72	49	51	49
query33	445	288	298	288
query34	884	477	477	477
query35	848	715	728	715
query36	1048	876	909	876
query37	140	81	79	79
query38	3952	3916	3989	3916
query39	1475	1402	1400	1400
query40	210	112	112	112
query41	47	48	46	46
query42	127	98	95	95
query43	496	443	439	439
query44	1237	795	784	784
query45	196	173	167	167
query46	1100	842	810	810
query47	1889	1806	1796	1796
query48	370	282	288	282
query49	729	444	445	444
query50	934	440	430	430
query51	7099	6859	6908	6859
query52	99	88	86	86
query53	252	186	185	185
query54	563	463	473	463
query55	76	76	76	76
query56	268	262	263	262
query57	1209	1086	1077	1077
query58	227	251	265	251
query59	2939	2573	2721	2573
query60	286	272	283	272
query61	102	98	99	98
query62	810	668	675	668
query63	218	184	182	182
query64	1596	693	646	646
query65	3241	3222	3194	3194
query66	689	290	293	290
query67	15846	15707	15425	15425
query68	1430	579	572	572
query69	393	276	275	275
query70	1198	1109	1105	1105
query71	339	284	276	276
query72	4837	4028	4000	4000
query73	775	326	321	321
query74	9349	9151	9042	9042
query75	3319	2683	2721	2683
query76	1380	1340	1352	1340
query77	505	321	309	309
query78	10882	9488	9273	9273
query79	954	897	887	887
query80	1015	859	825	825
query81	488	274	268	268
query82	651	266	266	266
query83	198	191	197	191
query84	267	105	107	105
query85	720	410	469	410
query86	371	309	315	309
query87	4388	4362	4477	4362
query88	4136	4088	4038	4038
query89	389	365	374	365
query90	1867	312	308	308
query91	121	125	123	123
query92	87	77	78	77
query93	1093	1056	1069	1056
query94	729	403	365	365
query95	481	426	422	422
query96	490	474	479	474
query97	3156	3108	3163	3108
query98	228	227	229	227
query99	1739	1318	1299	1299
Total cold run time: 292933 ms
Total hot run time: 199885 ms

@doris-robot

ClickBench: Total hot run time: 30.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5e784fd6a454203ccc464ab920a24b37a47af187, data reload: false

query1	0.05	0.05	0.04
query2	0.07	0.04	0.04
query3	0.23	0.04	0.05
query4	1.67	0.08	0.08
query5	0.50	0.50	0.49
query6	1.16	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.57	0.52	0.53
query10	0.56	0.58	0.57
query11	0.16	0.12	0.12
query12	0.16	0.13	0.12
query13	0.63	0.62	0.60
query14	1.49	1.45	1.46
query15	0.91	0.88	0.88
query16	0.37	0.36	0.35
query17	1.05	1.02	1.03
query18	0.16	0.16	0.16
query19	1.90	1.83	1.77
query20	0.01	0.00	0.01
query21	15.40	0.67	0.68
query22	3.81	7.70	1.10
query23	17.79	1.28	1.34
query24	2.11	0.24	0.23
query25	0.19	0.08	0.09
query26	0.29	0.18	0.18
query27	0.08	0.08	0.07
query28	13.19	1.12	1.09
query29	12.67	3.33	3.29
query30	0.24	0.05	0.05
query31	2.89	0.42	0.42
query32	3.23	0.50	0.49
query33	3.05	3.00	3.11
query34	15.42	4.33	4.31
query35	4.33	4.35	4.34
query36	0.69	0.48	0.49
query37	0.19	0.16	0.16
query38	0.16	0.16	0.15
query39	0.05	0.03	0.04
query40	0.16	0.16	0.14
query41	0.10	0.05	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 107.86 s
Total hot run time: 30.72 s

@bobhan1 force-pushed the partial-update-allow-only-keys branch from 5e784fd to a8d47dd on September 13, 2024 11:44
@bobhan1
Contributor Author

bobhan1 commented Sep 13, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 42573 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a8d47dd12f18c83d39fde101baf41397adde27e7, data reload: false

------ Round 1 ----------------------------------
q1	17583	7210	7187	7187
q2	2036	202	180	180
q3	10455	1242	1350	1242
q4	10532	957	988	957
q5	7718	3142	3125	3125
q6	237	154	148	148
q7	1028	633	610	610
q8	9436	2004	2010	2004
q9	6788	6240	6332	6240
q10	7021	2505	2499	2499
q11	433	246	251	246
q12	403	224	232	224
q13	17756	2998	3014	2998
q14	281	251	265	251
q15	585	525	525	525
q16	521	434	424	424
q17	967	944	942	942
q18	7423	6726	6908	6726
q19	1375	1221	1230	1221
q20	613	345	327	327
q21	3865	3557	3495	3495
q22	1094	1002	1014	1002
Total cold run time: 108150 ms
Total hot run time: 42573 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7144	7086	7157	7086
q2	330	227	232	227
q3	3010	3230	3125	3125
q4	2070	2098	2114	2098
q5	5687	5621	5602	5602
q6	239	159	153	153
q7	2199	1788	1769	1769
q8	3326	3421	3386	3386
q9	8727	8891	8870	8870
q10	3548	3563	3543	3543
q11	584	478	496	478
q12	813	613	625	613
q13	9340	3241	3187	3187
q14	311	279	273	273
q15	581	530	552	530
q16	522	469	461	461
q17	1779	1796	1750	1750
q18	8380	8165	8130	8130
q19	1749	1737	1744	1737
q20	2157	1908	1900	1900
q21	5947	5417	5603	5417
q22	1151	1058	1027	1027
Total cold run time: 69594 ms
Total hot run time: 61362 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.89% (9464/25654)
Line Coverage: 28.25% (77816/275426)
Region Coverage: 27.67% (40208/145293)
Branch Coverage: 24.28% (20427/84138)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a8d47dd12f18c83d39fde101baf41397adde27e7_a8d47dd12f18c83d39fde101baf41397adde27e7/report/index.html

@doris-robot

TPC-DS: Total hot run time: 199827 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a8d47dd12f18c83d39fde101baf41397adde27e7, data reload: false

query1	1262	870	876	870
query2	6203	1811	1758	1758
query3	10643	3856	3991	3856
query4	57458	26038	24311	24311
query5	5107	552	526	526
query6	331	168	161	161
query7	5657	311	299	299
query8	274	230	227	227
query9	6153	2598	2607	2598
query10	396	291	284	284
query11	16061	15539	15616	15539
query12	151	106	102	102
query13	1480	455	412	412
query14	10179	7162	6932	6932
query15	215	180	192	180
query16	7030	504	505	504
query17	1156	627	597	597
query18	1899	322	307	307
query19	204	161	165	161
query20	120	113	113	113
query21	212	110	110	110
query22	4793	4534	4644	4534
query23	34689	33970	33563	33563
query24	6138	3057	3130	3057
query25	526	410	415	410
query26	614	162	165	162
query27	1561	287	291	287
query28	2802	2104	2081	2081
query29	683	438	435	435
query30	223	155	157	155
query31	965	791	819	791
query32	80	55	59	55
query33	472	309	318	309
query34	898	488	482	482
query35	873	749	724	724
query36	1065	929	918	918
query37	140	86	84	84
query38	4144	3919	4026	3919
query39	1495	1398	1410	1398
query40	217	119	118	118
query41	51	47	47	47
query42	121	102	101	101
query43	501	452	441	441
query44	1226	799	774	774
query45	197	172	174	172
query46	1077	842	801	801
query47	1870	1778	1813	1778
query48	381	295	297	295
query49	753	467	477	467
query50	898	449	458	449
query51	7092	6975	6815	6815
query52	102	92	89	89
query53	256	185	187	185
query54	580	465	458	458
query55	78	74	84	74
query56	289	275	297	275
query57	1225	1075	1084	1075
query58	234	233	245	233
query59	2697	2498	2463	2463
query60	311	279	266	266
query61	102	107	116	107
query62	794	673	663	663
query63	233	188	191	188
query64	1600	692	689	689
query65	3250	3176	3184	3176
query66	686	305	289	289
query67	16013	15578	15457	15457
query68	2033	585	598	585
query69	413	276	298	276
query70	1188	1083	1106	1083
query71	353	288	282	282
query72	6052	4106	4055	4055
query73	768	333	332	332
query74	9371	9160	8945	8945
query75	3361	2739	2768	2739
query76	1376	1252	1289	1252
query77	544	336	328	328
query78	10107	9553	9354	9354
query79	1077	885	860	860
query80	1051	847	836	836
query81	487	272	270	270
query82	1159	269	261	261
query83	201	187	196	187
query84	271	106	108	106
query85	777	418	398	398
query86	340	321	306	306
query87	4464	4322	4404	4322
query88	4162	4080	4062	4062
query89	394	369	380	369
query90	1925	329	309	309
query91	126	122	123	122
query92	82	80	77	77
query93	1262	1028	1022	1022
query94	790	382	392	382
query95	554	426	424	424
query96	470	470	469	469
query97	3182	3156	3130	3130
query98	231	235	239	235
query99	1716	1300	1293	1293
Total cold run time: 297473 ms
Total hot run time: 199827 ms

@doris-robot

ClickBench: Total hot run time: 31.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a8d47dd12f18c83d39fde101baf41397adde27e7, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.04
query4	1.67	0.07	0.07
query5	0.50	0.49	0.49
query6	1.14	0.72	0.72
query7	0.01	0.01	0.01
query8	0.05	0.05	0.05
query9	0.58	0.51	0.50
query10	0.57	0.60	0.56
query11	0.16	0.11	0.11
query12	0.16	0.12	0.12
query13	0.63	0.62	0.61
query14	1.45	1.50	1.48
query15	0.90	0.88	0.87
query16	0.35	0.37	0.37
query17	1.08	1.06	1.04
query18	0.16	0.16	0.16
query19	1.96	1.82	1.84
query20	0.01	0.00	0.01
query21	15.40	0.69	0.68
query22	4.20	8.09	1.44
query23	17.90	1.36	1.38
query24	2.26	0.23	0.23
query25	0.19	0.09	0.08
query26	0.29	0.18	0.18
query27	0.08	0.08	0.07
query28	13.18	1.13	1.09
query29	12.76	3.36	3.33
query30	0.24	0.04	0.05
query31	2.89	0.42	0.41
query32	3.23	0.49	0.48
query33	3.04	3.03	3.05
query34	15.43	4.32	4.32
query35	4.33	4.35	4.33
query36	0.68	0.49	0.51
query37	0.18	0.16	0.16
query38	0.16	0.15	0.16
query39	0.04	0.04	0.04
query40	0.17	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 108.57 s
Total hot run time: 31.28 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Sep 14, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@zhannngchen zhannngchen left a comment

LGTM

bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 14, 2024
@bobhan1 bobhan1 mentioned this pull request Oct 14, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 15, 2024
…update apache#39619

pick [opt](partial update) Remove unnecessary lock and refactor some code for partial update (apache#40062)

1. apache#34112 made partial update fetch rowsets during the initialization
of RowsetBuilder rather than in the flush phase, so the tablet header lock
is no longer needed and can be removed.
2. Refactor some partial update code.

fix compile

pick [Fix](partial update) Fix __DORIS_SEQUENCE_COL__ is not set for newly inserted rows in partial update apache#40272

picks apache#40272

pick [Cherry-pick](branch-2.1) Pick "[Featrue](default value) Support bitmap_empty default value (apache#40364)" (apache#40487)

Pick apache#40364

pick [Feature](partial update) Support flexible partial update in stream load with json files (apache#39756)

This PR adds the ability to update different columns for each row in a
single stream load.
Doc: apache/doris-website#1140
```sql
MySQL root@127.1:d1> CREATE TABLE t1 (
                  -> `k` int(11) NULL,
                  -> `v1` BIGINT NULL,
                  -> `v2` BIGINT NULL DEFAULT "9876",
                  -> `v3` BIGINT NOT NULL,
                  -> `v4` BIGINT NOT NULL DEFAULT "1234",
                  -> `v5` BIGINT NULL
                  -> ) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
                  -> PROPERTIES(
                  -> "replication_num" = "1",
                  -> "enable_unique_key_merge_on_write" = "true");
Query OK, 0 rows affected
Time: 0.013s
MySQL root@127.1:d1> insert into t1 select number, number, number, number, number, number from numbers("number" = "6");
Query OK, 6 rows affected
Time: 0.107s
MySQL root@127.1:d1> select * from t1;
+---+----+----+----+----+----+
| k | v1 | v2 | v3 | v4 | v5 |
+---+----+----+----+----+----+
| 0 | 0  | 0  | 0  | 0  | 0  |
| 1 | 1  | 1  | 1  | 1  | 1  |
| 2 | 2  | 2  | 2  | 2  | 2  |
| 3 | 3  | 3  | 3  | 3  | 3  |
| 4 | 4  | 4  | 4  | 4  | 4  |
| 5 | 5  | 5  | 5  | 5  | 5  |
+---+----+----+----+----+----+
```
test1.json:
```json
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
```

```bash
curl --location-trusted -u root: \
-H "strict_mode:false" \
-H "format:json" \
-H "read_json_by_line:true" \
-H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
-T test1.json \
-XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
```

```sql
MySQL root@127.1:d1> select * from t1;
+---+-----+------+-----+------+--------+
| k | v1  | v2   | v3  | v4   | v5     |
+---+-----+------+-----+------+--------+
| 0 | 0   | 0    | 0   | 0    | 0      |
| 1 | 10  | 111  | 111 | 1    | 1      |
| 2 | 2   | 20   | 2   | 222  | 25     |
| 3 | 3   | 3    | 30  | 3    | 3      |
| 4 | 43  | 4    | 99  | 20   | 4      |
| 5 | 5   | 5    | 5   | 5    | <null> |
| 6 | 999 | 9876 | 777 | 1234 | <null> |
+---+-----+------+-----+------+--------+
```
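
The before/after tables can be reproduced with a small in-memory model. This is only a sketch of the observable merge behavior, not how Doris implements it: each JSON line overwrites just the columns it names, and a new key starts from the column defaults in the `CREATE TABLE` statement. The names `apply_flexible_update`, `COLUMNS`, and `DEFAULTS` are made up for this illustration, and treating a missing column with no `DEFAULT` as NULL is an assumption (the example never omits the `NOT NULL` column `v3` for a new key).

```python
import json

# Column order and defaults taken from the CREATE TABLE statement above.
# Assumption: a column without a DEFAULT is modeled as None (SQL NULL).
COLUMNS = ["k", "v1", "v2", "v3", "v4", "v5"]
DEFAULTS = {"v1": None, "v2": 9876, "v3": None, "v4": 1234, "v5": None}

TEST1_JSON = """
{"k": 1, "v1": 10}
{"k": 2, "v2": 20, "v5": 25}
{"k": 3, "v3": 30}
{"k": 4, "v4": 20, "v1": 43, "v3": 99}
{"k": 5, "v5": null}
{"k": 6, "v1": 999, "v3": 777}
{"k": 2, "v4": 222}
{"k": 1, "v2": 111, "v3": 111}
"""


def apply_flexible_update(table, json_lines):
    """Apply each JSON line in order; only the columns it names are changed."""
    for line in json_lines.strip().splitlines():
        row = json.loads(line)
        key = row["k"]
        # Existing key: start from the stored row. New key: start from defaults.
        current = table.get(key, {"k": key, **DEFAULTS})
        table[key] = {**current, **row}
    return table


# Initial contents: keys 0..5, every column equal to the key.
table = {n: {c: n for c in COLUMNS} for n in range(6)}
apply_flexible_update(table, TEST1_JSON)

for key in sorted(table):
    print([table[key][c] for c in COLUMNS])
```

In this model, later lines for the same key (`k = 1` and `k = 2`) build on earlier ones within the same load, which matches the merged rows in the result table.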

fix compile

pick [branch-2.1] Picks "[opt](partial update) Allow to only specify key columns in partial update apache#40736" (apache#40863)

picks apache#40736

fix
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 16, 2024
bobhan1 added a commit to bobhan1/doris that referenced this pull request Oct 21, 2024