[improvement](fe) Enhance FE meta service request validation #63782

Open
CalvinKirs wants to merge 1 commit into apache:master from CalvinKirs:rich-master-meta-token-auth


@CalvinKirs
Member

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: FE meta service endpoints are used for FE-to-FE metadata synchronization and coordination. This PR enhances internal request validation by carrying the cluster token on FE meta requests and validating it on the receiver side. It also adds a temporary compatibility switch for rolling upgrades from FE nodes that do not send the token yet.
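
The receiver-side check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `MetaRequestValidator`, the header names `token` and `node-ident`, and the rule that a wrong token is rejected even when the legacy switch is on are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;

/** Hypothetical receiver-side validator; names and header keys are assumptions. */
final class MetaRequestValidator {
    // Assumed header names, chosen for illustration only.
    static final String TOKEN_HEADER = "token";
    static final String NODE_IDENT_HEADER = "node-ident";

    /**
     * Returns true if the request may proceed.
     *
     * @param headers             request headers (name to value)
     * @param clusterToken        the token this FE expects
     * @param legacyNodeIdentAuth mirrors enable_meta_service_legacy_node_ident_auth
     */
    static boolean isAuthorized(Map<String, String> headers,
                                String clusterToken,
                                boolean legacyNodeIdentAuth) {
        if (clusterToken == null) {
            return false;
        }
        String presented = headers.get(TOKEN_HEADER);
        if (presented != null) {
            // Constant-time comparison avoids leaking how many bytes matched.
            return MessageDigest.isEqual(
                    presented.getBytes(StandardCharsets.UTF_8),
                    clusterToken.getBytes(StandardCharsets.UTF_8));
        }
        // No token header: accept only when the temporary legacy switch is on
        // and the caller at least identifies itself as a node.
        return legacyNodeIdentAuth && headers.containsKey(NODE_IDENT_HEADER);
    }
}
```

In this sketch the legacy path is reached only when no token header is present, so an upgraded FE that sends a bad token is never waved through by the compatibility switch.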

Release note

FE meta service internal requests now validate the cluster token by default. During rolling upgrades from older versions, set enable_meta_service_legacy_node_ident_auth=true temporarily on upgraded FEs if old FEs still need to call these endpoints without token headers. Disable it after all FEs are upgraded. The /dump endpoint now always runs the HTTP user check, and /put only accepts the configured FE HTTP port.
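
The /put port restriction mentioned above amounts to an equality check against the configured FE HTTP port. A minimal sketch, assuming the port arrives as a string request parameter; the class and method names are hypothetical and are not taken from the PR:

```java
/** Hypothetical helper illustrating the /put port restriction. */
final class PutPortCheck {
    /**
     * Accepts the request only if the supplied port parses as an integer
     * and equals the configured FE HTTP port (Config.http_port in the PR text).
     */
    static boolean isAcceptedPort(String portParam, int configuredHttpPort) {
        if (portParam == null) {
            return false;
        }
        try {
            return Integer.parseInt(portParam.trim()) == configuredHttpPort;
        } catch (NumberFormatException e) {
            // Non-numeric input is rejected rather than propagated as an error.
            return false;
        }
    }
}
```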

Check List (For Author)

  • Test: Unit Test, Manual test
    • ./run-fe-ut.sh --run org.apache.doris.httpv2.meta.MetaServiceTest,org.apache.doris.common.util.HttpURLUtilTest
    • mvn -pl fe-core -am -DskipUT=false -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=org.apache.doris.httpv2.meta.MetaServiceTest,org.apache.doris.common.util.HttpURLUtilTest test
    • Manual: started FE on HTTP 26030 and query port 27030; verified no-token FE meta request returns business code 401, and token-carrying /image?version=155292 returns HTTP 200.
  • Behavior changed: Yes. FE meta service endpoints require the cluster token by default; legacy header-only node identity fallback is available only when enable_meta_service_legacy_node_ident_auth=true. /dump now always runs the HTTP user check. /put rejects ports other than Config.http_port.
  • Does this need documentation: Yes. Document rolling-upgrade use of enable_meta_service_legacy_node_ident_auth and cluster-token validation for FE meta service requests.
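
For the manual check listed above, the two requests (with and without the token) could be built along these lines. This is a sketch under assumptions: the header name `token`, the class `MetaRequestBuilder`, and the host value are illustrative and not confirmed by the PR. It uses the JDK 11+ `java.net.http` API and only constructs the request; sending it requires a running FE.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side helper; the header name is an assumption. */
final class MetaRequestBuilder {
    static final String TOKEN_HEADER = "token";

    /** Builds a GET for /image?version=N, adding the token header when one is given. */
    static HttpRequest imageRequest(String host, int httpPort, long version, String clusterToken) {
        URI uri = URI.create("http://" + host + ":" + httpPort + "/image?version=" + version);
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
        if (clusterToken != null) {
            builder.header(TOKEN_HEADER, clusterToken);
        }
        return builder.build();
    }
}
```

Passing `null` as the token produces the no-token request used in the negative case. The result can be sent with `java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.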

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 80.56% (29/36) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-H: Total hot run time: 31382 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a6cd5641810a53acde03e78545954f473b9c5acd, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17763	4019	3977	3977
q2
q3	10844	1387	800	800
q4	4688	479	342	342
q5	7603	2291	2165	2165
q6	340	174	143	143
q7	926	790	650	650
q8	9370	1728	1578	1578
q9	6907	4931	5037	4931
q10	6430	2237	1888	1888
q11	442	269	236	236
q12	682	426	286	286
q13	18272	3739	2743	2743
q14	269	255	234	234
q15
q16	823	778	716	716
q17	945	930	980	930
q18	6938	5841	5485	5485
q19	1166	1392	1042	1042
q20	552	403	269	269
q21	5904	2692	2659	2659
q22	457	391	308	308
Total cold run time: 101321 ms
Total hot run time: 31382 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4764	4666	4876	4666
q2
q3	4875	5315	4676	4676
q4	2135	2155	1382	1382
q5	4844	4755	4680	4680
q6	230	175	129	129
q7	1820	1687	1613	1613
q8	2433	1999	1918	1918
q9	7423	7403	7352	7352
q10	4757	4670	4247	4247
q11	550	380	346	346
q12	726	742	521	521
q13	3023	3359	2808	2808
q14	262	282	244	244
q15
q16	674	705	613	613
q17	1257	1245	1236	1236
q18	7386	6975	6708	6708
q19	1134	1113	1132	1113
q20	2204	2208	1928	1928
q21	5229	4610	4377	4377
q22	515	458	417	417
Total cold run time: 56241 ms
Total hot run time: 50974 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171487 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a6cd5641810a53acde03e78545954f473b9c5acd, data reload: false

query5	4339	666	515	515
query6	325	229	202	202
query7	4243	589	305	305
query8	324	237	226	226
query9	8808	4066	4013	4013
query10	467	347	302	302
query11	5796	2489	2263	2263
query12	182	130	124	124
query13	1290	623	436	436
query14	6068	5442	5139	5139
query14_1	4498	4466	4475	4466
query15	222	210	186	186
query16	1003	464	360	360
query17	1162	746	608	608
query18	2444	512	374	374
query19	224	207	176	176
query20	138	144	131	131
query21	214	140	118	118
query22	13671	13498	13355	13355
query23	17363	16507	16283	16283
query23_1	16383	16254	16275	16254
query24	7425	1783	1315	1315
query24_1	1346	1312	1335	1312
query25	586	507	441	441
query26	1329	335	177	177
query27	2695	566	348	348
query28	4398	2016	2043	2016
query29	1079	648	514	514
query30	309	236	202	202
query31	1112	1090	963	963
query32	101	81	74	74
query33	573	388	313	313
query34	1181	1189	668	668
query35	757	819	727	727
query36	1411	1420	1256	1256
query37	147	103	90	90
query38	3199	3136	3056	3056
query39	914	926	939	926
query39_1	881	865	877	865
query40	230	146	124	124
query41	66	62	60	60
query42	108	107	115	107
query43	322	329	286	286
query44	
query45	211	205	200	200
query46	1125	1176	728	728
query47	2407	2337	2273	2273
query48	405	409	306	306
query49	622	487	402	402
query50	962	340	247	247
query51	4316	4349	4257	4257
query52	100	103	92	92
query53	251	287	202	202
query54	301	276	249	249
query55	93	89	88	88
query56	309	296	295	295
query57	1428	1414	1343	1343
query58	319	259	257	257
query59	1557	1638	1463	1463
query60	320	315	308	308
query61	173	155	152	152
query62	687	645	574	574
query63	242	197	201	197
query64	2481	789	636	636
query65	
query66	1734	471	368	368
query67	29717	29659	29512	29512
query68	
query69	461	338	306	306
query70	997	1017	996	996
query71	300	266	300	266
query72	3001	2718	2443	2443
query73	806	755	419	419
query74	5107	4945	4805	4805
query75	2692	2596	2278	2278
query76	2312	1136	784	784
query77	406	431	338	338
query78	12342	12470	11887	11887
query79	1515	1092	729	729
query80	1209	569	456	456
query81	501	285	236	236
query82	1355	155	128	128
query83	368	277	245	245
query84	264	140	112	112
query85	927	529	454	454
query86	465	333	333	333
query87	3420	3360	3229	3229
query88	3617	2738	2724	2724
query89	459	384	340	340
query90	1801	185	180	180
query91	181	165	139	139
query92	79	83	76	76
query93	1580	1444	813	813
query94	633	357	286	286
query95	693	370	334	334
query96	1031	793	374	374
query97	2748	2724	2642	2642
query98	243	236	229	229
query99	1174	1164	1027	1027
Total cold run time: 254822 ms
Total hot run time: 171487 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 27.12% (16/59) 🎉
Increment coverage report
Complete coverage report
