
[refactor](oss) unify FE OSS filesystem with Jindo #61269

Merged
CalvinKirs merged 2 commits into apache:master from CalvinKirs:codex/unify-jindo-fs on Mar 17, 2026

@CalvinKirs
Member

This PR unifies the FE-side OSS Hadoop filesystem implementation on Jindo FS and removes legacy OSS filesystem dependencies that are no longer needed.

Why

We currently have multiple OSS filesystem implementations on the FE classpath, including:

  • org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
  • paimon-oss

This makes OSS behavior inconsistent and increases the chance of classpath conflicts. Since Doris already packages and uses Jindo FS, FE should consistently use Jindo instead of mixing multiple OSS filesystem implementations.
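
One way to see which of these implementations are loadable at runtime is to probe the classpath by class name. The following is a minimal sketch, not part of this PR: the class `OssFsClasspathProbe` and its method are hypothetical, and only the two candidate class names come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Doris codebase.
final class OssFsClasspathProbe {

    // Returns the candidate class names that can be loaded from the current classpath.
    static List<String> presentClasses(List<String> candidates) {
        List<String> found = new ArrayList<>();
        ClassLoader loader = OssFsClasspathProbe.class.getClassLoader();
        for (String name : candidates) {
            try {
                // initialize=false: only check presence, do not run static initializers.
                Class.forName(name, false, loader);
                found.add(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not loadable from this classpath; skip it.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
                "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
                "com.aliyun.jindodata.oss.JindoOssFileSystem");
        System.out.println("OSS filesystem classes on classpath: " + presentClasses(candidates));
    }
}
```

If more than one candidate is reported, which one serves `oss://` paths depends on the `fs.oss.impl` setting, which is the mixing this PR aims to remove on the FE side.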

Changes

  • Switch OSSProperties to use Jindo FS:
    • fs.oss.impl = com.aliyun.jindodata.oss.JindoOssFileSystem
    • fs.AbstractFileSystem.oss.impl = com.aliyun.jindodata.oss.JindoOSS
  • Keep OSSHdfsProperties aligned with the same Jindo FS constants.
  • Add FE unit test coverage to verify OSS Hadoop config is initialized with Jindo FS.
  • Remove legacy OSS filesystem dependencies from FE modules:
    • remove paimon-oss from fe-core
    • remove paimon-oss from preload-extensions
    • remove hadoop-aliyun from FE dependency management and hadoop-deps
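
The filesystem wiring in the first bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Doris code: a plain `Map` stands in for Hadoop's `Configuration`, and the names `JindoOssConfigSketch` and `buildOssConfig` are hypothetical. Only the two property keys and the two Jindo class names are taken from this description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the real OSSProperties class.
final class JindoOssConfigSketch {

    // Class names as listed in the PR description.
    static final String JINDO_OSS_FS_IMPL = "com.aliyun.jindodata.oss.JindoOssFileSystem";
    static final String JINDO_OSS_ABSTRACT_FS_IMPL = "com.aliyun.jindodata.oss.JindoOSS";

    // Builds the Hadoop-style key/value pairs that select Jindo FS for oss:// paths.
    // Assumption: a Map stands in for org.apache.hadoop.conf.Configuration.
    static Map<String, String> buildOssConfig() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.oss.impl", JINDO_OSS_FS_IMPL);
        conf.put("fs.AbstractFileSystem.oss.impl", JINDO_OSS_ABSTRACT_FS_IMPL);
        return conf;
    }

    public static void main(String[] args) {
        buildOssConfig().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Keeping the class names in shared constants is what lets both property classes stay aligned: a change to one constant is picked up everywhere it is referenced.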

Scope

This PR only updates FE-side OSS filesystem wiring and FE-related dependency cleanup.
Non-FE modules are intentionally left unchanged.

Verification

  • run-fe-ut.sh --run org.apache.doris.datasource.property.storage.OSSPropertiesTest,org.apache.doris.datasource.property.storage.OSSHdfsPropertiesTest
  • Full FE reactor build passed

Notes

aliyun-sdk-oss is kept because FE cloud storage code (OssRemote) still uses it; it is not part of the Hadoop OSS filesystem implementation cleanup in this PR.

@CalvinKirs requested a review from morningman as a code owner on March 12, 2026 09:40
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27835 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 13467210ebc203335926eea974969c5ce2c5a057, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
============================================
q1	17592	4431	4282	4282
q2
q3	10649	794	514	514
q4	4683	371	255	255
q5	7557	1188	1019	1019
q6	174	175	148	148
q7	789	837	675	675
q8	9284	1483	1374	1374
q9	4868	4731	4759	4731
q10	6243	1886	1639	1639
q11	463	282	267	267
q12	743	567	472	472
q13	18048	2950	2174	2174
q14	229	228	204	204
q15	919	797	804	797
q16	753	737	703	703
q17	710	842	424	424
q18	6042	5341	5269	5269
q19	1124	983	641	641
q20	511	488	392	392
q21	4347	2021	1579	1579
q22	393	321	276	276
Total cold run time: 96121 ms
Total hot run time: 27835 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
============================================
q1	4886	4590	4513	4513
q2
q3	3932	4355	3829	3829
q4	897	1392	805	805
q5	4055	4395	4333	4333
q6	187	175	140	140
q7	1779	1663	1499	1499
q8	2486	2787	2564	2564
q9	7530	7527	7368	7368
q10	3779	4037	3669	3669
q11	563	453	442	442
q12	510	609	469	469
q13	2707	3141	2297	2297
q14	284	301	279	279
q15	886	812	801	801
q16	706	806	740	740
q17	1205	1494	1357	1357
q18	7001	6883	6645	6645
q19	961	919	946	919
q20	2093	2254	1978	1978
q21	4516	3556	3452	3452
q22	502	440	375	375
Total cold run time: 51465 ms
Total hot run time: 48474 ms

@doris-robot

TPC-DS: Total hot run time: 153036 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 13467210ebc203335926eea974969c5ce2c5a057, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4344	631	510	510
query6	333	243	217	217
query7	4229	462	275	275
query8	349	257	241	241
query9	8719	2755	2772	2755
query10	505	421	338	338
query11	7339	5877	5539	5539
query12	191	129	122	122
query13	1263	456	337	337
query14	5714	3765	3496	3496
query14_1	2810	2783	2777	2777
query15	207	196	179	179
query16	989	460	464	460
query17	1111	717	602	602
query18	2448	448	351	351
query19	222	210	185	185
query20	148	132	129	129
query21	227	149	125	125
query22	5119	5112	5048	5048
query23	15965	15592	15263	15263
query23_1	15457	16473	15886	15886
query24	7281	1680	1282	1282
query24_1	1304	1297	1308	1297
query25	627	525	536	525
query26	1468	284	169	169
query27	2852	515	308	308
query28	4517	1937	1949	1937
query29	900	605	471	471
query30	311	252	210	210
query31	1369	1322	1217	1217
query32	79	72	72	72
query33	503	317	269	269
query34	930	907	561	561
query35	645	680	600	600
query36	1089	1115	993	993
query37	135	94	84	84
query38	2965	2915	2853	2853
query39	884	878	848	848
query39_1	821	825	830	825
query40	231	189	133	133
query41	62	59	58	58
query42	313	306	292	292
query43	234	244	220	220
query44	
query45	199	187	180	180
query46	882	981	622	622
query47	2124	2151	2141	2141
query48	303	308	222	222
query49	622	451	380	380
query50	681	273	212	212
query51	4141	4182	4040	4040
query52	288	294	279	279
query53	292	334	277	277
query54	309	274	266	266
query55	95	89	81	81
query56	310	319	309	309
query57	1390	1362	1296	1296
query58	290	277	284	277
query59	1340	1471	1265	1265
query60	343	326	324	324
query61	145	146	140	140
query62	622	584	544	544
query63	304	282	274	274
query64	5050	1277	986	986
query65	
query66	1481	451	346	346
query67	16476	16552	16412	16412
query68	
query69	394	301	276	276
query70	1001	944	1003	944
query71	338	307	300	300
query72	2769	2626	2382	2382
query73	551	547	331	331
query74	10016	9932	9796	9796
query75	2849	2757	2447	2447
query76	2283	1037	670	670
query77	361	379	297	297
query78	11224	11376	10661	10661
query79	1141	766	592	592
query80	1334	611	545	545
query81	564	271	241	241
query82	1094	152	113	113
query83	328	254	243	243
query84	254	116	101	101
query85	902	492	436	436
query86	418	340	313	313
query87	3216	3107	3013	3013
query88	3528	2677	2681	2677
query89	427	365	346	346
query90	2024	169	174	169
query91	162	155	140	140
query92	76	75	69	69
query93	915	804	509	509
query94	643	282	303	282
query95	588	333	314	314
query96	638	521	231	231
query97	2493	2495	2451	2451
query98	231	215	225	215
query99	1017	981	914	914
Total cold run time: 233147 ms
Total hot run time: 153036 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉
Increment coverage report
Complete coverage report

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27041 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2b78cd0ae117ffb43e35ba1642826de2861aeeb3, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
============================================
q1	17704	4464	4290	4290
q2
q3	10644	804	536	536
q4	4676	372	255	255
q5	7621	1206	1020	1020
q6	182	182	148	148
q7	835	839	678	678
q8	10446	1485	1358	1358
q9	5687	4815	4751	4751
q10	6344	1923	1645	1645
q11	473	269	246	246
q12	764	573	461	461
q13	18057	2950	2165	2165
q14	234	225	217	217
q15
q16	751	750	668	668
q17	736	849	440	440
q18	6027	5339	5327	5327
q19	1123	989	634	634
q20	543	493	382	382
q21	4651	2104	1525	1525
q22	378	356	295	295
Total cold run time: 97876 ms
Total hot run time: 27041 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
============================================
q1	4671	4694	4620	4620
q2
q3	3889	4342	3828	3828
q4	839	1203	759	759
q5	4046	4395	4409	4395
q6	183	179	148	148
q7	1781	1702	1559	1559
q8	2555	2724	2565	2565
q9	7668	7548	7253	7253
q10	3865	3977	3627	3627
q11	506	424	416	416
q12	523	675	485	485
q13	2851	3192	2319	2319
q14	278	304	281	281
q15
q16	746	761	721	721
q17	1171	1341	1320	1320
q18	7308	6717	6704	6704
q19	867	897	946	897
q20	2093	2237	2037	2037
q21	3973	3455	3323	3323
q22	475	424	377	377
Total cold run time: 50288 ms
Total hot run time: 47634 ms

@doris-robot

TPC-DS: Total hot run time: 168679 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2b78cd0ae117ffb43e35ba1642826de2861aeeb3, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4331	645	524	524
query6	335	238	220	220
query7	4216	480	274	274
query8	350	248	256	248
query9	8753	2773	2775	2773
query10	520	378	366	366
query11	7018	5136	4928	4928
query12	189	127	128	127
query13	1277	455	358	358
query14	5793	3736	3495	3495
query14_1	2851	2788	2828	2788
query15	208	199	177	177
query16	996	471	379	379
query17	1111	727	626	626
query18	2452	469	350	350
query19	217	211	189	189
query20	133	132	133	132
query21	218	135	117	117
query22	13329	13978	14638	13978
query23	16234	15799	15604	15604
query23_1	15731	15749	16168	15749
query24	7415	1615	1222	1222
query24_1	1246	1239	1245	1239
query25	567	490	434	434
query26	1270	256	153	153
query27	2771	483	297	297
query28	4471	1863	1857	1857
query29	815	574	471	471
query30	296	225	190	190
query31	1025	940	869	869
query32	82	74	71	71
query33	495	336	276	276
query34	917	862	524	524
query35	645	691	597	597
query36	1067	1131	995	995
query37	133	95	85	85
query38	2997	2895	2921	2895
query39	868	837	810	810
query39_1	784	803	797	797
query40	231	153	132	132
query41	62	62	58	58
query42	264	257	262	257
query43	240	249	225	225
query44	
query45	202	191	182	182
query46	877	1026	603	603
query47	2151	2151	2064	2064
query48	320	313	230	230
query49	635	466	376	376
query50	688	282	209	209
query51	4061	4047	4033	4033
query52	258	265	254	254
query53	284	337	283	283
query54	293	269	275	269
query55	95	92	96	92
query56	310	321	318	318
query57	1819	1643	1697	1643
query58	287	277	264	264
query59	2776	2935	2735	2735
query60	344	337	321	321
query61	149	145	152	145
query62	622	597	545	545
query63	308	281	280	280
query64	5009	1249	964	964
query65	
query66	1474	462	352	352
query67	24277	24338	24232	24232
query68	
query69	402	311	285	285
query70	997	962	978	962
query71	340	305	304	304
query72	2858	2640	2430	2430
query73	559	541	325	325
query74	9642	9643	9452	9452
query75	2866	2754	2466	2466
query76	2274	1032	673	673
query77	373	408	313	313
query78	10958	11168	10456	10456
query79	1144	773	586	586
query80	1337	631	542	542
query81	566	265	226	226
query82	993	150	118	118
query83	351	268	253	253
query84	257	111	91	91
query85	926	513	435	435
query86	436	307	320	307
query87	3134	3128	3051	3051
query88	3556	2669	2672	2669
query89	436	370	358	358
query90	2033	174	192	174
query91	172	165	134	134
query92	82	74	71	71
query93	938	836	508	508
query94	650	324	289	289
query95	592	347	326	326
query96	642	518	230	230
query97	2471	2516	2431	2431
query98	229	232	221	221
query99	1041	998	924	924
Total cold run time: 249604 ms
Total hot run time: 168679 ms

@morningman
Contributor

/review

hadoopStorageConfig.set("fs.oss.accessKeyId", accessKey);
hadoopStorageConfig.set("fs.oss.accessKeySecret", secretKey);
hadoopStorageConfig.set("fs.oss.endpoint", endpoint);
hadoopStorageConfig.set("fs.oss.region", region);

Were we missing this before?

@github-actions
Contributor

Code Review Summary

PR: [refactor](oss) unify FE OSS filesystem with Jindo

This PR unifies the FE-side OSS Hadoop filesystem implementation to Jindo FS, replacing the legacy AliyunOSSFileSystem and paimon-oss dependencies. The changes are clean, focused, and well-tested.

Critical Checkpoint Conclusions

  1. Goal and correctness: The PR accomplishes its stated goal of switching FE OSS filesystem to Jindo FS. Both OSSProperties and OSSHdfsProperties now consistently use Jindo FS class names via shared constants. Existing tests (OSSPropertiesTest, OSSHdfsPropertiesTest) and a new test (testUseJindoFsForHadoopStorageConfig) verify the behavior.

  2. Modification scope: The change is minimal and focused — only the necessary files are touched: two Java source files, four pom.xml files, one test file, and one regression test.

  3. Concurrency: No concurrency concerns — the modified code paths involve object construction and configuration initialization, not shared mutable state.

  4. Lifecycle management: No lifecycle concerns. The Jindo FS constants are package-private static finals, appropriate for their usage scope.

  5. Configuration items: No new configuration items are added. The change is transparent to users — the same user-facing properties (oss.endpoint, oss.access_key, etc.) continue to work; only the underlying filesystem implementation class changes.

  6. Compatibility / rolling upgrades: This is an FE-only change that affects which Hadoop filesystem class is instantiated. Jindo FS was already on the FE classpath (loaded from lib/jindofs/ at startup). No protocol or storage format changes. Rolling upgrades are safe since FE nodes independently resolve filesystem implementations.

  7. Parallel code paths: Verified that all FE code paths setting fs.oss.impl now point to Jindo. No stale references to AliyunOSSFileSystem remain in FE. The broker module (fs_brokers/) still uses the old implementation, but the PR description explicitly scopes this to FE only, which is appropriate.

  8. Regression test fix: The change from 'fs.oss.support' to 'fs.oss-hdfs.support' in oss_hdfs_catalog_test.groovy is correct — the test exercises OSS-HDFS (JindoData DLS) properties, and fs.oss-hdfs.support routes to OSSHdfsProperties while the old fs.oss.support would incorrectly route to OSSProperties.

  9. Dependency removals: Verified that paimon-oss has no remaining references anywhere in the codebase, and hadoop-aliyun has no remaining references in FE pom files. Clean removal.

  10. Region null safety: The newly added hadoopStorageConfig.set("fs.oss.region", region) is safe — region is guaranteed non-null by validation in AbstractS3CompatibleProperties.initNormalizeAndCheckProps() which throws before initializeHadoopStorageConfig() is called.

  11. Test coverage: The new unit test testUseJindoFsForHadoopStorageConfig verifies the Jindo FS impl, abstract FS impl, and region are correctly set. Existing OSSHdfsPropertiesTest already validates Jindo constants with hardcoded string assertions (not constant references), providing independent verification.

  12. Performance: No performance concerns. Configuration initialization is not a hot path.

  13. Observability: No observability changes needed for this refactoring.
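
The ordering argument in checkpoint 10 (validation runs first, so the region is never null when it is written to the config) can be illustrated with a small sketch. Everything here is hypothetical: `OssRegionOrderingSketch`, `checkProps`, and `initConfig` are stand-ins for the real Doris classes and methods, a `Map` replaces Hadoop's `Configuration`, and the endpoint and region in `main` are example values only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "validate first, then build the config".
final class OssRegionOrderingSketch {
    private final String endpoint;
    private final String region;

    OssRegionOrderingSketch(String endpoint, String region) {
        this.endpoint = endpoint;
        this.region = region;
    }

    // Stand-in for the validation step: fail fast when the region is missing.
    void checkProps() {
        if (region == null || region.isEmpty()) {
            throw new IllegalArgumentException("region is required");
        }
    }

    // Stand-in for config initialization; it only runs after validation succeeds.
    Map<String, String> initConfig() {
        checkProps();
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.oss.endpoint", endpoint);
        conf.put("fs.oss.region", region);
        return conf;
    }

    public static void main(String[] args) {
        // Example values only.
        OssRegionOrderingSketch props =
                new OssRegionOrderingSketch("oss-cn-hangzhou.aliyuncs.com", "cn-hangzhou");
        System.out.println(props.initConfig());
    }
}
```

Because the check throws before any key is written, a missing region surfaces as a validation error rather than as a null value reaching the filesystem configuration.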

Minor Pre-existing Note (not blocking)

OSSProperties.initializeHadoopStorageConfig() does not propagate sessionToken as fs.oss.securityToken to the Hadoop config, meaning STS temporary credentials won't work with Jindo FS via the native oss:// path. This is a pre-existing gap (also present in OSSHdfsProperties and peer implementations like COS/OBS), not introduced by this PR. Consider addressing it in a follow-up.
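
A possible shape for that follow-up, sketched under assumptions: the helper `OssStsTokenSketch.buildConfig` is hypothetical, a `Map` stands in for Hadoop's `Configuration`, and the key names are the ones quoted in this review (`fs.oss.accessKeyId`, `fs.oss.accessKeySecret`, `fs.oss.securityToken`). Whether Jindo FS needs anything beyond this key for STS credentials is not confirmed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the real OSSProperties implementation.
final class OssStsTokenSketch {

    // Copies credentials into a Hadoop-style config, adding the token only when one is present.
    static Map<String, String> buildConfig(String accessKey, String secretKey, String sessionToken) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("fs.oss.accessKeyId", accessKey);
        conf.put("fs.oss.accessKeySecret", secretKey);
        if (sessionToken != null && !sessionToken.isEmpty()) {
            // Only temporary (STS) credentials carry a session token.
            conf.put("fs.oss.securityToken", sessionToken);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        System.out.println(buildConfig("ak", "sk", "token").keySet());
        System.out.println(buildConfig("ak", "sk", null).keySet());
    }
}
```

Setting the token conditionally keeps the long-lived access key path unchanged while letting temporary credentials pass through.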

Verdict: No issues found. The PR is clean and ready.

@github-actions bot added the approved label (indicates a PR has been approved by one committer) on Mar 17, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@CalvinKirs merged commit ae65747 into apache:master on Mar 17, 2026
29 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 17, 2026
github-actions bot pushed a commit that referenced this pull request Mar 17, 2026

Labels

approved, dev/3.1.x, dev/4.0.x, dev/4.1.x, reviewed


5 participants