[Draft] Master phase1 fs interface refactoring #61907

Closed
morningman wants to merge 5 commits into apache:master from
morningman:master-phase1-fs-interface-refactoring

@morningman
Contributor

No description provided.

morningman and others added 5 commits March 30, 2026 12:48
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Before splitting filesystem implementations into independent
Maven modules (Phase 3), several compile-time couplings must be eliminated.
This commit completes all Phase 0 prerequisite decoupling tasks:

- P0.1: Introduce FsStorageType enum in fe-foundation (zero-dep module) to
  replace StorageBackend.StorageType (Thrift-generated) in PersistentFileSystem.
  Add FsStorageTypeAdapter for bidirectional Thrift conversion. Update all
  subclasses and callers (Repository, BackupJob, RestoreJob, CloudRestoreJob).

- P0.2: Add IOException-based default bridge methods to ObjStorage interface
  (checkObjectExists, getObjectChecked, putObjectChecked, deleteObjectChecked,
  deleteObjectsChecked, copyObjectChecked, listObjectsChecked). Add
  ObjStorageStatusAdapter for Status→IOException conversion. Zero changes to
  existing implementations.

- P0.3: Decouple SwitchingFileSystem from ExternalMetaCacheMgr via new
  FileSystemLookup functional interface. FileSystemProviderImpl passes a lambda.

- P0.4: Extract MultipartUploadCapable interface from ObjFileSystem, removing
  the forced abstract method. S3FileSystem and AzureFileSystem implement it.
  HMSTransaction now uses instanceof check instead of ObjFileSystem cast.

- P0.5: Introduce FileSystemDescriptor POJO for Repository metadata serialization,
  replacing direct PersistentFileSystem subclass serialization. Migrate GsonUtils
  to string-based Class.forName() reflection for legacy format backward compat,
  removing 7 compile-time imports of concrete filesystem classes.

- P0.6: Add FileSystemSpiProvider interface skeleton in fs/spi/ as the future
  ServiceLoader contract for Phase 3 module split.

### Release note

None

### Check List (For Author)

- Test: No need to test (pure refactor; all changes are backward compatible;
  three successful FE builds verified during development)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Maven build cache was caching checkstyle:check results and emitting
'Skipping plugin execution (cached)' even when sources had changed.
Two fixes:

1. Add check/checkstyle/ to global cache input so that changes to
   checkstyle rules (checkstyle.xml, suppressions.xml, etc.) correctly
   invalidate all module caches.

2. Mark the checkstyle:check execution (id: validate) as runAlways in
   executionControl so it is never skipped regardless of cache state.
   Checkstyle is a quality gate and must always execute.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ObjFileSystem: remove unused Map import (was used by the now-removed
  abstract completeMultipartUpload method)
- GsonUtils: fix CustomImportOrder violations - LogManager/Logger imports
  were inserted before com.google.* imports; move them after all com.* imports
  in correct lexicographical order

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pache#61862)

## Summary

This PR completes **Phase 0** of the [FE filesystem SPI
refactoring](apache#61860) — removing
compile-time couplings that would otherwise prevent splitting filesystem
implementations into independent Maven modules in later phases.

## Changes

### P0.1 — FsStorageType enum migration
- Introduce `FsStorageType` enum in `fe-foundation` (zero-dependency
module) to replace Thrift-generated `StorageBackend.StorageType` in
`PersistentFileSystem`
- Add `FsStorageTypeAdapter` in `fe-core` for bidirectional
Thrift↔FsStorageType conversion
- Update all subclasses and callers: `Repository`, `BackupJob`,
`RestoreJob`, `CloudRestoreJob`
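
A minimal sketch of the adapter idea, assuming an explicit lookup table between the two enums. `ThriftStorageTypeSketch` is a hypothetical stand-in for the Thrift-generated `StorageBackend.StorageType`, and the constants and method names are illustrative rather than taken from the PR:

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for the Thrift-generated StorageBackend.StorageType.
enum ThriftStorageTypeSketch { BROKER, S3, HDFS, LOCAL }

// Dependency-free enum of the kind that could live in a foundation module.
enum FsStorageTypeSketch { BROKER, S3, HDFS, LOCAL }

// Converts at the boundary so filesystem classes never see the Thrift type.
final class FsStorageTypeAdapterSketch {
    private static final Map<FsStorageTypeSketch, ThriftStorageTypeSketch> TO_THRIFT =
            new EnumMap<>(FsStorageTypeSketch.class);
    private static final Map<ThriftStorageTypeSketch, FsStorageTypeSketch> FROM_THRIFT =
            new EnumMap<>(ThriftStorageTypeSketch.class);

    static {
        // An explicit table avoids relying on matching constant names or ordinals.
        link(FsStorageTypeSketch.BROKER, ThriftStorageTypeSketch.BROKER);
        link(FsStorageTypeSketch.S3, ThriftStorageTypeSketch.S3);
        link(FsStorageTypeSketch.HDFS, ThriftStorageTypeSketch.HDFS);
        link(FsStorageTypeSketch.LOCAL, ThriftStorageTypeSketch.LOCAL);
    }

    private FsStorageTypeAdapterSketch() {
    }

    private static void link(FsStorageTypeSketch fs, ThriftStorageTypeSketch thrift) {
        TO_THRIFT.put(fs, thrift);
        FROM_THRIFT.put(thrift, fs);
    }

    static ThriftStorageTypeSketch toThrift(FsStorageTypeSketch type) {
        ThriftStorageTypeSketch mapped = TO_THRIFT.get(type);
        if (mapped == null) {
            throw new IllegalArgumentException("Unmapped storage type: " + type);
        }
        return mapped;
    }

    static FsStorageTypeSketch fromThrift(ThriftStorageTypeSketch type) {
        FsStorageTypeSketch mapped = FROM_THRIFT.get(type);
        if (mapped == null) {
            throw new IllegalArgumentException("Unmapped Thrift storage type: " + type);
        }
        return mapped;
    }
}
```

With a table like this, a constant added to only one side fails loudly at conversion time instead of silently mapping to the wrong value.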

### P0.2 — ObjStorage IOException bridge
- Add `IOException`-based `default` bridge methods to `ObjStorage`
interface
- Add `ObjStorageStatusAdapter` for `Status→IOException` conversion
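
The bridge pattern might look roughly like the sketch below. `StatusSketch`, `StatusAdapterSketch`, and `ObjStorageSketch` are simplified stand-ins, and the method signatures are assumptions; only the idea of a `default` method that wraps the Status-based call comes from the description above:

```java
import java.io.IOException;

// Simplified stand-in for a Status result object (assumption).
final class StatusSketch {
    static final StatusSketch OK = new StatusSketch(true, "");

    private final boolean ok;
    private final String message;

    private StatusSketch(boolean ok, String message) {
        this.ok = ok;
        this.message = message;
    }

    static StatusSketch error(String message) {
        return new StatusSketch(false, message);
    }

    boolean isOk() {
        return ok;
    }

    String getMessage() {
        return message;
    }
}

// Converts a failed status into an IOException and leaves successes alone.
final class StatusAdapterSketch {
    private StatusAdapterSketch() {
    }

    static void throwIfFailed(StatusSketch status, String action) throws IOException {
        if (!status.isOk()) {
            throw new IOException(action + " failed: " + status.getMessage());
        }
    }
}

interface ObjStorageSketch {
    // Existing Status-based method; implementations stay unchanged.
    StatusSketch deleteObject(String remotePath);

    // New default bridge: same operation, reported through IOException.
    default void deleteObjectChecked(String remotePath) throws IOException {
        StatusAdapterSketch.throwIfFailed(deleteObject(remotePath), "deleteObject(" + remotePath + ")");
    }
}
```

Because the bridge is a `default` method, existing implementations compile without modification, which is consistent with the "zero changes to existing implementations" goal stated for this step.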

### P0.3 — SwitchingFileSystem decoupling
- Introduce `FileSystemLookup` functional interface
- Decouple `SwitchingFileSystem` from `ExternalMetaCacheMgr`
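
As an illustration of the decoupling, the sketch below replaces a direct dependency on a cache manager with a one-method lookup interface. All type names and the `lookup(String)` signature are hypothetical; the PR's real `FileSystemLookup` may take different parameters:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a filesystem handle (assumption).
interface LegacyFsSketch {
    String name();
}

// The only thing the switching filesystem needs to know about its environment.
@FunctionalInterface
interface FileSystemLookupSketch {
    LegacyFsSketch lookup(String location);
}

// Depends on the lookup abstraction, not on the cache manager class.
final class SwitchingFsSketch {
    private final FileSystemLookupSketch lookup;

    SwitchingFsSketch(FileSystemLookupSketch lookup) {
        this.lookup = lookup;
    }

    String resolve(String location) {
        return lookup.lookup(location).name();
    }
}

// Stand-in for a heavyweight cache manager that the provider already owns.
final class CacheManagerSketch {
    private final Map<String, LegacyFsSketch> byScheme = new HashMap<>();

    void register(String scheme, LegacyFsSketch fs) {
        byScheme.put(scheme, fs);
    }

    LegacyFsSketch get(String location) {
        int idx = location.indexOf("://");
        String scheme = idx < 0 ? "file" : location.substring(0, idx);
        LegacyFsSketch fs = byScheme.get(scheme);
        if (fs == null) {
            throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
        }
        return fs;
    }
}
```

A provider can then wire the two together with a method reference such as `new SwitchingFsSketch(cacheManager::get)`, and tests can pass a plain lambda instead of constructing the cache manager.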

### P0.4 — MultipartUploadCapable interface
- Extract `MultipartUploadCapable` interface from `ObjFileSystem`
- `S3FileSystem` and `AzureFileSystem` implement it; `HMSTransaction`
uses `instanceof` check
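
The capability-interface approach can be sketched as follows. The type names carry a `Sketch` suffix to mark them as illustrative, and the parameter list of `completeMultipartUpload` is an assumption, not the signature from the PR:

```java
import java.util.Map;

// Base type that no longer forces every subclass to support multipart upload.
interface ObjFsSketch {
    String scheme();
}

// Optional capability, implemented only by stores that support it.
interface MultipartUploadCapableSketch {
    void completeMultipartUpload(String bucket, String key, String uploadId, Map<Integer, String> parts);
}

final class S3LikeFsSketch implements ObjFsSketch, MultipartUploadCapableSketch {
    int completedUploads;

    @Override
    public String scheme() {
        return "s3";
    }

    @Override
    public void completeMultipartUpload(String bucket, String key, String uploadId, Map<Integer, String> parts) {
        completedUploads++;
    }
}

final class PlainFsSketch implements ObjFsSketch {
    @Override
    public String scheme() {
        return "plain";
    }
}

final class CommitSketch {
    private CommitSketch() {
    }

    // Checks for the capability instead of casting to a concrete base class.
    static boolean tryComplete(ObjFsSketch fs, String bucket, String key, String uploadId,
            Map<Integer, String> parts) {
        if (fs instanceof MultipartUploadCapableSketch) {
            ((MultipartUploadCapableSketch) fs).completeMultipartUpload(bucket, key, uploadId, parts);
            return true;
        }
        return false;
    }
}
```

The caller handles the unsupported case explicitly (here by returning `false`), rather than failing with a `ClassCastException` when a filesystem without multipart support is passed in.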

### P0.5 — GsonUtils compile-time decoupling
- Introduce `FileSystemDescriptor` POJO for `Repository` metadata
serialization
- `GsonUtils` removes 7 compile-time concrete class imports, uses
`Class.forName()` reflection
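
A sketch of the string-based resolution idea, assuming that classes which cannot be loaded are simply skipped. The resolver class and its methods are hypothetical, and the Gson registration step that would consume the result is omitted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Resolves legacy type names at runtime so no compile-time import is needed.
final class LegacyTypeResolverSketch {
    private LegacyTypeResolverSketch() {
    }

    static Optional<Class<?>> tryLoad(String className) {
        try {
            // initialize=false resolves the type without running its static initializers.
            Class<?> loaded = Class.forName(className, false,
                    LegacyTypeResolverSketch.class.getClassLoader());
            return Optional.<Class<?>>of(loaded);
        } catch (ClassNotFoundException | LinkageError e) {
            // The implementation module is absent from the classpath; treat it as unavailable.
            return Optional.empty();
        }
    }

    static List<Class<?>> loadAvailable(List<String> classNames) {
        List<Class<?>> result = new ArrayList<>();
        for (String name : classNames) {
            tryLoad(name).ifPresent(result::add);
        }
        return result;
    }
}
```

The trade-off is that a misspelled class name is no longer caught by the compiler, so a real implementation would likely log or test for names that fail to resolve.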

### P0.6 — FileSystemSpiProvider skeleton
- Add `FileSystemSpiProvider` interface in `fs/spi/`
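
For context, a `java.util.ServiceLoader` contract of this kind is typically consumed as in the sketch below. The `supports` and `name` methods are hypothetical, since the PR text describes only an interface skeleton, and a real provider would also need a `META-INF/services` entry (or a module `provides` clause) to be discovered:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical shape of a provider contract; the real interface may differ.
interface FileSystemSpiProviderSketch {
    boolean supports(Map<String, String> properties);

    String name();
}

final class SpiDiscoverySketch {
    private SpiDiscoverySketch() {
    }

    // Lists every provider registered on the classpath via META-INF/services.
    static List<String> discoverNames() {
        List<String> names = new ArrayList<>();
        for (FileSystemSpiProviderSketch provider : ServiceLoader.load(FileSystemSpiProviderSketch.class)) {
            names.add(provider.name());
        }
        return names;
    }

    // Picks the first provider that claims the given properties, or null if none does.
    static FileSystemSpiProviderSketch select(Iterable<FileSystemSpiProviderSketch> providers,
            Map<String, String> properties) {
        for (FileSystemSpiProviderSketch provider : providers) {
            if (provider.supports(properties)) {
                return provider;
            }
        }
        return null;
    }
}
```

In production code the `Iterable` passed to `select` would be the `ServiceLoader` itself; accepting an `Iterable` keeps the selection logic testable without any service registration files.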

### Build
- Fix Maven build cache incorrectly skipping `checkstyle:check`
- Fix checkstyle violations (unused import, import order)

## Testing
- FE build: ✅  Checkstyle: 0 violations ✅

Closes part of apache#61860

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tem API and value objects

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: The existing FileSystem interface uses Status-based return values,
bare String paths, and Hadoop-dependent RemoteFile objects throughout the FE codebase,
making it hard to test and impossible to isolate from Hadoop at the module boundary.
Phase 1 introduces the new clean IOException-based FileSystem API with typed Location
value objects, while preserving full backward compatibility via LegacyFileSystemApi.

### Release note

None

### Check List (For Author)

- Test: Manual build verification (./build.sh --fe) passes with zero errors
- Behavior changed: No (all existing code paths preserved via LegacyFileSystemApi)
- Does this need documentation: No

New files:
- Location.java: immutable URI value object replacing bare String paths
- FileEntry.java: immutable file/dir descriptor replacing Hadoop-dependent RemoteFile
- FileIterator.java: lazy Closeable iterator interface for directory listing
- LegacyFileSystemApi.java: @deprecated copy of old FileSystem interface (Status-based)
- LegacyFileSystemAdapter.java: abstract bridge implementing new FileSystem via legacy* methods
- LegacyToNewFsAdapter.java: wraps any LegacyFileSystemApi as new FileSystem
- MemoryFileSystem.java: in-memory FileSystem for unit testing
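
To make the roles of these value objects concrete, here is a small sketch of how a location type, a listing entry, a lazy iterator, and an in-memory store could fit together. Every name ends in `Sketch` because the fields, methods, and normalization behavior are assumptions; the actual classes in this commit may be designed differently:

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Immutable, normalized URI wrapper used instead of a bare String path.
final class LocationSketch {
    private final URI uri;

    private LocationSketch(URI uri) {
        this.uri = uri;
    }

    static LocationSketch of(String raw) {
        return new LocationSketch(URI.create(raw).normalize());
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LocationSketch && uri.equals(((LocationSketch) o).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    @Override
    public String toString() {
        return uri.toString();
    }
}

// Immutable listing entry that carries no Hadoop types.
final class FileEntrySketch {
    final LocationSketch location;
    final long length;

    FileEntrySketch(LocationSketch location, long length) {
        this.location = location;
        this.length = length;
    }
}

// Lazy, closeable listing; callers pull entries one at a time.
interface FileIteratorSketch extends Closeable {
    boolean hasNext() throws IOException;

    FileEntrySketch next() throws IOException;
}

// Tiny in-memory store for unit tests, keyed by normalized location string.
final class MemoryFsSketch {
    private final TreeMap<String, byte[]> files = new TreeMap<>();

    void write(LocationSketch location, byte[] data) {
        files.put(location.toString(), data.clone());
    }

    FileIteratorSketch list(LocationSketch prefix) {
        final String p = prefix.toString();
        // Keys sharing the prefix are contiguous in sorted order, starting at the prefix.
        final Iterator<Map.Entry<String, byte[]>> it = files.tailMap(p, true).entrySet().iterator();
        return new FileIteratorSketch() {
            private FileEntrySketch pending;
            private boolean done;

            @Override
            public boolean hasNext() {
                if (pending == null && !done) {
                    Map.Entry<String, byte[]> e = it.hasNext() ? it.next() : null;
                    if (e != null && e.getKey().startsWith(p)) {
                        pending = new FileEntrySketch(LocationSketch.of(e.getKey()), e.getValue().length);
                    } else {
                        done = true; // past the prefix range or out of entries
                    }
                }
                return pending != null;
            }

            @Override
            public FileEntrySketch next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                FileEntrySketch result = pending;
                pending = null;
                return result;
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory listing.
            }
        };
    }
}
```

Since the iterator is `Closeable`, callers would normally use it in a try-with-resources block, which matters for real backends that hold a connection or paging cursor open during listing.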

Modified files:
- FileSystem.java: replaced with new clean IOException-based interface
- PersistentFileSystem, LocalDfsFileSystem, SwitchingFileSystem: now implement LegacyFileSystemApi
- FileSystemProvider, FileSystemLookup: return LegacyFileSystemApi
- DorisInputFile, DorisOutputFile: add location() method; deprecate path()
- HdfsInputFile, HdfsOutputFile, HdfsInputStream: use Location instead of ParsedPath
- ParsedPath: @deprecated + toLocation() conversion method
- RemoteFile: @deprecated + toFileEntry()/fromFileEntry() conversion methods
- RemoteFiles, RemoteFileRemoteIterator: @deprecated
- All callers updated to use LegacyFileSystemApi type

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f3a7bf1ae926dbd3ee8e19f1b67a417b54703057, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17616	4372	4239	4239
q2
q3	10659	770	518	518
q4	4679	355	253	253
q5	7545	1206	1016	1016
q6	171	171	146	146
q7	776	840	665	665
q8	9507	1437	1278	1278
q9	5207	4694	4679	4679
q10	6319	1865	1662	1662
q11	447	244	246	244
q12	747	587	460	460
q13	18021	2665	1940	1940
q14	221	227	211	211
q15
q16	726	749	658	658
q17	751	818	475	475
q18	5827	5386	5302	5302
q19	1128	980	627	627
q20	546	478	378	378
q21	4513	1791	1404	1404
q22	341	281	256	256
Total cold run time: 95747 ms
Total hot run time: 26411 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4749	4601	4532	4532
q2
q3	3960	4416	3849	3849
q4	886	1209	830	830
q5	4068	4377	4346	4346
q6	182	178	141	141
q7	1774	1648	1512	1512
q8	2463	2685	2532	2532
q9	7710	7476	7464	7464
q10	3711	4000	3557	3557
q11	513	461	439	439
q12	524	616	440	440
q13	2475	2994	2137	2137
q14	421	344	286	286
q15
q16	749	782	728	728
q17	1192	1387	1397	1387
q18	7252	6986	6554	6554
q19	933	973	922	922
q20	2078	2140	2022	2022
q21	3939	3490	3345	3345
q22	469	447	431	431
Total cold run time: 50048 ms
Total hot run time: 47454 ms

@doris-robot

TPC-DS: Total hot run time: 168525 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f3a7bf1ae926dbd3ee8e19f1b67a417b54703057, data reload: false

query5	4316	638	503	503
query6	321	220	199	199
query7	4217	458	265	265
query8	337	242	230	230
query9	8721	2685	2696	2685
query10	527	396	340	340
query11	7013	5074	4909	4909
query12	182	126	122	122
query13	1275	454	349	349
query14	5801	3669	3417	3417
query14_1	2825	2805	2787	2787
query15	205	193	176	176
query16	1000	468	450	450
query17	897	723	620	620
query18	2450	457	352	352
query19	229	216	186	186
query20	129	124	123	123
query21	235	132	110	110
query22	13217	14065	14650	14065
query23	17373	16488	15929	15929
query23_1	16008	15630	15611	15611
query24	7226	1617	1214	1214
query24_1	1223	1227	1252	1227
query25	598	460	392	392
query26	1246	262	148	148
query27	2772	487	294	294
query28	4469	1856	1853	1853
query29	820	565	472	472
query30	295	219	188	188
query31	1006	936	854	854
query32	81	70	69	69
query33	512	328	276	276
query34	888	855	521	521
query35	639	672	598	598
query36	1070	1087	994	994
query37	132	94	82	82
query38	2959	2960	2860	2860
query39	859	835	799	799
query39_1	796	787	797	787
query40	231	150	135	135
query41	64	58	58	58
query42	256	258	255	255
query43	235	255	234	234
query44	
query45	192	185	183	183
query46	886	997	600	600
query47	2130	2152	2067	2067
query48	306	324	227	227
query49	627	450	375	375
query50	700	276	209	209
query51	4078	4047	4004	4004
query52	261	270	255	255
query53	297	328	279	279
query54	304	275	256	256
query55	87	86	83	83
query56	304	316	319	316
query57	1882	1707	1834	1707
query58	283	312	272	272
query59	2775	2964	2740	2740
query60	356	324	310	310
query61	156	157	157	157
query62	627	595	536	536
query63	311	272	276	272
query64	5015	1275	1044	1044
query65	
query66	1461	463	362	362
query67	24350	24244	24330	24244
query68	
query69	417	316	284	284
query70	1003	968	969	968
query71	337	309	297	297
query72	2806	2690	2443	2443
query73	539	552	312	312
query74	9534	9557	9317	9317
query75	2828	2731	2475	2475
query76	2289	1038	672	672
query77	360	377	308	308
query78	10917	11013	10482	10482
query79	3131	742	574	574
query80	1745	616	545	545
query81	579	252	224	224
query82	1012	152	116	116
query83	334	262	246	246
query84	308	125	97	97
query85	968	483	456	456
query86	500	317	313	313
query87	3144	3173	3024	3024
query88	3565	2650	2639	2639
query89	425	371	351	351
query90	2185	177	181	177
query91	170	166	136	136
query92	87	76	70	70
query93	2003	865	506	506
query94	647	309	300	300
query95	584	405	314	314
query96	645	522	230	230
query97	2435	2455	2421	2421
query98	237	221	213	213
query99	1000	992	914	914
Total cold run time: 254246 ms
Total hot run time: 168525 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 6.19% (30/485) 🎉
Increment coverage report
Complete coverage report
