[Enhancement](be) Fix small_file_mgr to support HTTPS when FE runs in HTTPS-only mode#63918
[Enhancement](be) Fix small_file_mgr to support HTTPS when FE runs in HTTPS-only mode#63918nsivarajan wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
130baac to
9d88913
Compare
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
…ng https on small_file_mgr
9d88913 to
7acfeb8
Compare
|
run buildall |
TPC-H: Total hot run time: 31600 ms |
TPC-DS: Total hot run time: 171308 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run external |
1 similar comment
|
run external |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
I found one blocking issue: the BE fallback still uses the FE HTTP port value, so this PR by itself does not fix the stated HTTPS-only small-file download case.
Critical checkpoint conclusions:
- Goal/test: The goal is to let BE SmallFileMgr download files when FE HTTPS is enabled or HTTP is disabled. The current code does not accomplish that on this branch because FE heartbeat still sends Config.http_port, and no test/manual result is provided to prove the end-to-end HTTPS-only case.
- Scope/focus: The change is small and localized, but it depends on an FE protocol/heartbeat change that is not present in this PR.
- Concurrency/lifecycle: No new shared-state concurrency, lock-order, static initialization, or lifecycle issue found beyond the existing SmallFileMgr lock-held download behavior.
- Configuration/compatibility: The change relies on the semantics of master_fe_http_port changing to contain https_port when enable_https=true. That compatibility/protocol change is not included here.
- Parallel paths: Other BE HTTP clients are not modified; for this specific small-file path, the missing FE port propagation is the blocking parallel-side change.
- Error handling/memory/observability: Existing Status propagation is mostly preserved. The retry handles partial temp output by rewinding/truncating, though it cannot recover the intended scenario without the right port.
- Tests: No regression/unit/manual evidence was included, and the primary HTTPS-only scenario remains unverified in this PR.
- User focus: No additional user-provided review focus was present.
| HttpClient https_client; | ||
| RETURN_IF_ERROR(https_client.init(url)); | ||
| // Skip TLS cert verification: internal cluster traffic only; file integrity | ||
| // is guaranteed independently by MD5 checksum verification below. |
There was a problem hiding this comment.
This fallback still builds the HTTPS URL from master_fe_http_port, but on this branch FE heartbeats populate that field from Config.http_port (HeartbeatMgr.setMaster() calls tMasterInfo.setHttpPort(Config.http_port)). In the stated HTTPS-only case (enable_https=true, http_port=0), the first request goes to http://host:0/... and this retry goes to https://host:0/..., so small-file download still fails. Please include the FE heartbeat change that sends https_port when HTTPS is enabled, or otherwise make BE use an actual HTTPS port before adding this fallback.
There was a problem hiding this comment.
This PR is a Preparatory BE change: adds HTTPS fallback to _download_file(). When HTTP fails, it retries with HTTPS using use_untrusted_ssl() (safe for internal cluster traffic; MD5 checksum independently verifies file integrity).
This PR causes no regression:
- enable_https=false: HTTP succeeds on first try, no change.
- enable_https=true, http_port=0 (without FE PR): both attempts fail as
before — no change in outcome, ~1ms extra overhead on failure.
Activation
This fix takes effect automatically once the companion FE PR #60921 is merged. That PR changes HeartbeatMgr to send https_port to BEs when enable_https=true. From the next heartbeat
cycle, master_fe_http_port on every BE becomes 8050, the HTTP attempt fails fast, and the HTTPS fallback succeeds — no BE restart or config change needed.
There was a problem hiding this comment.
Reference HeartbeatManager fix in #60921 , which sets HTTPS scheme and HTTPS port .
public void setMaster(int clusterId, String token, long epoch) {
TMasterInfo tMasterInfo = new TMasterInfo(
new TNetworkAddress(FrontendOptions.getLocalHostAddress(), Config.rpc_port), clusterId, epoch);
tMasterInfo.setToken(token);
tMasterInfo.setHttpPort(Config.enable_https ? Config.https_port : Config.http_port);
long flags = heartbeatFlags.getHeartbeatFlags();
tMasterInfo.setHeartbeatFlags(flags);
if (Config.isCloudMode()) {
// Set the endpoint for the metadata service in cloud mode
tMasterInfo.setMetaServiceEndpoint(Config.meta_service_endpoint);
}
masterInfo.set(tMasterInfo);
}
|
@morrySnow @morningman - Thanks for the review, as mentioned this PR is a preparatory for #60921 . BE call FE in FE in HTTP in small_file_mgr only, making it fast fallback makes no harm. but when #60921 gets merged this auto takes "HTTPS" schema and "HTTPS port" - 8050/user defined any. Please help me get this PR merged and help with next PR(Followup) : #60921 |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
When Doris FE is configured with
enable_https=trueandhttp_port=0(HTTPS-only hardened deployment), the BE'sSmallFileMgrfails to download small files (SSL certificates, UDF jars, Kerberos keytabs) from the FE master.SmallFileMgris the only BE→FE path that uses HTTP rather than Thrift/RPC. It downloads files via/api/get_small_fileusing a hardcodedhttp://scheme. When the FE disables HTTP (http_port=0), this connection is refused and thedownload fails — breaking features that depend on small files, such as Routine Load with Kafka SSL certificates.
What is changed and how does it work?
_download_file()insmall_file_mgr.cppnow tries HTTP first (preserving zero-overhead behavior for existing HTTP deployments), then falls back to HTTPS if HTTP fails. The HTTPS attempt usesuse_untrusted_ssl()which skips TLS certificate chain verification.This is safe for two reasons:
Note: A companion FE PR is needed for the complete fix. The FE
HeartbeatMgrmust sendhttps_port(nothttp_port) to BEs whenenable_https=true, so thatmaster_fe_http_portcontains the correct port for both the HTTP and HTTPS attempts. Without the FE change #60921 , this BE change is safe (no regression) . With both PRs merged, the full end-to-end fix is complete.Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)